Repository permissions

Sourcegraph can be configured to enforce repository permissions from code hosts. The currently supported methods are:

GitHub / GitHub Enterprise
GitLab
Bitbucket Server / Bitbucket Data Center
Unified SSO
Explicit permissions API

For most supported repository permissions enforcement methods, Sourcegraph syncs permissions in the background.

If the Sourcegraph instance is configured to sync repositories from multiple code hosts, setting up permissions for each code host will make repository permissions apply holistically on Sourcegraph, so long as users log in from each code host - learn more.

GitHub

Prerequisite: Add GitHub as an authentication provider.

Then, add or edit a GitHub connection and include the authorization field:

{
  // The GitHub URL used to set up the GitHub authentication provider must match this URL.
  "url": "https://github.com",
  "token": "$PERSONAL_ACCESS_TOKEN",
  "authorization": {}
}

A token that has the prerequisite scopes and both read and write access to all relevant repositories is required in order to list collaborators for each repository to perform a complete sync.

Trigger permissions sync from GitHub webhooks

Sourcegraph 3.22+

Sourcegraph can keep synchronized permissions more up to date by initiating syncs when it receives webhooks from GitHub for events related to user and repo permissions - learn more about webhooks and permissions sync.

To set up webhooks, follow the guide in the GitHub Code Host Docs. These events will enqueue permissions syncs for the repositories or users mentioned, meaning things like publicising / privatising repos, or adding collaborators will be reflected in your Sourcegraph searches more quickly. For this to work the user must have logged in via the GitHub OAuth provider.

The events we consume are:

Teams and organizations permissions caching

Experimental Sourcegraph 3.31+

For GitHub providers, Sourcegraph can leverage caching of GitHub team and organization permissions - learn more about permissions caching.

This caching behaviour can be enabled via the authorization.groupsCacheTTL field:

{
   "url": "https://github.example.com",
   "token": "$PERSONAL_ACCESS_TOKEN",
   "authorization": {
     "groupsCacheTTL": 72, // hours
   }
}

In the corresponding authorization provider in site configuration, the allowGroupsPermissionsSync field must be set as well for the correct auth scopes to be requested from users:

{
  // ...
  "auth.providers": [
    {
      "type": "github",
      "url": "https://github.example.com",
      "allowGroupsPermissionsSync": true,
    }
  ]
}

A token that has the prerequisite scopes and both read and write access to all relevant repositories and organizations is required to fetch repository permissions, team permissions, and team memberships, and to cache them across syncs. Read-only access will not work with cached permissions sync, but will work with regular GitHub permissions sync (with some drawbacks).

When enabling this feature, we currently recommend a default groupsCacheTTL of 72 (hours, or 3 days). A lower value can be set if your teams and organizations change frequently, though the chosen value must be at least several hours for the cache to be leveraged in the event of being rate-limited (which takes an hour to recover from).

Cache invalidation happens automatically on certain webhook events, so it is recommended to configure webhook support when using cached permissions sync. Caches can also be manually invalidated if necessary.

GitLab

GitLab permissions can be configured in three ways:

Set up GitLab as an OAuth sign-on provider for Sourcegraph (recommended)
Use a GitLab administrator (sudo-level) personal access token in conjunction with another SSO provider (recommended only if the first option is not possible)
Assume username equivalency between Sourcegraph and GitLab (warning: this is generally unsafe and should only be used if you are using strictly http-header authentication).

OAuth application

Prerequisite: Add GitLab as an authentication provider.

Then, add or edit a GitLab connection and include the authorization field:

{
  "url": "https://gitlab.com",
  "token": "$PERSONAL_ACCESS_TOKEN",
  "authorization": {
    "identityProvider": {
      "type": "oauth"
    }
  }
}

Administrator (sudo-level) access token

This method requires administrator access to GitLab so that Sourcegraph can access the admin GitLab Users API endpoint. For each GitLab user, this endpoint provides the user ID that comes from the authentication provider, so Sourcegraph can associate a user in its system to a user in GitLab.

Prerequisite: Add the SAML or OpenID Connect authentication provider you use to sign into GitLab.

Then, add or edit a GitLab connection using an administrator (sudo-level) personal access token, and include the authorization field:

{
  "url": "https://gitlab.com",
  "token": "$PERSONAL_ACCESS_TOKEN",
  "authorization": {
    "identityProvider": {
      "type": "external",
      "authProviderID": "$AUTH_PROVIDER_ID",
      "authProviderType": "$AUTH_PROVIDER_TYPE",
      "gitlabProvider": "$AUTH_PROVIDER_GITLAB_ID"
    }
  }
}

$AUTH_PROVIDER_ID and $AUTH_PROVIDER_TYPE identify the authentication provider to use and should match the fields specified in the authentication provider config (auth.providers). The authProviderID can be found in the configID field of the auth provider config.

$AUTH_PROVIDER_GITLAB_ID should match the identities.provider returned by the admin GitLab Users API endpoint.

Username

Prerequisite: Ensure that http-header is the only authentication provider type configured for Sourcegraph. If this is not the case, then it will be possible for users to escalate privileges, because Sourcegraph usernames are mutable.

Add or edit a GitLab connection and include the authorization field:

{
  "url": "https://gitlab.com",
  "token": "$PERSONAL_ACCESS_TOKEN",
  "authorization": {
    "identityProvider": {
      "type": "username"
    }
  }
}

Bitbucket Server / Bitbucket Data Center

Enforcing Bitbucket Server / Bitbucket Data Center permissions can be configured via the authorization setting in its configuration.

Prerequisites

You have the exact same user accounts, with matching usernames, in Sourcegraph and Bitbucket Server / Bitbucket Data Center. This can be accomplished by configuring an external authentication provider that mirrors user accounts from a central directory like LDAP or Active Directory. The same should be done on Bitbucket Server / Bitbucket Data Center with external user directories.
Ensure you have set auth.enableUsernameChanges to false in the site config to prevent users from changing their usernames and escalating their privileges.

Setup

This section walks you through the process of setting up an Application Link between Sourcegraph and Bitbucket Server / Bitbucket Data Center and configuring the Sourcegraph Bitbucket Server / Bitbucket Data Center configuration with authorization settings. It assumes the above prerequisites are met.

As an admin user, go to the "Application Links" page. You can use the sidebar navigation in the admin dashboard, or go directly to https://bitbucketserver.example.com/plugins/servlet/applinks/listApplicationLinks.

Write Sourcegraph's external URL in the text area (e.g. https://sourcegraph.example.com) and click Create new link. Click Continue even if Bitbucket Server / Bitbucket Data Center warns you about the given URL not responding.

Write Sourcegraph as the Application Name and select Generic Application as the Application Type. Leave everything else unset and click Continue.

Now click the edit button in the Sourcegraph Application Link that you just created and select the Incoming Authentication panel.

Generate a Consumer Key in your terminal with echo sourcegraph$(openssl rand -hex 16). Copy this command's output and paste it in the Consumer Key field. Write Sourcegraph in the Consumer Name field.

Generate an RSA key pair in your terminal with openssl genrsa -out sourcegraph.pem 4096 && openssl rsa -in sourcegraph.pem -pubout > sourcegraph.pub. Copy the contents of sourcegraph.pub and paste them in the Public Key field.

Scroll to the bottom and check the Allow 2-Legged OAuth checkbox, then write your admin account's username in the Execute as field and, lastly, check the Allow user impersonation through 2-Legged OAuth checkbox. Press Save.

Go to your Sourcegraph's Manage code hosts page (i.e. https://sourcegraph.example.com/site-admin/external-services) and either edit or create a new Bitbucket Server / Bitbucket Data Center connection. Add the following settings:

{
	// Other config goes here
	"authorization": {
		"identityProvider": {
			"type": "username"
		},
		"oauth": {
			"consumerKey": "<KEY GOES HERE>",
			"signingKey": "<KEY GOES HERE>"
		}
	}
}

Copy the Consumer Key you generated before to the oauth.consumerKey field and the output of the command base64 sourcegraph.pem | tr -d '\n' to the oauth.signingKey field. Finally, save the configuration. You're done!
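If you prefer to script these values rather than run the shell commands by hand, the following is a minimal Python sketch of the same two steps. It is not part of Sourcegraph: the function names are hypothetical, and it assumes the private key was already generated with the openssl command shown earlier.

```python
import base64
import secrets


def generate_consumer_key() -> str:
    # Mirrors: echo sourcegraph$(openssl rand -hex 16)
    # 16 random bytes rendered as 32 hexadecimal characters.
    return "sourcegraph" + secrets.token_hex(16)


def encode_signing_key(pem_bytes: bytes) -> str:
    # Mirrors: base64 sourcegraph.pem | tr -d '\n'
    # b64encode never inserts line breaks, so nothing needs stripping.
    return base64.b64encode(pem_bytes).decode("ascii")
```

Passing the bytes read from sourcegraph.pem to encode_signing_key produces a single-line string suitable for the oauth.signingKey field.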

Fast permission sync with Bitbucket Server plugin

By installing the Bitbucket Server plugin, you can make use of the fast permission sync feature that allows using Bitbucket Server / Bitbucket Data Center permissions on larger instances.

Background permissions syncing

Sourcegraph 3.17+

Sourcegraph syncs permissions in the background by default to better handle repository permissions at scale for GitHub, GitLab, and Bitbucket Server / Bitbucket Data Center code hosts. Rather than syncing a user's permissions when they log in and potentially blocking them from seeing search results, Sourcegraph syncs these permissions asynchronously in the background, opportunistically refreshing them in a timely manner.

Sourcegraph's background permissions syncing is a 2-way sync that combines data from both types of sync for each configured code host to populate the database tables Sourcegraph uses as its source-of-truth for what repositories a user has access to:

User-centric permissions syncs update the complete list of repositories a user has access to, from the user's view. This typically uses authentication associated with the user where available.
Repository-centric permissions syncs update the complete list of all users that have access to a repository, from the repository's view. This may require elevated permissions to request from a code host.

Both types of sync happen repeatedly and continuously based on a variety of events and criteria.

Background permissions syncing enables:

More predictable load on the code host API, because permission updates follow a schedule, though this can mean a sync takes a long time to complete.
Quick syncing of permissions for new repositories and users added to the Sourcegraph instance.
Immediate search results for users who sign up on the Sourcegraph instance, from some of the repositories they have access to on the code host, as Sourcegraph begins to incrementally sync their permissions.

Complete sync vs incremental sync

The two types of sync, user-centric and repository-centric, mean that each user or repository can be in one of two states:

Complete sync means a user has completed user-centric permissions sync (or a repository has completed a repository-centric sync), which indicates the most accurate permissions from the code host have been persisted to Sourcegraph for the user (or vice versa for repositories).
Incremental sync means a user has not yet completed a recent user-centric permissions sync, but has been recently granted some permissions from a repository-centric sync (or vice versa for repositories).
- For example, if a user has not had a user-centric permissions sync, but has been granted permissions from one or more repository-centric syncs, the user will have only completed an incremental sync. In this state, a user might not have access to all repositories they should have access to, but will incrementally receive more access as repository-centric syncs complete.
- It is possible to be in an incremental sync state where a user or repository has effectively completed a complete sync, and all access rules are aligned with what is in the code host - for example, if a user completed a complete sync and a single repository is added, the user will be granted access to that repository through incremental sync, so the user will have full access to everything the user should have access to despite being in an incremental sync state.

The state of permissions is repeatedly and continuously updated in the background.

Checking permissions sync state

The state of a user's or repository's permissions can be checked in the UI by:

For users: navigating to /users/$USER/settings/permissions
For repositories: navigating to /$CODEHOST/$REPO/-/settings/permissions

The GraphQL API can also be used:

query {
  user(username: "user") {
    permissionsInfo {
      syncedAt
      updatedAt
    }
  }
  repository(name: "repository") {
    permissionsInfo {
      syncedAt
      updatedAt
    }
  }
}

In the GraphQL API, syncedAt indicates the last complete sync and updatedAt indicates the last incremental sync. If syncedAt is more recent than updatedAt, the user or repository is in a state of complete sync - learn more.
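The comparison described above can be sketched in Python as follows. This is an illustrative helper, not part of Sourcegraph: the function names are hypothetical, and it assumes both fields are returned as ISO 8601 strings in the same format and may be null when no sync of that kind has happened yet.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    # Treat a trailing "Z" as UTC so datetime.fromisoformat accepts it.
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sync_state(synced_at: Optional[str], updated_at: Optional[str]) -> str:
    """Classify permissionsInfo as "complete", "incremental", or "none"."""
    synced = parse_timestamp(synced_at)
    updated = parse_timestamp(updated_at)
    if synced is None and updated is None:
        return "none"
    # Complete sync: syncedAt is strictly more recent than updatedAt.
    if synced is not None and (updated is None or synced > updated):
        return "complete"
    return "incremental"
```

For example, sync_state("2022-01-02T00:00:00Z", "2022-01-01T00:00:00Z") classifies the user or repository as being in a complete sync state.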

Permissions sync scheduling

A variety of heuristics are used to determine when a user or a repository should be scheduled for a permissions sync (either user-centric or repo-centric respectively) to ensure the permissions data Sourcegraph has is up to date. Scheduling of syncs happens repeatedly and continuously in the background for both users and repositories.

For example, permissions syncs may be scheduled:

When a user or repository is created
When certain interactions happen, such as when a user logs in or a repository is visited
When a user's or repository's permissions are deemed stale (i.e. some amount of time has passed since the last complete sync for a user or repository)
When a relevant webhook is configured and received
When a manual sync is scheduled

When a sync is scheduled, it is added to a queue that is steadily processed to avoid overloading the code host, so a sync might not happen immediately. Syncs are also prioritized so that, for example, users or repositories with no permissions are processed first.

There are a variety of options in the site configuration to tune how the permissions sync requests are scheduled and processed:

{
  // Time interval (in seconds) of how often each component picks up authorization changes in external services.
  "permissions.syncScheduleInterval": 15,
  // Number of user permissions to schedule for syncing in single scheduler iteration.
  "permissions.syncOldestUsers": 10,
  // Number of repo permissions to schedule for syncing in single scheduler iteration.
  "permissions.syncOldestRepos": 10,
  // Don't sync a user's permissions if they have synced within the last n seconds.
  "permissions.syncUsersBackoffSeconds": 60,
  // Don't sync a repo's permissions if it has synced within the last n seconds.
  "permissions.syncReposBackoffSeconds": 60,
  // The maximum number of user-centric permissions syncing jobs that can be spawned concurrently.
  // Service restart is required to take effect for changes.
  "permissions.syncUsersMaxConcurrency": 1,
}
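To build intuition for how a batch size such as permissions.syncOldestUsers and a backoff such as permissions.syncUsersBackoffSeconds might interact, here is a simplified Python model. It is only a sketch under stated assumptions: the SyncRecord type and pick_candidates function are hypothetical, and Sourcegraph's actual scheduler also takes priorities and other events into account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyncRecord:
    name: str
    last_synced: float  # Unix timestamp (seconds) of the last sync


def pick_candidates(
    records: List[SyncRecord],
    now: float,
    oldest: int,
    backoff_seconds: float,
) -> List[SyncRecord]:
    """Pick up to `oldest` records, least recently synced first."""
    # Skip anything that has synced within the backoff window.
    eligible = [r for r in records if now - r.last_synced >= backoff_seconds]
    # Least recently synced records come first.
    eligible.sort(key=lambda r: r.last_synced)
    return eligible[:oldest]
```

In this model, raising the batch size lets more stale entries be scheduled per iteration, while raising the backoff reduces how often the same entry is re-synced.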

Manually scheduling a sync

Permissions syncs are typically scheduled automatically. However, a sync can be manually scheduled through the UI by site admins:

For users: navigating to /users/$USER/settings/permissions and clicking "Schedule now"
For repositories: navigating to /$CODEHOST/$REPO/-/settings/permissions and clicking "Schedule now"

The GraphQL API can also be used to schedule a sync:

mutation {
  scheduleUserPermissionsSync(user: "userid") {
    alwaysNil
  }
  scheduleRepositoryPermissionsSync(repository: "repositoryid") {
    alwaysNil
  }
}
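A mutation like this can also be sent from a script. The sketch below builds the HTTP request in Python without sending it. It assumes the instance serves its GraphQL API at /.api/graphql and accepts a site admin access token in an Authorization header; the helper names are hypothetical.

```python
import json
import urllib.request


def schedule_user_sync_query(user_id: str) -> str:
    # json.dumps quotes and escapes the ID as a string literal.
    return (
        "mutation { scheduleUserPermissionsSync(user: %s) { alwaysNil } }"
        % json.dumps(user_id)
    )


def build_graphql_request(
    instance_url: str, access_token: str, query: str
) -> urllib.request.Request:
    # The request is only constructed here; nothing is sent yet.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=instance_url.rstrip("/") + "/.api/graphql",
        data=body,
        headers={
            "Authorization": "token " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it; the response body is JSON with data and, on failure, errors fields.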

Permissions sync duration

When syncing permissions from code hosts with large numbers of users and repositories, it can take some time to finish mirroring repository permissions for every user and every repository, typically because rate limits on the code host limit how quickly Sourcegraph can query for repository permissions. This is generally not a problem for fresh installations, since admins should only make the instance available after it's ready. For existing installations, however, active users may not see the repositories they expect in search results because the initial permissions sync hasn't finished yet.

Since Sourcegraph syncs permissions in the background, while the initial sync for all repositories and users is happening, users will gradually see more and more search results from repositories they have access to.

To further mitigate long sync times and API request load, Sourcegraph can also leverage provider-specific optimizations.

Provider-specific optimizations

Each provider can implement optimizations to improve sync performance and up-to-dateness of permissions - please refer to the relevant provider documentation on this page for more details.

Triggering syncs with webhooks

Some permissions providers in Sourcegraph can leverage code host webhooks to help trigger a permissions sync on relevant events, which helps ensure permissions data in Sourcegraph is up to date.

To see if your provider supports triggering syncs with webhooks, please refer to the relevant provider documentation on this page. For example, the GitHub provider supports webhook events.

Permissions caching

Experimental Sourcegraph 3.31+

Some permissions providers in Sourcegraph can leverage caching mechanisms to reduce the number of API calls used when syncing permissions. This can significantly reduce the amount of time it takes to perform a full permissions sync due to reduced instances of being rate limited by the code host, and is useful for code hosts with very large numbers of users and repositories.

To see if your provider supports permissions caching, please refer to the relevant provider documentation on this page. For example, the GitHub provider supports teams and organizations permissions caching.

Note that caching can mean that permissions are out of date. To configure caching behaviour, please refer to the relevant provider documentation on this page. To force a bypass of caches during a sync, you can manually queue users or repositories for sync with the invalidateCaches option via the Sourcegraph GraphQL API:

mutation {
  scheduleUserPermissionsSync(user: "userid", options: {invalidateCaches: true}) {
    alwaysNil
  }
}

Pending permissions

Pending permissions are created and stored when the repo permissions fetched from the code host include users who do not yet have accounts on Sourcegraph. This information is stored so that such users get immediate repo access after joining Sourcegraph. During user creation, user_pending_permissions is queried, and if it contains any permissions for the user being created, these permissions are moved to the user_permissions table so the user has access right away. Without pending permissions, new users would have to wait for their permissions sync to complete.

As soon as a new user is created on Sourcegraph, pending permissions (repo_pending_permissions and user_pending_permissions) are used to populate "ordinary" permissions (the repo_permissions and user_permissions tables), after which user_pending_permissions is cleared. The repo_pending_permissions table is not cleared, for performance reasons; stale entries are harmless because user IDs are monotonically increasing and never repeat.
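The flow can be illustrated with a small in-memory model. This Python sketch is purely illustrative: the PermissionsStore class and its dictionaries are hypothetical stand-ins for the database tables, and it keys pending permissions by bind ID alone rather than by the full composite entity.

```python
from typing import Dict, Set


class PermissionsStore:
    def __init__(self) -> None:
        # bind ID on the code host -> repo IDs granted before an account exists
        self.pending: Dict[str, Set[int]] = {}
        # Sourcegraph user ID -> repo IDs the user can read
        self.granted: Dict[int, Set[int]] = {}

    def record_pending(self, bind_id: str, repo_id: int) -> None:
        """Called when a repo sync sees a user with no Sourcegraph account."""
        self.pending.setdefault(bind_id, set()).add(repo_id)

    def on_user_created(self, user_id: int, bind_id: str) -> Set[int]:
        """Move any pending permissions to the newly created user."""
        repos = self.pending.pop(bind_id, set())
        self.granted.setdefault(user_id, set()).update(repos)
        return self.granted[user_id]
```

A user whose bind ID was seen in earlier repository syncs gets those repositories at account creation; a user with no pending entry starts with an empty set and waits for a sync.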

External code host user to Sourcegraph user mapping

The user_pending_permissions table has a bind_id column which is an ID of the user of the external code host, for example a username for Bitbucket Server, a GraphID for GitHub or a user ID for GitLab.

User pending permission is a composite entity comprising:

service_type (e.g. github, gitlab, bitbucketServer)
service_id (ID of the code host, e.g. https://github.com/, https://gitlab.com/)
permission (access level, e.g. "read")
object_type (type of what is enumerated in object_ids_ints column; for now it is repos)
bind_id

Together, these fields form a unique constraint. This entity is referenced by id in the user_ids_ints column of the repo_pending_permissions table. Please see this godoc for more information.

Overall, one entry of user_pending_permissions table means that "There is a user with bind_id ID of this exact (service_id) external code host of this (service_type) type with such permissions for this (object_ids_ints) set of repos".

Repo pending permissions

The repo_pending_permissions table maps user_pending_permissions entities to a repo ID along with the permission type (currently only read is supported). Each row of the table maps a repo ID to an array of user_pending_permissions entries. It is designed as an inverted user_pending_permissions for more performant CRUD operations (see the DB migration description in this commit).

Explicit permissions API

Sourcegraph exposes a set of GraphQL APIs to explicitly set repository permissions as an alternative to the code-host-specific repository permissions sync mechanisms.

To enable the permissions API, add the following to the site configuration:

"permissions.userMapping": {
    "enabled": true,
    "bindID": "email"
}

The bindID value specifies how to uniquely identify users when setting permissions:

email: You can set permissions for users by specifying their email addresses (which must be verified emails associated with their Sourcegraph user account).
username: You can set permissions for users by specifying their Sourcegraph usernames.

If the permissions API is enabled, all other repository permissions mechanisms are disabled.

After you enable the permissions API, you must set permissions to allow users to view repositories (site admins bypass all permissions checks and can always view all repositories).

Setting a repository as unrestricted

Sometimes it can be useful to mark a repository as unrestricted, meaning that it is available to all Sourcegraph users. This can be done with the setRepositoryPermissionsUnrestricted mutation. Marking a repository as unrestricted will disregard any previously set explicit or synced permissions. Setting unrestricted back to false will restore the previous behaviour.

For example:

mutation {
  setRepositoryPermissionsUnrestricted(repositories: ["<repo ID>", "<repo ID>", "<repo ID>"], unrestricted: true)
}

Setting repository permissions for users

Setting the permissions for a repository can be accomplished with 2 GraphQL API calls.

First, obtain the ID of the repository from its name:

query {
  repository(name: "github.com/owner/repo") {
    id
  }
}

Next, set the list of users allowed to view the repository:

mutation {
  setRepositoryPermissionsForUsers(
    repository: "<repo ID>",
    userPermissions: [
      { bindID: "user@example.com" }
    ]) {
    alwaysNil
  }
}

Now, only the users specified in the userPermissions parameter will be allowed to view the repository. Sourcegraph automatically enforces these permissions for all operations. (Site admins bypass all permissions checks and can always view all repositories.)

You can call setRepositoryPermissionsForUsers repeatedly to set permissions for each repository, and whenever you want to change the list of authorized users.
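When permissions are managed from a script, the mutation text can be generated from a list of bind IDs. The following Python sketch builds the same mutation as the example above; the function name is hypothetical, and the bind IDs must match the configured bindID type (email or username).

```python
import json
from typing import List


def set_repo_permissions_mutation(repo_id: str, bind_ids: List[str]) -> str:
    """Build a setRepositoryPermissionsForUsers mutation as a string."""
    # json.dumps quotes and escapes each value as a string literal.
    users = ", ".join("{ bindID: %s }" % json.dumps(b) for b in bind_ids)
    return (
        "mutation { setRepositoryPermissionsForUsers("
        "repository: %s, userPermissions: [%s]) { alwaysNil } }"
        % (json.dumps(repo_id), users)
    )
```

The resulting string can be sent as the query field of a JSON POST body to the GraphQL API. Because each call sets the complete list of users for a repository, pass the full list each time rather than only the additions.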

Listing a user's authorized repositories

You may query the set of repositories visible to a particular user with the authorizedUserRepositories GraphQL API query, which accepts a username or email parameter to specify the user:

query {
  authorizedUserRepositories(email: "user@example.com", first: 100) {
    nodes {
      name
    }
    totalCount
  }
}

Bitbucket project based permissions

Sourcegraph supports setting project wide permissions for Bitbucket code host connections.

Setting repository permissions for a project

This API lets site admins set the same permissions for all the users across all the repositories under the project.

First, obtain the project key from the Bitbucket code host.

Next, get the code host ID. Visit the Manage code hosts page from the site admin panel in the Sourcegraph instance and click on "Edit" for the code host under which the above project is located. Copy the ID from the URL. For example in the URL https://sourcegraph.example.com/site-admin/external-services/RXh0ZXJuYWxTZXJ2aWNlOjMwNjczNg==, the code host ID is RXh0ZXJuYWxTZXJ2aWNlOjMwNjczNg==.
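If you need to pull the ID out of such a URL in a script, a minimal Python sketch is shown below. The function name is hypothetical, and it assumes the ID is the last path segment, as in the example URL above.

```python
from urllib.parse import unquote, urlparse


def code_host_id_from_url(edit_url: str) -> str:
    """Return the last path segment of a code host edit URL."""
    path = urlparse(edit_url).path.rstrip("/")
    # unquote handles IDs whose "=" padding was percent-encoded.
    return unquote(path.rsplit("/", 1)[-1])
```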

Next, set the list of users allowed to access all repositories under the project:

mutation {
    setRepositoryPermissionsForBitbucketProject(
        projectKey: "<project key>",
        codeHost: "<code host ID>",
        userPermissions: [
            { bindID: "user@example.com" }
        ]
    ) {
      alwaysNil
    }
}

This will return an empty response immediately while also enqueuing a background task to set permissions for all the repositories that belong to the project as identified by the projectKey in the API request.

Querying project permissions task status

To get the state of currently queued or running tasks you can run the following query with a list of project keys (one or more):

query {
  bitbucketProjectPermissionJobs(projectKeys: ["<project key 1>", "<project key 2>"]) {
    nodes {
      InternalJobID
      State
      FailureMessage
      QueuedAt
      StartedAt
      FinishedAt
      ProcessAfter
      ExternalServiceID
      Permissions {
        bindID
        permission
      }
      Unrestricted
    }
  }
}

The API also supports filtering against task status, which can be one of the following:

queued
canceled
errored
failed
completed

Additionally, the count argument controls the number of tasks returned. It defaults to 100 and has an upper limit of 500.

Here's an example with all the query arguments in the API call:

query {
  bitbucketProjectPermissionJobs(projectKeys: ["a", "b"], status: "queued", count: 200) {
    nodes {
      InternalJobID
      State
      FailureMessage
      QueuedAt
      StartedAt
      FinishedAt
      ProcessAfter
      ExternalServiceID
      Permissions {
        bindID
        permission
      }
      Unrestricted
    }
  }
}
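When these arguments come from user input, it can help to check them before building the query. The Python sketch below validates them against the status values and count limits listed above. The function name is hypothetical, and rejecting out-of-range values client-side is an assumption of this sketch; how the server treats such values is not described here.

```python
from typing import Dict, List, Optional

VALID_STATUSES = {"queued", "canceled", "errored", "failed", "completed"}
DEFAULT_COUNT = 100
MAX_COUNT = 500


def validate_job_query_args(
    project_keys: List[str],
    status: Optional[str] = None,
    count: int = DEFAULT_COUNT,
) -> Dict[str, object]:
    """Return the arguments as a dict, or raise ValueError if invalid."""
    if not project_keys:
        raise ValueError("at least one project key is required")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError("unknown status: %r" % status)
    if not 1 <= count <= MAX_COUNT:
        raise ValueError("count must be between 1 and %d" % MAX_COUNT)
    args: Dict[str, object] = {"projectKeys": list(project_keys), "count": count}
    # Only include the status filter when one was requested.
    if status is not None:
        args["status"] = status
    return args
```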

Permissions for multiple code hosts

If the Sourcegraph instance is configured to sync repositories from multiple code hosts (regardless of whether they are the same code host, e.g. GitHub + GitHub or GitHub + GitLab), Sourcegraph will enforce access to repositories from each code host with authorization enabled, so long as:

users log in to Sourcegraph at least once from each code host's authentication provider
users have the same verified email in Sourcegraph (under "User settings" > "Emails") as any of the emails on the user account from the code host at the time of the initial log in via that code host

To attach a user's Sourcegraph account to all relevant code host accounts, a specific sign-in flow needs to be utilized when users are creating an account and signing into Sourcegraph for the first time.

Sign in to Sourcegraph using one of the code hosts' authentication providers
Once signed in, sign out and return to the sign in page
On the sign in page, sign in again using the next code host's authentication provider
Once repeated across all relevant code hosts, users should now have access to repositories on all code hosts and have all repository permissions enforced.