Using your own PostgreSQL server

You can use your own PostgreSQL v12+ server with Sourcegraph if you wish. For example, you may prefer this if you already have existing backup infrastructure around your own PostgreSQL server, wish to use Amazon RDS, etc.

Please review the PostgreSQL documentation for a complete list of requirements.

General recommendations

If you choose to set up your own PostgreSQL server, please note we strongly recommend each database to be set up in different servers and/or hosts. We suggest either:

  1. Deploy codeintel-db alongside the other Sourcegraph containers, i.e. not as a managed PostgreSQL instance.
  2. Deploy a separate PostgreSQL instance. The primary reason to not use the same Postgres instance for this data is because code graph data can take up a significant of space (given the amount of indexed repositories is large) and the performance of the database may impact the performance of the general application database. You'll most likely want to be able to scale their resources independently.

We also recommend having backups for the codeintel-db as a best practice. The reason behind this recommendation is that codeintel-db data is uploaded via CI systems. If data is lost, Sourcegraph cannot automatically rebuild it from the repositories, which means you'd have to wait until it is re-uploaded from your CI systems.

Instructions

The addition of PG* environment variables to your Sourcegraph deployment files will instruct Sourcegraph to target an external PostgreSQL server. To externalize the frontend database, use the following standard PG* variables:

  • PGHOST
  • PGPORT
  • PGUSER
  • PGPASSWORD
  • PGDATABASE
  • PGSSLMODE

To externalize the code navigation database, use the following prefixed CODEINTEL_PG* variables:

  • CODEINTEL_PGHOST
  • CODEINTEL_PGPORT
  • CODEINTEL_PGUSER
  • CODEINTEL_PGPASSWORD
  • CODEINTEL_PGDATABASE
  • CODEINTEL_PGSSLMODE

sourcegraph/server

Add the following to your docker run command:

docker run [...] -e PGHOST=psql1.mycompany.org -e PGUSER=sourcegraph -e PGPASSWORD=secret -e PGDATABASE=sourcegraph -e PGSSLMODE=require -e CODEINTEL_PGHOST=psql2.mycompany.org -e CODEINTEL_PGUSER=sourcegraph -e CODEINTEL_PGPASSWORD=secret -e CODEINTEL_PGDATABASE=sourcegraph-codeintel -e CODEINTEL_PGSSLMODE=require sourcegraph/server:5.2.5

Docker Compose

  1. Add/modify the following environment variables to all of the sourcegraph-frontend-* services, the sourcegraph-frontend-internal service, and the migrator service (for Sourcegraph versions 3.37+) in docker-compose.yaml:

    sourcegraph-frontend-0:
      # ...
      environment:
        # ...
        - 'PGHOST=psql1.mycompany.org'
        - 'PGUSER=sourcegraph'
        - 'PGPASSWORD=secret'
        - 'PGDATABASE=sourcegraph'
        - 'PGSSLMODE=require'
        - 'CODEINTEL_PGHOST=psql2.mycompany.org'
        - 'CODEINTEL_PGUSER=sourcegraph'
        - 'CODEINTEL_PGPASSWORD=secret'
        - 'CODEINTEL_PGDATABASE=sourcegraph-codeintel'
        - 'CODEINTEL_PGSSLMODE=require'
      # ...
    

    See "Environment variables in Compose" for other ways to pass these environment variables to the relevant services (including from the command line, a .env file, etc.).

  2. Comment out / remove the internal pgsql and codeintel-db services in docker-compose.yaml since Sourcegraph is using the external one now.

    # # Description: PostgreSQL database for various data.
    # #
    # # Disk: 128GB / persistent SSD
    # # Ports exposed to other Sourcegraph services: 5432/TCP 9187/TCP
    # # Ports exposed to the public internet: none
    # #
    # pgsql:
    # container_name: pgsql
    # image: 'index.docker.io/sourcegraph/postgres-11.4:19-11-14_b084311b@sha256:072481559d559cfd9a53ad77c3688b5cf583117457fd452ae238a20405923297'
    # cpus: 4
    # mem_limit: '2g'
    # healthcheck:
    #    test: '/liveness.sh'
    #    interval: 10s
    #    timeout: 1s
    #    retries: 3
    #    start_period: 15s
    # volumes:
    #    - 'pgsql:/data/'
    # networks:
    #     - sourcegraph
    # restart: always
    
    # # Description: PostgreSQL database for code navigation data.
    # #
    # # Disk: 128GB / persistent SSD
    # # Ports exposed to other Sourcegraph services: 5432/TCP 9187/TCP
    # # Ports exposed to the public internet: none
    # #
    # codeintel-db:
    #   container_name: codeintel-db
    #   image: 'index.docker.io/sourcegraph/codeintel-db@sha256:63090799b34b3115a387d96fe2227a37999d432b774a1d9b7966b8c5d81b56ad'
    #   cpus: 4
    #   mem_limit: '2g'
    #   healthcheck:
    #     test: '/liveness.sh'
    #     interval: 10s
    #     timeout: 1s
    #     retries: 3
    #     start_period: 15s
    #   volumes:
    #     - 'codeintel-db:/data/'
    #   networks:
    #     - sourcegraph
    #   restart: always
    

Kubernetes

Update the PG* and CODEINTEL_PG* environment variables in the sourcegraph-frontend deployment YAML file to point to the external frontend (pgsql) and code navigation (codeintel-db) PostgreSQL instances, respectively. Again, these must not point to the same database or the Sourcegraph instance will refuse to start.

You are then free to remove the now unused pgsql and codeintel-db services and deployments from your cluster.

Version requirements

Please refer to our Postgres documentation to learn about version requirements.

Caveats

Most standard PostgreSQL environment variables may be specified (PGPORT, etc). See http://www.postgresql.org/docs/current/static/libpq-envars.html for a full list.


Usage with AWS RDS IAM Auth

For AWS RDS for Postgres, you have the option to use IAM database authentication to avoid using static database credentials. Learn more from AWS documentation.

In order to enable IAM Auth, you first need to:

For every services that require postgres database connection, ensure below environment variables are configured:

  • PG_CONNECTION_UPDATER=EC2_ROLE_CREDENTIALS
  • PGSSLMODE=require
  • PGHOST=<>
  • PGPORT=<>
  • PGUSER=<> - this should be the database accounts created above
  • PGDATABASE=<>
  • CODEINTEL_PGSSLMODE=require
  • CODEINTEL_PGPORT=<>
  • CODEINTEL_PGUSER=<> this should be the database accounts created above
  • CODEINTEL_PGDATABASE=<>
  • CODEINSIGHTS_PGSSLMODE=require
  • CODEINSIGHTS_PGHOST=<>
  • CODEINSIGHTS_PGPORT=<>
  • CODEINSIGHTS_PGUSER=<> this should be the database accounts created above
  • CODEINSIGHTS_PGDATABASE=<>

Usage with PgBouncer

PgBouncer is a lightweight connections pooler for PostgreSQL. It allows more clients to connect with the PostgreSQL database without running into connection limits.

When PgBouncer is used, we need to include statement_cache_mode=describe in the PostgreSQL connection url. This can be done by configuring the PGDATASOURCE and CODEINSIGHTS_PGDATASOURCE environment variables to postgres://username:[email protected]:5432/sg?statement_cache_mode=describe

sourcegraph/server

Add the following to your docker run command:

docker run [...] -e PGDATASOURCE="postgres://username:[email protected]:5432/sg?statement_cache_mode=describe" -e CODEINSIGHTS_PGDATASOURCE="postgres://username:[email protected]:5432/sg?statement_cache_mode=describe" sourcegraph/server:5.2.5

Docker Compose

  1. Add/modify the following environment variables to all of the sourcegraph-frontend-* services, the sourcegraph-frontend-internal service, and the migrator service (for Sourcegraph versions 3.37+) in docker-compose.yaml:

    sourcegraph-frontend-0:
      # ...
      environment:
        # ...
        - 'PGDATASOURCE=postgres://username:[email protected]:5432/sg?statement_cache_mode=describe'
        - 'CODEINSIGHTS_PGDATASOURCE=postgres://username:[email protected]:5432/sg?statement_cache_mode=describe'
      # ...
    

    See "Environment variables in Compose" for other ways to pass these environment variables to the relevant services (including from the command line, a .env file, etc.).

  2. Comment out / remove the internal pgsql and codeintel-db services in docker-compose.yaml since Sourcegraph is using the external one now.

    # # Description: PostgreSQL database for various data.
    # #
    # # Disk: 128GB / persistent SSD
    # # Ports exposed to other Sourcegraph services: 5432/TCP 9187/TCP
    # # Ports exposed to the public internet: none
    # #
    # pgsql:
    # container_name: pgsql
    # image: 'index.docker.io/sourcegraph/postgres-11.4:19-11-14_b084311b@sha256:072481559d559cfd9a53ad77c3688b5cf583117457fd452ae238a20405923297'
    # cpus: 4
    # mem_limit: '2g'
    # healthcheck:
    #    test: '/liveness.sh'
    #    interval: 10s
    #    timeout: 1s
    #    retries: 3
    #    start_period: 15s
    # volumes:
    #    - 'pgsql:/data/'
    # networks:
    #     - sourcegraph
    # restart: always
    
    # # Description: PostgreSQL database for code navigation data.
    # #
    # # Disk: 128GB / persistent SSD
    # # Ports exposed to other Sourcegraph services: 5432/TCP 9187/TCP
    # # Ports exposed to the public internet: none
    # #
    # codeintel-db:
    #   container_name: codeintel-db
    #   image: 'index.docker.io/sourcegraph/codeintel-db@sha256:63090799b34b3115a387d96fe2227a37999d432b774a1d9b7966b8c5d81b56ad'
    #   cpus: 4
    #   mem_limit: '2g'
    #   healthcheck:
    #     test: '/liveness.sh'
    #     interval: 10s
    #     timeout: 1s
    #     retries: 3
    #     start_period: 15s
    #   volumes:
    #     - 'codeintel-db:/data/'
    #   networks:
    #     - sourcegraph
    #   restart: always
    

Kubernetes

Create a new Secret to store the PgBouncer credentials.

apiVersion: v1
kind: Secret
metadata:
  name: sourcegraph-pgbouncer-credentials
data:
  # notes: secrets data has to be base64-encoded
  password: ""
apiVersion: v1
kind: Secret
metadata:
  name: sourcegraph-codeintel-pgbouncer-credentials
data:
  # notes: secrets data has to be base64-encoded
  password: ""

Update the environment variables in the sourcegraph-frontend deployment YAML.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sourcegraph-frontend
spec:
  template:
    spec:
      containers:
      - name: frontend
        env:
        - name: PGDATABASE
          value: sg
        - name: PGHOST
          value: sourcegraph-pgbouncer
        - name: PGPORT
          value: "5432"
        - name: PGSSLMODE
          value: disable
        - name: PGUSER
          value: sg
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: sourcegraph-pgbouncer-credentials
              key: password
        - name: PGDATASOURCE
          value: postgres://$(PGUSER):$(PGPASSWORD)@$(PGHOST):$(PGPORT)/$(PGDATABASE)?statement_cache_mode=describe
        - name: CODEINTEL_PGDATABASE
          value: sg-codeintel
        - name: CODEINTEL_PGHOST
          value: sourcegraph-codeintel-pgbouncer.mycompany.com
        - name: CODEINTEL_PGPORT
          value: "5432"
        - name: CODEINTEL_PGSSLMODE
          value: disable
        - name: CODEINTEL_PGUSER
          value: sg
        - name: CODEINTEL_PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: sourcegraph-codeintel-pgbouncer-credentials
              key: password
        - name: CODEINSIGHTS_PGDATASOURCE
          value: postgres://$(CODEINTEL_PGUSER):$(CODEINTEL_PGPASSWORD)@$(CODEINTEL_PGHOST):$(CODEINTEL_PGPORT)/$(CODEINTEL_PGDATABASE)?statement_cache_mode=describe

Postgres Permissions and Database Migrations

There is a tight coupling between the respective database service accounts for the Frontend DB, CodeIntel DB and Sourcegraph database migrations.

By default, the migrations that Sourcegraph runs expect SUPERUSER permissions. Sourcegraph migrations contain SQL that enable extensions and modify roles.

This may not be acceptable in all environments. At minimum we expect that the PGUSER and CODEINTEL_PGUSER have the ALL permissions on PGDATABASE and CODEINTEL_PGDATABASE respectively.

ALL privileges on the Database object include:

  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • TRUNCATE
  • REFERENCES
  • TRIGGER
  • CREATE
  • CONNECT
  • TEMPORARY
  • EXECUTE
  • USAGE

Using restricted permissions for pgsql (frontend DB)

Sourcegraph requires some initial setup that requires SUPERUSER permissions. A database administrator needs to perform the necessary actions on behalf of Sourcegraph migrations as SUPERUSER.

Update these variables to match your deployment of the Sourcegraph frontend database following the guidance from the instructions section. This database is called pgsql in the Docker Compose and Kubernetes deployments.

PGHOST=psql
PGUSER=sourcegraph
PGPASSWORD=secret
PGDATABASE=sourcegraph

The SQL script below is intended to be run from by a database administrator with SUPERUSER priviledges against the Frontend Database. It creates a database, user, and configures necesasry permissions for use by the Sourcegraph frontend services.

# Create the application database
CREATE DATABASE $PGDATABASE;

# Create the application service user
CREATE USER $PGUSER with encrypted password '$PGPASSWORD';

# Give the application service permissions to the application database
GRANT ALL PRIVILEGES ON DATABASE $PGDATABASE to $PGUSER;

# Select the application database
\c $PGDATABASE;

# Install necessary extensions
CREATE extension citext; 
CREATE extension hstore; 
CREATE extension pg_stat_statements;
CREATE extension pg_trgm;
CREATE extension pgcrypto; 
CREATE extension intarray;

After the database is configured, Sourcegraph will attempt to run migrations. There are a few migrations that may fail as they attempt to run actions that require SUPERUSER permissions.

These failures must be interpreted by the database administrator and resolved using guidance from How to Troubleshoot a Dirty Database. Generally-speaking this will involve looking up the migration source code and manually applying the necessary SQL code.

Initial Schema Creation

The first migration fails since it attempts to add COMMENTs to installed extensions. You may see the following error message:

failed to run migration for schema "frontend": failed upgrade migration 1528395834: ERROR: current transaction is aborted, commands ignored until end of transaction block (SQLSTATE 25P02)

In this case, locate the UP migration 1528395834 and apply all SQL after the final COMMENT ON EXTENSION command following the dirty database procedure.

Dropping the sg_service role

The sg_service database role is a legacy role that should be removed from all Sourcegraph installations at this time. Migration remove_sg_service_role attempts to enforce this with a DROP ROLE command. The PGUSER does not have permissions to perform this action, therefore the migration fails. You can safely skip this migration.


Using restricted permissions for CodeIntel DB

CodeIntel requires some initial setup that requires SUPERUSER permissions. A database administrator needs to perform the necessary actions on behalf of Sourcegraph migrations as SUPERUSER.

CODEINTEL_PGHOST=psql2
CODEINTEL_PGUSER=sourcegraph
CODEINTEL_PGPASSWORD=secret
CODEINTEL_PGDATABASE=sourcegraph-codeintel
CODEINTEL_PGSSLMODE=require

The SQL script below is intended to be run from by a database administrator with SUPERUSER priviledges against the CodeIntel Database. It creates a database, user, and configures necesasry permissions for use by the Sourcegraph frontend services.

# Create the CodeIntel database
CREATE DATABASE $CODEINTEL_PGDATABASE;

# Create the CodeIntel service user
CREATE USER $CODEINTEL_PGUSER with encrypted password '$CODEINTEL_PGPASSWORD';

# Give the CodeIntel  permissions to the application database
GRANT ALL PRIVILEGES ON DATABASE $CODEINTEL_PGDATABASE to $CODEINTEL_PGUSER;

# Select the application database
\c $CODEINTEL_PGDATABASE;

# Install necessary extensions
CREATE extension citext; 
CREATE extension hstore; 
CREATE extension pg_stat_statements;
CREATE extension pg_trgm;
CREATE extension pgcrypto; 
CREATE extension intarray;

After the database is configured, Sourcegraph will attempt to run migrations, this time using the CodeIntel DB. There are a few migrations that may fail as they attempt to run actions that require SUPERUSER permissions.

These failures must be intepreted by the database administrator and resolved using guidance from How to Troubleshoot a Dirty Database. Generally-speaking this will involve looking up the migration source code and manually applying the necessary SQL code. The codeintel_schema_migrations table should be consulted for dirty migrations in this case.

Initial CodeIntel schema creation

Like the failure in the Sourcegraph DB (pgsql) migrations, the CodeIntel initial migration attempts to COMMENT on an extension. Resolve this in a similar manner by executing the SQL in the 1000000015_squashed_migrations.up migration after the COMMENT SQL statement.

The following error is a nudge to check the codeintel_schema_migrations table in $CODEINTEL_PGDATABASE.

Failed to connect to codeintel database: 1 error occurred:
	* dirty database: schema is marked as dirty but no migrator instance appears to be running

The target schema is marked as dirty and no other migration operation is seen running on this schema. The last migration operation over this schema has failed (or, at least, the migrator instance issuing that migration has died). Please contact [email protected] for further assistance.