Using your own PostgreSQL server

You can use your own PostgreSQL v16+ server with Sourcegraph if you wish. For example, you may prefer this if you already have existing backup infrastructure around your own PostgreSQL server, wish to use Amazon RDS, etc.

Please review the PostgreSQL documentation for a complete list of requirements.

As of version 6.0.0 The Sourcegraph application will require PostgreSQL v16+, earlier versions worked with 12 or greater.

General recommendations

If you choose to set up your own PostgreSQL server, please note we strongly recommend each database to be set up in different servers and/or hosts. We suggest either:

Deploy codeintel-db alongside the other Sourcegraph containers, i.e. not as a managed PostgreSQL instance.
Deploy a separate PostgreSQL instance. The primary reason to not use the same Postgres instance for this data is because code graph data can take up a significant of space (given the amount of indexed repositories is large) and the performance of the database may impact the performance of the general application database. You'll most likely want to be able to scale their resources independently.

We also recommend having backups for the codeintel-db as a best practice. The reason behind this recommendation is that codeintel-db data is uploaded via CI systems. If data is lost, Sourcegraph cannot automatically rebuild it from the repositories, which means you'd have to wait until it is re-uploaded from your CI systems.

Instructions

The addition of PG* environment variables to your Sourcegraph deployment files will instruct Sourcegraph to target an external PostgreSQL server. To externalize the frontend database, use the following standard PG* variables:

PGHOST
PGPORT
PGUSER
PGPASSWORD
PGDATABASE
PGSSLMODE

To externalize the code navigation database, use the following prefixed CODEINTEL_PG* variables:

CODEINTEL_PGHOST
CODEINTEL_PGPORT
CODEINTEL_PGUSER
CODEINTEL_PGPASSWORD
CODEINTEL_PGDATABASE
CODEINTEL_PGSSLMODE

If you have configured both the frontend (pgsql) and code navigation (codeintel-db) databases with the same values, the Sourcegraph instance will refuse to start. Each database should either be configured to point to distinct hosts (recommended), or configured to point to distinct databases on the same host.

sourcegraph/server

Add the following to your docker run command:

SHELL
docker run [...]
-e PGHOST=psql1.mycompany.org
-e PGUSER=sourcegraph
-e PGPASSWORD=secret
-e PGDATABASE=sourcegraph
-e PGSSLMODE=require
-e CODEINTEL_PGHOST=psql2.mycompany.org
-e CODEINTEL_PGUSER=sourcegraph
-e CODEINTEL_PGPASSWORD=secret
-e CODEINTEL_PGDATABASE=sourcegraph-codeintel
-e CODEINTEL_PGSSLMODE=require
sourcegraph/server:6.4.1203

Docker Compose

Add/modify the following environment variables to all of the sourcegraph-frontend-* services, the sourcegraph-frontend-internal service, and the migrator service (for Sourcegraph versions 3.37+) in docker-compose.yaml:

YAML
sourcegraph-frontend-0:
  environment:
    - 'PGHOST=psql1.mycompany.org'
    - 'PGUSER=sourcegraph'
    - 'PGPASSWORD=secret'
    - 'PGDATABASE=sourcegraph'
    - 'PGSSLMODE=require'
    - 'CODEINTEL_PGHOST=psql2.mycompany.org'
    - 'CODEINTEL_PGUSER=sourcegraph'
    - 'CODEINTEL_PGPASSWORD=secret'
    - 'CODEINTEL_PGDATABASE=sourcegraph-codeintel'
    - 'CODEINTEL_PGSSLMODE=require'

See Environment variables in Compose for other ways to pass these environment variables to the relevant services (including from the command line, a .env file, etc.).

Comment out / remove the internal pgsql and codeintel-db services in docker-compose.yaml since Sourcegraph is using the external one now.

Kubernetes

Update the PG* and CODEINTEL_PG* environment variables in the sourcegraph-frontend deployment YAML file to point to the external frontend (pgsql) and code navigation (codeintel-db) PostgreSQL instances, respectively. Again, these must not point to the same database or the Sourcegraph instance will refuse to start.

You are then free to remove the now unused pgsql and codeintel-db services and deployments from your cluster.

Version requirements

Please refer to our Postgres documentation to learn about version requirements.

Caveats

If your PostgreSQL server does not support SSL, set PGSSLMODE=disable instead of PGSSLMODE=require. Note that this is potentially insecure.

Most standard PostgreSQL environment variables may be specified (PGPORT, etc). See http://www.postgresql.org/docs/current/static/libpq-envars.html for a full list.

On Mac/Windows, if trying to connect to a PostgreSQL server on the same host machine, remember that Sourcegraph is running inside a Docker container inside of the Docker virtual machine. You may need to specify your actual machine IP address and not localhost or 127.0.0.1 as that refers to the Docker VM itself.

Usage with AWS RDS IAM Auth

For AWS RDS for Postgres, you have the option to use IAM database authentication to avoid using static database credentials. Learn more from AWS documentation.

In order to enable IAM Auth, you first need to:

enabled IAM authentication on the RDS instance
created the database account using IAM authentication
created IAM policy for IAM database access
created IAM roles for your compute workload and grant the role with the above policy
ensured your compute resources can assume those IAM roles
- For EKS (k8s deployment), use IAM roles for service accounts
- For EC2 (docker-compose deployment), use IAM roles for Amazon EC2

For every services that require postgres database connection, ensure below environment variables are configured:

PG_CONNECTION_UPDATER=EC2_ROLE_CREDENTIALS
PGSSLMODE=require
PGHOST=<>
PGPORT=<>
PGUSER=<> - this should be the database accounts created above
PGDATABASE=<>
CODEINTEL_PGSSLMODE=require
CODEINTEL_PGPORT=<>
CODEINTEL_PGUSER=<> this should be the database accounts created above
CODEINTEL_PGDATABASE=<>
CODEINSIGHTS_PGSSLMODE=require
CODEINSIGHTS_PGHOST=<>
CODEINSIGHTS_PGPORT=<>
CODEINSIGHTS_PGUSER=<> this should be the database accounts created above
CODEINSIGHTS_PGDATABASE=<>

Usage with PgBouncer

PgBouncer is a lightweight connections pooler for PostgreSQL. It allows more clients to connect with the PostgreSQL database without running into connection limits.

When PgBouncer is used, we need to include statement_cache_mode=describe in the PostgreSQL connection url. This can be done by configuring the PGDATASOURCE and CODEINSIGHTS_PGDATASOURCE environment variables to postgres://username:password@pgbouncer.mycompany.com:5432/sg?statement_cache_mode=describe

sourcegraph/server

Add the following to your docker run command:

docker run [...] -e PGDATASOURCE="postgres://username:password@sourcegraph-pgbouncer.mycompany.com:5432/sg?statement_cache_mode=describe" -e CODEINSIGHTS_PGDATASOURCE="postgres://username:password@sourcegraph-codeintel-pgbouncer.mycompany.com:5432/sg?statement_cache_mode=describe" sourcegraph/server:6.4.1203

Docker Compose

Add/modify the following environment variables to all of the sourcegraph-frontend-* services, the sourcegraph-frontend-internal service, and the migrator service (for Sourcegraph versions 3.37+) in docker-compose.yaml:

YAML
sourcegraph-frontend-0:
  environment:
    - 'PGDATASOURCE=postgres://username:password@sourcegraph-pgbouncer.mycompany.com:5432/sg?statement_cache_mode=describe'
    - 'CODEINSIGHTS_PGDATASOURCE=postgres://username:password@sourcegraph-codeintel-pgbouncer.mycompany.com:5432/sg?statement_cache_mode=describe'

See "Environment variables in Compose" for other ways to pass these environment variables to the relevant services (including from the command line, a .env file, etc.).

Comment out / remove the internal pgsql and codeintel-db services in docker-compose.yaml since Sourcegraph is using the external one now.

Kubernetes

Create a new Secret to store the PgBouncer credentials.

YAML
apiVersion: v1
kind: Secret
metadata:
  name: sourcegraph-pgbouncer-credentials
data:
  password: "" # note: secrets data has to be base64-encoded

YAML
apiVersion: v1
kind: Secret
metadata:
  name: sourcegraph-codeintel-pgbouncer-credentials
data:
  password: "" # note: secrets data has to be base64-encoded

Update the environment variables in the sourcegraph-frontend deployment YAML.

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sourcegraph-frontend
spec:
  template:
    spec:
      containers:
      - name: frontend
        env:
        - name: PGDATABASE
          value: sg
        - name: PGHOST
          value: sourcegraph-pgbouncer
        - name: PGPORT
          value: "5432"
        - name: PGSSLMODE
          value: disable
        - name: PGUSER
          value: sg
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: sourcegraph-pgbouncer-credentials
              key: password
        - name: PGDATASOURCE
          value: postgres://$(PGUSER):$(PGPASSWORD)@$(PGHOST):$(PGPORT)/$(PGDATABASE)?statement_cache_mode=describe
        - name: CODEINTEL_PGDATABASE
          value: sg-codeintel
        - name: CODEINTEL_PGHOST
          value: sourcegraph-codeintel-pgbouncer.mycompany.com
        - name: CODEINTEL_PGPORT
          value: "5432"
        - name: CODEINTEL_PGSSLMODE
          value: disable
        - name: CODEINTEL_PGUSER
          value: sg
        - name: CODEINTEL_PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: sourcegraph-codeintel-pgbouncer-credentials
              key: password
        - name: CODEINSIGHTS_PGDATASOURCE
          value: postgres://$(CODEINTEL_PGUSER):$(CODEINTEL_PGPASSWORD)@$(CODEINTEL_PGHOST):$(CODEINTEL_PGPORT)/$(CODEINTEL_PGDATABASE)?statement_cache_mode=describe

Postgres Permissions and Database Migrations

There is a tight coupling between the respective database service accounts for the Frontend DB, CodeIntel DB and Sourcegraph database migrations.

By default, the migrations that Sourcegraph runs expect SUPERUSER permissions. Sourcegraph migrations contain SQL that enable extensions and modify roles.

On AWS RDS, you will need to perform the operations below using the rds_superuser role because RDS does not grant SUPERUSER privileges to user database accounts.

This may not be acceptable in all environments. At minimum we expect that the PGUSER and CODEINTEL_PGUSER have the ALL permissions on PGDATABASE and CODEINTEL_PGDATABASE respectively.

ALL privileges on the Database object include:

SELECT
INSERT
UPDATE
DELETE
TRUNCATE
REFERENCES
TRIGGER
CREATE
CONNECT
TEMPORARY
EXECUTE
USAGE

Using restricted permissions for pgsql (frontend DB)

For AWS RDS, refer to the note from this section.

Sourcegraph requires some initial setup that requires SUPERUSER permissions. A database administrator needs to perform the necessary actions on behalf of Sourcegraph migrations as SUPERUSER.

Update these variables to match your deployment of the Sourcegraph frontend database following the guidance from the instructions section. This database is called pgsql in the Docker Compose and Kubernetes deployments.

BASH
PGHOST=psql
PGUSER=sourcegraph
PGPASSWORD=secret
PGDATABASE=sourcegraph

The SQL script below is intended to be run from by a database administrator with SUPERUSER priviledges against the Frontend Database. It creates a database, user, and configures necesasry permissions for use by the Sourcegraph frontend services.

SQL
-- Create the application database
CREATE DATABASE $PGDATABASE;
 
-- Create the application service user
CREATE USER $PGUSER with encrypted password '$PGPASSWORD';
 
-- Give the application service permissions to the application database
GRANT ALL PRIVILEGES ON DATABASE $PGDATABASE to $PGUSER;
 
-- Select the application database
\c $PGDATABASE;
 
-- Install necessary extensions
CREATE extension citext;
CREATE extension hstore;
CREATE extension pg_stat_statements;
CREATE extension pg_trgm;
CREATE extension pgcrypto;
CREATE extension intarray;

After the database is configured, Sourcegraph will attempt to run migrations. There are a few migrations that may fail as they attempt to run actions that require SUPERUSER permissions.

These failures must be interpreted by the database administrator and resolved using guidance from How to Troubleshoot a Dirty Database. Generally-speaking this will involve looking up the migration source code and manually applying the necessary SQL code.

Initial Schema Creation

The first migration fails since it attempts to add COMMENTs to installed extensions. You may see the following error message:

SHELL
failed to run migration for schema "frontend": failed upgrade migration 1528395834: ERROR: current transaction is aborted, commands ignored until end of transaction block (SQLSTATE 25P02)

In this case, locate the UP migration 1528395834 and apply all SQL after the final COMMENT ON EXTENSION command following the dirty database procedure.

Dropping the sg_service role

The sg_service database role is a legacy role that should be removed from all Sourcegraph installations at this time. Migration remove_sg_service_role attempts to enforce this with a DROP ROLE command. The PGUSER does not have permissions to perform this action, therefore the migration fails. You can safely skip this migration.

Using restricted permissions for CodeIntel DB

For AWS RDS, refer to the note from this section.

CodeIntel requires some initial setup that requires SUPERUSER permissions. A database administrator needs to perform the necessary actions on behalf of Sourcegraph migrations as SUPERUSER.

BASH
CODEINTEL_PGHOST=psql2
CODEINTEL_PGUSER=sourcegraph
CODEINTEL_PGPASSWORD=secret
CODEINTEL_PGDATABASE=sourcegraph-codeintel
CODEINTEL_PGSSLMODE=require

The SQL script below is intended to be run from by a database administrator with SUPERUSER priviledges against the CodeIntel Database. It creates a database, user, and configures necesasry permissions for use by the Sourcegraph frontend services.

SQL
-- Create the CodeIntel database
CREATE DATABASE $CODEINTEL_PGDATABASE;
 
-- Create the CodeIntel service user
CREATE USER $CODEINTEL_PGUSER with encrypted password '$CODEINTEL_PGPASSWORD';
 
-- Give the CodeIntel  permissions to the application database
GRANT ALL PRIVILEGES ON DATABASE $CODEINTEL_PGDATABASE to $CODEINTEL_PGUSER;
 
-- Select the application database
\c $CODEINTEL_PGDATABASE;
 
-- Install necessary extensions
CREATE extension citext;
CREATE extension hstore;
CREATE extension pg_stat_statements;
CREATE extension pg_trgm;
CREATE extension pgcrypto;
CREATE extension intarray;

After the database is configured, Sourcegraph will attempt to run migrations, this time using the CodeIntel DB. There are a few migrations that may fail as they attempt to run actions that require SUPERUSER permissions.

These failures must be intepreted by the database administrator and resolved using guidance from How to Troubleshoot a Dirty Database. Generally-speaking this will involve looking up the migration source code and manually applying the necessary SQL code. The codeintel_schema_migrations table should be consulted for dirty migrations in this case.

Initial CodeIntel schema creation

Like the failure in the Sourcegraph DB (pgsql) migrations, the CodeIntel initial migration attempts to COMMENT on an extension. Resolve this in a similar manner by executing the SQL in the 1000000015_squashed_migrations.up migration after the COMMENT SQL statement.

The following error is a nudge to check the codeintel_schema_migrations table in $CODEINTEL_PGDATABASE.

SHELL
Failed to connect to codeintel database: 1 error occurred:
	* dirty database: schema is marked as dirty but no migrator instance appears to be running

The target schema is marked as dirty and no other migration operation is seen running on this schema. The last migration operation over this schema has failed (or, at least, the migrator instance issuing that migration has died). Please contact support@sourcegraph.com for further assistance.