Back up or migrate Sourcegraph data to a new instance

In some circumstances it may be necessary or advantageous to migrate from one Sourcegraph instance or deployment to another. This page describes how to execute such a migration.

Specific guides

Data stores

While much of Sourcegraph's data can be regenerated, some state can be stored in multiple locations.

Configuration JSON

Most parts of Sourcegraph's configuration are managed in the webapp via text editors. These files are typically stored in the Postgres database (described below), but are translated into text for editing in the web UI.

These files are the most essential pieces of information required for a migration to work.

Data Can it be recreated without a backup? Notes
Site configuration No This file contains key configuration that defines how the product works.
Code host connection configuration(s) No Each connection to an external code host has its own short configuration file.
Global settings No Default settings can be set by administrators for all users by editing this file.

Backing up this data is as simple as copy-pasting the text from the files described above on the old Sourcegraph instance into the new one.

Internal database (Postgresql)

Sourcegraph's internal database houses most of Sourcegraph's state. While many of these pieces of data can be restored after a migration, some cannot.

This list is not guaranteed to be complete, but rather representative of the types of data stored here.

Data Can it be recreated without a backup? Notes
Repository metadata (e.g. clone URLs, whether it is a fork or archive, etc.) Yes
User accounts Yes (if using SSO authentication), No if using builtin authentication
Repository permissions Yes
Organizations No
User and org settings No Global settings can be backed up as described above, but user- and org-level settings cannot.
Saved searches No
User-generated access tokens No
Batch Changes No
Code graph metadata Yes (if manually regenerated) This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default.
User survey responses No
Usage statistics and event logs No Event logs allow admins to track and audit usage, but are not necessary for Sourcegraph to work

Data stored on disk

Git data, search indexes, precise code-intel data, Prometheus metrics, and some other large data sources are stored on disk.

This list is not guaranteed to be complete, but rather representative of the types of data stored here.

Data Can it be recreated without a backup? Notes
Repository (git) data Yes
Search indexes Yes
Code graph data No This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default.
Prometheus metrics No
blobstore Yes This is where unprocessed uploads are stored.

Ephemeral data (Redis)

Short-lived data, including session data and some usage statistics, are stored in Redis. This data can all be recreated without backups.

External data

Certain categories of data can be stored outside of the Sourcegraph deployment. For example, configuration JSON files can be loaded from disk, and Sourcegraph can connect to external services (PostgreSQL, Redis, S3/GCS) instead of using PostgreSQL, Redis, and blobstore internally.

In these cases, no migration should be necessary—simply re-use the existing external data sources on the new Sourcegraph instance.

Migration and backup options

Option 1: Configuration only

The easiest option is to simply back up or migrate configuration JSON data. Simply back up (by copying) the configuration files listed above and they can be pasted into a new Sourcegraph instance's UI after startup.

Option 2: All Postgres data

This option provides a more complete backup, and ensures that almost all state will be restored. Repositories will have to be recloned and reindexed, so some downtime will be required while these operations complete.

Follow the instructions in our Docker to Docker Compose migration guide to generate a dump of Sourcegraph's Postgres database. Contact us for specific recommendations for your deployment type.

Option 3: All data

Backing up all persistent volumes is the most complete option. Instructions for doing this depends on the deployment method and the cloud host. Contact us to discuss more.

Persistent data backup in Kubernetes

Please use the below table for reference when migrating your data from a Kubernetes Cluster:

Name Recreatable Notes
codeinsights-db Yes
codeintel-db Yes While the data is recreateable, we suggest including the disk during your migration as it often contains a lot of data that would take awhile to regenerate
indexed-search Yes
gitserver Yes
grafana Yes
blobstore Yes
pgsql NO This is the main database of Sourcegraph where most of the data are stored
prometheus YES
redis-cache YES
redis-store YES