Back up or migrate Sourcegraph data to a new instance

In some circumstances it may be necessary or advantageous to migrate from one Sourcegraph instance or deployment to another. This page describes how to execute such a migration.

Specific guides

Migrate from the single Docker image to Docker Compose

Data stores

While much of Sourcegraph's data can be regenerated, some state can be stored in multiple locations.

Configuration JSON

Most parts of Sourcegraph's configuration are managed in the webapp via text editors. These files are typically stored in the Postgres database (described below), but are translated into text for editing in the web UI.

These files are the most essential pieces of information required for a migration to work.

Data	Can it be recreated without a backup?	Notes
Site configuration	No	This file contains key configuration that defines how the product works.
Code host connection configuration(s)	No	Each connection to an external code host has its own short configuration file.
Global settings	No	Default settings can be set by administrators for all users by editing this file.

Backing up this data is as simple as copy-pasting the text from the files described above on the old Sourcegraph instance into the new one.

Internal database (Postgresql)

Sourcegraph's internal database houses most of Sourcegraph's state. While many of these pieces of data can be restored after a migration, some cannot.

This list is not guaranteed to be complete, but rather representative of the types of data stored here.

Data	Can it be recreated without a backup?	Notes
Repository metadata (e.g. clone URLs, whether it is a fork or archive, etc.)	Yes
User accounts	Yes (if using SSO authentication), No if using builtin authentication
Repository permissions	Yes
Organizations	No
User and org settings	No	Global settings can be backed up as described above, but user- and org-level settings cannot.
Saved searches	No
User-generated access tokens	No
Batch Changes	No
Code graph metadata	Yes (if manually regenerated)	This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default.
User survey responses	No
Usage statistics and event logs	No	Event logs allow admins to track and audit usage, but are not necessary for Sourcegraph to work

Data stored on disk

Git data, search indexes, precise code-intel data, Prometheus metrics, and some other large data sources are stored on disk.

This list is not guaranteed to be complete, but rather representative of the types of data stored here.

Data	Can it be recreated without a backup?	Notes
Repository (git) data	Yes
Search indexes	Yes
Code graph data	No	This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default.
Prometheus metrics	No
blobstore	Yes	This is where unprocessed uploads are stored.

Ephemeral data (Redis)

Short-lived data, including session data and some usage statistics, are stored in Redis. This data can all be recreated without backups.

External data

Certain categories of data can be stored outside of the Sourcegraph deployment. For example, configuration JSON files can be loaded from disk, and Sourcegraph can connect to external services (PostgreSQL, Redis, S3/GCS) instead of using PostgreSQL, Redis, and blobstore internally.

In these cases, no migration should be necessary—simply re-use the existing external data sources on the new Sourcegraph instance.

Migration and backup options

Option 1: Configuration only

The easiest option is to simply back up or migrate configuration JSON data. Simply back up (by copying) the configuration files listed above and they can be pasted into a new Sourcegraph instance's UI after startup.

Option 2: All Postgres data

This option provides a more complete backup, and ensures that almost all state will be restored. Repositories will have to be recloned and reindexed, so some downtime will be required while these operations complete.

Follow the instructions in our Docker to Docker Compose migration guide to generate a dump of Sourcegraph's Postgres database. Contact us for specific recommendations for your deployment type.

Option 3: All data

Backing up all persistent volumes is the most complete option. Instructions for doing this depends on the deployment method and the cloud host. Contact us to discuss more.

Persistent data backup in Kubernetes

Please use the below table for reference when migrating your data from a Kubernetes Cluster:

Name	Recreatable	Notes
codeinsights-db	Yes
codeintel-db	Yes	While the data is recreateable, we suggest including the disk during your migration as it often contains a lot of data that would take awhile to regenerate
indexed-search	Yes
gitserver	Yes
grafana	Yes
blobstore	Yes
pgsql	NO	This is the main database of Sourcegraph where most of the data are stored
prometheus	YES
redis-cache	YES
redis-store	YES