Deploying Sourcegraph executors

Executors provide a sandbox that can run resource-intensive or untrusted tasks on behalf of the Sourcegraph instance, such as:

Requirements

Our sandboxing model requires the executor binary to be run on machines capable of running Linux KVM extensions. This requires bare-metal machines or machines capable of nested virtualization, which is made available by most popular Cloud providers.

Installation

Executors operate outside of your Sourcegraph instance and must be run separately from your Sourcegraph server deployment method.

Since they must still be able to reach the Sourcegraph instance in order to dequeue and perform work, requests between the Sourcegraph instance and the executors are authenticated via a shared secret.

That means, in order to deploy executors that can talk to the Sourcegraph instance, you need to do the following:

  1. Configure a shared secret in the Sourcegraph instance
  2. Run executors
  3. Confirm executors can reach Sourcegraph instance
  4. Optional: Configuring auto scaling
  5. Optional: Configuring observability

Configure Sourcegraph

Before starting any executors, generate an arbitrary secret string (with at least 20 characters) and set it as the executors.accessToken key in your Sourcegraph instance’s site-config.

Once the access token is set, executors can use that access token to talk to the Sourcegraph instance.

Run executors

There are two ways to install and run executors:

  1. Using our Terraform modules to provision machines on Google Cloud or AWS that run executors
  2. Downloading and running executor binaries yourself

Terraform

We supply Terraform modules to provision machines running executors on common cloud providers (Google Cloud or AWS).

A Terraform definition of executor compute resources will look similar to the following basic, minimal usage. Here, we configure the use of a Terraform module defined in the public registry - no explicit installation or clone step is required to use the modules provided by Sourcegraph.

module "executors" {
  source  = "sourcegraph/executors/<cloud>"
  version = "<version>"

  executor_sourcegraph_external_url            = "<sourcegraph_external_url>"
  executor_sourcegraph_executor_proxy_password = "<shared_secret_configured_in_sourcegraph_instance>"
  executor_queue_name                          = "codeintel" # Type of work (e.g., codeintel, batches)
  executor_instance_tag                        = "codeintel"
  executor_metrics_environment_label           = "prod"
  docker_mirror_static_ip                      = "10.0.1.4"
}

Two variables must be supplied to the module in order for it to contact your Sourcegraph instance:

  • sourcegraph_external_url (Google; AWS): The public URL of your Sourcegraph instance. This corresponds to the externalURL value in your Sourcegraph instance’s site-config and must be resolvable from the provisioned executor compute resources.
  • sourcegraph_executor_proxy_password (Google; AWS): The access token chosen and configured above.

Additional values may need to be supplied for a specific cloud provider. Refer to the relevant Terraform module documentation for specifics.

To deploy executor compute resources defined in the Terraform file above, simply run terraform apply.

See the Examples for more information on how to configure and deploy single or multiple executors.

Binaries

You can also download and run the executor binaries yourself, without using Terraform.

The following dependencies need to be available on the machine on which you want to run the executor binary:

You can also take a look at what goes into our executor machine images, used by our Terraform modules, to see how we run executor binaries.

Once dependencies are met, you can download and run executor binaries:

Step 1: Confirm that virtualization is enabled

The following command checks whether virtualization is enabled on the machine, which is required for our sandboxing model.

If it prints something other than 0, virtualization is enabled.

grep -cw vmx /proc/cpuinfo

Step 2: Download latest binary

Below are the download links for the insiders release (latest) of executors:

Download and setup executor binary:

curl -sfLo executor https://storage.googleapis.com/sourcegraph-artifacts/executor/latest/linux-amd64/executor
chmod +x executor
mv executor /usr/local/bin

Step 3: Generate Ignite base image

This creates the base image that Ignite will use when creating new Firecracker microVMs for each job.

# Change this to use the version of src-cli that's compatible with your Sourcegraph instance.
# See this for details: https://github.com/sourcegraph/src-cli#version-compatible-with-your-sourcegraph-instance
export SRC_CLI_VERSION="3.34.1"
export EXECUTOR_FIRECRACKER_IMAGE="sourcegraph/ignite-ubuntu:insiders"

# Download Dockerfile and build Docker image
mkdir -p /tmp/ignite-ubuntu
curl -sfLo /tmp/ignite-ubuntu/Dockerfile https://raw.githubusercontent.com/sourcegraph/sourcegraph/main/enterprise/cmd/executor/image/ignite-ubuntu/Dockerfile
docker build -t "${EXECUTOR_FIRECRACKER_IMAGE}" --build-arg SRC_CLI_VERSION="${SRC_CLI_VERSION}" /tmp/ignite-ubuntu

# Import Docker image into Ignite
ignite image import --runtime docker "${EXECUTOR_FIRECRACKER_IMAGE}"

# Clean up
docker image rm "${EXECUTOR_FIRECRACKER_IMAGE}"

Step 4: Optional: Pre-heat Ignite by importing kernel image

This step imports the kernel image to avoid the executor having to import it when executing the first job.

# Change this to match the version of Ignite you're using.
export IGNITE_VERSION=v0.10.0
export KERNEL_IMAGE="weaveworks/ignite-kernel:5.10.51"

ignite kernel import --runtime docker "${KERNEL_IMAGE}"
docker pull "weaveworks/ignite:${IGNITE_VERSION}"

Step 5: Setup required environment variables

Env var Example value Description
EXECUTOR_FRONTEND_URL http://sourcegraph.example.com The external URL of the Sourcegraph instance.
EXECUTOR_FRONTEND_PASSWORD our-shared-secret The shared secret configured in the Sourcegraph instannce under executors.accessToken
EXECUTOR_QUEUE_NAME batches The name of the queue to pull jobs from to. Possible values: batches and codeintel
# Example:
export EXECUTOR_QUEUE_NAME=batches
export EXECUTOR_FRONTEND_URL=http://sourcegraph.example.com
export EXECUTOR_FRONTEND_PASSWORD=hunter2hunter2hunter2

Step 6: Start executor

/usr/local/bin/executor

Confirm executors are working

If executor instances boot correctly and can authenticate with the Sourcegraph frontend, they will show up in the Executors page under Site Admin > Maintenance.

Executor list in UI

Examples

The following are complete examples of provisioning a single executor type using each of the provided Terraform modules. In these examples, executors pull from the queue codeintel, which controls auto-indexing.

The following are complete examples of provisioning multiple executor types using the provided Terraform submodules. In these examples, two pools of executors pull from the codeintel and batches queues, which control auto-indexing and server-side batch changes, respectively.

Example installation

Let’s walk through setting up a single executor VM on GCP and indexing a repository.

  1. Install Terraform 0.13.7 (must match the version listed in .tool-versions):

    brew install tfenv
    tfenv install 0.13.7
    tfenv use 0.13.7
    
  2. Install gcloud

  3. Run gcloud auth application-default login

  4. Open your Sourcegraph instance in your browser, click your profile in the top right, click Site admin, expand Configuration, click Site configuration, and set:

    • "externalURL": "<URL>" to a URL that is accessible from the GCP VM that will be created later (e.g. a public URL such as https://sourcegraph.acme.com)
    • "executors.accessToken": "<new long secret>" to a new long secret (e.g. cat /dev/random | base64 | head -c 20)
    • "codeIntelAutoIndexing.enabled": true
  5. Download the example files and change these:

    • project: your GCP project name and change region and zone if needed
    • executor_sourcegraph_external_url: this must match externalURL you set in your site config
    • executor_sourcegraph_executor_proxy_password: this must match executors.accessToken you set in your site config
  6. Run terraform init to download the Sourcegraph executor modules

  7. Run terraform apply and enter “yes” to create the executor VM

  8. Go back to the site admin page, expand Maintenance, click Executors, and check to see if your executor shows up in the list with a green dot 🟢. If it’s not there:

    • Make sure terraform apply exited with code 0 and did not print any errors

    • Make sure a GCP VM was created:

      $ gcloud compute instances list
      NAME                                          ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
      sourcegraph-executor-h0rv                     us-central1-c  n1-standard-4               10.0.1.16    ...            RUNNING
      sourcegraph-executors-docker-registry-mirror  us-central1-c  n1-standard-2               10.0.1.2     ...            RUNNING
      
    • Make sure the executor service is running:

      [email protected]:~$ systemctl status executor
      🟢 executor.service - User code executor
      Loaded: loaded (/etc/systemd/system/executor.service; enabled; vendor preset: enabled)
      Active: active (running) since Thu 2021-11-18 02:28:48 UTC; 19s ago
      
    • Make sure there are no errors in the executor service logs:

      [email protected]:~$ journalctl -u executor | less
      Nov 18 02:31:01 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:01+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/dequeue code=204 duration=92.131237ms
      Nov 18 02:31:01 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:01+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/canceled code=200 duration=90.630467ms
      Nov 18 02:31:02 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:02+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/dequeue code=204 duration=91.269106ms
      Nov 18 02:31:02 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:02+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/canceled code=200 duration=161.469685ms
      
    • Make sure the EXECUTOR_FRONTEND_URL and EXECUTOR_FRONTEND_PASSWORD in /etc/systemd/system/executor.env are correct

    • Make sure the VM can hit your externalURL:

      [email protected]:~$ curl <your externalURL here>
      <a href="/sign-in?returnTo=%2F">Found</a>
      
  9. Go back to the site admin page, expand Code intelligence, click Configuration, click Create new policy, and fill in:

    • Name: LSIF
    • Click add a repository pattern
    • Repository pattern #1: set this to an existing repository on your Sourcegraph instance (e.g. github.com/gorilla/mux)
    • Type: HEAD
    • Auto-indexing: Enabled
  10. Go to that repository’s page, click Code Intelligence, click Auto-indexing, and check to see if an indexing job has appeared. If nothing is there:

    • Try clicking Enqueue
    • Try setting a higher update frequency: PRECISE_CODE_INTEL_AUTO_INDEXING_TASK_INTERVAL=10s
    • Try setting a lower delay: PRECISE_CODE_INTEL_AUTO_INDEXING_REPOSITORY_PROCESS_DELAY=10s
  11. Once you have a completed indexing job, click Uploads and check to see that the LSIF dump has been uploaded.

  12. Once the dump has been uploaded, you should see the PRECISE badge in the hover popover! 🎉

  13. Optionally, add .terraform, terraform.tfstate, and terraform.tfstate.backup to your .gitignore.

Configuring auto scaling

Auto scaling of executor instances can help to increase concurrency of jobs, without paying for unused resources. With auto scaling, you can scale down to 0 instances when no workload exist and scale up as far as you like and your cloud provider can support. Auto scaling needs to be configured separately.

Auto scaling makes use of the auto-scaling capabilities of the respective cloud provider (AutoScalingGroups on AWS and Instance Groups on GCP). Sourcegraph’s worker service publishes a scaling metric (that is, the number of jobs in queue) to the cloud providers. Then, based on that reported value, the auto scalers add and remove compute resources to match the required amount of compute. The autoscaler will attempt to hold 1 instance running per each jobs_per_instance_scaling items in queue.

For example, if jobs_per_instance_scaling is set to 20 and the queue size is currently 400, then 20 instances would be determined as required to handle the load. You might want to tweak this number based on the machine type, concurrency per machine and desired processing speed.

With the Terraform variables min_replicas and max_replicas in the Terraform modules linked to above, you can configure the minimum and maximum number of compute machines to be run at a given time.

For auto scaling to work, two things must be true:

  1. min_replicas must be >= 0 and max_replicas must be > min_replicas.
  2. The Sourcegraph instance (its worker service, specifically) needs to publish scaling metrics to the used cloud provider.

For the latter to work, the Sourcegraph instance needs to be configured with the correct credentials that allow it to access the cloud provider.

The credentials submodule in both our AWS and GCP executor modules exists for that purpose. When used, the credentials module sets up the credentials on the cloud provider and returns them in the Terraform outputs.

Here’s an example of how one would use the credentials submodule:

module "my-credentials" {
  source  = "sourcegraph/executors/<cloud>//modules/credentials"

  # Find the latest version here:
  # - https://github.com/sourcegraph/terraform-google-executors/tags
  # - https://github.com/sourcegraph/terraform-aws-executors/tags
  version = "<version>"

  region          = <region>
  resource_prefix = ""
}

# For Google:
output "metric_writer_credentials_file" {
  value = module.my-credentials.metric_writer_credentials_file
}

# For AWS:
output "metric_writer_access_key_id" {
  value = module.my-credentials.metric_writer_access_key_id
}

output "metric_writer_secret_key" {
  value = module.my-credentials.metric_writer_secret_key
}

After a terraform apply, a terraform output should yield something like this:

# For AWS:
metric_writer_access_key_id = <THE_ACCESS_KEY_TO_CONFIGURE>
metric_writer_secret_key    = <THE_SECRET_KEY_TO_CONFIGURE>

# For Google:
metric_writer_credentials_file = <THE_CREDENTIALS_FILE_CONTENT_BASE64_ENCODED>

These outputs can then be used to configure the Sourcegraph instance, which is explained in the following sections for the different cloud providers.

Google

The GCE auto-scaling groups configured by the Sourcegraph Terraform module respond to changes in metric values written to Cloud Monitoring. The target Sourcegraph instance is expected to continuously write these values.

To write the scaling metric to Cloud Monitoring, the worker service must have defined the following environment variables:

  • EXECUTOR_METRIC_ENVIRONMENT_LABEL: Must use the value that was set as metrics_environment_label when provisioning executors
  • EXECUTOR_METRIC_GCP_PROJECT_ID: The GCP project ID
  • Option 1: EXECUTOR_METRIC_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT: The base64-decoded output of the metric_writer_credentials_file from above.
  • Option 2: EXECUTOR_METRIC_GOOGLE_APPLICATION_CREDENTIALS_FILE: Path to a file containing the base64-decoded metric_writer_credentials_file output from above.

AWS

The EC2 auto-scaling groups configured by the Sourcegraph Terraform module respond to changes in metric values written to CloudWatch. The target Sourcegraph instance is expected to continuously write these values.

To write the scaling metric to CloudWatch, the worker service must have defined the following environment variables:

  • EXECUTOR_METRIC_ENVIRONMENT_LABEL: Must use the same value as metrics_environment_label
  • EXECUTOR_METRIC_AWS_NAMESPACE: Must be set to sourcegraph-executor
  • EXECUTOR_METRIC_AWS_REGION
  • EXECUTOR_METRIC_AWS_ACCESS_KEY_ID
  • EXECUTOR_METRIC_AWS_SECRET_ACCESS_KEY

Testing auto scaling

Once these are set, and the worker service has been restarted, you should be able to find the scaling metrics in your cloud providers dashboards.

To test if the metric is correctly reported into the Cloud provider:

  • On Google Cloud, this can be found in the Metrics explorer. Select Resource type: Global and then Metric: custom/executors/queue/size. You should see values reported here. 0 is also an indicator that it works correct.

  • On AWS, this can be found in the CloudWatch metrics section. Under All metrics, select the namespace sourcegraph-executor and then the metric environment, queueName. Make sure there are entries returned.

Next, you can test whether the number of executors rises and shrinks as load spikes occur. Keep in mind that auto-scaling is not a real-time operation on most cloud providers and usually takes a short moment and can have some delays between the metric going down and the desired machine count adjusting.

Configuring observability

Sourcegraph ships with dashboards that can display executor metrics. We highly encourage setting this up to help make informed decisions on scaling and to make debugging easier.

In order to do that, the Prometheus instance bundled with your Sourcegraph deployment must be able to scrape the executor metrics endpoint.

That requires two things:

  1. Provide Prometheus with service account credentials that allow it to get a list of active compute instances from the cloud provider.
  2. Add additional scrape jobs to Prometheus.

To add service account credentials, you can use the credentials submodule in both our AWS and GCP executor modules.

module "credentials" {
  source  = "sourcegraph/executors/<cloud>//modules/credentials"
  version = "<version>"

  region          = <region>
  resource_prefix = ""
}

# For Google:
output "instance_scraper_credentials_file" {
  value = module.my-credentials.instance_scraper_credentials_file
}

# For AWS:
output "instance_scraper_access_key_id" {
  value = module.my-credentials.instance_scraper_access_key_id
}

output "instance_scraper_access_secret_key" {
  value = module.my-credentials.instance_scraper_access_secret_key
}

Just as with auto scaling, you use the credentials submodule to get properly configured credentials in the Terraform outputs. When applied, this will yield something like this:

# For AWS:
instance_scraper_access_key_id     = <THE_ACCESS_KEY_TO_CONFIGURE>
instance_scraper_access_secret_key = <THE_SECRET_KEY_TO_CONFIGURE>

# For Google:
instance_scraper_credentials_file = <THE_CREDENTIALS_FILE_CONTENT>

Now we can use these credentials for the different cloud providers.

Google

Credentials need to be added to the Prometheus container of your Sourcegraph deployment and a new scrape job needs to be added.

In a Kubernetes deployment, credentials can be added by mounting the credentials file obtained from the credentials module in the last step, and pointing to it from an environment variable.

Step 1: Create a secret called prometheus-secrets containing the credentials file content:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: prometheus-secrets
data:
  # The Terraform output for `instance_scraper_credentials_file`
  GCP_ACCOUNT_JSON: <THE_CREDENTIALS_FILE_CONTENT>

Step 2: Modify the Prometheus deployment manifest:

containers:
- name: prometheus
  # [...]
  env:
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /credentials/google_application_credentials.json
  volumeMounts:
  - mountPath: /credentials/google_application_credentials.json
    name: credentials
    subPath: google_application_credentials.json
    readOnly: true
volumes:
- name: credentials
  secret:
    secretName: prometheus-secrets
    items:
    - key: GCP_ACCOUNT_JSON
      path: google_application_credentials.json

Step 3: Add the following scraping job that uses GCE service discovery configuration to the Prometheus configuration. To do that, you can edit the Prometheus ConfigMap and modify the contents of the prometheus.yml file. Under scrape_configs: add the following and make sure to replace {GCP_PROJECT}, {GCP_ZONE} and {INSTANCE_TAG}. The {INSTANCE_TAG} value must be the same as instance_tag.

- job_name: 'sourcegraph-executors'
  metrics_path: /proxy
  params:
    module: [executor]
  gce_sd_configs: &executor_gce_config
    - project: {GCP_PROJECT} # Change this to the GCP project ID
      port: 9999
      zone: {GCP_ZONE} # Change this to the GCP zone
      filter: '(labels.executor_tag = {INSTANCE_TAG})' # Change {INSTANCE_TAG} to the `executor_instance_tag` set in the Terraform modules
  relabel_configs: &executor_relabel_config
    - source_labels: [__meta_gce_public_ip]
      target_label: __address__
      replacement: "${1}${2}:9999"
      separator: ''
    - source_labels: [__meta_gce_zone]
      regex: ".+/([^/]+)"
      target_label: zone
      separator: ''
    - source_labels: [__meta_gce_project]
      target_label: project
    - source_labels: [__meta_gce_instance_name]
      target_label: instance
      separator: ''
    - regex: "__meta_gce_metadata_(image_.+)"
      action: labelmap
- job_name: 'sourcegraph-executor-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  gce_sd_configs: *executor_gce_config
  relabel_configs: *executor_relabel_config
# If you've also used the Terraform modules to provision Docker registry
# mirrors for executors:
- job_name: 'sourcegraph-executors-docker-registry-mirrors'
  metrics_path: /proxy
  params:
    module: [registry]
  gce_sd_configs: &gce_executor_mirror_config
    - project: {GCP_PROJECT} # Change this to the GCP project ID
      port: 9999
      zone: {GCP_ZONE} # Change this to the GCP zone
      filter: '(labels.executor_tag = {INSTANCE_TAG}-docker-mirror)' # Change {INSTANCE_TAG} to the `executor_instance_tag` set in the Terraform modules
  relabel_configs: *executor_relabel_config
- job_name: 'sourcegraph-executors-docker-registry-mirror-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  gce_sd_configs: *gce_executor_mirror_config
  relabel_configs: *executor_relabel_config

Step 4: Restart Prometheus.

If you currently have any executors or Docker registry mirrors running, you should start seeing metrics on the Executors dashboard in Grafana. Alternatively, you can check if the executors can be scraped, by port-forwarding the Prometheus UI to your local machine and checkin in the UI.

AWS

Credentials need to be added to the Prometheus container of your Sourcegraph deployment and a new scrape job needs to be added.

In a Kubernetes deployment, credentials can be added by setting the two secrets obtained from the credentials module in the last step as environment variables.

Step 1: Modify the Prometheus deployment manifest:

containers:
- name: prometheus
  # [...]
  env:
  - name: AWS_ACCESS_KEY_ID
    # The Terraform output for `instance_scraper_access_key_id`
    value: <THE_ACCESS_KEY_TO_CONFIGURE>
  - name: AWS_SECRET_ACCESS_KEY
    # The Terraform output for `instance_scraper_access_secret_key`
    value: <THE_SECRET_KEY_TO_CONFIGURE>

Step 2: Add the following scraping job that uses EC2 service discovery configuration to the Prometheus configuration. To do that, you can edit the Prometheus ConfigMap and modify the contents of the prometheus.yml file. Under scrape_configs: add the following and make sure to replace {AWS_REGION} and {INSTANCE_TAG}. The {INSTANCE_TAG} value must be the same as instance_tag.

- job_name: 'sourcegraph-executors'
  metrics_path: /proxy
  params:
    module: [executor]
  ec2_sd_configs: &executor_ec2_config
    - region: {AWS_REGION} # Change this to the AWS region
      port: 9999
      filters:
        - name: tag:executor_tag
          values: [{INSTANCE_TAG}] # Change {INSTANCE_TAG} to the `executor_instance_tag` set in the Terraform modules
  relabel_configs: &executor_relabel_config
    - source_labels: [__meta_ec2_public_ip]
      target_label: __address__
      replacement: "${1}${2}:9999"
      separator: ''
    - source_labels: [__meta_ec2_availability_zone]
      regex: ".+/([^/]+)"
      target_label: zone
      separator: ''
    - source_labels: [__meta_ec2_instance_id]
      target_label: instance
      separator: ''
    - source_labels: [__meta_ec2_ami]
      target_label: version
- job_name: 'sourcegraph-executor-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  ec2_sd_configs: *executor_ec2_config
  relabel_configs: *executor_relabel_config
# If you've also used the Terraform modules to provision Docker registry
# mirrors for executors:
- job_name: 'sourcegraph-executors-docker-registry-mirrors'
  metrics_path: /proxy
  params:
    module: [registry]
  ec2_sd_configs: &ec2_executor_mirror_config
    - region: {AWS_REGION}
      port: 9999
      filters:
        - name: tag:executor_tag
          values: [{INSTANCE_TAG}-docker-mirror]
  relabel_configs: *executor_relabel_config
- job_name: 'sourcegraph-executors-docker-registry-mirror-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  ec2_sd_configs: *ec2_executor_mirror_config
  relabel_configs: *executor_relabel_config

Step 3: Restart Prometheus.

If you currently have any executors or Docker registry mirrors running, you should start seeing metrics on the Executors dashboard in Grafana. Alternatively, you can check if the executors can be scraped, by port-forwarding the Prometheus UI to your local machine and checkin in the UI.