Continuous integration development

This document covers information about contributing to Sourcegraph's continuous integration tools.

Pipeline generator

The source code of Sourcegraph's Buildkite pipeline generator is in /dev/ci. Internally, the pipeline generator determines what to run for a given contribution based on:

  1. Run types, determined by branch naming conventions, tags, and environment variables
  2. Diff types, determined by what files have been changed in a given branch

The above factors are then used to determine the appropriate operations, composed of step options, that translate into steps in the resulting pipeline.

If you are looking to modify the pipeline, some good rules of thumb for which construct to reach for are:

  • Adding a new check? Try a new operation or additional step options.
  • Adding a set of changes to run when particular files are changed? Start with a new or updated diff type.
  • Adding an entirely new pipeline type for the sourcegraph/sourcegraph repository? Take a look at how run types are implemented.
  • Does your check or test need a secret? Take a look at how to manage secrets.

Run types

Diff types

Operations

Developing PR checks

To create a new check that can run on pull requests on relevant files, refer to how diff types work to get started.

Then, you can add a new check to CoreTestOperations. Make sure to follow the best practices outlined in the docstring.

For more advanced pipelines, see Run types.

Step options

Each operation is composed of steps that are built via step options, defined as implementations of the StepOpt interface. The core step option is Cmd, which defines a command to run when added to a pipeline via AddStep:

func addGoBuild(pipeline *bk.Pipeline) {
  pipeline.AddStep(":go: Build",
    bk.Cmd("./dev/ci/go-build.sh"),
  )
}

Creating annotations

Annotations present notices about the build to the viewer; they are rendered in the Buildkite UI as well as in the output of sg ci status. The pipeline generator provides an API for this that, at a high level, works like this:

  1. In your script, leave a file in ./annotations:
  if [ $EXIT_CODE -ne 0 ]; then
    echo -e "$OUT" >./annotations/docsite
  fi
  2. In your pipeline operation, replace the usual bk.Cmd with bk.AnnotatedCmd:
  pipeline.AddStep(":memo: Check and build docsite",
    bk.AnnotatedCmd("./dev/check/docsite.sh", bk.AnnotatedCmdOpts{
      Annotations: &bk.AnnotationOpts{},
    }))
  3. That's it!

Linters implemented in sg automatically generate annotations with the sg lint --annotations flag.

Part of the annotation that gets generated also includes a link to view the job output and, if the build is on the main branch, a link to view the job logs on Grafana.

If you don't include a file extension in the annotation file name, the contents of the file are rendered as terminal output. An annotation can be rendered as Markdown instead by using the .md extension, for example:

echo -e "$OUT" >./annotations/docsite.md

For more details about best practices and additional features and capabilities, please refer to the bk.AnnotatedCmd docstring.

Caching build artefacts

To cache artefacts and speed up steps, see How to cache CI artefacts.

Cached artefacts are automatically expired after 30 days (by an object lifecycle policy on the bucket).

Observability

Failure logs

Every failure in the sourcegraph/sourcegraph CI pipeline for main also uploads logs using sg to Loki. We do not publish data for successful builds or branch builds (for those, you can refer to our build traces).

For a brief overview, check out the CI dashboard, which is a set of graphs based on the contents of uploaded logs.

Some annotations also include a "View Grafana logs" link, which takes you to Grafana Cloud with a pre-populated query showing the log output of the failure (if any). For more about querying logs, refer to the handbook page: Grafana Cloud - CI logs.

Pipeline command tracing

Every successful build of the sourcegraph/sourcegraph repository comes with an annotation pointing at the full trace of the build on Honeycomb.io. See the Buildkite board on Honeycomb for an overview.

Individual commands are tracked from the perspective of a given step:

  pipeline.AddStep(":memo: Check and build docsite", /* ... */)

The step above results in a single trace span for the ./dev/check/docsite.sh script, but the following has an individual trace span for each pnpm command:

  pipeline.AddStep(fmt.Sprintf(":%s: Puppeteer tests for %s extension", browser, browser),
    // ...
    bk.Cmd("pnpm install --frozen-lockfile --fetch-timeout 60000"),
    bk.Cmd("pnpm --filter @sourcegraph/browser -s run build"),
    // ...
  )

Therefore, it's beneficial for tracing purposes to split a step into multiple commands where possible.

Buildkite infrastructure

Our continuous integration system is composed of two parts: a central server controlled by Buildkite, and agents operated by Sourcegraph within our own infrastructure. To provide strong isolation across builds and prevent one build from affecting the next, our agents are stateless jobs.

When a build is dispatched by Buildkite, each individual job is assigned to an agent in a pristine state. Each agent executes its assigned job, automatically reports back to Buildkite, and finally shuts itself down. A fresh agent is then created and stands in line for the next job.

This means that our agents are totally stateless, exactly like the runners used in GitHub Actions.

Also see Flaky infrastructure, Continuous integration infrastructure, and the Continuous integration changelog.

Pipeline setup

To set up Buildkite to use the rendered pipeline, add the following step in the pipeline settings:

go run ./dev/ci/gen-pipeline.go | buildkite-agent pipeline upload

Managing secrets

The term secret refers to authentication credentials like passwords, API keys, tokens, etc., which are used to access a particular service. To add a secret:

  1. Use Google Cloud Secret Manager to add it to the sourcegraph-ci project.
  2. Inject it at deployment time as an environment variable in the CI agents by adding it to the Buildkite GSM configuration.
  3. Run terraform apply in the buildkite/kubernetes/ folder. This makes sure the secret is available in the nodes' environment.
  4. Because a Kubernetes Job gets dispatched onto a node, we need to tell Kubernetes to make the variable available to the Job by updating the Job manifest for the stateless Buildkite agent.
  5. Run kubectl apply -f buildkite-agent-stateless/buildkite-agent.Job.yaml in the buildkite/kubernetes folder.

Note: Jobs are created dynamically and it might take a while for the new Job manifest to be picked up.

Our CI pipeline must never leak secrets:

  1. Use an environment variable name with one of the following suffixes to ensure it gets redacted in the logs: *_PASSWORD, *_SECRET, *_TOKEN, *_ACCESS_KEY, *_SECRET_KEY, *_CREDENTIALS
  2. While environment variables can be assigned when declaring steps, they should never be used for secrets, because they won't get redacted, even if they match one of the above patterns.
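A quick way to sanity-check a variable name against the redaction suffixes listed above (a standalone sketch using the suffixes from this document, not the pipeline's actual redaction implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// redactedSuffixes are the suffixes listed above that trigger log redaction.
var redactedSuffixes = []string{
	"_PASSWORD", "_SECRET", "_TOKEN", "_ACCESS_KEY", "_SECRET_KEY", "_CREDENTIALS",
}

// willBeRedacted reports whether an environment variable name matches
// one of the redaction suffixes.
func willBeRedacted(name string) bool {
	for _, suffix := range redactedSuffixes {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(willBeRedacted("BUILDKITE_API_TOKEN")) // true
	fmt.Println(willBeRedacted("CI_DEBUG"))            // false
}
```

If a name doesn't match, prefer renaming it to end in one of the suffixes rather than relying on it never appearing in logs.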

Creating scheduled builds

You can schedule builds with build schedules, which automatically create builds at the specified intervals. They are useful to create, for example, nightly builds.

  1. Go to Pipeline Settings in Buildkite and then click New Schedule.

new schedule

  2. Complete the form to create a new schedule; you can define the interval with the Cron Interval field. Check out the Buildkite docs for a list of predefined intervals.

cron interval