Developing batch changes

Getting started

Welcome, new batch change developer! This section will give you a rough overview of what Batch Changes is and how it works.

What are batch changes?

Before diving into the technical part of batch changes, make sure to read up on what batch changes are, what they're not and what we want them to be:

  1. Look at the batch changes product page.
  2. Watch the 2-minute demo video.

Next: create your first batch change!

Creating a batch change locally

  1. Since Batch Changes is an enterprise-only feature, make sure to start your local environment with sg start (which defaults to sg start enterprise).
  2. Go through the Quickstart for Batch Changes to create a batch change in your local environment. See "Testing repositories" for a list of repositories in which you can safely publish changesets.
  3. Now combine what you just did with some background information by reading the following:

Code walkthrough

To give you a rough overview of where each part of the code lives, let's take a look at which code gets executed when you:

  1. run src batch preview -f your-batch-spec.yaml
  2. click on the preview link
  3. click Apply to publish changesets on the code hosts

It starts in src-cli:

  1. src batch preview starts the "preview" command in src-cli
  2. That executes your batch spec, which means it parses it, validates it, resolves the namespace, prepares the Docker images, and checks which workspaces are required.
  3. Then, for each repository (or workspace in each repository), it runs the steps in the batch spec by downloading a repository archive, creating a workspace in which to execute the steps, and then starting the Docker containers.
  4. If changes were produced in a repository, these changes are turned into a ChangesetSpec (a specification of what a changeset should look like on the code host: title, body, commit, etc.) and uploaded to the Sourcegraph instance.
  5. src batch preview's last step is then to create a BatchSpec on the Sourcegraph instance, which is a collection of the ChangesetSpecs that you can then preview or apply.
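
As a rough illustration of steps 3 to 5, here is a minimal Go sketch of how per-repository results could be collected into a batch spec. It is not the src-cli implementation: the ChangesetSpec and BatchSpec structs are trimmed-down stand-ins, and runSteps, buildBatchSpec, the repository names, and the generated titles are hypothetical names chosen for this example.

```go
package main

import "fmt"

// ChangesetSpec is a hypothetical, trimmed-down stand-in for the real type:
// it describes what a changeset should look like on the code host.
type ChangesetSpec struct {
	Repo  string
	Title string
	Body  string
	Diff  string
}

// BatchSpec is a hypothetical stand-in: a named collection of changeset specs.
type BatchSpec struct {
	Name           string
	ChangesetSpecs []ChangesetSpec
}

// runSteps stands in for executing the batch spec's steps in a workspace.
// It returns the resulting diff, or "" when the steps changed nothing.
type runSteps func(repo string) string

// buildBatchSpec runs the steps for every repository and keeps a
// ChangesetSpec only for repositories in which a diff was produced.
func buildBatchSpec(name string, repos []string, run runSteps) BatchSpec {
	spec := BatchSpec{Name: name}
	for _, repo := range repos {
		diff := run(repo)
		if diff == "" {
			continue // no changes, so no changeset spec for this repository
		}
		spec.ChangesetSpecs = append(spec.ChangesetSpecs, ChangesetSpec{
			Repo:  repo,
			Title: "Update " + repo,
			Body:  "Created by batch spec " + name,
			Diff:  diff,
		})
	}
	return spec
}

func main() {
	// A fake step runner: one repository is left unchanged.
	run := func(repo string) string {
		if repo == "github.com/example/unchanged" {
			return ""
		}
		return "diff for " + repo
	}
	spec := buildBatchSpec("hello-world", []string{
		"github.com/example/a",
		"github.com/example/unchanged",
	}, run)
	fmt.Printf("%s: %d changeset spec(s)\n", spec.Name, len(spec.ChangesetSpecs))
}
```

The point of the sketch is only the shape of the data: repositories without a diff produce no ChangesetSpec, and the BatchSpec is what ties the remaining ones together.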

When you then click the "Preview the batch change" link that src-cli printed, you'll land on the preview page in the web frontend:

  1. The BatchChangePreviewPage component then sends a GraphQL request to the backend to query the BatchSpecByID.
  2. Once you hit the Apply button, the component uses the applyBatchChange mutation to apply the batch spec and create a batch change.
  3. You're then redirected to the BatchChangeDetailsPage component that shows you your newly created batch change.

In the backend, all Batch Changes related GraphQL queries and mutations start in the Resolver package:

  1. The CreateChangesetSpec and CreateBatchSpec mutations that src-cli called to create the changeset and batch specs are defined here.
  2. When you clicked Apply, the ApplyBatchChange resolver was executed to create the batch change.
  3. Most of that doesn't happen in the resolver layer, but in the service layer: here is the (*Service).ApplyBatchChange method that talks to the database to create an entry in the batch_changes table.
  4. The most important thing that happens in (*Service).ApplyBatchChange is that it calls the rewirer to wire the entries in the changesets table to the correct changeset_specs.
  5. Once that is done, the changesets are created or updated to point to the new changeset_specs that you created with src-cli.
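
To make the rewiring step more concrete, here is a simplified Go sketch. It assumes that an existing changeset and a new changeset spec belong together when they target the same repository and branch; that matching rule, the struct fields, and the rewire function are assumptions made for this example, not the actual ChangesetRewirer logic.

```go
package main

import "fmt"

// ChangesetSpec is a hypothetical, trimmed-down changeset spec.
type ChangesetSpec struct {
	ID     int
	Repo   string
	Branch string
}

// Changeset is a hypothetical, trimmed-down row of the changesets table.
type Changeset struct {
	ID            int // 0 means "not stored yet"
	Repo          string
	Branch        string
	CurrentSpecID int
}

// target identifies where a changeset lives. Matching on repository and
// branch is an assumption made for this sketch.
type target struct{ repo, branch string }

// rewire points existing changesets at the new spec with the same target
// and creates a fresh changeset for every spec without a match.
func rewire(existing []Changeset, specs []ChangesetSpec) []Changeset {
	byTarget := make(map[target]Changeset, len(existing))
	for _, c := range existing {
		byTarget[target{c.Repo, c.Branch}] = c
	}
	rewired := make([]Changeset, 0, len(specs))
	for _, s := range specs {
		c, ok := byTarget[target{s.Repo, s.Branch}]
		if !ok {
			c = Changeset{Repo: s.Repo, Branch: s.Branch}
		}
		c.CurrentSpecID = s.ID
		rewired = append(rewired, c)
	}
	return rewired
}

func main() {
	existing := []Changeset{{ID: 1, Repo: "repo-a", Branch: "fix", CurrentSpecID: 10}}
	specs := []ChangesetSpec{
		{ID: 20, Repo: "repo-a", Branch: "fix"},
		{ID: 21, Repo: "repo-b", Branch: "fix"},
	}
	for _, c := range rewire(existing, specs) {
		fmt.Printf("changeset %d in %s -> spec %d\n", c.ID, c.Repo, c.CurrentSpecID)
	}
}
```

In this sketch the existing changeset keeps its ID and only its spec reference changes, which is the property that lets the Reconciler later compare the previous and the current spec.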

After that you can look at your new batch change in the UI while the rest happens asynchronously in the background:

  1. In a background process (which is started in enterprise/cmd/repo-updater: https://github.com/sourcegraph/sourcegraph/blob/e7f26c0d7bc965892669a5fc9835ec65211943aa/enterprise/cmd/repo-updater/main.go#L58), a worker is running that monitors the changesets table.
  2. Once a changeset has been rewired to a new changeset_spec and reset, this worker, called the Reconciler, fetches the changeset from the database and "reconciles" its current state (not published yet) with its desired state ("published on code host X, with this diff, that title and this body").
  3. To do that, the Reconciler looks at the changeset's current and previous ChangesetSpec to determine a plan for what it should do ("publish", "push a commit", "update title", etc.)
  4. Once it has the plan, it hands over to the Executor which executes the plan.
  5. To push a commit to the code host, the Executor sends a request to the gitserver service.
  6. To create or update a pull request or merge request on the code host, it builds a ChangesetSource, which is a wrapper around the GitHub, Bitbucket Server, Bitbucket Data Center and GitLab HTTP clients.
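
The plan determination in step 3 can be sketched in Go as a comparison of two specs. This is a minimal illustration under stated assumptions: the ChangesetSpec fields, the Operation values, and determinePlan are hypothetical and much simpler than what the real Reconciler handles.

```go
package main

import "fmt"

// ChangesetSpec is a hypothetical, trimmed-down desired state of a changeset.
type ChangesetSpec struct {
	Title string
	Body  string
	Diff  string
}

// Operation is one step of a reconciler plan (names chosen for this sketch).
type Operation string

const (
	OpPush    Operation = "push a commit"
	OpPublish Operation = "publish"
	OpUpdate  Operation = "update title and body"
)

// determinePlan compares the previous and the current spec of a changeset.
// A nil previous spec means the changeset has not been published yet.
func determinePlan(previous, current *ChangesetSpec) []Operation {
	if current == nil {
		return nil // nothing is desired, so there is nothing to do
	}
	if previous == nil {
		return []Operation{OpPush, OpPublish}
	}
	var plan []Operation
	if previous.Diff != current.Diff {
		plan = append(plan, OpPush)
	}
	if previous.Title != current.Title || previous.Body != current.Body {
		plan = append(plan, OpUpdate)
	}
	return plan
}

func main() {
	previous := &ChangesetSpec{Title: "Old title", Body: "Body", Diff: "diff-1"}
	current := &ChangesetSpec{Title: "New title", Body: "Body", Diff: "diff-2"}
	for _, op := range determinePlan(previous, current) {
		fmt.Println(op)
	}
}
```

Separating "decide what to do" from "do it" mirrors the Reconciler and Executor split described above: the plan is plain data that can be inspected and tested without talking to a code host.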

While that is going on in the background, the BatchChangeDetailsPage component polls the GraphQL API to get the current state of the batch change and its changesets.

Once all instances of the Reconciler worker are done determining plans and executing them, you'll see that your changesets have been published on the code hosts.

Glossary

Batch changes introduce a lot of new names, GraphQL queries & mutations, and database tables. This section tries to explain the most common names and provide a mapping between the GraphQL types and their internal counterpart in the Go backend.

| GraphQL type | Go type | Database table | Description |
|---|---|---|---|
| Changeset | batches.Changeset | changesets | A changeset is a generic abstraction for pull requests and merge requests. |
| BatchChange | batches.BatchChange | batch_changes | A batch change is a collection of changesets. The central entity. |
| BatchSpec | batches.BatchSpec | batch_specs | A batch spec describes the desired state of a single batch change. |
| ChangesetSpec | batches.ChangesetSpec | changeset_specs | A changeset spec describes the desired state of a single changeset. |
| ExternalChangeset | batches.Changeset | changesets | Changeset is the unified name for pull requests/merge requests/etc. on code hosts. |
| ChangesetEvent | batches.ChangesetEvent | changeset_events | A changeset event is an event on a code host, e.g. a comment or a review on a pull request on GitHub. They are created by syncing the changesets from the code host on a regular basis and by accepting webhook events and turning them into changeset events. |

Structure of the Go backend code

The following is a list of Go packages in the sourcegraph/sourcegraph repository and short explanations of what each package does:

  • enterprise/internal/batches/types:

    Type definitions of common batches types, such as BatchChange, BatchSpec, Changeset, etc. A few helper functions and methods, but no real business logic.

  • enterprise/internal/batches:

    The hook InitBackgroundJobs injects Batch Changes code into enterprise/repo-updater. This is the "glue" in "glue code".

  • enterprise/internal/batches/background

    Another bit of glue code that starts background goroutines: the changeset reconciler, the stuck-reconciler resetter, the old-changeset-spec expirer.

  • enterprise/internal/batches/rewirer:

    The ChangesetRewirer maps existing/new changesets to the matching ChangesetSpecs when a user applies a batch spec.

  • enterprise/internal/batches/state:

    All the logic concerned with calculating a changeset's state at a given point in time, taking into account its current state, past events synced from regular code host APIs, and events received via webhooks.

  • enterprise/internal/batches/search:

    Parsing text-field input for changeset searches and turning them into database-queryable structures.

  • enterprise/internal/batches/search/syntax:

    The old Sourcegraph-search-query parser we inherited from the search team a week or two back (the plan is not to keep it, but to switch to the new one when we have time).

  • cmd/frontend/internal/batches/resolvers:

    The GraphQL resolvers that are injected into the enterprise/frontend in cmd/frontend/internal/batches/init.go. They mostly concern themselves with input/argument parsing/validation, (bulk-)reading (and paginating) from the database via the batches/store, but delegate most business logic to batches/service.

  • cmd/frontend/internal/batches/resolvers/apitest:

    A package that helps with testing the resolvers by defining types that match the GraphQL schema.

  • enterprise/internal/batches/testing:

    Common testing helpers we use across enterprise/internal/batches/* to create test data in the database, verify test output, etc.

  • enterprise/internal/batches/reconciler:

    The reconciler is what gets kicked off by the workerutil.Worker initialised in batches/background when a changeset is enqueued. It's the heart of the declarative model of batches: it compares changeset specs, creates execution plans, and executes them.

  • enterprise/internal/batches/syncer:

    This contains everything related to "sync changeset data from the code host to Sourcegraph". The Syncer is started in the background, keeps state in memory (rate limit per external service), and syncs changesets either periodically (according to heuristics) or when directly enqueued from the resolvers.

  • enterprise/internal/batches/service:

    This is what's often called the "service layer" in web architectures and contains a lot of the business logic: creating a batch change and validating whether the user can create one, applying new batch specs, calling the rewirer, deleting batch changes, closing batch changes, etc.

  • cmd/frontend/internal/batches/webhooks:

    These webhook endpoints are injected by InitFrontend into the frontend and implement the cmd/frontend/webhooks interfaces.

  • enterprise/internal/batches/store:

    This is the batch changes Store that takes enterprise/internal/batches/types types and writes/reads them to/from the database. This contains everything related to SQL and database persistence, even some complex business logic queries.

  • enterprise/internal/batches/sources:

    This package contains the abstraction layer of code host APIs that live in internal/extsvc/*. It provides a generalized interface ChangesetSource and implementations for each of our supported code hosts.
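
To illustrate the idea behind the sources package, here is a small Go sketch of one interface with one implementation per code host. The interface shown here is a hypothetical, heavily reduced version with a single method; the fake sources, their ID formats, and the publish helper are invented for this example and do not talk to any code host.

```go
package main

import (
	"errors"
	"fmt"
)

// Changeset is a hypothetical, trimmed-down changeset.
type Changeset struct {
	Title      string
	ExternalID string // set once the changeset exists on the code host
}

// ChangesetSource is a reduced, hypothetical version of the abstraction:
// callers only see this interface, never a code-host-specific client.
type ChangesetSource interface {
	CreateChangeset(c *Changeset) error
}

// fakeGitHubSource pretends to create pull requests.
type fakeGitHubSource struct{ next int }

func (s *fakeGitHubSource) CreateChangeset(c *Changeset) error {
	if c.Title == "" {
		return errors.New("title must not be empty")
	}
	s.next++
	c.ExternalID = fmt.Sprintf("pull-request-%d", s.next)
	return nil
}

// fakeGitLabSource pretends to create merge requests.
type fakeGitLabSource struct{ next int }

func (s *fakeGitLabSource) CreateChangeset(c *Changeset) error {
	if c.Title == "" {
		return errors.New("title must not be empty")
	}
	s.next++
	c.ExternalID = fmt.Sprintf("merge-request-%d", s.next)
	return nil
}

// publish works with any code host because it depends only on the interface.
func publish(src ChangesetSource, c *Changeset) error {
	return src.CreateChangeset(c)
}

func main() {
	sources := []ChangesetSource{&fakeGitHubSource{}, &fakeGitLabSource{}}
	for _, src := range sources {
		c := &Changeset{Title: "Update dependencies"}
		if err := publish(src, c); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("created", c.ExternalID)
	}
}
```

The benefit of this shape is that code such as the reconciler can be written once against the interface and tested with fake sources like the ones above.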

Diving into the code as a backend developer

  1. Read through ./cmd/frontend/graphqlbackend/batches.go to get an overview of the batch changes GraphQL API.
  2. Read through ./enterprise/internal/batches/types/*.go to see all batch changes related type definitions.
  3. Compare that with the GraphQL definitions in ./cmd/frontend/graphqlbackend/batches.graphql.
  4. Start reading through ./enterprise/internal/batches/resolvers/resolver.go to see how the main mutations are implemented (look at CreateBatchChange and ApplyBatchChange to see how the two main operations are implemented).
  5. Then start from the other end, enterprise/cmd/repo-updater/main.go. enterpriseInit() creates two sets of batch change goroutines:
     • batches.NewSyncRegistry creates a pool of syncers to pull changes from code hosts.
     • batches.RunWorkers creates a set of reconciler workers to push changes to code hosts as batch changes are applied.
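
If the idea of "a set of workers" running as goroutines is new to you, the following Go sketch shows the general pattern with an in-memory queue. It is only an analogy: runWorkers and its signature are made up for this example, and the real workers are driven by the database rather than by a channel.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n goroutines that each take changeset IDs from the queue
// and pass them to process. It returns once the queue is closed and drained.
func runWorkers(n int, queue <-chan int, process func(id int)) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range queue {
				process(id)
			}
		}()
	}
	wg.Wait()
}

func main() {
	// Enqueue three changesets and close the queue so the workers can stop.
	queue := make(chan int, 3)
	for _, id := range []int{1, 2, 3} {
		queue <- id
	}
	close(queue)

	// Collect results through a buffered channel to avoid shared state.
	done := make(chan int, 3)
	runWorkers(2, queue, func(id int) { done <- id })
	close(done)

	count := 0
	for range done {
		count++
	}
	fmt.Println("reconciled changesets:", count)
}
```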

Testing repositories

Batch changes create changesets (PRs) on code hosts. For testing Batch Changes locally, we recommend using the following repositories:

If you're lacking permissions to publish changesets in one of these repositories, feel free to reach out to a team member.

GitHub testing account

To use the sd9 GitHub testing account:

  1. Find the GitHub sd9 user in 1Password
  2. Copy the Campaigns Testing Token
  3. Change your dev-private/enterprise/dev/external-services-config.json to only contain a GitHub config with the token, like this:
{
  "GITHUB": [
    {
      "authorization": {},
      "url": "https://github.com",
      "token": "<TOKEN>",
      "repositoryQuery": ["affiliated"]
    }
  ]
}

Batch Spec examples

Take a look at the following links to see some examples of batch changes and the batch specs that produced them:

Server-side execution

Database tables

There are currently (Sept '21) four tables at the heart of the server-side execution of batch specs:

batch_specs. These are the batch_specs we already have, but in server-side mode they are created through a special mutation that also creates a batch_spec_resolution_job, see below.

batch_spec_resolution_jobs. These are worker jobs that are created through the GraphQL API when a user wants to kick off a server-side execution. Once a batch_spec_resolution_job is created, a worker picks it up, loads the corresponding batch_spec, and resolves its "on" part into RepoWorkspaces: a combination of repository, commit, path, steps, branch, etc. For each RepoWorkspace, the worker creates a batch_spec_workspace in the database.

batch_spec_workspaces. Each batch_spec_workspace represents a unit of work for a src batch exec invocation inside the executor. Once src batch exec has successfully executed, these batch_spec_workspaces will contain references to changeset_specs and those in turn will be updated to point to the batch_spec that kicked all of this off.

batch_spec_workspace_execution_jobs. These are the worker jobs that get picked up by the executor and lead to src batch exec being called. Each batch_spec_workspace_execution_job points to one batch_spec_workspace. This extra table lets us separate the workspace data from the execution of src batch exec. Separation of these two tables is the result of us running into tricky concurrency problems where workers were modifying table rows that the GraphQL layer was reading (or even modifying).
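
The way these four tables reference each other can be sketched with plain Go structs. This is a simplified model, not the actual schema or worker code: the struct fields, the Repos stand-in for what the "on" part resolves to, and the resolve and enqueue functions are assumptions made for this example.

```go
package main

import "fmt"

// BatchSpec is a hypothetical row of batch_specs. Repos stands in for
// whatever the "on" part of the spec resolves to.
type BatchSpec struct {
	ID    int
	Repos []string
}

// BatchSpecResolutionJob is a hypothetical row of batch_spec_resolution_jobs.
type BatchSpecResolutionJob struct{ BatchSpecID int }

// BatchSpecWorkspace is a hypothetical row of batch_spec_workspaces.
type BatchSpecWorkspace struct {
	ID          int
	BatchSpecID int
	Repo        string
}

// BatchSpecWorkspaceExecutionJob is a hypothetical row of
// batch_spec_workspace_execution_jobs. It points to exactly one workspace.
type BatchSpecWorkspaceExecutionJob struct {
	ID                   int
	BatchSpecWorkspaceID int
}

// resolve mimics the resolution worker: one workspace per resolved repository.
func resolve(job BatchSpecResolutionJob, spec BatchSpec) []BatchSpecWorkspace {
	workspaces := make([]BatchSpecWorkspace, 0, len(spec.Repos))
	for i, repo := range spec.Repos {
		workspaces = append(workspaces, BatchSpecWorkspace{
			ID:          i + 1,
			BatchSpecID: job.BatchSpecID,
			Repo:        repo,
		})
	}
	return workspaces
}

// enqueue creates one execution job per workspace, keeping the job state
// separate from the workspace data.
func enqueue(workspaces []BatchSpecWorkspace) []BatchSpecWorkspaceExecutionJob {
	jobs := make([]BatchSpecWorkspaceExecutionJob, 0, len(workspaces))
	for i, w := range workspaces {
		jobs = append(jobs, BatchSpecWorkspaceExecutionJob{
			ID:                   i + 1,
			BatchSpecWorkspaceID: w.ID,
		})
	}
	return jobs
}

func main() {
	spec := BatchSpec{ID: 7, Repos: []string{"repo-a", "repo-b"}}
	workspaces := resolve(BatchSpecResolutionJob{BatchSpecID: spec.ID}, spec)
	for _, job := range enqueue(workspaces) {
		fmt.Printf("execution job %d -> workspace %d\n", job.ID, job.BatchSpecWorkspaceID)
	}
}
```

Keeping the execution job in its own struct (and table) means a worker can update job state without touching the workspace row that the GraphQL layer may be reading at the same time.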

Here's a diagram of their relationship: