Life of an LSIF upload

This document describes how an LSIF data file is uploaded to a Sourcegraph instance and processed. It does not cover how the data file is generated; that is covered in the user docs, on lsif.dev, and in the documentation for individual indexers.

Uploading

Data files are uploaded via the lsif upload command in the Sourcegraph command line utility. This command gzip-encodes the on-disk file and sends it to the Sourcegraph instance in an unauthenticated HTTP POST request. The request is handled by a proxy handler in the frontend, which forwards the file to precise-code-intel-api-server via the lsif-server client.

This handler exists only in the enterprise version of the product. The OSS version does not register this route and returns a 404 for all requests to it.

Before proxying the payload to precise-code-intel-api-server, the frontend ensures that the target repository is cloned and that the target commit exists. The latter check may cause a remote fetch to occur in gitserver.

Additionally, when LsifEnforceAuth is true in the site settings, this endpoint authorizes each request via a code host token. This is enabled in particular for the dot-com deployment, so that LSIF uploads to a public repository are accepted only from requests using an access token with collaborator access to that repository. Private instances are not generally expected to enable this setting. See an example of the auth flow here.

Processing

Once the precise-code-intel-api-server upload endpoint receives the payload, it writes the payload to disk and adds an unprocessed (queued) upload record to the lsif_uploads table in Postgres.

Each upload record has a state, which can be one of the following:

  • queued
  • processing
  • completed
  • errored
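These four states, together with the moves described in the paragraph below, form a small state machine. This Go model is a hypothetical sketch: the `State` type, `transition` helper, and `Visible` method are invented for illustration, and the real system may permit transitions this document does not describe.

```go
package main

import "fmt"

// State is a hypothetical Go model of an upload record's state.
type State string

const (
	Queued     State = "queued"
	Processing State = "processing"
	Completed  State = "completed"
	Errored    State = "errored"
)

// transitions lists only the moves described in this document: a worker
// claims a queued upload, then finishes it as completed or errored.
var transitions = map[State][]State{
	Queued:     {Processing},
	Processing: {Completed, Errored},
}

// transition returns the new state, or an error if the move is not allowed.
func transition(from, to State) (State, error) {
	for _, next := range transitions[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("invalid upload state transition %s -> %s", from, to)
}

// Visible reports whether an upload in state s is used to answer queries;
// treating every non-completed state as invisible is an assumption.
func (s State) Visible() bool { return s == Completed }
```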

The precise-code-intel-worker process polls for queued uploads. Once it selects (and locks) an upload for processing, it temporarily sets the upload's state to processing and converts the raw LSIF input on disk into a SQLite database that precise-code-intel-api-server can use to answer code intelligence queries. On success, the upload's state is set to completed. On failure, the upload's state is set to errored, and an error message and a stack trace are recorded with it. An upload in the completed state is visible to precise-code-intel-api-server when answering queries.

See life of a code intelligence query for additional documentation on how the SQLite data file is read.