Writing an indexer

This page describes the SCIP Code Intelligence Protocol and how you can write an indexer to emit SCIP.

At a high level, you need to follow these steps:

  1. Familiarize yourself with the SCIP protobuf schema.
  2. Import or generate SCIP bindings.
  3. Generate a minimal index with occurrence information.
  4. Test your indexer using the scip CLI's snapshot subcommand.
  5. Progressively add support for more features with tests.

If you run into problems or have questions about any of these steps, please open an issue on the SCIP issue tracker.

Let's go over each step one-by-one.

Understanding the SCIP protobuf schema

The SCIP protobuf schema describes the structure of a SCIP index in a machine-readable format.

The main structure is an Index, which consists of a list of documents along with some metadata. Optionally, an index can also provide hover documentation for external symbols that will not be indexed.

A Document has a unique path relative to the project root. It also has a list of occurrences, which attach information to source ranges, as well as a list of symbols that are defined in the document.

The information covered by an Occurrence can be syntactic or semantic:

  • Syntactic information such as the syntax_kind field is used for highlighting.
  • Semantic information such as the symbol and symbol_roles fields is used to power code navigation features like Go to definition and Find references.

Occurrences also allow attaching diagnostic information, which can be used by static analysis tools.
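
As a rough mental model, the structures described above could be sketched as plain Python dataclasses. This is a simplified stand-in for illustration, not the generated bindings: only fields mentioned in this section are modelled, the Diagnostic and SymbolInformation messages are reduced to strings, the metadata is reduced to a dict, the DEFINITION constant assumes that the Definition bit of the schema's SymbolRole enum is 0x1, and the path and symbol values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: mirrors the Definition bit of the schema's SymbolRole enum.
DEFINITION = 0x1


@dataclass
class Occurrence:
    # 0-based [line, start_col, end_col] for a single-line range,
    # or [start_line, start_col, end_line, end_col] otherwise.
    range: List[int]
    symbol: str = ""        # semantic: which symbol this range refers to
    symbol_roles: int = 0   # semantic: bitset of roles (definition, ...)
    syntax_kind: int = 0    # syntactic: used for highlighting
    diagnostics: List[str] = field(default_factory=list)  # simplified


@dataclass
class Document:
    relative_path: str  # unique, relative to the project root
    occurrences: List[Occurrence] = field(default_factory=list)
    symbols: List[str] = field(default_factory=list)  # simplified


@dataclass
class Index:
    metadata: Dict[str, str] = field(default_factory=dict)  # simplified
    documents: List[Document] = field(default_factory=list)
    external_symbols: List[str] = field(default_factory=list)  # simplified


# A one-document index with a single definition occurrence.
index = Index(
    metadata={"project_root": "file:///path/to/project"},
    documents=[
        Document(
            relative_path="src/main.py",
            occurrences=[
                Occurrence(range=[0, 4, 9], symbol="local 0", symbol_roles=DEFINITION)
            ],
            symbols=["local 0"],
        )
    ],
)
```

In a real indexer these classes come from the SCIP bindings, so there is no need to define them by hand.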

For more details, see the doc comments in the SCIP protobuf schema.

You may also find it helpful to see how existing indexers emit information. For example, you can take a look at the scip-typescript or scip-java code to see how they emit SCIP indexes.

Importing or generating SCIP bindings

The SCIP repository contains bindings for several languages.

Depending on your indexer's implementation language, you can import the bindings directly using your language's package manager, or by using git submodules. One benefit of this approach is that you do not need to have a protobuf toolchain to generate code from the schema. This also makes it easier to bump the version of SCIP to pick up newer changes to the schema.

Alternatively, you can vendor the SCIP protobuf schema into your repository and set up Protobuf generation yourself. This has the benefit of being able to control the process from end to end, at the cost of making updates a bit more cumbersome.

Newer Sourcegraph versions will maintain backwards compatibility with older SCIP versions, so there is no risk of not being able to upload SCIP indexes if a vendored schema has not been updated in a while.

Generating minimal index with occurrence information

As a first pass, we recommend generating occurrences for a subset of declarations and checking that the generation works from end to end.

In the context of an indexer, this typically involves using a compiler frontend or a language server as a library. First, run the compiler pipeline until semantic analysis is completed. Next, perform a top-down traversal of ASTs for all files, recording information about different kinds of occurrences.

At the end, write a conversion pass from the intermediate data to SCIP using the SCIP bindings.
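
The traversal and conversion steps above can be sketched roughly as follows. This uses Python's built-in ast module as a stand-in compiler frontend, so it skips semantic analysis entirely. RawOccurrence, collect_function_definitions and to_scip_document are hypothetical names; the output is a plain dict shaped like a Document rather than a message built with the SCIP bindings; the "local" symbols are placeholders; and DEFINITION assumes the Definition role bit is 0x1.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List

# Assumption: mirrors the Definition bit of the schema's SymbolRole enum.
DEFINITION = 0x1


@dataclass(frozen=True)
class RawOccurrence:
    """Intermediate record produced by the traversal (hypothetical shape)."""
    name: str
    line: int       # 0-based line number
    start_col: int  # 0-based, inclusive
    end_col: int    # 0-based, exclusive


def collect_function_definitions(source: str) -> List[RawOccurrence]:
    """Walk the AST and record the name range of each plain `def`.

    Simplifications: `async def` is ignored, and the source is assumed to be
    ASCII so that byte offsets and character offsets coincide.
    """
    lines = source.splitlines()
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            text = lines[node.lineno - 1]
            # col_offset points at the `def` keyword; search for the name after it.
            start = text.find(node.name, node.col_offset + len("def"))
            found.append(
                RawOccurrence(node.name, node.lineno - 1, start, start + len(node.name))
            )
    # Sort so that the output does not depend on traversal order.
    return sorted(found, key=lambda occ: (occ.line, occ.start_col))


def to_scip_document(relative_path: str, raw: List[RawOccurrence]) -> Dict:
    """Conversion pass: intermediate records -> Document-shaped dict."""
    return {
        "relative_path": relative_path,
        "occurrences": [
            {
                "range": [occ.line, occ.start_col, occ.end_col],
                # Placeholder symbol; a real indexer would follow the Symbol grammar.
                "symbol": f"local {occ.name}",
                "symbol_roles": DEFINITION,
            }
            for occ in raw
        ],
    }


example_source = "def greet(name):\n    return name\n\ndef d():\n    pass\n"
raw_occurrences = collect_function_definitions(example_source)
document = to_scip_document("example.py", raw_occurrences)
```

Keeping the intermediate records separate from the SCIP messages means the traversal can be tested without touching the bindings at all.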

As a convention, indexers should use index.scip as the default filename for the output. The Sourcegraph CLI recognizes this filename and uses it as the default upload path.

You can inspect the Protobuf output using protoc:

# assuming scip.proto and index.scip are in the current directory
protoc --decode=scip.Index scip.proto < index.scip

For robust testing, we recommend making sure that the result of indexing is deterministic. One potential source of issues here is non-deterministic iteration over the key-value pairs of a hash table. If re-running your indexer changes the order in which occurrences are emitted, snapshot testing may report different results.
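
One simple way to guard against this is to sort before emitting. The sketch below assumes a hypothetical intermediate table that maps symbol strings to lists of ranges. flatten_in_stable_order is an illustrative name, not part of any SCIP tooling.

```python
from typing import Dict, List, Tuple

Range = List[int]


def flatten_in_stable_order(
    ranges_by_symbol: Dict[str, List[Range]]
) -> List[Tuple[Range, str]]:
    """Flatten a symbol -> ranges table into pairs sorted by (range, symbol).

    Sorting makes the result independent of the order in which the table
    was populated or is iterated.
    """
    pairs = [
        (rng, symbol)
        for symbol, ranges in ranges_by_symbol.items()
        for rng in ranges
    ]
    return sorted(pairs)


# The same data collected in two different orders, as could happen between runs.
first_run = {"local 0": [[0, 4, 8]], "local 1": [[0, 10, 12], [2, 7, 9]]}
second_run = {"local 1": [[2, 7, 9], [0, 10, 12]], "local 0": [[0, 4, 8]]}

stable_first = flatten_in_stable_order(first_run)
stable_second = flatten_in_stable_order(second_run)
```

Both runs produce the same sequence, so a snapshot generated from either one will be identical.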

Snapshot testing with scip CLI

One of the key design criteria for SCIP was that it should be easy to understand an index file and test an indexer for correctness.

The scip CLI has a snapshot subcommand which can be used for golden testing. The snapshot command inspects an index file and regenerates the source code, attaching comments describing occurrence information.

Here is a slightly cleaned-up snippet from running scip snapshot on the index generated by running scip-typescript over itself:

  function scriptElementKind(
//         ^^^^^^^^^^^^^^^^^ definition scip-typescript npm @sourcegraph/scip-typescript 0.2.0 src/FileIndexer.ts/scriptElementKind().
    node: ts.Node,
//  ^^^^ definition scip-typescript npm @sourcegraph/scip-typescript 0.2.0 src/FileIndexer.ts/scriptElementKind().(node)
//        ^^ reference local 1
//           ^^^^ reference scip-typescript npm typescript 4.6.2 lib/typescript.d.ts/ts/Node#
    sym: ts.Symbol
//  ^^^ definition scip-typescript npm @sourcegraph/scip-typescript 0.2.0 src/FileIndexer.ts/scriptElementKind().(sym)
//  documentation ```ts
//       ^^ reference local 1
//          ^^^^^^ reference scip-typescript npm typescript 4.6.2 lib/typescript.d.ts/ts/Symbol#
  ): ts.ScriptElementKind {
//   ^^ reference local 1
//      ^^^^^^^^^^^^^^^^^ reference scip-typescript npm typescript 4.6.2 lib/typescript.d.ts/ts/ScriptElementKind#

The carets and contextual information make it easy to visually check that:

  • Occurrences are being emitted for the right source ranges.
  • Occurrences have the expected symbol strings. The exact syntax for the symbol strings is described in the doc comment for Symbol in the SCIP Protobuf schema.
  • Symbols correspond to the right package. For example, ScriptElementKind is defined in the typescript package (the compiler), whereas scriptElementKind is defined in @sourcegraph/scip-typescript.
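
To make the caret notation concrete, here is a rough sketch of how annotated output of this shape could be produced from a source string and a list of occurrences. It is not the scip CLI's implementation: render_snapshot is a hypothetical helper, occurrences are plain dicts with single-line three-element ranges, DEFINITION assumes the Definition role bit is 0x1, and ranges that start before column 2 are not aligned correctly because the comment prefix takes up those columns.

```python
from typing import Dict, List

# Assumption: mirrors the Definition bit of the schema's SymbolRole enum.
DEFINITION = 0x1
COMMENT_PREFIX = "//"


def render_snapshot(source: str, occurrences: List[Dict]) -> str:
    """Interleave source lines with caret comments describing occurrences."""
    by_line: Dict[int, List[Dict]] = {}
    for occ in occurrences:
        by_line.setdefault(occ["range"][0], []).append(occ)

    output = []
    for line_no, text in enumerate(source.splitlines()):
        output.append(text)
        # Sort by start column so the comment order is deterministic.
        for occ in sorted(by_line.get(line_no, []), key=lambda o: o["range"][1]):
            _, start, end = occ["range"]
            role = "definition" if occ["symbol_roles"] & DEFINITION else "reference"
            # Pad so that the carets sit under the columns they describe.
            padding = " " * max(0, start - len(COMMENT_PREFIX))
            carets = "^" * (end - start)
            output.append(f"{COMMENT_PREFIX}{padding}{carets} {role} {occ['symbol']}")
    return "\n".join(output)


example_source = "    node: ts.Node,"
example_occurrences = [
    {"range": [0, 10, 12], "symbol": "local 1", "symbol_roles": 0},
    {"range": [0, 4, 8], "symbol": "local 0", "symbol_roles": DEFINITION},
]
rendered = render_snapshot(example_source, example_occurrences)
```

A golden test can then compare text like this against a checked-in file, so that any change in ranges, roles, or symbol strings shows up as a readable diff.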

Progressively adding support for language features

We recommend adding support for different features in the following order:

  1. Emit occurrences and symbols for a single file.
    • Iterate over different kinds of entities (functions, classes, properties, etc.)
  2. Emit hover documentation for entities. If the markup is in a format other than CommonMark, we recommend addressing that difference after addressing other features.
  3. Add support for implementation relationships, enabling Find implementations.
  4. (Optional) If the hover documentation uses markup in a format other than CommonMark, implement a conversion from the custom markup language to CommonMark.
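
As an illustration of steps 2 and 3, the sketch below models how hover documentation and implementation relationships might hang off a symbol. The dataclasses are simplified stand-ins for the schema's SymbolInformation and Relationship messages, not the generated bindings; find_implementations is a hypothetical helper; and the scip-demo symbol strings are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relationship:
    symbol: str                      # the symbol this relationship points at
    is_implementation: bool = False  # enables Find implementations


@dataclass
class SymbolInformation:
    symbol: str
    documentation: List[str] = field(default_factory=list)  # hover text (CommonMark)
    relationships: List[Relationship] = field(default_factory=list)


def find_implementations(symbols: List[SymbolInformation], target: str) -> List[str]:
    """Return the symbols that declare an implementation relationship to target."""
    return sorted(
        info.symbol
        for info in symbols
        if any(r.symbol == target and r.is_implementation for r in info.relationships)
    )


# Invented symbol strings, for illustration only.
SHAPE = "scip-demo pip demo 0.1.0 shapes/Shape#"
CIRCLE = "scip-demo pip demo 0.1.0 shapes/Circle#"
SQUARE = "scip-demo pip demo 0.1.0 shapes/Square#"

symbols = [
    SymbolInformation(SHAPE, documentation=["Base class for all shapes."]),
    SymbolInformation(SQUARE, relationships=[Relationship(SHAPE, is_implementation=True)]),
    SymbolInformation(CIRCLE, relationships=[Relationship(SHAPE, is_implementation=True)]),
]
implementations = find_implementations(symbols, SHAPE)
```

Here the lookup for Shape yields Circle and Square, which is the information a Find implementations query needs.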