LSIF is a file format for precomputed code intelligence data. It provides fast and precise code intelligence but needs to be periodically generated and uploaded to your Sourcegraph instance. LSIF is opt-in: repositories for which you have not uploaded LSIF data will continue to use the built-in code intelligence.
Precise code intelligence using LSIF is supported in Sourcegraph 3.8 and up.
For users who have a language server deployed, LSIF will take priority over the language server when LSIF data exists for a repository.
Follow our LSIF quickstart guide to manually generate and upload LSIF data for your repository. After you are satisfied with the result, you can upload LSIF data to a Sourcegraph instance using your existing continuous integration infrastructure, or using GitHub Actions.
Go to your global settings at https://sourcegraph.example.com/site-admin/global-settings and enable LSIF:
"codeIntel.lsif": true
After uploading LSIF files, your Sourcegraph instance will use these files to respond to code intelligence requests (such as for hovers, definitions, and references). When LSIF data does not exist for a particular file in a repository, Sourcegraph will fall back to built-in code intelligence.
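The per-file fallback described above can be sketched as follows. This is a minimal illustration, not Sourcegraph's implementation: the `Repo` class, its `lsif_files` field, and `choose_provider` are hypothetical names, and the language server case is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Repo:
    # Hypothetical model: paths of the files covered by uploaded LSIF data.
    lsif_files: Set[str] = field(default_factory=set)


def choose_provider(repo: Repo, path: str) -> str:
    """Return which source answers a hover/definition/references request."""
    if path in repo.lsif_files:
        return "lsif"      # precise, precomputed data exists for this file
    return "built-in"      # no LSIF data for this file: fall back


repo = Repo(lsif_files={"cmd/main.go"})
print(choose_provider(repo, "cmd/main.go"))
print(choose_provider(repo, "docs/readme.md"))
```

The point of the sketch is that the decision is made per file, so a repository with LSIF data can still return built-in results for files the upload does not cover.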
You may occasionally see results from basic code intelligence even when you have uploaded LSIF data. Such results are indicated with a tooltip. This can happen in the following scenarios:
The bulk of LSIF data is stored on disk, and code intelligence data for a commit becomes less useful as it ages. Sourcegraph will automatically remove the least recently uploaded data when the amount of disk space used rises above a threshold. This threshold can be changed via the DBS_DIR_MAXIMUM_SIZE_BYTES environment variable. The default value of this variable is 10737418240, which is 1024 * 1024 * 1024 * 10 bytes, or 10 gigabytes.
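The cleanup policy above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual cleanup code: the `dumps_to_evict` function and the `(name, uploaded_at, size_bytes)` tuple layout are hypothetical, and only the default threshold value comes from the text.

```python
from typing import List, Tuple

# Default of DBS_DIR_MAXIMUM_SIZE_BYTES: 10 gigabytes.
DEFAULT_MAX_BYTES = 1024 * 1024 * 1024 * 10  # 10737418240

# Hypothetical record for one uploaded dump: (name, uploaded_at, size_bytes).
Dump = Tuple[str, int, int]


def dumps_to_evict(dumps: List[Dump], max_bytes: int = DEFAULT_MAX_BYTES) -> List[str]:
    """Pick dumps to remove, least recently uploaded first,
    until the total size no longer exceeds max_bytes."""
    total = sum(size for _, _, size in dumps)
    evicted = []
    for name, _, size in sorted(dumps, key=lambda d: d[1]):
        if total <= max_bytes:
            break
        evicted.append(name)
        total -= size
    return evicted


dumps = [("old", 1, 6), ("mid", 2, 6), ("new", 3, 6)]
print(dumps_to_evict(dumps, max_bytes=12))  # evicts only the oldest upload
```

Ordering by upload time rather than by last access matches the "least recently uploaded" wording above; whether the threshold comparison is strict is an assumption of this sketch.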
Global find-references is a resource-intensive operation that's sensitive to the number of packages for which you have uploaded LSIF data into your Sourcegraph instance. Improvements to this are planned for Sourcegraph 3.10 (see the RFC).
Do not upload more than 10-40 LSIF dumps to your Sourcegraph instance, or you risk degrading other parts of Sourcegraph. We are working to validate LSIF performance at scale and eliminate this concern.
The following table gives a rough estimate of the space and time requirements for indexing and processing. These repositories are a representative sample of public Go repositories available on GitHub. The working tree size column gives the size of the clone at the given commit (without git history), the number of files indexed, and the number of lines of Go code in the repository. The index size is the size of the uncompressed LSIF output of the indexer. The post-processing size is the total amount of disk space occupied after uploading the dump to a Sourcegraph instance.
Repository | Working tree size | Index time | Index size | Processing time | Post-processing size |
---|---|---|---|---|---|
bigcache | 216KB, 32 files, 2.585k loc | 1.18s | 3.5MB | 0.45s | 0.6MB |
sqlc | 396KB, 24 files, 7.041k loc | 1.53s | 7.2MB | 1.62s | 1.6MB |
nebula | 700KB, 71 files, 10.704k loc | 2.48s | 16MB | 1.63s | 2.9MB |
cayley | 5.6MB, 226 files, 36.346k loc | 5.58s | 51MB | 4.68s | 11MB |
go-ethereum | 27MB, 945 files, 317.664k loc | 20.53s | 255MB | 77.40s | 50MB |
kubernetes | 301MB, 4577 files, 1.550M loc | 1.21 min | 910MB | 80.06s | 162MB |
aws-sdk-go | 119MB, 1759 files, 1.067M loc | 8.20 min | 1.3GB | 155.82s | 358MB |
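To compare the two size columns, the ratio of post-processing size to index size can be computed directly from the table. The snippet below is a small illustration using only the figures above; treating 1GB as 1024MB is an assumption, and `size_ratios` is a hypothetical helper name.

```python
MB_PER_GB = 1024  # assumption: the table's GB is binary

# (index size in MB, post-processing size in MB), copied from the table.
SIZES_MB = {
    "bigcache": (3.5, 0.6),
    "sqlc": (7.2, 1.6),
    "nebula": (16, 2.9),
    "cayley": (51, 11),
    "go-ethereum": (255, 50),
    "kubernetes": (910, 162),
    "aws-sdk-go": (1.3 * MB_PER_GB, 358),
}


def size_ratios(sizes):
    """Map each repository to post-processing size / index size."""
    return {repo: post / index for repo, (index, post) in sizes.items()}


ratios = size_ratios(SIZES_MB)
for repo, ratio in ratios.items():
    print(f"{repo}: {ratio:.0%}")
```

On these figures the post-processing size comes out at roughly 17% to 27% of the index size, which can serve as a rough guide when budgeting disk space for a new repository.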
Cross-repository code intelligence will only be powered by LSIF when both repositories have LSIF data. When the current file has LSIF data and the other repository doesn't, there will be no code intelligence results (we're working on fallback to fuzzy code intelligence for 3.10).
To learn more, check out our lightning talk about LSIF from GopherCon 2019 or the introductory blog post: