Telemetry export architecture

This page outlines the architecture and components involved in Sourcegraph's new telemetry export system.

During its lifecycle, an event is first stored, then exported to Telemetry Gateway.

See testing events for a summary of how to observe your events during development.

Storing events

Once recorded, telemetry events are stored in two places:

  1. The structured event_logs table, for use in admin analytics, translated from the Telemetry Gateway format on a best-effort basis.
  2. The unstructured telemetry_events_export_queue table, which stores raw event payloads in Protobuf wire format for export.
    1. Events remain in this table until they are marked as exported. Exported events are then pruned once the duration specified by TELEMETRY_GATEWAY_EXPORTER_EXPORTED_EVENTS_RETENTION has elapsed.

The "tee" store, including the translation from Telemetry Gateway event schema to the event_logs table, is implemented in internal/telemetry/teestore.

Note that sensitive attributes are stripped before events are stored in telemetry_events_export_queue. As a result, the contents of telemetry_events_export_queue are exactly what gets exported from an instance.
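The "tee" pattern described above can be sketched roughly as follows. This is a simplified illustration, not the code in internal/telemetry/teestore: the Event, Store, and TeeStore types are hypothetical, the allowlist approach to stripping is an assumption, and whether event_logs receives the stripped or unstripped event is also an assumption made only for this example.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for a telemetry event.
type Event struct {
	ID         string
	Feature    string
	Action     string
	Attributes map[string]string // may contain sensitive values
}

// Store is a hypothetical destination that accepts batches of events.
type Store interface {
	Insert(events []Event) error
}

// memoryStore keeps events in memory so the sketch needs no database.
type memoryStore struct{ rows []Event }

func (s *memoryStore) Insert(events []Event) error {
	s.rows = append(s.rows, events...)
	return nil
}

// stripSensitive returns a copy of the event that keeps only attributes
// named in the allowlist (an assumed policy for this sketch).
func stripSensitive(e Event, allowed map[string]bool) Event {
	out := Event{ID: e.ID, Feature: e.Feature, Action: e.Action, Attributes: map[string]string{}}
	for k, v := range e.Attributes {
		if allowed[k] {
			out.Attributes[k] = v
		}
	}
	return out
}

// TeeStore writes every batch of events to both destinations.
type TeeStore struct {
	eventLogs   Store           // stands in for the event_logs table
	exportQueue Store           // stands in for telemetry_events_export_queue
	allowed     map[string]bool // attributes that are safe to export
}

// StoreEvents strips sensitive attributes before queueing events for export,
// so the queue holds exactly what would leave the instance.
func (t *TeeStore) StoreEvents(events []Event) error {
	sanitized := make([]Event, 0, len(events))
	for _, e := range events {
		sanitized = append(sanitized, stripSensitive(e, t.allowed))
	}
	if err := t.exportQueue.Insert(sanitized); err != nil {
		return fmt.Errorf("export queue: %w", err)
	}
	if err := t.eventLogs.Insert(events); err != nil {
		return fmt.Errorf("event logs: %w", err)
	}
	return nil
}
```

Stripping at write time, rather than at export time, is what makes the queue contents auditable: inspecting the queue shows exactly what will be sent.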

Exporting events

The telemetrygatewayexporter running in the worker service spawns a set of background jobs that handle:

  1. Reporting metrics on the telemetry_events_export_queue
  2. Cleaning up already-exported entries in the telemetry_events_export_queue
  3. Exporting batches of not-yet-exported entries in the telemetry_events_export_queue to the Telemetry Gateway service

When exporting events, we mark an event as successfully exported only when the Telemetry Gateway's response includes that event's generated ID. Events that are not acknowledged stay marked as not-yet-exported and are picked up by a later export batch, which ensures every event is exported at least once.
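The acknowledgement rule can be sketched as below. This is a minimal illustration under stated assumptions, not the telemetrygatewayexporter implementation: QueuedEvent, Gateway, RecordEvents, and ExportBatch are hypothetical names, and the queue is modeled as an in-memory slice. flakyGateway is a test double that acknowledges only part of a batch.

```go
package main

// QueuedEvent is a hypothetical row from the export queue.
type QueuedEvent struct {
	ID       string
	Payload  []byte
	Exported bool
}

// Gateway is a hypothetical client. It returns the IDs of the events that
// the remote service confirmed as received.
type Gateway interface {
	RecordEvents(batch []QueuedEvent) (succeededIDs []string, err error)
}

// flakyGateway is a test double that acknowledges only the first ackFirst
// events of each batch.
type flakyGateway struct{ ackFirst int }

func (g flakyGateway) RecordEvents(batch []QueuedEvent) ([]string, error) {
	var ids []string
	for i, e := range batch {
		if i >= g.ackFirst {
			break
		}
		ids = append(ids, e.ID)
	}
	return ids, nil
}

// ExportBatch sends all not-yet-exported events and marks as exported only
// those whose IDs appear in the response. It returns the number of events
// newly marked. Unacknowledged events stay in the queue for a later attempt.
func ExportBatch(queue []QueuedEvent, gw Gateway) (int, error) {
	var batch []QueuedEvent
	for _, e := range queue {
		if !e.Exported {
			batch = append(batch, e)
		}
	}
	if len(batch) == 0 {
		return 0, nil
	}
	acked, err := gw.RecordEvents(batch)
	// Acknowledged IDs are honored even if the call also reported an error,
	// so partially successful batches are not resent in full.
	ackSet := make(map[string]struct{}, len(acked))
	for _, id := range acked {
		ackSet[id] = struct{}{}
	}
	marked := 0
	for i := range queue {
		if _, ok := ackSet[queue[i].ID]; ok && !queue[i].Exported {
			queue[i].Exported = true
			marked++
		}
	}
	return marked, err
}
```

The trade-off of this approach is that an event may be delivered more than once, for example if the acknowledgement is lost after the service has accepted the event, so consumers should be prepared to deduplicate by event ID.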

Telemetry Gateway

The Telemetry Gateway is a managed Sourcegraph service that ingests event exports from all Sourcegraph instances, processes the events, and publishes raw payloads to a Pub/Sub topic. It exposes a gRPC API defined in telemetrygateway/v1; see exported events schema.

From the events received over the gRPC API, the Telemetry Gateway constructs raw JSON events and publishes them to a designated Pub/Sub topic. The published events eventually make their way into BigQuery.
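The publishing step can be sketched as follows. This is an illustration only: IncomingEvent, publishedEvent, Publisher, and PublishEvents are hypothetical names, the JSON field layout is invented for the example and does not reflect the real exported events schema, and memoryTopic stands in for a real Pub/Sub client.

```go
package main

import "encoding/json"

// IncomingEvent is a hypothetical, simplified event as received over gRPC.
type IncomingEvent struct {
	ID      string
	Feature string
	Action  string
}

// publishedEvent is a hypothetical JSON shape for the published payload.
type publishedEvent struct {
	ID      string `json:"id"`
	Feature string `json:"feature"`
	Action  string `json:"action"`
}

// Publisher is a hypothetical abstraction over a Pub/Sub topic.
type Publisher interface {
	Publish(payload []byte) error
}

// memoryTopic records published messages in memory for the sketch.
type memoryTopic struct{ messages [][]byte }

func (t *memoryTopic) Publish(payload []byte) error {
	t.messages = append(t.messages, payload)
	return nil
}

// PublishEvents serializes each event to JSON, publishes it, and returns the
// IDs of the events that were published successfully. Returning only those
// IDs lets the caller report exactly which events were accepted.
func PublishEvents(events []IncomingEvent, topic Publisher) []string {
	var succeeded []string
	for _, e := range events {
		payload, err := json.Marshal(publishedEvent{ID: e.ID, Feature: e.Feature, Action: e.Action})
		if err != nil {
			continue // skip events that cannot be serialized
		}
		if err := topic.Publish(payload); err != nil {
			continue // skip events the topic did not accept
		}
		succeeded = append(succeeded, e.ID)
	}
	return succeeded
}
```

Reporting per-event success in this way pairs naturally with an exporter that only marks acknowledged events as exported.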

Also see How to set up Telemetry Gateway locally.

For details about live Telemetry Gateway deployments, refer to the handbook Telemetry Gateway page.