Sourcegraph cAdvisor

We ship a custom cAdvisor image as part of the standard Sourcegraph Kubernetes and docker-compose distribution. cAdvisor exports container monitoring metrics scraped by Prometheus and visualized in Grafana.

The image is defined in docker-images/cadvisor.

Monitoring

Monitoring on cAdvisor metrics is defined in the monitoring generator. cAdvisor observables are generally defined as shared observables.

When adding monitoring on cAdvisor metrics, please ensure that the container each metric belongs to can be identified (if it cannot, the metric is likely not supported).

Identifying containers

How relevant containers are identified from exported cAdvisor metrics is documented in CadvisorNameMatcher, which generates the label matcher for monitoring observables.

Because cAdvisor runs on a machine and exports metrics for the containers on it, standard strategies for identifying which container a metric belongs to (such as Prometheus scrape target labels) cannot be used: all the metrics look like they belong to cAdvisor. Complicating matters, how containers are identified varies across environments (namely Kubernetes and docker-compose), sometimes due to characteristics of the environments and sometimes due to naming inconsistencies within Sourcegraph. Variations in how cAdvisor generates the name label it provides also make things difficult (in some environments, it cannot generate one at all!). This means that cAdvisor can pick up non-Sourcegraph metrics, which can be problematic - see known issues for more details and current workarounds.
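To make the matching problem concrete, here is a minimal sketch of a name-based label matcher. It assumes containers are matched by a prefix regex on the name label. The helper names (name_matcher, matching_names) and the sample container names are hypothetical; the actual logic lives in CadvisorNameMatcher and may differ.

```python
import re
from typing import List

# Hypothetical sketch only: the real matcher is CadvisorNameMatcher.
# Restricting names to a safe character set avoids regex/PromQL escaping issues.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def name_matcher(container_name: str) -> str:
    """Build a PromQL label matcher on the cAdvisor `name` label.

    Assumption: relevant containers have a `name` that starts with
    `container_name`.
    """
    if not _SAFE_NAME.fullmatch(container_name):
        raise ValueError(f"unsupported container name: {container_name!r}")
    return f'name=~"^{container_name}.*"'


def matching_names(container_name: str, exported_names: List[str]) -> List[str]:
    """Simulate the prefix match locally against a list of `name` label values."""
    if not _SAFE_NAME.fullmatch(container_name):
        raise ValueError(f"unsupported container name: {container_name!r}")
    pattern = re.compile(f"^{container_name}.*")
    # Prometheus anchors label regexes at both ends, hence fullmatch.
    return [name for name in exported_names if pattern.fullmatch(name)]


if __name__ == "__main__":
    # Made-up `name` values, as they might appear on one machine.
    names = ["gitserver-0", "gitserver-proxy", "redis-cache"]
    print(name_matcher("gitserver"))
    print(matching_names("gitserver", names))
```

In this sketch, a container that merely shares the prefix (the made-up gitserver-proxy) is matched as well, which illustrates how name-based matching can pick up non-Sourcegraph containers.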

Available metrics

Exported metrics are documented in the cAdvisor Prometheus metrics list. In the list, the "-disable_metrics parameter" column indicates the "group" each metric belongs to.

Container runtime and deployment environment compatibility for various metrics seems to follow these groups - before using a metric, ensure that the metric is supported in all relevant environments (for example, both Docker and containerd container runtimes). Support is generally poorly documented, but a search through the cAdvisor repository issues might provide some hints.
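One way to check support empirically is to compare the metrics you need against what cAdvisor actually exports in each environment. The following is a rough sketch that parses Prometheus text-format output and reports the required metrics that are absent. The function names and the sample text are made up for illustration; in practice the text would come from the cAdvisor metrics endpoint in each environment.

```python
from typing import Iterable, List, Set


def exported_metric_names(exposition_text: str) -> Set[str]:
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


def missing_metrics(required: Iterable[str], exposition_text: str) -> List[str]:
    """Return the required metric names that do not appear in the output."""
    return sorted(set(required) - exported_metric_names(exposition_text))


if __name__ == "__main__":
    # Made-up sample; real output would be fetched from cAdvisor.
    sample = """\
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{name="gitserver-0"} 12.5
container_memory_usage_bytes{name="gitserver-0"} 1048576
"""
    required = ["container_cpu_usage_seconds_total", "container_fs_reads_total"]
    print(missing_metrics(required, sample))
```

A metric missing from one environment's output is only a hint that it is unsupported there; it may also be absent because its group is disabled via -disable_metrics.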

Known issues

cAdvisor can pick up non-Sourcegraph metrics due to how we currently identify containers. This can cause issues with our built-in observability and, in extreme cases, cAdvisor and Prometheus performance issues if the number of metrics is very large: sourcegraph#17365 (Kubernetes workaround)
Metrics issues
- disk metrics are not available in containerd: cadvisor#2785
- diskIO metrics do not seem to be available in Kubernetes: sourcegraph#12163