Alert solutions

This document contains possible solutions for alerts that fire in Sourcegraph's monitoring. If your alert isn't mentioned here, or if the solution doesn't help, contact us for assistance.

To learn more about Sourcegraph's alerting and how to set up alerts, see our alerting guide.

frontend: 99th_percentile_search_request_duration

99th percentile successful search request duration over 5m (search)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 20, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details (a combined site configuration example follows this list).
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_search_request_duration"
]
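
For reference, both settings mentioned above live at the top level of the site configuration. The following is a minimal sketch, assuming an otherwise empty configuration; the threshold and alert name are the ones from this section, so keep your existing settings and adjust the values to your situation:

{
  "observability.logSlowSearches": 20,
  "observability.silenceAlerts": [
    "warning_frontend_99th_percentile_search_request_duration"
  ]
}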

frontend: 90th_percentile_search_request_duration

90th percentile successful search request duration over 5m (search)

Descriptions:

  • warning frontend: 15s+ 90th percentile successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 15, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_90th_percentile_search_request_duration"
]

frontend: hard_timeout_search_responses

hard timeout search responses every 5m (search)

Descriptions:

  • warning frontend: 2%+ hard timeout search responses every 5m for 15m0s
  • critical frontend: 5%+ hard timeout search responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_timeout_search_responses",
  "critical_frontend_hard_timeout_search_responses"
]

frontend: hard_error_search_responses

hard error search responses every 5m (search)

Descriptions:

  • warning frontend: 2%+ hard error search responses every 5m for 15m0s
  • critical frontend: 5%+ hard error search responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_error_search_responses",
  "critical_frontend_hard_error_search_responses"
]

frontend: partial_timeout_search_responses

partial timeout search responses every 5m (search)

Descriptions:

  • warning frontend: 5%+ partial timeout search responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_partial_timeout_search_responses"
]

frontend: search_alert_user_suggestions

search alert user suggestions shown every 5m (search)

Descriptions:

  • warning frontend: 5%+ search alert user suggestions shown every 5m for 15m0s

Possible solutions:

  • This indicates your users are making syntax errors or similar user errors.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_search_alert_user_suggestions"
]

frontend: page_load_latency

90th percentile page load latency over all routes over 10m (cloud)

Descriptions:

  • critical frontend: 2s+ 90th percentile page load latency over all routes over 10m

Possible solutions:

  • Confirm that the Sourcegraph frontend has enough CPU/memory using the provisioning panels.
  • Trace a request to see what the slowest part is: https://docs.sourcegraph.com/admin/observability/tracing
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_page_load_latency"
]

frontend: blob_load_latency

90th percentile blob load latency over 10m (cloud)

Descriptions:

  • critical frontend: 5s+ 90th percentile blob load latency over 10m

Possible solutions:

  • Confirm that the Sourcegraph frontend has enough CPU/memory using the provisioning panels.
  • Trace a request to see what the slowest part is: https://docs.sourcegraph.com/admin/observability/tracing
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_blob_load_latency"
]

frontend: 99th_percentile_search_codeintel_request_duration

99th percentile code-intel successful search request duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile code-intel successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 20, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_search_codeintel_request_duration"
]

frontend: 90th_percentile_search_codeintel_request_duration

90th percentile code-intel successful search request duration over 5m (code-intel)

Descriptions:

  • warning frontend: 15s+ 90th percentile code-intel successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 15, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_90th_percentile_search_codeintel_request_duration"
]

frontend: hard_timeout_search_codeintel_responses

hard timeout search code-intel responses every 5m (code-intel)

Descriptions:

  • warning frontend: 2%+ hard timeout search code-intel responses every 5m for 15m0s
  • critical frontend: 5%+ hard timeout search code-intel responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_timeout_search_codeintel_responses",
  "critical_frontend_hard_timeout_search_codeintel_responses"
]

frontend: hard_error_search_codeintel_responses

hard error search code-intel responses every 5m (code-intel)

Descriptions:

  • warning frontend: 2%+ hard error search code-intel responses every 5m for 15m0s
  • critical frontend: 5%+ hard error search code-intel responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_error_search_codeintel_responses",
  "critical_frontend_hard_error_search_codeintel_responses"
]

frontend: partial_timeout_search_codeintel_responses

partial timeout search code-intel responses every 5m (code-intel)

Descriptions:

  • warning frontend: 5%+ partial timeout search code-intel responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_partial_timeout_search_codeintel_responses"
]

frontend: search_codeintel_alert_user_suggestions

search code-intel alert user suggestions shown every 5m (code-intel)

Descriptions:

  • warning frontend: 5%+ search code-intel alert user suggestions shown every 5m for 15m0s

Possible solutions:

  • This indicates a bug in Sourcegraph; please open an issue.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_search_codeintel_alert_user_suggestions"
]

frontend: 99th_percentile_search_api_request_duration

99th percentile successful search API request duration over 5m (search)

Descriptions:

  • warning frontend: 50s+ 99th percentile successful search API request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 20, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • If your users are requesting many results with a large count: parameter, consider using our search pagination API.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_search_api_request_duration"
]

frontend: 90th_percentile_search_api_request_duration

90th percentile successful search API request duration over 5m (search)

Descriptions:

  • warning frontend: 40s+ 90th percentile successful search API request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 15, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • If your users are requesting many results with a large count: parameter, consider using our search pagination API.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_90th_percentile_search_api_request_duration"
]

frontend: hard_timeout_search_api_responses

hard timeout search API responses every 5m (search)

Descriptions:

  • warning frontend: 2%+ hard timeout search API responses every 5m for 15m0s
  • critical frontend: 5%+ hard timeout search API responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_timeout_search_api_responses",
  "critical_frontend_hard_timeout_search_api_responses"
]

frontend: hard_error_search_api_responses

hard error search API responses every 5m (search)

Descriptions:

  • warning frontend: 2%+ hard error search API responses every 5m for 15m0s
  • critical frontend: 5%+ hard error search API responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_error_search_api_responses",
  "critical_frontend_hard_error_search_api_responses"
]

frontend: partial_timeout_search_api_responses

partial timeout search API responses every 5m (search)

Descriptions:

  • warning frontend: 5%+ partial timeout search API responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_partial_timeout_search_api_responses"
]

frontend: search_api_alert_user_suggestions

search API alert user suggestions shown every 5m (search)

Descriptions:

  • warning frontend: 5%+ search API alert user suggestions shown every 5m

Possible solutions:

  • This indicates your users' search API requests have syntax errors or similar user errors. Check the responses the API sends back for an explanation (an illustrative sketch follows this list).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_search_api_alert_user_suggestions"
]
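
The shape of such a response is sketched below for illustration only; the field names are assumptions rather than the exact API schema. The point is that the response carries an alert with a title, a description of the user error, and any proposed queries:

{
  "alert": {
    "title": "Unable to parse query",
    "description": "The query contains a syntax error near ':'.",
    "proposedQueries": []
  }
}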

frontend: codeintel_resolvers_99th_percentile_duration

99th percentile successful resolver duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful resolver duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_resolvers_99th_percentile_duration"
]

frontend: codeintel_resolvers_errors

resolver errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ resolver errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_resolvers_errors"
]

frontend: codeintel_api_99th_percentile_duration

99th percentile successful codeintel API operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful codeintel API operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_api_99th_percentile_duration"
]

frontend: codeintel_api_errors

code intel API errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ code intel API errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_api_errors"
]

frontend: codeintel_dbstore_99th_percentile_duration

99th percentile successful database store operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful database store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_dbstore_99th_percentile_duration"
]

frontend: codeintel_dbstore_errors

database store errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ database store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_dbstore_errors"
]

frontend: codeintel_upload_workerstore_99th_percentile_duration

99th percentile successful upload worker store operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful upload worker store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_upload_workerstore_99th_percentile_duration"
]

frontend: codeintel_upload_workerstore_errors

upload worker store errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ upload worker store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_upload_workerstore_errors"
]

frontend: codeintel_index_workerstore_99th_percentile_duration

99th percentile successful index worker store operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful index worker store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_index_workerstore_99th_percentile_duration"
]

frontend: codeintel_index_workerstore_errors

index worker store errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ index worker store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_index_workerstore_errors"
]

frontend: codeintel_lsifstore_99th_percentile_duration

99th percentile successful LSIF store operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful LSIF store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_lsifstore_99th_percentile_duration"
]

frontend: codeintel_lsifstore_errors

LSIF store errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ LSIF store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_lsifstore_errors"
]

frontend: codeintel_uploadstore_99th_percentile_duration

99th percentile successful upload store operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful upload store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_uploadstore_99th_percentile_duration"
]

frontend: codeintel_uploadstore_errors

upload store errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ upload store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_uploadstore_errors"
]

frontend: codeintel_gitserverclient_99th_percentile_duration

99th percentile successful gitserver client operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful gitserver client operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_gitserverclient_99th_percentile_duration"
]

frontend: codeintel_gitserverclient_errors

gitserver client errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ gitserver client errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_gitserverclient_errors"
]

frontend: codeintel_commit_graph_queue_size

commit graph queue size (code-intel)

Descriptions:

  • warning frontend: 100+ commit graph queue size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_commit_graph_queue_size"
]

frontend: codeintel_commit_graph_queue_growth_rate

commit graph queue growth rate over 30m (code-intel)

Descriptions:

  • warning frontend: 5+ commit graph queue growth rate over 30m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_commit_graph_queue_growth_rate"
]

frontend: codeintel_commit_graph_updater_99th_percentile_duration

99th percentile successful commit graph updater operation duration over 5m (code-intel)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful commit graph updater operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_commit_graph_updater_99th_percentile_duration"
]

frontend: codeintel_commit_graph_updater_errors

commit graph updater errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ commit graph updater errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_commit_graph_updater_errors"
]

frontend: codeintel_janitor_errors

janitor errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ janitor errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_janitor_errors"
]

frontend: codeintel_background_upload_resets

upload records re-queued (due to unresponsive worker) every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ upload records re-queued (due to unresponsive worker) every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_background_upload_resets"
]

frontend: codeintel_background_upload_reset_failures

upload records errored due to repeated reset every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ upload records errored due to repeated reset every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_background_upload_reset_failures"
]

frontend: codeintel_background_index_resets

index records re-queued (due to unresponsive indexer) every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ index records re-queued (due to unresponsive indexer) every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_background_index_resets"
]

frontend: codeintel_background_index_reset_failures

index records errored due to repeated reset every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ index records errored due to repeated reset every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_background_index_reset_failures"
]

frontend: codeintel_indexing_errors

indexing errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ indexing errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_indexing_errors"
]

frontend: codeintel_autoindex_enqueuer_errors

index enqueuer errors every 5m (code-intel)

Descriptions:

  • warning frontend: 20+ index enqueuer errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_codeintel_autoindex_enqueuer_errors"
]

frontend: internal_indexed_search_error_responses

internal indexed search error responses every 5m (search)

Descriptions:

  • warning frontend: 5%+ internal indexed search error responses every 5m for 15m0s

Possible solutions:

  • Check the Zoekt Web Server dashboard for indications it might be unhealthy.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_internal_indexed_search_error_responses"
]

frontend: internal_unindexed_search_error_responses

internal unindexed search error responses every 5m (search)

Descriptions:

  • warning frontend: 5%+ internal unindexed search error responses every 5m for 15m0s

Possible solutions:

  • Check the Searcher dashboard for indications it might be unhealthy.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_internal_unindexed_search_error_responses"
]

frontend: internal_api_error_responses

internal API error responses every 5m by route (cloud)

Descriptions:

  • warning frontend: 5%+ internal API error responses every 5m by route for 15m0s

Possible solutions:

  • This may not be a substantial issue; check the frontend logs for potential causes.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_internal_api_error_responses"
]

frontend: 99th_percentile_gitserver_duration

99th percentile successful gitserver query duration over 5m (cloud)

Descriptions:

  • warning frontend: 20s+ 99th percentile successful gitserver query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_gitserver_duration"
]

frontend: gitserver_error_responses

gitserver error responses every 5m (cloud)

Descriptions:

  • warning frontend: 5%+ gitserver error responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_gitserver_error_responses"
]

frontend: observability_test_alert_warning

warning test alert metric (distribution)

Descriptions:

  • warning frontend: 1+ warning test alert metric

Possible solutions:

  • This alert is triggered via the triggerObservabilityTestAlert GraphQL endpoint, and will automatically resolve itself.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_observability_test_alert_warning"
]

frontend: observability_test_alert_critical

critical test alert metric (distribution)

Descriptions:

  • critical frontend: 1+ critical test alert metric

Possible solutions:

  • This alert is triggered via the triggerObservabilityTestAlert GraphQL endpoint, and will automatically resolve itself.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_observability_test_alert_critical"
]

frontend: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (cloud)

Descriptions:

  • warning frontend: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_container_cpu_usage"
]

frontend: container_memory_usage

container memory usage by instance (cloud)

Descriptions:

  • warning frontend: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing memory limit in relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of (frontend|sourcegraph-frontend) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_container_memory_usage"
]

frontend: container_restarts

container restarts (cloud)

Descriptions:

  • warning frontend: 1+ container restarts
  • critical frontend: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod (frontend|sourcegraph-frontend) (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p (frontend|sourcegraph-frontend).
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' (frontend|sourcegraph-frontend) (look for "OOMKilled":true; an abridged example of the .State output follows this list) and, if so, consider increasing the memory limit of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs (frontend|sourcegraph-frontend) (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_container_restarts",
  "critical_frontend_container_restarts"
]
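
For the Docker Compose check above, the .State object printed by docker inspect looks roughly like the abridged sketch below (fields trimmed and values illustrative); an out-of-memory kill shows up as "OOMKilled": true, typically alongside exit code 137:

{
  "Status": "exited",
  "ExitCode": 137,
  "OOMKilled": true,
  "Error": "",
  "FinishedAt": "2021-01-01T00:00:00Z"
}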

frontend: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (cloud)

Descriptions:

  • warning frontend: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the (frontend|sourcegraph-frontend) service.
  • Docker Compose: Consider increasing cpus: of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_cpu_usage_long_term"
]

frontend: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (cloud)

Descriptions:

  • warning frontend: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the (frontend|sourcegraph-frontend) service.
  • Docker Compose: Consider increasing memory: of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_memory_usage_long_term"
]

frontend: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (cloud)

Descriptions:

  • warning frontend: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_cpu_usage_short_term"
]

frontend: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (cloud)

Descriptions:

  • warning frontend: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing memory limit in relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of (frontend|sourcegraph-frontend) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_memory_usage_short_term"
]

frontend: go_goroutines

maximum active goroutines (cloud)

Descriptions:

  • warning frontend: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_go_goroutines"
]

frontend: go_gc_duration_seconds

maximum go garbage collection duration (cloud)

Descriptions:

  • warning frontend: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_go_gc_duration_seconds"
]

frontend: pods_available_percentage

percentage pods available (cloud)

Descriptions:

  • critical frontend: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_pods_available_percentage"
]

gitserver: disk_space_remaining

disk space remaining by instance (cloud)

Descriptions:

  • warning gitserver: less than 25% disk space remaining by instance
  • critical gitserver: less than 15% disk space remaining by instance

Possible solutions:

  • Provision more disk space: Sourcegraph will begin deleting least-used repository clones at 10% disk space remaining, which may result in decreased performance, users having to wait for repositories to clone, etc.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_disk_space_remaining",
  "critical_gitserver_disk_space_remaining"
]

gitserver: running_git_commands

running git commands (cloud)

Descriptions:

  • warning gitserver: 50+ running git commands for 2m0s
  • critical gitserver: 100+ running git commands for 5m0s

Possible solutions:

  • Check if the problem may be an intermittent and temporary peak using the "Container monitoring" section at the bottom of the Git Server dashboard.
  • Single-container deployments: Consider upgrading to a Docker Compose deployment, which offers better scalability and resource isolation.
  • Kubernetes and Docker Compose: Check that you are running a similar number of git server replicas and that their CPU/memory limits are allocated according to what is shown in the Sourcegraph resource estimator.
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_running_git_commands",
  "critical_gitserver_running_git_commands"
]

gitserver: repository_clone_queue_size

repository clone queue size (cloud)

Descriptions:

  • warning gitserver: 25+ repository clone queue size

Possible solutions:

  • If you just added several repositories, the warning may be expected.
  • Check which repositories need cloning by visiting e.g. https://sourcegraph.example.com/site-admin/repositories?filter=not-cloned
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_repository_clone_queue_size"
]

gitserver: repository_existence_check_queue_size

repository existence check queue size (cloud)

Descriptions:

  • warning gitserver: 25+ repository existence check queue size

Possible solutions:

  • Check the code host status indicator for errors: on the Sourcegraph app homepage, when signed in as an admin, click the cloud icon in the top right corner of the page.
  • Check if the issue continues to happen after 30 minutes; it may be temporary.
  • Check the gitserver logs for more information.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_repository_existence_check_queue_size"
]

gitserver: frontend_internal_api_error_responses

frontend-internal API error responses every 5m by route (cloud)

Descriptions:

  • warning gitserver: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with repo-updater that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs gitserver for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs gitserver for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_frontend_internal_api_error_responses"
]

gitserver: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (cloud)

Descriptions:

  • warning gitserver: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the gitserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_container_cpu_usage"
]

gitserver: container_memory_usage

container memory usage by instance (cloud)

Descriptions:

  • warning gitserver: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing memory limit in relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of gitserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_container_memory_usage"
]

gitserver: container_restarts

container restarts (cloud)

Descriptions:

  • warning gitserver: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod gitserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p gitserver.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' gitserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the gitserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs gitserver (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_container_restarts"
]

gitserver: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (cloud)

Descriptions:

  • warning gitserver: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the gitserver service.
  • Docker Compose: Consider increasing cpus: of the gitserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_provisioning_container_cpu_usage_long_term"
]

gitserver: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (cloud)

Descriptions:

  • warning gitserver: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the gitserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_provisioning_container_cpu_usage_short_term"
]

gitserver: go_goroutines

maximum active goroutines (cloud)

Descriptions:

  • warning gitserver: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_go_goroutines"
]

gitserver: go_gc_duration_seconds

maximum go garbage collection duration (cloud)

Descriptions:

  • warning gitserver: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_go_gc_duration_seconds"
]

gitserver: pods_available_percentage

percentage pods available (cloud)

Descriptions:

  • critical gitserver: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_gitserver_pods_available_percentage"
]

github-proxy: github_proxy_waiting_requests

number of requests waiting on the global mutex (cloud)

Descriptions:

  • warning github-proxy: 100+ number of requests waiting on the global mutex for 5m0s

Possible solutions:

  • Check github-proxy logs for network connection issues.
  • Check GitHub status.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_github_proxy_waiting_requests"
]

github-proxy: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (cloud)

Descriptions:

  • warning github-proxy: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_container_cpu_usage"
]

github-proxy: container_memory_usage

container memory usage by instance (cloud)

Descriptions:

  • warning github-proxy: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing memory limit in relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_container_memory_usage"
]

github-proxy: container_restarts

container restarts (cloud)

Descriptions:

  • warning github-proxy: 1+ container restarts
  • critical github-proxy: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod github-proxy (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p github-proxy.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' github-proxy (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the github-proxy container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs github-proxy (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_container_restarts",
  "critical_github-proxy_container_restarts"
]

github-proxy: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (cloud)

Descriptions:

  • warning github-proxy: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the github-proxy service.
  • Docker Compose: Consider increasing cpus: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_cpu_usage_long_term"
]

github-proxy: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (cloud)

Descriptions:

  • warning github-proxy: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the github-proxy service.
  • Docker Compose: Consider increasing memory: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_memory_usage_long_term"
]

github-proxy: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (cloud)

Descriptions:

  • warning github-proxy: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_cpu_usage_short_term"
]

github-proxy: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (cloud)

Descriptions:

  • warning github-proxy: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_memory_usage_short_term"
]

github-proxy: go_goroutines

maximum active goroutines (cloud)

Descriptions:

  • warning github-proxy: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_go_goroutines"
]

github-proxy: go_gc_duration_seconds

maximum go garbage collection duration (cloud)

Descriptions:

  • warning github-proxy: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_go_gc_duration_seconds"
]

github-proxy: pods_available_percentage

percentage pods available (cloud)

Descriptions:

  • critical github-proxy: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_github-proxy_pods_available_percentage"
]

postgres: connections

active connections (cloud)

Descriptions:

  • warning postgres: less than 5 active connections for 5m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_postgres_connections"
]

postgres: transaction_durations

maximum transaction durations (cloud)

Descriptions:

  • warning postgres: 300ms+ maximum transaction durations for 5m0s
  • critical postgres: 500ms+ maximum transaction durations for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_postgres_transaction_durations",
  "critical_postgres_transaction_durations"
]

postgres: postgres_up

database availability (cloud)

Descriptions:

  • critical postgres: less than 0 database availability for 5m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_postgres_postgres_up"
]

postgres: pg_exporter_err

errors scraping postgres exporter (cloud)

Descriptions:

  • warning postgres: 1+ errors scraping postgres exporter for 5m0s

Possible solutions:

  • Ensure the Postgres exporter can access the Postgres database. Also, check the Postgres exporter logs for errors.
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_postgres_pg_exporter_err"
]

postgres: migration_in_progress

active schema migration (cloud)

Descriptions:

  • critical postgres: 1+ active schema migration for 5m0s

Possible solutions:

  • The database migration has been in progress for 5 or more minutes. Please contact Sourcegraph if this persists.
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_postgres_migration_in_progress"
]

postgres: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (cloud)

Descriptions:

  • warning postgres: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the (pgsql|codeintel-db) service.
  • Docker Compose: Consider increasing cpus: of the (pgsql|codeintel-db) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_postgres_provisioning_container_cpu_usage_long_term"
]

postgres: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (cloud)

Descriptions:

  • warning postgres: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the (pgsql|codeintel-db) service.
  • Docker Compose: Consider increasing memory: of the (pgsql|codeintel-db) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_postgres_provisioning_container_memory_usage_long_term"
]

postgres: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (cloud)

Descriptions:

  • warning postgres: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the (pgsql|codeintel-db) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_postgres_provisioning_container_cpu_usage_short_term"
]

postgres: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (cloud)

Descriptions:

  • warning postgres: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the (pgsql|codeintel-db) container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_postgres_provisioning_container_memory_usage_short_term"
]

postgres: pods_available_percentage

percentage pods available (cloud)

Descriptions:

  • critical postgres: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_postgres_pods_available_percentage"
]

precise-code-intel-worker: upload_queue_size

queue size (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 100+ queue size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_upload_queue_size"
]

precise-code-intel-worker: upload_queue_growth_rate

queue growth rate over 30m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 5+ queue growth rate over 30m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_upload_queue_growth_rate"
]

precise-code-intel-worker: job_errors

job errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20+ job errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_job_errors"
]

precise-code-intel-worker: codeintel_dbstore_99th_percentile_duration

99th percentile successful database store operation duration over 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20s+ 99th percentile successful database store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_dbstore_99th_percentile_duration"
]

precise-code-intel-worker: codeintel_dbstore_errors

database store errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20+ database store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_dbstore_errors"
]

precise-code-intel-worker: codeintel_workerstore_99th_percentile_duration

99th percentile successful worker store operation duration over 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20s+ 99th percentile successful worker store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_workerstore_99th_percentile_duration"
]

precise-code-intel-worker: codeintel_workerstore_errors

worker store errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20+ worker store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_workerstore_errors"
]

precise-code-intel-worker: codeintel_lsifstore_99th_percentile_duration

99th percentile successful LSIF store operation duration over 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20s+ 99th percentile successful LSIF store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_lsifstore_99th_percentile_duration"
]

precise-code-intel-worker: codeintel_lsifstore_errors

LSIF store errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20+ LSIF store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_lsifstore_errors"
]

precise-code-intel-worker: codeintel_uploadstore_99th_percentile_duration

99th percentile successful upload store operation duration over 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20s+ 99th percentile successful upload store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_uploadstore_99th_percentile_duration"
]

precise-code-intel-worker: codeintel_uploadstore_errors

upload store errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20+ upload store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_uploadstore_errors"
]

precise-code-intel-worker: codeintel_gitserverclient_99th_percentile_duration

99th percentile successful gitserver client operation duration over 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20s+ 99th percentile successful gitserver client operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_gitserverclient_99th_percentile_duration"
]

precise-code-intel-worker: codeintel_gitserverclient_errors

gitserver client errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 20+ gitserver client errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_codeintel_gitserverclient_errors"
]

precise-code-intel-worker: frontend_internal_api_error_responses

frontend-internal API error responses every 5m by route (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with repo-updater that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs precise-code-intel-worker for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs precise-code-intel-worker for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_frontend_internal_api_error_responses"
]

precise-code-intel-worker: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_container_cpu_usage"
]

precise-code-intel-worker: container_memory_usage

container memory usage by instance (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_container_memory_usage"
]

precise-code-intel-worker: container_restarts

container restarts (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 1+ container restarts
  • critical precise-code-intel-worker: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-worker.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-worker container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-worker (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_container_restarts",
  "critical_precise-code-intel-worker_container_restarts"
]

precise-code-intel-worker: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the precise-code-intel-worker service.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_cpu_usage_long_term"
]

precise-code-intel-worker: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the precise-code-intel-worker service.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_memory_usage_long_term"
]

precise-code-intel-worker: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_cpu_usage_short_term"
]

precise-code-intel-worker: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_memory_usage_short_term"
]

precise-code-intel-worker: go_goroutines

maximum active goroutines (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_go_goroutines"
]

precise-code-intel-worker: go_gc_duration_seconds

maximum go garbage collection duration (code-intel)

Descriptions:

  • warning precise-code-intel-worker: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_go_gc_duration_seconds"
]

precise-code-intel-worker: pods_available_percentage

percentage pods available (code-intel)

Descriptions:

  • critical precise-code-intel-worker: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_precise-code-intel-worker_pods_available_percentage"
]

query-runner: frontend_internal_api_error_responses

frontend-internal API error responses every 5m by route (search)

Descriptions:

  • warning query-runner: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with repo-updater that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs query-runner for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs query-runner for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_frontend_internal_api_error_responses"
]

query-runner: container_memory_usage

container memory usage by instance (search)

Descriptions:

  • warning query-runner: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_container_memory_usage"
]

query-runner: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (search)

Descriptions:

  • warning query-runner: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_container_cpu_usage"
]

query-runner: container_restarts

container restarts (search)

Descriptions:

  • warning query-runner: 1+ container restarts
  • critical query-runner: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod query-runner (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p query-runner.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' query-runner (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the query-runner container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs query-runner (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_container_restarts",
  "critical_query-runner_container_restarts"
]

query-runner: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (search)

Descriptions:

  • warning query-runner: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the query-runner service.
  • Docker Compose: Consider increasing cpus: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_cpu_usage_long_term"
]

query-runner: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (search)

Descriptions:

  • warning query-runner: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the query-runner service.
  • Docker Compose: Consider increasing memory: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_memory_usage_long_term"
]

query-runner: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (search)

Descriptions:

  • warning query-runner: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_cpu_usage_short_term"
]

query-runner: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (search)

Descriptions:

  • warning query-runner: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_memory_usage_short_term"
]

query-runner: go_goroutines

maximum active goroutines (search)

Descriptions:

  • warning query-runner: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_go_goroutines"
]

query-runner: go_gc_duration_seconds

maximum go garbage collection duration (search)

Descriptions:

  • warning query-runner: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_go_gc_duration_seconds"
]

query-runner: pods_available_percentage

percentage pods available (search)

Descriptions:

  • critical query-runner: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_query-runner_pods_available_percentage"
]

repo-updater: frontend_internal_api_error_responses

frontend-internal API error responses every 5m by route (cloud)

Descriptions:

  • warning repo-updater: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with repo-updater that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs repo-updater for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs repo-updater for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_frontend_internal_api_error_responses"
]

repo-updater: src_repoupdater_max_sync_backoff

time since oldest sync (cloud)

Descriptions:

  • critical repo-updater: 32400s+ time since oldest sync for 10m0s

Possible solutions:

  • An alert here indicates that no code host connections have synced in at least 9h0m0s. This indicates that there could be a configuration issue with your code host connections or networking issues affecting communication with your code hosts.
  • Check the code host status indicator (cloud icon in top right of Sourcegraph homepage) for errors.
  • Make sure external services do not have invalid tokens by navigating to them in the web UI and clicking save. If there are no errors, they are valid.
  • Check the repo-updater logs for errors about syncing.
  • Confirm that outbound network connections are allowed where repo-updater is deployed.
  • Check back in an hour to see if the issue has resolved itself.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_src_repoupdater_max_sync_backoff"
]

repo-updater: src_repoupdater_syncer_sync_errors_total

sync error rate (cloud)

Descriptions:

  • critical repo-updater: 0+ sync error rate for 10m0s

Possible solutions:

  • An alert here indicates errors syncing repo metadata with code hosts. This indicates that there could be a configuration issue with your code host connections or networking issues affecting communication with your code hosts.
  • Check the code host status indicator (cloud icon in top right of Sourcegraph homepage) for errors.
  • Make sure external services do not have invalid tokens by navigating to them in the web UI and clicking save. If there are no errors, they are valid.
  • Check the repo-updater logs for errors about syncing.
  • Confirm that outbound network connections are allowed where repo-updater is deployed.
  • Check back in an hour to see if the issue has resolved itself.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_src_repoupdater_syncer_sync_errors_total"
]

repo-updater: syncer_sync_start

sync was started (cloud)

Descriptions:

  • warning repo-updater: less than 0 sync was started for 9h0m0s

Possible solutions:

  • Check repo-updater logs for errors.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_syncer_sync_start"
]

repo-updater: syncer_sync_duration

95th repositories sync duration (cloud)

Descriptions:

  • warning repo-updater: 30s+ 95th repositories sync duration for 5m0s

Possible solutions:

  • Check that the network latency between Sourcegraph and the code host is reasonable (<50ms).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_syncer_sync_duration"
]

repo-updater: source_duration

95th repositories source duration (cloud)

Descriptions:

  • warning repo-updater: 30s+ 95th repositories source duration for 5m0s

Possible solutions:

  • Check that the network latency between Sourcegraph and the code host is reasonable (<50ms).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_source_duration"
]

repo-updater: syncer_synced_repos

repositories synced (cloud)

Descriptions:

  • warning repo-updater: less than 0 repositories synced for 9h0m0s

Possible solutions:

  • Check network connectivity to code hosts
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_syncer_synced_repos"
]

repo-updater: sourced_repos

repositories sourced (cloud)

Descriptions:

  • warning repo-updater: less than 0 repositories sourced for 9h0m0s

Possible solutions:

  • Check network connectivity to code hosts
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_sourced_repos"
]

repo-updater: user_added_repos

total number of user added repos (cloud)

Descriptions:

  • critical repo-updater: 180000+ total number of user added repos for 5m0s

Possible solutions:

  • Check for unusual spikes in user-added repos. Each user is only allowed to add 2000 repositories.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_user_added_repos"
]

repo-updater: purge_failed

repositories purge failed (cloud)

Descriptions:

  • warning repo-updater: 0+ repositories purge failed for 5m0s

Possible solutions:

  • Check repo-updater's connectivity with gitserver, and check the gitserver logs.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_purge_failed"
]

repo-updater: sched_auto_fetch

repositories scheduled due to hitting a deadline (cloud)

Descriptions:

  • warning repo-updater: less than 0 repositories scheduled due to hitting a deadline for 9h0m0s

Possible solutions:

  • Check repo-updater logs. This is expected to fire if there are no user added code hosts
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_sched_auto_fetch"
]

repo-updater: sched_known_repos

repositories managed by the scheduler (cloud)

Descriptions:

  • warning repo-updater: less than 0 repositories managed by the scheduler for 10m0s

Possible solutions:

  • Check repo-updater logs. This is expected to fire if there are no user added code hosts
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_sched_known_repos"
]

repo-updater: sched_update_queue_length

rate of growth of update queue length over 5 minutes (cloud)

Descriptions:

  • critical repo-updater: 0+ rate of growth of update queue length over 5 minutes for 2h0m0s

Possible solutions:

  • Check repo-updater logs for indications that the queue is not being processed. The queue length should trend downwards over time as items are sent to gitserver.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_sched_update_queue_length"
]

repo-updater: sched_loops

scheduler loops (cloud)

Descriptions:

  • warning repo-updater: less than 0 scheduler loops for 9h0m0s

Possible solutions:

  • Check repo-updater logs for errors. This is expected to fire if there are no user added code hosts
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_sched_loops"
]

repo-updater: sched_error

repositories schedule error rate (cloud)

Descriptions:

  • critical repo-updater: 1+ repositories schedule error rate for 1m0s

Possible solutions:

  • Check repo-updater logs for errors
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_sched_error"
]

repo-updater: perms_syncer_perms

time gap between least and most up to date permissions (cloud)

Descriptions:

  • warning repo-updater: 259200s+ time gap between least and most up to date permissions for 5m0s

Possible solutions:

  • Increase the API rate limit to GitHub, GitLab, or Bitbucket Server (a related Sourcegraph-side setting is sketched below).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_perms"
]
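
If the gap is driven by Sourcegraph's own client-side rate limiting rather than the quota granted by the code host, the code host connection configuration also accepts a rateLimit setting on recent Sourcegraph versions; confirm the field against the code host configuration reference for your version. An illustrative fragment with an arbitrary number:

"rateLimit": {
  "enabled": true,
  "requestsPerHour": 10000
}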

repo-updater: perms_syncer_stale_perms

number of entities with stale permissions (cloud)

Descriptions:

  • warning repo-updater: 100+ number of entities with stale permissions for 5m0s

Possible solutions:

  • Increase the API rate limit to GitHub, GitLab or Bitbucket Server.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_stale_perms"
]

repo-updater: perms_syncer_no_perms

number of entities with no permissions (cloud)

Descriptions:

  • warning repo-updater: 100+ number of entities with no permissions for 5m0s

Possible solutions:

  • Enabled permissions for the first time: Wait a few minutes and see if the number goes down.
  • Otherwise: Increase the API rate limit to GitHub, GitLab or Bitbucket Server.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_no_perms"
]

repo-updater: perms_syncer_sync_duration

95th permissions sync duration (cloud)

Descriptions:

  • warning repo-updater: 30s+ 95th permissions sync duration for 5m0s

Possible solutions:

  • Check that the network latency between Sourcegraph and the code host is reasonable (<50ms).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_sync_duration"
]

repo-updater: perms_syncer_queue_size

permissions sync queued items (cloud)

Descriptions:

  • warning repo-updater: 100+ permissions sync queued items for 5m0s

Possible solutions:

  • Enabled permissions for the first time: Wait a few minutes and see if the number goes down.
  • Otherwise: Increase the API rate limit to GitHub, GitLab or Bitbucket Server.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_queue_size"
]

repo-updater: perms_syncer_sync_errors

permissions sync error rate (cloud)

Descriptions:

  • critical repo-updater: 1+ permissions sync error rate for 1m0s

Possible solutions:

  • Check the network connectivity between Sourcegraph and the code host.
  • Check if API rate limit quota is exhausted on the code host.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_perms_syncer_sync_errors"
]

repo-updater: src_repoupdater_external_services_total

the total number of external services (cloud)

Descriptions:

  • critical repo-updater: 20000+ the total number of external services for 1h0m0s

Possible solutions:

  • Check for spikes in external services; this could indicate abuse.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_src_repoupdater_external_services_total"
]

repo-updater: src_repoupdater_user_external_services_total

the total number of user added external services (cloud)

Descriptions:

  • warning repo-updater: 20000+ the total number of user added external services for 1h0m0s

Possible solutions:

  • Check for spikes in external services; this could indicate abuse.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_src_repoupdater_user_external_services_total"
]

repo-updater: repoupdater_queued_sync_jobs_total

the total number of queued sync jobs (cloud)

Descriptions:

  • warning repo-updater: 100+ the total number of queued sync jobs for 1h0m0s

Possible solutions:

  • Check if jobs are failing to sync: SELECT * FROM external_service_sync_jobs WHERE state = 'errored';
  • Increase the number of workers using the repoConcurrentExternalServiceSyncers site config (see the example below).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_repoupdater_queued_sync_jobs_total"
]
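
As an example, the site configuration entry for raising syncer concurrency could look like the following; the value 6 is illustrative, so choose a number your database and code hosts can sustain.

"repoConcurrentExternalServiceSyncers": 6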

repo-updater: repoupdater_completed_sync_jobs_total

the total number of completed sync jobs (cloud)

Descriptions:

  • warning repo-updater: 100000+ the total number of completed sync jobs for 1h0m0s

Possible solutions:

  • Check repo-updater logs. Jobs older than 1 day should have been removed.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_repoupdater_completed_sync_jobs_total"
]

repo-updater: repoupdater_errored_sync_jobs_total

the total number of errored sync jobs (cloud)

Descriptions:

  • warning repo-updater: 100+ the total number of errored sync jobs for 1h0m0s

Possible solutions:

  • Check repo-updater logs. Check code host connectivity
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_repoupdater_errored_sync_jobs_total"
]

repo-updater: github_graphql_rate_limit_remaining

remaining calls to GitHub graphql API before hitting the rate limit (cloud)

Descriptions:

  • critical repo-updater: less than 250 remaining calls to GitHub graphql API before hitting the rate limit

Possible solutions:

  • Try restarting the pod to get a different public IP.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_github_graphql_rate_limit_remaining"
]

repo-updater: github_rest_rate_limit_remaining

remaining calls to GitHub rest API before hitting the rate limit (cloud)

Descriptions:

  • critical repo-updater: less than 250 remaining calls to GitHub rest API before hitting the rate limit

Possible solutions:

  • Try restarting the pod to get a different public IP.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_github_rest_rate_limit_remaining"
]

repo-updater: github_search_rate_limit_remaining

remaining calls to GitHub search API before hitting the rate limit (cloud)

Descriptions:

  • critical repo-updater: less than 5 remaining calls to GitHub search API before hitting the rate limit

Possible solutions:

  • Try restarting the pod to get a different public IP.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_github_search_rate_limit_remaining"
]

repo-updater: gitlab_rest_rate_limit_remaining

remaining calls to GitLab rest API before hitting the rate limit (cloud)

Descriptions:

  • critical repo-updater: less than 30 remaining calls to GitLab rest API before hitting the rate limit

Possible solutions:

  • Try restarting the pod to get a different public IP.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_gitlab_rest_rate_limit_remaining"
]

repo-updater: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (cloud)

Descriptions:

  • warning repo-updater: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_container_cpu_usage"
]

repo-updater: container_memory_usage

container memory usage by instance (cloud)

Descriptions:

  • critical repo-updater: 90%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_container_memory_usage"
]

repo-updater: container_restarts

container restarts (cloud)

Descriptions:

  • warning repo-updater: 1+ container restarts
  • critical repo-updater: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod repo-updater (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p repo-updater.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' repo-updater (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the repo-updater container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs repo-updater (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_container_restarts",
  "critical_repo-updater_container_restarts"
]

repo-updater: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (cloud)

Descriptions:

  • warning repo-updater: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the repo-updater service.
  • Docker Compose: Consider increasing cpus: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_cpu_usage_long_term"
]

repo-updater: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (cloud)

Descriptions:

  • warning repo-updater: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the repo-updater service.
  • Docker Compose: Consider increasing memory: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_memory_usage_long_term"
]

repo-updater: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (cloud)

Descriptions:

  • warning repo-updater: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_cpu_usage_short_term"
]

repo-updater: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (cloud)

Descriptions:

  • warning repo-updater: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_memory_usage_short_term"
]

repo-updater: go_goroutines

maximum active goroutines (cloud)

Descriptions:

  • warning repo-updater: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric. A command sketch for capturing a goroutine dump follows this entry.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_go_goroutines"
]
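
If the goroutine count stays elevated, a goroutine dump usually shows where goroutines are blocked. A minimal sketch assuming the service exposes the standard net/http/pprof endpoints on debug port 6060 (an assumption — adjust the port and resource name to your deployment):
# Forward the debug port in one terminal (this command stays in the foreground)...
kubectl port-forward deployment/repo-updater 6060:6060
# ...then capture a full goroutine dump from the standard pprof endpoint in another terminal.
curl -sS 'http://localhost:6060/debug/pprof/goroutine?debug=2' > repo-updater-goroutines.txt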

repo-updater: go_gc_duration_seconds

maximum go garbage collection duration (cloud)

Descriptions:

  • warning repo-updater: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_go_gc_duration_seconds"
]

repo-updater: pods_available_percentage

percentage pods available (cloud)

Descriptions:

  • critical repo-updater: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_pods_available_percentage"
]

searcher: unindexed_search_request_errors

unindexed search request errors every 5m by code (search)

Descriptions:

  • warning searcher: 5%+ unindexed search request errors every 5m by code for 5m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_unindexed_search_request_errors"
]

searcher: replica_traffic

requests per second over 10m (search)

Descriptions:

  • warning searcher: 5+ requests per second over 10m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_replica_traffic"
]

searcher: frontend_internal_api_error_responses

frontend-internal API error responses every 5m by route (search)

Descriptions:

  • warning searcher: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with searcher that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs searcher for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs searcher for logs indicating request failures to frontend or frontend-internal (a command sketch of these checks follows this entry).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_frontend_internal_api_error_responses"
]
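
For reference, the checks above can be run as follows (a minimal sketch; the grep patterns are illustrative and may need adjusting to your log format):
# Kubernetes: confirm the frontend pods are Running, then scan searcher's logs for failed calls.
kubectl get pods | grep frontend
kubectl logs deployment/searcher | grep -iE 'frontend(-internal)?.*(error|failed)'

# Docker Compose: the same checks against the containers.
docker ps | grep frontend
docker logs searcher 2>&1 | grep -iE 'frontend(-internal)?.*(error|failed)'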

searcher: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (search)

Descriptions:

  • warning searcher: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_container_cpu_usage"
]

searcher: container_memory_usage

container memory usage by instance (search)

Descriptions:

  • warning searcher: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_container_memory_usage"
]

searcher: container_restarts

container restarts (search)

Descriptions:

  • warning searcher: 1+ container restarts
  • critical searcher: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod searcher (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p searcher.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' searcher (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the searcher container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs searcher (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_container_restarts",
  "critical_searcher_container_restarts"
]

searcher: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (search)

Descriptions:

  • warning searcher: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the searcher service.
  • Docker Compose: Consider increasing cpus: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_cpu_usage_long_term"
]

searcher: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (search)

Descriptions:

  • warning searcher: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the searcher service.
  • Docker Compose: Consider increasing memory: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_memory_usage_long_term"
]

searcher: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (search)

Descriptions:

  • warning searcher: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_cpu_usage_short_term"
]

searcher: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (search)

Descriptions:

  • warning searcher: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_memory_usage_short_term"
]

searcher: go_goroutines

maximum active goroutines (search)

Descriptions:

  • warning searcher: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_go_goroutines"
]

searcher: go_gc_duration_seconds

maximum go garbage collection duration (search)

Descriptions:

  • warning searcher: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_go_gc_duration_seconds"
]

searcher: pods_available_percentage

percentage pods available (search)

Descriptions:

  • critical searcher: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_searcher_pods_available_percentage"
]

symbols: store_fetch_failures

store fetch failures every 5m (code-intel)

Descriptions:

  • warning symbols: 5+ store fetch failures every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_store_fetch_failures"
]

symbols: current_fetch_queue_size

current fetch queue size (code-intel)

Descriptions:

  • warning symbols: 25+ current fetch queue size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_current_fetch_queue_size"
]

symbols: frontend_internal_api_error_responses

frontend-internal API error responses every 5m by route (code-intel)

Descriptions:

  • warning symbols: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with symbols that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs symbols for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs symbols for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_frontend_internal_api_error_responses"
]

symbols: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (code-intel)

Descriptions:

  • warning symbols: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_container_cpu_usage"
]

symbols: container_memory_usage

container memory usage by instance (code-intel)

Descriptions:

  • warning symbols: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_container_memory_usage"
]

symbols: container_restarts

container restarts (code-intel)

Descriptions:

  • warning symbols: 1+ container restarts
  • critical symbols: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod symbols (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p symbols.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' symbols (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the symbols container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs symbols (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_container_restarts",
  "critical_symbols_container_restarts"
]

symbols: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (code-intel)

Descriptions:

  • warning symbols: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the symbols service.
  • Docker Compose: Consider increasing cpus: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_cpu_usage_long_term"
]

symbols: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (code-intel)

Descriptions:

  • warning symbols: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the symbols service.
  • Docker Compose: Consider increasing memory: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_memory_usage_long_term"
]

symbols: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (code-intel)

Descriptions:

  • warning symbols: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_cpu_usage_short_term"
]

symbols: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (code-intel)

Descriptions:

  • warning symbols: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_memory_usage_short_term"
]

symbols: go_goroutines

maximum active goroutines (code-intel)

Descriptions:

  • warning symbols: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_go_goroutines"
]

symbols: go_gc_duration_seconds

maximum go garbage collection duration (code-intel)

Descriptions:

  • warning symbols: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_go_gc_duration_seconds"
]

symbols: pods_available_percentage

percentage pods available (code-intel)

Descriptions:

  • critical symbols: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_symbols_pods_available_percentage"
]

syntect-server: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (cloud)

Descriptions:

  • warning syntect-server: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_container_cpu_usage"
]

syntect-server: container_memory_usage

container memory usage by instance (cloud)

Descriptions:

  • warning syntect-server: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_container_memory_usage"
]

syntect-server: container_restarts

container restarts (cloud)

Descriptions:

  • warning syntect-server: 1+ container restarts
  • critical syntect-server: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod syntect-server (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p syntect-server.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' syntect-server (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the syntect-server container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs syntect-server (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_container_restarts",
  "critical_syntect-server_container_restarts"
]

syntect-server: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (cloud)

Descriptions:

  • warning syntect-server: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the syntect-server service.
  • Docker Compose: Consider increasing cpus: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_cpu_usage_long_term"
]

syntect-server: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (cloud)

Descriptions:

  • warning syntect-server: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the syntect-server service.
  • Docker Compose: Consider increasing memory: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_memory_usage_long_term"
]

syntect-server: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (cloud)

Descriptions:

  • warning syntect-server: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_cpu_usage_short_term"
]

syntect-server: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (cloud)

Descriptions:

  • warning syntect-server: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_memory_usage_short_term"
]

syntect-server: pods_available_percentage

percentage pods available (cloud)

Descriptions:

  • critical syntect-server: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_syntect-server_pods_available_percentage"
]

zoekt-indexserver: average_resolve_revision_duration

average resolve revision duration over 5m (search)

Descriptions:

  • warning zoekt-indexserver: 15s+ average resolve revision duration over 5m
  • critical zoekt-indexserver: 30s+ average resolve revision duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_average_resolve_revision_duration",
  "critical_zoekt-indexserver_average_resolve_revision_duration"
]

zoekt-indexserver: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (search)

Descriptions:

  • warning zoekt-indexserver: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_container_cpu_usage"
]

zoekt-indexserver: container_memory_usage

container memory usage by instance (search)

Descriptions:

  • warning zoekt-indexserver: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_container_memory_usage"
]

zoekt-indexserver: container_restarts

container restarts (search)

Descriptions:

  • warning zoekt-indexserver: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-indexserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-indexserver.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' zoekt-indexserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-indexserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-indexserver (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_container_restarts"
]

zoekt-indexserver: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (search)

Descriptions:

  • warning zoekt-indexserver: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the zoekt-indexserver service.
  • Docker Compose: Consider increasing cpus: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_cpu_usage_long_term"
]

zoekt-indexserver: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (search)

Descriptions:

  • warning zoekt-indexserver: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the zoekt-indexserver service.
  • Docker Compose: Consider increasing memory: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_memory_usage_long_term"
]

zoekt-indexserver: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (search)

Descriptions:

  • warning zoekt-indexserver: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_cpu_usage_short_term"
]

zoekt-indexserver: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (search)

Descriptions:

  • warning zoekt-indexserver: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_memory_usage_short_term"
]

zoekt-indexserver: pods_available_percentage

percentage pods available (search)

Descriptions:

  • critical zoekt-indexserver: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_zoekt-indexserver_pods_available_percentage"
]

zoekt-webserver: indexed_search_request_errors

indexed search request errors every 5m by code (search)

Descriptions:

  • warning zoekt-webserver: 5%+ indexed search request errors every 5m by code for 5m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_indexed_search_request_errors"
]

zoekt-webserver: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (search)

Descriptions:

  • warning zoekt-webserver: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_container_cpu_usage"
]

zoekt-webserver: container_memory_usage

container memory usage by instance (search)

Descriptions:

  • warning zoekt-webserver: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_container_memory_usage"
]

zoekt-webserver: container_restarts

container restarts (search)

Descriptions:

  • warning zoekt-webserver: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-webserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-webserver.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' zoekt-webserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-webserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-webserver (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_container_restarts"
]

zoekt-webserver: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (search)

Descriptions:

  • warning zoekt-webserver: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the zoekt-webserver service.
  • Docker Compose: Consider increasing cpus: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_cpu_usage_long_term"
]

zoekt-webserver: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (search)

Descriptions:

  • warning zoekt-webserver: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the zoekt-webserver service.
  • Docker Compose: Consider increasing memory: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_memory_usage_long_term"
]

zoekt-webserver: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (search)

Descriptions:

  • warning zoekt-webserver: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_cpu_usage_short_term"
]

zoekt-webserver: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (search)

Descriptions:

  • warning zoekt-webserver: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_memory_usage_short_term"
]

prometheus: prometheus_rule_group_evaluation

average prometheus rule group evaluation duration over 10m (distribution)

Descriptions:

  • warning prometheus: 30s+ average prometheus rule group evaluation duration over 10m

Possible solutions:

  • Try increasing resources for Prometheus.
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_prometheus_rule_group_evaluation"
]

prometheus: alertmanager_notifications_failed_total

failed alertmanager notifications over 1m (distribution)

Descriptions:

  • warning prometheus: 0+ failed alertmanager notifications over 1m

Possible solutions:

  • Ensure that your observability.alerts configuration (in site configuration) is valid (an illustrative example of the expected shape follows this entry).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_alertmanager_notifications_failed_total"
]
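
When checking that configuration, it can help to compare against a known-good shape. The snippet below is an illustrative sketch of a single Slack notifier entry only; treat the URL as a placeholder and consult the alerting guide for the authoritative schema:
"observability.alerts": [
  {
    "level": "critical",
    "notifier": {
      "type": "slack",
      "url": "https://hooks.slack.com/services/..."
    }
  }
]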

prometheus: alertmanager_config_status

alertmanager configuration reload status (distribution)

Descriptions:

  • warning prometheus: less than 1 alertmanager configuration reload status

Possible solutions:

  • Ensure that your observability.alerts configuration (in site configuration) is valid.
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_alertmanager_config_status"
]

prometheus: prometheus_tsdb_op_failure

prometheus tsdb failures by operation over 1m (distribution)

Descriptions:

  • warning prometheus: 0+ prometheus tsdb failures by operation over 1m

Possible solutions:

  • Check Prometheus logs for messages related to the failing operation.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_prometheus_tsdb_op_failure"
]

prometheus: prometheus_config_status

prometheus configuration reload status (distribution)

Descriptions:

  • warning prometheus: less than 1 prometheus configuration reload status

Possible solutions:

  • Check Prometheus logs for messages related to configuration loading.
  • Ensure any custom configuration you have provided Prometheus is valid.
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_prometheus_config_status"
]

prometheus: prometheus_target_sample_exceeded

prometheus scrapes that exceed the sample limit over 10m (distribution)

Descriptions:

  • warning prometheus: 0+ prometheus scrapes that exceed the sample limit over 10m

Possible solutions:

  • Check Prometheus logs for messages related to target scrape failures.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_prometheus_target_sample_exceeded"
]

prometheus: prometheus_target_sample_duplicate

prometheus scrapes rejected due to duplicate timestamps over 10m (distribution)

Descriptions:

  • warning prometheus: 0+ prometheus scrapes rejected due to duplicate timestamps over 10m

Possible solutions:

  • Check Prometheus logs for messages related to target scrape failures.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_prometheus_target_sample_duplicate"
]

prometheus: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (distribution)

Descriptions:

  • warning prometheus: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_container_cpu_usage"
]

prometheus: container_memory_usage

container memory usage by instance (distribution)

Descriptions:

  • warning prometheus: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_container_memory_usage"
]

prometheus: container_restarts

container restarts (distribution)

Descriptions:

  • warning prometheus: 1+ container restarts
  • critical prometheus: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod prometheus (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p prometheus.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' prometheus (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the prometheus container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs prometheus (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_container_restarts",
  "critical_prometheus_container_restarts"
]

prometheus: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (distribution)

Descriptions:

  • warning prometheus: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the prometheus service.
  • Docker Compose: Consider increasing cpus: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_cpu_usage_long_term"
]

prometheus: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (distribution)

Descriptions:

  • warning prometheus: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the prometheus service.
  • Docker Compose: Consider increasing memory: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_memory_usage_long_term"
]

prometheus: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (distribution)

Descriptions:

  • warning prometheus: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_cpu_usage_short_term"
]

prometheus: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (distribution)

Descriptions:

  • warning prometheus: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_memory_usage_short_term"
]

prometheus: pods_available_percentage

percentage pods available (distribution)

Descriptions:

  • critical prometheus: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_prometheus_pods_available_percentage"
]

executor-queue: codeintel_queue_size

queue size (code-intel)

Descriptions:

  • warning executor-queue: 100+ queue size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_codeintel_queue_size"
]

executor-queue: codeintel_queue_growth_rate

queue growth rate over 30m (code-intel)

Descriptions:

  • warning executor-queue: 5+ queue growth rate over 30m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_codeintel_queue_growth_rate"
]

executor-queue: codeintel_job_errors

job errors every 5m (code-intel)

Descriptions:

  • warning executor-queue: 20+ job errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_codeintel_job_errors"
]

executor-queue: codeintel_workerstore_99th_percentile_duration

99th percentile successful worker store operation duration over 5m (code-intel)

Descriptions:

  • warning executor-queue: 20s+ 99th percentile successful worker store operation duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_codeintel_workerstore_99th_percentile_duration"
]

executor-queue: codeintel_workerstore_errors

worker store errors every 5m (code-intel)

Descriptions:

  • warning executor-queue: 20+ worker store errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_codeintel_workerstore_errors"
]

executor-queue: frontend_internal_api_error_responses

frontend-internal API error responses every 5m by route (code-intel)

Descriptions:

  • warning executor-queue: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with executor-queue that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs executor-queue for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs executor-queue for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_frontend_internal_api_error_responses"
]

executor-queue: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (code-intel)

Descriptions:

  • warning executor-queue: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the executor-queue container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_container_cpu_usage"
]

executor-queue: container_memory_usage

container memory usage by instance (code-intel)

Descriptions:

  • warning executor-queue: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the executor-queue container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_container_memory_usage"
]

executor-queue: container_restarts

container restarts (code-intel)

Descriptions:

  • warning executor-queue: 1+ container restarts
  • critical executor-queue: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod executor-queue (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p executor-queue.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' executor-queue (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the executor-queue container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs executor-queue (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_container_restarts",
  "critical_executor-queue_container_restarts"
]

executor-queue: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (code-intel)

Descriptions:

  • warning executor-queue: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the executor-queue service.
  • Docker Compose: Consider increasing cpus: of the executor-queue container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_provisioning_container_cpu_usage_long_term"
]
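
The CPU side of the same Deployment.yaml change looks roughly like the following sketch; the values are placeholders, so tune them to the sustained usage you see on the dashboard:

# Fragment of the executor-queue container spec (illustrative values).
resources:
  requests:
    cpu: "1"   # what the scheduler reserves for the container
  limits:
    cpu: "2"   # ceiling; raise it if 90th-percentile daily usage stays above ~80%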

executor-queue: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (code-intel)

Descriptions:

  • warning executor-queue: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the executor-queue service.
  • Docker Compose: Consider increasing memory: of the executor-queue container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_provisioning_container_memory_usage_long_term"
]

executor-queue: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (code-intel)

Descriptions:

  • warning executor-queue: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the executor-queue container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_provisioning_container_cpu_usage_short_term"
]

executor-queue: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (code-intel)

Descriptions:

  • warning executor-queue: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the executor-queue container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_provisioning_container_memory_usage_short_term"
]

executor-queue: go_goroutines

maximum active goroutines (code-intel)

Descriptions:

  • warning executor-queue: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_go_goroutines"
]

executor-queue: go_gc_duration_seconds

maximum go garbage collection duration (code-intel)

Descriptions:

  • warning executor-queue: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_executor-queue_go_gc_duration_seconds"
]

executor-queue: pods_available_percentage

percentage pods available (code-intel)

Descriptions:

  • critical executor-queue: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_executor-queue_pods_available_percentage"
]

precise-code-intel-indexer: codeintel_job_errors

job errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 20+ job errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_codeintel_job_errors"
]

precise-code-intel-indexer: executor_apiclient_99th_percentile_duration

99th percentile successful API request duration over 5m (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 20s+ 99th percentile successful API request duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_executor_apiclient_99th_percentile_duration"
]

precise-code-intel-indexer: executor_apiclient_errors

API errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 20+ API errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_executor_apiclient_errors"
]

precise-code-intel-indexer: executor_setup_command_errors

setup command errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 20+ setup command errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_executor_setup_command_errors"
]

precise-code-intel-indexer: executor_exec_command_errors

exec command errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 20+ exec command errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_executor_exec_command_errors"
]

precise-code-intel-indexer: executor_teardown_command_errors

teardown command errors every 5m (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 20+ teardown command errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_executor_teardown_command_errors"
]

precise-code-intel-indexer: container_cpu_usage

container cpu usage total (1m average) across all cores by instance (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_container_cpu_usage"
]

precise-code-intel-indexer: container_memory_usage

container memory usage by instance (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml (see the sketch below).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_container_memory_usage"
]
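
For the Docker Compose option above, the change is a single resource limit on the service entry. A sketch, assuming a Compose file that expresses limits under deploy.resources (some Compose files use a service-level mem_limit: key instead); the 4g value is a placeholder:

services:
  precise-code-intel-worker:
    deploy:
      resources:
        limits:
          memory: 4g  # illustrative ceiling; raise it if usage regularly sits near 99%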

precise-code-intel-indexer: container_restarts

container restarts (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 1+ container restarts
  • critical precise-code-intel-indexer: 1+ container restarts for 10m0s

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-worker.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-worker container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-worker (note this will include logs from the previous and currently running container).
  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_container_restarts",
  "critical_precise-code-intel-indexer_container_restarts"
]

precise-code-intel-indexer: provisioning_container_cpu_usage_long_term

container cpu usage total (90th percentile over 1d) across all cores by instance (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 80%+ container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the precise-code-intel-worker service.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_cpu_usage_long_term"
]

precise-code-intel-indexer: provisioning_container_memory_usage_long_term

container memory usage (1d maximum) by instance (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 80%+ container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the precise-code-intel-worker service.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_memory_usage_long_term"
]

precise-code-intel-indexer: provisioning_container_cpu_usage_short_term

container cpu usage total (5m maximum) across all cores by instance (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_cpu_usage_short_term"
]

precise-code-intel-indexer: provisioning_container_memory_usage_short_term

container memory usage (5m maximum) by instance (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_memory_usage_short_term"
]

precise-code-intel-indexer: go_goroutines

maximum active goroutines (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Refer to the dashboards reference for more help interpreting this alert and metric.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_go_goroutines"
]

precise-code-intel-indexer: go_gc_duration_seconds

maximum go garbage collection duration (code-intel)

Descriptions:

  • warning precise-code-intel-indexer: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_go_gc_duration_seconds"
]

precise-code-intel-indexer: pods_available_percentage

percentage pods available (code-intel)

Descriptions:

  • critical precise-code-intel-indexer: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_precise-code-intel-indexer_pods_available_percentage"
]