Alert solutions

This document contains possible solutions for when you find alerts are firing in Sourcegraph's monitoring. If your alert isn't mentioned here, or if the solution doesn't help, contact us for assistance.

To learn more about Sourcegraph's alerting, see our alerting documentation.

frontend: 99th_percentile_search_request_duration

Descriptions:

  • frontend: 20s+ 99th percentile successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 20, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details (a command sketch for this and the Zoekt CPU check below follows this list).
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_search_request_duration"
]
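
As a reference for the log and CPU checks above, a minimal command sketch. The deployment name sourcegraph-frontend, the label app=indexed-search, and the Docker Compose container name zoekt-webserver-0 are assumptions based on the standard deployment manifests; adjust them to match your environment.

# Look for slow-search warnings once "observability.logSlowSearches" is set
kubectl logs deployment/sourcegraph-frontend | grep "slow search request"

# Kubernetes: per-container CPU usage of zoekt-webserver in the indexed-search pods
kubectl top pods -l app=indexed-search --containers

# Docker Compose: live CPU usage of the zoekt-webserver container
docker stats --no-stream zoekt-webserver-0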

frontend: 90th_percentile_search_request_duration

Descriptions:

  • frontend: 15s+ 90th percentile successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 15, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_90th_percentile_search_request_duration"
]

frontend: hard_timeout_search_responses

Descriptions:

  • frontend: 2%+ hard timeout search responses every 5m for 15m0s

  • frontend: 5%+ hard timeout search responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_timeout_search_responses",
  "critical_frontend_hard_timeout_search_responses"
]

frontend: hard_error_search_responses

Descriptions:

  • frontend: 2%+ hard error search responses every 5m for 15m0s

  • frontend: 5%+ hard error search responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_error_search_responses",
  "critical_frontend_hard_error_search_responses"
]

frontend: partial_timeout_search_responses

Descriptions:

  • frontend: 5%+ partial timeout search responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_partial_timeout_search_responses"
]

frontend: search_alert_user_suggestions

Descriptions:

  • frontend: 5%+ search alert user suggestions shown every 5m for 15m0s

Possible solutions:

  • This indicates your users are making syntax errors or similar user errors.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_search_alert_user_suggestions"
]

frontend: page_load_latency

Descriptions:

  • frontend: 2s+ 90th percentile page load latency over all routes over 10m

Possible solutions:

  • Confirm that the Sourcegraph frontend has enough CPU/memory using the provisioning panels (or the command sketch after this list).
  • Trace a request to see what the slowest part is: https://docs.sourcegraph.com/admin/observability/tracing
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_page_load_latency"
]
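
As a quick check of the provisioning bullet above, a sketch of commands for inspecting frontend CPU/memory directly. The label app=sourcegraph-frontend and the container name sourcegraph-frontend-0 are assumptions from the standard manifests; verify them against your deployment.

# Kubernetes: per-container CPU/memory of the frontend pods
kubectl top pods -l app=sourcegraph-frontend --containers

# Docker Compose: live resource usage of the frontend container
docker stats --no-stream sourcegraph-frontend-0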

frontend: blob_load_latency

Descriptions:

  • frontend: 5s+ 90th percentile blob load latency over 10m

Possible solutions:

  • Confirm that the Sourcegraph frontend has enough CPU/memory using the provisioning panels.
  • Trace a request to see what the slowest part is: https://docs.sourcegraph.com/admin/observability/tracing
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_blob_load_latency"
]

frontend: 99th_percentile_search_codeintel_request_duration

Descriptions:

  • frontend: 20s+ 99th percentile code-intel successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 20, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_search_codeintel_request_duration"
]

frontend: 90th_percentile_search_codeintel_request_duration

Descriptions:

  • frontend: 15s+ 90th percentile code-intel successful search request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 15, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_90th_percentile_search_codeintel_request_duration"
]

frontend: hard_timeout_search_codeintel_responses

Descriptions:

  • frontend: 2%+ hard timeout search code-intel responses every 5m for 15m0s

  • frontend: 5%+ hard timeout search code-intel responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_timeout_search_codeintel_responses",
  "critical_frontend_hard_timeout_search_codeintel_responses"
]

frontend: hard_error_search_codeintel_responses

Descriptions:

  • frontend: 2%+ hard error search code-intel responses every 5m for 15m0s

  • frontend: 5%+ hard error search code-intel responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_error_search_codeintel_responses",
  "critical_frontend_hard_error_search_codeintel_responses"
]

frontend: partial_timeout_search_codeintel_responses

Descriptions:

  • frontend: 5%+ partial timeout search code-intel responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_partial_timeout_search_codeintel_responses"
]

frontend: search_codeintel_alert_user_suggestions

Descriptions:

  • frontend: 5%+ search code-intel alert user suggestions shown every 5m for 15m0s

Possible solutions:

  • This indicates a bug in Sourcegraph, please open an issue.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_search_codeintel_alert_user_suggestions"
]

frontend: 99th_percentile_search_api_request_duration

Descriptions:

  • frontend: 50s+ 99th percentile successful search API request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 20, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • If your users are requesting many results with a large count: parameter, consider using our search pagination API.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_search_api_request_duration"
]

frontend: 90th_percentile_search_api_request_duration

Descriptions:

  • frontend: 40s+ 90th percentile successful search API request duration over 5m

Possible solutions:

  • Get details on the exact queries that are slow by configuring "observability.logSlowSearches": 15, in the site configuration and looking for frontend warning logs prefixed with slow search request for additional details.
  • If your users are requesting many results with a large count: parameter, consider using our search pagination API.
  • Check that most repositories are indexed by visiting https://sourcegraph.example.com/site-admin/repositories?filter=needs-index (it should show few or no results.)
  • Kubernetes: Check CPU usage of zoekt-webserver in the indexed-search pod, consider increasing CPU limits in the indexed-search.Deployment.yaml if regularly hitting max CPU utilization.
  • Docker Compose: Check CPU usage on the Zoekt Web Server dashboard, consider increasing cpus: of the zoekt-webserver container in docker-compose.yml if regularly hitting max CPU utilization.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_90th_percentile_search_api_request_duration"
]

frontend: hard_timeout_search_api_responses

Descriptions:

  • frontend: 2%+ hard timeout search API responses every 5m for 15m0s

  • frontend: 5%+ hard timeout search API responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_timeout_search_api_responses",
  "critical_frontend_hard_timeout_search_api_responses"
]

frontend: hard_error_search_api_responses

Descriptions:

  • frontend: 2%+ hard error search API responses every 5m for 15m0s

  • frontend: 5%+ hard error search API responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_hard_error_search_api_responses",
  "critical_frontend_hard_error_search_api_responses"
]

frontend: partial_timeout_search_api_responses

Descriptions:

  • frontend: 5%+ partial timeout search API responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_partial_timeout_search_api_responses"
]

frontend: search_api_alert_user_suggestions

Descriptions:

  • frontend: 5%+ search API alert user suggestions shown every 5m

Possible solutions:

  • This indicates your users' search API requests have syntax errors or similar user errors. Check the responses the API sends back for an explanation.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_search_api_alert_user_suggestions"
]

frontend: 99th_percentile_precise_code_intel_api_duration

Descriptions:

  • frontend: 20s+ 99th percentile successful precise code intel api query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_precise_code_intel_api_duration"
]

frontend: precise_code_intel_api_errors

Descriptions:

  • frontend: 5%+ precise code intel api errors every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_precise_code_intel_api_errors"
]

frontend: 99th_percentile_precise_code_intel_store_duration

Descriptions:

  • frontend: 20s+ 99th percentile successful precise code intel database query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_precise_code_intel_store_duration"
]

frontend: precise_code_intel_store_errors

Descriptions:

  • frontend: 5%+ precise code intel database errors every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_precise_code_intel_store_errors"
]

frontend: internal_indexed_search_error_responses

Descriptions:

  • frontend: 5%+ internal indexed search error responses every 5m for 15m0s

Possible solutions:

  • Check the Zoekt Web Server dashboard for indications it might be unhealthy.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_internal_indexed_search_error_responses"
]

frontend: internal_unindexed_search_error_responses

Descriptions:

  • frontend: 5%+ internal unindexed search error responses every 5m for 15m0s

Possible solutions:

  • Check the Searcher dashboard for indications it might be unhealthy.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_internal_unindexed_search_error_responses"
]

frontend: internal_api_error_responses

Descriptions:

  • frontend: 5%+ internal API error responses every 5m by route for 15m0s

Possible solutions:

  • This may not be a substantial issue; check the frontend logs for potential causes.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_internal_api_error_responses"
]

frontend: 99th_percentile_precise_code_intel_bundle_manager_query_duration

Descriptions:

  • frontend: 20s+ 99th percentile successful precise-code-intel-bundle-manager query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_precise_code_intel_bundle_manager_query_duration"
]

frontend: 99th_percentile_precise_code_intel_bundle_manager_transfer_duration

Descriptions:

  • frontend: 300s+ 99th percentile successful precise-code-intel-bundle-manager data transfer duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_precise_code_intel_bundle_manager_transfer_duration"
]

frontend: precise_code_intel_bundle_manager_error_responses

Descriptions:

  • frontend: 5%+ precise-code-intel-bundle-manager error responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_precise_code_intel_bundle_manager_error_responses"
]

frontend: 99th_percentile_gitserver_duration

Descriptions:

  • frontend: 20s+ 99th percentile successful gitserver query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_99th_percentile_gitserver_duration"
]

frontend: gitserver_error_responses

Descriptions:

  • frontend: 5%+ gitserver error responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_gitserver_error_responses"
]

frontend: observability_test_alert_warning

Descriptions:

  • frontend: 1+ warning test alert metric

Possible solutions:

  • This alert is triggered via the triggerObservabilityTestAlert GraphQL endpoint, and will automatically resolve itself.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_observability_test_alert_warning"
]

frontend: observability_test_alert_critical

Descriptions:

  • frontend: 1+ critical test alert metric

Possible solutions:

  • This alert is triggered via the triggerObservabilityTestAlert GraphQL endpoint, and will automatically resolve itself.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_observability_test_alert_critical"
]

frontend: container_cpu_usage

Descriptions:

  • frontend: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the frontend container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_container_cpu_usage"
]

frontend: container_memory_usage

Descriptions:

  • frontend: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the frontend container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_container_memory_usage"
]

frontend: container_restarts

Descriptions:

  • frontend: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes (a consolidated command sketch follows this list):
    • Determine if the pod was OOM killed using kubectl describe pod frontend (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p frontend.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' frontend (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the frontend container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs frontend (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_container_restarts"
]
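
The OOM and panic checks above, collected into one copy-pasteable sketch. Pod and container names follow the convention used elsewhere on this page (frontend); substitute the actual names from kubectl get pods or docker ps.

# Kubernetes: was the pod OOM killed, and did it panic before restarting?
kubectl describe pod frontend | grep -i oomkilled
kubectl logs -p frontend | grep -i panic

# Docker Compose equivalents
docker inspect -f '{{json .State}}' frontend
docker logs frontend 2>&1 | grep -i panic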

frontend: fs_inodes_used

Descriptions:

  • frontend: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes (a command sketch for checking current usage follows this list).
  • Kubernetes: Consider provisioning more machines with fewer resources.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_fs_inodes_used"
]
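
To see current inode usage directly, a sketch. The container name frontend follows the convention used elsewhere on this page, and the output covers the filesystems mounted inside the container; adjust to your deployment.

# Kubernetes: inode usage as seen from inside the pod
kubectl exec frontend -- df -i

# Docker Compose
docker exec frontend df -i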

frontend: provisioning_container_cpu_usage_long_term

Descriptions:

  • frontend: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the frontend service (a kubectl sketch follows this list).
    • Docker Compose: Consider increasing cpus: of the frontend container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_cpu_usage_long_term"
]
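
If you decide to change the Kubernetes limits, the adjustment can also be made imperatively; a sketch, assuming a deployment named sourcegraph-frontend with a container named frontend (adjust the names and values, and mirror the change into your Deployment.yaml so it persists across redeploys).

kubectl set resources deployment sourcegraph-frontend -c frontend --limits=cpu=4 --requests=cpu=2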

frontend: provisioning_container_memory_usage_long_term

Descriptions:

  • frontend: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the frontend service.
    • Docker Compose: Consider increasing memory: of the frontend container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_memory_usage_long_term"
]

frontend: provisioning_container_cpu_usage_short_term

Descriptions:

  • frontend: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the frontend container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_cpu_usage_short_term"
]

frontend: provisioning_container_memory_usage_short_term

Descriptions:

  • frontend: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the frontend container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_provisioning_container_memory_usage_short_term"
]

frontend: go_goroutines

Descriptions:

  • frontend: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_go_goroutines"
]
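
To see what the goroutines are doing, you can take a standard Go pprof goroutine dump. This is a sketch only: it assumes the frontend exposes Go's net/http/pprof handlers on its debug port (commonly 6060 via SRC_PROF_HTTP); check your manifests for the actual port and service name.

# In one terminal: forward the debug port from a frontend pod
kubectl port-forward deploy/sourcegraph-frontend 6060:6060

# In another: dump all goroutine stacks and skim the most common ones
curl -s "http://localhost:6060/debug/pprof/goroutine?debug=2" | less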

frontend: go_gc_duration_seconds

Descriptions:

  • frontend: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_frontend_go_gc_duration_seconds"
]

frontend: pods_available_percentage

Descriptions:

  • frontend: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_frontend_pods_available_percentage"
]
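
To see which frontend pods are unavailable and why, a sketch (the label app=sourcegraph-frontend is an assumption from the standard manifests):

kubectl get pods -l app=sourcegraph-frontend

# For any pod stuck in Pending, CrashLoopBackOff, etc., inspect its recent events
kubectl describe pod <pod-name> | tail -20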

gitserver: disk_space_remaining

Descriptions:

  • gitserver: less than 25% disk space remaining by instance

  • gitserver: less than 15% disk space remaining by instance

Possible solutions:

  • Provision more disk space: Sourcegraph will begin deleting least-used repository clones at 10% disk space remaining, which may result in decreased performance, users having to wait for repositories to clone, etc. (a command sketch for checking free space directly follows this list).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_disk_space_remaining",
  "critical_gitserver_disk_space_remaining"
]
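
To check remaining space on the repository volume directly, a sketch. Gitserver pods/containers are typically named gitserver-0, gitserver-1, and so on, and the standard images keep repositories under /data/repos; verify both against your deployment before relying on the numbers.

# Kubernetes
kubectl exec gitserver-0 -- df -h /data/repos

# Docker Compose
docker exec gitserver-0 df -h /data/repos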

gitserver: running_git_commands

Descriptions:

  • gitserver: 50+ running git commands (signals load) for 2m0s

  • gitserver: 100+ running git commands (signals load) for 5m0s

Possible solutions:

  • Check if the problem may be an intermittent and temporary peak using the "Container monitoring" section at the bottom of the Git Server dashboard (a command sketch for checking the live git process count follows this list).
  • Single container deployments: Consider upgrading to a Docker Compose deployment, which offers better scalability and resource isolation.
  • Kubernetes and Docker Compose: Check that you are running a similar number of git server replicas and that their CPU/memory limits are allocated according to what is shown in the Sourcegraph resource estimator.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_running_git_commands",
  "critical_gitserver_running_git_commands"
]
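
To see the load directly, you can get a rough count of git processes currently running inside a gitserver replica; a sketch, with the same gitserver-0 naming assumption as elsewhere on this page.

# Kubernetes: count processes whose command is exactly `git`
kubectl exec gitserver-0 -- sh -c 'cat /proc/[0-9]*/comm 2>/dev/null | grep -cx git'

# Docker Compose
docker exec gitserver-0 sh -c 'cat /proc/[0-9]*/comm 2>/dev/null | grep -cx git'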

gitserver: repository_clone_queue_size

Descriptions:

  • gitserver: 25+ repository clone queue size

Possible solutions:

  • If you just added several repositories, the warning may be expected.
  • Check which repositories need cloning by visiting, e.g., https://sourcegraph.example.com/site-admin/repositories?filter=not-cloned
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_repository_clone_queue_size"
]

gitserver: repository_existence_check_queue_size

Descriptions:

  • gitserver: 25+ repository existence check queue size

Possible solutions:

  • Check the code host status indicator for errors: on the Sourcegraph homepage, when signed in as an admin, click the cloud icon in the top-right corner of the page.
  • Check if the issue continues to happen after 30 minutes; it may be temporary.
  • Check the gitserver logs for more information.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_repository_existence_check_queue_size"
]

gitserver: echo_command_duration_test

Descriptions:

  • gitserver: 1s+ echo command duration test

  • gitserver: 2s+ echo command duration test

Possible solutions:

  • Check if the problem may be an intermittent and temporary peak using the "Container monitoring" section at the bottom of the Git Server dashboard.
  • Single container deployments: Consider upgrading to a Docker Compose deployment, which offers better scalability and resource isolation.
  • Kubernetes and Docker Compose: Check that you are running a similar number of git server replicas and that their CPU/memory limits are allocated according to what is shown in the Sourcegraph resource estimator.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_echo_command_duration_test",
  "critical_gitserver_echo_command_duration_test"
]

gitserver: frontend_internal_api_error_responses

Descriptions:

  • gitserver: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with repo-updater that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy (see the command sketch after this list).
    • Check kubectl logs gitserver for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs gitserver for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_frontend_internal_api_error_responses"
]
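
The Kubernetes and Docker Compose checks above as concrete commands, using the naming conventions from elsewhere on this page:

# Kubernetes: are the frontend pods healthy, and what is gitserver logging about them?
kubectl get pods -l app=sourcegraph-frontend
kubectl logs gitserver-0 | grep -i frontend | tail -50

# Docker Compose
docker ps --filter name=frontend
docker logs gitserver-0 2>&1 | grep -i frontend | tail -50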

gitserver: container_cpu_usage

Descriptions:

  • gitserver: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the gitserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_container_cpu_usage"
]

gitserver: container_memory_usage

Descriptions:

  • gitserver: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the gitserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_container_memory_usage"
]

gitserver: container_restarts

Descriptions:

  • gitserver: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod gitserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p gitserver.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' gitserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the gitserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs gitserver (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_container_restarts"
]

gitserver: fs_inodes_used

Descriptions:

  • gitserver: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_fs_inodes_used"
]

gitserver: fs_io_operations

Descriptions:

  • gitserver: 5000+ filesystem reads and writes rate by instance over 1h

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_fs_io_operations"
]

gitserver: provisioning_container_cpu_usage_long_term

Descriptions:

  • gitserver: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the gitserver service.
    • Docker Compose: Consider increasing cpus: of the gitserver container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_provisioning_container_cpu_usage_long_term"
]

gitserver: provisioning_container_memory_usage_long_term

Descriptions:

  • gitserver: less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the gitserver service.
    • Docker Compose: Consider increasing memory: of the gitserver container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_provisioning_container_memory_usage_long_term"
]

gitserver: provisioning_container_cpu_usage_short_term

Descriptions:

  • gitserver: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the gitserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_provisioning_container_cpu_usage_short_term"
]

gitserver: go_goroutines

Descriptions:

  • gitserver: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_go_goroutines"
]

gitserver: go_gc_duration_seconds

Descriptions:

  • gitserver: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_gitserver_go_gc_duration_seconds"
]

gitserver: pods_available_percentage

Descriptions:

  • gitserver: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_gitserver_pods_available_percentage"
]

github-proxy: github_core_rate_limit_remaining

Descriptions:

  • github-proxy: less than 500 remaining calls to GitHub before hitting the rate limit for 5m0s

Possible solutions:

  • Try restarting the pod to get a different public IP (you can verify the remaining limit with the command sketch after this list).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_github-proxy_github_core_rate_limit_remaining"
]
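
You can check how much of the core rate limit remains for the token Sourcegraph is using by querying the GitHub API directly; a sketch, where GITHUB_TOKEN stands in for the token from your GitHub code host configuration (for GitHub Enterprise, use your instance's API URL instead of api.github.com).

curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit | grep -A 4 '"core"'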

github-proxy: github_search_rate_limit_remaining

Descriptions:

  • github-proxy: less than 5 remaining calls to GitHub search before hitting the rate limit

Possible solutions:

  • Try restarting the pod to get a different public IP.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_github_search_rate_limit_remaining"
]

github-proxy: container_cpu_usage

Descriptions:

  • github-proxy: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_container_cpu_usage"
]

github-proxy: container_memory_usage

Descriptions:

  • github-proxy: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_container_memory_usage"
]

github-proxy: container_restarts

Descriptions:

  • github-proxy: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod github-proxy (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p github-proxy.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' github-proxy (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the github-proxy container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs github-proxy (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_container_restarts"
]

github-proxy: fs_inodes_used

Descriptions:

  • github-proxy: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_fs_inodes_used"
]

github-proxy: provisioning_container_cpu_usage_long_term

Descriptions:

  • github-proxy: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the github-proxy service.
    • Docker Compose: Consider increasing cpus: of the github-proxy container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_cpu_usage_long_term"
]

github-proxy: provisioning_container_memory_usage_long_term

Descriptions:

  • github-proxy: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the github-proxy service.
    • Docker Compose: Consider increasing memory: of the github-proxy container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_memory_usage_long_term"
]

github-proxy: provisioning_container_cpu_usage_short_term

Descriptions:

  • github-proxy: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_cpu_usage_short_term"
]

github-proxy: provisioning_container_memory_usage_short_term

Descriptions:

  • github-proxy: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the github-proxy container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_provisioning_container_memory_usage_short_term"
]

github-proxy: go_goroutines

Descriptions:

  • github-proxy: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_go_goroutines"
]

github-proxy: go_gc_duration_seconds

Descriptions:

  • github-proxy: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_github-proxy_go_gc_duration_seconds"
]

github-proxy: pods_available_percentage

Descriptions:

  • github-proxy: less than 90% percentage pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_github-proxy_pods_available_percentage"
]

precise-code-intel-bundle-manager: 99th_percentile_bundle_database_duration

Descriptions:

  • precise-code-intel-bundle-manager: 20s+ 99th percentile successful bundle database query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_99th_percentile_bundle_database_duration"
]

precise-code-intel-bundle-manager: bundle_database_errors

Descriptions:

  • precise-code-intel-bundle-manager: 20+ bundle database errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_bundle_database_errors"
]

precise-code-intel-bundle-manager: 99th_percentile_bundle_reader_duration

Descriptions:

  • precise-code-intel-bundle-manager: 20s+ 99th percentile successful bundle reader query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_99th_percentile_bundle_reader_duration"
]

precise-code-intel-bundle-manager: bundle_reader_errors

Descriptions:

  • precise-code-intel-bundle-manager: 20+ bundle reader errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_bundle_reader_errors"
]

precise-code-intel-bundle-manager: disk_space_remaining

Descriptions:

  • precise-code-intel-bundle-manager: less than 25% disk space remaining by instance

  • precise-code-intel-bundle-manager: less than 15% disk space remaining by instance

Possible solutions:

  • Provision more disk space: Sourcegraph will begin deleting the oldest uploaded bundle files at 10% disk space remaining (a command sketch for checking the volume directly follows this list).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_disk_space_remaining",
  "critical_precise-code-intel-bundle-manager_disk_space_remaining"
]
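
To check the bundle storage volume directly, a sketch. The storage mount path varies by deployment (/lsif-storage in the standard Kubernetes manifests at the time of writing); verify it against the volume mounts in your precise-code-intel-bundle-manager spec.

# Kubernetes
kubectl exec deploy/precise-code-intel-bundle-manager -- df -h /lsif-storage

# Docker Compose
docker exec precise-code-intel-bundle-manager df -h /lsif-storage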

precise-code-intel-bundle-manager: janitor_errors

Descriptions:

  • precise-code-intel-bundle-manager: 20+ janitor errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_janitor_errors"
]

precise-code-intel-bundle-manager: janitor_old_uploads_removed

Descriptions:

  • precise-code-intel-bundle-manager: 20+ upload files removed (due to age) every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_janitor_old_uploads_removed"
]

precise-code-intel-bundle-manager: janitor_old_parts_removed

Descriptions:

  • precise-code-intel-bundle-manager: 20+ upload and database part files removed (due to age) every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_janitor_old_parts_removed"
]

precise-code-intel-bundle-manager: janitor_old_dumps_removed

Descriptions:

  • precise-code-intel-bundle-manager: 20+ bundle files removed (due to low disk space) every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_janitor_old_dumps_removed"
]

precise-code-intel-bundle-manager: janitor_orphans

Descriptions:

  • precise-code-intel-bundle-manager: 20+ bundle and upload files removed (with no corresponding database entry) every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_janitor_orphans"
]

precise-code-intel-bundle-manager: janitor_uploads_removed

Descriptions:

  • precise-code-intel-bundle-manager: 20+ upload records removed every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_janitor_uploads_removed"
]

precise-code-intel-bundle-manager: frontend_internal_api_error_responses

Descriptions:

  • precise-code-intel-bundle-manager: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with repo-updater that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs precise-code-intel-bundle-manager for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs precise-code-intel-bundle-manager for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_frontend_internal_api_error_responses"
]

precise-code-intel-bundle-manager: container_cpu_usage

Descriptions:

  • precise-code-intel-bundle-manager: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-bundle-manager container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_container_cpu_usage"
]

precise-code-intel-bundle-manager: container_memory_usage

Descriptions:

  • precise-code-intel-bundle-manager: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-bundle-manager container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_container_memory_usage"
]
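
For Docker Compose deployments, the equivalent limits live on the service entry in docker-compose.yml. A minimal sketch follows; the exact keys depend on your Compose file version (some files use mem_limit, others set memory under deploy.resources.limits), so match whatever your existing file already uses and treat the values as illustrative:

# Illustrative excerpt of docker-compose.yml
services:
  precise-code-intel-bundle-manager:
    cpus: 2          # raise when the CPU usage alerts fire regularly
    mem_limit: 4g    # raise when the memory usage alerts fire regularly

Recreate the service afterwards (for example with docker-compose up -d precise-code-intel-bundle-manager) so the new limits take effect.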

precise-code-intel-bundle-manager: container_restarts

Descriptions:

  • precise-code-intel-bundle-manager: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-bundle-manager (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-bundle-manager.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-bundle-manager (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-bundle-manager container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-bundle-manager (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_container_restarts"
]

precise-code-intel-bundle-manager: fs_inodes_used

Descriptions:

  • precise-code-intel-bundle-manager: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_fs_inodes_used"
]

precise-code-intel-bundle-manager: provisioning_container_cpu_usage_long_term

Descriptions:

  • precise-code-intel-bundle-manager: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the precise-code-intel-bundle-manager service.
    • Docker Compose: Consider increasing cpus: of the precise-code-intel-bundle-manager container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_provisioning_container_cpu_usage_long_term"
]

precise-code-intel-bundle-manager: provisioning_container_memory_usage_long_term

Descriptions:

  • precise-code-intel-bundle-manager: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the precise-code-intel-bundle-manager service.
    • Docker Compose: Consider increasing memory: of the precise-code-intel-bundle-manager container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_provisioning_container_memory_usage_long_term"
]

precise-code-intel-bundle-manager: provisioning_container_cpu_usage_short_term

Descriptions:

  • precise-code-intel-bundle-manager: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-bundle-manager container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_provisioning_container_cpu_usage_short_term"
]

precise-code-intel-bundle-manager: provisioning_container_memory_usage_short_term

Descriptions:

  • precise-code-intel-bundle-manager: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-bundle-manager container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_provisioning_container_memory_usage_short_term"
]

precise-code-intel-bundle-manager: go_goroutines

Descriptions:

  • precise-code-intel-bundle-manager: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_go_goroutines"
]

precise-code-intel-bundle-manager: go_gc_duration_seconds

Descriptions:

  • precise-code-intel-bundle-manager: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-bundle-manager_go_gc_duration_seconds"
]

precise-code-intel-bundle-manager: pods_available_percentage

Descriptions:

  • precise-code-intel-bundle-manager: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_precise-code-intel-bundle-manager_pods_available_percentage"
]

precise-code-intel-worker: upload_queue_size

Descriptions:

  • precise-code-intel-worker: 100+ upload queue size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_upload_queue_size"
]

precise-code-intel-worker: upload_queue_growth_rate

Descriptions:

  • precise-code-intel-worker: 5+ upload queue growth rate every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_upload_queue_growth_rate"
]

precise-code-intel-worker: upload_process_errors

Descriptions:

  • precise-code-intel-worker: 20+ upload process errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_upload_process_errors"
]

precise-code-intel-worker: 99th_percentile_store_duration

Descriptions:

  • precise-code-intel-worker: 20s+ 99th percentile successful database query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_99th_percentile_store_duration"
]

precise-code-intel-worker: store_errors

Descriptions:

  • precise-code-intel-worker: 20+ database errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_store_errors"
]

precise-code-intel-worker: processing_uploads_reset

Descriptions:

  • precise-code-intel-worker: 20+ uploads reset to queued state every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_processing_uploads_reset"
]

precise-code-intel-worker: processing_uploads_reset_failures

Descriptions:

  • precise-code-intel-worker: 20+ uploads errored after repeated resets every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_processing_uploads_reset_failures"
]

precise-code-intel-worker: upload_resetter_errors

Descriptions:

  • precise-code-intel-worker: 20+ upload resetter errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_upload_resetter_errors"
]

precise-code-intel-worker: 99th_percentile_bundle_manager_transfer_duration

Descriptions:

  • precise-code-intel-worker: 300s+ 99th percentile successful bundle manager data transfer duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_99th_percentile_bundle_manager_transfer_duration"
]

precise-code-intel-worker: bundle_manager_error_responses

Descriptions:

  • precise-code-intel-worker: 5+ bundle manager error responses every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_bundle_manager_error_responses"
]

precise-code-intel-worker: 99th_percentile_gitserver_duration

Descriptions:

  • precise-code-intel-worker: 20s+ 99th percentile successful gitserver query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_99th_percentile_gitserver_duration"
]

precise-code-intel-worker: gitserver_error_responses

Descriptions:

  • precise-code-intel-worker: 5%+ gitserver error responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_gitserver_error_responses"
]

precise-code-intel-worker: frontend_internal_api_error_responses

Descriptions:

  • precise-code-intel-worker: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with precise-code-intel-worker that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs precise-code-intel-worker for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs precise-code-intel-worker for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_frontend_internal_api_error_responses"
]

precise-code-intel-worker: container_cpu_usage

Descriptions:

  • precise-code-intel-worker: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_container_cpu_usage"
]

precise-code-intel-worker: container_memory_usage

Descriptions:

  • precise-code-intel-worker: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_container_memory_usage"
]

precise-code-intel-worker: container_restarts

Descriptions:

  • precise-code-intel-worker: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-worker.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-worker container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-worker (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_container_restarts"
]

precise-code-intel-worker: fs_inodes_used

Descriptions:

  • precise-code-intel-worker: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_fs_inodes_used"
]

precise-code-intel-worker: provisioning_container_cpu_usage_long_term

Descriptions:

  • precise-code-intel-worker: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the precise-code-intel-worker service.
    • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_cpu_usage_long_term"
]

precise-code-intel-worker: provisioning_container_memory_usage_long_term

Descriptions:

  • precise-code-intel-worker: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the precise-code-intel-worker service.
    • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_memory_usage_long_term"
]

precise-code-intel-worker: provisioning_container_cpu_usage_short_term

Descriptions:

  • precise-code-intel-worker: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_cpu_usage_short_term"
]

precise-code-intel-worker: provisioning_container_memory_usage_short_term

Descriptions:

  • precise-code-intel-worker: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-worker container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_provisioning_container_memory_usage_short_term"
]

precise-code-intel-worker: go_goroutines

Descriptions:

  • precise-code-intel-worker: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_go_goroutines"
]

precise-code-intel-worker: go_gc_duration_seconds

Descriptions:

  • precise-code-intel-worker: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-worker_go_gc_duration_seconds"
]

precise-code-intel-worker: pods_available_percentage

Descriptions:

  • precise-code-intel-worker: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_precise-code-intel-worker_pods_available_percentage"
]

precise-code-intel-indexer: index_queue_size

Descriptions:

  • precise-code-intel-indexer: 100+ index queue size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_index_queue_size"
]

precise-code-intel-indexer: index_queue_growth_rate

Descriptions:

  • precise-code-intel-indexer: 5+ index queue growth rate every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_index_queue_growth_rate"
]

precise-code-intel-indexer: index_process_errors

Descriptions:

  • precise-code-intel-indexer: 20+ index process errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_index_process_errors"
]

precise-code-intel-indexer: 99th_percentile_store_duration

Descriptions:

  • precise-code-intel-indexer: 20s+ 99th percentile successful database query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_99th_percentile_store_duration"
]

precise-code-intel-indexer: store_errors

Descriptions:

  • precise-code-intel-indexer: 20+ database errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_store_errors"
]

precise-code-intel-indexer: indexability_updater_errors

Descriptions:

  • precise-code-intel-indexer: 20+ indexability updater errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_indexability_updater_errors"
]

precise-code-intel-indexer: index_scheduler_errors

Descriptions:

  • precise-code-intel-indexer: 20+ index scheduler errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_index_scheduler_errors"
]

precise-code-intel-indexer: processing_indexes_reset

Descriptions:

  • precise-code-intel-indexer: 20+ indexes reset to queued state every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_processing_indexes_reset"
]

precise-code-intel-indexer: processing_indexes_reset_failures

Descriptions:

  • precise-code-intel-indexer: 20+ indexes errored after repeated resets every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_processing_indexes_reset_failures"
]

precise-code-intel-indexer: index_resetter_errors

Descriptions:

  • precise-code-intel-indexer: 20+ index resetter errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_index_resetter_errors"
]

precise-code-intel-indexer: janitor_errors

Descriptions:

  • precise-code-intel-indexer: 20+ janitor errors every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_janitor_errors"
]

precise-code-intel-indexer: janitor_indexes_removed

Descriptions:

  • precise-code-intel-indexer: 20+ index records removed every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_janitor_indexes_removed"
]

precise-code-intel-indexer: 99th_percentile_gitserver_duration

Descriptions:

  • precise-code-intel-indexer: 20s+ 99th percentile successful gitserver query duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_99th_percentile_gitserver_duration"
]

precise-code-intel-indexer: gitserver_error_responses

Descriptions:

  • precise-code-intel-indexer: 5%+ gitserver error responses every 5m for 15m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_gitserver_error_responses"
]

precise-code-intel-indexer: frontend_internal_api_error_responses

Descriptions:

  • precise-code-intel-indexer: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with precise-code-intel-indexer that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs precise-code-intel-indexer for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs precise-code-intel-indexer for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_frontend_internal_api_error_responses"
]

precise-code-intel-indexer: container_cpu_usage

Descriptions:

  • precise-code-intel-indexer: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-indexer container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_container_cpu_usage"
]

precise-code-intel-indexer: container_memory_usage

Descriptions:

  • precise-code-intel-indexer: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-indexer container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_container_memory_usage"
]

precise-code-intel-indexer: container_restarts

Descriptions:

  • precise-code-intel-indexer: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-indexer (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-indexer.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-indexer (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-indexer container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-indexer (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_container_restarts"
]

precise-code-intel-indexer: fs_inodes_used

Descriptions:

  • precise-code-intel-indexer: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_fs_inodes_used"
]

precise-code-intel-indexer: provisioning_container_cpu_usage_long_term

Descriptions:

  • precise-code-intel-indexer: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the precise-code-intel-indexer service.
    • Docker Compose: Consider increasing cpus: of the precise-code-intel-indexer container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_cpu_usage_long_term"
]

precise-code-intel-indexer: provisioning_container_memory_usage_long_term

Descriptions:

  • precise-code-intel-indexer: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the precise-code-intel-indexer service.
    • Docker Compose: Consider increasing memory: of the precise-code-intel-indexer container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_memory_usage_long_term"
]

precise-code-intel-indexer: provisioning_container_cpu_usage_short_term

Descriptions:

  • precise-code-intel-indexer: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the precise-code-intel-indexer container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_cpu_usage_short_term"
]

precise-code-intel-indexer: provisioning_container_memory_usage_short_term

Descriptions:

  • precise-code-intel-indexer: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the precise-code-intel-indexer container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_provisioning_container_memory_usage_short_term"
]

precise-code-intel-indexer: go_goroutines

Descriptions:

  • precise-code-intel-indexer: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_go_goroutines"
]

precise-code-intel-indexer: go_gc_duration_seconds

Descriptions:

  • precise-code-intel-indexer: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_precise-code-intel-indexer_go_gc_duration_seconds"
]

precise-code-intel-indexer: pods_available_percentage

Descriptions:

  • precise-code-intel-indexer: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_precise-code-intel-indexer_pods_available_percentage"
]

query-runner: frontend_internal_api_error_responses

Descriptions:

  • query-runner: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with query-runner that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs query-runner for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs query-runner for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_frontend_internal_api_error_responses"
]

query-runner: container_memory_usage

Descriptions:

  • query-runner: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_container_memory_usage"
]

query-runner: container_cpu_usage

Descriptions:

  • query-runner: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_container_cpu_usage"
]

query-runner: container_restarts

Descriptions:

  • query-runner: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod query-runner (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p query-runner.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' query-runner (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the query-runner container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs query-runner (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_container_restarts"
]

query-runner: fs_inodes_used

Descriptions:

  • query-runner: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_fs_inodes_used"
]

query-runner: provisioning_container_cpu_usage_long_term

Descriptions:

  • query-runner: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the query-runner service.
    • Docker Compose: Consider increasing cpus: of the query-runner container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_cpu_usage_long_term"
]

query-runner: provisioning_container_memory_usage_long_term

Descriptions:

  • query-runner: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the query-runner service.
    • Docker Compose: Consider increasing memory: of the query-runner container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_memory_usage_long_term"
]

query-runner: provisioning_container_cpu_usage_short_term

Descriptions:

  • query-runner: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_cpu_usage_short_term"
]

query-runner: provisioning_container_memory_usage_short_term

Descriptions:

  • query-runner: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the query-runner container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_provisioning_container_memory_usage_short_term"
]

query-runner: go_goroutines

Descriptions:

  • query-runner: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_go_goroutines"
]

query-runner: go_gc_duration_seconds

Descriptions:

  • query-runner: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_query-runner_go_gc_duration_seconds"
]

query-runner: pods_available_percentage

Descriptions:

  • query-runner: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_query-runner_pods_available_percentage"
]

repo-updater: frontend_internal_api_error_responses

Descriptions:

  • repo-updater: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with repo-updater that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs repo-updater for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs repo-updater for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_frontend_internal_api_error_responses"
]

repo-updater: perms_syncer_perms

Descriptions:

  • repo-updater: 259200s+ time gap between least and most up to date permissions for 5m0s

Possible solutions:

  • Increase the API rate limit to GitHub, GitLab or Bitbucket Server.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_perms"
]

repo-updater: perms_syncer_stale_perms

Descriptions:

  • repo-updater: 100+ number of entities with stale permissions for 5m0s

Possible solutions:

  • Increase the API rate limit to GitHub, GitLab or Bitbucket Server.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_stale_perms"
]

repo-updater: perms_syncer_no_perms

Descriptions:

  • repo-updater: 100+ number of entities with no permissions for 5m0s

Possible solutions:

  • Enabled permissions for the first time: Wait a few minutes and see if the number goes down.
  • Otherwise: Increase the API rate limit to GitHub, GitLab or Bitbucket Server.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_no_perms"
]

repo-updater: perms_syncer_sync_duration

Descriptions:

  • repo-updater: 30s+ 95th percentile permissions sync duration for 5m0s

Possible solutions:

  • Check that the network latency between the Sourcegraph instance and the code host is reasonable (<50ms).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_sync_duration"
]

repo-updater: perms_syncer_queue_size

Descriptions:

  • repo-updater: 100+ permissions sync queued items for 5m0s

Possible solutions:

  • Enabled permissions for the first time: Wait a few minutes and see if the number goes down.
  • Otherwise: Increase the API rate limit to GitHub, GitLab or Bitbucket Server.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_perms_syncer_queue_size"
]

repo-updater: authz_filter_duration

Descriptions:

  • repo-updater: 1s+ 95th percentile authorization duration for 1m0s

Possible solutions:

  • Check if the database is overloaded.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_authz_filter_duration"
]

repo-updater: perms_syncer_sync_errors

Descriptions:

  • repo-updater: 1+ permissions sync error rate for 1m0s

Possible solutions:

  • Check the network connectivity between the Sourcegraph instance and the code host.
  • Check if the API rate limit quota is exhausted on the code host.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_perms_syncer_sync_errors"
]

repo-updater: container_cpu_usage

Descriptions:

  • repo-updater: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_container_cpu_usage"
]

repo-updater: container_memory_usage

Descriptions:

  • repo-updater: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_container_memory_usage"
]

repo-updater: container_restarts

Descriptions:

  • repo-updater: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod repo-updater (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p repo-updater.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' repo-updater (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the repo-updater container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs repo-updater (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_container_restarts"
]

repo-updater: fs_inodes_used

Descriptions:

  • repo-updater: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_fs_inodes_used"
]

repo-updater: provisioning_container_cpu_usage_long_term

Descriptions:

  • repo-updater: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the repo-updater service.
    • Docker Compose: Consider increasing cpus: of the repo-updater container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_cpu_usage_long_term"
]

repo-updater: provisioning_container_memory_usage_long_term

Descriptions:

  • repo-updater: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the repo-updater service.
    • Docker Compose: Consider increasing memory: of the repo-updater container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_memory_usage_long_term"
]

repo-updater: provisioning_container_cpu_usage_short_term

Descriptions:

  • repo-updater: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_cpu_usage_short_term"
]

repo-updater: provisioning_container_memory_usage_short_term

Descriptions:

  • repo-updater: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the repo-updater container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_provisioning_container_memory_usage_short_term"
]

repo-updater: go_goroutines

Descriptions:

  • repo-updater: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_go_goroutines"
]

repo-updater: go_gc_duration_seconds

Descriptions:

  • repo-updater: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_repo-updater_go_gc_duration_seconds"
]

repo-updater: pods_available_percentage

Descriptions:

  • repo-updater: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_repo-updater_pods_available_percentage"
]

searcher: unindexed_search_request_errors

Descriptions:

  • searcher: 5%+ unindexed search request errors every 5m by code for 5m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_unindexed_search_request_errors"
]

searcher: replica_traffic

Descriptions:

  • searcher: 5+ requests per second over 10m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_replica_traffic"
]

searcher: frontend_internal_api_error_responses

Descriptions:

  • searcher: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with searcher that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs searcher for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs searcher for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_frontend_internal_api_error_responses"
]

searcher: container_cpu_usage

Descriptions:

  • searcher: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_container_cpu_usage"
]

searcher: container_memory_usage

Descriptions:

  • searcher: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_container_memory_usage"
]

searcher: container_restarts

Descriptions:

  • searcher: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod searcher (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p searcher.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' searcher (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the searcher container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs searcher (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_container_restarts"
]

searcher: fs_inodes_used

Descriptions:

  • searcher: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each (a rough Kubernetes sketch is shown at the end of this section).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_fs_inodes_used"
]
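
On Kubernetes, "provisioning more machines with fewer resources" for searcher usually means running more, smaller replicas so that cache files (and their inodes) are spread across more nodes. A rough sketch, assuming searcher runs as a Deployment whose replica count you control; the values are placeholders:

# Fragment of the searcher Deployment.yaml (illustrative values).
spec:
  replicas: 3              # more, smaller pods instead of a few large ones
  template:
    spec:
      containers:
        - name: searcher
          resources:
            requests:
              cpu: 500m
              memory: 1Gi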

searcher: provisioning_container_cpu_usage_long_term

Descriptions:

  • searcher: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the searcher service.
    • Docker Compose: Consider increasing cpus: of the searcher container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_cpu_usage_long_term"
]

searcher: provisioning_container_memory_usage_long_term

Descriptions:

  • searcher: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the searcher service.
    • Docker Compose: Consider increasing memory: of the searcher container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_memory_usage_long_term"
]

searcher: provisioning_container_cpu_usage_short_term

Descriptions:

  • searcher: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_cpu_usage_short_term"
]

searcher: provisioning_container_memory_usage_short_term

Descriptions:

  • searcher: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the searcher container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_provisioning_container_memory_usage_short_term"
]

searcher: go_goroutines

Descriptions:

  • searcher: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_go_goroutines"
]

searcher: go_gc_duration_seconds

Descriptions:

  • searcher: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_searcher_go_gc_duration_seconds"
]

searcher: pods_available_percentage

Descriptions:

  • searcher: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_searcher_pods_available_percentage"
]

symbols: store_fetch_failures

Descriptions:

  • symbols: 5+ store fetch failures every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_store_fetch_failures"
]

symbols: current_fetch_queue_size

Descriptions:

  • symbols: 25+ current fetch queue size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_current_fetch_queue_size"
]

symbols: frontend_internal_api_error_responses

Descriptions:

  • symbols: 2%+ frontend-internal API error responses every 5m by route for 5m0s

Possible solutions:

  • Single-container deployments: Check docker logs $CONTAINER_ID for logs starting with symbols that indicate requests to the frontend service are failing.
  • Kubernetes:
    • Confirm that kubectl get pods shows the frontend pods are healthy.
    • Check kubectl logs symbols for logs indicating request failures to frontend or frontend-internal.
  • Docker Compose:
    • Confirm that docker ps shows the frontend-internal container is healthy.
    • Check docker logs symbols for logs indicating request failures to frontend or frontend-internal.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_frontend_internal_api_error_responses"
]

symbols: container_cpu_usage

Descriptions:

  • symbols: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_container_cpu_usage"
]

symbols: container_memory_usage

Descriptions:

  • symbols: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_container_memory_usage"
]

symbols: container_restarts

Descriptions:

  • symbols: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod symbols (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p symbols.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' symbols (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the symbols container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs symbols (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_container_restarts"
]

symbols: fs_inodes_used

Descriptions:

  • symbols: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_fs_inodes_used"
]

symbols: provisioning_container_cpu_usage_long_term

Descriptions:

  • symbols: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the symbols service.
    • Docker Compose: Consider increasing cpus: of the symbols container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_cpu_usage_long_term"
]

symbols: provisioning_container_memory_usage_long_term

Descriptions:

  • symbols: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the symbols service.
    • Docker Compose: Consider increasing memory: of the symbols container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_memory_usage_long_term"
]

symbols: provisioning_container_cpu_usage_short_term

Descriptions:

  • symbols: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_cpu_usage_short_term"
]

symbols: provisioning_container_memory_usage_short_term

Descriptions:

  • symbols: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the symbols container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_provisioning_container_memory_usage_short_term"
]

symbols: go_goroutines

Descriptions:

  • symbols: 10000+ maximum active goroutines for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_go_goroutines"
]

symbols: go_gc_duration_seconds

Descriptions:

  • symbols: 2s+ maximum go garbage collection duration

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_symbols_go_gc_duration_seconds"
]

symbols: pods_available_percentage

Descriptions:

  • symbols: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_symbols_pods_available_percentage"
]

syntect-server: syntax_highlighting_errors

Descriptions:

  • syntect-server: 5%+ syntax highlighting errors every 5m for 5m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_syntax_highlighting_errors"
]

syntect-server: syntax_highlighting_timeouts

Descriptions:

  • syntect-server: 5%+ syntax highlighting timeouts every 5m for 5m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_syntax_highlighting_timeouts"
]

syntect-server: syntax_highlighting_panics

Descriptions:

  • syntect-server: 5+ syntax highlighting panics every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_syntax_highlighting_panics"
]

syntect-server: syntax_highlighting_worker_deaths

Descriptions:

  • syntect-server: 1+ syntax highlighter worker deaths every 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_syntax_highlighting_worker_deaths"
]

syntect-server: container_cpu_usage

Descriptions:

  • syntect-server: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_container_cpu_usage"
]

syntect-server: container_memory_usage

Descriptions:

  • syntect-server: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_container_memory_usage"
]

syntect-server: container_restarts

Descriptions:

  • syntect-server: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod syntect-server (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p syntect-server.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' syntect-server (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the syntect-server container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs syntect-server (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_container_restarts"
]

syntect-server: fs_inodes_used

Descriptions:

  • syntect-server: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_fs_inodes_used"
]

syntect-server: provisioning_container_cpu_usage_long_term

Descriptions:

  • syntect-server: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the syntect-server service.
    • Docker Compose: Consider increasing cpus: of the syntect-server container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_cpu_usage_long_term"
]

syntect-server: provisioning_container_memory_usage_long_term

Descriptions:

  • syntect-server: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the syntect-server service.
    • Docker Compose: Consider increasing memory: of the syntect-server container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_memory_usage_long_term"
]

syntect-server: provisioning_container_cpu_usage_short_term

Descriptions:

  • syntect-server: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_cpu_usage_short_term"
]

syntect-server: provisioning_container_memory_usage_short_term

Descriptions:

  • syntect-server: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the syntect-server container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_syntect-server_provisioning_container_memory_usage_short_term"
]

syntect-server: pods_available_percentage

Descriptions:

  • syntect-server: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_syntect-server_pods_available_percentage"
]

zoekt-indexserver: average_resolve_revision_duration

Descriptions:

  • zoekt-indexserver: 15s+ average resolve revision duration over 5m

  • zoekt-indexserver: 30s+ average resolve revision duration over 5m

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_average_resolve_revision_duration",
  "critical_zoekt-indexserver_average_resolve_revision_duration"
]

zoekt-indexserver: container_cpu_usage

Descriptions:

  • zoekt-indexserver: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml (a sketch of the two-container indexed-search fragment is shown at the end of this section).
  • Docker Compose: Consider increasing cpus: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_container_cpu_usage"
]
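
Note that in many Kubernetes deployments of Sourcegraph, zoekt-indexserver and zoekt-webserver run as two containers in the same indexed-search pod, so each container carries its own resources block and must be sized separately. A rough sketch of that fragment, assuming the container names used by the standard manifests; the values are placeholders:

# Fragment of the indexed-search manifest with per-container limits (illustrative values).
spec:
  template:
    spec:
      containers:
        - name: zoekt-indexserver
          resources:
            limits:
              cpu: "8"
              memory: 16Gi
        - name: zoekt-webserver
          resources:
            limits:
              cpu: "8"
              memory: 50Gi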

zoekt-indexserver: container_memory_usage

Descriptions:

  • zoekt-indexserver: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_container_memory_usage"
]

zoekt-indexserver: container_restarts

Descriptions:

  • zoekt-indexserver: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-indexserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-indexserver.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' zoekt-indexserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-indexserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-indexserver (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_container_restarts"
]

zoekt-indexserver: fs_inodes_used

Descriptions:

  • zoekt-indexserver: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_fs_inodes_used"
]

zoekt-indexserver: fs_io_operations

Descriptions:

  • zoekt-indexserver: 5000+ filesystem reads and writes rate by instance over 1h

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_fs_io_operations"
]

zoekt-indexserver: provisioning_container_cpu_usage_long_term

Descriptions:

  • zoekt-indexserver: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the zoekt-indexserver service.
    • Docker Compose: Consider increasing cpus: of the zoekt-indexserver container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_cpu_usage_long_term"
]

zoekt-indexserver: provisioning_container_memory_usage_long_term

Descriptions:

  • zoekt-indexserver: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the zoekt-indexserver service.
    • Docker Compose: Consider increasing memory: of the zoekt-indexserver container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_memory_usage_long_term"
]

zoekt-indexserver: provisioning_container_cpu_usage_short_term

Descriptions:

  • zoekt-indexserver: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_cpu_usage_short_term"
]

zoekt-indexserver: provisioning_container_memory_usage_short_term

Descriptions:

  • zoekt-indexserver: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-indexserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-indexserver_provisioning_container_memory_usage_short_term"
]

zoekt-indexserver: pods_available_percentage

Descriptions:

  • zoekt-indexserver: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_zoekt-indexserver_pods_available_percentage"
]

zoekt-webserver: indexed_search_request_errors

Descriptions:

  • zoekt-webserver: 5%+ indexed search request errors every 5m by code for 5m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_indexed_search_request_errors"
]

zoekt-webserver: container_cpu_usage

Descriptions:

  • zoekt-webserver: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_container_cpu_usage"
]

zoekt-webserver: container_memory_usage

Descriptions:

  • zoekt-webserver: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_container_memory_usage"
]

zoekt-webserver: container_restarts

Descriptions:

  • zoekt-webserver: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-webserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-webserver.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' zoekt-webserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-webserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-webserver (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_container_restarts"
]

zoekt-webserver: fs_inodes_used

Descriptions:

  • zoekt-webserver: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_fs_inodes_used"
]

zoekt-webserver: fs_io_operations

Descriptions:

  • zoekt-webserver: 5000+ filesystem reads and writes rate by instance over 1h

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_fs_io_operations"
]

zoekt-webserver: provisioning_container_cpu_usage_long_term

Descriptions:

  • zoekt-webserver: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the zoekt-webserver service.
    • Docker Compose: Consider increasing cpus: of the zoekt-webserver container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_cpu_usage_long_term"
]

zoekt-webserver: provisioning_container_memory_usage_long_term

Descriptions:

  • zoekt-webserver: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the zoekt-webserver service.
    • Docker Compose: Consider increasing memory: of the zoekt-webserver container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_memory_usage_long_term"
]

zoekt-webserver: provisioning_container_cpu_usage_short_term

Descriptions:

  • zoekt-webserver: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_cpu_usage_short_term"
]

zoekt-webserver: provisioning_container_memory_usage_short_term

Descriptions:

  • zoekt-webserver: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the zoekt-webserver container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_zoekt-webserver_provisioning_container_memory_usage_short_term"
]

prometheus: prometheus_metrics_bloat

Descriptions:

  • prometheus: 20000B+ prometheus metrics payload size

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_prometheus_metrics_bloat"
]

prometheus: alertmanager_notifications_failed_total

Descriptions:

  • prometheus: 1+ failed alertmanager notifications over 1m

Possible solutions:

  • Ensure that your observability.alerts configuration (in site configuration) is valid; an example entry is shown at the end of this section.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_alertmanager_notifications_failed_total"
]
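
For example, a minimal observability.alerts entry that sends critical alerts to a Slack webhook looks roughly like the following (the URL is a placeholder; see the alerting documentation for the full list of notifier types and fields):

"observability.alerts": [
  {
    "level": "critical",
    "notifier": {
      "type": "slack",
      "url": "https://hooks.slack.com/services/..."
    }
  }
]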

prometheus: container_cpu_usage

Descriptions:

  • prometheus: 99%+ container cpu usage total (1m average) across all cores by instance

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_container_cpu_usage"
]

prometheus: container_memory_usage

Descriptions:

  • prometheus: 99%+ container memory usage by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_container_memory_usage"
]

prometheus: container_restarts

Descriptions:

  • prometheus: 1+ container restarts every 5m by instance

Possible solutions:

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod prometheus (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p prometheus.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' prometheus (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the prometheus container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs prometheus (note this will include logs from the previous and currently running container).
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_container_restarts"
]

prometheus: fs_inodes_used

Descriptions:

  • prometheus: 3e+06+ fs inodes in use by instance

Possible solutions:

  • Refer to your OS or cloud provider's documentation for how to increase inodes.
  • Kubernetes: Consider provisioning more machines with fewer resources each.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_fs_inodes_used"
]

prometheus: provisioning_container_cpu_usage_long_term

Descriptions:

  • prometheus: 80%+ or less than 30% container cpu usage total (90th percentile over 1d) across all cores by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing CPU limits in the Deployment.yaml for the prometheus service.
    • Docker Compose: Consider increasing cpus: of the prometheus container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_cpu_usage_long_term"
]

prometheus: provisioning_container_memory_usage_long_term

Descriptions:

  • prometheus: 80%+ or less than 30% container memory usage (1d maximum) by instance for 336h0m0s

Possible solutions:

  • If usage is high:
    • Kubernetes: Consider increasing memory limits in the Deployment.yaml for the prometheus service.
    • Docker Compose: Consider increasing memory: of the prometheus container in docker-compose.yml.
  • If usage is low, consider decreasing the above values.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_memory_usage_long_term"
]

prometheus: provisioning_container_cpu_usage_short_term

Descriptions:

  • prometheus: 90%+ container cpu usage total (5m maximum) across all cores by instance for 30m0s

Possible solutions:

  • Kubernetes: Consider increasing CPU limits in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing cpus: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_cpu_usage_short_term"
]

prometheus: provisioning_container_memory_usage_short_term

Descriptions:

  • prometheus: 90%+ container memory usage (5m maximum) by instance

Possible solutions:

  • Kubernetes: Consider increasing the memory limit in the relevant Deployment.yaml.
  • Docker Compose: Consider increasing memory: of the prometheus container in docker-compose.yml.
  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "warning_prometheus_provisioning_container_memory_usage_short_term"
]

prometheus: pods_available_percentage

Descriptions:

  • prometheus: less than 90% of pods available for 10m0s

Possible solutions:

  • Silence this alert: If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
"observability.silenceAlerts": [
  "critical_prometheus_pods_available_percentage"
]