How to troubleshoot pod evictions in Sourcegraph Kubernetes deployments
This document explains how to troubleshoot and resolve pod evictions, which can cause data loss in ephemeral storage.
It walks you step by step through finding out why an eviction occurred and how to fix it.
This document assumes that you have deployed Sourcegraph on Kubernetes and you are a site admin for your organization.
Steps to troubleshoot
kubectl describe pod $EVICTEDPOD
- Check the error message in the output.
- If the error is:
Pod ephemeral local storage usage exceeds the total limit of containers xGi.
- Check the ephemeral-storage Limits and Requests, for example ephemeral-storage: xGi. Also, check the cache size for the pod, where $PODNAME_CACHE_SIZE_MB: x0000 (x is an integer).
- In $PODNAME.Deployment.yaml, raise the ephemeral-storage figures to a preferred storage size for your node and set CACHE_SIZE_MB to a size lower than the ephemeral storage limit.
- If preferred, enable auto scaling by increasing the number of replicas.
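
The two checks in the steps above (recognizing the eviction message, and confirming that the cache size stays below the ephemeral storage limit) can be sketched as small shell helpers. This is a minimal sketch, not part of Sourcegraph or kubectl: the function names is_ephemeral_eviction and cache_fits are hypothetical, the sample values are illustrative, and the size comparison assumes that 1 MB is 10^6 bytes and 1 Gi is 2^30 bytes.

```shell
# Hypothetical helper: reads pod description text on stdin and succeeds
# (exit status 0) if it contains the ephemeral-storage eviction message.
is_ephemeral_eviction() {
  grep -q 'ephemeral local storage usage exceeds the total limit of containers'
}

# Hypothetical helper: succeeds if a cache of $1 MB is smaller than an
# ephemeral-storage limit of $2 Gi.
# Assumption: 1 MB = 10^6 bytes and 1 Gi = 2^30 bytes.
cache_fits() {
  cache_bytes=$(( $1 * 1000000 ))
  limit_bytes=$(( $2 * 1073741824 ))
  [ "$cache_bytes" -lt "$limit_bytes" ]
}

# Illustrative values only: a 100000 MB cache against a 100Gi limit.
if cache_fits 100000 100; then
  echo "cache fits under the ephemeral storage limit"
else
  echo "cache is too large for the ephemeral storage limit"
fi
```

Against a live cluster, the first helper would be fed the real output, for example: kubectl describe pod $EVICTEDPOD | is_ephemeral_eviction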