TL;DR: skip to the section "Ok, so what did you do?"
You might be wondering why we even bother with this “how to”. Can’t you just go to Google Cloud Monitoring, find a correctly labeled metric, and move on? That might be true by the time you are reading this article, but it wasn’t in my case. For some reason, monitoring Out Of Memory (OOM) kills on GKE doesn’t seem to be a priority. The situation might have changed since, so go and check. If it hasn’t, let’s continue.
Using metrics
As I was trying to explain, my situation currently looks like this: there is no metric called OOMKilled, only metrics related to memory.
You can watch your memory metrics and detect sudden spikes. That won’t help you when the spike happens too fast: monitoring simply doesn’t have the sampling frequency to catch it. If the memory creeps up towards the limit over hours, you won’t notice either; there is no spike to see, unfortunately. And any such indirect measurement loses its meaning anyway when you can just ask the Kubernetes control plane whether any pod reported OOMKilled.
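To make that concrete, here is a minimal sketch of what asking the control plane looks like. It assumes the official kubernetes Python client and the development namespace used in the examples below; it is an illustration, not the final solution from this article:

from kubernetes import client, config

# Use the local kubeconfig; inside a cluster you would call load_incluster_config() instead.
config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("development").items:
    for status in pod.status.container_statuses or []:
        terminated = status.last_state.terminated
        if terminated and terminated.reason == "OOMKilled":
            print(f"{pod.metadata.name}/{status.name} OOM-killed at {terminated.finished_at}")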
How about checking logs?
The thing is that OOM kills don’t even get logged properly; they only show up when the state of the deployment changes. Let’s use the following deployment for testing. The perl one-liner keeps pushing megabyte-sized strings into an array, so the container blows past its 128Mi limit almost immediately:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oom-tester
  namespace: development
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oom-tester
  template:
    metadata:
      labels:
        app: oom-tester
    spec:
      containers:
        - name: test
          image: ubuntu
          command:
            - "perl"
            - "-wE"
            - "my @xs; for (1..2**20) { push @xs, q{a} x 2**20 }; say scalar @xs;"
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
You might think this generates a log message in Logs Explorer – well, no. Not exactly. If you queried Logs Explorer for OOMKill, you would get the following:
"I0417 15:15:46.250863 1833 log\_monitor.go:160\] New status generated: &{Source:kernel-monitor Events:\[{Severity:warn Timestamp:2022\-04-17 15:15:45.770989804 +0000 UTC m=+117.097198737 Reason:OOMKilling Message:Memory cgroup out of memory: Killed process 4817 (perl) total-vm:138468kB, anon-rss:130352kB, file-rss:4548kB, shmem-rss:0kB, UID:0 pgtables:308kB oom\_score\_adj:983}\] Conditions:\[{Type:KernelDeadlock Status:False Transition:2022\-04-17 15:13:53.38722946 +0000 UTC m=+4.713438426 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:False Transition:2022\-04-17 15:13:53.387229627 +0000 UTC m=+4.713438553 Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}\]}"
This log comes from node-problem-detector, so the system knows that perl is doing something wrong, but there is no mention of which pod is having the issue. Furthermore, the log has resource type Kubernetes Node, not Kubernetes Cluster, Container, or Pod, which is hardly the first place I would go to check.
Let’s change the deployment so it doesn’t generate OOM kills on repeat and just sleeps for a while:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oom-tester
  namespace: development
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oom-tester
  template:
    metadata:
      labels:
        app: oom-tester
    spec:
      containers:
        - name: test
          image: ubuntu
          command:
            - "sleep"
            - "999999"
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
Now, among many other logs, we can see the message we were hoping for:
terminated: {
  containerID: "containerd://19fca44bab42c657219189547d841fe31c2d4a6aced9df0036adc88f0418c760"
  exitCode: 137
  finishedAt: "2022-04-17T15:31:07Z"
  reason: "OOMKilled"
  startedAt: "2022-04-17T15:31:06Z"
}
It’s obvious now that the state of the pod is only reported once it is changed by kubectl, not when the pod itself fails.
Well, Martin, you are not the first person who had this issue
Duh! I am aware of that. I googled for anything reasonable, and the only approach I could find was monitoring messages from the container runtime socket. One example is kubernetes-oomkill-exporter. It watches the Docker socket and exports the OOM kills as a Prometheus metric. It also ships a DaemonSet deployment to make it work for you.
Kubernetes-oomkill-exporter seems cool. It might have a few security issues, but since it is a monitoring tool with public source code, that wouldn’t be such a deal-breaker. The problem is that Google Cloud Monitoring doesn’t support Prometheus directly. You would need to install workload monitoring, which is not really designed for this purpose, and the representation of Prometheus metrics in Google Cloud Monitoring is not as nice as in Prometheus itself.
Ok, so what did you do?
Glad you finally asked. Well, my solution is super simple: just scrape the Kubernetes control plane, check whether any pod has the state OOMKilled, and report it to Cloud Monitoring. That’s it.
Pod statuses are available under the following URL, queried here from inside the cluster:
import requests

# Ask the API server for all pods in the namespace; authenticate with the
# service-account token and verify TLS against the cluster CA certificate.
response = requests.get(
    f"{APISERVER}/api/v1/namespaces/{NAMESPACE}/pods",
    verify=f"{SERVICEACCOUNT}/ca.crt",
    headers={'Authorization': f"Bearer {TOKEN}"}
)
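The APISERVER, SERVICEACCOUNT, NAMESPACE, and TOKEN variables aren’t shown here; when running inside a pod, they would typically come from the default in-cluster service-account mount, along the lines of:

# Standard in-cluster defaults; this is my assumption of the setup, not code from the module.
APISERVER = "https://kubernetes.default.svc"
SERVICEACCOUNT = "/var/run/secrets/kubernetes.io/serviceaccount"

with open(f"{SERVICEACCOUNT}/namespace") as f:
    NAMESPACE = f.read().strip()
with open(f"{SERVICEACCOUNT}/token") as f:
    TOKEN = f.read().strip()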
The check then just compares the last known state of each container in the pod:
# lastState holds the previous termination record of the container, if any.
last_state = container_statuses.get('lastState', {})
if not last_state:  # no previous termination recorded
    continue
for k, v in last_state.items():
    if k == "terminated" and v.get('reason') == "OOMKilled":
        report_pod = True
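The “report it to Cloud Monitoring” part isn’t shown above. One common way is to write a custom metric with the google-cloud-monitoring client; the sketch below illustrates that idea, where the metric type, the pod_name label, and the PROJECT_ID and pod_name variables are my assumptions rather than necessarily what the module does:

import time
from google.cloud import monitoring_v3

# Write a single data point of a custom metric for the OOM-killed pod.
client = monitoring_v3.MetricServiceClient()
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/oom_kill"  # assumed metric name
series.metric.labels["pod_name"] = pod_name            # assumed label
series.resource.type = "global"

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 10**9)}}
)
series.points = [
    monitoring_v3.Point({"interval": interval, "value": {"int64_value": 1}})
]
client.create_time_series(name=f"projects/{PROJECT_ID}", time_series=[series])

An alerting policy or dashboard can then watch this custom metric.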
If you are interested, you can deploy the whole thing with the Terraform module. It also installs a Grafana dashboard with an alert. Once an OOM kill happens, you will receive an alert with the pod name, and it will close once the pod moves back to a state without an OOM kill.
Final notes
Even my solution doesn’t feel good enough. If you don’t have Prometheus in your cluster, your hands are somewhat tied and you have to rely on Google Cloud Monitoring alone. Maybe Google will create a new metric one day and my work will become worthless. And that would be fine!
If you find anything better, please let me know. I will gladly mention you in the introduction of this article with a link to the better solution.