Back to the Basics: Kubernetes Service Selector
Today, you will read about something you are already familiar with. Hardly anyone in infrastructure engineering hasn't worked with Kubernetes, let alone with services. So why should you spend your time reading this article? The answer might surprise you.
Recap: how does it work
Let’s skip the technical mumbo jumbo. We all read somewhere that iptables does that magic for us. Let’s trust this statement since it hardly ever fails to be true, and when it does, the managed Kubernetes provider (like GCP GKE or AWS EKS) fixes it with the next update. The power of abstraction spares us this configuration. The only thing we have to know is the label selector, and even that turns out to be too hard for us. And that’s the point of today's story.
Let’s have a deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sayhi
  labels:
    app: sayhi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sayhi
  template:
    metadata:
      labels:
        app: sayhi
    spec:
      containers:
      - name: sayhi
        image: beranm14/sayhi
        ports:
        - containerPort: 3000
It deploys a pod with an app which says "hi🐈\n" on the endpoint /hi. And that’s it. If you are interested, check the repository with the app on GitHub. Now let’s create a service to expose the app to the cluster:
apiVersion: v1
kind: Service
metadata:
  name: sayhi
spec:
  selector:
    app: sayhi
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
Let’s check that everything is running with a simple curl while loop:
while true; do date; curl http://sayhi/hi; sleep 1; done
Sat Mar 25 17:24:58 UTC 2023
hi🐈
Sat Mar 25 17:24:59 UTC 2023
hi🐈
Sat Mar 25 17:25:00 UTC 2023
hi🐈
Sat Mar 25 17:25:01 UTC 2023
hi🐈
Sat Mar 25 17:25:02 UTC 2023
hi🐈
Keeping the app ready
Look at all these cats. It seems everything is as it should be. But to be sure our app keeps running smoothly, let’s add readiness probes. Those should disconnect a pod from the service once it is overloaded and stops responding to the readiness probes. Please check the documentation for further details.
The following YAML block needs to be added to the container definition in the deployment manifest:
readinessProbe:
  httpGet:
    path: /healthz
    port: 3000
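To make the placement concrete, here is a sketch of the container entry from the deployment with the probe in place. The path and port come from the probe block; periodSeconds and failureThreshold are optional fields added purely for illustration, and their values are assumptions rather than recommendations (they happen to match the Kubernetes defaults).

```yaml
# Fragment of spec.template.spec in the sayhi Deployment, not a complete manifest.
containers:
- name: sayhi
  image: beranm14/sayhi
  ports:
  - containerPort: 3000
  readinessProbe:
    httpGet:
      path: /healthz
      port: 3000
    # Illustrative tuning, assumed values: probe every 10 seconds and
    # mark the pod as not ready after 3 consecutive failures.
    periodSeconds: 10
    failureThreshold: 3
```

While the probe keeps failing, the pod stays running but is taken out of the service endpoints until the probe succeeds again.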
Now we have an app that says hi with a cat and disconnects once it has trouble delivering a cat. Everything works smoothly. Things are getting more stable and robust with minimal work required of programmers.
Jobs, pods and generating cats
Hardly any app can function without an occasional job. Those could be cron jobs or jobs related to the data migration needed to release a new version of an app. Kubernetes has a Job as an entity of its own. In our fictitious cat app, let’s generate cats. The job manifest can look like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: sayhi
  labels:
    app: sayhi
spec:
  template:
    metadata:
      labels:
        app: sayhi
    spec:
      containers:
      - name: generate-cats
        image: beranm14/sayhi-job
      restartPolicy: Never
The source code is also available on GitHub. It creates a new cat every second until the number of cats reaches the count in the environment variable COUNT_OF_CATS. And the output, surprise, surprise, is a lot of cats:
🐈
🐈
🐈
🐈
🐈
🐈
🐈
🐈
🐈
🐈
🐈
Oh no, something is happening to our sayhi app. Let’s see our while loop:
while true; do date; curl sayhi/hi; sleep 1; done
Sat Mar 25 17:58:58 UTC 2023
hi🐈
Sat Mar 25 17:58:59 UTC 2023
hi🐈
Sat Mar 25 17:59:00 UTC 2023
curl: (7) Failed to connect to sayhi port 80 after 1 ms: Connection refused
Sat Mar 25 17:59:01 UTC 2023
curl: (7) Failed to connect to sayhi port 80 after 1 ms: Connection refused
Sat Mar 25 17:59:02 UTC 2023
hi🐈
Sat Mar 25 17:59:03 UTC 2023
hi🐈
But why? The only thing we changed was adding the job generating the cats. If something were wrong with the sayhi app, its pod should have been disconnected by the readiness probe. Is that the reason? It cannot be: there is no "Readiness probe failed" message.
Why are our connections refused?
For those who read this carefully, all is already clear. We gave our job and our deployment the same label app: sayhi. Since the service picks its endpoints by matching labels, part of the traffic is forwarded to the pod generating our cats. Nothing listens on port 3000 in that pod, so those connections are refused. Once the generation is finished, everything is back to normal. The readiness probe does not help here: it is defined only for the container in the deployment, so the pod of the job counts as ready as soon as its container is running.
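One way to avoid the collision, sketched under the assumption that nothing else relies on the job's labels, is to give the job's pod template a label value that the service selector does not match. The value sayhi-job below is a made-up name for illustration; everything else is taken from the job manifest shown earlier.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sayhi
  labels:
    app: sayhi-job   # hypothetical value; labels on the Job object itself do not affect the Service
spec:
  template:
    metadata:
      labels:
        # This is the label that matters: it differs from the Service
        # selector (app: sayhi), so the pod is not added to the endpoints.
        app: sayhi-job
    spec:
      containers:
      - name: generate-cats
        image: beranm14/sayhi-job
      restartPolicy: Never
```

With this change the service should keep routing only to the deployment's pod while the job runs.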
This can have significant consequences during deployments and during the execution of jobs related to data migration if the labels are wrong. The issue is visible only for a brief period while the job runs: the service registers a new endpoint, and you can see the IP address of the job's pod in the endpoint list. But once the job is finished, the endpoint disappears.
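A complementary safeguard, again only a sketch, is to make the service selector more specific than a single shared label, so that a stray app: sayhi label on another workload is not enough to receive traffic. The component: web label is an assumption introduced for this example and is not part of the original manifests.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sayhi
spec:
  selector:
    # Every label listed here must match (logical AND).
    app: sayhi
    component: web   # hypothetical label, must also be set on the deployment's pod template
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
```

Note that the same component: web label has to be added to the pod template labels of the deployment first; otherwise the service would match no pods at all.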
Anyway, this little glitch in labels took me a few hours to figure out. It is lame. I hope you can benefit from my mistakes. Like and subscribe!