Ackee Node.js Optimized GitLab CI Runners
After one too many tedious maintenance tasks on the servers running our GitLab CI Runners, we decided to move this last piece of infrastructure into the cloud, because we would rather be DevOps Engineers than System Administrators. Nobody in this situation wants to clean up disk space, check IOPS, and do other ridiculous chores every single day, right?
So the task was quite simple: move this responsibility to Google, make the runners as good as or even better than they are now, and, last and most important (and what this whole article is about), check the cost and optimize it as much as possible.
Where to begin
In the past, we used the Terraform module for AWS, which is pretty awesome, and we wanted something exactly like that for GCP (Terraform is how we provision things). After some quick research, we found only one promising module for GCP, which was great for inspiration but not for our production. We therefore decided to start from scratch, and after a deep dive into the documentation of GitLab CI Runners, docker-machine, and friends, we had a first working setup.
The first iteration had the following objectives:
- Scaling runners and removing them after 10 minutes of inactivity
- Distributed cache in a GCS bucket with a 7-day TTL
- 8 GB memory (e2-standard-2) and 100 GB storage for the runners
- Scaling by working hours: 0-12 runners during the day, 0-4 otherwise
- Initial token setup should be done automatically
- Private instances with NAT
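To give an idea of where these objectives end up, here is a minimal sketch of the relevant parts of a runner's `config.toml`. Section and key names follow the gitlab-runner documentation; the runner name, bucket name, and concrete values are just illustrations of the targets above (the 7-day cache TTL is a lifecycle rule on the bucket itself, and the day/night 0-12 vs. 0-4 split needs per-period autoscaling sections on top of this):

```toml
concurrent = 12                   # daytime cap from the objectives above

[[runners]]
  name     = "gcp-runner"         # hypothetical name
  executor = "docker+machine"
  limit    = 12
  [runners.cache]
    Type   = "gcs"
    Shared = true
    [runners.cache.gcs]
      BucketName = "ci-runner-cache"   # TTL handled by a bucket lifecycle rule
  [runners.machine]
    IdleCount     = 0
    IdleTime      = 600           # remove runners after 10 minutes of inactivity
    MachineDriver = "google"
    MachineName   = "runner-%s"
```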
When everything was successfully tested, we deployed the brand-new module into our production for 24 hours to check price and performance. It worked, but honestly we couldn't say it was as good as, let alone better than, the previous setup; if anything, the opposite: performance was worse and the cost came to nearly 50€ for a single day. We were a bit confused, but not done yet!
Advanced caching for Preemptible instances
The idea for the second iteration was to use some mirrors, because given the short lifetime of the runners, hardly anything was cached at all. We added a GCR mirror and made sure Private Google Access was enabled. This saved about 4€, so we can't be sure whether the mirror helped or whether it was just lower traffic that day.
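For context, a registry mirror can be handed to the Docker engine of each spawned machine through docker-machine's engine options; a sketch of the relevant `config.toml` fragment (the option name comes from docker-machine, and mirror.gcr.io is Google's public pull-through mirror of Docker Hub):

```toml
[runners.machine]
  MachineOptions = [
    "engine-registry-mirror=https://mirror.gcr.io",  # cache Docker Hub pulls
  ]
```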
The third iteration was more interesting. We switched the instances to a more lightweight type (n2d-standard-2) and enabled them as Preemptible (Spot instances didn't exist on GCP yet). That brought the daily cost down to 30€ with the same performance, and surprisingly we didn't encounter any issues. In docker-machine this takes just one flag:
`--machine-machine-options "google-preemptible=true"`
With each optimization, the controller startup script grew bigger and bigger. Following the documentation, we tried to move all `gitlab-runner register` parameters into a Terraform template and pass it to the `gitlab-runner register` command as a template to make the script more readable. However, some properties can be passed only via the template (partly because of escaping) and others only as CLI parameters.
```shell
# Register GCP runner
gitlab-runner register -n \
  --name "${var.controller_gitlab_name} 💪" \
  --url ${var.gitlab_url} \
  --registration-token ${var.runner_token} \
  --executor "docker+machine" \
  --docker-image "${var.runner_docker_image}" \
  ${join("\n", formatlist("--docker-volumes \"%s\" \\", var.runner_mount_volumes))}
  --tag-list "${var.controller_gitlab_tags}" \
  --run-untagged="${var.controller_gitlab_untagged}" \
  --pre-build-script "echo \"registry=http://$IP:4975\" >> .npmrc" \
  --template-config "/tmp/config.toml"
```
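The file referenced by `--template-config` is just a partial `config.toml` that gitlab-runner merges into the generated runner entry. A sketch of what such a template might contain; it is handy for list-valued options that are painful to escape on the command line (the project ID and values here are illustrative, not our actual settings):

```toml
[[runners]]
  output_limit = 16384
  [runners.machine]
    MachineDriver = "google"
    MachineOptions = [
      "google-project=my-gcp-project",       # hypothetical project ID
      "google-machine-type=n2d-standard-2",
      "google-preemptible=true",
      "google-disk-size=100",
    ]
```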
The concurrency number (which limits how many jobs can run at once) cannot be set by either of these options. You are forced to use `sed`, which can be pleasant or a nightmare, depending on your personal appetite. The use of `sed` is our DevOps team's favorite argument after distro wars, but we can all agree that a native parameter for concurrency would be nice.
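Concretely, the `sed` workaround boils down to patching the generated file after registration, something like this (the path and target value below are stand-ins for illustration; the real file lives at `/etc/gitlab-runner/config.toml`):

```shell
CONFIG=/tmp/config.toml

# Stand-in for the file that gitlab-runner register produces
cat > "$CONFIG" <<'EOF'
concurrent = 1
check_interval = 0

[[runners]]
  name = "gcp-runner"
EOF

# Bump the global concurrency, since no CLI flag or template key sets it
sed -i 's/^concurrent = .*/concurrent = 12/' "$CONFIG"
grep '^concurrent' "$CONFIG"   # prints: concurrent = 12
```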
In the fourth iteration, we continued the mirror idea and implemented mirrors for Docker and NPM (Verdaccio), both running on the controller, which raised its hardware requirements and forced us to scale it up. Despite that, it was a great deal: we cut the price in half and performance was significantly better.
NAT Gateways are overrated
One last, quite unreasonably high cost item remained: the data processing charge of the NAT gateway. So, what about removing the NAT and giving the runners public IP addresses? At first we thought that would be pricier, but the opposite was true. We also slightly adjusted the hardware, and the final setup was done.
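With the docker-machine GCE driver, internal-only networking is opt-in, so going back to public addresses is mostly a matter of dropping the relevant option. A sketch (option names per the driver; the surrounding entries are illustrative):

```toml
[runners.machine]
  MachineOptions = [
    # "google-use-internal-ip=true",  # removed: runners now get external
                                      # IPs, so no NAT data-processing fee
    "google-preemptible=true",
  ]
```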
Optimized Gitlab CI Runners: What’s next
In the end, after days of optimizing and testing, the cloud runners performed better than the old setup at a cost of around 6€ per day. Ready for production.
The next steps will be to create customized cloud runners for our Android projects, where we will face new challenges like Gradle caching and more resource-hungry builds.