Google Container Registry lifecycle policy for image retention
Is your Google Container Registry filling up, eating storage and becoming expensive? How do you handle image retention as a service?
Amazon’s Elastic Container Registry has a feature called Lifecycle Policies to handle image retention. Google doesn’t have this feature. There has been a feature request in their tracker since Aug 2018, and there is no ETA for it so far…
Available solutions
There is a popular bash script from Ahmet and a Go service on Cloud Run from Seth, but neither of them covers the requirements I had. What exactly do I need?
Problem description
I want to scan my whole GCR and delete the digests that are:
- older than X days
- not being used in my kubernetes cluster
- not the most recent Y digests (I want to keep, say, the 20 most recent tagged digests)
I want to apply these lifecycle rules to all the images in the registry.
Say I have a few images in GCR with certain prefixes:
eu.gcr.io/my-project/foo/bar/my-service:123
eu.gcr.io is the docker registry endpoint
my-project is ID of my GCP project
foo/bar is the prefix (“repo”)
my-service is an image name
123 is a tag
my-service:123 is an image with a tag. But wait, what is the digest?
[Image: Image vs Layers, taken from https://windsock.io/explaining-docker-image-ids/]
A Docker image digest is an ID consisting of the hashing algorithm used and the computed hash. A digest can look like this:
@sha256:296e2378f7a14695b2f53101a3bd443f656f823c46d13bf6406b91e9e9950ef0
You can tag a digest with several tags, or with zero tags (an untagged image).
Let’s say I build an image my-service and push it to the Docker registry, tagging it with :123. The newly produced digest gets two tags, :123 and :latest. The digest that was tagged :latest before my push loses its :latest tag.
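You can inspect an image’s digests and their tags with gcloud; the default output shows DIGEST, TAGS and TIMESTAMP columns:
gcloud container images list-tags eu.gcr.io/my-project/foo/bar/my-service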
If I remove a tag from an image in GCR, I simply untag the digest; I don’t delete the digest itself.
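For example, this removes only the :123 tag; the digest and its layers stay in the registry:
gcloud container images untag -q eu.gcr.io/my-project/foo/bar/my-service:123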
What I can delete, in order to actually save space, is the digest itself, like this:
gcloud container images delete -q --force-delete-tags \
eu.gcr.io/my-project/foo/bar/my-service@sha256:296e2378f7a14695b2f53101a3bd443f656f823c46d13bf6406b91e9e9950ef0
Implementation
Then, what do I need to do?
- Recursively scan the registry for all image prefixes (e.g. eu.gcr.io/my-project/foo/bar/my-service); see the sketch below
- For each prefix, list all its digests and delete the ones that don’t match my rules
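gcloud lists only one level of the path hierarchy per call, so the scan has to recurse. A minimal sketch (the list_images helper is my own name, not a gcloud command):
# recursively walk the registry; gcloud lists one level per call
list_images() {
  gcloud container images list --repository="$1" --format='value(name)' |
  while read -r image; do
    echo "$image"          # e.g. eu.gcr.io/my-project/foo/bar/my-service
    list_images "$image"   # nested prefixes are repositories themselves
  done
}
list_images "eu.gcr.io/my-project"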
How to check whether they match the rules:
- sort them and preserve the most recent Y digests
- fetch the image:tag values of pods and replicaSets from the k8s cluster (all of them, even the ones scaled to zero; we’d need those images in case of a rollback), then go through the digests that belong to that image name and preserve every digest that has ANY tag used in the cluster
- check the remaining digests and delete the ones that are older than X days
A combined sketch follows after the building blocks below.
You can use standard kubectl to fetch the cluster data:
kubectl get rs,po --all-namespaces -o jsonpath='{..image}' | tr ' ' '\n'
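To narrow that down to the tags of a single image (assuming the workloads reference tags rather than digests; the image path is the one from the example above):
kubectl get rs,po --all-namespaces -o jsonpath='{..image}' | tr ' ' '\n' \
  | grep '^eu.gcr.io/my-project/foo/bar/my-service:' | cut -d: -f2- | sort -u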
GCR exposes the Docker Registry v2 API, so you can use a standard Docker client or just curl with a gcloud access token:
ACCESS_TOKEN=$(gcloud auth print-access-token)
curl --silent --show-error -u _token:"$ACCESS_TOKEN" -X GET "https://eu.gcr.io/v2/_catalog"
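The same token works for the per-image endpoints too; for example, tags/list (GCR extends the standard response with a manifest map that, as far as I can tell, includes per-digest timestamps):
curl --silent --show-error -u _token:"$ACCESS_TOKEN" \
  "https://eu.gcr.io/v2/my-project/foo/bar/my-service/tags/list"
Putting the building blocks together, a simplified sketch of the per-image logic could look like this. It is illustrative rather than the published script: KEEP_N, the GNU date call and the gcloud date filter are my assumptions, and it errs on the side of keeping too much:
IMAGE="eu.gcr.io/my-project/foo/bar/my-service"
KEEP_N=20                                   # Y: most recent digests to always keep
CUTOFF=$(date -u -d '-30 days' +%Y-%m-%d)   # X = 30 days (GNU date)

# image:tag values currently referenced by pods and replicaSets in the cluster
USED=$(kubectl get rs,po --all-namespaces -o jsonpath='{..image}' | tr ' ' '\n' | sort -u)

# digests older than the cutoff, newest first; skip the KEEP_N most recent of them
gcloud container images list-tags "$IMAGE" --limit=999999 --sort-by=~timestamp \
  --filter="timestamp.datetime < '${CUTOFF}'" --format=json |
jq -r '.[] | [.digest, (.tags | join(","))] | @tsv' |
tail -n +$((KEEP_N + 1)) |
while IFS=$'\t' read -r digest tags; do
  # preserve the digest if any of its tags is deployed in the cluster
  for tag in ${tags//,/ }; do
    grep -qxF "$IMAGE:$tag" <<<"$USED" && continue 2
  done
  gcloud container images delete -q --force-delete-tags "$IMAGE@$digest"
done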
I implemented all of this using bash/jq (yep, that wasn’t the smartest idea) and published it to GitHub:
Right now I’m running this in a GitLab CI pipeline on a cron schedule (once a day) to evaluate its dry-run logs for production GCP projects.
Summary
I’m planning to rewrite this in Python (pykube-ng and docker-py) if Google doesn’t come up with an ETA for this feature.