Deceptive rm Command in Dockerfile
Subject
There are already many Dockerfile best practices explained on various websites (I especially like this one -> https://sysdig.com/blog/dockerfile-best-practices/), yet the subject of this post is still not emphasized enough. So I wanted to write a short, dedicated post about it. It covers:
- Keeping the size of Docker images small
- Dangers of putting credentials into Docker images
Consider the following two Dockerfiles:
Dockerfile-samelayer
FROM alpine
RUN fallocate -l 256M dummy-file && rm -f dummy-file
Dockerfile-differentlayer
FROM alpine
RUN fallocate -l 256M dummy-file
RUN rm -f dummy-file
(fallocate is a Linux-specific command that preallocates space for a file; here it simply creates a 256 MB dummy file.)
How do you think these two images differ, specifically in size and final filesystem? If Docker has been a part of your life for some time already, you most likely know the answer.
To be honest, if someone had asked me this question back when I started learning containerization, I'd have said the sizes and the filesystems would be identical for both Dockerfiles. After all, we are removing dummy-file in both Dockerfiles, right?
Well, I'd have been wrong…
Inspection
After building the images, let’s take a look at how big the images are:
Hmm… Even though we remove dummy-file in both Dockerfiles, the image built from Dockerfile-differentlayer is huge, as if we hadn't removed the file at all.
By using the magnificent https://github.com/wagoodman/dive tool, we can understand why.
Dockerfile-samelayer
Let's look at the filesystem of the Dockerfile-samelayer image first:
No trace of dummy-file. Nothing surprising here.
Let’s take a look at the image size now:
As expected, the file is actually removed from the image (the final image size is 5.6 MB), so dummy-file does not contribute to the size of the image.
Dockerfile-differentlayer
Let's see if dummy-file is in the filesystem this time:
The filesystem is identical to that of Dockerfile-samelayer in every way: dummy-file is, again, not in the filesystem.
Now let's check the image size; this is where things get a bit complicated:
The rm command is not doing what we expect it to do. Its only effect is that dummy-file no longer appears in the filesystem. (Even that is not entirely true; I'll come back to it later.) The file still contributes to the final image size, which is 274 MB.
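The numbers add up. Here is a minimal back-of-the-envelope check. It rests on two assumptions that are not stated above: that fallocate -l 256M allocates 256 MiB (binary suffix), and that the Docker CLI reports image sizes in decimal megabytes. Under those assumptions, the dummy file alone accounts for almost exactly the difference between the two images.

```python
# Back-of-the-envelope check of the two image sizes reported above.
# Assumptions: `fallocate -l 256M` means 256 MiB, and Docker prints
# image sizes in decimal megabytes (1 MB = 1,000,000 bytes).
MIB = 1024 ** 2
MB = 1000 ** 2

dummy_file_bytes = 256 * MIB   # size of dummy-file
base_image_mb = 5.6            # size of the samelayer image, as reported above

dummy_file_mb = dummy_file_bytes / MB
expected_mb = base_image_mb + dummy_file_mb

print(f"dummy-file: {dummy_file_mb:.1f} MB")                  # dummy-file: 268.4 MB
print(f"differentlayer image: ~{expected_mb:.0f} MB")         # differentlayer image: ~274 MB
```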
So, what happened?
Explanation
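Layers happened. Every instruction that changes the filesystem, such as RUN, adds a new layer to the image, and that layer stores only the changes made by the instruction. Docker stacks these layers with a union filesystem, so the image size is the sum of all layers, and a layer cannot have a negative size. Deleting a file in a later layer therefore does not shrink the layer that added it. The later layer merely records a marker (a so-called whiteout) that hides the file in the merged view.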
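To make this concrete, here is a toy model of layering. It is only a sketch of the idea, not Docker's actual implementation. The model makes several assumptions:
- A filesystem is a dict mapping path to size in bytes.
- A RUN layer is the diff between the snapshots taken before and after the instruction.
- A deletion is recorded as a zero-byte ".wh.<name>" whiteout entry, following the convention used in OCI/Docker layer tarballs.

The helper names (diff_layer, merged_view, image_size) and the 1 MiB base layer are hypothetical.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."  # "<dir>/.wh.<name>" hides <dir>/<name> from lower layers
MIB = 1024 ** 2


def whiteout_for(path: str) -> str:
    """Name of the marker that records the deletion of `path`."""
    head, name = posixpath.split(path)
    return posixpath.join(head, WHITEOUT_PREFIX + name)


def diff_layer(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Model one RUN instruction: store only what changed between two snapshots.

    Simplification: a file counts as changed if its size differs.
    """
    layer = {path: size for path, size in after.items() if before.get(path) != size}
    for path in before.keys() - after.keys():
        layer[whiteout_for(path)] = 0  # a deletion costs 0 bytes; it never subtracts
    return layer


def merged_view(layers: list[dict[str, int]]) -> dict[str, int]:
    """What a container sees: apply layers bottom-up, letting whiteouts hide files."""
    view: dict[str, int] = {}
    for layer in layers:
        for path, size in layer.items():
            head, name = posixpath.split(path)
            if name.startswith(WHITEOUT_PREFIX):
                view.pop(posixpath.join(head, name[len(WHITEOUT_PREFIX):]), None)
            else:
                view[path] = size
    return view


def image_size(layers: list[dict[str, int]]) -> int:
    """The image size is simply the sum of everything stored in every layer."""
    return sum(sum(layer.values()) for layer in layers)


# Hypothetical stand-in for the alpine base layer.
base = {"bin/busybox": 1 * MIB}
with_dummy = {**base, "dummy-file": 256 * MIB}

# Dockerfile-samelayer: one RUN creates *and* deletes the file, so before == after.
samelayer = [base, diff_layer(base, base)]

# Dockerfile-differentlayer: one RUN creates the file, the next RUN deletes it.
differentlayer = [base, diff_layer(base, with_dummy), diff_layer(with_dummy, base)]

print(merged_view(samelayer) == merged_view(differentlayer))  # True
print(image_size(samelayer) // MIB, image_size(differentlayer) // MIB)  # 1 257
```

Both images present the same merged filesystem. The second one still stores the 256 MiB in its middle layer, and the third layer adds only a zero-byte whiteout on top.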
But where is dummy-file? We know it's somewhere, but where?
In our example, the first layer comes from the alpine base image. We created the file in the second layer and removed it in the third. This means dummy-file is still stored in the second layer; the third layer only hides it (see Union filesystem). To prove this, let's view the filesystem of the second layer of rm-test:differentlayer:
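If you don't have dive at hand, you can do a similar check with Python's standard library. The sketch below assumes the image was exported with docker save -o differentlayer.tar rm-test:differentlayer, and that the archive contains a single image. It relies on manifest.json listing that image's layer tarballs in order, base layer first. The function names layer_listings and find_in_layers are hypothetical.

```python
import json
import posixpath
import tarfile


def layer_listings(image_tar: str) -> list[dict[str, int]]:
    """Read an archive produced by `docker save` and return one {path: size}
    dict per layer, ordered from the base layer to the top layer."""
    listings = []
    with tarfile.open(image_tar) as image:
        # manifest.json lists the layer tarballs of each saved image, in order.
        # Assumption: the archive holds exactly one image.
        manifest = json.load(image.extractfile("manifest.json"))
        for layer_path in manifest[0]["Layers"]:
            with tarfile.open(fileobj=image.extractfile(layer_path)) as layer:
                listings.append({
                    # Normalise names such as "./etc/" or "/etc" to "etc".
                    posixpath.normpath(member.name).lstrip("/"): member.size
                    for member in layer.getmembers()
                })
    return listings


def find_in_layers(listings: list[dict[str, int]], path: str) -> list[tuple[int, int]]:
    """Return (layer index, stored size) for every layer that physically contains `path`."""
    return [(index, layer[path]) for index, layer in enumerate(listings) if path in layer]
```

For example, find_in_layers(layer_listings("differentlayer.tar"), "dummy-file") should point at the second layer (index 1). The third layer should carry only a .wh.dummy-file marker.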
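Here we can see that dummy-file is still accessible! This is also one of the reasons why sensitive data, such as credentials, shouldn't be put into Docker images, even if a later step deletes it: anyone who can pull the image can extract the earlier layer and read the data. It asks for all sorts of trouble.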
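Because a deleted file survives in its original layer, it can be worth auditing an image for such leftovers. The sketch below takes per-layer listings, base layer first, each a dict mapping path to size in bytes. You could build them by reading each layer tarball of a docker save archive with Python's tarfile module. The function reports every file that a later whiteout deletes but that is still stored in the image.

The sketch makes these assumptions:
- Deletions follow the ".wh.<name>" whiteout convention.
- Opaque-directory markers are ignored.
- The function name hidden_files is hypothetical.

```python
import posixpath

WHITEOUT = ".wh."                 # "<dir>/.wh.<name>" deletes <dir>/<name> from lower layers
OPAQUE_WHITEOUT = ".wh..wh..opq"  # opaque-directory marker; ignored in this sketch


def hidden_files(listings: list[dict[str, int]]) -> dict[str, int]:
    """Return files deleted by a later layer but still stored in an earlier one,
    mapped to the number of bytes they still occupy in the image."""
    stored: dict[str, int] = {}  # visible path -> bytes stored for it so far
    hidden: dict[str, int] = {}
    for layer in listings:
        # A layer's whiteouts only affect the layers below it, so apply them first.
        for path in layer:
            head, name = posixpath.split(path)
            if not name.startswith(WHITEOUT) or name == OPAQUE_WHITEOUT:
                continue
            target = posixpath.join(head, name[len(WHITEOUT):])
            # Deleting a directory hides everything underneath it as well.
            doomed = [p for p in stored if p == target or p.startswith(target + "/")]
            for p in doomed:
                hidden[p] = hidden.get(p, 0) + stored.pop(p)
        # Then record the regular entries this layer adds. An overwrite keeps the
        # old copy in its own layer, so the bytes accumulate.
        for path, size in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT):
                stored[path] = stored.get(path, 0) + size
    # Directories have size 0; keep only entries that actually occupy bytes.
    return {p: size for p, size in hidden.items() if size > 0}
```

For the differentlayer image this would flag dummy-file. For the samelayer image it would report nothing, because that file never reached any layer.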
So, is it really impossible to actually remove a file after the image is created?
No, but I consider it a last-resort solution. Only if I'm reaaaaally desperate might I give it a try. If you are still interested, see -> https://medium.com/@samhavens/how-to-make-a-docker-container-smaller-by-deleting-files-7354b5c6c8f1