Deceptive rm Command in Dockerfile
There are already many Dockerfile best practices explained on various websites (I especially like this one -> https://sysdig.com/blog/dockerfile-best-practices/); yet, the subject of this post is still not emphasized enough. Thus, I wanted to write a short, dedicated post. It’ll be about:
- Keeping the size of Docker images small
- Dangers of putting credentials into Docker images
Consider the following two Dockerfiles:
Dockerfile-samelayer:
RUN fallocate -l 256M dummy-file && rm -f dummy-file
Dockerfile-differentlayer:
RUN fallocate -l 256M dummy-file
RUN rm -f dummy-file
(fallocate is a Linux-specific command to generate dummy files.)
How do you think these two images differ, specifically in size and final filesystem? If Docker has been a part of your life for some time already, you most likely know the answer.
To be honest, if someone had asked me this question back when I started learning containerization, I’d have said the sizes and the filesystems would be identical for both Dockerfiles. After all, we are removing the dummy-file in both Dockerfiles, right?
Well, I’d be wrong…
After building the images, let’s take a look at how big the images are:
Hmm… Even though we remove the dummy-file in both Dockerfiles, the image built from Dockerfile-differentlayer is huge, as if we hadn’t removed the file at all.
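If you want to reproduce the comparison yourself, the build and the size check could look roughly like the sketch below. It assumes the two files are saved as Dockerfile-samelayer and Dockerfile-differentlayer in the current directory; the rm-demo tags and the function names are made up for this example.

```shell
#!/bin/sh
# Assumed file names: Dockerfile-samelayer and Dockerfile-differentlayer,
# both in the current directory. The rm-demo tags are hypothetical.
build_demo_images() {
    docker build -f Dockerfile-samelayer -t rm-demo:samelayer . &&
    docker build -f Dockerfile-differentlayer -t rm-demo:differentlayer .
}

# List every tag of the rm-demo repository together with its size.
show_demo_sizes() {
    docker image ls rm-demo
}
```

Running `build_demo_images && show_demo_sizes` should list both tags side by side, which makes the size gap easy to spot.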
By using the magnificent https://github.com/wagoodman/dive tool, we can understand why.
Take a look at the filesystem of Dockerfile-samelayer first:
No trace of dummy-file. Nothing surprising here.
Let’s take a look at the image size now:
As expected, the file is actually removed from the image (the final image size is 5.6 MB); therefore, the dummy-file does not contribute to the size of the image.
Now for Dockerfile-differentlayer. Let’s see if the dummy-file is in the filesystem this time:
The filesystem is identical to that of Dockerfile-samelayer in every way. dummy-file is again not in the filesystem.
Let’s check the image size, where things get a bit complicated:
The rm command is not doing what it is supposed to do. Its only effect is that there is no trace of dummy-file in the final filesystem. (This is also not entirely true; I’ll come back to that later.) The file is still contributing to the final image size (the final image size is 274 MB).
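If dive is not at hand, Docker’s built-in history command gives a rougher version of the same per-layer picture. A minimal sketch, where the function name is my own and the image tag in the usage line is a hypothetical one:

```shell
#!/bin/sh
# Print one line per layer: its size, then the instruction that created it.
# $1: image reference
layer_sizes() {
    docker history --format '{{.Size}}  {{.CreatedBy}}' "$1"
}
```

For example, `layer_sizes rm-demo:differentlayer` should show the layer created by fallocate with its full size, while the rm layer adds next to nothing.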
So, what happened?
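Layers happened. In Docker, the image size is the sum of the sizes of all its layers, and a layer cannot have a negative size. That’s because Docker uses a union filesystem: each layer only records changes on top of the layers below it, so deleting a file in a later layer hides it without freeing the space it occupies in the earlier layer.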
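The arithmetic can be tried out without Docker at all. The script below is only a toy model, not how Docker stores layers on disk: each directory stands in for one layer, the file is 1 MiB instead of 256M, and the deletion is recorded as an empty `.wh.` marker file, the whiteout convention used in image layer archives.

```shell
#!/bin/sh
# Toy model only: each directory stands in for one image layer.
set -eu

work=$(mktemp -d)
mkdir "$work/layer1" "$work/layer2" "$work/layer3"

# Layer 2: the step that creates the file (1 MiB here instead of 256M).
dd if=/dev/zero of="$work/layer2/dummy-file" bs=1024 count=1024 2>/dev/null

# Layer 3: the rm step. The deletion is recorded as an empty marker file
# (the ".wh." whiteout convention); layer 2 itself is left untouched.
: > "$work/layer3/.wh.dummy-file"

# Total bytes of all regular files stored in one layer directory.
layer_bytes() {
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

total=0
for layer in layer1 layer2 layer3; do
    size=$(layer_bytes "$work/$layer")
    echo "$layer: $size bytes"
    total=$((total + size))
done
echo "image total: $total bytes"

rm -rf "$work"
```

The total stays at 1048576 bytes: the marker in the third layer adds nothing, but it cannot subtract anything from the second layer either.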
But where is the dummy-file? We know it’s somewhere, but where?
In our example, we created the file in the second layer but removed it in the third layer, which means dummy-file is still in the second layer. (See Union filesystem.) To prove this, let’s view the filesystem of the second layer:
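Here we can see that the dummy-file is still accessible! This is also one of the reasons why sensitive data shouldn’t be put into Docker images: deleting a credential in a later layer does not remove it from the image, and anyone who can pull the image can still read the earlier layer. It is asking for all sorts of trouble.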
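The same point can be checked from the command line. The sketch below saves an image with docker save, unpacks the archive, and reports every layer archive that still contains a given path. The function name is my own, and because the internal layout of the saved archive differs between Docker versions, the script simply tries every file in it as a tar archive instead of assuming a particular layout.

```shell
#!/bin/sh
# Report which layer archives inside a saved image still contain a path.
# $1: image reference, $2: path relative to the image root (e.g. dummy-file)
find_in_layers() {
    tmp=$(mktemp -d)
    docker save -o "$tmp/image.tar" "$1" || return 1
    mkdir "$tmp/unpacked"
    tar -xf "$tmp/image.tar" -C "$tmp/unpacked"
    # Try every file in the saved image as a tar archive; files that are
    # not archives (JSON metadata) make tar fail and are skipped.
    find "$tmp/unpacked" -type f | while read -r blob; do
        if tar -tf "$blob" 2>/dev/null | grep -Fxq -- "$2"; then
            echo "still present in: ${blob#"$tmp/unpacked/"}"
        fi
    done
    rm -rf "$tmp"
}
```

Called as `find_in_layers <image> dummy-file` on the image built from Dockerfile-differentlayer, it should point at the layer that created the file. Swap dummy-file for the path of a "deleted" credential and the risk becomes obvious.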
So, is it really impossible to actually remove a file after the image is created?
No, it is possible, but I consider it a last resort. I might give it a try only if I’m reaaaaally desperate. See this if you are still interested -> https://medium.com/@samhavens/how-to-make-a-docker-container-smaller-by-deleting-files-7354b5c6c8f1
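For reference, one common way to squash an image after the fact is to export a container’s filesystem and import it back as a single-layer image. This is a sketch of that general technique, not necessarily the method described in the linked post, and the function name is my own.

```shell
#!/bin/sh
# Flatten an image into a single layer. $1: source image, $2: new tag.
# Caveats: import keeps only the filesystem, so CMD, ENV, and similar
# metadata are not carried over, and docker create needs the source
# image to define a default command.
flatten_image() {
    cid=$(docker create "$1") || return 1
    docker export "$cid" | docker import - "$2"
    docker rm "$cid" >/dev/null
}
```

Since the exported filesystem is the final merged view, files deleted in later layers are not part of the new image. Losing the image metadata and the layer cache is a big part of why this stays a last resort.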