How to Optimize Your Container Images

What is this about?

Our best practices for writing Dockerfiles.

The idea is that if people understand the intricacies of Docker, they will make better decisions about their applications and how they containerize them!

How to optimize your containers?

Here is a set of recommendations that I think are crucial. I will also get into the details of why they matter and how these changes will affect our systems. 😀

Use slimmer base images
The base image is the image on which the entire container image is built. Most of the time we use the default base image, which works fine, but we can do better than just “fine”. The problem is that the default base image usually ships with extra bells and whistles: tools that help with debugging but are not really needed in production. For production deployments we can usually leave them out. This is where the slimmer variants of the main images come into the picture.
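Here is a minimal sketch of what the swap looks like. I am assuming a Node.js service purely for illustration: the `node:20` and `node:20-slim` tags and the `server.js` entry point are my assumptions, not the image that was measured for this post.

```dockerfile
# Before: the default tag, which bundles compilers and other build tooling.
# FROM node:20

# After: the slim variant of the same image (assumed tag; check which
# variants your own base image publishes).
FROM node:20-slim

WORKDIR /usr/src/app

# Hypothetical application files, used here only for illustration.
COPY . .

CMD ["node", "server.js"]
```

Build the image once with each FROM line and compare the SIZE column in the output of docker images to see what the slimmer base buys you.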

DEMO TIME

Look at that. Holy cow!

Notice the reduction in the size of the image: from 918 MB to 90 MB, i.e. roughly a 10x reduction.

This means our containers will be smaller and won’t carry unnecessary things inside them. And that is good!

The main possible problem with this is that slimmer containers make debugging a live container during an incident harder, because the images lack some debugging tools. This problem is not too bad, though: if you have to ssh into a container to debug, then a LOT of things are already going wrong, so we might as well install whatever we need at that point and then debug. 😉

Leverage the Docker build cache

To understand what the build cache is, we need to take a small dip into OverlayFS (the overlay2 storage driver), the file system that Docker uses. OverlayFS layers two directories on a single Linux host and presents them as a single directory to the container.

A Docker image is built by stacking layers on top of the base image. This is cool because the layers can be cached and shared between different images.

Now, let us look at how images are built…
When we build an image from a Dockerfile, Docker first pulls the base image and then runs the different instructions on top of it. These are lines like “RUN” in the Dockerfile. Each “RUN” line creates a new layer, and so do “COPY” and “ADD”.

How does caching work?
Caching basically means that if a layer already exists and nothing that goes into it has changed, there is no need to rebuild it.

You can see this in the output of docker build.

Step 3/8 : WORKDIR /usr/src/app
 ---> Using cache
 ---> a0ecb2fba1ac

This output shows that since the “WORKDIR” step did not change, Docker reuses the layer from the cache instead of rebuilding it.

How can we use this to our advantage?

Okay, caching sounds fine, but there is a catch. (There is no free lunch.)

If a lower layer is changed, then all upper layers are rebuilt.

Here “lower” means earlier in the build, i.e. closer to the top of the Dockerfile. This effectively means that if you edit something near the top of your Dockerfile, then almost the entire image needs to be rebuilt. Hence it is a very good idea to keep the most volatile parts of your container, such as application code, at the bottom of the Dockerfile. If the code is added in an early step, every step after it is rebuilt whenever the code changes (even if those steps did not change themselves).
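A rough sketch of this ordering, again assuming a hypothetical Node.js project (`package.json`, `package-lock.json`, and `server.js` are illustrative names): the dependency manifests are copied and installed first, and the application code comes last.

```dockerfile
FROM node:20-slim
WORKDIR /usr/src/app

# Rarely changes: the dependency manifests. This layer and the install
# below stay cached until package.json or package-lock.json changes.
COPY package.json package-lock.json ./
RUN npm ci

# Changes often: application code goes last, so editing it only
# invalidates the steps from here down.
COPY . .

CMD ["node", "server.js"]
```

With this layout, a code-only change should reuse the cached dependency install and rebuild only the final COPY step.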

NOTE: It’s also a good idea to make sure that we are not using too many layers. This can be done by combining RUN commands in the Dockerfile.
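As a small sketch of combining RUN commands, assuming a Debian-based image (the `debian:bookworm-slim` tag and the `curl` package are only examples):

```dockerfile
FROM debian:bookworm-slim

# Three separate layers; the package lists deleted in the third step
# would still take up space in the first layer:
# RUN apt-get update
# RUN apt-get install -y --no-install-recommends curl ca-certificates
# RUN rm -rf /var/lib/apt/lists/*

# One layer: install and clean up in the same RUN step.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```

Doing the cleanup in the same RUN step matters because files removed in a later layer are hidden, not reclaimed.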

NOTE_1: Changing the base image may lead to some bugs. It also depends on whether a slimmer variant of your base image is available. Either way, you should always test your containers.

NOTE_2: None of this is new information. It’s all on the internet. If I am wrong somewhere please let me know. I will fix it.

NOTE_3: We can also leverage multi-stage builds to reduce container image sizes. Oftentimes the tools you need to build an image are not needed to run the application, so they do not have to be present in the final image.
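A minimal multi-stage sketch, under the assumption of a Node.js project whose package.json defines a “build” script that writes its output to dist/ (both the script and the paths are hypothetical):

```dockerfile
# Stage 1: build with the full toolchain and dev dependencies.
FROM node:20 AS build
WORKDIR /usr/src/app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Assumes a "build" script in package.json that writes to dist/.
RUN npm run build

# Stage 2: runtime image with production dependencies only.
FROM node:20-slim
WORKDIR /usr/src/app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Copy only the build output from the first stage.
COPY --from=build /usr/src/app/dist ./dist
CMD ["node", "dist/server.js"]
```

Only the last stage ends up in the final image, so the build tooling from the first stage is left behind.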

REFERENCES: