Docker Image: In-depth understanding of layers and its file system

Docker
DockerImage
DockerContainer
Dockerize
DevOps
CloudNative
Sahil R Patel Software Engineering
14 min read  .  21 November 2024


The foundation of Docker is the Docker image: a lightweight, standalone, executable package that includes everything needed to run an application, namely the code, runtime, system tools and libraries. This blog explores the inner workings of Docker images, several ways of building them, and the underlying architecture that makes them efficient. By understanding these foundations, developers can build more efficient workflows and produce applications that are more portable, scalable and reproducible.

Docker Image Architecture:

Components

To export an image from the local image store and extract its contents, use the following commands:

docker save -o <tar_file_name> <docker_image_name>
mkdir -p <folder_name>
tar -xf <tar_file_name> -C <folder_name>
  • tar_file_name: Name of the tar file the image is saved to.
  • docker_image_name: Name of the Docker image to save.
  • folder_name: Name of the folder into which the image tarball is extracted.

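Following is an example of the extracted content of an image tarball. This is the layout written by Docker Engine versions before 25.0. Newer versions write an OCI image layout (index.json, oci-layout and a blobs/sha256 directory) but still include a manifest.json.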

$ tree docker_image
docker_image
├── 9159d1e7ef15cd2f06dcb3e11cd841d351a4d2e54b07bde88fbf1ec64bd21764.json
├── 9f8e3b555a19066ccd80be3b2fec3c4d98857811872c5166cd0faf5e87344c37
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── manifest.json
└── repositories

1 directory, 6 files
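The same files can also be read without extracting the archive. The following Python sketch uses only the standard library (tarfile, json, hashlib) to load manifest.json from an archive produced by docker save. For each image listed there, it reads the config file that the manifest points to. The function name inspect_saved_image is hypothetical. The sketch also recomputes the image ID as the SHA256 of the raw config bytes, which should match the config file's name (both files are described below).

```python
import hashlib
import json
import tarfile


def inspect_saved_image(tar_path: str) -> list[dict]:
    """Summarise every image listed in a `docker save` archive (hypothetical helper)."""
    summaries = []
    with tarfile.open(tar_path) as archive:
        # manifest.json is a JSON array with one entry per saved image
        manifest = json.load(archive.extractfile("manifest.json"))
        for entry in manifest:
            # "Config" is a path inside the archive, e.g. "<sha256>.json"
            config_bytes = archive.extractfile(entry["Config"]).read()
            config = json.loads(config_bytes)
            summaries.append({
                "tags": entry.get("RepoTags") or [],
                # Layer paths are listed bottom (base) layer first
                "layers": entry.get("Layers", []),
                # The image ID is the SHA256 digest of the raw config bytes
                "image_id": "sha256:" + hashlib.sha256(config_bytes).hexdigest(),
                # The config lists one diff id per layer, in the same order
                "diff_ids": config.get("rootfs", {}).get("diff_ids", []),
            })
    return summaries
```

For example, inspect_saved_image("docker_image.tar") should return one entry per saved image. Its image_id should match the hash in the config file's name. Because the function only follows the paths stored in manifest.json, it should work for both the older and the OCI archive layouts.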
  • Manifest file:

    The manifest file is a JSON file that lists, for each saved image, the config file name, the image tags and the paths of the layer tarballs (bottom layer first). Following is the content of manifest.json:

    [
      {
        "Config": "67dc28e4bcb8aae1f79ffd6d7644633c9967605515fc8c68d32fb3e6071e47fdd.json",
        "RepoTags": [
          "docker_image:latest"
        ],
        "Layers": [
          "14e7e31fc0b7bb37b9e37f61450e858b1746fd6b6e229277b27a7eb29b192c48/layer.tar"
        ]
      }
    ]
    
  • Config file:

    The config file is a JSON object containing all information about the image, e.g. its runtime configuration (Cmd, Env, ...), the build history, the intermediate container config and the list of layer diff ids under rootfs.diff_ids. The name of this file is the SHA256 hash of its content, which is the same as the image ID in the local image store.

  • Layer:

    A layer is created for each Dockerfile instruction that changes the filesystem. In the saved image, a layer is a tarball of those filesystem changes, stored in a folder named with a unique ID. Following are the files found in each such folder:

    • VERSION: contains the version of the layer metadata format (typically 1.0).

    • layer.tar: contains the content of the layer as an uncompressed tar archive. Following is the content of layer.tar after extracting it into a folder named layer:

      $ cd ~/docker_image/9f8e3b555a19066ccd80be3b2fec3c4d98857811872c5166cd0faf5e87344c37 && tree -L 1 layer
      layer
      ├── bin -> usr/bin
      ├── boot
      ├── dev
      ├── etc
      ├── home
      ├── lib -> usr/lib
      ├── lib64 -> usr/lib64
      ├── media
      ├── mnt
      ├── opt
      ├── proc
      ├── root
      ├── run
      ├── sbin -> usr/sbin
      ├── srv
      ├── sys
      ├── tmp
      ├── usr
      └── var
      
      19 directories, 0 files
      
    • json: Contains metadata about the layer.

    To organize and track the different kinds of data in images and containers, Docker relies on content-addressed storage: a piece of data is identified not by its name or location but by the hash of its content. Docker uses the following identifiers for layers:

    • Diff Id: The SHA256 hash of the layer's uncompressed content, i.e. of its layer.tar.

    • Chain Id: Identifies a layer together with all the layers beneath it. It is calculated from the diff ids of the layers, from the bottom up:

      For the lowest layer, chainId(1) = diffId(1)

      Otherwise, chainId(n) = sha256( chainId(n-1) + " " + diffId(n) ), i.e. the hash of the parent's chain id and this layer's diff id joined by a single space.

      On the host, Docker stores each layer's metadata in a directory named after its chain id (under /var/lib/docker/image/<storage-driver>/layerdb/sha256/).

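    • Cache Id: A random ID that Docker generates when it unpacks a layer on the host. It names the layer's directory in the storage driver's area (for example /var/lib/docker/overlay2/<cache-id>) and is recorded in a cache-id file next to the layer's metadata.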

  • Supporting files:

    • Repositories: A legacy JSON file mapping each repository name and tag to the ID of the image's topmost layer.
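The diff id and chain id rules above fit in a few lines of Python. This sketch assumes that IDs carry the "sha256:" prefix, as Docker and the OCI image specification write them. It also assumes, following the OCI definition of ChainID, that the parent chain id and the diff id are joined by a single space before hashing. The function names are illustrative.

```python
import hashlib


def diff_id(layer_tar: bytes) -> str:
    """DiffID: SHA256 of the *uncompressed* layer tar archive."""
    return "sha256:" + hashlib.sha256(layer_tar).hexdigest()


def chain_ids(diff_ids: list[str]) -> list[str]:
    """ChainID of every layer, given diff ids ordered bottom (base) to top."""
    chain: list[str] = []
    for d in diff_ids:
        if not chain:
            # Lowest layer: its chain id is simply its diff id
            chain.append(d)
        else:
            # Parent chain id and this diff id, joined by one space, then hashed
            joined = f"{chain[-1]} {d}".encode()
            chain.append("sha256:" + hashlib.sha256(joined).hexdigest())
    return chain
```

The diff ids of a local image can be listed with docker image inspect --format '{{json .RootFS.Layers}}' <image>. With the overlay2 storage driver, passing them to chain_ids should give the names of the layer directories under /var/lib/docker/image/overlay2/layerdb/sha256/. Because each chain id depends on every layer below it, the same layer content gets a different chain id when it sits on a different parent.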

Examples

The following examples show what a saved image looks like in different scenarios.

Note: layer_content is the folder into which each layer's layer.tar was extracted after saving the image.

Hello-World:

This is the introductory Docker image used to check that Docker is installed correctly on a system.

Directory structure:

$ cd ~/hello_world && tree
.
├── d2c94e258dcb3c5ac2798d32e1249e42ef01cba4841c2234249495f87264ac5a.json
├── df1c22f7c9ab11ff627ce477eae827d0eb29e637b95bffe1c5fd3f414ace672c
│   ├── json
│   ├── layer_content
│   │   └── hello
│   ├── layer.tar
│   └── VERSION
├── hello-world.tar
├── manifest.json
└── repositories

2 directories, 8 files

Layer Content:

cd ~/hello_world/df1c22f7c9ab11ff627ce477eae827d0eb29e637b95bffe1c5fd3f414ace672c/layer_content && tree
.
└── hello

0 directories, 1 file

Here, we’ve saved the Hello-World image. As the directory structure shows, it contains only one layer, which holds a single statically compiled hello binary. Executing this binary directly gives exactly the same output as running docker run hello-world.

Multilayer

A Docker image has multiple layers when its Dockerfile contains multiple instructions that change the filesystem. This scenario shows the relevance of layers in a Docker image more clearly.

Dockerfile content:

FROM ubuntu
RUN apt-get update && apt-get install -y python3 python3-pip
RUN apt-get install -y python3-numpy
COPY ./np_example.py /
CMD python3 np_example.py

Here we’re creating a Docker image with multiple instructions.

Directory structure:

cd ~/multilayer_image && tree -L 2
.
├── 199d546a6ea4b477cf5829f009e8018a8284a96678cfe319647876c6ac743a01.json
├── 63d14bcd8a007d4954107b665b4fc9dfe5f0c60b298aa2c7d341959fe133ad1f
│   ├── json
│   ├── layer_content
│   ├── layer.tar
│   └── VERSION
├── 6adb54fa42bf9a84ef4855e35a96a83dde4290835a274f8fd33cb568b33686b4
│   ├── json
│   ├── layer_content
│   ├── layer.tar
│   └── VERSION
├── 9782337d1303a1a56ef6e5f07fde32b99f3186062abf3863f4ed58799aee90df
│   ├── json
│   ├── layer_content
│   ├── layer.tar
│   └── VERSION
├── ab130820146f2086cfa47c9b535d85f4244591cce2ae3b1a1a6d6f4d792ed48f
│   ├── json
│   ├── layer_content
│   ├── layer.tar
│   └── VERSION
├── manifest.json
└── repositories

8 directories, 15 files

Config Object:

  "Cmd": [
    "/bin/sh",
    "-c",
    "python3 np_example.py"
  ],

The directory structure shows four layers, one for each instruction in the Dockerfile except the last one. That instruction does not produce a layer; instead, its command is stored in the “Cmd” field of the config object, and Docker executes it by default when a container is started from the image.
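The layer count can be predicted from the Dockerfile itself. Docker's build documentation has stated that only RUN, COPY and ADD create layers. Instructions such as CMD, ENTRYPOINT, EXPOSE or ENV only update the image config. The sketch below applies that rule to the last stage of a Dockerfile. It is a simplification that ignores parser directives, heredocs and comments inside continued lines, among other things, and the function names are hypothetical. The base image named in FROM adds its own layers on top of this count: the ubuntu image in this example contributes one, scratch none.

```python
# Instructions that add a filesystem layer; all others only change the image config
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def logical_lines(dockerfile: str) -> list[str]:
    """Join backslash continuations and drop blank lines and comments."""
    lines, pending = [], ""
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        lines.append(pending + line)
        pending = ""
    if pending:
        lines.append(pending.strip())
    return lines


def final_stage_layers(dockerfile: str) -> list[str]:
    """Instructions of the last stage that add a layer on top of its base image."""
    layered: list[str] = []
    for line in logical_lines(dockerfile):
        keyword = line.split(maxsplit=1)[0].upper()
        if keyword == "FROM":
            layered = []  # a new stage starts; earlier stages are not in the final image
        elif keyword in LAYER_INSTRUCTIONS:
            layered.append(line)
    return layered
```

Because a new FROM resets the list, the same function also covers the multi-stage example further down. Only the COPY --from=build instruction of the final stage is counted there.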

Layer Content:

As the snapshots below show, each layer contains only the changes made by the instruction it corresponds to.

Following snapshot belongs to the Ubuntu base image layer.

cd ~/multilayer_image && tree -L 1 63d14bcd8a007d4954107b665b4fc9dfe5f0c60b298aa2c7d341959fe133ad1f/layer_content/layer/
63d14bcd8a007d4954107b665b4fc9dfe5f0c60b298aa2c7d341959fe133ad1f/layer_content/layer/
├── bin -> usr/bin
├── boot
├── dev
├── etc
├── home
├── lib -> usr/lib
├── lib64 -> usr/lib64
├── media
├── mnt
├── opt
├── proc
├── root
├── run
├── sbin -> usr/sbin
├── srv
├── sys
├── tmp
├── usr
└── var

19 directories, 0 files

The following layer contains only the files that were added or changed while installing python3 and pip.

cd ~/multilayer_image && tree -L 1 9782337d1303a1a56ef6e5f07fde32b99f3186062abf3863f4ed58799aee90df/layer_content/layer/
9782337d1303a1a56ef6e5f07fde32b99f3186062abf3863f4ed58799aee90df/layer_content/layer/
├── etc
├── tmp
├── usr
└── var

4 directories, 0 files

The following layer contains the files that were added or changed while installing numpy.

cd ~/multilayer_image && tree -L 1 6adb54fa42bf9a84ef4855e35a96a83dde4290835a274f8fd33cb568b33686b4/layer_content/layer/
6adb54fa42bf9a84ef4855e35a96a83dde4290835a274f8fd33cb568b33686b4/layer_content/layer/
├── etc
├── tmp
├── usr
└── var

4 directories, 0 files

The last layer contains only the Python script copied by the COPY instruction.

cd ~/multilayer_image && tree -L 1 ab130820146f2086cfa47c9b535d85f4244591cce2ae3b1a1a6d6f4d792ed48f/layer_content/layer/
ab130820146f2086cfa47c9b535d85f4244591cce2ae3b1a1a6d6f4d792ed48f/layer_content/layer/
└── np_example.py

0 directories, 1 file

Scratch

The scratch image is Docker’s reserved, empty base image. It is used to create minimal images and as the starting point for base images such as busybox, alpine and debian.

Dockerfile Content:

FROM scratch
COPY ./hello /
ENTRYPOINT [ "/hello" ]

Layer Content:

cd ~/scratch_example/4c527f87244a5da206b6f68553f458e08522614bd5aa571d45b7671923b3ef4b/layer_content/layer && tree
.
└── hello

0 directories, 1 file

Here, we’ve used scratch, an empty image, as the base image, which is useful for creating small images. As the layer content shows, the image contains only one executable file, hello, which is run via the ENTRYPOINT. Since scratch provides no shell or C library, hello has to be a statically linked binary.

Multi Stage

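A multi-stage build divides the instructions of a Dockerfile into multiple stages, which improves readability and maintainability. It also lets you copy only the artifacts you need from one stage into another, leaving everything else (compilers, source code, caches) out of the final image.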

Dockerfile Content:

FROM golang:1.21.6-alpine3.18 AS build

WORKDIR /go/src/app
COPY ./go_server/* /go/src/app/
RUN go mod download
# CGO_ENABLED=0 makes sure the binary is statically linked, since scratch has no C library
RUN CGO_ENABLED=0 GOOS=linux go build -o /go/bin/app -v .

FROM scratch
COPY --from=build /go/bin/app /
EXPOSE 8080
ENTRYPOINT [ "/app" ]

Here, we’ve used a multi-stage build to show what the layer content looks like in such a scenario.

Directory Structure:

cd ~/multistage_image && tree -L 2
.
├── 26e64755810479a9ea5a9e199eaa59282bff7afa2f8b13591098d79c5244233a.json
├── a3f965528f0fe10cc632272203a173270f13b06b58af940f271da60524af7d87
│   ├── json
│   ├── layer_content
│   ├── layer.tar
│   └── VERSION
├── manifest.json
└── repositories

2 directories, 6 files

Layer Content:

cd ~/multistage_image/a3f965528f0fe10cc632272203a173270f13b06b58af940f271da60524af7d87/layer_content/layer && tree
.
└── app

0 directories, 1 file

As the layer content shows, the final image contains only the last stage of the build. It has a single layer, created by the COPY --from=build instruction that copies the app executable from the first stage. The Go toolchain and source code from the build stage are not part of the image.

OverlayFS in Docker

Pretty good so far, isn’t it? As we have seen, a Docker image is nothing but a set of layers plus some metadata. Before we look at the Docker build process, we need to understand the union filesystem OverlayFS. A union filesystem merges the contents of one or more filesystems (directories) into a single view while keeping their contents physically separate.

What is Overlay?

OverlayFS works with layers of content: one or more lower layers and one upper layer. The lower layers are read-only, while the upper layer is read-write. OverlayFS presents a unified view of these layers through a union mount: the upper directory tree is overlaid on top of the lower directory trees, and they are merged virtually even if they belong to different filesystems. An overlay mount uses the following directories:

  • The lower directory (lowerdir) is read-only. There can be more than one.
  • The upper directory (upperdir) can be both read from and written to.
  • The merged directory is the unified view of the lowerdir(s) and the upperdir. Write operations are handled with a copy-on-write policy: when a container modifies a file from a lower directory, OverlayFS first copies it into the upperdir and then modifies the copy.
  • The work directory (workdir) is not a layer but an empty directory on the same filesystem as the upperdir. OverlayFS uses it internally to prepare files, for example during copy-up, before moving them into the upperdir atomically.

When a process reads a file, OverlayFS looks in the upper directory first and reads the file from there if it is present. Otherwise, it searches the lower directories from top to bottom. When a process writes a file, OverlayFS writes it to the upper directory.
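These lookup and write rules can be modelled in a few lines of Python. The class below is a toy, in-memory model with hypothetical names, in which dictionaries stand in for directories; it is not the kernel implementation. Reads check the upper layer first and then the lower layers from top to bottom, and writes always go to the upper layer. Deleting a file that exists in a lower layer records a whiteout instead of touching the lower layer.

```python
class OverlayModel:
    """Toy model of an OverlayFS mount; dicts map file names to contents."""

    def __init__(self, lowers: list[dict], upper: dict):
        self.lowers = lowers      # topmost lower first, like lowerdir=top:...:bottom
        self.upper = upper        # the only writable layer
        self.whiteouts = set()    # names hidden from the lower layers

    def read(self, name: str) -> str:
        if name in self.upper:
            return self.upper[name]
        if name not in self.whiteouts:
            for lower in self.lowers:
                if name in lower:
                    return lower[name]
        raise FileNotFoundError(name)

    def write(self, name: str, data: str) -> None:
        # New and modified files always land in the upper layer
        self.upper[name] = data
        self.whiteouts.discard(name)

    def append(self, name: str, data: str) -> None:
        # Copy-up: read the visible version (possibly from a lower layer),
        # then store the modified copy in the upper layer
        self.write(name, self.read(name) + data)

    def remove(self, name: str) -> None:
        self.read(name)  # raises FileNotFoundError if the file is not visible
        self.upper.pop(name, None)
        if any(name in lower for lower in self.lowers):
            # Lower layers are read-only, so hide the file with a whiteout
            self.whiteouts.add(name)

    def listing(self) -> list[str]:
        names = set(self.upper)
        for lower in self.lowers:
            names |= set(lower)
        return sorted(names - self.whiteouts)
```

Running the steps from the transcript that follows against this model, such as modifying f2.txt or removing f1.txt, should leave the lower dictionary unchanged while the upper one collects every change.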

Let’s create an overlay with mount:

$ mkdir test && cd test
$ mkdir upper lower merged work
$ echo "File 1 in lower dir!" > lower/f1.txt
$ echo "File 2 in lower dir!" > lower/f2.txt


# `f3.txt` is in both directories
$ echo "File 3 in lower dir!" > lower/f3.txt
$ echo "File 3 in upper dir!" > upper/f3.txt


$ echo "File 4 in upper dir!" > upper/f4.txt
$ cd ..
$ tree test/

test/
├── lower
│   ├── f1.txt
│   ├── f2.txt
│   └── f3.txt
├── merged
├── upper
│   ├── f3.txt
│   └── f4.txt
└── work

Merged will be the union mount, where the unified view of lower and upper directories will be available. Now, we can use the mount command to perform a union mount.

$ sudo mount -t overlay overlay \
  -o lowerdir=/home/cherish/test/lower,upperdir=/home/cherish/test/upper,workdir=/home/cherish/test/work \
  /home/cherish/test/merged
$ cd /home/cherish/test

> Note: To unmount the overlay later, run:
# $ sudo umount /home/cherish/test/merged

The mount command above combines the lowerdir and upperdir directories into a union mount at the merged folder. The workdir is scratch space that OverlayFS uses internally, for example while copying files from lower to upper.

$ cat merged/f3.txt
File 3 in upper dir!

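> Note: While working with a union filesystem, make changes through the unified mount, here the merged directory. Changing the upper or lower directories directly while the overlay is mounted leads to undefined behaviour.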

It worked! And the contents of our directories are what we’d expect:

$ find lower/ upper/ merged/
lower/
lower/f1.txt
lower/f2.txt
lower/f3.txt
upper/
upper/f3.txt
upper/f4.txt
merged/
merged/f1.txt
merged/f2.txt
merged/f3.txt
merged/f4.txt

Adding new files

When adding a new file, OverlayFS adds it to the upper layer.

$ echo 'new file' > merged/new_file
$ find lower/ upper/ merged/
lower/
lower/f1.txt
lower/f2.txt
lower/f3.txt
upper/
upper/f3.txt
upper/f4.txt
upper/new_file
merged/
merged/f1.txt
merged/f2.txt
merged/f3.txt
merged/f4.txt
merged/new_file

Removing files

Removing a file through the merged directory deletes it from the upper directory if it is stored there. If the file comes from the lower directory, OverlayFS cannot delete it, so it simulates the removal by creating a whiteout file in the upper directory. A whiteout is either a character device with device number 0/0 or a zero-size regular file with the xattr “trusted.overlay.whiteout”.

Removing a file that exists only in the upper directory:
$ rm merged/new_file
$ find lower/ upper/ merged/
lower/
lower/f1.txt
lower/f2.txt
lower/f3.txt
upper/
upper/f3.txt
upper/f4.txt
merged/
merged/f1.txt
merged/f2.txt
merged/f3.txt
merged/f4.txt
Removing a file that exists in the lower directory:
$ rm merged/f1.txt
$ find lower/ upper/ merged/
lower/
lower/f1.txt
lower/f2.txt
lower/f3.txt
upper/
upper/f1.txt
upper/f3.txt
upper/f4.txt
merged/
merged/f2.txt
merged/f3.txt
merged/f4.txt

$ ls -l upper/f1.txt
c--------- 1 root root 0, 0 May 6 16:02 upper/f1.txt

So OverlayFS creates a whiteout, a character device named f1.txt, in the upper directory, and the file no longer appears in the unified view of both directories, i.e. in the merged directory.
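On disk, such whiteouts can be recognised by their file type. The Python sketch below walks a directory, such as the upper directory in this example, and reports entries that look like OverlayFS whiteouts. It detects both forms described above. Reading trusted.* attributes usually requires root, so without it only the character-device form is detected. The function names are hypothetical.

```python
import os
import stat


def is_whiteout(path: str) -> bool:
    """True if path looks like an OverlayFS whiteout entry."""
    st = os.lstat(path)
    # Classic form: a character device with device number 0/0
    if stat.S_ISCHR(st.st_mode):
        return os.major(st.st_rdev) == 0 and os.minor(st.st_rdev) == 0
    # Alternative form: an empty regular file carrying the whiteout xattr
    if stat.S_ISREG(st.st_mode) and st.st_size == 0 and hasattr(os, "getxattr"):
        try:
            os.getxattr(path, "trusted.overlay.whiteout", follow_symlinks=False)
            return True
        except OSError:
            # Attribute absent, unsupported, or not readable without privileges
            return False
    return False


def find_whiteouts(root: str) -> list[str]:
    """Paths (relative to root) of all whiteout entries below root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Device files are reported by os.walk among the non-directory entries
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_whiteout(full):
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

Pointed at the upper directory after f1.txt has been removed, find_whiteouts should report f1.txt. Docker's overlay2 driver keeps each layer's files under /var/lib/docker/overlay2/<cache-id>/diff, where the same kind of entries can appear.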

Modifying files

Files that already exist in the upper directory are modified in place, as usual. When a file that exists only in the lower directory is modified, OverlayFS first copies it into the upper directory and applies the change to that copy. The original file in the lower directory stays untouched.

$ cat merged/f2.txt
File 2 in lower dir!
$ echo "Modified content" > merged/f2.txt
$ cat merged/f2.txt
Modified content
$ cat upper/f2.txt
Modified content
$ cat lower/f2.txt
File 2 in lower dir!

Copy On Write (COW)

When a file in the bottom layer is about to be modified, it first has to be copied into the top layer, because the bottom layer is read-only. This operation is commonly referred to as copy-up.

Opaque folder

If a directory from the bottom layer is removed and a new directory with the same name is later created through the merged view, the files of the bottom-layer directory would show up again inside it. This is not desirable, since a newly created directory is supposed to be empty. To prevent this, OverlayFS marks such directories as opaque by setting the xattr “trusted.overlay.opaque” to “y”. An opaque directory in the upper layer hides the contents of any lower directory with the same name.
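Inside a layer.tar, deletions and opaque directories are not stored in these OverlayFS forms. The OCI image specification, which Docker image layers follow, represents them as specially named empty files instead. A file called .wh.<name> deletes <name> from the layers below, and a file called .wh..wh..opq makes its directory opaque. When Docker's overlay2 driver unpacks a layer, it converts these markers into the OverlayFS forms described above. The sketch below applies a stack of such changesets from bottom to top to show which files end up visible. Its names are hypothetical, files are path-to-content dictionaries, and directories are implied by the paths.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def _drop_tree(fs: dict, directory: str) -> None:
    """Remove every file below directory ("" means the root)."""
    prefix = directory + "/" if directory else ""
    for path in [p for p in fs if p.startswith(prefix)]:
        del fs[path]


def apply_layer(lower: dict, layer: dict) -> dict:
    """Return the filesystem view after stacking one changeset on top of lower."""
    merged = dict(lower)
    # Whiteouts only affect the layers below, so apply them before adding files
    for path in layer:
        parent, name = posixpath.split(path)
        if name == OPAQUE_MARKER:
            _drop_tree(merged, parent)  # hide all lower content of this directory
        elif name.startswith(WHITEOUT_PREFIX):
            target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
            merged.pop(target, None)    # the target may be a file...
            _drop_tree(merged, target)  # ...or a directory with files in it
    for path, content in layer.items():
        if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
            merged[path] = content
    return merged


def flatten(layers: list[dict]) -> dict:
    """Apply changesets from the bottom (base) layer to the top."""
    fs: dict = {}
    for layer in layers:
        fs = apply_layer(fs, layer)
    return fs
```

Applying the whiteouts before the regular entries reflects the OCI rule that a whiteout only hides content from lower layers, never files added in the same layer.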

Building a docker image:

Let's assume we have the following Dockerfile to build an image from:

FROM ubuntu
RUN apt-get update
...

At a high level, this is how Docker builds an image from it:

  1. Docker pulls the image named in FROM (if it is not available locally) and unpacks its layers. These form the bottom of the new image: their read-only filesystems serve as the lowerdir, and the upperdir starts out as an empty folder.

  2. For each RUN instruction, Docker starts an intermediate container on top of the layers built so far. The container's upper directory collects every change the command makes, which is how Docker captures the changes introduced by that instruction.

  3. When the command finishes, Docker packs the upper directory into a tar archive. This becomes the new layer of the image being built.

  4. For every following instruction, steps 2 and 3 are repeated, with all layers created so far stacked as the lower directories.
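The stacking in the steps above can be illustrated by generating the overlay mount command that each build step would need. In a lowerdir list the leftmost directory is the topmost layer, so the layers built so far are listed in reverse order. This is only a sketch under assumed, hypothetical paths (/tmp/build/...) and function names. Docker's overlay2 driver and BuildKit manage their own directories and perform the equivalent mounts internally.

```python
def overlay_mount_command(lowers_bottom_to_top: list[str], upper: str,
                          work: str, merged: str) -> list[str]:
    """argv for mounting an overlay of the given layers (not executed here)."""
    # In lowerdir=a:b:c, "a" is the top-most lower layer, so reverse the build order
    lowerdir = ":".join(reversed(lowers_bottom_to_top))
    options = f"lowerdir={lowerdir},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]


def plan_build(base_layers: list[str], instructions: list[str],
               root: str = "/tmp/build") -> list[tuple[str, list[str]]]:
    """For each instruction, the mount its intermediate container would use.

    Assumes at least one base layer, because an overlay needs a lowerdir.
    """
    layers = list(base_layers)  # unpacked FROM image, bottom layer first
    plan = []
    for step, instruction in enumerate(instructions, start=1):
        step_dir = f"{root}/step{step}"
        upper = f"{step_dir}/upper"  # starts empty; collects this step's changes
        command = overlay_mount_command(layers, upper, f"{step_dir}/work", f"{step_dir}/merged")
        plan.append((instruction, command))
        # Simplification: Docker archives the upper dir as a new layer and unpacks
        # it into its own directory; here the upper dir itself becomes the next lower
        layers.append(upper)
    return plan
```

For example, plan_build(["/layers/ubuntu"], ["RUN apt-get update", "RUN apt-get install -y python3"]) gives the second step a lowerdir of /tmp/build/step1/upper:/layers/ubuntu. This matches step 4: everything built so far sits below the new, empty upper directory.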

Conclusion:

In conclusion, Docker images are composed of multiple storage layers, which are fundamental to Docker’s efficient and lightweight containerization process. Each layer in a Docker image is immutable and stored in a layered filesystem, such as OverlayFS, which allows for file and directory changes to be recorded in an upper layer without altering the lower layers. This layered approach enables Docker to reuse layers across different images, reducing storage usage and build times.
