A Dockerfile is to a project what sheet music is to a song - it ensures that the same operations are always performed, in the same order, with the same result.

What is a Dockerfile

Imagine a README.md file in a project or a page on Confluence, where all the steps required for setting up an application’s runtime are listed - a Dockerfile is exactly such an instruction, only automated 😁. This is a simplification, of course, but the point is that the Dockerfile gives us a consistent runtime environment, one that will be the same for everyone working on the project.

So what can such a file define? For example:

  • operating system: it is the base for further operations and the foundation of the environment.
  • installation of system packages: using the package manager built into the operating system, we can install everything we need for the proper functioning of the application.
  • environment for the programming language: everything needed for our code to run properly. Typically, programming languages provide official images that have a ready-made environment and tools to extend it (e.g. PHP images contain pecl, which is a library of extensions that can be added on demand). Using them, we can easily extend the default environment with additional extensions and, if necessary, provide custom configurations.
  • exposed ports: this is a kind of contract with the outside world. A container launched from the built image will be able to communicate through these ports. For example, the MySQL database image exposes port 3306, and you can connect to it later.

In addition to the fundamentals mentioned above, the Dockerfile can define many other operations, but it’s impossible to list them all because every application is different and has different requirements. Need to create a default directory structure for later use? Need to include a set of files in the image that the application will use later? No problem 🙂. You can perform basically any operation - the only limitations are the Dockerfile syntax and the operating system used as the base of our image.

Dockerfile syntax

Full syntax documentation is available on the official website, and I strongly encourage you to read it. In this article, however, we will focus on the basic instructions offered by the Dockerfile, which will prepare us for the more advanced operations covered in the series’ subsequent posts. The remaining instructions will be discussed briefly or even omitted and, if necessary, will appear later in this series.

Basically, a Dockerfile is a set of instructions, arranged in the order in which they are executed. The way we arrange them determines how the image is built, as well as what its final content will be. Optimizing these instructions is a topic for a separate post; at this point we will only focus on what each of them does.

FROM

The most basic instruction is FROM, and it starts every build definition. We can assign an alias to this instruction, thus defining a so-called build target, by adding as <alias>. There can be many such targets in a Dockerfile. When executing docker build, we can indicate a target by passing the --target <name> flag; if we do not specify it explicitly, the last target defined in the Dockerfile will be built.

The FROM instruction can take several values as an argument:

  • FROM scratch: as you can easily guess, we start with a “blank page”, our image contains nothing.
  • FROM php:8.2: the basis for further instructions will be the PHP image in version 8.2. After the colon, we specify the expected version, and the list of tags for a given image can be found in the registry we use, in this case in Docker Hub. If the version is not specified explicitly, then the latest tag indicating the latest version is used by default (or at least that’s the convention, because the existence of the latest tag must be guaranteed by the person/team responsible for the build and publication).
  • FROM other-target: in this case we indicate that the base for further operations is another target defined in Dockerfile. Such constructions are automatically resolved by the Docker engine, so when building one of the targets, we do not need to build its dependencies (other-target) beforehand - it will be built automatically, and then it will be used as the base for the next target. In case we refer to a target that doesn’t exist, the build will fail, and we’ll get an error message.

In summary, the entry FROM php:8.2-cli-alpine as php-base defines the php-base target, which will be built on top of the official php image in version 8.2-cli-alpine, containing the PHP runtime for CLI in the Alpine Linux operating system.
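
As a quick sketch of how this could look (the target names and contents below are made up for illustration), several targets can live in one Dockerfile and be selected at build time:

FROM php:8.2-cli-alpine as php-base
RUN apk add --no-cache git

FROM php-base as dev
ENV APP_ENV=dev

# Build only the "dev" target - php-base is resolved and built automatically:
# docker build --target dev -t my-app:dev .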

RUN

The RUN instruction is probably the most commonly used statement in a Dockerfile, although this of course depends on the project’s specifics. In any case, its purpose is to execute the indicated command within the environment available during the image build. So if we use the previously mentioned php:8.2-cli-alpine as a base, our environment is Alpine Linux, and each RUN can execute commands available within this system or those we install ourselves using the apk package manager.

It’s useful to think of RUN as a CLI command, whether on your own machine or a remote one via SSH. We simply have a specific environment (usually operating systems such as Debian or the aforementioned Alpine Linux) and within it, we execute all kinds of commands as if we were preparing a local environment or server to ensure the correct runtime for our application.

Thus, RUN apk add git (Alpine) or RUN apt-get install git (Debian) will install the git package for us, which we can then use later in the build process or in the target container created from the built image.
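
For example, a common pattern on Debian-based images (shown here only as a sketch) is to chain related commands in a single RUN and clean up the package manager cache in the same step:

RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*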

COPY

The COPY statement is the link between the local file system (the one from which the build is executed) and the image being built. Thanks to it, we can add to the image any files that are needed later. In the case of applications, the entire application folder is usually copied, but it is worth noting right away that we can limit this context by using a .dockerignore file, in which we define which paths are to be omitted during copying (so we can skip folders such as vendor or node_modules). The notation COPY . . means that all files (except the ignored ones) will be copied into the image, into its working directory. Of course, it is possible to specify more specific paths, such as COPY ./bin/example /usr/bin/example; we have full flexibility in this matter.

It is worth noting that the COPY instruction can use not only the local file system as a source, but also already built images or other build targets. For this purpose, use the entry COPY --from=..., specifying the source from which the files are to be copied.

Another very helpful flag is --chown, thanks to which we can set the owner of the files at the moment they are copied. Very often our application must be fully accessible to the user who runs the web server, so it is common practice to execute COPY --chown=www-data:www-data. Thanks to this flag, we don’t have to perform two separate operations: COPY and RUN chown 🙂.
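
Putting these flags together in a small sketch (the assets target and the paths are hypothetical):

# Copy the application, owned by the web server user:
COPY --chown=www-data:www-data . /var/www/html

# Copy compiled frontend assets from another build target named "assets":
COPY --from=assets /app/public/build /var/www/html/public/build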

ADD

The ADD statement basically does the same as COPY, but it has additional functionality - it can copy files from remote locations and automatically extract local TAR archives.

In the case of copying files from remote locations, it is possible to verify the checksum of the file by using the --checksum=<checksum> option.
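
A minimal sketch (the URL and the digest are placeholders, not real values):

# Download a remote file and verify its checksum:
ADD --checksum=sha256:<expected-digest> https://example.com/tool.tar.gz /tmp/tool.tar.gz

# A local TAR archive is extracted automatically into the target directory:
ADD app-sources.tar.gz /opt/app/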

ENV

The ENV instruction is used to define environment variables that are available during the subsequent steps of building the image, and finally also in containers launched from that image. Therefore, be careful when defining these environment variables, as they may affect the behavior of the tools contained in the image.
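
For example (the variable names are just an illustration):

ENV APP_ENV=prod
ENV TZ=Europe/Warsaw

# Both variables are visible in the RUN steps below and in the running container:
RUN echo "Building for $APP_ENV"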

ARG

ARG is similar to ENV, but differs in that its life cycle is limited to the image build process. ARG can take a default value, which can be overridden (or simply supplied) later using the --build-arg <name>=<value> option.

Variables passed in this way can affect the build process and the resulting image. For example, with ARG CLI_VERBOSITY='' we could declare the default verbosity of the messages returned by CLI commands, and then build the image using docker build --build-arg CLI_VERBOSITY=-vvv. Then all that’s left is to use this variable in commands, for example RUN bin/console cache:clear $CLI_VERBOSITY. This may not be a particularly real-life example, but it shows the principle of operation 😉.
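
Putting that example together as a sketch:

ARG CLI_VERBOSITY=''
RUN bin/console cache:clear $CLI_VERBOSITY

# Built with extra verbosity:
# docker build --build-arg CLI_VERBOSITY=-vvv .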

WORKDIR

This simple statement defines the path in which any RUN, COPY, ADD, CMD and ENTRYPOINT statements defined after WORKDIR will be executed. In practice, this means that if we do WORKDIR /app and then RUN bin/console, we assume that there is an executable at /app/bin/console - if there isn’t, we will of course see an error, and the build process will be aborted.
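
A short sketch of this behaviour (it assumes Composer is already available in the image, as in the full example later in this post):

WORKDIR /app

# Both instructions below operate relative to /app:
COPY composer.json composer.lock ./
RUN composer install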

CMD

CMD defines the default run command for the container. For example, for php:8.2.3-fpm it is CMD ["php-fpm"], which means that PHP-FPM will be started when the container starts.

Usually, when building images for our applications, we don’t need to define CMD, because base images like PHP already define it. However, nothing stops us from overriding it to suit our needs.
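
A hedged sketch of defining CMD and overriding it at run time:

FROM php:8.2-cli-alpine
CMD ["php", "-a"]

# docker run <image>          -> starts the interactive PHP shell
# docker run <image> php -v   -> the passed command replaces CMD and prints the PHP version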

ENTRYPOINT

The ENTRYPOINT topic is quite complicated, and you could write an entire dedicated article about it and CMD, but in short, this statement defines the starting point of a container. To simplify: it makes the container behave like a command (an executable script), so when launching it we can pass additional flags/arguments that will be forwarded to the entrypoint. For example, docker run <image> -d will pass -d to the entrypoint.
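
A short sketch of how ENTRYPOINT and CMD work together (the console commands are just examples):

ENTRYPOINT ["php", "bin/console"]
CMD ["list"]

# docker run <image>              -> php bin/console list
# docker run <image> cache:clear  -> php bin/console cache:clear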

USER

As the name suggests, USER defines the user (and optionally the group) that will be used to perform all subsequent operations during the image build, as well as to run ENTRYPOINT and CMD.
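
For example, on Alpine-based images a dedicated user could be created and switched to like this (the user and group names are arbitrary):

RUN addgroup -S app && adduser -S app -G app
USER app

# Everything below (RUN, ENTRYPOINT, CMD) now runs as "app":
RUN whoami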

EXPOSE

If the container is to provide an interface to communicate with the services it contains, we should use the EXPOSE statement. It defines the ports the container listens on (TCP and UDP protocols are supported, with TCP being the default).

For example, in images containing a web server, you can find EXPOSE 80/tcp, which means that the container is listening on port 80 in the TCP protocol. These ports can then be accessed from the outside by doing docker run -p 80:80/tcp <image>.

VOLUME

If we need a filesystem interface between the container and the system the container is running on, we can use the VOLUME statement. It defines a so-called mount point and causes the files stored in such a volume to be kept outside the container’s own filesystem, so they survive container restarts. More on volumes can be found in this article.
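
A minimal sketch, modelled on what database images typically do:

VOLUME /var/lib/mysql

# At run time, a named volume can be mounted at that path:
# docker run -v mysql-data:/var/lib/mysql <image>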

HEALTHCHECK

It’s one thing to run a container, but monitoring that it keeps running properly is quite another. HEALTHCHECK can help us with this: it defines how the container can be checked to see whether it’s operational (and, depending on the orchestrator, it can be restarted automatically if it isn’t).
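
A sketch of a typical web-server check (it assumes curl is available in the image):

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost/ || exit 1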

Dockerfile example

FROM php:8.2-cli-alpine

WORKDIR /app

# See: https://twitter.com/_Codito_/status/1587052303869267968 
COPY --from=composer/composer:2-bin /composer /usr/bin/composer

# Install some PHP extensions, then clean up things a bit.
# It's important to do it in the same `RUN`, so there are no leftovers in the image.
RUN apk add --no-cache icu \
    && apk add --no-cache --update \
      --virtual .build-deps \
      $PHPIZE_DEPS \
      icu-dev \
      linux-headers \
    && pecl install xdebug-3.2.0 \
    && docker-php-ext-install intl \
    && docker-php-ext-enable xdebug \
    && apk del -f .build-deps

# This will copy all local files (from where `Dockerfile` is used)
# to `/app` (which was set as WORKDIR above).
COPY . .

# Prepare app's runtime by installing Composer dependencies.
RUN composer install --no-dev --no-scripts

Validating Dockerfile

A good IDE should help us write a Dockerfile in terms of allowed statements and their syntax. However, there are tools like Hadolint that also help us keep an eye on good practices when creating a Dockerfile. Information on how to use Hadolint can be found on the project’s website 🙂.
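
One common way to run it is via the project’s own Docker image (check the project’s documentation for the currently recommended invocation):

docker run --rm -i hadolint/hadolint < Dockerfile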

Summary

All of the above information can be overwhelming at first, especially if you don’t have experience with Docker. The truth is, however, that you do not need to know all of it to start your adventure with containerisation 🙂. You may never need some of the instructions available in the Dockerfile (e.g. I don’t remember ever using VOLUME personally), while others may appear later as the project and its CI/CD processes grow and require more advanced implementations.

In the next post in this series, we’ll look at the second file that is extremely important from Docker’s perspective - the Compose file (compose.yaml), which defines the stack in which the application is run. See ya!