5
\$\begingroup\$

I have a Dockerfile with ubuntu:22.04 as the base image and a layer that manually installs a specific Python version (3.11.1), which takes a long time.

Approach 1:

  1. Build a custom_image with ubuntu:22.04 as the base image and a manual Python installation layer.
  2. Create another Dockerfile with custom_image as the base image and add the remaining layers of the image.
  3. Cache the custom_image using the docker/build-push-action@v5 action.
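
For illustration, the second Dockerfile from step 2 might look roughly like the sketch below. The tag custom_image:3.11.1 and the BASE_IMAGE build argument are placeholders, and it assumes the base image already has python on its PATH.

```dockerfile
# Hypothetical second Dockerfile for Approach 1.
# BASE_IMAGE and its default tag are placeholders, not real published names.
ARG BASE_IMAGE=custom_image:3.11.1
FROM ${BASE_IMAGE}

WORKDIR /app

# Copy only the requirements file first so the pip layer is rebuilt
# only when the requirements change.
COPY requirements/core-requirements.txt requirements/core-requirements.txt
RUN python -m pip install --no-cache-dir -r requirements/core-requirements.txt

# The application source changes most often, so it is copied last.
COPY . /app
```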

Approach 2:

  1. Use the ubuntu-latest runner with the actions/setup-python GitHub action pinned to a specific version.

  2. Once the Python setup succeeds, copy the Python binaries into the Docker build context, e.g. cp -r $RUNNER_TOOL_CACHE/Python/3.11.1/* Python/3.11.1

  3. Copy these files into the image and update the PATH variable, like so:

    # Copy Python binaries from the GitHub Actions workflow
    COPY Python /opt/Python
    # Update PATH to include the copied Python binaries
    ENV PATH="/opt/Python/3.11.1/x64/bin:/opt/Python/3.11.1/x64:${PATH}"
    # Set the LD_LIBRARY_PATH to include the copied shared libraries
    ENV LD_LIBRARY_PATH="/opt/Python/3.11.1/x64/lib:${LD_LIBRARY_PATH}"
    

Dockerfile:

FROM --platform=linux/amd64 ubuntu:22.04 as base
USER root
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive
COPY . /app
WORKDIR /app
COPY ZscalerRootCertificate-2048-SHA256.crt /usr/local/share/ca-certificates/ZscalerRootCertificate-2048-SHA256.crt
# Copy Python binaries from the GitHub Actions workflow
COPY Python /opt/Python
# Update PATH to include the copied Python binaries
ENV PATH="/opt/Python/3.11.1/x64/bin:/opt/Python/3.11.1/x64:${PATH}"
# Set the LD_LIBRARY_PATH to include the copied shared libraries
ENV LD_LIBRARY_PATH="/opt/Python/3.11.1/x64/lib:${LD_LIBRARY_PATH}"
# Fail sooner rather than later if Python wasn't installed
RUN python --version
RUN apt-get update && \
 apt-get upgrade -y && \
 apt-get install -y software-properties-common ca-certificates && \
 update-ca-certificates
RUN python -m pip install -U pip
RUN python3.11 -m pip install --no-cache-dir --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements/core-requirements.txt

GitHub workflow:

name: Docker Image CI
env:
  CONTAINER_NAME: my-use-case
on:
  workflow_dispatch:
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
permissions:
  id-token: write
  contents: write
jobs:
  integration-tests:
    name: Integration Test
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: View directory files
        run: |
          pwd
          echo "$PATH"
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11.1"
      - name: Copy Python binaries to Docker context
        run: |
          mkdir -p Python/3.11.1
          cp -r $RUNNER_TOOL_CACHE/Python/3.11.1/* Python/3.11.1
      - name: View directory files
        run: |
          ls -lhR
          echo "$PATH"
      - name: Install dependencies
        run: |
          python --version
          python -m pip install --upgrade pip
      - name: View directory files
        run: |
          ls -lhR
      - name: Build & push docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          push: false
          tags: ${{ env.CONTAINER_NAME }}:${{ github.run_number }}
          file: ./Dockerfile
pacmaninbw
asked Dec 8, 2023 at 17:58
\$\endgroup\$
2

2 Answers

4
\$\begingroup\$
  • COPY ZscalerRootCertificate-2048-SHA256.crt /usr/local/share/ca-certificates/ZscalerRootCertificate-2048-SHA256.crt
    

    The command here runs, presumably, on a file. When Docker performs the copy, Docker compares the file with the file in the cache. If the files are deemed to be the same then Docker skips the step and moves on using the previously cached image.

    As such if you have the above COPY before any other COPYs then the copy will be cached.

  • COPY . /app
    COPY Python /opt/Python
    

    Copying directories into the Docker build scope invalidates the cache: Docker acts as if the files are not the same, even if they are.

    If you can change the step to copy a single file instead (say a .zip or .tar.bz2 archive), then the step can be cached.

  • RUN apt-get update && \
     apt-get upgrade -y && \
     apt-get install -y software-properties-common ca-certificates && \
     update-ca-certificates
    

    I don't see why the above RUN command needs to be below the COPY operations. If you don't want to copy an archive then you probably can move the command higher to allow caching to happen.

  • RUN python -m pip install -U pip
    RUN python3.11 -m pip install --no-cache-dir --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements/core-requirements.txt
    

    Likewise I'm not sure you need to run the commands after any of the COPY commands. You can have a single COPY command for the requirements, and possibly any configs you need.

    Additionally I normally position the commands in a builder which focuses on building a venv. I use the standardized python image for the version I'm building against. I then copy the files to optimize caching and then just have a fat 'copy everything and don't care about caching' core.

    # Not version pinned, doesn't work
    FROM python:3 AS builder
    WORKDIR /build
    RUN python -m pip install virtualenv
    RUN python -m virtualenv venv
    COPY requirements.txt requirements.txt
    RUN venv/bin/python -m pip install -r requirements.txt
    # invalidates the cache
    COPY . .
    RUN venv/bin/python -m pip install .
    FROM nginx/unit:1.26.1-python3.10
    EXPOSE 80
    EXPOSE 443
    VOLUME ["/app/.data"]
    WORKDIR /app
    ENV PATH /app/venv/bin:$PATH
    # no caching needed
    COPY nginx_config.json /docker-entrypoint.d/config.json
    # copy the venv from the builder
    COPY --from=builder /build/venv venv
    RUN chown unit:unit -R .data venv
    

    In more intricate builders I have copied configs and SSH keys to the builder, knowing the files won't be in the final image, which reduces the clean-up steps.
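
Applied to the Dockerfile in the question, the ordering suggested in the points above might look roughly like the sketch below. It is untested and rests on two assumptions: that Python still arrives via the copied Python directory, and that requirements/core-requirements.txt is the only file pip needs before the full source is copied.

```dockerfile
FROM --platform=linux/amd64 ubuntu:22.04 AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    DEBIAN_FRONTEND=noninteractive

# Single-file COPY first: this layer only changes when the certificate changes.
COPY ZscalerRootCertificate-2048-SHA256.crt /usr/local/share/ca-certificates/ZscalerRootCertificate-2048-SHA256.crt

# Independent of the application source, so it sits above the directory COPYs.
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y software-properties-common ca-certificates && \
    update-ca-certificates

# Python binaries copied in from the GitHub Actions workflow, as in the question.
COPY Python /opt/Python
ENV PATH="/opt/Python/3.11.1/x64/bin:/opt/Python/3.11.1/x64:${PATH}"
ENV LD_LIBRARY_PATH="/opt/Python/3.11.1/x64/lib:${LD_LIBRARY_PATH}"
RUN python --version

# Only the requirements file is needed for the pip layer.
WORKDIR /app
COPY requirements/core-requirements.txt requirements/core-requirements.txt
RUN python -m pip install -U pip && \
    python -m pip install --no-cache-dir --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements/core-requirements.txt

# The full source changes on almost every commit, so it goes last.
COPY . /app
```

With this ordering, a change to the application source only rebuilds the final COPY layer, provided the earlier layers are still in the build cache.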

answered Dec 8, 2023 at 19:15
\$\endgroup\$
4
  • \$\begingroup\$ Thanks for the details on how the position of commands impacts caching. Is there a good resource you would recommend for a deeper understanding of Docker commands and best practices? If I place the manual Python installation commands at the start, will they get cached? I will update my post with the manual Python installation command soon. \$\endgroup\$ Commented Dec 9, 2023 at 7:06
  • \$\begingroup\$ Is there any way that I can cache the manual Python install command in the Dockerfile across subsequent GitHub Actions CI runs? \$\endgroup\$ Commented Dec 9, 2023 at 8:08
  • \$\begingroup\$ As per community rules, I have to ask this as new question, can you please help with that? The new question is how-to-reduce-the-build-time-cache-docker-image-layer-in-subsequent-github-action-runs? \$\endgroup\$ Commented Dec 9, 2023 at 16:19
  • 1
    \$\begingroup\$ @shaikmoeed When I learnt the Dockerfile commands I partially rewrote the Dockerfile reference page, both to have my own cheat sheet and to speed up my learning. From memory the Docker reference page didn't go too much into caching, so through trial and error I figured out which commands to place where to minimize cache invalidation. RUN commands only invalidate the cache if the command changes, so treat RUN commands as never invalidating the cache. Then you just want file COPY commands at the top, followed by RUN and then dir COPYs. \$\endgroup\$ Commented Dec 9, 2023 at 18:09
3
\$\begingroup\$

We have two goals for a speedy build of a docker image:

  1. small image
  2. small diffs

I notice you're not yet using layers. You want to.


You start out with LTS Jammy Jellyfish as the base layer, good. Although you might prefer an image with the Alpine distribution, as they have gone to some trouble to pare its functionality to the bone. If there's an apt package installed which you never use, then copying its bits around just costs us more time.

After the base layer, we don't take advantage of any additional CoW layers.

Depending on your requirements, you may or may not have to ask apt to update and upgrade on every CI/CD build. Doing that daily or weekly might suffice.

On top of base, use FROM base as python to build a python layer that includes interpreter plus pip-installed libraries. Again, up-to-the-minute fresh libraries might not be a requirement for every build. Ideally we would see a stable hash for this layer over multiple builds, indicating no work needed to be done.

On top of python, build an app layer that includes the checked-out feature branch.

Our goal in all this is to minimize diffs within a layer from one build to the next, so the apt and pip steps can complete within a couple seconds because there is very little to do, few diffs since last time.

Use RUN date or other timing measurements to help you assess whether a given stage is doing more work than you believe it should have to do. Rapid docker builds are possible.
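
A rough sketch of the base, python, and app structure described above follows. It is illustrative only: the COPY of a Python directory stands in for whatever manual install the question uses, and the paths are taken from the question rather than verified.

```dockerfile
# Stage 1: OS packages only; changes rarely.
FROM ubuntu:22.04 AS base
RUN apt-get update && \
    apt-get install -y ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# Stage 2: interpreter plus pip-installed libraries.
# The stage is named "python" as in the text above; inside this Dockerfile
# the name refers to this stage, not to the python image on Docker Hub.
FROM base AS python
# Placeholder for the manual Python 3.11.1 install from the question.
COPY Python /opt/Python
ENV PATH="/opt/Python/3.11.1/x64/bin:${PATH}"
ENV LD_LIBRARY_PATH="/opt/Python/3.11.1/x64/lib"
COPY requirements/core-requirements.txt /tmp/core-requirements.txt
RUN python -m pip install --no-cache-dir -r /tmp/core-requirements.txt

# Stage 3: the checked-out feature branch; changes on every build.
FROM python AS app
WORKDIR /app
COPY . /app
```

If the first two stages come out of the cache unchanged, only the final COPY does any work on a typical build.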

answered Dec 8, 2023 at 19:22
\$\endgroup\$
4
  • \$\begingroup\$ "build an app layer" are the layers metaphors, or does Docker contain a layering mechanism? \$\endgroup\$ Commented Dec 8, 2023 at 20:16
  • \$\begingroup\$ If I understand correctly, I need to create three stages base, python, app and most of the commands in the first two stages will be cached. And I have updated my Dockerfile in the post with a manual Python install command, is there any way that I can cache that command in subsequent CI runs? \$\endgroup\$ Commented Dec 9, 2023 at 7:56
  • 2
    \$\begingroup\$ I think this answer conflates two docker concepts, namely layers and stages. Each docker command in a Dockerfile will introduce a new layer into the produced image. These layers are cached aggressively during build and pull operations. You can separate a build process into stages, i.e. groups of layers, which can additionally optimize the build but the primary use is to reduce image size by separating buildtime environment from runtime environment. \$\endgroup\$ Commented Dec 9, 2023 at 9:36
  • \$\begingroup\$ As per community rules, I have to ask this as new question, can you please help with that? The new question is how-to-reduce-the-build-time-cache-docker-image-layer-in-subsequent-github-action-runs? \$\endgroup\$ Commented Dec 9, 2023 at 16:19
