I have a Dockerfile that uses `ubuntu:22.04` as the base image plus a manual installation layer for a specific Python version (3.11.1), and that layer takes a long time to build. I am considering two approaches to speed this up.
Approach 1:

- Build a `custom_image` with `ubuntu:22.04` as the base image and a manual Python installation layer.
- Create another Dockerfile with `custom_image` as the base image and add the other layers of the image.
- Cache `custom_image` using the `docker/build-push-action@v5` action.
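The caching step in Approach 1 could be wired up roughly like this. This is only a sketch, not a verified workflow: `Dockerfile.base` is a hypothetical file holding the Python installation layer, and the `type=gha` cache backend is one of several options `docker/build-push-action` supports.

```yaml
# Sketch: build the Python base image once and store its layers in the
# GitHub Actions cache, so later builds can reuse them.
- name: Build custom base image
  uses: docker/build-push-action@v5
  with:
    context: .
    file: ./Dockerfile.base   # hypothetical Dockerfile with the Python layer
    tags: custom_image:latest
    push: false
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

With `mode=max`, intermediate layers are cached as well, at the cost of a larger cache entry.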
Approach 2:

- Use an `ubuntu-latest` runner with the `actions/setup-python` GitHub action pinned to a specific version.
- Once the Python setup succeeds, copy the Python binaries into the Docker build context:

  ```shell
  cp -r $RUNNER_TOOL_CACHE/Python/3.11.1/* Python/3.11.1
  ```

- Copy these files into the image and update the `PATH` variable:

  ```dockerfile
  # Copy Python binaries from the GitHub Actions workflow
  COPY Python /opt/Python
  # Update PATH to include the copied Python binaries
  ENV PATH="/opt/Python/3.11.1/x64/bin:/opt/Python/3.11.1/x64:${PATH}"
  # Set the LD_LIBRARY_PATH to include the copied shared libraries
  ENV LD_LIBRARY_PATH="/opt/Python/3.11.1/x64/lib:${LD_LIBRARY_PATH}"
  ```
Dockerfile:

```dockerfile
FROM --platform=linux/amd64 ubuntu:22.04 AS base

USER root

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive

COPY . /app
WORKDIR /app

COPY ZscalerRootCertificate-2048-SHA256.crt /usr/local/share/ca-certificates/ZscalerRootCertificate-2048-SHA256.crt

# Copy Python binaries from the GitHub Actions workflow
COPY Python /opt/Python

# Update PATH to include the copied Python binaries
ENV PATH="/opt/Python/3.11.1/x64/bin:/opt/Python/3.11.1/x64:${PATH}"

# Set the LD_LIBRARY_PATH to include the copied shared libraries
ENV LD_LIBRARY_PATH="/opt/Python/3.11.1/x64/lib:${LD_LIBRARY_PATH}"

# Fail sooner rather than later if Python wasn't installed
RUN python --version

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y software-properties-common ca-certificates && \
    update-ca-certificates

RUN python -m pip install -U pip
RUN python3.11 -m pip install --no-cache-dir --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements/core-requirements.txt
```
GitHub workflow:

```yaml
name: Docker Image CI

env:
  CONTAINER_NAME: my-use-case

on:
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

permissions:
  id-token: write
  contents: write

jobs:
  integration-tests:
    name: Integration Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: View directory files
        run: |
          pwd
          echo "$PATH"

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11.1"

      - name: Copy Python binaries to Docker context
        run: |
          mkdir -p Python/3.11.1
          cp -r $RUNNER_TOOL_CACHE/Python/3.11.1/* Python/3.11.1

      - name: View directory files
        run: |
          ls -lhR
          echo "$PATH"

      - name: Install dependencies
        run: |
          python --version
          python -m pip install --upgrade pip

      - name: View directory files
        run: |
          ls -lhR

      - name: Build & push docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          push: false
          tags: ${{ env.CONTAINER_NAME }}:${{ github.run_number }}
          file: ./Dockerfile
```
- Please do not edit the question, especially the code, after an answer has been posted. Changing the question may cause answer invalidation. Everyone needs to be able to see what the reviewer was referring to. See: What to do after the question has been answered. (pacmaninbw ♦, Dec 9, 2023 at 13:45)
- @pacmaninbw Thanks for suggesting. I asked a new question instead of editing and accepted the closest answer. The new question is how-to-reduce-the-build-time-cache-docker-image-layer-in-subsequent-github-action-runs (shaik moeed, Dec 9, 2023 at 16:18)
2 Answers
-
  ```dockerfile
  COPY ZscalerRootCertificate-2048-SHA256.crt /usr/local/share/ca-certificates/ZscalerRootCertificate-2048-SHA256.crt
  ```

  The command here runs, presumably, on a file. When Docker performs the copy, it compares the file with the file in the cache. If the files are deemed to be the same, then Docker skips the step and moves on using the previously cached image.

  As such, if you have the above `COPY` before any other `COPY`s, then the copy will be cached.
-
  ```dockerfile
  COPY . /app
  COPY Python /opt/Python
  ```

  Copying directories into the Docker build scope invalidates the cache: Docker acts as if the files are not the same, even when they are.

  If you can change the action to copy a single file (say a `.zip` or `.tar.bz2`) then you can cache the interaction.
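One way to follow this suggestion, sketched under the assumption that the archive name is free to choose: pack the tool-cache directory into a single tarball in the workflow, then let `ADD` unpack it, since Docker's cache check for a single file is a checksum comparison and `ADD` auto-extracts local tar archives.

```shell
# In the workflow, before the build: one file instead of a directory tree
tar -czf python-3.11.1.tar.gz -C "$RUNNER_TOOL_CACHE/Python/3.11.1" .
```

```dockerfile
# ADD auto-extracts local tar archives into the target directory
ADD python-3.11.1.tar.gz /opt/Python/3.11.1
```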
-
  ```dockerfile
  RUN apt-get update && \
      apt-get upgrade -y && \
      apt-get install -y software-properties-common ca-certificates && \
      update-ca-certificates
  ```

  I don't see why the above `RUN` command needs to be below the `COPY` operations. If you don't want to copy an archive, then you can probably move the command higher to allow caching to happen.
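Applying that reordering to the start of the Dockerfile from the question might look like this (a sketch only, keeping the original commands):

```dockerfile
FROM --platform=linux/amd64 ubuntu:22.04 AS base
USER root

# No files involved: cached as long as the command text doesn't change
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y software-properties-common ca-certificates

# Single-file COPY: invalidated only when the file's checksum changes
COPY ZscalerRootCertificate-2048-SHA256.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates

# Directory COPYs last, since they invalidate everything after them
COPY Python /opt/Python
COPY . /app
```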
-
  ```dockerfile
  RUN python -m pip install -U pip
  RUN python3.11 -m pip install --no-cache-dir --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements/core-requirements.txt
  ```

  Likewise, I'm not sure you need to run these commands after any of the `COPY` commands. You can have a single `COPY` command for the requirements, and possibly any configs you need.

  Additionally, I normally position the commands in a builder stage which focuses on building a `venv`. I use the standardized `python` image for the version I'm building against. I then copy the files to optimize caching and then just have a fat 'copy everything and don't care about caching' core.

  ```dockerfile
  # Not version pinned, doesn't work
  FROM python:3 AS builder
  WORKDIR /build
  RUN python -m pip install virtualenv
  RUN python -m virtualenv venv
  COPY requirements.txt requirements.txt
  RUN venv/bin/python -m pip install -r requirements.txt
  # invalidates the cache
  COPY . .
  RUN venv/bin/python -m pip install .

  FROM nginx/unit:1.26.1-python3.10
  EXPOSE 80
  EXPOSE 443
  VOLUME ["/app/.data"]
  WORKDIR /app
  ENV PATH /app/venv/bin:$PATH
  # no caching needed
  COPY nginx_config.json /docker-entrypoint.d/config.json
  # copy the venv from the builder
  COPY --from=builder /build/venv venv
  RUN chown unit:unit -R .data venv
  ```

  In more intricate builds I have copied configs and SSH keys to the builder, knowing the files won't be in the final image, which reduces the clean-up steps.
- Thanks for the details on how the position of commands impacts caching. Is there any good resource you would recommend to get a deeper understanding of Docker commands and best practices? If I place the manual Python installation commands at the start, will they get cached? I will update my post with the manual Python installation command soon. (shaik moeed, Dec 9, 2023 at 7:06)
- Is there any way that I can cache the manual Python install command in the Dockerfile in subsequent CI runs of GitHub Actions? (shaik moeed, Dec 9, 2023 at 8:08)
- As per community rules, I have to ask this as a new question, can you please help with that? The new question is how-to-reduce-the-build-time-cache-docker-image-layer-in-subsequent-github-action-runs (shaik moeed, Dec 9, 2023 at 16:19)
- @shaikmoeed When I learnt the Dockerfile commands I partially rewrote the Dockerfile reference page, both as my own cheat sheet and to hasten my learning. From memory the Docker reference page didn't go too much into caching, so through trial and error I figured out which commands to place where to minimize cache invalidation. `RUN` commands only invalidate the cache if the command changes, so treat them as never invalidating the cache. Then you just want file `COPY` commands at the top, followed by `RUN` and then directory `COPY`s. (Dec 9, 2023 at 18:09)
We have two goals for a speedy build of a Docker image:

- small image
- small diffs

I notice you're not yet using layers. You want to.

You start out with LTS Jammy Jellyfish as the `base` layer, good. Although you might prefer an image with the Alpine distribution, as they have gone to some trouble to pare its functionality to the bone. If there's an apt package installed which you never use, then copying its bits around just costs us more time.
After the base layer, we don't take advantage of any additional CoW layers. Depending on your requirements, you may or may not have to ask `apt` to update and upgrade on every CI/CD build. Doing that daily or weekly might suffice.

On top of `base`, use `FROM base AS python` to build a `python` layer that includes the interpreter plus pip-installed libraries. Again, up-to-the-minute fresh libraries might not be a requirement for every build. Ideally we would see a stable hash for this layer over multiple builds, indicating no work needed to be done.
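As a sketch of how you might check for that stable hash (the image tag here matches the one used in the workflow above, and a local Docker daemon is assumed):

```shell
# Build twice and compare the layer digests; identical lists mean
# every layer was served from cache and no work was redone.
docker build -t my-use-case .
docker inspect --format '{{json .RootFS.Layers}}' my-use-case

docker build -t my-use-case .
docker inspect --format '{{json .RootFS.Layers}}' my-use-case
```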
On top of `python`, build an `app` layer that includes the checked-out feature branch. Our goal in all this is to minimize diffs within a layer from one build to the next, so the `apt` and `pip` steps can complete within a couple of seconds because there is very little to do, few diffs since last time.
Use `RUN date` or other timing measurements to help you assess whether a given stage is doing more work than you believe it should have to do.
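As an alternative to ad-hoc `RUN date` stamps, BuildKit's plain progress output reports per-step timings and marks reused steps as CACHED (a sketch, assuming BuildKit is available on the builder):

```shell
# Print one line per build step, with elapsed time and CACHED markers
DOCKER_BUILDKIT=1 docker build --progress=plain -t my-use-case .
```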
Rapid docker builds are possible.
- "build an `app` layer" -- are the layers metaphors, or does Docker contain a layering mechanism? (Dec 8, 2023 at 20:16)
- If I understand correctly, I need to create three stages `base, python, app`, and most of the commands in the first two stages will be cached. And I have updated my Dockerfile in the post with a manual Python install command; is there any way that I can cache that command in subsequent CI runs? (shaik moeed, Dec 9, 2023 at 7:56)
- I think this answer conflates two Docker concepts, namely layers and stages. Each Docker command in a Dockerfile will introduce a new layer into the produced image. These layers are cached aggressively during build and pull operations. You can separate a build process into stages, i.e. groups of layers, which can additionally optimize the build, but the primary use is to reduce image size by separating the buildtime environment from the runtime environment. (Vogel612, Dec 9, 2023 at 9:36)
- As per community rules, I have to ask this as a new question, can you please help with that? The new question is how-to-reduce-the-build-time-cache-docker-image-layer-in-subsequent-github-action-runs (shaik moeed, Dec 9, 2023 at 16:19)