Downloading a specific version such as 3.11.1
from https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
and installing it with `./configure --enable-optimizations && make install`
is slow (30-40 minutes) on GitHub Actions.
Dockerfile:
FROM --platform=linux/amd64 ubuntu:22.04 as base
USER root
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive
COPY . /app
WORKDIR /app
COPY ZscalerCertificate.crt /usr/local/share/ca-certificates/ZscalerCertificate.crt
RUN find /tmp -name \*.deb -exec rm {} +
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y software-properties-common ca-certificates &&\
update-ca-certificates
RUN apt-get update &&\
apt-get upgrade -y && \
apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get upgrade
RUN apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev
RUN mkdir /python && cd /python
RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
RUN tar -zxvf Python-3.11.1.tgz
RUN cd Python-3.11.1 && ls -lhR && ./configure --enable-optimizations && make install
What is the optimal way to install a specific Python version, and how can I reduce the build time?
2 Answers
This is a bad layer:
RUN apt-get update && apt-get upgrade
That locks in the package lists permanently in a layer. Worse, it means changes to the next layer may use a cached version of this layer instead of getting an update. That could mean the `apt-get install` tries to get packages that are no longer in the archive.
Combine it with the install command and post-install size reduction, as you did for the earlier package installs. Each of those ought to `apt-get clean`, too, to remove the contents of `/var/cache/apt/archives`, which tend to be large.
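As a minimal sketch of that advice (package names copied from the question's Dockerfile), the separate update/upgrade/install layers could be collapsed into one layer that also cleans up after itself:

```dockerfile
# One layer: fetch fresh package lists, install, then discard both the
# downloaded .debs and the lists so neither persists in the image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
        libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```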
Similarly, this one leaves the large archive lying around:
wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
We should immediately unpack and then remove it (all in a single layer, to keep the container image reasonably small). Preferably, we should unpack, build and clean in one step, keeping just the installed result.
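A sketch of that single-layer download/build/clean step; the key point is that the tarball and the build tree are removed in the same `RUN` so they never land in a layer:

```dockerfile
# Download, build, install and clean up in one layer.
RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
    tar -zxf Python-3.11.1.tgz && \
    cd Python-3.11.1 && \
    ./configure --enable-optimizations && \
    make install && \
    cd .. && \
    rm -rf Python-3.11.1 Python-3.11.1.tgz
```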
- For my understanding: isn't "changes to the next layer may use a cached version of this layer instead of getting an update" a feature instead of a bug? If a layer does not have caching as its purpose, what is it for? (Reinderien, Aug 17, 2023 at 15:14)
- Thanks for providing the review. I got to know some new things; any suggestions on improving the build time? (Python coder, Aug 17, 2023 at 15:32)
- I think you're unlikely to improve the build time, but reducing the container size may give you speed and cost improvements by transferring less data. The trouble with using cached package indexes is that when the package repositories are updated, we'll be using old lists of what's available (i.e. the cache is stale). So we always want `apt-get update` in the same layer as `apt-get install`. (Toby Speight, Aug 17, 2023 at 15:51)
- I think you might be able to declare `/var/cache` as a volume, so that it's never included in any layers. I haven't tried that though; perhaps someone else can verify whether that's a good practice? (Toby Speight, Aug 17, 2023 at 15:52)
- An update on the solution: size got reduced by 100 MB and there was no change in build time. (Python coder, Aug 18, 2023 at 14:13)
There are a couple of optimisations you can do here.
First and foremost, you should re-order your commands so that the earlier layers are the least likely to change and the later layers are the ones most likely to change. In general, you should install your dependencies before COPY-ing the files belonging to the application that will run on them. This way, you take better advantage of caching. `COPY . /app` makes a very volatile layer, as any change to any file invalidates all of the subsequent layers. If you really have to do that, do it as close as possible to the final step.
FROM --platform=linux/amd64 ubuntu:22.04 as base
USER root
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive
# is this step really necessary? there shouldn't be anything in /tmp
# RUN find /tmp -name \*.deb -exec rm {} +
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y software-properties-common ca-certificates &&\
update-ca-certificates
RUN apt-get update &&\
apt-get upgrade -y && \
apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get upgrade
RUN apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev
RUN mkdir /python && cd /python
RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
RUN tar -zxvf Python-3.11.1.tgz
RUN cd Python-3.11.1 && ls -lhR && ./configure --enable-optimizations && make install
COPY . /app
WORKDIR /app
COPY ZscalerCertificate.crt /usr/local/share/ca-certificates/ZscalerCertificate.crt
That way, merely changing the Dockerfile or the certificate won't require an entire reinstallation and recompilation of Python.
Second, to minimize image sizes, avoid creating layers with unnecessary cached files:
FROM --platform=linux/amd64 ubuntu:22.04 as base
USER root
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive
# is this step really necessary? there shouldn't be anything in /tmp
# RUN find /tmp -name \*.deb -exec rm {} +
RUN apt-get update &&\
apt-get upgrade -y && \
apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
apt-get install -y software-properties-common ca-certificates &&\
apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev && \
update-ca-certificates && \
rm -rf /var/lib/apt/lists/*
RUN mkdir /python && cd /python && \
wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
tar -zxvf Python-3.11.1.tgz && \
cd Python-3.11.1 && \
ls -lhR && \
./configure --enable-optimizations && \
make install && \
rm -rf /python
COPY . /app
WORKDIR /app
COPY ZscalerCertificate.crt /
Third, you can go further than this to slim down the image by removing unnecessary apt-get dependencies as well. There are two approaches to this:
- Use a multi-stage build and use `COPY --from` to copy just the Python files that you need onto a fresh image.
- Or do what the official Debian slim Python image does, which is to use `apt-mark` to uninstall unnecessary packages. This needs to happen in the same layer as the entire Python compile-and-install step, to avoid creating bloated intermediate images.
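A rough sketch of the multi-stage approach. The stage name and the trimmed-down package list are illustrative; the `/usr/local` prefix is what `make install` uses by default, so copying that tree carries the installed Python across:

```dockerfile
# Builder stage: compile Python with all the build dependencies.
FROM ubuntu:22.04 AS builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential wget ca-certificates zlib1g-dev libssl-dev libffi-dev && \
    rm -rf /var/lib/apt/lists/*
RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
    tar -zxf Python-3.11.1.tgz && \
    cd Python-3.11.1 && \
    ./configure --enable-optimizations && \
    make install

# Final stage: start from a fresh image and copy only the installed files,
# leaving all the compilers and -dev packages behind.
FROM ubuntu:22.04
COPY --from=builder /usr/local /usr/local
```

Note that the final stage still needs the runtime shared libraries the build stage linked against (for example the OpenSSL and zlib runtime packages), so some `apt-get install` of non-dev packages may still be required there.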
- Thanks for the details. Can you add more details on the multi-stage build? What are the paths? From which path do the Python executables need to be copied, and in which path should they be stored in the final stage? (Python coder, Aug 18, 2023 at 5:55)
- @Pythoncoder you can use `docker diff` to find the list of files that changed since the container was created from the image. You can run `make` as a Dockerfile step, then run `make install`, check `docker diff`, and use that list of files. Alternatively, make a deb package so all the files you need to COPY are in a single self-contained file (though this doubles the image diff size). (Lie Ryan, Aug 19, 2023 at 0:07)
- ... use `make -j` with your number of cores for parallelism.
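The parallel-make suggestion above can be sketched as a change to the question's build step; `$(nproc)` reports the number of available cores, so `make` runs one compile job per core instead of building serially:

```dockerfile
# Parallel compile: -j"$(nproc)" runs one make job per available core,
# which is the main lever for cutting the compile time itself.
RUN cd Python-3.11.1 && \
    ./configure --enable-optimizations && \
    make -j"$(nproc)" && \
    make install
```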