Leftovers after generating a docker image

Question 1

I am building a Docker image that contains LLMs, so I ended up with a rather large image:

$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
custom-ollama-custom-vision-llava250314-e1x-c latest 2a51a22d7005 18 minutes ago 17.6GB

This image was built by converting GGUF files, which were also large. My system is running out of disk space, so I ran

$ sudo find / -type f -size +1G -exec ls -lh {} \; 2>/dev/null | sort -k 5 -rh | head -20

and got these interesting results:

-rw-r--r-- 1 root root 13G 4月 16 12:06 /var/lib/docker/overlay2/of7kdxtrkg2kpufauknowz683/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 12:06 /var/lib/docker/overlay2/coh80g8u43u43yuw97ekn8497/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 12:02 /var/lib/docker/overlay2/yb776umeljvv7oajellb8ixrj/diff/temp_model_data/llava-v1.6-vicuna-7B-7b_20250314-epoch1-F16.gguf
-rw-r--r-- 1 root root 13G 4月 16 12:02 /var/lib/docker/overlay2/2ncan7h59si2r29dza5s4z7ss/diff/temp_model_data/llava-v1.6-vicuna-7B-7b_20250314-epoch1-F16.gguf
-rw-r--r-- 1 root root 13G 4月 16 11:42 /var/lib/docker/overlay2/xoi30ch9xr1cdkfuro4cuuqyl/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 11:42 /var/lib/docker/overlay2/x4vr94gkwmsco51qeivr2zscp/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 11:36 /var/lib/docker/overlay2/bqsculhwdq76v6hah9nj3ygij/diff/temp_model_data/llava-v1.6-vicuna-7B-7b_20250314-epoch1-F16.gguf

I use Docker images in my work, but I am not an expert on how images are structured and built. So my question is: what are these (huge) files? Surely they can't all be part of the final image, since their combined size is larger than the image itself. Are they leftovers of the docker build process?

I have already pruned Docker, so docker images now shows only:

$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
custom-ollama-custom-vision-llava250314-e1x-c latest 2a51a22d7005 30 minutes ago 17.6GB
redis 7-alpine 7d06252fad43 8 months ago 41.2MB

so I don't see any temporary or intermediate images there.

Can I safely delete the huge files above?

Comment

What created the image; do you have a minimal reproducible example? You shouldn't usually look inside /var/lib/docker and you can't directly modify the files there. The overlay2 directory contains both image and container filesystems, and it's possible that routine cleanup commands like docker system prune will free up some space.

Answer

You should refer to the Docker documentation to understand what layers are. In simple terms, the resulting image contains every filesystem snapshot that was created during the build.

For example, I assume your Dockerfile does something like this:

FROM python
RUN wget llama_initial_weights
RUN python3 cleanup.py llama_initial_weights llama_cleaned_weights
RUN python3 transform.py llama_cleaned_weights llama_transformed_weights
RUN rm llama_cleaned_weights
RUN rm llama_initial_weights

Every instruction creates a separate, immutable snapshot layer in the resulting image. So all the data you created stays on your hard drive in 'hidden' layers even after the rm command runs: llama_initial_weights still exists in an intermediate layer.
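To make the point about rm concrete, here is a toy model in shell. It is only a sketch and does not touch Docker: each "layer" is a plain directory, the 1 MiB file stands in for the weights, and the `.wh.` marker imitates the whiteout entries that image layers use to record deletions. The directory names are made up for illustration.

```shell
#!/bin/sh
# Toy model of image layers: each "layer" is a directory holding only what
# one build step added. This is NOT how to inspect a real image.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Layer 1: a step that downloads a weights file (here just 1 MiB of zeros).
mkdir -p "$work/layer1"
dd if=/dev/zero of="$work/layer1/llama_initial_weights" bs=1024 count=1024 2>/dev/null

# Layer 2: a later "rm" step. It cannot edit layer 1, so it only records
# a marker saying the file is hidden from this point on.
mkdir -p "$work/layer2"
: > "$work/layer2/.wh.llama_initial_weights"

# Bytes stored across all layers (the glob skips the hidden marker file).
stored_bytes=$(cat "$work"/layer*/* | wc -c | tr -d ' ')

# Files visible in the final view: the marker in layer 2 hides the file.
visible_files=0
[ -e "$work/layer2/.wh.llama_initial_weights" ] || visible_files=1

echo "stored_bytes=$stored_bytes visible_files=$visible_files"
```

The final view shows no file, yet the full megabyte is still stored in the first layer, which is the same effect as the 13G files above on a smaller scale.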

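You can try the --squash flag of docker build to collapse the newly built layers into one. Note that it is an experimental option of the classic builder, so it may not be available in your setup. Also, as mentioned in the comments, docker system prune can help free disk space by removing unused data such as dangling images and stopped containers.

Alternatively, you can change your Dockerfile to something less tidy but friendlier to layers: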
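Before deleting anything, it may help to see where the space goes. The sketch below strings together three standard Docker CLI commands. The `run` helper and the `DRY_RUN` switch are hypothetical additions for safety, and the image tag is taken from the question. Whether the duplicated files belong to the build cache is an assumption you would need to confirm from the output.

```shell
#!/bin/sh
# IMAGE defaults to the tag from the question; override it as needed.
IMAGE="${IMAGE:-custom-ollama-custom-vision-llava250314-e1x-c:latest}"

# Hypothetical helper: with DRY_RUN=1 (the default) commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

report() {
    # Totals for images, containers, local volumes and build cache.
    run docker system df
    # Size contributed by each layer of the image.
    run docker history "$IMAGE"
    # Remove dangling build cache without asking for confirmation.
    run docker builder prune --force
}

report_output=$(report)
printf '%s\n' "$report_output"
```

Run it once as-is to review the commands, then with `DRY_RUN=0` to execute them. Letting Docker remove the data is safer than deleting files under /var/lib/docker by hand.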

FROM python
RUN wget llama_initial_weights \
 && python3 cleanup.py llama_initial_weights llama_cleaned_weights \
 && python3 transform.py llama_cleaned_weights llama_transformed_weights \
 && rm llama_cleaned_weights \
 && rm llama_initial_weights

Here only one RUN instruction is executed, so only one layer is created, and files removed before that instruction finishes never end up in it.
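As a rough check of the difference, the following sketch writes both Dockerfile variants to temporary files and counts the instructions that add a filesystem layer. It assumes that only RUN, COPY and ADD do so and that each instruction starts at the beginning of a line. The file names and the `count_layer_steps` helper are invented for this example.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Variant 1: one RUN per step.
cat > "$tmp/Dockerfile.split" <<'EOF'
FROM python
RUN wget llama_initial_weights
RUN python3 cleanup.py llama_initial_weights llama_cleaned_weights
RUN python3 transform.py llama_cleaned_weights llama_transformed_weights
RUN rm llama_cleaned_weights
RUN rm llama_initial_weights
EOF

# Variant 2: all steps chained inside a single RUN.
cat > "$tmp/Dockerfile.merged" <<'EOF'
FROM python
RUN wget llama_initial_weights \
 && python3 cleanup.py llama_initial_weights llama_cleaned_weights \
 && python3 transform.py llama_cleaned_weights llama_transformed_weights \
 && rm llama_cleaned_weights \
 && rm llama_initial_weights
EOF

# Count instructions that add a filesystem layer (RUN, COPY, ADD).
count_layer_steps() {
    grep -c -E '^(RUN|COPY|ADD)[[:space:]]' "$1"
}

split_steps=$(count_layer_steps "$tmp/Dockerfile.split")
merged_steps=$(count_layer_steps "$tmp/Dockerfile.merged")
echo "split: $split_steps layer-creating steps, merged: $merged_steps"
```

The count says nothing about layer sizes. For a built image, docker history shows the actual size of each layer.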

Mullo · Accepted Answer · 2025-04-16 12:41:15Z
