I am building a Docker image that contains LLMs, so I ended up with a fairly large image:

$ docker images
REPOSITORY                                      TAG       IMAGE ID       CREATED          SIZE
custom-ollama-custom-vision-llava250314-e1x-c   latest    2a51a22d7005   18 minutes ago   17.6GB

The image was built by converting GGUF files, which were also large. My system is now running out of disk space, so I ran $ sudo find / -type f -size +1G -exec ls -lh {} \; 2>/dev/null | sort -k 5 -rh | head -20 and got these interesting results:

-rw-r--r-- 1 root root 13G 4月 16 12:06 /var/lib/docker/overlay2/of7kdxtrkg2kpufauknowz683/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 12:06 /var/lib/docker/overlay2/coh80g8u43u43yuw97ekn8497/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 12:02 /var/lib/docker/overlay2/yb776umeljvv7oajellb8ixrj/diff/temp_model_data/llava-v1.6-vicuna-7B-7b_20250314-epoch1-F16.gguf
-rw-r--r-- 1 root root 13G 4月 16 12:02 /var/lib/docker/overlay2/2ncan7h59si2r29dza5s4z7ss/diff/temp_model_data/llava-v1.6-vicuna-7B-7b_20250314-epoch1-F16.gguf
-rw-r--r-- 1 root root 13G 4月 16 11:42 /var/lib/docker/overlay2/xoi30ch9xr1cdkfuro4cuuqyl/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 11:42 /var/lib/docker/overlay2/x4vr94gkwmsco51qeivr2zscp/diff/root/.ollama/models/blobs/sha256-cfa7e10c7fa82c02f498ab55f3f5be06d25b28b0a6598d01a03aa61f87c08750
-rw-r--r-- 1 root root 13G 4月 16 11:36 /var/lib/docker/overlay2/bqsculhwdq76v6hah9nj3ygij/diff/temp_model_data/llava-v1.6-vicuna-7B-7b_20250314-epoch1-F16.gguf

I use Docker images in my work, but I am not an expert on how images are structured and built. So my question is: what are these (huge) files? Surely they can't all be part of the final image, since their combined size is larger than the image itself. So what are they? Are they leftovers from the docker build process?

I have already pruned Docker, so docker images now shows only

$ docker images
REPOSITORY                                      TAG        IMAGE ID       CREATED          SIZE
custom-ollama-custom-vision-llava250314-e1x-c   latest     2a51a22d7005   30 minutes ago   17.6GB
redis                                           7-alpine   7d06252fad43   8 months ago     41.2MB

so I don't see any temporary or intermediate images there.

Can I safely delete the huge files above?

asked Apr 16, 2025 at 3:40
    What created the image; do you have a minimal reproducible example? You shouldn't usually look inside /var/lib/docker and you can't directly modify the files there. The overlay2 directory contains both image and container filesystems, and it's possible that routine cleanup commands like docker system prune will free up some space. Commented Apr 16, 2025 at 9:51

1 Answer

You should refer to the Docker documentation to understand what layers are. In simple terms, the resulting image contains every snapshot of the container filesystem that was created during the build.

For example, I suspect that your Dockerfile does something like this:

FROM python
RUN wget llama_initial_weights
RUN python3 cleanup.py llama_initial_weights llama_cleaned_weights
RUN python3 transform.py llama_cleaned_weights llama_transformed_weights
RUN rm llama_cleaned_weights
RUN rm llama_initial_weights

Each RUN instruction creates a separate, immutable snapshot layer in the resulting image. So all the data you created stays on your hard drive in 'hidden' layers even after the rm command has run: llama_initial_weights still exists in an intermediate layer.
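
To make the layering concrete, here is a toy model in Python. It is only a sketch of the idea, not how Docker actually stores data: the file names come from the example Dockerfile above, and the 13 GB sizes are assumed for illustration.

```python
# Toy model of image layers: each layer maps a path to the size (in GB) it adds,
# or to None when the layer only records that the path was deleted.
# File names and sizes are made up for illustration.
layers = [
    {"llama_initial_weights": 13},      # RUN wget ...
    {"llama_cleaned_weights": 13},      # RUN python3 cleanup.py ...
    {"llama_transformed_weights": 13},  # RUN python3 transform.py ...
    {"llama_cleaned_weights": None},    # RUN rm llama_cleaned_weights
    {"llama_initial_weights": None},    # RUN rm llama_initial_weights
]


def visible_files(layers):
    """Files seen inside a container: later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return {path: size for path, size in merged.items() if size is not None}


def stored_gb(layers):
    """Disk space used by the layers: every layer keeps what it added."""
    return sum(
        size for layer in layers for size in layer.values() if size is not None
    )


print(visible_files(layers))  # {'llama_transformed_weights': 13}
print(stored_gb(layers))      # 39
```

Only one file is visible in the final filesystem, yet all three are still stored, because the rm layers only mark the paths as deleted.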

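You can try the --squash option of docker build to collapse the intermediate layers into one (it is an experimental feature, so it may not be available in your setup). Also, as mentioned in the comments, docker system prune can help reclaim disk space from unused images, stopped containers, and build cache.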

You can also change your Dockerfile to something less tidy, but better for layers:

FROM python
RUN wget llama_initial_weights \
 && python3 cleanup.py llama_initial_weights llama_cleaned_weights \
 && python3 transform.py llama_cleaned_weights llama_transformed_weights \
 && rm llama_cleaned_weights \
 && rm llama_initial_weights

Only one RUN instruction is executed here, so only one layer is created, and files that are removed before that instruction finishes are never stored in it.
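
The difference between the two Dockerfiles can be estimated with a small Python sketch. It rests on a simplifying assumption: a layer records only the files that were created during its RUN and still exist when that RUN finishes. The step list and the 13 GB sizes are hypothetical.

```python
def layer_sizes(run_instructions):
    """Return the GB that each RUN adds as a layer.

    Each RUN is a list of ("create", name, gb) or ("remove", name) steps.
    A layer only records files that were created in that RUN and still
    exist when the RUN finishes.
    """
    sizes = []
    for steps in run_instructions:
        created = {}
        for step in steps:
            if step[0] == "create":
                created[step[1]] = step[2]
            else:  # "remove"
                # No effect on size if the file came from an older layer.
                created.pop(step[1], None)
        sizes.append(sum(created.values()))
    return sizes


steps = [
    ("create", "llama_initial_weights", 13),
    ("create", "llama_cleaned_weights", 13),
    ("create", "llama_transformed_weights", 13),
    ("remove", "llama_cleaned_weights"),
    ("remove", "llama_initial_weights"),
]

separate_runs = [[step] for step in steps]  # one RUN per step
single_run = [steps]                        # one chained RUN

print(layer_sizes(separate_runs))  # [13, 13, 13, 0, 0]
print(layer_sizes(single_run))     # [13]
```

With one RUN per step, the two rm layers free nothing, so the layers add up to 39 GB. With a single chained RUN, only the 13 GB of transformed weights end up in the layer.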

answered Apr 16, 2025 at 12:41