I’ve been working on a bash script to clean up old Docker images while keeping the latest tag for each repository and skipping images currently in use. I’m sharing the script below and would appreciate any suggestions on how to improve its efficiency, readability, or adherence to best practices. I would also love feedback on any edge cases I might have missed.
Here’s the script:
#!/bin/bash
DRY_RUN=true
IN_USE_IMAGES=$(docker ps --format "{{.Image}}" | xargs -I{} docker inspect --format '{{.Id}}' {} | cut -d: -f2)
IMAGE_LIST=$(docker images --format "{{.Repository}}:{{.Tag}} {{.ID}} {{.CreatedAt}}" | sort -t' ' -k1,1 -k3,3r -k4,4r)
declare -A LATEST_TAGS
while read -r line; do
    REPO_TAG=$(echo "$line" | awk '{print $1}')
    IMAGE_ID=$(echo "$line" | awk '{print $2}')
    CREATED_AT=$(echo "$line" | awk '{print $3" "$4" "$5" "$6}')
    REPO_NAME=$(echo "$REPO_TAG" | cut -d':' -f1)
    if echo "$IN_USE_IMAGES" | grep -qw "$IMAGE_ID"; then
        continue
    fi
    if [[ -z "${LATEST_TAGS[$REPO_NAME]}" ]]; then
        LATEST_TAGS[$REPO_NAME]=$IMAGE_ID
        continue
    fi
    CLEANED_DATE=$(echo "${CREATED_AT}" | sed 's/ UTC//')
    IMAGE_DATE=$(date -d "${CLEANED_DATE}" +%s 2>/dev/null)
    CURRENT_DATE=$(date +%s)
    if [[ -z "$IMAGE_DATE" ]]; then
        continue
    fi
    AGE_DAYS=$(( (CURRENT_DATE - IMAGE_DATE) / 86400 ))
    if [ "$AGE_DAYS" -gt 3 ]; then
        if [ "$DRY_RUN" = true ]; then
            echo "Would delete image $IMAGE_ID ($REPO_TAG) created $AGE_DAYS days ago."
        else
            docker rmi "$IMAGE_ID"
        fi
    fi
done <<< "$IMAGE_LIST"
Example output with dry run:
root@server:~# ./script.sh
Error: No such object: registry.example.com/repo/products/app:merchant.staging.1.7.13
Error: No such object: registry.example.com/repo/products/app:customer.staging.1.4.6
Would delete image c02cf39d3dba (gcr.io/example/cadvisor:latest) created 268 days ago.
Would delete image 71dc9668b154 (prom/node-exporter:latest) created 134 days ago.
Would delete image 6a22698eab0e (python:3.9-slim) created 38 days ago.
Would delete image 4baa2e5aa1c9 (registry.example.com/repo/products/app:production-finance-v1.4.0) created 28 days ago.
Would delete image 90e5f1082b26 (registry.example.com/repo/products/app:production-finance-v1.2.0) created 91 days ago.
Would delete image 2333813ecdbb (registry.example.com/repo/products/app:production-v1.1.22) created 106 days ago.
Would delete image 587f3ff87335 (registry.example.com/repo/products/app:staging-v1.20.31) created 28 days ago.
Would delete image 587f3ff87335 (registry.example.com/repo/products/app:staging-v1.20.32) created 28 days ago.
Would delete image f17dfd3f8776 (registry.example.com/repo/products/app:staging-v1.20.30) created 30 days ago.
Would delete image f17dfd3f8776 (registry.example.com/repo/products/app:staging-v1.20.18) created 30 days ago.
Would delete image 89defed4e655 (registry.example.com/repo/products/app:staging-v1.21.4) created 23 days ago.
Would delete image ee2c9dd1668c (registry.example.com/repo/products/app:staging-v1.19.24) created 49 days ago.
Would delete image 90e5f1082b26 (registry.example.com/repo/products/app:staging-v1.17.0) created 91 days ago.
Would delete image 4baa2e5aa1c9 (registry.example.com/repo/products/app:staging-v1.20.36) created 28 days ago.
Would delete image 2333813ecdbb (registry.example.com/repo/products/app:staging-v1.16.8) created 106 days ago.
Would delete image 90e5f1082b26 (registry.example.com/repo/products/app:staging-v1.17.7) created 91 days ago.
Would delete image 2333813ecdbb (registry.example.com/repo/products/app:staging-v1.15.27) created 106 days ago.
Would delete image 2333813ecdbb (registry.example.com/repo/products/app:staging-v1.15.30) created 106 days ago.
Would delete image 57c802036fb5 (registry.example.com/repo/products/app:staging-v1.16.8) created 99 days ago.
Would delete image 526e7b9887c0 (registry.example.com/repo/products/app:staging-v1.19.54) created 38 days ago.
Would delete image a9e425d4a4c6 (registry.example.com/repo/products/app:staging-v1.16.8) created 106 days ago.
Would delete image a9e425d4a4c6 (registry.example.com/repo/products/app:staging-v1.15.27) created 106 days ago.
Would delete image 706e72fa4a64 (registry.example.com/repo/products/app:staging-v1.22.31) created 8 days ago.
Would delete image 69912743e021 (registry.example.com/repo/products/app:staging-v1.16.8) created 99 days ago.
Would delete image 4404b651cf08 (registry.example.com/repo/products/app:staging-v1.22.29) created 8 days ago.
Would delete image 6e435c33fdb0 (registry.example.com/repo/products/app:<none>) created 6 days ago.
Would delete image 324afd1d63a5 (registry.example.com/repo/products/app:<none>) created 7 days ago.
Would delete image d8594896d6ca (registry.example.com/repo/products/app:<none>) created 127 days ago.
Would delete image 87fbee245267 (registry.example.com/repo/products/app:merchant.staging.1.7.3) created 6 days ago.
Would delete image 8bcedd565583 (registry.example.com/repo/products/app:merchant.staging.1.7.4) created 6 days ago.
Would delete image 1dbe0e931976 (registry.example.com/docker-cache/prom/node-exporter:v1.3.1) created 1086 days ago.
root@server:~#
Regarding those two "No such object" errors: I grepped for them in the image list and found no matching images. If I'm not mistaken, some containers are still using these images, but the images themselves are no longer listed locally:
root@server:~# docker images | grep merchant.staging.1.7.13
root@server:~# docker images | grep customer.staging.1.4.6
root@server:~# docker ps | grep merchant.staging.1.7.13
931e52807265 registry.example.com/repo/products/app:merchant.staging.1.7.13 "docker-entrypoint.s..." 39 minutes ago Up 39 minutes 4300/tcp app_merchant_app_merchant.1.sdxy7wl8fnt7yanhri1ymxq5f
root@server:~# docker ps | grep customer.staging.1.4.6
01950400f703 registry.example.com/repo/products/app:customer.staging.1.4.6 "docker-entrypoint.s..." 20 hours ago Up 20 hours 4200/tcp app_customer_app_customer.1.sbwfxjznw
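One possible way around these errors, as a sketch only: instead of resolving each container's image name (which fails when the tag no longer exists locally), ask each running container for its image ID. This assumes that `docker inspect` on a container exposes the image ID in its `.Image` field; the function name `in_use_image_ids` is hypothetical.

```shell
#!/usr/bin/env bash

# in_use_image_ids (hypothetical helper): print the full image ID, without
# the "sha256:" prefix, of every running container, one per line, de-duplicated.
in_use_image_ids() {
    local containers
    containers=$(docker ps -q) || return
    # Nothing running: print nothing and succeed.
    [[ -n $containers ]] || return 0
    # Word splitting of $containers is intentional: one argument per container ID.
    # shellcheck disable=SC2086
    docker inspect --format '{{.Image}}' $containers | cut -d: -f2 | sort -u
}
```

Because the lookup goes through the container rather than the tag, a container whose tag has disappeared from `docker images` should still report the image it was started from.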
The script has a DRY_RUN mode to simulate deletions. Any advice on improving its handling of date parsing, array usage, or making it more robust would be greatly appreciated!
2 Answers
I don't see where DRY_RUN gets set to anything but true. If we arrange that its value is either true or false, we could simply execute it rather than comparing:
if $dry_run
then
echo ...
Note that I changed it to lower-case, to avoid conflict/confusion with environment variables (which convention says are written in all-caps).
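A fuller sketch of that idea, with two additions that are my own assumptions rather than part of the original script: a guard that rejects any value other than true or false (since the variable is executed as a command), and a hypothetical helper named `remove_image`.

```shell
#!/usr/bin/env bash

dry_run=true    # set to false to really delete images

# Guard: $dry_run is executed as a command below, so only allow the
# shell builtins "true" and "false".
case $dry_run in
    true|false) ;;
    *) echo "dry_run must be true or false, got: $dry_run" >&2; exit 1 ;;
esac

# remove_image (hypothetical helper): delete one image, or only report it
# when dry_run is true.
remove_image() {
    local image_id=$1
    if $dry_run; then
        echo "Would delete image $image_id"
    else
        docker rmi "$image_id"
    fi
}

remove_image c02cf39d3dba
```

With `dry_run=true` this prints `Would delete image c02cf39d3dba` and never calls `docker`.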
Is $IMAGE_ID really a regular expression? If not, then perhaps grep -qw "$IMAGE_ID" should be using -F.
Avoid unnecessary operations
The final cut looks unnecessary here:
IN_USE_IMAGES=$(docker ps --format "{{.Image}}" | xargs -I{} docker inspect --format '{{.Id}}' {} | cut -d: -f2)
The cut here is used to extract the part after the colon (:) from input like this:
sha256:aded1e1a5b3705116fa0a92ba074a5e0b0031647d9c315983ccba2ee5428ec8b
sha256:74cc54e27dc41bb10dc4b2226072d469509f2f22f1a3ce74f4a59661a1d44602
However, the only usage of IN_USE_IMAGES is this:
echo "$IN_USE_IMAGES" | grep -qw "$IMAGE_ID"
Thanks to the -w flag of grep, the sha256: prefix of the lines won't make a difference in practice.
Or, if this level of precision is really important to you, then keep the cut, and make the grep stricter by matching complete lines with -x instead of -w. It's probably also good to add -F, for simple string matching instead of regex matching:
... | grep -qxF "$IMAGE_ID"
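A small sketch of the whole-line match, using the two IDs from the sample above as data. One caveat worth checking: the sample output in the question shows 12-character IDs from `docker images`, while the inspected IDs are 64 characters long, so an exact (or whole-word) comparison only succeeds when both sides use the same ID length. The helper name `is_in_use` is hypothetical.

```shell
#!/usr/bin/env bash

# Full 64-character image IDs, one per line, "sha256:" prefix already removed.
in_use_images='aded1e1a5b3705116fa0a92ba074a5e0b0031647d9c315983ccba2ee5428ec8b
74cc54e27dc41bb10dc4b2226072d469509f2f22f1a3ce74f4a59661a1d44602'

# is_in_use ID (hypothetical helper): succeed only if ID equals one complete
# line of the list. -x matches whole lines, -F disables regex interpretation.
is_in_use() {
    grep -qxF -- "$1" <<< "$in_use_images"
}

full_id='74cc54e27dc41bb10dc4b2226072d469509f2f22f1a3ce74f4a59661a1d44602'
short_id=${full_id:0:12}    # the truncated form, as printed by docker images

is_in_use "$full_id"  && echo "full ID: in use"
is_in_use "$short_id" || echo "short ID: no match"
```

If the image list has to stay in short form, one option (an assumption, not something from the script) is to truncate the in-use IDs to the same 12 characters before comparing.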
Another unnecessary operation is for CLEANED_DATE:
CREATED_AT=$(echo "$line" | awk '{print $3" "$4" "$5" "$6}')
...
CLEANED_DATE=$(echo "${CREATED_AT}" | sed 's/ UTC//')
The UTC at the end of CREATED_AT comes from $6 in awk. If you don't want it, then simply drop it from there.
In awk '{print $3" "$4" "$5" "$6}', the " " are there to add spaces between the fields. Instead, you could simply use commas to let awk print the values with its default output separator, which is already a space:
awk '{print $3, $4, $5, $6}'
Reduce the number of processes in loops
Consider reducing the number of pipelines executed inside loops.
Instead of this:
while read -r line; do
    REPO_TAG=$(echo "$line" | awk '{print $1}')
    IMAGE_ID=$(echo "$line" | awk '{print $2}')
    CREATED_AT=$(echo "$line" | awk '{print $3" "$4" "$5" "$6}')
You can store the values directly in variables using read itself:
while read -r repo_tag image_id ymd time tzoffset rest; do
created_at="$ymd $time $tzoffset"
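To see the field splitting in isolation, here is a minimal sketch. The sample lines are hypothetical (the dates are made up), and they assume the `{{.CreatedAt}}` column has the shape `YYYY-MM-DD HH:MM:SS +ZZZZ UTC`, which is what the awk fields in the original script suggest. The function name `parse_image_list` is also hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical sample of "repository:tag id created-at" lines.
image_list='python:3.9-slim 6a22698eab0e 2024-05-01 12:00:00 +0000 UTC
prom/node-exporter:latest 71dc9668b154 2024-01-26 08:30:00 +0000 UTC'

# parse_image_list (hypothetical helper): read lines from stdin and let
# read split them into fields, with no awk or cut inside the loop.
parse_image_list() {
    local repo_tag image_id ymd time tzoffset rest created_at
    while read -r repo_tag image_id ymd time tzoffset rest; do
        # "rest" swallows the trailing "UTC", so no sed is needed to strip it.
        created_at="$ymd $time $tzoffset"
        printf '%s|%s|%s\n' "$image_id" "$repo_tag" "$created_at"
    done
}

parse_image_list <<< "$image_list"
```

Each iteration now runs no external processes at all, compared with three `echo | awk` pipelines per line in the original loop.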
Instead of:
REPO_NAME=$(echo "$REPO_TAG" | cut -d':' -f1)
You can use the parameter expansion feature ${parameter%%word} to remove a matching suffix:
repo_name=${repo_tag%%:*}
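A quick sketch of how the suffix-removal forms behave. The first value is taken from the sample output in the question; the second, a registry address with a port, is a hypothetical input added to show an edge case where `%%` (longest match) and `%` (shortest match) differ.

```shell
#!/usr/bin/env bash

repo_tag='registry.example.com/repo/products/app:staging-v1.20.31'
echo "${repo_tag%%:*}"    # registry.example.com/repo/products/app
echo "${repo_tag##*:}"    # staging-v1.20.31

# Hypothetical name with a registry port: there are two colons.
with_port='localhost:5000/app:latest'
echo "${with_port%%:*}"   # localhost          (longest suffix removed)
echo "${with_port%:*}"    # localhost:5000/app (shortest suffix removed)
```

If registries with explicit ports can appear in the image list, `${repo_tag%:*}` is probably the safer choice, since the tag is always the part after the last colon.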
A minor safety issue
This command hides potential errors:
IMAGE_DATE=$(date -d "${CLEANED_DATE}" +%s 2>/dev/null)
I would drop the 2>/dev/null, so that when something goes wrong with the date parsing, you get a signal about it.
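A sketch of what that could look like, assuming GNU date (the original script already relies on `date -d`). The helper name `image_age_days` and the sample date are hypothetical. The current time is also computed once and passed in, rather than being recomputed on every loop iteration.

```shell
#!/usr/bin/env bash

# image_age_days CREATED_AT NOW_EPOCH (hypothetical helper): print the age
# in whole days, or fail loudly when the date cannot be parsed.
image_age_days() {
    local created_at=$1 now=$2 image_epoch
    # No 2>/dev/null here: date's own error message stays visible on stderr.
    if ! image_epoch=$(date -d "$created_at" +%s); then
        echo "Skipping: cannot parse date '$created_at'" >&2
        return 1
    fi
    echo $(( (now - image_epoch) / 86400 ))
}

now=$(date +%s)    # computed once, before any loop

image_age_days '2024-01-26 08:30:00 +0000' "$now"
image_age_days 'not a date' "$now" || echo "parse failure reported"
```

Declaring `image_epoch` with `local` on a separate line from the assignment matters here: `local var=$(cmd)` would report the status of `local`, not of `date`.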
Use here-strings
Instead of:
if echo "$IN_USE_IMAGES" | grep -qw "$IMAGE_ID"; then
A here-string is better, without an echo:
if grep -qw "$IMAGE_ID" <<< "$IN_USE_IMAGES"; then
Prefer lowercase names for variables in scripts
As @Ed Morton wrote in a comment:
Don't use all upper case for non-exported variable names, see correct-bash-and-shell-script-variable-capitalization.
[…] IMAGE_LIST content) as you may be able to replace the first 4 lines of your loop with while read -r repo_tag image_id created_at, depending on that content.
CURRENT_DATE=$(date +%s): do you really need to do that inside the loop rather than doing it once before the loop starts?
[…] IN_USE_IMAGES and IMAGE_LIST when you ran the script to produce the sample output) then we can start to help you. We shouldn't need a scroll bar to read the input or output, though; fewer than 10 lines should be plenty, otherwise it gets hard to read and understand the data.