I have an archive of photos stored in a directory tree on my Mac like:
./2016/05/17/photo-312.jpg
./2016/05/19/photo-1234.jpg
./2016/05/19/photo-5678.jpg
I want to create MD5 hashes of each file that can be used to verify the photos have not been altered or corrupted. My goals are:
- One MD5 file per photo
- Store the MD5 files in the same directory as their corresponding photos
- Use the same base name as the photo, but switch the extension to .md5
- Capture only the hash value (e.g. b1046abbe7bbf2a2473e9489599f38e0) without any trailing spaces or newlines
For example, the above directory structure would look like this after the process runs:
./2016/05/17/photo-312.jpg
./2016/05/17/photo-312.md5
./2016/05/19/photo-1234.jpg
./2016/05/19/photo-1234.md5
./2016/05/19/photo-5678.jpg
./2016/05/19/photo-5678.md5
(Note: I only need to run this process one time. The process I use to move photos into the archive will create the necessary MD5 files for new photos from this point forward.)
Here's the one-liner I came up with:
find . -type f -name "*.jpg" -exec bash -c 'printf "%s" $(md5 -q "$0") > "${0%.*}.md5"' {} \;
(Note: my machine has md5 instead of md5sum, which I often see referenced. So, I'm using that.)
Here are a few details on how I understand this to work:
The first section runs a basic find command on the current directory (i.e. ".") looking for .jpg files and sends them to bash with -exec bash -c:
find . -type f -name "*.jpg" -exec bash -c
Bash runs printf to set up a string that doesn't have a newline: printf "%s"
This section generates the hash that is fed into printf as that string: $(md5 -q "$0")
The -q flag tells md5 to output only the hash instead of the standard MD5 output, which would look something like: MD5 (photo-312.jpg) = b1046abbe7bbf2a2473e9489599f38e0
The value of $0 is the relative path to the source .jpg file that find sent to bash.
This section creates the file path to store the value in, where the original extension is replaced by .md5: "${0%.*}.md5"
More details about what's going on there can be found in the ${parameter%word} section of the Bash Manual.
The last little bit is: {} \;
I'm not sure why, but the {} is necessary to make this run. (My understanding is that it's a reference to the file path. I don't know how that ties in, but md5: bash: No such file or directory errors pop up if it's not there.)
Finally, the \; identifies the end of find's -exec.
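To make the ${parameter%word} step concrete, here is a minimal sketch of how the expansion behaves. The paths are hypothetical and nothing is read from disk. The last case is a caveat that does not affect the one-liner, because -name "*.jpg" guarantees every matched file has an extension.

```shell
# Hypothetical paths, used only to illustrate ${parameter%word}.
path='./2016/05/17/photo-312.jpg'
md5_path="${path%.*}.md5"        # strip the shortest suffix matching ".*"
echo "$md5_path"                 # ./2016/05/17/photo-312.md5

# Only the last extension is removed when a name contains several dots.
dotted='./2016/05/17/photo.final.jpg'
dotted_md5="${dotted%.*}.md5"
echo "$dotted_md5"               # ./2016/05/17/photo.final.md5

# Caveat: with no extension, the match lands on a dot in a directory name.
no_ext='./2016.05/photo'
no_ext_md5="${no_ext%.*}.md5"
echo "$no_ext_md5"               # ./2016.md5
```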
While I normally use other languages for this type of work, I decided to try this with bash to get some practice with it. I've done some basic testing and everything appears to work as expected. Given my infrequent use of bash, I'd like to make sure I'm not getting myself in trouble. So, my questions are:
Are there any gotchas in this code that are waiting to bite me?
Is there a more standard or efficient way to do this?
UPDATE: I modified my code based on the answers. In case it's useful, here's what I ended up with:
find . -type f \( -name '*.cr2' -or -name '*.jpg' \) -execdir sh -c 'sha1sum "{}" > "${1%.*}".sha1' -- {} \;
Which:
- Allows multiple file extensions to be processed at the same time.
- Uses -execdir instead of -exec so the default output of the hashing algorithm doesn't contain paths. (Which is one reason I was trying to strip them originally.)
- Instead of md5, uses sha1sum, which provides a sha1sum -c flag for verifying files and didn't require installation via Homebrew.
- Uses the more appropriate ${1%.*} (with the help of the -- at the end) instead of ${0%.*} to remove the initial file extension.
2 Answers
A gotcha, sort of...
Although the one-liner works, the use of $0 is inappropriate. From man bash:
-c If the -c option is present, then commands are read from the
first non-option argument command_string. If there are arguments
after the command_string, the first argument is assigned to $0
and any remaining arguments are assigned to the positional
parameters. The assignment to $0 sets the name of the shell,
which is used in warning and error messages.
That is, the file names to compute MD5 for are not appropriate values for the name of the shell. The positional parameters $1, $2, and so on are the appropriate place for them. You can fix that by adding a -- argument after the command string: it fills the $0 slot, so the file name lands in $1:
find ... -exec bash -c 'printf $(md5 -q "$1") > "${1%.*}.md5"' -- {} \;
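A small sketch of how bash -c distributes its arguments may help here. It assumes bash is on the PATH, and photo.jpg is a hypothetical name that is never opened. Whatever follows the command string is assigned to $0, so any placeholder word would do in place of --.

```shell
# "photo.jpg" is a hypothetical file name; nothing is read from disk.
# Without a placeholder, the file name becomes $0 and $1 stays empty.
without=$(bash -c 'printf "%s|%s" "$0" "$1"' photo.jpg)
echo "$without"    # photo.jpg|

# With a placeholder after the command string, the file name lands in $1.
with=$(bash -c 'printf "%s|%s" "$0" "$1"' -- photo.jpg)
echo "$with"       # --|photo.jpg
```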
Simplify
Is it really important to strip the .jpg at the end of filenames?
Would it be terrible if the files looked like this?
./2016/05/17/photo-312.jpg
./2016/05/17/photo-312.jpg.md5
./2016/05/19/photo-1234.jpg
./2016/05/19/photo-1234.jpg.md5
./2016/05/19/photo-5678.jpg
./2016/05/19/photo-5678.jpg.md5
Because that would simplify the script a bit.
Is it really important to print only the MD5 digest without trailing newline?
That would simplify the script a bit more.
And that would get rid of the -q flag, which is only supported by BSD's md5 tool, and not by GNU's md5sum.
With this change, the script would be usable in Linux by defining md5 as an alias to md5sum.
Actually, it would be great to install GNU's md5sum (md5sha1sum package in Brew and MacPorts), because it has a -c flag to verify the file easily. For example, you would be able to verify the checksum of all files with:
find . -name '*.md5' -execdir md5sum -c {} \;
To create the files:
find . -type f -name "*.jpg" -execdir sh -c 'md5sum "{}" > "{}".md5' \;
Note that -execdir is necessary instead of -exec, so that the filenames don't have the directory part. Otherwise the .md5 files would contain the full path of files, and the md5sum -c verification would only work if invoked from the same relative path from where the digest file was created.
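The create-and-verify round trip can be tried safely on a throwaway tree. This is only a sketch, assuming GNU md5sum is installed and that find supports -execdir; the sample file name is made up. It also passes the file name as $1 instead of embedding {} inside the script text, which keeps names containing quotes or $ from being interpreted by the shell.

```shell
# Build a throwaway tree so no real photos are touched.
work=$(mktemp -d)
mkdir -p "$work/2016/05/17"
printf 'not really a photo' > "$work/2016/05/17/photo 312.jpg"

# Create one digest file per photo. The file name arrives as $1;
# the literal "sh" only fills the $0 slot.
find "$work" -type f -name '*.jpg' \
  -execdir sh -c 'md5sum "$1" > "$1.md5"' sh {} \;

# Verify every digest from inside its own directory.
report=$(find "$work" -type f -name '*.md5' -execdir md5sum -c {} \;)
echo "$report"
```

Because each digest file records only a name relative to its own directory, the tree can be moved or renamed and the verification still works.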
Use sh if good enough
The Bash script in -exec doesn't do anything Bash specific, so you could replace bash with sh.
Redundant "%s" in printf
Instead of printf "%s" something you could simply write printf something.
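One caveat worth keeping in mind: dropping "%s" is safe for a hex digest, which contains only the characters 0-9 and a-f, but the first argument of printf is a format string, so the shortcut does not carry over to arbitrary text. A minimal sketch, where value is a made-up string chosen to contain a % sequence:

```shell
# "value" is a made-up string chosen to contain a % sequence.
value='50%done'
as_argument=$(printf '%s' "$value")        # printed literally
as_format=$(printf "$value" 2>/dev/null)   # "%d" is treated as a conversion

echo "literal:   $as_argument"
echo "as format: $as_format"
```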
- Thanks for the notes! One reason I was stripping down to just the hash was because I was seeing paths in the normal MD5 generation. Thanks for the pointer to -execdir, which takes care of that. -- There's no technical reason for me to remove the original extensions. Just aesthetic. Also, seemed like a good practice exercise for when it might matter in the future. -- I also ended up switching to sha1sum, which seems to be installed by default and provides the -c flag for checking. – Alan W. Smith, May 20, 2017 at 16:51
gotchas
- The main issue I see with your one-liner is: what happens if you end up with a file name that contains spaces? You have defined the input set so that it isn't a problem here, but including appropriate quoting or escaping to handle that would be a good idea in general. Typically you're taking the output of find and passing it along to something else, so you end up with a find -print0 | xargs -0 sort of arrangement. xargs wouldn't easily work for your example though.
- Since you're not doing any interpolation in -name "*.jpg", it would be clearer to use single quotes, which don't do any magic inside: -name '*.jpg'
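For reference, the find -print0 | xargs -0 arrangement looks roughly like this. It is a sketch on a throwaway tree with made-up file names, assuming md5sum is available. It prints the digests rather than writing one file per photo, which is the part xargs does not make easy.

```shell
# Build a throwaway tree whose file names contain spaces.
work=$(mktemp -d)
mkdir -p "$work/2016/05/19"
printf 'a' > "$work/2016/05/19/photo 1234.jpg"
printf 'b' > "$work/2016/05/19/photo 5678.jpg"

# NUL-delimited names survive spaces (and even newlines) intact.
digests=$(find "$work" -type f -name '*.jpg' -print0 | xargs -0 md5sum)
echo "$digests"

# One output line per file shows that no name was split on whitespace.
count=$(printf '%s\n' "$digests" | wc -l | tr -d ' ')
echo "$count files hashed"
```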
efficiency
- You are invoking bash once per file. This can be a significant overhead if you're processing thousands of files. It might be good to turn the -exec'd part of things into its own script that can handle multiple arguments. Since you're only doing this once I wouldn't worry about it.
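One way to cut the per-file overhead without a separate script file is to end -exec with + so that find passes many names to a single shell, which then loops over them. The following is a sketch under assumptions: it uses md5sum (not the BSD md5 from the question) and a throwaway tree with made-up contents, and it keeps the original goals of a bare hash with no trailing newline.

```shell
# Build a throwaway tree so no real photos are touched.
work=$(mktemp -d)
mkdir -p "$work/2016/05/19"
printf 'a' > "$work/2016/05/19/photo-1234.jpg"
printf 'b' > "$work/2016/05/19/photo-5678.jpg"

# "{} +" hands many file names to a single sh process.
find "$work" -type f -name '*.jpg' -exec sh -c '
  for photo do
    digest=$(md5sum < "$photo") || continue
    # Reading from stdin makes md5sum print "<hash>  -"; keep only the hash.
    printf "%s" "${digest%% *}" > "${photo%.*}.md5"
  done
' sh {} +

written=$(find "$work" -type f -name '*.md5' | wc -l | tr -d ' ')
echo "$written digest files written"
```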
suggestions
- While one-liners are cool and all: why not turn this into a script? That would make it easier to add error handling for arguments and keep your notes in comments. While it would slow it down slightly you could also add some progress indicator like printing the current file that it is operating on.
- You can get free software things on the Mac that you are missing, like md5sum, from homebrew. Or just run Linux in a VM.
- Thanks for the review! -- I think it handles spaces in paths appropriately. I believe the quotes around $0 add that protection. (Just ran a couple tests that seemed to work fine, but I may be missing something.) -- Good note on single quotes for the file extension. Making that change. -- And as to why I'm not turning this into a real script: mainly I just wanted to see if I could do this in bash and since I only need it one time, it seemed like a good exercise to get a little practice. – Alan W. Smith, May 20, 2017 at 15:47
md5
I moved to using sha1sum, which seems to come installed by default on Macs running 10.12 and provides the -c flag for verification.