I have a generator of files running, where each file has a name alphabetically following the previous one. At first I was doing my loop like for file in /path/to/files*; do..., but I soon realized that the glob is expanded only once, before the loop starts, so any new files created while looping won't be processed.
My current way of doing this is quite ugly:
while :; do
    doneFileCount=$(wc -l < /tmp/results.csv)
    i=0
    for file in *; do
        if [[ $((doneFileCount>i)) = 1 ]]; then
            i=$((i+1))
            continue
        else
            process-file "$file" # prints single line to stdout
            i=$((i+1))
        fi
    done | tee -a /tmp/results.csv
done
Is there any simple way to loop over an ever-increasing list of files, without the hack described above?
Another idea would be to save the filenames into an array as you process them, then skip processing for filenames that exist in the array. That would skip files that get removed and replaced, though. – Jeff Schaller, Dec 3, 2017 at 13:27
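The array idea from the comment could be sketched roughly as follows. This is only an illustration, assuming bash 4 or later for associative arrays; process_file, scan_once, seen, demo_dir and log are hypothetical names, and process_file merely stands in for the real process-file command.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real process-file command.
process_file() { printf 'processed %s\n' "$1"; }

declare -A seen=()   # associative array (bash 4+), keyed by filename

# One pass over the directory given as $1; names already in "seen" are skipped.
scan_once() {
    local dir=$1 path name
    for path in "$dir"/*; do
        [ -e "$path" ] || continue            # glob matched nothing
        name=${path##*/}
        [[ -n ${seen[$name]+x} ]] && continue # already handled
        process_file "$name"
        seen["$name"]=1
    done
}

# Demo in a scratch directory: the second pass only picks up the new file.
demo_dir=$(mktemp -d)
log=$(mktemp)
touch "$demo_dir/a" "$demo_dir/b"
scan_once "$demo_dir" >> "$log"   # plain redirection, so "seen" stays in this shell
touch "$demo_dir/c"
scan_once "$demo_dir" >> "$log"
cat "$log"
```

In a real script the two scan_once calls would sit inside a while loop with a sleep. The function's output is redirected rather than captured with $(...) or piped, because either of those would run it in a subshell and lose the updates to seen. As the comment notes, a file that is removed and recreated under the same name would not be processed again.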
1 Answer
I think the usual way would be to have new files appear in one directory, and rename/move them to another after processing, so that they don't hit the same glob again. Something like this:
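cd new/ || exit
while true; do
    for f in *; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # process-file is the command from the question
        process-file "$f" && mv -- "$f" "../processed/$f"
    done
    sleep 1 # just so that it doesn't busyloop
done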
Or similarly with a changing file extension:
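while true; do
    for f in *.new; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # process-file is the command from the question
        process-file "$f" && mv -- "$f" "${f%.new}.done"
    done
    sleep 1 # just so that it doesn't busyloop
done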
On Linux, you could also use inotifywait to get notifications on new files.
inotifywait -q -m -e moved_to,close_write --format "%f" . | while IFS= read -r f; do
    process-file "$f"   # the command from the question
done
In either case, you'll want to watch for files that are still being written to. A large file created in-place will not appear atomically, but your script might start processing it when it's only halfway written.
The inotify close_write event above will see files when the writing process closes them (but it also catches modified files), while the create event would see the file when it's first created (but it might still be written to). moved_to simply catches files that are moved to the directory being watched.
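If you control the generator, one way to sidestep the half-written-file problem is to have it write each file under a temporary name and rename it into place once it is complete; a rename within the same filesystem is atomic. A minimal sketch, assuming a hypothetical publish helper on the producer side (the names publish, demo_dir and 0001.new are made up for illustration):

```shell
#!/bin/sh
# Hypothetical producer-side helper: reads the content on stdin, writes it
# under a temporary dot-name, and renames it once the write has finished.
publish() {
    dir=$1 name=$2
    tmp=$dir/.$name.tmp          # dot-file, so a plain * or *.new glob skips it
    cat > "$tmp" && mv -- "$tmp" "$dir/$name"
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d)
printf 'line 1\nline 2\n' | publish "$demo_dir" 0001.new
ls "$demo_dir"
```

With this scheme the polling loops only ever see complete files. For the inotifywait variant it would probably make sense to watch only moved_to, since close_write would also fire for the temporary file.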