I'm trying to write a script that will copy all files listed in a text file of around 3 million lines. Each line contains two columns, the source and the destination with a new filename:
path/to/source/directory/filename.pdf path/to/destination/directory/Newfilename.pdf
path/to/source/directory/filename2.pdf path/to/destination/directory/Newfilename2.pdf
path/to/source/directory/filename3.pdf path/to/destination/directory/Newfilename3.pdf
...
All files are in PDF format, where Newfilename.pdf is the new filename for the same source PDF file.
Additionally, I would like to copy the file and add information to its destination filename, i.e.:
From:
Newfilename.pdf
To:
Newfilename_yyyyMMddHHmmss.pdf (e.g. Newfilename_20200225095823.pdf)
where yyyyMMddHHmmss is the actual date and time at which each file is copied (in the same format for all files), so that each destination file is copied with its extended name:
path/to/destination/directory/Newfilename_20200225095823.pdf
path/to/destination/directory/Newfilename2_20200225095824.pdf
path/to/destination/directory/Newfilename3_20200225095830.pdf
...
I don't have much experience with shell commands, but based on my research I came up with the following:
#!/bin/bash
filename="1ドル"
while read -r source destination; do
# copy each source file to its destination (-p preserves mode, ownership and timestamps)
cp -p "$source" "$destination"
done < "$filename"
However, I have read in similar posts that, performance-wise, a while loop with read is tremendously slow when reading from a file or a pipe, because the read shell builtin reads one character at a time. Reference here.
How could this be done in a better way?
I would greatly appreciate your help.
Answer
Leaving the performance aspect aside, the first part of your question can be solved using bash's parameter expansion (pattern substitution with ${var/%pattern/replacement}, where % anchors the match at the end of the value):
timestamp="$(date +%Y%m%d%H%M%S)"
while read -r source destination; do
newname="${destination/%.pdf/_$timestamp.pdf}"
cp -p "$source" "$newname"
done < "$filename"
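To see what the substitution does on its own, here is a small sketch that applies it to a few sample names with a fixed timestamp (an assumed value, chosen only so the output is predictable). It also shows an edge case: a destination that does not end in lowercase .pdf is left unchanged, so cp would copy it under its original name.

```shell
#!/bin/bash
# Sketch: effect of ${destination/%.pdf/_$timestamp.pdf} with an assumed,
# fixed timestamp so the output is predictable.
timestamp=20200225095823

for destination in \
    path/to/destination/directory/Newfilename.pdf \
    path/to/destination/directory/Newfilename2.pdf \
    path/to/destination/directory/notes.txt
do
    # "/%" anchors the pattern at the END of the value, so only a trailing
    # ".pdf" is replaced; a name without it is printed unchanged.
    printf '%s\n' "${destination/%.pdf/_$timestamp.pdf}"
done
```

This prints path/to/destination/directory/Newfilename_20200225095823.pdf, path/to/destination/directory/Newfilename2_20200225095823.pdf and path/to/destination/directory/notes.txt, one per line. The match is case-sensitive, so a file named X.PDF would not be renamed either.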
If the timestamp is to be the "moment of copying" rather than that of calling the script, the call to date must be placed inside the loop:
while read -r source destination; do
timestamp="$(date +%Y%m%d%H%M%S)"
newname="${destination/%.pdf/_$timestamp.pdf}"
cp -p "$source" "$newname"
done < "$filename"
Update: As pointed out by @Jetchisel, bash from v4.2 upwards has builtin functionality to format dates using the printf command, which makes the call to the external date command unnecessary:
while read -r source destination; do
printf -v timestamp '%(%Y%m%d%H%M%S)T' -1   # -1 means "current time"
newname="${destination/%.pdf/_$timestamp.pdf}"
cp -p "$source" "$newname"
done < "$filename"
Comments:
I think it should be cp -p "$source" "$newname". As the example in the question shows different time stamps for the individual files, it might be necessary to call date inside the loop. (This is not clearly specified in the question.) – Bodo, Feb 25, 2020 at 15:39
Thank you very much for your answer, @AdminBee, it's a great help. – msarcom, Feb 25, 2020 at 15:55
Sure, @AdminBee. In the meantime I will run tests with this number of records and monitor the server to make sure resource usage does not become excessive. – msarcom, Feb 25, 2020 at 16:06
The presumed performance aspect is hard to work around anyway, as long as one is limited to using the standard cp or a similar external tool. Since the filenames are unique, it just can't be done without launching a new cp for each file, and that's going to dwarf anything related to reading some lines from a file. (Then there's the I/O itself, which of course depends on the hardware.) – ilkkachu, Feb 25, 2020 at 17:17
Instead of date you can use printf in bash, if your bash has the builtin that can format dates. printf '%(%Y%m%d%H%M%S)T' should give the same result as that date format; that way date is not called/run for every line. Just my two cents. – Jetchisel, Mar 4, 2020 at 23:45
…cut) to process a single line instead of passing the whole file to the program, or a manipulation of the input data implemented in shell script code instead of using specialized programs. In your case you have to run cp for every combination of source and destination file name, so I don't see anything wrong with your loop. Please specify whether you want the same time stamp for all destination files or an individual time stamp taken when the copying of each file starts.