I'm converting a batch of files from ISO Latin-1 (aka ISO-8859-1) to UTF-8. In this process, I have the opportunity to rename the files, and I would like to use it to fix the error-prone name format of the files.
The files are named like this:
tree Dados/Jan/
Dados/Jan/
├── 201301_Licitacoes
│ ├── 201301_EmpenhosRelacionados.csv
│ ├── 201301_ItemLicitaЗ╞o.csv
│ ├── 201301_LicitaЗ╞o.csv
│ └── 201301_ParticipantesLicitaЗ╞o.csv
├── 201401_Licitacoes
│ ├── 201401_EmpenhosRelacionados.csv
│ ├── 201401_ItemLicitaЗ╞o.csv
│ ├── 201401_LicitaЗ╞o.csv
│ └── 201401_ParticipantesLicitaЗ╞o.csv
├── 201501_Licitacoes
│ ├── 201501_EmpenhosRelacionados.csv
│ ├── 201501_ItemLicitaЗ╞o.csv
│ ├── 201501_LicitaЗ╞o.csv
│ └── 201501_ParticipantesLicitaЗ╞o.csv
├── 201601_Licitacoes
│ ├── 201601_EmpenhosRelacionados.csv
│ ├── 201601_ItemLicitaЗ╞o.csv
│ ├── 201601_LicitaЗ╞o.csv
│ └── 201601_ParticipantesLicitaЗ╞o.csv
(...)
I'm executing the following:
find Dados/Jan/ -maxdepth 2 -name '*.csv' -exec sh -c 'conv {}' \;
where conv is the following script:
#!/usr/bin/env bash
## adapted from https://stackoverflow.com/questions/62918711/convert-multiple-csv-files-to-utf-8-encoding-using-a-script-windows-command-prom
for file in $@; do
iconv -f ISO-8859-1 -t UTF-8 <"$file" >"$file".tmp &&
mv "$file.tmp" "$file"
done
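As a quick sanity check of the conversion step, here is a minimal sketch (the helper name to_utf8 and the sample word are illustrative, not from the original). iconv takes the source encoding with -f and the target encoding with -t; ç and ã are single bytes in ISO-8859-1 but two bytes each in UTF-8.

```shell
#!/usr/bin/env bash
# Minimal check of the ISO-8859-1 -> UTF-8 step.
# -f names the source encoding, -t the target encoding.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Illustrative Latin-1 input: the octal escapes \347 (ç) and \343 (ã) are single bytes here.
printf 'Licita\347\343o' | to_utf8 | od -An -tx1
# In the output, ç and ã each appear as two bytes: c3 a7 and c3 a3.
```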
While converting, I would like to remove the "З╞o" text from the file names. It was already there when I unzipped the files (probably someone used "~" or other accents in the original file names).
4 Answers
Use shell "parameter expansion" when mv-ing the file to its final destination, like:
mv "$file.tmp" "${file//З╞o}"
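A minimal sketch of that expansion on names from the question's listing (the helper name strip_mojibake is hypothetical):

```shell
#!/usr/bin/env bash
# ${var//pattern} deletes every match of pattern; ${var/pattern} deletes only the first.
# strip_mojibake is a hypothetical helper name used for illustration.
strip_mojibake() {
    printf '%s\n' "${1//З╞o}"
}

# Sample names from the question's listing: the second one has nothing to strip.
for name in 201301_ItemLicitaЗ╞o.csv 201301_EmpenhosRelacionados.csv; do
    printf '%s -> %s\n' "$name" "$(strip_mojibake "$name")"
done
```

Names without the sequence, such as the EmpenhosRelacionados files, pass through unchanged, so the same mv line can be used for every file.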
Is it always the same character sequence?
-
Yes, it is. This will do, thank you. But the order of the slashes (/) is switched. I would like to accept both of your answers, because both are correct. – BuddhiLW, Jan 3, 2022 at 20:06
Use bash's "pattern substitution" (read man bash) and do something like:
echo mv "$file.tmp" "${file//З╞o/}"
Remove the echo once you like the result. Never test with the actual mv command; data loss could result.
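The substitution has a few forms, documented under "Parameter Expansion" in man bash. Below is a minimal dry-run sketch. The helper name dry_run is hypothetical, and the sketch assumes that every affected name ends in З╞o.csv, which holds for the listing in the question. Anchoring the match at the end with /% avoids touching the sequence anywhere else in the path.

```shell
#!/usr/bin/env bash
# Forms of ${parameter/pattern/string}:
#   ${p/pat/str}   replace the first match
#   ${p//pat/str}  replace every match
#   ${p/%pat/str}  replace a match only if it is at the end
# Dry run: print the mv commands instead of running them.
dry_run() {
    local file
    for file in "$@"; do
        echo mv "$file.tmp" "${file/%З╞o.csv/.csv}"
    done
}

dry_run Dados/Jan/201301_Licitacoes/201301_LicitaЗ╞o.csv
```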
-
This is correct, and it worked. I would like to accept both your answer and RudiC's, but I can't, so I will accept his answer because his came first. Thank you for your time and explanation. – BuddhiLW, Jan 3, 2022 at 20:07
Most probably LicitaЗ╞o is meant to be Licitação, which is Portuguese for a public tender (bidding process).
You could do a hard conversion by using ${var//icitaЗ╞o/icitação}, like:
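for file in "$@"; do
    filedest="${file//icitaЗ╞o/icitação}"
    # Only delete the original once both the conversion and the move succeeded
    if iconv -f ISO-8859-1 -t UTF-8 <"$file" >"$file".tmp &&
       mv "$file.tmp" "$filedest"; then
        # Quote the right-hand side so [[ ]] compares literally, not as a glob
        [[ $file != "$filedest" ]] && rm -- "$file"
    fi
done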
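A note on the name comparison in that loop: inside [[ ]], an unquoted right-hand side of != is treated as a glob pattern, not as a literal string. A minimal sketch with a hypothetical bracketed file name (the helper names are illustrative) shows how such a name can compare as "different" from itself, which would make the rm delete the freshly written file:

```shell
#!/usr/bin/env bash
# In [[ ]], the right-hand side of != is a glob pattern unless it is quoted.
differs_unquoted() { [[ $1 != $2 ]]; }
differs_quoted()   { [[ $1 != "$2" ]]; }

# Hypothetical file name containing brackets, which are glob characters.
name='201301_Item[1].csv'

# [1] is read as a bracket expression matching "1", so the name "differs" from itself.
differs_unquoted "$name" "$name" && echo "unquoted: treated as different (rm would run)"
# Quoted, the comparison is literal and the two strings are equal.
differs_quoted "$name" "$name" || echo "quoted: treated as equal"
```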
-
Oh no, I'm trying to avoid giving UTF-8 names to files; that's what caused this issue in the first place. But thanks, although I understood that already, because I'm Brazilian :) – BuddhiLW, Jan 5, 2022 at 14:51
Write the converted file directly to its target file name, then remove the original:
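for file in "$@"
do
    target="${file/З╞o/}"
    if [ "$target" = "$file" ]; then
        # Name unchanged: go through a temp file, since >"$file" would truncate the input first
        iconv -f ISO-8859-1 -t UTF-8 <"$file" >"$file.tmp" && mv -- "$file.tmp" "$file"
    else
        iconv -f ISO-8859-1 -t UTF-8 <"$file" >"$target" && rm -f -- "$file"
    fi
done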
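When a name does not contain З╞o (e.g. 201301_EmpenhosRelacionados.csv), "${file/З╞o/}" expands to "$file" itself, and a command of the form iconv ... <"$file" >"$file" loses the data: the shell opens and truncates the output file before iconv reads anything. A minimal demonstration in a throwaway directory (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Shows why reading and writing the same path in one command loses data:
# the shell truncates the >"$file" target before iconv starts reading.
# Runs in a throwaway directory; the helper name is hypothetical.
same_path_size_after() {
    local dir
    dir=$(mktemp -d) || return 1
    printf 'some data\n' >"$dir/f.csv"
    iconv -f ISO-8859-1 -t UTF-8 <"$dir/f.csv" >"$dir/f.csv"
    wc -c <"$dir/f.csv"   # byte count left in the file
    rm -rf -- "$dir"
}

same_path_size_after
```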
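Notice also that "$@" is now double-quoted. This is required so that it doesn't (wrongly) act like $*: unquoted, every argument is split on whitespace and glob-expanded, so a file name containing a space would arrive as two separate words.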
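A minimal sketch of the difference, assuming the default IFS (the helper names and the file name with a space are hypothetical):

```shell
#!/usr/bin/env bash
# "$@" keeps each argument intact; unquoted $@ (like $*) is word-split on
# whitespace and glob-expanded. The helper names are hypothetical.
count_quoted()   { local n=0 a; for a in "$@"; do n=$((n + 1)); done; echo "$n"; }
count_unquoted() { local n=0 a; for a in $@; do n=$((n + 1)); done; echo "$n"; }

# Two file names, one of them (hypothetically) containing a space.
count_quoted   "Jan 2013.csv" "201301_LicitaЗ╞o.csv"
count_unquoted "Jan 2013.csv" "201301_LicitaЗ╞o.csv"
```

The first call sees two arguments, the second sees three, because "Jan 2013.csv" is split in two.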
Finally, the find can be simplified, since this script can accept multiple parameters:
find Dados/Jan/ -maxdepth 2 -name '*.csv' -exec conv {} +
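To see the difference between the two -exec terminators, here is a sketch that runs in a throwaway directory. Here sh -c 'echo "$#"' sh stands in for conv and only reports how many file names it received per call; the helper names_per_call is hypothetical. Passing the names as arguments after sh -c, rather than pasting {} into the command string as in the question, also keeps names containing spaces or quotes intact.

```shell
#!/usr/bin/env bash
# Counts how many file names each command invocation receives from find.
# sh -c 'echo "$#"' sh stands in for conv; names_per_call is a hypothetical helper.
# $1 is the -exec terminator: "+" (batch the names) or ";" (one call per name).
names_per_call() {
    local dir
    dir=$(mktemp -d) || return 1
    touch "$dir/a.csv" "$dir/b.csv" "$dir/c.csv"
    find "$dir" -maxdepth 2 -name '*.csv' -exec sh -c 'echo "$#"' sh {} "$1"
    rm -rf -- "$dir"
}

echo 'with {} + :'
names_per_call '+'
echo 'with {} \; :'
names_per_call ';'
```

With +, the three names arrive in a single call. Very long lists may still be split across several calls to stay under the system's argument-length limit, and the script handles that because it loops over all of its arguments.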
-
Thank you, I added that to my script. – BuddhiLW, Jan 5, 2022 at 14:49