4

I'm converting a batch of files from ISO Latin-1 (a.k.a. ISO-8859-1) to UTF-8. Since the conversion gives me the chance to rename the files, I would like to use it to get rid of the error-prone naming format they currently have.

The files have names as such:

tree Dados/Jan/
Dados/Jan/
├── 201301_Licitacoes
│ ├── 201301_EmpenhosRelacionados.csv
│ ├── 201301_ItemLicitaЗ╞o.csv
│ ├── 201301_LicitaЗ╞o.csv
│ └── 201301_ParticipantesLicitaЗ╞o.csv
├── 201401_Licitacoes
│ ├── 201401_EmpenhosRelacionados.csv
│ ├── 201401_ItemLicitaЗ╞o.csv
│ ├── 201401_LicitaЗ╞o.csv
│ └── 201401_ParticipantesLicitaЗ╞o.csv
├── 201501_Licitacoes
│ ├── 201501_EmpenhosRelacionados.csv
│ ├── 201501_ItemLicitaЗ╞o.csv
│ ├── 201501_LicitaЗ╞o.csv
│ └── 201501_ParticipantesLicitaЗ╞o.csv
├── 201601_Licitacoes
│ ├── 201601_EmpenhosRelacionados.csv
│ ├── 201601_ItemLicitaЗ╞o.csv
│ ├── 201601_LicitaЗ╞o.csv
│ └── 201601_ParticipantesLicitaЗ╞o.csv
(...)

I'm executing the following:

find Dados/Jan/ -maxdepth 2 -name '*.csv' -exec sh -c 'conv {}' \;

in which, conv is the following script:

#!/usr/bin/env bash
## adapted from https://stackoverflow.com/questions/62918711/convert-multiple-csv-files-to-utf-8-encoding-using-a-script-windows-command-prom
for file in $@; do
 iconv -f ISO-8859-1 -t UTF-8 <"$file" >"$file".tmp &&
 mv "$file.tmp" "$file"
done

In this process, I would like to remove the "З╞o" text from the file names. It appeared exactly like this when I unzipped the files (probably someone used "~" in the original file names).

Chris Davies
asked Jan 3, 2022 at 19:08
0

4 Answers

4

Use shell "parameter expansion" when moving the file to its final destination, for example:

mv "$file.tmp" "${file//З╞o}"

Is it always the same character sequence?
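
As a minimal sketch of what the expansion does (bash is assumed, and the sample path below is only a hypothetical one modelled on the listing in the question), the two forms can be compared side by side:

```shell
#!/usr/bin/env bash
# Hypothetical sample name, modelled on the listing in the question.
file='Dados/Jan/201301_Licitacoes/201301_ItemLicitaЗ╞o.csv'

# ${var/pattern/} removes the first match only.
first_only="${file/З╞o/}"

# ${var//pattern} removes every match; the trailing "/" may be
# omitted when the replacement is empty.
all_matches="${file//З╞o}"

printf '%s\n' "$first_only" "$all_matches"
```

Because the sequence occurs only once per name here, both forms give the same result: the name ends in ItemLicita.csv.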

answered Jan 3, 2022 at 19:35
1
  • Yes, it is! This will do, thank you. However, the order of the slashes is switched. I would like to accept both of your answers, because both are correct. Commented Jan 3, 2022 at 20:06
4

Use bash's "pattern substitution" (read man bash) and do something like:

echo mv "$file.tmp" "${file/З╞o/}"

Remove the echo once you are happy with the result. Never test with the actual mv command, because data loss could result.
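
The same dry-run idea can be wrapped in a small helper so that the echo does not have to be edited in and out by hand. This is only a sketch: run and DRY_RUN are hypothetical names, not part of any tool, and bash is assumed for printf %q.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command while DRY_RUN is 1 (the default
# here) and executes it only when DRY_RUN is set to something else.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf 'would run:'
    printf ' %q' "$@"   # %q quotes each argument so the preview is unambiguous
    printf '\n'
  else
    "$@"
  fi
}

# Hypothetical sample name, modelled on the listing in the question.
file='201301_LicitaЗ╞o.csv'

# Nothing is moved here; the command line is only captured as text.
preview=$(DRY_RUN=1 run mv "$file.tmp" "${file/З╞o/}")
printf '%s\n' "$preview"
```

Running the script with DRY_RUN=0 in the environment would perform the real mv, so that switch should only be flipped after the preview looks right.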

answered Jan 3, 2022 at 19:38
1
  • This is correct, and it worked. I would like to accept both your answer and RudiC's, but I can't, so I will accept his because it came first. Thank you for your time and explanation. Commented Jan 3, 2022 at 20:07
4

Most probably LicitaЗ╞o is meant to be Licitação, which is Portuguese for "bidding" (as in public procurement).

You could do a hard conversion by using ${var//icitaЗ╞o/icitação} like:

for file in "$@"; do
 filedest="${file//icitaЗ╞o/icitação}"
 iconv -f ISO-8859-1 -t UTF-8 <"$file" >"$file".tmp &&
 mv "$file.tmp" "$filedest" &&
 [[ $file != "$filedest" ]] && rm "$file"
done
dhag
answered Jan 4, 2022 at 0:38
1
  • Oh, no, I'm trying to avoid giving UTF-8 names to the files; that is what caused this issue in the first place. But thanks. I had understood that already, though, because I'm Brazilian :) Commented Jan 5, 2022 at 14:51
3

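Write the converted file under its required target filename and then remove the original. The conversion goes through a temporary file so that a file whose name does not contain the sequence is not truncated while it is still being read:

for file in "$@"
do
 target="${file/З╞o/}"
 if iconv -f ISO-8859-1 -t UTF-8 <"$file" >"$file.tmp" && mv -- "$file.tmp" "$target"
 then
  # Remove the original only when it was actually renamed
  [ "$file" = "$target" ] || rm -f -- "$file"
 fi
done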

Notice also that "$@" is now double-quoted. This is required so that it doesn't act (wrongly) like $*.
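
The effect of the quoting can be seen in a minimal sketch. The function names and the sample file names are made up for illustration; one name contains a space on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper that just reports how many arguments it received.
count_args() { echo "$#"; }

show() {
  # "$@" passes each original argument through unchanged.
  quoted=$(count_args "$@")
  # Unquoted $@ is subject to word splitting (and globbing),
  # so a name containing a space falls apart into two words.
  unquoted=$(count_args $@)
}

show 'Item Licita.csv' 'Empenhos.csv'
echo "quoted: $quoted, unquoted: $unquoted"
```

With these two sample names, the quoted form sees 2 arguments and the unquoted form sees 3.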

Finally, the find can be simplified since this script can accept multiple parameters:

find Dados/Jan/ -maxdepth 2 -name '*.csv' -exec conv {} +
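
To illustrate the difference between the two -exec terminators, here is a small sketch that counts how many times the command is started. It works in a throwaway directory created with mktemp, and the file names are hypothetical; an inline sh -c stands in for the conv script.

```shell
#!/usr/bin/env bash
# Build a throwaway tree with three sample files.
tmp=$(mktemp -d)
mkdir -p "$tmp/a"
touch "$tmp/a/1.csv" "$tmp/a/2.csv" "$tmp/a/3.csv"

# With \; the command is started once per file: one output line per run.
per_file=$(find "$tmp" -maxdepth 2 -name '*.csv' -exec sh -c 'echo "$#"' sh {} \; | wc -l)

# With + the file names are batched into as few invocations as possible.
batched=$(find "$tmp" -maxdepth 2 -name '*.csv' -exec sh -c 'echo "$#"' sh {} + | wc -l)

echo "invocations with \\;: $per_file, with +: $batched"
rm -rf "$tmp"
```

For three files, the first form starts the command three times and the second starts it once, which is why a script that loops over "$@" pairs well with the + form.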
answered Jan 3, 2022 at 23:15
1
  • Thank you, I added that in my script Commented Jan 5, 2022 at 14:49
