I have a dataset (a shapefile) with the same problem as the one described in this post: https://stackoverflow.com/questions/11436594/how-to-fix-double-encoded-utf8-characters-in-an-utf-8-table
"A previous LOAD DATA INFILE was run under the assumption that the CSV file is latin1-encoded. During this import the multibyte characters were interpreted as two single character and then encoded using utf-8 (again). This double-encoding created anomalies like ñ instead of ñ."
However, the solution given there is for MySQL, not Postgres; I tried it on Postgres and it didn't work (it only works on MySQL):
UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);
I need to import and fix this shapefile in Postgres, because I will need PostGIS to do various spatial analyses.
How can I solve this using PostGIS?
1 Answer
That is pretty similar in PostgreSQL:
convert_from(convert_to(textcol, 'LATIN1'), 'UTF8')
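Applied to a whole column, that could look like this (a sketch using the table and column names from the comments below; adjust them to your schema, and verify with the SELECT before running the UPDATE):

-- Preview the repaired values first
SELECT nome_area,
       convert_from(convert_to(nome_area, 'LATIN1'), 'UTF8') AS fixed
FROM sigef_sp_multiparte_latin1;

-- Then rewrite the column in place
UPDATE sigef_sp_multiparte_latin1
SET nome_area = convert_from(convert_to(nome_area, 'LATIN1'), 'UTF8');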
-
Thanks, but it didn't work. I tried this: SELECT convert_from(convert_to(nome_area, 'LATIN1'), 'UTF8') FROM sigef_sp_multiparte_latin1; It gives an error: ERROR: invalid byte sequence for encoding "UTF8": 0xc3 0x3f (SQL state: 22021). – hugonbg, Dec 17, 2020 at 21:40
-
Then that string is not "doubly UTF-8 encoded". – Laurenz Albe, Dec 18, 2020 at 6:56
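If only some rows are double-encoded, the round trip will fail on the rest. One way to find the offending rows is to wrap the conversion in a function that catches the encoding error (a sketch; try_fix_double_utf8 is a hypothetical name):

CREATE OR REPLACE FUNCTION try_fix_double_utf8(t text) RETURNS text AS $$
BEGIN
    -- Undo the double encoding; raises an error for values
    -- that are not cleanly double-encoded
    RETURN convert_from(convert_to(t, 'LATIN1'), 'UTF8');
EXCEPTION WHEN OTHERS THEN
    RETURN NULL;  -- conversion failed for this value
END;
$$ LANGUAGE plpgsql;

-- Rows where the round trip fails: either already-correct UTF-8
-- or inconsistently mangled data (like the 0xc3 0x3f case above)
SELECT nome_area
FROM sigef_sp_multiparte_latin1
WHERE nome_area IS NOT NULL
  AND try_fix_double_utf8(nome_area) IS NULL;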