
I have a dataset (shapefile) with the same problem as the post below: https://stackoverflow.com/questions/11436594/how-to-fix-double-encoded-utf8-characters-in-an-utf-8-table

"A previous LOAD DATA INFILE was run under the assumption that the CSV file is latin1-encoded. During this import the multibyte characters were interpreted as two single character and then encoded using utf-8 (again). This double-encoding created anomalies like ñ instead of ñ."

However, the solution given is for MySQL, not Postgres. I tried it on Postgres and it didn't work; it only worked on MySQL:

UPDATE tablename SET
 field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);

I need to import and fix this shapefile in Postgres because I will need PostGIS to do various spatial analyses.

How can I solve this using postgis?

asked Dec 17, 2020 at 15:22

1 Answer


That is pretty similar in PostgreSQL:

convert_from(convert_to(textcol, 'LATIN1'), 'UTF8')
answered Dec 17, 2020 at 15:38
  • Thanks, but it didn't work. I tried this: SELECT convert_from(convert_to(nome_area, 'LATIN1'), 'UTF8') FROM sigef_sp_multiparte_latin1; It gives an error: ERROR: invalid byte sequence for encoding "UTF8": 0xc3 0x3f (SQL state 22021). Commented Dec 17, 2020 at 21:40
  • Then that string is not "doubly UTF-8 encoded". Commented Dec 18, 2020 at 6:56
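One plausible reading of that error (an assumption, not confirmed by the thread): 0xc3 0x3f means a character that Latin-1 encodes as byte 0xC3 (such as 'Ã') is followed by a literal '?' (0x3F). A '?' in that position often means an earlier lossy conversion already replaced a byte, so the bytes are no longer valid UTF-8 and the round trip cannot succeed. Assuming a hypothetical value like 'Ã?':

```python
# A value such as 'Ã?' — e.g. where a prior lossy conversion replaced the
# second byte of a multibyte sequence with '?' — breaks the repair trip.
broken = "Ã?"

raw = broken.encode("latin1")  # b'\xc3\x3f' — the exact bytes in the error
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xC3 starts a two-byte UTF-8 sequence, but 0x3F ('?') is not a
    # valid continuation byte — mirroring the Postgres 22021 error.
    print(exc)
```

Rows like this have to be found and cleaned (or decoded with a fallback) before the convert_from/convert_to fix can run over the whole column.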
