I have a CSV file that appears to be UTF-16, dumped from SQL Server. The file contains properly encoded Spanish accents, but some of the rows are encoded differently. Like this:
0xd83d0xde1b0xd83d0xde1b0xd83d0xde1b
This seems to be a strange encoding for
\ud83d\ude1b\ud83d\ude1b\ud83d\ude1b
\ud83d\ude1b is a surrogate pair for an emoji
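(For reference, the standard UTF-16 surrogate-pair arithmetic confirms which code point this pair encodes; this check is mine, not part of the dump:)

```python
# Combine a UTF-16 high/low surrogate pair into a single code point.
high, low = 0xD83D, 0xDE1B
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(hex(code_point))  # 0x1f61b, i.e. U+1F61B FACE WITH STUCK-OUT TONGUE
```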
I need to convert everything to a nice, neat UTF-8 file. I tried endless combinations of bytearray(), encode(), decode(), and so on.
How can I convert this file of mixed UTF-16 and escaped UTF-16 into proper Python 3 strings, and finally save them to a new UTF-8 file?
You can convert the hex data like this:
>>> import binascii
>>> s = '0xd83d0xde1b0xd83d0xde1b0xd83d0xde1b'
>>> # Remove the leading '0x'
>>> hs = s.replace('0x', '')
>>> # Convert from hex to bytes
>>> bs = binascii.unhexlify(hs)
>>> bs
b'\xd8=\xde\x1b\xd8=\xde\x1b\xd8=\xde\x1b'
>>> # Decode to str
>>> bs.decode('utf-16be')
'😛😛😛'
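To handle the whole file, you can apply the same idea to every escaped run while leaving the correctly encoded text alone. Here is a minimal sketch (the regex, the helper names, and the file paths are my own assumptions, not from your dump):

```python
import re
import binascii

# Matches one or more hex-escaped UTF-16 code units like '0xd83d0xde1b'.
HEX_RUN = re.compile(r'(?:0x[0-9a-fA-F]{4})+')

def decode_hex_run(match):
    # Strip the '0x' prefixes, turn the hex digits into bytes,
    # then decode as big-endian UTF-16 (surrogate pairs included).
    hex_digits = match.group(0).replace('0x', '')
    return binascii.unhexlify(hex_digits).decode('utf-16be')

def convert_file(in_path, out_path):
    # Read the UTF-16 dump, fix the escaped runs, write UTF-8.
    with open(in_path, encoding='utf-16') as src, \
         open(out_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(HEX_RUN.sub(decode_hex_run, line))

print(HEX_RUN.sub(decode_hex_run, 'hola 0xd83d0xde1b0xd83d0xde1b'))
```

Note that `decode('utf-16be')` joins valid surrogate pairs into a single code point automatically; a lone (unpaired) surrogate in the data would raise a `UnicodeDecodeError` instead.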