Last Updated: February 25, 2016 · chluehr

Debugging encodings and character sets.

#utf8 #utf-8 #character #umlaut #encodings

Garbled text on your screen?

1. Put your data in a plain text file (using vim; you do not want BOMs in your data!).
2. Run the command hexdump -C file
3. Locate the strange characters and determine their bytes or byte sequences.
4. Look them up, e.g. here: utf8 charset table (German)
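If you prefer a script to reading hexdump output by eye, the following minimal sketch (Python 3.9+, not part of the original tip; `report_non_ascii` is a hypothetical helper name) automates steps 2 to 4 under the assumption that the file is meant to be UTF-8. It reads the file as raw bytes and, for every non-ASCII character, prints the byte offset, the UTF-8 byte sequence, the code point and the Unicode name from Python's standard `unicodedata` module. It also flags a leading UTF-8 BOM (ef bb bf) and reports the first byte sequence that is not valid UTF-8.

```python
import sys
import unicodedata

UTF8_BOM = b"\xef\xbb\xbf"


def report_non_ascii(data: bytes) -> list[str]:
    """Describe every non-ASCII character in UTF-8 data (hypothetical helper)."""
    lines = []
    if data.startswith(UTF8_BOM):
        lines.append("warning: data starts with a UTF-8 BOM (ef bb bf)")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Only the first invalid byte sequence is reported; the rest is not analysed.
        bad = data[exc.start:exc.end]
        lines.append(f"{exc.start:08x}  {bad.hex(' ')} is not valid UTF-8")
        return lines
    offset = 0  # byte offset, comparable to the left column of hexdump -C
    for ch in text:
        raw = ch.encode("utf-8")
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<no name>")
            lines.append(f"{offset:08x}  {raw.hex(' '):<12} U+{ord(ch):04X} {name}")
        offset += len(raw)
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is an assumption): python3 report_non_ascii.py file
    with open(sys.argv[1], "rb") as f:
        print("\n".join(report_non_ascii(f.read())))
```

For a file containing "Grüße" this reports ü at offset 2 as c3 bc (U+00FC) and ß at offset 4 as c3 9f (U+00DF), the same bytes hexdump -C shows.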

An example: the German umlaut ü ("ue").

The correct UTF-8 encoding is the single code point U+00FC, stored as the two bytes c3 bc (this is what you would see in the hexdump):

U+00FC ü c3 bc LATIN SMALL LETTER U WITH DIAERESIS
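Why c3 bc? Code points from U+0080 to U+07FF use the 2-byte UTF-8 pattern 110xxxxx 10xxxxxx: the 11 bits of the code point are split into the upper 5 and lower 6. For U+00FC (00011 111100 in 11 bits) that gives 11000011 10111100, i.e. c3 bc. The helper `utf8_two_byte` below is hypothetical and only illustrates the arithmetic; in real code you would simply call `str.encode("utf-8")`.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as 110xxxxx 10xxxxxx (illustration only)."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("the 2-byte form only covers U+0080..U+07FF")
    lead = 0xC0 | (cp >> 6)    # 110 followed by the upper 5 of the 11 bits
    cont = 0x80 | (cp & 0x3F)  # 10 followed by the lower 6 bits
    return bytes([lead, cont])


# U+00FC: 0xFC >> 6 = 0x03 -> 0xC3; 0xFC & 0x3F = 0x3C -> 0xBC
assert utf8_two_byte(0xFC) == "\u00fc".encode("utf-8") == b"\xc3\xbc"
print(utf8_two_byte(0xFC).hex(" "))  # c3 bc
```

Applied to U+0308 (COMBINING DIAERESIS), the same rule yields cc 88.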

A valid UTF-8 sequence that displays identically but is not the single character "ü" is a plain u followed by a combining diaeresis, the so-called decomposed form (you would see 75 cc 88 in the hexdump):

U+0075 u 75 LATIN SMALL LETTER U
U+0308 ̈ cc 88 COMBINING DIAERESIS
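The two spellings are different strings even though they render the same, so comparisons and searches fail. One common remedy, which the original tip does not cover, is Unicode normalization: NFC composes u + U+0308 into U+00FC, and NFD does the reverse. A minimal sketch using Python's standard `unicodedata.normalize` (`nfc_equal` is a hypothetical helper name):

```python
import unicodedata

composed = "\u00fc"     # ü as a single code point, UTF-8 bytes c3 bc
decomposed = "u\u0308"  # u + COMBINING DIAERESIS, UTF-8 bytes 75 cc 88


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # False: different code points
print(nfc_equal(composed, decomposed))  # True
print(unicodedata.normalize("NFC", decomposed).encode("utf-8").hex(" "))  # c3 bc
print(unicodedata.normalize("NFD", composed).encode("utf-8").hex(" "))    # 75 cc 88
```

Normalizing text to one form before comparing or storing it makes both byte sequences from the hexdump compare equal.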

Written by Christoph Lühr
