hex dump w/ or w/out utf-8 chars

Tue Jul 9 02:46:40 EDT 2013

On Tue, 09 Jul 2013 00:32:00 +0100, MRAB wrote:
> On 08/07/2013 23:02, Joshua Landau wrote:
>> On 8 July 2013 22:38, MRAB <python at mrabarnett.plus.com> wrote:
>>> On 08/07/2013 21:56, Dave Angel wrote:
>>>> Characters do not have a width.
>>> [snip]
>>> It depends what you mean by "width"! :-)
>>> Try this (Python 3):
>>> >>> print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}")
>>> AＡ
>> Serious question: How would one find the width of a character by that
>> definition?
> >>> import unicodedata
> >>> unicodedata.east_asian_width("A")
> 'Na'
> >>> unicodedata.east_asian_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}")
> 'F'
> The possible widths are:
> N = Neutral
> A = Ambiguous
> H = Halfwidth
> W = Wide
> F = Fullwidth
> Na = Narrow
> All you then need to do is find out what those actually mean...

Some East Asian encodings had code points for Latin characters in two
forms: "half-width" and "full-width". The half-width forms took up a
single fixed-width column; the full-width forms took up two fixed-width
columns, so they would line up nicely in columns with Asian characters.
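
For what it's worth, here is a rough sketch of how those property values
might be turned into a column count. The helper name display_width is my
own invention, and the rule it applies is an assumption rather than
anything the standard mandates: W and F count as two columns, everything
else (including A, Ambiguous) counts as one. It makes no attempt to handle
combining characters or control characters, so treat it as an estimate only.

```python
import unicodedata

def display_width(text):
    """Estimate how many fixed-width columns `text` occupies.

    Assumption: Wide ('W') and Fullwidth ('F') characters take two
    columns; every other East Asian Width class takes one.
    """
    columns = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns

# Narrow "A" followed by fullwidth "A": 1 + 2 columns.
print(display_width("A\N{FULLWIDTH LATIN CAPITAL LETTER A}"))
```

Whether Ambiguous characters should be one column or two depends on the
terminal and font in use, so a real tool would probably want to make that
configurable.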
See also:
http://www.unicode.org/reports/tr11/
and search Wikipedia for "full-width" and "half-width".
-- 
Steven