hex dump w/ or w/out utf-8 chars

wxjmfauth at gmail.com
Tue Jul 9 05:34:08 EDT 2013


On Tuesday, 9 July 2013 09:00:02 UTC+2, Steven D'Aprano wrote:
> On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote:
>
> > Not using python 3, for me (a programmer which was present at the
> > beginning of computer science, badly interacting with many languages
> > from assembler to Fortran and from c to Pascal and so on) it was an hard
> > job to arrange the abrupt transition from characters only equal to bytes
>
> Characters have *never* been equal to bytes. Not even Perl treats the
> character 'A' as equal to the byte 0x0A:
>
> if (0x0A eq 'A') {print "Equal\n";}
> else {print "Unequal\n";}
>
> will print Unequal, even if you replace "eq" with "==". Nor does Perl
> consider the character 'A' equal to 65.
>
> If you have learned to think of characters being equal to bytes, you have
> learned wrong.
>
> > to some special characters defined with 2, 3 bytes and even more. I
> > should have preferred another solution... but i'm not Guido....!
>
> What's a special character?
>
> To an Italian, the characters J, K, W, X and Y are "special characters"
> which do not exist in the ordinary alphabet. To a German, they are not
> special, but S is special because you write SS as ß, but only in
> lowercase.
>
> To a mathematician, σ is just as ordinary as it would be to a Greek; but
> the mathematician probably won't recognise ς unless she actually is
> Greek, even though they are the same letter.
>
> To an American electrician, Ω is an ordinary character, but ω isn't.
>
> To anyone working with angles, or temperatures, the degree symbol ° is an
> ordinary character, but the radian symbol is not. (I can't even find it.)
>
> The English have forgotten that W used to be a ligature for VV, and
> consider it a single ordinary character. But the ligature Æ is considered
> an old-fashioned way of writing AE.
>
> But to Danes and Norwegians, Æ is an ordinary letter, as distinct from AE
> as TH is from Þ. (Which English used to have.) And so on...
>
> I don't know what a special character is, unless it is the ASCII NUL
> character, since that terminates C strings.

--------
The concept of "special characters" does not exist.
However, the definition of a "character" is a problem
per se (character, glyph, grapheme, ...).
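
A minimal sketch of why "character" is hard to pin down, assuming
Python 3.5 or later (for bytes.hex()). The variable names are only
illustrative. The same visible "é" can be one code point or two, and
the two forms differ in code point count and in UTF-8 bytes until
they are normalized:

```python
import unicodedata

# Two ways to write what a reader sees as the single character "é".
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# Code point counts differ although the rendered text looks the same.
composed_len = len(composed)
decomposed_len = len(decomposed)

# The UTF-8 byte sequences differ as well.
composed_hex = composed.encode("utf-8").hex()
decomposed_hex = decomposed.encode("utf-8").hex()

# Without normalization the strings compare unequal; NFC composes them.
equal_raw = composed == decomposed
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed

print(composed_len, decomposed_len)
print(composed_hex, decomposed_hex)
print(equal_raw, equal_nfc)
```
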
You are confusing Unicode, typography and linguistics.
There is no symbol for the radian because, mathematically,
the radian is a pure number, a unitless number. You can,
however, specify a = ... in radians (rad).
Note the difference between SS and ẞ:
'FRANZ-JOSEF-STRAUSS-STRAẞE'
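
A short sketch of that difference, assuming Python 3.3 or later (for
str.casefold()). The names street_ss and street_sharp are mine, used
only for illustration. ẞ (U+1E9E) is a single code point, while
uppercasing ß (U+00DF) gives the two letters SS:

```python
import unicodedata

lower_sharp_s = "\u00df"     # ß
capital_sharp_s = "\u1e9e"   # ẞ

# ẞ has its own Unicode name and its own UTF-8 encoding (three bytes).
capital_name = unicodedata.name(capital_sharp_s)
capital_utf8 = capital_sharp_s.encode("utf-8")

# Default case mappings: ß uppercases to "SS", ẞ lowercases to ß.
upper_of_lower = lower_sharp_s.upper()
lower_of_capital = capital_sharp_s.lower()

# The two spellings differ as strings but match under case folding.
street_ss = "FRANZ-JOSEF-STRAUSS-STRASSE"
street_sharp = "FRANZ-JOSEF-STRAUSS-STRA\u1e9eE"
equal_raw = street_ss == street_sharp
equal_folded = street_ss.casefold() == street_sharp.casefold()

print(capital_name, capital_utf8)
print(upper_of_lower, lower_of_capital)
print(equal_raw, equal_folded)
```
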
jmf


More information about the Python-list mailing list