Re: Clearing up misconceptions about characters vs bytes in the manual
- Subject: Re: Clearing up misconceptions about characters vs bytes in the manual
- From: Rapin Patrick <rapin.patrick@...>
- Date: Fri, 2 Nov 2012 16:33:56 +0100
> Would it be a good idea to make a distinction between characters and
> bytes, or do you guys feel that this is already clear in the manual
> (and PiL)?
For C programmers, characters and bytes have always been synonyms...
But for programmers used to Unicode-aware languages, I admit that Lua's terminology is confusing.
I searched for "character" and "byte" in the Lua 5.2 reference manual.
"Character" occurs far more often than "byte". Most of the time, "character" refers to a literal ASCII character, as in 'k'.
I don't think it would help to write, for example, "the byte 'k'" instead of "the character 'k'".
In the string library chapter, a character generally means a byte. Note however that at the start of the chapter there is this sentence:
"The string library assumes one-byte character encodings."
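To illustrate what that assumption means in practice, here is a minimal sketch, assuming the string holds UTF-8 text (my assumption, the manual does not prescribe any encoding). The "é" is written with decimal escapes so that the example does not depend on the encoding of the source file. The string functions simply operate on bytes:

```lua
-- "é" in UTF-8 is the two bytes 0xC3 0xA9 (195 169 in decimal).
local s = "caf\195\169"        -- "café": 4 characters, 5 bytes

local first4 = s:sub(1, 4)     -- "caf\195": cuts through the middle of "é"
local last_byte = s:byte(-1)   -- 169: the second byte of "é", not a character
local reversed = s:reverse()   -- "\169\195fac": bytes reversed, no longer valid UTF-8
```

So in this chapter "character" really does mean "byte" as soon as the encoding is not a one-byte one.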
Also, for the # operator, the reference states:
"The length of a string is its number of bytes
(that is, the usual meaning of string length when each
character is one byte)."
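A small sketch of the difference, again assuming UTF-8 input. The helper `count_utf8_chars` is a hypothetical name of mine, not something from the manual. It counts every byte that is not a UTF-8 continuation byte (0x80 to 0xBF), which gives the number of characters only if the string is valid UTF-8:

```lua
local ascii_text = "cafe"
local utf8_text  = "caf\195\169"   -- "café" encoded in UTF-8

local ascii_len = #ascii_text      -- 4 bytes, 4 characters
local byte_len  = #utf8_text       -- 5 bytes, although there are 4 characters

-- Hypothetical helper: counts the bytes outside the continuation range
-- 128..191. Assumes valid UTF-8; it does not validate anything.
local function count_utf8_chars(s)
  local _, n = s:gsub("[^\128-\191]", "")
  return n
end

local char_len = count_utf8_chars(utf8_text)   -- 4
```

For pure ASCII text both counts agree, which is why the two words are so easy to mix up.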
It is however funny to note that the function `void luaL_addchar (luaL_Buffer *B, char c)` is documented as "Adds the byte c to the buffer B".
So yes, there is room for confusion. But I don't think that `reference_manual:gsub("character", "byte")` is the right way to fix the situation.