Re: [Q] handling 0xC2A0 (space in utf8)


Thank you Roberto,
this
 + string.char(0xc2, 0xa0)
worked.
Also, thank you for all the responses.
Now I understand that 0xC2 0xA0 is not an ordinary space
but the UTF-8 encoding of the no-break space (U+00A0,
&nbsp; in HTML), which web browsers render as a space.
Somehow it was not handled by PHP's html_entity_decode
(this function is supposed to get rid of all the
HTML stuff for me).
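
For reference, here is a minimal sketch of where the two bytes come from, assuming the standard UTF-8 rule for two-byte sequences (code points U+0080 to U+07FF). The helper name utf8_2byte is made up for this illustration and is not part of Lua or LPeg.

```lua
-- Hypothetical helper: encode a code point in the range U+0080..U+07FF
-- as its two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
local function utf8_2byte(cp)
  assert(cp >= 0x80 and cp <= 0x7FF, "code point outside the two-byte range")
  local lead = 0xC0 + math.floor(cp / 64)  -- upper 5 bits go into the lead byte
  local trail = 0x80 + cp % 64             -- lower 6 bits go into the trail byte
  return string.char(lead, trail)
end

local nbsp = utf8_2byte(0xA0)              -- U+00A0, no-break space
print(string.byte(nbsp, 1, -1))            --> 194 160  (0xC2 0xA0)
```

So the sequence 0xC2 0xA0 is simply U+00A0 written out in UTF-8, which is why the literal string.char(0xc2, 0xa0) matches it.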
Peter, I use the dijit.Editor JavaScript editor because it allows me
to define buttons, which I am going to use to let users
insert 'blocks of code' instead of just having them type
in text. I am only discarding HTML tags when passing the text to my
compiler written in Lua; otherwise, I will be saving the text as is
in a UTF-8 enabled PostgreSQL database.
... By the way, I added to my online resume that I have programmed in Lua
(my compiler is just over 1.6k lines, but I have also used the luabind C++
library
for another project)
and got a call today from a recruiter about my LUA experience :-).
On 16 Oct 2008 17:02:33 -0300, "Roberto Ierusalimschy"
<roberto@inf.puc-rio.br> said:
> > In Lua, I have specified for LPeg the following grammar for space
> > 
> > local space=lpeg.S('\r\n\f\t ')^1
> > 
> > [...]
> > 
> > I am thinking now that this messes up LPeg when trying to match
> > for the space. I would like to tell LPeg to also understand
> > 0xC2A0 as a space.
> 
> local space = (lpeg.S('\r\n\f\t ') + string.char(0xc2, 0xa0))^1
> 
> -- Roberto
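
A short sketch of why the two bytes are added as a separate alternative rather than appended to the lpeg.S set, assuming the lpeg module is installed. lpeg.S matches one byte at a time, so a set containing both bytes would accept either byte on its own. The names nbsp and wrong are only for illustration.

```lua
local lpeg = require("lpeg")

-- UTF-8 bytes of U+00A0 (no-break space), matched as one two-byte literal
local nbsp = string.char(0xC2, 0xA0)
local space = (lpeg.S("\r\n\f\t ") + nbsp)^1

-- One space, one no-break space (2 bytes), one tab: 4 bytes of whitespace.
-- lpeg.match returns the position just after the match.
print(lpeg.match(space, " " .. nbsp .. "\tx"))      --> 5

-- Putting the bytes into the set treats each byte as its own character,
-- so a stray 0xC2 byte is wrongly accepted as whitespace.
local wrong = lpeg.S("\r\n\f\t " .. nbsp)^1
print(lpeg.match(wrong, string.char(0xC2) .. "x"))  --> 2
print(lpeg.match(space, string.char(0xC2) .. "x"))  --> nil
```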
-- 
 V S P
 toreason@fastmail.fm
