Re: code page
- Subject: Re: code page
- From: Marco Antonio Abreu <mabreu.ti@...>
- Date: 13 May 2009 15:48:42 -0300
Hi Ignacio,
The DLL works; it doesn't truncate the values any more. But now the problem of wrong chars (garbage) being written to the destination database is back. In our example, the field ends up with a garbled version of 'Flávia', even with the 'N' prefix before the string. Should we work around it with a gsub substitution, or do you know a better solution?
tks
Marco
On 13 May 2009, Ignacio Burgueño <ignaciob@inconcertcc.com> wrote:
David Given wrote:
Marco Antonio Abreu wrote:
When a field value has one accented char, the last char is truncated ('Flávia' comes back as 'Fl??vi', where ?? are special chars); if the text has two accented chars, the last two chars are cut off, and so on...
This is a classic symptom of UTF-8 misparsing.
Kind of. In fact the problem is that LuaCOM is truncating characters.
The issue is this: there's a function that converts BSTRs (the UTF-16 strings used by COM) to Lua strings.
When converting "Flávia", it computes its length (6 characters) and converts it to UTF-8 (which gives a 7-byte string, because 'á' takes two bytes in UTF-8), BUT it pushes only 6 bytes to Lua instead of the required 7.
So strings get truncated according to the number of non-ASCII code points they contain (roughly one byte lost per accented character).
I'll push a fix for that to LuaCOM.
Regards,
Ignacio Burgueño
--
Marco Antonio Abreu
Systems Analyst