lua-l archive

Re: xml pull parser



On 21-Mar-05, at 7:54 AM, PA wrote:
> Alternatively, somebody, somewhere, somehow must have written this a dozen times already. Is there not a little code sample somewhere to show how to decode an XML string in Lua? Sigh...

Probably, although the issues are subtle. I don't address any of them here; this is simply a working reimplementation of the same transformation.
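do
  -- the five entities predefined by XML
  local ents = {
    lt = '<',
    gt = '>',
    amp = '&',
    quot = '"',
    apos = "'"
  }
  -- highest Unicode code point (defined but not used below)
  local maxutf8 = tonumber('10FFFF', 16)
  -- gsub callback: hash is '#' for numeric references and '' otherwise;
  -- str is the name or number between '&' (or '&#') and ';'
  local function entity2char(hash, str)
    if hash == '#' then
      -- turn hex into C-style hex ("x41" -> "0x41") so tonumber accepts it
      local utfcode = tonumber((string.gsub(str, '^x', '0x')))
      -- only single-byte (ISO-8859-1) code points are converted
      if utfcode and utfcode < 256 then
        return string.char(utfcode)
      end
    elseif ents[str] then
      return ents[str]
    end
    -- unknown or unconvertible references are left untouched
    return '&'..hash..str..';'
  end
  -- global so the interactive tests below can call it; the 'and' also
  -- truncates gsub's two results (string, count) to just the string
  function decode(str)
    return str and string.gsub(str, '&(#?)(%w+);', entity2char)
  end
end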
--- some tests
=decode '&amp;apos; is how you write &apos;'
=decode '&#38;amp; is an ampersand.'
=decode '&quot;I said &apos;Stop right there!&apos; &amp; I &lt;strong&gt;meant it!&lt;/strong&gt;&quot; the webmaster shouted, htmlifying instinctively'
-- Codes and noncodes
=decode 'Some invalid numeric escapes include &#7b2;, &#x24g;'
=decode "Take out the &garbage;! Don't leave it for ma&#xf1;ana! The sooner the &#x3b2;!"
-- What was I saying about iso-8859-1?
=decode 'ma&#xf1;ana or ma&#xc3;&#xb1;ana?'
-->
> =decode '&amp;apos; is how you write &apos;'
&apos; is how you write '
> =decode '&#38;amp; is an ampersand.'
&amp; is an ampersand.
> =decode '&quot;I said &apos;Stop right there!&apos; &amp; I &lt;strong&gt;meant it!&lt;/strong&gt;&quot; the webmaster shouted, htmlifying instinctively'
"I said 'Stop right there!' & I <strong>meant it!</strong>" the webmaster shouted, htmlifying instinctively
> -- Codes and noncodes
> =decode 'Some invalid numeric escapes include &#7b2;, &#x24g;'
Some invalid numeric escapes include &#7b2;, &#x24g;
> =decode "Take out the &garbage;! Don't leave it for ma&#xf1;ana! The sooner the &#x3b2;!"
Take out the &garbage;! Don't leave it for ma?ana! The sooner the &#x3b2;!
> -- What was I saying about iso-8859-1?
> =decode 'ma&#xf1;ana or ma&#xc3;&#xb1;ana?'
ma?ana or mañana?
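The last test shows the ISO-8859-1 limitation: `entity2char` only converts code points below 256 and emits each one as a single byte, so `&#xf1;` prints as `?` on a UTF-8 terminal, `&#x3b2;` is left alone, and the `maxutf8` constant is never used. A minimal sketch of a variant that emits UTF-8 instead follows, assuming the caller wants UTF-8 output. `decode_utf8`, `utf8_encode` and `codepoint` are hypothetical names, not part of the code above, and the digit-count caps and surrogate check are this sketch's own simplifications, not a complete treatment of the subtle issues mentioned earlier.

```lua
-- Sketch (hypothetical names): like decode(), but numeric character
-- references are emitted as UTF-8 instead of single ISO-8859-1 bytes.
do
  local ents = { lt = '<', gt = '>', amp = '&', quot = '"', apos = "'" }
  local maxutf8 = 0x10FFFF  -- highest Unicode code point

  -- Encode a code point (0..0x10FFFF) as a UTF-8 byte string.
  local function utf8_encode(cp)
    if cp < 0x80 then
      return string.char(cp)
    elseif cp < 0x800 then
      return string.char(0xC0 + math.floor(cp / 0x40),
                         0x80 + cp % 0x40)
    elseif cp < 0x10000 then
      return string.char(0xE0 + math.floor(cp / 0x1000),
                         0x80 + math.floor(cp / 0x40) % 0x40,
                         0x80 + cp % 0x40)
    else
      return string.char(0xF0 + math.floor(cp / 0x40000),
                         0x80 + math.floor(cp / 0x1000) % 0x40,
                         0x80 + math.floor(cp / 0x40) % 0x40,
                         0x80 + cp % 0x40)
    end
  end

  -- Parse the text after '#': 'x' plus hex digits, or decimal digits.
  -- Leading zeros are skipped and the digit count is capped so the
  -- number cannot overflow; anything else yields nil.
  local function codepoint(str)
    local hex = string.match(str, '^x0*(%x+)$')
    if hex then
      return #hex <= 6 and tonumber(hex, 16) or nil
    end
    local dec = string.match(str, '^0*(%d+)$')
    if dec then
      return #dec <= 7 and tonumber(dec) or nil
    end
    return nil
  end

  local function entity2char(hash, str)
    if hash == '#' then
      local cp = codepoint(str)
      -- reject values above U+10FFFF and UTF-16 surrogates
      if cp and cp <= maxutf8 and not (cp >= 0xD800 and cp <= 0xDFFF) then
        return utf8_encode(cp)
      end
    elseif ents[str] then
      return ents[str]
    end
    return '&' .. hash .. str .. ';'  -- leave other references as-is
  end

  function decode_utf8(str)
    -- parentheses keep only gsub's first result (the string)
    return str and (string.gsub(str, '&(#?)(%w+);', entity2char))
  end
end
```

With this variant, `decode_utf8 'ma&#xf1;ana'` returns `'ma\195\177ana'` (mañana in UTF-8), and `&#x3b2;` becomes the two bytes of β instead of being left alone. Whether UTF-8 is the right output depends on what encoding the rest of the program expects.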
