
Re: Cloning XML



2015-05-18 8:22 GMT+02:00 Dirk Laurie <dirk.laurie@gmail.com>:
> 2015-05-18 2:24 GMT+02:00 Tim Channon <tc@gpsl.net>:
>> There are many XML decoding libraries of varying degrees of capability and
>> portability. The focus of these is decode.
>>
>> Discussion of regenerating identical XML from the decode seems to get lost
>> and particularly when the user has no idea of the XML meaning, simply wants
>> to intercept textually known entities within a context.
I found it quite easy to regenerate identical XML from the output of Roberto's
parser on the lua-users.org Wiki, but in a very specialized context: the XML
files were those produced by `pdftohtml -xml`, for which the DTD is
self-contained and a mere 49 lines long. That is a far cry from SVG's DTD,
which runs to over 300 lines, most of them pulling in other files, I'll admit.
Here is what I did: I tweaked Roberto's code to provide a metatable
for every table-valued item.
element_mt = { __tostring =
  function(s)
    -- defensive: pass plain strings through unchanged
    if type(s)=='string' then return s end
    assert(s.tag)
    -- look up the renderer for this tag (a constant string or a function)
    local render = render[s.tag]
    if type(render)=='string' then return render
    elseif type(render)=='function' then return render(s)
    else error("Can't convert type '"..s.tag.."' to a string")
    end
  end }
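How the metatable gets onto every table-valued item is not shown above. A minimal sketch of one way to do it after parsing, assuming each element is a table with a `tag` field and its children in the array part; the helper `attach` and the stand-in `demo_mt` are hypothetical names, not part of the tweaked parser:

```lua
-- Hypothetical helper: walk a parsed tree and give every table-valued
-- node the same metatable. Strings (text content) are left untouched.
local function attach(node, mt)
  if type(node) ~= 'table' then return node end
  for _, child in ipairs(node) do attach(child, mt) end
  return setmetatable(node, mt)
end

-- Stand-in metatable so that this sketch runs on its own; in the real
-- code this would be element_mt.
local demo_mt = { __tostring = function(s) return '<' .. s.tag .. '>' end }

-- A tiny hand-built tree: a page holding a string and a <b> element.
local tree = { tag = 'page', 'hello', { tag = 'b', 'world' } }
attach(tree, demo_mt)

print(tostring(tree[2]))
```

Setting the metatable inside the parser as each element table is created would avoid the second pass, which is presumably closer to what "tweaked" means here.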
Then the regenerate routine becomes:
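local tconcat = table.concat  -- assumed alias; its definition is not shown here
local function assemble(document)
  local s = {}
  -- boxes form a linked list: document.first, then box.next
  local box = document.first
  while box do
    s[#s+1] = tostring(box)
    box = box.next
  end
  return tconcat(s,'\n')
end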
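To see the traversal on its own, here is a small sketch that runs the same loop over a hand-built linked list. The `link` helper, the `text` field and `box_mt` are made up for the illustration; only the `first`/`next` structure comes from the routine above.

```lua
local tconcat = table.concat

-- Same loop as the regenerate routine: follow first/next and join with newlines.
local function assemble(document)
  local s = {}
  local box = document.first
  while box do
    s[#s+1] = tostring(box)
    box = box.next
  end
  return tconcat(s, '\n')
end

-- Hypothetical stand-in for real elements: each box just prints its text.
local box_mt = { __tostring = function(b) return b.text end }

-- Hypothetical helper: chain a list of strings into document.first / box.next.
local function link(texts)
  local document, last = {}, nil
  for _, t in ipairs(texts) do
    local box = setmetatable({ text = t }, box_mt)
    if last then last.next = box else document.first = box end
    last = box
  end
  return document
end

local doc = link{ 'first box', 'second box', 'third box' }
print(assemble(doc))
```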
The global or upvalue "render" is
local render = {
  pdf2xml = lines,
  page = lines,
  document = assemble,
  text = contents,
  b = function(s) return '**'..contents(s)..'**' end,
  i = function(s) return '*'..contents(s)..'*' end,
  a = contents,
  outline = "<outline>",
  fontspec = "<fontspec>" }
with
contents = bind_concat""
lines = bind_concat"\n"
where
--- table.concat with bound separator and `tostring` filter
--- (tconcat is assumed to be an alias for table.concat)
function bind_concat(sep)
  return function(t)
    if type(t)=='string' then return t end
    local u={}
    for k,v in ipairs(t) do u[k]=tostring(v) end
    return tconcat(u,sep)
  end
end
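Putting the pieces together on a tiny hand-built tree may make the recursion clearer: `tostring` on an element looks up its renderer, which in turn calls `tostring` on each child. This is only a sketch. The `el` constructor is hypothetical, and the `render` table is cut down to three tags.

```lua
local tconcat = table.concat

local function bind_concat(sep)
  return function(t)
    if type(t) == 'string' then return t end
    local u = {}
    for k, v in ipairs(t) do u[k] = tostring(v) end
    return tconcat(u, sep)
  end
end

local contents = bind_concat ""
local lines = bind_concat "\n"

-- Cut-down render table: one renderer per tag.
local render = {
  page = lines,
  text = contents,
  b = function(s) return '**' .. contents(s) .. '**' end,
}

local element_mt = { __tostring = function(s)
  local r = render[s.tag]
  if type(r) == 'string' then return r
  elseif type(r) == 'function' then return r(s)
  else error("Can't convert type '" .. tostring(s.tag) .. "' to a string")
  end
end }

-- Hypothetical constructor standing in for the parser's output.
local function el(tag, ...)
  return setmetatable({ tag = tag, ... }, element_mt)
end

local page = el('page',
  el('text', 'plain and ', el('b', 'bold')),
  el('text', 'second line'))

print(tostring(page))
```

A tag missing from `render` raises the error from `element_mt`, so an incomplete table fails loudly instead of silently dropping content.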
Note that "render" depends on the DTD. Writing a module that
can generate "render" from an arbitrary DTD was not
part of my purposes; writing a different "render" that converts
to, say, Markdown rather than plain text is, but not yet to the
level where I can share it.
