Module:Sandbox/AbstractWikipedia/TextAssembler
From Meta, a Wikimedia project coordination wiki
This module is part of the Abstract Wikipedia template-renderer prototype. It corresponds to the last block in the proposed NLG architecture.
It exposes the function constructText, which is responsible for assembling a string of text from a list of lexemes passed to it.
While assembling the text, it takes care of spacing, punctuation and capitalization, according to the information given in the trailing_punctuation and capitalization tables.
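To make the expected input concrete: the module's code reads each lexeme through `tostring(lexeme)` and `lexeme.pos`, so any table with a `pos` field and a `__tostring` metamethod would satisfy it. The sketch below builds such a list. The constructor name `make_lexeme`, the `text` field and the `'noun'` value are assumptions made for illustration only; the prototype may represent lexemes differently. Only the `pos` values `'spacing'` and `'punctuation'` are taken from the module itself.

```lua
-- Hypothetical lexeme representation (not taken from the prototype).
-- tostring(lexeme) yields the surface text; lexeme.pos carries the category.
local lexeme_mt = {
    __tostring = function(self) return self.text end,
}

local function make_lexeme(text, pos)
    return setmetatable({ text = text, pos = pos }, lexeme_mt)
end

local lexemes = {
    make_lexeme('hello', 'noun'),
    make_lexeme(',', 'punctuation'),
    make_lexeme(' ', 'spacing'),
    make_lexeme('world', 'noun'),
    make_lexeme(' ', 'spacing'),      -- a space directly before trailing punctuation
    make_lexeme('.', 'punctuation'),
}

-- Show what the assembler would see for each entry.
for _, lexeme in ipairs(lexemes) do
    print(lexeme.pos, tostring(lexeme))
end
```

Tracing the module's loop over this list by hand gives `Hello, world.`, assuming the `ucfirst` call uppercases the first letter: the space before the full stop is discarded, and the first word is capitalized.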
Note that previously this code was part of the main module, and as such it is not mentioned in the recorded demo of the prototype.
The above documentation is transcluded from Module:Sandbox/AbstractWikipedia/TextAssembler/doc.
local p = {}

-- The following gives a list of trailing punctuation signs and their relative
-- rank. Lower rank (i.e. higher number) means that a punctuation mark is
-- superseded by an adjacent higher-rank mark. Between punctuation marks of
-- equal rank, the latter supersedes.
trailing_punctuation = { ['.'] = 1, [','] = 2 }

-- The following lists punctuation marks which should trigger capitalization:
capitalization = { ['.'] = true }

-- This function constructs the final string from the lexemes.
-- It reduces spans of multiple spacings to a single one, handles punctuation
-- specially, and concatenates the rest of the text.
-- It also handles capitalization: the first word of the text and any word
-- that follows a mark listed in capitalization are capitalized.
function p.constructText(lexemes)
    local result = ''
    local pending_space = ''
    local pending_punctuation = ''
    for _, lexeme in ipairs(lexemes) do
        local text = tostring(lexeme)
        if lexeme.pos == 'spacing' then
            -- Only the most recent spacing is kept, so runs collapse to one
            pending_space = text
        elseif lexeme.pos == 'punctuation' and trailing_punctuation[text] then
            -- >= (rather than >) so that, of two equal-rank marks, the latter
            -- supersedes, as stated above
            if #pending_punctuation == 0
                or trailing_punctuation[pending_punctuation] >= trailing_punctuation[text] then
                pending_punctuation = text
            end
            -- Trailing punctuation removes prior space
            pending_space = ''
        elseif text ~= "" then -- Empty text can be ignored
            if result == '' or capitalization[pending_punctuation] then
                -- `language` is not defined in this module; it is assumed to
                -- be a global set by the calling code
                text = mw.getLanguage(language):ucfirst(text)
            end
            result = result .. pending_punctuation .. pending_space .. text
            pending_punctuation = ''
            pending_space = ''
        end
    end
    result = result .. pending_punctuation
    return result
end

return p
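The precedence rule described in the module's comments can be looked at in isolation. The sketch below is a standalone illustration, not part of the module: the helper name `resolve_punctuation` is hypothetical, and the rank table simply copies the module's `trailing_punctuation` values. The comparison uses `>=` so that, as the comment states, the later of two equal-rank marks wins; with the current table every mark has a distinct rank, so this matters only if marks sharing a rank are added.

```lua
-- Ranks copied from the module: a lower number means a higher rank.
local trailing_punctuation = { ['.'] = 1, [','] = 2 }

-- Hypothetical helper (not part of the module): given a list of adjacent
-- punctuation marks, return the single mark that should be emitted.
local function resolve_punctuation(marks)
    local pending = ''
    for _, mark in ipairs(marks) do
        local rank = trailing_punctuation[mark]
        -- Marks without a rank are not trailing punctuation; skip them here.
        if rank and (pending == '' or trailing_punctuation[pending] >= rank) then
            pending = mark
        end
    end
    return pending
end

print(resolve_punctuation({ ',', '.' }))  -- .
print(resolve_punctuation({ '.', ',' }))  -- .
print(resolve_punctuation({ ',', ',' }))  -- ,
```

A comma next to a full stop is therefore dropped regardless of order, which is what lets a template emit an optional clause ending in a comma at the end of a sentence without producing `,.`.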