Module:Sandbox/AbstractWikipedia/TextAssembler

Module documentation

This module is part of the Abstract Wikipedia template-renderer prototype. It corresponds to the last block in the proposed NLG architecture.

It exposes the function constructText, which is responsible for assembling a string of text from the list of lexemes passed to it.

While assembling the text, it takes care of spacing, punctuation and capitalization, according to the information given in the trailing_punctuation and capitalization tables.
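The punctuation rule can be illustrated in isolation. The following is a minimal sketch, not the module's own code: pickPunctuation is a hypothetical helper, and the ranks table simply mirrors the module's trailing_punctuation table. It shows that a pending mark is replaced only by a mark of strictly higher rank (that is, a smaller number).

```lua
-- Ranks mirroring the module's trailing_punctuation table:
-- a smaller number means a higher rank.
local ranks = { ['.'] = 1, [','] = 2 }

-- Hypothetical helper (not part of the module): given the mark that is
-- currently pending and a newly encountered mark, return the one to keep.
local function pickPunctuation(pending, incoming)
    if pending == '' or ranks[pending] > ranks[incoming] then
        return incoming
    end
    return pending
end

print(pickPunctuation('', ','))   -- prints ","  (nothing was pending)
print(pickPunctuation(',', '.'))  -- prints "."  (period outranks comma)
print(pickPunctuation('.', ','))  -- prints "."  (comma cannot displace period)
```

With these ranks, a comma immediately followed by a period collapses to a single period, whichever order the two marks arrive in.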

Note that previously this code was part of the main module, and as such it is not mentioned in the recorded demo of the prototype.


 local p = {}

 -- The following gives a list of trailing punctuation signs, and their relative
 -- rank. Lower rank (i.e. higher number) means that a punctuation mark is
 -- superseded by an adjacent higher rank mark. Between punctuation marks of equal
 -- rank, the latter supersedes.
 trailing_punctuation = { ['.'] = 1, [','] = 2 }
 -- The following lists punctuation marks which should trigger capitalization:
 capitalization = { ['.'] = true }

 -- This function constructs the final string from the lexemes.
 -- It reduces spans of multiple spacings to a single one, handles punctuation
 -- specially, and concatenates the rest of the text.
 -- It also handles capitalization: the first word of the text, and any word
 -- that follows a capitalization-triggering punctuation mark, is capitalized.
 function p.constructText(lexemes)
     local result = ''
     local pending_space = ''
     local pending_punctuation = ''
     for index, lexeme in ipairs(lexemes) do
         local text = tostring(lexeme)
         if lexeme.pos == 'spacing' then
             -- Only the most recent spacing is kept
             pending_space = text
         elseif lexeme.pos == 'punctuation' and trailing_punctuation[text] then
             if #pending_punctuation == 0 or trailing_punctuation[pending_punctuation] > trailing_punctuation[text] then
                 pending_punctuation = text
             end
             -- Trailing punctuation removes prior space
             pending_space = ''
         elseif text ~= "" then -- Empty text can be ignored
             if result == '' or capitalization[pending_punctuation] then
                 -- Note: `language` is not defined in this module; it is
                 -- presumably a global set by the calling module.
                 text = mw.getLanguage(language):ucfirst(text)
             end
             result = result .. pending_punctuation .. pending_space .. text
             pending_punctuation = ''
             pending_space = ''
         end
     end
     -- Flush any punctuation still pending at the end of the text
     result = result .. pending_punctuation
     return result
 end

 return p
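
To try the assembly logic outside MediaWiki, a self-contained sketch such as the following may help. Several parts are assumptions rather than anything defined by this module: the mw table is a stand-in that stubs only ucfirst (ASCII only), language is set to 'en', and makeLexeme is a hypothetical constructor producing tables with a pos field and a __tostring metamethod, which is the shape constructText appears to expect. The function body is a copy of the module's logic so that the sketch runs on its own.

```lua
-- Stand-in for the Scribunto `mw` library (assumption: ASCII text only).
local mw = {
    getLanguage = function(code)
        return {
            ucfirst = function(self, s)
                return s:sub(1, 1):upper() .. s:sub(2)
            end,
        }
    end,
}
local language = 'en' -- assumed value; the module reads it from elsewhere

-- Hypothetical lexeme constructor: a table with a `pos` field whose
-- tostring() yields its text.
local Lexeme = { __tostring = function(self) return self.text end }
local function makeLexeme(text, pos)
    return setmetatable({ text = text, pos = pos }, Lexeme)
end

local trailing_punctuation = { ['.'] = 1, [','] = 2 }
local capitalization = { ['.'] = true }

-- Copy of the module's constructText logic, made local for this sketch.
local function constructText(lexemes)
    local result, pending_space, pending_punctuation = '', '', ''
    for _, lexeme in ipairs(lexemes) do
        local text = tostring(lexeme)
        if lexeme.pos == 'spacing' then
            pending_space = text
        elseif lexeme.pos == 'punctuation' and trailing_punctuation[text] then
            if #pending_punctuation == 0
                or trailing_punctuation[pending_punctuation] > trailing_punctuation[text] then
                pending_punctuation = text
            end
            pending_space = ''
        elseif text ~= '' then
            if result == '' or capitalization[pending_punctuation] then
                text = mw.getLanguage(language):ucfirst(text)
            end
            result = result .. pending_punctuation .. pending_space .. text
            pending_punctuation, pending_space = '', ''
        end
    end
    return result .. pending_punctuation
end

print(constructText({
    makeLexeme('hello', 'noun'),
    makeLexeme(' ', 'spacing'),
    makeLexeme('world', 'noun'),
    makeLexeme(',', 'punctuation'),
    makeLexeme('.', 'punctuation'),
    makeLexeme(' ', 'spacing'),
    makeLexeme('goodbye', 'noun'),
})) -- prints "Hello world. Goodbye"
```

In this run the comma is superseded by the period, the space before the punctuation is dropped, and the word after the period is capitalized.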
