Module:WikitextParser

From Wikipedia, the free encyclopedia
Module documentation
This module is rated as alpha. It is ready for limited use and third-party feedback. It may be used on a small number of pages, but should be monitored closely. Suggestions for new features or adjustments to input and output are welcome.

This module is a general-purpose wikitext parser. It's designed to be used by other Lua modules and shouldn't be called directly by templates.

Usage

First, require WikitextParser and get some wikitext to parse. For example:

local parser = require('Module:WikitextParser')
local title = mw.title.getCurrentTitle()
local wikitext = title:getContent()

Then, use and combine the available methods. For example:

local sections = parser.getSections(wikitext)
for sectionTitle, sectionContent in pairs(sections) do
    local sectionFiles = parser.getFiles(sectionContent)
    -- Do stuff
end

Methods

getLead

getLead( wikitext )

Returns the lead section from the given wikitext. The lead section is defined as everything before the first section title. If there's no lead section, an empty string will be returned.
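
To illustrate the behaviour described above, here is a minimal standalone sketch in plain Lua that can be run outside MediaWiki. The local `trim` helper is an assumption standing in for `mw.text.trim`, and the sample wikitext is invented for the example.

```lua
-- Stand-in for mw.text.trim (assumption: only needed outside Scribunto)
local function trim(s)
    return (s:match('^%s*(.-)%s*$'))
end

-- Everything before the first line starting with "==" is treated as the lead
local function getLead(wikitext)
    wikitext = '\n' .. wikitext
    wikitext = wikitext:gsub('\n==.*', '')
    return trim(wikitext)
end

local sample = 'Intro text.\n\n== History ==\nOld stuff.'
print(getLead(sample)) --> Intro text.
```

A page that starts directly with a section title yields an empty string rather than nil.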

getSections

getSections( wikitext )

Returns a table with the section titles as keys and the section contents as values. This method doesn't get the lead section (use getLead for that).
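
The following standalone sketch shows the shape of the returned table. It is plain Lua with local stand-ins (`trim` for `mw.text.trim`, and a copy of the pattern-escaping helper), so it is an approximation for illustration rather than the module itself; the sample text is invented.

```lua
local function trim(s)
    return (s:match('^%s*(.-)%s*$'))
end

-- Escape characters that are magic in Lua patterns
local function escapeString(str)
    return (str:gsub('[%^%$%(%)%.%[%]%*%+%-%?%%]', '%%%0'))
end

local function getSections(wikitext)
    local sections = {}
    wikitext = '\n' .. wikitext .. '\n=='
    for title in wikitext:gmatch('\n==+ *([^=]-) *==+') do
        -- Content runs up to the next heading of any level
        local section = wikitext:match('\n==+ *' .. escapeString(title) .. ' *==+(.-)\n==')
        sections[title] = trim(section)
    end
    return sections
end

local sample = 'Lead.\n== History ==\nOld stuff.\n=== Early ===\nVery old.\n== Usage ==\nHow to.'
local sections = getSections(sample)
print(sections['History']) --> Old stuff.
print(sections['Early'])   --> Very old.
```

Because the result is keyed by title, iterating it with `pairs` does not necessarily follow the order of the page, and each section's content stops at the next heading, so subsections appear as separate entries.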

getSection

getSection( wikitext, sectionTitle )

Returns the content of the section with the given section title, including subsections. If you don't want subsections, use getSections instead. If the given section title appears more than once, only the first will be returned. If the section is not found, nil will be returned.
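
As a rough illustration of how subsections are kept, here is a standalone sketch in plain Lua. The `trim` and `escapeString` helpers are local stand-ins (assumptions for running outside Scribunto), and the sample text is invented.

```lua
local function trim(s)
    return (s:match('^%s*(.-)%s*$'))
end

local function escapeString(str)
    return (str:gsub('[%^%$%(%)%.%[%]%*%+%-%?%%]', '%%%0'))
end

local function getSection(wikitext, title)
    title = escapeString(trim(title))
    wikitext = '\n' .. wikitext .. '\n'
    local level, content = wikitext:match('\n(==+) *' .. title .. ' *==.-\n(.*)')
    if content then
        -- Cut at the next heading of the same or a higher level
        local nextSection = '\n==' .. string.rep('=?', #level - 2) .. '[^=].*'
        content = content:gsub(nextSection, '')
        return trim(content)
    end
end

local sample = 'Lead.\n== History ==\nOld stuff.\n=== Early ===\nVery old.\n== Usage ==\nHow to.'
print(getSection(sample, 'History'))
print(getSection(sample, 'Early')) --> Very old.
```

For 'History' the result still contains the '=== Early ===' heading and its text, while 'Usage' is cut off because it is a heading of the same level.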

getSectionTag

getSectionTag( wikitext, tagName )

Returns the contents of the <section> tag with the given tag name (see Help:Labeled section transclusion). If the tag is not found, nil will be returned.
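
A minimal standalone sketch of this lookup, assuming plain Lua outside MediaWiki. The `escapeString` helper is a local stand-in and the sample markup is invented; note that several tag pairs with the same name are concatenated.

```lua
local function escapeString(str)
    return (str:gsub('[%^%$%(%)%.%[%]%*%+%-%?%%]', '%%%0'))
end

local function getSectionTag(wikitext, name)
    name = escapeString(name)
    local pattern = '< *section +begin *= *["\']? *' .. name .. ' *["\']? */>(.-)'
        .. '< *section +end= *["\']? *' .. name .. ' *["\']? */>'
    local sections = {}
    for section in wikitext:gmatch(pattern) do
        sections[#sections + 1] = section
    end
    if #sections > 0 then
        return table.concat(sections)
    end
end

local sample = 'A <section begin="intro" />Hello <section end="intro" />B '
    .. '<section begin=intro />world<section end=intro />'
print(getSectionTag(sample, 'intro')) --> Hello world
```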

getLists

getLists( wikitext )

Returns a table with each value being a list (ordered or unordered).
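
The sketch below, in plain Lua with invented sample text, illustrates what counts as one list: a run of consecutive lines starting with `*` or `#`. It is an approximation for illustration, not the module itself.

```lua
local function getLists(wikitext)
    local lists = {}
    wikitext = '\n' .. wikitext .. '\n\n'
    -- A list ends at the first newline not followed by another list marker
    for list in wikitext:gmatch('\n([*#].-)\n[^*#]') do
        lists[#lists + 1] = list
    end
    return lists
end

local sample = 'Intro\n* one\n* two\n\nMiddle\n# first\n# second'
local lists = getLists(sample)
print(#lists) --> 2
```

Each value keeps its list markers and internal newlines, for example '* one\n* two'.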

getParagraphs

getParagraphs( wikitext )

Returns a table with each value being a paragraph. Paragraphs are defined as block-level elements that are not lists, templates, files, categories, tables or section titles.

getTemplates

getTemplates( wikitext )

Returns a table with each value being a template.

getTemplate

getTemplate( wikitext, templateName )

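Returns the template with the given template name. The case of the first letter of the name is ignored. If the template appears more than once, only the first instance will be returned. If the template is not found, nil will be returned.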

getTemplateName

getTemplateName( templateWikitext )

Returns the name of the given template. If the given wikitext is not recognized as that of a template, nil will be returned.
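
As an illustration of the template methods, here is a standalone sketch in plain Lua with invented sample text. It shows that parser functions such as `#if` are skipped and that the name is taken up to the first pipe, closing brace or newline.

```lua
local function getTemplates(wikitext)
    local templates = {}
    for template in wikitext:gmatch('{%b{}}') do
        if template:sub(1, 3) ~= '{{#' then -- skip parser functions like #if
            templates[#templates + 1] = template
        end
    end
    return templates
end

local function getTemplateName(templateWikitext)
    return templateWikitext:match('^{{ *([^}|\n]+)')
end

local sample = '{{Infobox|name=Foo}} text {{#if:x|y}} {{cite web|url=http://example.org}}'
local templates = getTemplates(sample)
print(#templates)                    --> 2
print(getTemplateName(templates[2])) --> cite web
```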

getTemplateParameters

getTemplateParameters( templateWikitext )

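Returns a table with the parameter names as keys and the parameter values as values. For unnamed parameters, the keys are numerical. Because a Lua table with string keys has no defined order, a second return value lists the parameter names in the order in which they were parsed. If the given wikitext has no parameters or is not recognized as that of a template, both tables will be empty.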
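
The sketch below is a simplified standalone variant in plain Lua, not the module's exact code: it masks pipes and equals signs inside nested templates and links with a replacement function, and splits with `gmatch` instead of `mw.text.gsplit`. The helper names `trim` and `mask` and the sample template are assumptions made for the example.

```lua
local function trim(s)
    return (s:match('^%s*(.-)%s*$'))
end

-- Hide pipes and equals signs so nested templates and links survive the split
local function mask(s)
    return (s:gsub('|', '@@:@@'):gsub('=', '@@_@@'))
end

local function getTemplateParameters(templateWikitext)
    local parameters, order = {}, {}
    local params = templateWikitext:match('{{[^|}]-|(.*)}}')
    if not params then
        return parameters, order
    end
    params = params:gsub('{%b{}}', mask):gsub('%[%b[]%]', mask)
    local count = 0
    for param in (params .. '|'):gmatch('(.-)|') do
        local name, value = param:match('^(.-)=(.*)$')
        if name then
            name = trim(name)
        else
            -- Unnamed parameter: use the next numeric key
            count = count + 1
            name, value = count, param
        end
        value = trim(value):gsub('@@_@@', '='):gsub('@@:@@', '|')
        parameters[name] = value
        order[#order + 1] = name
    end
    return parameters, order
end

local p, order = getTemplateParameters(
    '{{Cite web|url=http://example.org/?a=b|title=Foo|{{small|1=x}}|[[A|B]]}}')
print(p.url) --> http://example.org/?a=b
print(p[1])  --> {{small|1=x}}
```

Only the first equals sign separates a name from its value, so values that contain `=` (such as URLs with query strings) are kept whole.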

getTags

getTags( wikitext )

Returns a table with each value being a tag and its contents (like <div>, <gallery>, <ref>, <noinclude>). Tags inside tags will be ignored. If you're interested in getting them, run this method again for each of the returned tags.

getTagName

getTagName( tagWikitext )

Returns the name of the tag in the given wikitext. For example 'div', 'span', 'gallery', 'ref', etc.

getTagAttribute

getTagAttribute( tagWikitext, attribute )

Returns the value of an attribute in the given tag. For example the id of a div or the name of a reference.
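
A small standalone sketch of the two tag helpers, in plain Lua with invented sample tags. It illustrates that tag names are lowercased and that attribute values may be quoted or unquoted.

```lua
local function getTagName(tagWikitext)
    local tagName = tagWikitext:match('^< *(.-)[ />]')
    if tagName then tagName = tagName:lower() end
    return tagName
end

local function getTagAttribute(tagWikitext, attribute)
    -- The optional opening quote is captured and must be matched again by %1
    local _quote, value = tagWikitext:match(
        '^<[^/>]*' .. attribute .. ' *= *(["\']?)([^/>]-)%1[ />]')
    return value
end

print(getTagName('<DIV id=main>text</DIV>'))           --> div
print(getTagAttribute('<ref name="foo" />', 'name'))   --> foo
print(getTagAttribute('<div id=main>text</div>', 'id')) --> main
```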

getGalleries

getGalleries( wikitext )

Returns a table with each value being a gallery.

getReferences

getReferences( wikitext )

Returns a table with each value being a reference. This includes self-closing references (like <ref name="foo" />) as well as full references.

getTables

getTables( wikitext )

Returns a table with each value being a wiki table.

getTableAttribute

getTableAttribute( tableWikitext, attribute )

Returns the value of an attribute in the given wiki table. For example the id or the class.

getTable

getTable( wikitext, id )

Returns the wiki table with the given id. If not found, nil will be returned.

getTableData

getTableData( tableWikitext )

Returns a Lua table representing the data of the given wiki table.
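
To show the shape of the result, here is a condensed standalone sketch in plain Lua. The `trim` and `split` helpers are assumptions standing in for `mw.text.trim` and `mw.text.gsplit`, and the sample table is invented. The result is a sequence of rows, each a sequence of trimmed cell strings, with header cells treated like ordinary cells.

```lua
local function trim(s)
    return (s:match('^%s*(.-)%s*$'))
end

-- Plain-text split (stand-in for mw.text.gsplit)
local function split(s, sep)
    local parts, start = {}, 1
    while true do
        local i, j = s:find(sep, start, true)
        if not i then
            parts[#parts + 1] = s:sub(start)
            return parts
        end
        parts[#parts + 1] = s:sub(start, i - 1)
        start = j + 1
    end
end

local function getTableData(t)
    local data = {}
    t = trim(t)
    t = t:gsub('^{|.-\n', '')                   -- header line
    t = t:gsub('\n|}$', '')                     -- footer
    t = t:gsub('^|%+.-\n', '')                  -- caption
    t = t:gsub('|%-.-\n', '|-\n')               -- row attributes
    t = t:gsub('^|%-\n', ''):gsub('\n|%-$', '') -- empty first or last row
    for _, row in ipairs(split(t, '|-')) do
        -- Normalize every cell separator to a newline followed by a pipe
        row = row:gsub('||', '\n|'):gsub('!!', '\n|'):gsub('\n!', '\n|')
        row = row:gsub('^!', '\n|'):gsub('^\n|', '')
        local cells = {}
        for _, cell in ipairs(split(row, '\n|')) do
            cells[#cells + 1] = trim(cell)
        end
        data[#data + 1] = cells
    end
    return data
end

local sample = '{| class="wikitable"\n! Name !! Age\n|-\n| Alice || 30\n|-\n| Bob || 25\n|}'
local data = getTableData(sample)
print(data[2][1], data[2][2]) --> Alice	30
```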

getLinks

getLinks( wikitext )

Returns a Lua table with each value being a wiki link, including file and category links. For external links, use getExternalLinks instead.

getFiles

getFiles( wikitext )

Returns a Lua table with each value being a file link.

getFileName

getFileName( fileWikitext )

Returns the name of the given file. If the given wikitext is not recognized as that of a file, nil will be returned.
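
The following standalone sketch in plain Lua, with invented sample text, illustrates how wiki links are collected and how a file name is extracted from a file link. Namespace checks are omitted here because they depend on `mw.site.namespaces`, which is only available inside MediaWiki.

```lua
local function getLinks(wikitext)
    local links = {}
    -- Double brackets only, so single-bracket external links are skipped
    for link in wikitext:gmatch('%[%b[]%]') do
        links[#links + 1] = link
    end
    return links
end

local function getFileName(fileWikitext)
    return fileWikitext:match('^%[%[ *.- *: *(.-) *[]|]')
end

local sample = 'See [[Lua (programming language)|Lua]] and '
    .. '[[File:Example.jpg|thumb|A caption]]. [https://example.org Site] [[Category:Parsers]]'
local links = getLinks(sample)
print(#links)                --> 3
print(getFileName(links[2])) --> Example.jpg
```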

getCategories

getCategories( wikitext )

Returns a Lua table with each value being a category link.

getExternalLinks

getExternalLinks( wikitext )

Returns a Lua table with each value being an external link. For internal links, use getLinks instead.
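
A minimal standalone sketch in plain Lua, with invented sample text, of what is recognized as an external link: bracketed links starting with `http://`, `https://` or a protocol-relative `//`. Bare URLs without brackets are not matched by this pattern.

```lua
local function getExternalLinks(wikitext)
    local links = {}
    for link in wikitext:gmatch('%b[]') do
        if link:match('^%[//') or link:match('^%[https?://') then
            links[#links + 1] = link
        end
    end
    return links
end

local sample = 'See [[Lua]], [https://example.org Example] and '
    .. '[//example.com/path Protocol-relative].'
local links = getExternalLinks(sample)
print(#links)   --> 2
print(links[1]) --> [https://example.org Example]
```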

The above documentation is transcluded from Module:WikitextParser/doc.

 -- Module:WikitextParser is a general-purpose wikitext parser
 -- Documentation and master version: https://en.wikipedia.org/wiki/Module:WikitextParser
 -- Authors: User:Sophivorus, User:Certes, User:Aidan9382, et al.
 -- License: CC-BY-SA-4.0
 local WikitextParser = {}

 -- Private helper method to escape a string for use in regexes
 local function escapeString(str)
     return string.gsub(str, '[%^%$%(%)%.%[%]%*%+%-%?%%]', '%%%0')
 end

 -- Get the lead section from the given wikitext
 -- The lead section is any content before the first section title.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Wikitext of the lead section. May be empty if the lead section is empty.
 function WikitextParser.getLead(wikitext)
     wikitext = '\n' .. wikitext
     wikitext = string.gsub(wikitext, '\n==.*', '')
     wikitext = mw.text.trim(wikitext)
     return wikitext
 end

 -- Get the sections from the given wikitext
 -- This method doesn't get the lead section, use getLead for that
 -- @param wikitext Required. Wikitext to parse.
 -- @return Map from section title to section content
 function WikitextParser.getSections(wikitext)
     local sections = {}
     wikitext = '\n' .. wikitext .. '\n=='
     for title in string.gmatch(wikitext, '\n==+ *([^=]-) *==+') do
         local section = string.match(wikitext, '\n==+ *' .. escapeString(title) .. ' *==+(.-)\n==')
         section = mw.text.trim(section)
         sections[title] = section
     end
     return sections
 end

 -- Get a section from the given wikitext (including any subsections)
 -- If the given section title appears more than once, only the section of the first instance will be returned
 -- @param wikitext Required. Wikitext to parse.
 -- @param title Required. Title of the section
 -- @return Wikitext of the section, or nil if it isn't found. May be empty if the section is empty or contains only subsections.
 function WikitextParser.getSection(wikitext, title)
     title = mw.text.trim(title)
     title = escapeString(title)
     wikitext = '\n' .. wikitext .. '\n'
     local level, wikitext = string.match(wikitext, '\n(==+) *' .. title .. ' *==.-\n(.*)')
     if wikitext then
         local nextSection = '\n==' .. string.rep('=?', #level - 2) .. '[^=].*'
         wikitext = string.gsub(wikitext, nextSection, '') -- remove later sections at this level or higher
         wikitext = mw.text.trim(wikitext)
         return wikitext
     end
 end

 -- Get the content of a <section> tag from the given wikitext.
 -- We can't use getTags because unlike all other tags, both opening and closing <section> tags are self-closing.
 -- @param wikitext Required. Wikitext to parse.
 -- @param name Required. Name of the <section> tag
 -- @return Content of the <section> tag, or nil if it isn't found. May be empty if the section tag is empty.
 function WikitextParser.getSectionTag(wikitext, name)
     name = mw.text.trim(name)
     name = escapeString(name)
     local sections = {}
     for section in string.gmatch(wikitext, '< *section +begin *= *["\']? *' .. name .. ' *["\']? */>(.-)< *section +end= *["\']? *' .. name .. ' *["\']? */>') do
         table.insert(sections, section)
     end
     if #sections > 0 then
         return table.concat(sections)
     end
 end

 -- Get the lists from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of lists.
 function WikitextParser.getLists(wikitext)
     local lists = {}
     wikitext = '\n' .. wikitext .. '\n\n'
     for list in string.gmatch(wikitext, '\n([*#].-)\n[^*#]') do
         table.insert(lists, list)
     end
     return lists
 end

 -- Get the paragraphs from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of paragraphs.
 function WikitextParser.getParagraphs(wikitext)
     local paragraphs = {}

     -- Remove non-paragraphs
     wikitext = '\n' .. wikitext .. '\n' -- add newlines to simplify patterns
     wikitext = string.gsub(wikitext, '%f[^\n]<!%-%-.-%-%->%f[\n]', '') -- remove comments
     wikitext = string.gsub(wikitext, '%f[^\n]%[%b[]%]%f[\n]', '') -- remove files and categories
     wikitext = string.gsub(wikitext, '%f[^\n]%b{} *%f[\n]', '') -- remove tables and block templates
     wikitext = string.gsub(wikitext, '%f[^\n]%b{} *%b{} *%f[\n]', '') -- remove neighboring tables and block templates
     wikitext = string.gsub(wikitext, '%f[^\n]%b{} *<!%-%-.-%-%-> *%b{} *%f[\n]', '') -- remove neighboring tables and block templates with a comment among them
     wikitext = string.gsub(wikitext, '%f[^\n][*#].-%f[\n]', '') -- remove lists
     wikitext = string.gsub(wikitext, '%f[^\n]==+[^=]+==+ *%f[\n]', '') -- remove section titles
     wikitext = mw.text.trim(wikitext)

     for paragraph in mw.text.gsplit(wikitext, '\n\n+') do
         if mw.text.trim(paragraph) ~= '' then
             table.insert(paragraphs, paragraph)
         end
     end
     return paragraphs
 end

 -- Get the templates from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of templates.
 function WikitextParser.getTemplates(wikitext)
     local templates = {}
     for template in string.gmatch(wikitext, '{%b{}}') do
         if string.sub(template, 1, 3) ~= '{{#' then -- skip parser functions like #if
             table.insert(templates, template)
         end
     end
     return templates
 end

 -- Get the requested template from the given wikitext.
 -- If the template appears more than once, only the first instance will be returned
 -- @param wikitext Required. Wikitext to parse.
 -- @param name Name of the template to get
 -- @return Wikitext of the template, or nil if it wasn't found
 function WikitextParser.getTemplate(wikitext, name)
     local templates = WikitextParser.getTemplates(wikitext)
     local lang = mw.language.getContentLanguage()
     for _, template in pairs(templates) do
         local templateName = WikitextParser.getTemplateName(template)
         if lang:ucfirst(templateName) == lang:ucfirst(name) then
             return template
         end
     end
 end

 -- Get name of the template from the given template wikitext.
 -- @param templateWikitext Required. Wikitext of the template to parse.
 -- @return Name of the template
 -- @todo Strip "Template:" namespace?
 function WikitextParser.getTemplateName(templateWikitext)
     return string.match(templateWikitext, '^{{ *([^}|\n]+)')
 end

 -- Get the parameters from the given template wikitext.
 -- @param templateWikitext Required. Wikitext of the template to parse.
 -- @return Map from parameter names to parameter values, NOT IN THE ORIGINAL ORDER.
 -- @return Order in which the parameters were parsed.
 function WikitextParser.getTemplateParameters(templateWikitext)
     local parameters = {}
     local paramOrder = {}
     local params = string.match(templateWikitext, '{{[^|}]-|(.*)}}')
     if params then
         -- Temporarily replace pipes in subtemplates and links to avoid chaos
         for subtemplate in string.gmatch(params, '{%b{}}') do
             params = string.gsub(params, escapeString(subtemplate), string.gsub(subtemplate, '.', {['%']='%%', ['|']='@@:@@', ['=']='@@_@@'}))
         end
         for link in string.gmatch(params, '%[%b[]%]') do
             params = string.gsub(params, escapeString(link), string.gsub(link, '.', {['%']='%%', ['|']='@@:@@', ['=']='@@_@@'}))
         end
         local count = 0
         local parts, name, value
         for param in mw.text.gsplit(params, '|') do
             parts = mw.text.split(param, '=')
             name = mw.text.trim(parts[1])
             if #parts == 1 then
                 value = name
                 count = count + 1
                 name = count
             else
                 value = table.concat(parts, '=', 2)
                 value = mw.text.trim(value)
             end
             value = string.gsub(value, '@@_@@', '=')
             value = string.gsub(value, '@@:@@', '|')
             parameters[name] = value
             table.insert(paramOrder, name)
         end
     end
     return parameters, paramOrder
 end

 -- Get the tags from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of tags.
 function WikitextParser.getTags(wikitext)
     local tags = {}
     local tag, tagName, tagEnd
     -- Don't match closing tags like </div>, comments like <!--foo-->, comparisons like 1<2 or things like <3
     for tagStart, tagOpen in string.gmatch(wikitext, '()(<[^/!%d].->)') do
         tagName = WikitextParser.getTagName(tagOpen)

         -- If we're in a self-closing tag, like <ref name="foo" />, <references/>, <br/>, <br>, <hr>, etc.
         if string.match(tagOpen, '<.-/>') or tagName == 'br' or tagName == 'hr' then
             tag = tagOpen

         -- If we're in a tag that may contain others like it, like <div> or <span>
         elseif tagName == 'div' or tagName == 'span' then
             local position = tagStart + #tagOpen - 1
             local depth = 1
             while depth > 0 do
                 tagEnd = string.match(wikitext, '</ ?' .. tagName .. ' ?>()', position)
                 if tagEnd then
                     tagEnd = tagEnd - 1
                 else
                     break -- unclosed tag
                 end
                 position = string.match(wikitext, '()< ?' .. tagName .. '[ >]', position + 1)
                 if not position then
                     position = tagEnd + 1
                 end
                 if position > tagEnd then
                     depth = depth - 1
                 else
                     depth = depth + 1
                 end
             end
             tag = string.sub(wikitext, tagStart, tagEnd)

         -- Else we're probably in a tag that shouldn't contain others like it, like <math> or <strong>
         else
             tagEnd = string.match(wikitext, '</ ?' .. tagName .. ' ?>()', tagStart)
             if tagEnd then
                 tag = string.sub(wikitext, tagStart, tagEnd - 1)

             -- If no end tag is found, assume we matched something that wasn't a tag, like <no. 1>
             else
                 tag = nil
             end
         end
         table.insert(tags, tag)
     end
     return tags
 end

 -- Get the name of the tag in the given wikitext
 -- @param tagWikitext Required. Wikitext of the tag to parse.
 -- @return Name of the tag or nil if not found
 function WikitextParser.getTagName(tagWikitext)
     local tagName = string.match(tagWikitext, '^< *(.-)[ />]')
     if tagName then tagName = string.lower(tagName) end
     return tagName
 end

 -- Get the value of an attribute in the given tag.
 -- @param tagWikitext Required. Wikitext of the tag to parse.
 -- @param attribute Required. Name of the attribute.
 -- @return Value of the attribute or nil if not found
 function WikitextParser.getTagAttribute(tagWikitext, attribute)
     local _quote, value = string.match(tagWikitext, '^<[^/>]*' .. attribute .. ' *= *(["\']?)([^/>]-)%1[ />]')
     return value
 end

 -- Get the content of the given tag.
 -- @param tagWikitext Required. Wikitext of the tag to parse.
 -- @return Content of the tag. May be empty if the tag is empty. Will be nil if the tag is self-closing.
 -- @todo May fail with nested tags
 function WikitextParser.getTagContent(tagWikitext)
     -- Capture only what lies between the opening and closing tags
     return string.match(tagWikitext, '^<.->(.-)</.->')
 end

 -- Get the <gallery> tags from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of gallery tags.
 function WikitextParser.getGalleries(wikitext)
     local galleries = {}
     local tags = WikitextParser.getTags(wikitext)
     for _, tag in pairs(tags) do
         local tagName = WikitextParser.getTagName(tag)
         if tagName == 'gallery' then
             table.insert(galleries, tag)
         end
     end
     return galleries
 end

 -- Get the <ref> tags from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of ref tags.
 function WikitextParser.getReferences(wikitext)
     local references = {}
     local tags = WikitextParser.getTags(wikitext)
     for _, tag in pairs(tags) do
         local tagName = WikitextParser.getTagName(tag)
         if tagName == 'ref' then
             table.insert(references, tag)
         end
     end
     return references
 end

 -- Get the reference with the given name from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @param referenceName Required. Name of the reference.
 -- @return Wikitext of the reference
 function WikitextParser.getReference(wikitext, referenceName)
     local references = WikitextParser.getReferences(wikitext)
     for _, reference in pairs(references) do
         local content = WikitextParser.getTagContent(reference)
         local name = WikitextParser.getTagAttribute(reference, 'name')
         if content and name == referenceName then
             return reference
         end
     end
 end

 -- Get the tables from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of tables.
 function WikitextParser.getTables(wikitext)
     local tables = {}
     wikitext = '\n' .. wikitext
     for t in string.gmatch(wikitext, '\n%b{}') do
         if string.sub(t, 1, 3) == '\n{|' then
             t = mw.text.trim(t) -- exclude the leading newline
             table.insert(tables, t)
         end
     end
     return tables
 end

 -- Get the value of an attribute from the given table wikitext
 -- @param tableWikitext Required. Wikitext of the table to parse.
 -- @param attribute Required. Name of the attribute.
 -- @return Value of the attribute or nil if not found
 function WikitextParser.getTableAttribute(tableWikitext, attribute)
     local _quote, value = string.match(tableWikitext, '^{|[^\n]*' .. attribute .. ' *= *(["\']?)([^\n]-)%1[^\n]*\n')
     return value
 end

 -- Get a table by id from the given wikitext
 -- @param wikitext Required. Wikitext to parse.
 -- @param id Required. Id of the table
 -- @return Wikitext of the table or nil if not found
 function WikitextParser.getTable(wikitext, id)
     local tables = WikitextParser.getTables(wikitext)
     for _, t in pairs(tables) do
         if id == WikitextParser.getTableAttribute(t, 'id') then
             return t
         end
     end
 end

 -- Get the data from the given table wikitext
 -- @param tableWikitext Required. Wikitext of the table to parse.
 -- @return Table data
 -- @todo Test and make more robust
 function WikitextParser.getTableData(tableWikitext)
     local tableData = {}
     tableWikitext = mw.text.trim(tableWikitext)
     tableWikitext = string.gsub(tableWikitext, '^{|.-\n', '') -- remove the header
     tableWikitext = string.gsub(tableWikitext, '\n|}$', '') -- remove the footer
     tableWikitext = string.gsub(tableWikitext, '^|%+.-\n', '') -- remove any caption
     tableWikitext = string.gsub(tableWikitext, '|%-.-\n', '|-\n') -- remove any row attributes
     tableWikitext = string.gsub(tableWikitext, '^|%-\n', '') -- remove any leading empty row
     tableWikitext = string.gsub(tableWikitext, '\n|%-$', '') -- remove any trailing empty row
     for rowWikitext in mw.text.gsplit(tableWikitext, '|-', true) do
         local rowData = {}
         rowWikitext = string.gsub(rowWikitext, '||', '\n|')
         rowWikitext = string.gsub(rowWikitext, '!!', '\n|')
         rowWikitext = string.gsub(rowWikitext, '\n!', '\n|')
         rowWikitext = string.gsub(rowWikitext, '^!', '\n|')
         rowWikitext = string.gsub(rowWikitext, '^\n|', '')
         for cellWikitext in mw.text.gsplit(rowWikitext, '\n|') do
             cellWikitext = mw.text.trim(cellWikitext)
             table.insert(rowData, cellWikitext)
         end
         table.insert(tableData, rowData)
     end
     return tableData
 end

 -- Get the internal links from the given wikitext (includes category and file links).
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of internal links.
 function WikitextParser.getLinks(wikitext)
     local links = {}
     for link in string.gmatch(wikitext, '%[%b[]%]') do
         table.insert(links, link)
     end
     return links
 end

 -- Get the file links from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of file links.
 function WikitextParser.getFiles(wikitext)
     local files = {}
     local links = WikitextParser.getLinks(wikitext)
     for _, link in pairs(links) do
         local namespace = string.match(link, '^%[%[ *(.-) *:')
         if namespace and mw.site.namespaces[namespace] and mw.site.namespaces[namespace].canonicalName == 'File' then
             table.insert(files, link)
         end
     end
     return files
 end

 -- Get name of the file from the given file wikitext.
 -- @param fileWikitext Required. Wikitext of the file to parse.
 -- @return Name of the file
 function WikitextParser.getFileName(fileWikitext)
     return string.match(fileWikitext, '^%[%[ *.- *: *(.-) *[]|]')
 end

 -- Get the category links from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of category links.
 function WikitextParser.getCategories(wikitext)
     local categories = {}
     local links = WikitextParser.getLinks(wikitext)
     for _, link in pairs(links) do
         -- Same namespace pattern as getFiles, so surrounding spaces are not captured
         local namespace = string.match(link, '^%[%[ *(.-) *:')
         if namespace and mw.site.namespaces[namespace] and mw.site.namespaces[namespace].canonicalName == 'Category' then
             table.insert(categories, link)
         end
     end
     return categories
 end

 -- Get the external links from the given wikitext.
 -- @param wikitext Required. Wikitext to parse.
 -- @return Sequence of external links.
 function WikitextParser.getExternalLinks(wikitext)
     local links = {}
     for link in string.gmatch(wikitext, '%b[]') do
         if string.match(link, '^%[//') or string.match(link, '^%[https?://') then
             table.insert(links, link)
         end
     end
     return links
 end

 return WikitextParser
