API:Parsing wikitext
GET/POST request to parse content of a page and obtain the output.
API documentation
action=parse
- This module requires read rights.
- Source: MediaWiki
- License: GPL-2.0-or-later
Parses content and returns parser output.
See the various prop-modules of action=query to get information from the current version of a page.
There are several ways to specify the text to parse:
- Specify a page or revision, using page, pageid, or oldid.
- Specify content explicitly, using text, title, revid, and contentmodel.
- Specify only a summary to parse. prop should be given an empty value.
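The three input modes above correspond to three different parameter sets. The following is a minimal sketch in Python, not part of the API itself: the helper names `params_for_page`, `params_for_text` and `params_for_summary` are hypothetical, and the functions only build the query parameters without sending a request.

```python
def params_for_page(page=None, pageid=None, oldid=None):
    """Mode 1: parse an existing page or revision."""
    params = {"action": "parse", "format": "json"}
    # oldid overrides page and pageid, and pageid overrides page,
    # so only the highest-priority identifier is sent.
    if oldid is not None:
        params["oldid"] = oldid
    elif pageid is not None:
        params["pageid"] = pageid
    elif page is not None:
        params["page"] = page
    else:
        raise ValueError("one of page, pageid or oldid is required")
    return params


def params_for_text(text, title=None, contentmodel=None, revid=None):
    """Mode 2: parse content supplied explicitly."""
    if title is None and contentmodel is None:
        # If title is omitted, contentmodel must be specified (and vice versa).
        raise ValueError("title or contentmodel is required with text")
    params = {"action": "parse", "format": "json", "text": text}
    optional = (("title", title), ("contentmodel", contentmodel), ("revid", revid))
    for key, value in optional:
        if value is not None:
            params[key] = value
    return params


def params_for_summary(summary):
    """Mode 3: parse only a summary; prop is given an empty value."""
    return {"action": "parse", "format": "json", "summary": summary, "prop": ""}
```

Any of the returned dictionaries can be passed as the `params` argument of an HTTP client such as `requests`.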
- title
Title of page the text belongs to. If omitted, contentmodel must be specified, and API will be used as the title.
- text
Text to parse. Use title or contentmodel to control the content model.
- revid
Revision ID, for {{REVISIONID}} and similar variables.
- Type: integer
- summary
Summary to parse.
- page
Parse the content of this page. Cannot be used together with text and title.
- pageid
Parse the content of this page. Overrides page.
- Type: integer
- redirects
If page or pageid is set to a redirect, resolve it.
- Type: boolean (details)
- oldid
Parse the content of this revision. Overrides page and pageid.
- Type: integer
- prop
Which pieces of information to get:
- text
- Gives the parsed text of the wikitext.
- langlinks
- Gives the language links in the parsed wikitext.
- categories
- Gives the categories in the parsed wikitext.
- categorieshtml
- Gives the HTML version of the categories.
- links
- Gives the internal links in the parsed wikitext.
- templates
- Gives the templates in the parsed wikitext.
- images
- Gives the images in the parsed wikitext.
- externallinks
- Gives the external links in the parsed wikitext.
- sections
- Gives the sections in the parsed wikitext.
- revid
- Adds the revision ID of the parsed page.
- displaytitle
- Adds the title of the parsed wikitext.
- subtitle
- Adds the page subtitle for the parsed page.
- headhtml
- Gives parsed doctype, opening <html>, <head> element and opening <body> of the page.
- modules
- Gives the ResourceLoader modules used on the page. To load, use mw.loader.using(). Either jsconfigvars or encodedjsconfigvars must be requested jointly with modules.
- jsconfigvars
- Gives the JavaScript configuration variables specific to the page. To apply, use mw.config.set().
- encodedjsconfigvars
- Gives the JavaScript configuration variables specific to the page as a JSON string.
- indicators
- Gives the HTML of page status indicators used on the page.
- iwlinks
- Gives interwiki links in the parsed wikitext.
- wikitext
- Gives the original wikitext that was parsed.
- properties
- Gives various properties defined in the parsed wikitext.
- limitreportdata
- Gives the limit report in a structured way. Gives no data when disablelimitreport is set.
- limitreporthtml
- Gives the HTML version of the limit report. Gives no data when disablelimitreport is set.
- parsetree
- The XML parse tree of revision content (requires content model wikitext).
- parsewarnings
- Gives the warnings that occurred while parsing content (as wikitext).
- parsewarningshtml
- Gives the warnings that occurred while parsing content (as HTML).
- headitems
- Deprecated. Gives items to put in the <head> of the page.
- Values (separate with | or alternative): categories, categorieshtml, displaytitle, encodedjsconfigvars, externallinks, headhtml, images, indicators, iwlinks, jsconfigvars, langlinks, limitreportdata, limitreporthtml, links, modules, parsetree, parsewarnings, parsewarningshtml, properties, revid, sections, subtitle, templates, text, wikitext, headitems
- Default: text|langlinks|categories|links|templates|images|externallinks|sections|revid|displaytitle|iwlinks|properties|parsewarnings
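Since prop takes a list of values separated by |, and modules has to be requested together with jsconfigvars or encodedjsconfigvars, a client may want to assemble the value in one place. The sketch below is one way to do that in Python. The helper name `build_prop` is hypothetical, and raising an exception on the client side is an assumption of this sketch, not a description of how the server reacts.

```python
def build_prop(props):
    """Join prop names with '|' after checking the modules requirement."""
    props = list(props)
    companions = {"jsconfigvars", "encodedjsconfigvars"}
    if "modules" in props and not companions & set(props):
        # Client-side check only; it mirrors the rule stated for prop=modules.
        raise ValueError(
            "modules must be requested with jsconfigvars or encodedjsconfigvars"
        )
    return "|".join(props)


# Request the HTML, the section list and the modules needed to display it.
PROP = build_prop(["text", "sections", "modules", "jsconfigvars"])
```

An empty list yields an empty string, which matches the empty prop value used when only a summary is parsed.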
- wrapoutputclass
CSS class to use to wrap the parser output.
- Default: mw-parser-output
- usearticle
Use the ArticleParserOptions hook to ensure the options used match those used for article page views.
- Type: boolean (details)
- parsoid
Generate HTML conforming to the MediaWiki DOM spec using Parsoid.
- Type: boolean (details)
- pst
Do a pre-save transform on the input before parsing it. Only valid when used with text.
- Type: boolean (details)
- onlypst
Do a pre-save transform (PST) on the input, but don't parse it. Returns the same wikitext, after a PST has been applied. Only valid when used with text.
- Type: boolean (details)
- effectivelanglinks
- Deprecated.
Includes language links supplied by extensions (for use with prop=langlinks).
- Type: boolean (details)
- section
Only parse the content of the section with this identifier.
When new, parse text and sectiontitle as if adding a new section to the page.
new is allowed only when specifying text.
- sectiontitle
New section title when section is new.
Unlike page editing, this does not fall back to summary when omitted or empty.
- disablepp
- Deprecated.
Use disablelimitreport instead.
- Type: boolean (details)
- disablelimitreport
Omit the limit report ("NewPP limit report") from the parser output.
- Type: boolean (details)
- disableeditsection
Omit edit section links from the parser output.
- Type: boolean (details)
- disablestylededuplication
Do not deduplicate inline stylesheets in the parser output.
- Type: boolean (details)
- showstrategykeys
Whether to include internal merge strategy information in jsconfigvars.
- Type: boolean (details)
- generatexml
- Deprecated.
Generate XML parse tree (requires content model wikitext; replaced by prop=parsetree).
- Type: boolean (details)
- preview
Parse in preview mode.
- Type: boolean (details)
- sectionpreview
Parse in section preview mode (enables preview mode too).
- Type: boolean (details)
- disabletoc
Omit table of contents in output.
- Type: boolean (details)
- useskin
Apply the selected skin to the parser output. May affect the following properties: text, langlinks, headitems, modules, jsconfigvars, indicators.
- One of the following values: apioutput, authentication-popup, cologneblue, fallback, json, minerva, modern, monobook, timeless, vector, vector-2022
- contentformat
Content serialization format used for the input text. Only valid when used with text.
- One of the following values: application/json, application/octet-stream, application/unknown, application/x-binary, text/css, text/javascript, text/plain, text/unknown, text/x-wiki, unknown/unknown
- contentmodel
Content model of the input text. If omitted, title must be specified, and default will be the model of the specified title. Only valid when used with text.
- One of the following values: GadgetDefinition, JsonSchema, MassMessageListContent, NewsletterContent, Scribunto, SecurePoll, css, flow-board, javascript, json, sanitized-css, text, translate-messagebundle, unknown, wikitext
- mobileformat
Return parse output in a format suitable for mobile devices.
- Type: boolean (details)
- templatesandboxprefix
Template sandbox prefix, as with Special:TemplateSandbox.
- Separate values with | or alternative.
- Maximum number of values is 50 (500 for clients that are allowed higher limits).
- templatesandboxtitle
Parse the page using templatesandboxtext in place of the contents of the page named here.
- templatesandboxtext
Parse the page using this page content in place of the page named by templatesandboxtitle.
- templatesandboxcontentmodel
Content model of templatesandboxtext.
- One of the following values: GadgetDefinition, JsonSchema, MassMessageListContent, NewsletterContent, Scribunto, SecurePoll, css, flow-board, javascript, json, sanitized-css, text, translate-messagebundle, unknown, wikitext
- templatesandboxcontentformat
Content format of templatesandboxtext.
- One of the following values: application/json, application/octet-stream, application/unknown, application/x-binary, text/css, text/javascript, text/plain, text/unknown, text/x-wiki, unknown/unknown
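Several of the parameters above are only valid in combination with others. As a rough illustration, the following Python sketch checks a parameter dictionary against a few of the documented rules before a request is sent. The function name `find_conflicts` is hypothetical, the list of rules is not exhaustive, and the sketch assumes that page conflicts with either text or title on its own.

```python
# Parameters documented as "Only valid when used with text".
TEXT_ONLY = ("pst", "onlypst", "contentformat", "contentmodel")


def find_conflicts(params):
    """Return a list of problems found in params; empty if none were found."""
    problems = []
    has_text = "text" in params
    if "page" in params and (has_text or "title" in params):
        problems.append("page cannot be combined with text or title")
    for name in TEXT_ONLY:
        if name in params and not has_text:
            problems.append(f"{name} is only valid together with text")
    if params.get("section") == "new" and not has_text:
        problems.append("section=new requires text")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every conflicting parameter at once.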
- Parse a page.
- api.php?action=parse&page=Project:Sandbox
- Parse wikitext.
- api.php?action=parse&text={{Project:Sandbox}}&contentmodel=wikitext
- Parse wikitext, specifying the page title.
- api.php?action=parse&text={{PAGENAME}}&title=Test
- Parse a summary.
- api.php?action=parse&summary=Some+[[link]]&prop=
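The example requests above are written without percent-encoding. When building such URLs in code, characters like `{`, `[` and `:` generally need to be encoded. The sketch below uses `urllib.parse.urlencode` from the Python standard library for that. The helper name `parse_url` is hypothetical, and the endpoint is passed in by the caller.

```python
from urllib.parse import urlencode


def parse_url(endpoint, **params):
    """Build an action=parse URL with a percent-encoded query string."""
    query = {"action": "parse", **params}
    # urlencode escapes reserved characters and encodes spaces as '+'.
    return endpoint + "?" + urlencode(query)


ENDPOINT = "https://en.wikipedia.org/w/api.php"

# The summary example, including the empty prop value.
SUMMARY_URL = parse_url(ENDPOINT, summary="Some [[link]]", prop="")
```

Passing `prop=""` keeps the parameter in the query string with an empty value, as the summary example requires.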
Example 1: Parse content of a page
GET request
Response

    {
        "parse": {
            "title": "Pet door",
            "pageid": 3276454,
            "revid": 852892138,
            "text": {
                "*": "<div class=\"mw-parser-output\"><div class=\"thumb tright\"><div class=\"thumbinner\" style=\"width:222px;\"><a href=\"/wiki/File:Doggy_door_exit.JPG\" class=\"image\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/7/71/Doggy_door_exit.JPG/220px-Doggy_door_exit.JPG\" width=\"220\" height=\"165\" class=\"thumbimage\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/7/71/Doggy_door_exit.JPG/330px-Doggy_door_exit.JPG 1.5x, ...
            }
        }
    }
Sample code
Python

    #!/usr/bin/python3

    """
        parse.py

        MediaWiki API Demos
        Demo of `Parse` module: Parse content of a page

        MIT License
    """

    import requests

    S = requests.Session()

    URL = "https://en.wikipedia.org/w/api.php"

    PARAMS = {
        "action": "parse",
        "page": "Pet door",
        "format": "json"
    }

    R = S.get(url=URL, params=PARAMS)
    DATA = R.json()

    print(DATA["parse"]["text"]["*"])
PHP

    <?php

    /*
        parse.php

        MediaWiki API Demos
        Demo of `Parse` module: Parse content of a page

        MIT License
    */

    $endPoint = "https://en.wikipedia.org/w/api.php";
    $params = [
        "action" => "parse",
        "page" => "Pet door",
        "format" => "json"
    ];

    $url = $endPoint . "?" . http_build_query( $params );

    $ch = curl_init( $url );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    $output = curl_exec( $ch );
    curl_close( $ch );

    $result = json_decode( $output, true );

    echo( $result["parse"]["text"]["*"] );
JavaScript

    /**
     * parse.js
     *
     * MediaWiki API Demos
     * Demo of `Parse` module: Parse content of a page
     *
     * MIT License
     */

    const url = "https://en.wikipedia.org/w/api.php?" +
        new URLSearchParams({
            origin: "*",
            action: "parse",
            page: "Pet door",
            format: "json",
        });

    try {
        const req = await fetch(url);
        const json = await req.json();
        console.log(json.parse.text["*"]);
    } catch (e) {
        console.error(e);
    }
MediaWiki JS

    /**
     * parse.js
     *
     * MediaWiki API Demos
     * Demo of `Parse` module: Parse content of a page
     * MIT License
     */

    const params = {
        action: 'parse',
        page: 'Pet door',
        format: 'json'
    };
    const api = new mw.Api();

    api.get(params).done(data => {
        console.log(data.parse.text['*']);
    });
Example 2: Parse a section of a page and fetch its table data
GET request
Response

    {
        "parse": {
            "title": "Wikipedia:Unusual articles/Places and infrastructure",
            "pageid": 38664530,
            "wikitext": {
                "*": "===Antarctica===\n<!--[[File:Grytviken church.jpg|thumb|150px|right|A little church in [[Grytviken]] in the [[Religion in Antarctica|Antarctic]].]]-->\n{| class=\"wikitable\"\n|-\n| '''[[Emilio Palma]]'''\n| An Argentine national who is the first person known to be born on the continent of Antarctica.\n|-\n| '''[[Scouting in the Antarctic]]'''\n| Always be prepared for glaciers and penguins.\n|}"
            }
        }
    }
Sample code
parse_wikitable.py

    #!/usr/bin/python3

    """
        parse_wikitable.py

        MediaWiki Action API Code Samples
        Demo of `Parse` module: Parse a section of a page, fetch its table data
        and save it to a CSV file

        MIT license
    """

    import csv
    import requests

    S = requests.Session()

    URL = "https://en.wikipedia.org/w/api.php"

    TITLE = "Wikipedia:Unusual_articles/Places_and_infrastructure"

    PARAMS = {
        'action': "parse",
        'page': TITLE,
        'prop': 'wikitext',
        'section': 5,
        'format': "json"
    }

    def get_table():
        """ Parse a section of a page, fetch its table data and save it to a CSV file
        """
        res = S.get(url=URL, params=PARAMS)
        data = res.json()
        wikitext = data['parse']['wikitext']['*']
        # Table rows are separated by "|-" in wikitext.
        lines = wikitext.split('|-')
        entries = []

        for line in lines:
            line = line.strip()
            if line.startswith("|"):
                table = line[2:].split('||')
                # First cell: linked article title; second cell: description.
                entry = table[0].split("|")[0].strip("'''[[]]\n"), table[0].split("|")[1].strip("\n")
                entries.append(entry)

        # newline="" lets the csv module control line endings; the with
        # block closes the file even if writing fails.
        with open("places_and_infrastructure.csv", "w", newline="") as file:
            writer = csv.writer(file)
            writer.writerows(entries)

    if __name__ == '__main__':
        get_table()
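
The row extraction can also be tried offline against the wikitext from the response above, without calling the API. The following is a minimal sketch and not part of the original sample. The function name `table_rows` is hypothetical, `SAMPLE` is a shortened copy of the response wikitext, and the sketch assumes a simple two-column table with one cell per line.

```python
SAMPLE = (
    "===Antarctica===\n"
    "{| class=\"wikitable\"\n"
    "|-\n"
    "| '''[[Emilio Palma]]'''\n"
    "| An Argentine national who is the first person known to be born "
    "on the continent of Antarctica.\n"
    "|-\n"
    "| '''[[Scouting in the Antarctic]]'''\n"
    "| Always be prepared for glaciers and penguins.\n"
    "|}"
)


def table_rows(wikitext):
    """Return (title, description) pairs from a simple two-column wikitable."""
    rows = []
    # "|-" starts a new table row in wikitext.
    for chunk in wikitext.split("|-"):
        # Keep lines that start a cell ("| ..."), skipping the table end "|}".
        cells = [
            line[1:].strip()
            for line in chunk.splitlines()
            if line.startswith("|") and not line.startswith("|}")
        ]
        if len(cells) == 2:
            # Drop bold quotes and link brackets; keep the link target only.
            title = cells[0].strip("'[]").split("|")[0]
            rows.append((title, cells[1]))
    return rows


ROWS = table_rows(SAMPLE)
```

The resulting pairs can be passed directly to `csv.writer(...).writerows()`. Tables that use `||` to place several cells on one line or contain nested markup would need a real wikitext parser.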
Possible errors
Code | Info |
---|---|
missingtitle | The page you specified doesn't exist. |
nosuchsection | There is no section section in page, where section and page stand for the requested section identifier and page. |
pagecannotexist | Namespace doesn't allow actual pages. |
invalidparammix |
|
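
A client can tell these errors apart from a successful result by inspecting the decoded JSON. The sketch below assumes the default JSON error format, in which a failed request returns a top-level `error` object with `code` and `info` fields. That shape is not shown on this page, and the names `ParseApiError` and `unwrap_parse` are hypothetical.

```python
class ParseApiError(Exception):
    """Raised when the API response carries an error instead of a result."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def unwrap_parse(data):
    """Return data['parse'], or raise ParseApiError for an error response."""
    if "error" in data:
        err = data["error"]
        raise ParseApiError(err.get("code", "unknown"), err.get("info", ""))
    return data["parse"]
```

A caller can then branch on `exc.code`, for example treating `missingtitle` as "page not found" and re-raising anything else.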
Parameter history
- v1.38: Introduced showstrategykeys
- v1.32: Deprecated disabletidy
- v1.31: Introduced disablestylededuplication
- v1.30: Introduced revid, useskin, wrapoutputclass