Module:DecodeEncode
From Wikipedia, the free encyclopedia
This is the current revision of this page, as edited by Lemondoge at 20:18, 17 April 2023 (Fixed error (`a ~= (nil or '')` doesn't work; change to `a and a ~= ''`).). The present address (URL) is a permanent link to this version.
This module is rated as ready for general use. It has reached a mature state, is considered relatively stable and bug-free, and may be used wherever appropriate. It can be mentioned on help pages and other Wikipedia resources as an option for new users. To minimise server load and avoid disruptive output, improvements should be developed through sandbox testing rather than repeated trial-and-error editing.
This module is currently template-protected from editing.
See the protection policy and protection log for more details. Please discuss any changes on the talk page; you may submit an edit request to ask an administrator to make an edit if it is uncontroversial or supported by consensus. You may also request that this page be unprotected.
Warning: This Lua module is used on approximately 140,000 pages.
To avoid major disruption and server load, any changes should be tested in the module's /sandbox or /testcases subpages, or in your own module sandbox. The tested changes can be added to this page in a single edit. Consider discussing changes on the talk page before implementing them.
Implements the Lua functions mw.text.decode and mw.text.encode in a module.
{{#invoke:decodeEncode|decode|s=Source text&copy;}}
→ Source text©
See List of XML and HTML character entity references.
Decode (&copy; → ©)
See § Known issues for possible THIN SPACE and epsilon issues.
- Decodes named entities from the entity name into a regular (Unicode) character:
&copy; → ©
&gt; → >
All well-defined named entities are decoded (HTML named character references; formally, as defined in the PHP table).
- A regular, rendered sentence:
- "At 100 °F, & with a "burning" sun above, we ⁄walked⁄."
- In code (wikitext):
- At 100 &deg;F, &amp; with a &quot;burning&quot; sun above, we &frasl;walked&frasl;.
- Processing:
{{#invoke:decodeEncode|decode|s=At 100 &deg;F, &amp; with a &quot;burning&quot; sun above, we &frasl;walked&frasl;.}}
→ At 100 °F, & with a "burning" sun above, we ⁄walked⁄.
(In code: straight characters, no named entities.)
- Renders, again:
- "At 100 °F, & with a "burning" sun above, we ⁄walked⁄."
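The decoding step can be pictured with a minimal, standalone Lua sketch. It is not the module's code and does not call mw.text.decode: `decode_sketch` and its small `NAMES` table are hypothetical stand-ins for the full named-entity table, numeric entities are not handled, and the source file is assumed to be saved as UTF-8.

```lua
-- Hypothetical stand-in for the full named-entity table (illustration only).
local NAMES = {
  amp = '&', quot = '"', lt = '<', gt = '>',
  copy = '©', deg = '°', frasl = '⁄',
}

-- Replace each &name; whose name is listed in NAMES.
-- Returning nil from the callback leaves an unknown entity untouched.
local function decode_sketch(s)
  return (s:gsub('&(%a+);', function(name)
    return NAMES[name]
  end))
end

print(decode_sketch('Source text&copy;'))
print(decode_sketch('At 100 &deg;F, &amp; with a &quot;burning&quot; sun above, we &frasl;walked&frasl;.'))
```

The extra parentheses around `s:gsub(...)` drop the second return value (the match count), so the function returns only the decoded string.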
Decode a reduced set only
By setting |subset_only=true, only these five entity names are decoded: '&lt;', '&gt;', '&amp;', '&quot;', '&nbsp;' (that is, into '<', '>', '&', '"', and a non-breaking space).
- Note: The sense of this parameter is inverted relative to the corresponding Lua parameter. (This only matters if you also work directly with the Lua function mw.text.decode.) The Lua documentation defines the parameter decodeNamedEntities: when it is omitted or false, only the reduced set of entities is recognized and decoded. In |subset_only= this use of 'false' is inverted: decodeNamedEntities=false is equivalent to |subset_only=true.
- Also, this module ignores the "omitted" logic: |subset_only= must be set explicitly to 'true' to be effective (the module code also accepts 'yes' and '1').
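The relationship between the two flags can be sketched in standalone Lua. The helper below mirrors the boolean parsing visible in the module source at the end of this page; the names `get_boolean` and `decode_named_entities_for` are hypothetical and exist only for this illustration.

```lua
-- Only an explicit 'true', 'yes' or '1' (any letter case) or a boolean true
-- counts as true; everything else, including an omitted value, is false.
local function get_boolean(value)
  if type(value) == 'string' then
    value = value:lower()
    return value == 'true' or value == 'yes' or value == '1'
  elseif type(value) == 'boolean' then
    return value
  end
  return false
end

-- Value that would be handed to mw.text.decode as decodeNamedEntities:
-- the inverse of subset_only.
local function decode_named_entities_for(subset_only_arg)
  return not get_boolean(subset_only_arg)
end

print(decode_named_entities_for('true'))  -- false: reduced set only
print(decode_named_entities_for(nil))     -- true: all named entities
```

An empty or unrecognized value such as |subset_only=no behaves like an omitted parameter, so the full set of named entities is decoded.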
Encode (© → &copy;)
- Function encode encodes some entity-named characters into that name (for example: & → &amp;).
Regular sentence:
- "At >100 °F, & with a "burning" sun above, we walked. ©"
In code:
- At >100 °F, & with a "burning" sun above, we walked. ©
Encode:
{{#invoke:decodeEncode|encode|s=At >100 °F, & with a "burning" sun above, we walked. ©|charset=&<>{{!}}°"'&©}}
- → At &gt;100 &#176;F, &amp; with a &quot;burning&quot; sun above, we walked. &#169;
- Renders as:
- "At >100 °F, & with a "burning" sun above, we walked. ©"
Character set to encode
Per the Lua documentation, only a small set of characters is processed by default. The character set can be set (expanded) by using |charset=.
- Example: |charset=<>" \'& (the default; the space is a non-breaking space), |charset=<>°"'&©{{!}}; characters not in the default will be replaced by their decimal numeric entity: © → &#169; (a decimal number, neither hexadecimal nor the named &copy;).
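How a charset drives the encoding can be sketched in standalone Lua. This is an approximation written for illustration, not the code of mw.text.encode: `encode_sketch`, `codepoint`, and the `NAMED` table are hypothetical, the sketch assumes UTF-8 input, and it assumes that charset characters without an entry in `NAMED` become decimal numeric entities.

```lua
-- Characters given a named entity in this sketch (an assumption of the sketch).
local NAMED = {
  ['<'] = '&lt;', ['>'] = '&gt;', ['&'] = '&amp;',
  ['"'] = '&quot;', ['\194\160'] = '&nbsp;',  -- \194\160 is U+00A0 in UTF-8
}

-- Matches one UTF-8 encoded character byte-wise (NUL excluded).
local UTF8_CHAR = '[\1-\127\194-\244][\128-\191]*'

-- Code point of a single UTF-8 encoded character.
local function codepoint(ch)
  local b1 = ch:byte(1)
  if b1 < 0x80 then return b1 end
  local n, cp
  if b1 >= 0xF0 then n, cp = 4, b1 - 0xF0
  elseif b1 >= 0xE0 then n, cp = 3, b1 - 0xE0
  else n, cp = 2, b1 - 0xC0 end
  for i = 2, n do
    cp = cp * 64 + (ch:byte(i) - 0x80)
  end
  return cp
end

-- Encode every character of s that occurs in charset.
local function encode_sketch(s, charset)
  local wanted = {}
  for ch in charset:gmatch(UTF8_CHAR) do wanted[ch] = true end
  return (s:gsub(UTF8_CHAR, function(ch)
    if wanted[ch] then
      return NAMED[ch] or string.format('&#%d;', codepoint(ch))
    end
    -- no return value: the character is kept unchanged
  end))
end

print(encode_sketch('walked. ©', '<>&"©'))  -- walked. &#169;
```

Building a lookup table from the charset avoids having to escape pattern-magic characters, which a character-class pattern built directly from user input would require.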
Known issues
- 13 Sep 2021: NOTE: The encode function with a user-supplied charset is now used in production in {{R/superscript}} and {{R/ref}}. Before implementing breaking changes here, these templates need to be adjusted accordingly!
- 26 Sep 2021: U+2009 THIN SPACE (&thinsp;, &#8201;)
- Note: Possible bug: Decoding &#8201; works, but &thinsp; doesn't.
- Resolved in code.
- 4 Feb 2023: U+03B5 ε GREEK SMALL LETTER EPSILON (&epsilon;, &#949;)
- See Module talk:DecodeEncode § Bug report: bad decoding of U+03B5 ε (epsilon)
- Resolved in code.
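The workaround noted as "Resolved in code" amounts to rewriting the two affected named entities into their numeric forms before decoding. A minimal standalone sketch, using plain string.gsub where the module itself uses mw.ustring.gsub; the function name `prepare_for_decode` is hypothetical:

```lua
-- Rewrite the two problematic named entities into numeric entities,
-- which are decoded correctly.
local function prepare_for_decode(s)
  s = s:gsub('&thinsp;', '&#8201;')   -- U+2009 THIN SPACE
  s = s:gsub('&epsilon;', '&#949;')   -- U+03B5 GREEK SMALL LETTER EPSILON
  return s
end

print(prepare_for_decode('5&thinsp;km, &epsilon; = 0.1'))
```

All other entities pass through unchanged and are left to the regular decoding step.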
See also
The above documentation is transcluded from Module:DecodeEncode/doc.
Editors can experiment in this module's sandbox and testcases pages.
require('strict')

local p = {}

local function _getBoolean(boolean_str)
	-- from: module:String; adapted
	-- requires an explicit true
	local boolean_value

	if type(boolean_str) == 'string' then
		boolean_str = boolean_str:lower()
		if boolean_str == 'true' or boolean_str == 'yes' or boolean_str == '1' then
			boolean_value = true
		else
			boolean_value = false
		end
	elseif type(boolean_str) == 'boolean' then
		boolean_value = boolean_str
	else
		boolean_value = false
	end
	return boolean_value
end

function p.decode(frame)
	local s = frame.args['s'] or ''
	local subset_only = _getBoolean(frame.args['subset_only'] or false)

	return p._decode(s, subset_only)
end

function p._decode(s, subset_only)
	-- U+2009 THIN SPACE: workaround for bug: HTML entity &thinsp; is decoded incorrectly. Entity &#8201; gets decoded properly
	s = mw.ustring.gsub(s, '&thinsp;', '&#8201;')
	-- U+03B5 ε GREEK SMALL LETTER EPSILON: workaround for bug (phab:T328840): HTML entity &epsilon; is decoded incorrectly for gsub(). Entity &#949; gets decoded properly
	s = mw.ustring.gsub(s, '&epsilon;', '&#949;')

	-- mw.text.decode takes decodeNamedEntities, the inverse of subset_only
	local ret = mw.text.decode(s, not subset_only)

	return ret
end

function p.encode(frame)
	local s = frame.args['s'] or ''
	local charset = frame.args['charset']

	return p._encode(s, charset)
end

function p._encode(s, charset)
	-- example: charset = '_&©−°\\\"\'\=' -- do escape with backslash not %;
	local ret

	if charset and charset ~= '' then
		ret = mw.text.encode(s, charset)
	else
		-- use default: charset = '<>&"\' ' (outer quotes = lua required; space = NBSP)
		ret = mw.text.encode(s)
	end

	return ret
end

return p