Module:Text
From Wikipedia, the free encyclopedia
Warning: This Lua module is used on approximately 1,820,000 pages, or roughly 3% of all pages.
To avoid major disruption and server load, any changes should be tested in the module's /sandbox or /testcases subpages, or in your own module sandbox. The tested changes can be added to this page in a single edit. Consider discussing changes on the talk page before implementing them.
This module depends on the following other modules: Module:Yesno and Module:Text/data.
Text – Module containing methods for the manipulation of text, wikimarkup and some HTML.
Functions for templates
All methods have an unnamed parameter containing the text.
The return value is an empty string if the parameter does not meet the conditions. When the condition is matched or some result is successfully found, strings of at least one character are returned.
- char
- Creates a string from a list of character codes.
- 1
- Space-separated list of character codes
- *
- Number of repetitions of the list in parameter 1 (default: 1).
- errors=0 – Silence errors
- concatParams
- Combine any number of elements into a list, like table.concat() in Lua.
- From a template:
- 1
- First element; missing and empty elements are ignored.
- 2 3 4 5 6 ...
- Further list elements
- From Lua
- args
- table (sequence) of the elements
- apply
- Separator between elements; defaults to |
- adapt
- optional formatting, which will be applied to each element; must contain %s.
- containsCJK
- Returns whether the input string contains any CJK characters
- Returns nothing if there are no CJK characters
- removeDelimited
- Remove all text between delimiters, including the delimiters themselves.
- getPlain
- Remove wikimarkup (except templates): comments, tags, bold, italic, nbsp
- isLatinRange
- Returns some content, unless the string contains a character that would not normally be found in Latin text.
- Returns nothing if there is a non-Latin string.
- isQuote
- Returns some content if the parameter passed is a single character, and that character is a quote, such as '.
- Returns nothing for multiple characters, or if the character passed is not a quote.
- listToText
- Formats list elements analogously to mw.text.listToText().
- The elements are separated by a comma and a space; the word "and" appears between the last two elements.
- Unnamed parameters become the list items.
- Optional parameters for #invoke:
- format – Every list element will first be formatted with this format string, which is passed to mw.ustring.format. The string must contain at least one %s sequence.
- template=1 – List elements should be taken from the calling template.
- Returns the resulting string.
- quote
- Wrap the string in quotes; quotes can be chosen for a specific language.
- 1
- Input text (will be automatically trimmed); may be empty.
- 2
- (optional) the ISO 639 language code for the quote marks; should be one of the supported languages (in German)
- 3
- (optional) 2 for second level quotes. This means the single quote marks in a statement such as: Jack said, "Jill said ‘fish’ last Tuesday."
- quoteUnquoted
- Wrap the string in quotes; quotes can be chosen for a specific language. Will not quote an empty string, and will not quote if there is a quote at the start or end of the (trimmed) string.
- 1
- Input text (will be automatically trimmed); may be empty.
- 2
- (optional) the ISO 639 language code for the quote marks; should be one of the supported languages (in German)
- 3
- (optional) 2 for second level quotes. This means the single quote marks in a statement such as: Jack said, "Jill said ‘fish’ last Tuesday."
- removeDiacritics
- Removes all diacritical marks from the input.
- 1
- Input text
- sentenceTerminated
- Is this sentence terminated? Should work with CJK, and allows quotation marks to follow.
- Returns nothing if the sentence is unterminated.
- ucfirstAll
- The first letter of every recognized word is converted to upper case. This contrasts with the parser function {{ucfirst:}} which changes only the first character of the whole string passed.
- A few common HTML entities are protected; the implementation of this may mean that numerical entities passed (e.g. &#38;) are converted to &amp; form.
- uprightNonlatin
- Takes a string. Italicized non-Latin characters are un-italicized, unless they are a single Greek letter.
- zip
- Combines a tuple of lists by convolution. This is easiest to explain by example: given two lists, list1 = "a,b,c" and list2 = "1,2,3", then zip(list1, list2, sep = ",", isep = "-", osep = "/") outputs a-1/b-2/c-3.
- 1, 2, 3, ... – Lists to be combined
- sep – A separator (in Lua pattern form) used to split the lists. If empty, the lists are split into individual characters.
- sep1, sep2, sep3, ... – Allows a different separator to be used for each list.
- isep – Output separator; placed between elements which were at the same index in their lists.
- osep – Output separator; placed between elements which had different original indices; i.e. between the groups joined with isep.
- split
- Splits a string into chunks at the specified delimiter, and returns the first (or user-specified) chunk. This is a non-Unicode-aware implementation of mw.text.split which, for ASCII-only text, can be up to 60 times faster.
- 1 (or text) – the text to be split
- 2 (or pattern) – the pattern to use when splitting the text. By default, this is treated as a Lua string library pattern.
- 3 (or plain) – if set to "true", pattern will be interpreted as plain text, not a pattern.
- 4 (or index) – The chunk to return. If omitted, the first chunk will be returned. Can be set to a negative number to count from the end (e.g. -1 will return the last chunk).
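The splitting and index selection described for split can be tried outside MediaWiki with a short plain-Lua sketch. The gsplit and split functions below follow the byte-based logic of this module; chunk is a hypothetical helper name, introduced here only to mirror the index parameter.

```lua
-- Byte-based splitting for plain Lua (no mw.* library needed).
local function gsplit(text, pattern, plain)
    local s, l = 1, #text
    return function()
        if s then
            local e, n = string.find(text, pattern, s, plain)
            local ret
            if not e then
                -- No further separator: return the rest and stop.
                ret = string.sub(text, s)
                s = nil
            elseif n < e then
                -- Empty separator: return one byte at a time.
                ret = string.sub(text, s, e)
                if e < l then
                    s = e + 1
                else
                    s = nil
                end
            else
                ret = e > s and string.sub(text, s, e - 1) or ""
                s = n + 1
            end
            return ret
        end
    end
end

local function split(text, pattern, plain)
    local ret = {}
    for m in gsplit(text, pattern, plain) do
        ret[#ret + 1] = m
    end
    return ret
end

-- Hypothetical helper: pick one chunk, counting from the end if index < 0.
local function chunk(text, pattern, plain, index)
    local parts = split(text, pattern, plain)
    index = index or 1
    if index < 0 then
        index = #parts + index + 1
    end
    return parts[index]
end

print(chunk("a/b/c", "/", true))            --> a
print(chunk("a/b/c", "/", true, -1))        --> c
print(chunk("2024-09-21", "%-", false, 2))  --> 09
```

Because the functions index by byte, they are only a safe substitute for mw.text.split when the text and the separator are ASCII.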
Examples and test page
There are tests available (in German) to illustrate this in practice.
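As a further illustration, the zip behaviour described above can be approximated in plain Lua. This is a minimal sketch, not the module's code: splitPattern and the table-based zip signature are assumptions made for the example, and the separator is assumed never to match an empty string unless it is itself empty.

```lua
-- Split `text` on the Lua pattern `sep`; an empty `sep` yields single bytes.
local function splitPattern(text, sep)
    local parts = {}
    if sep == "" then
        for ch in text:gmatch(".") do
            parts[#parts + 1] = ch
        end
        return parts
    end
    local pos = 1
    while true do
        local s, e = text:find(sep, pos)
        if not s or e < s then
            -- No (non-empty) match left: keep the remainder and stop.
            parts[#parts + 1] = text:sub(pos)
            break
        end
        parts[#parts + 1] = text:sub(pos, s - 1)
        pos = e + 1
    end
    return parts
end

-- Join elements with the same index using `isep`, and the groups using `osep`.
local function zip(lists, sep, isep, osep)
    local columns, maxLen = {}, 0
    for i, list in ipairs(lists) do
        columns[i] = splitPattern(list, sep)
        if #columns[i] > maxLen then
            maxLen = #columns[i]
        end
    end
    local out = {}
    for i = 1, maxLen do
        local group = {}
        for j = 1, #columns do
            group[j] = columns[j][i] or ""  -- shorter lists contribute ""
        end
        out[i] = table.concat(group, isep)
    end
    return table.concat(out, osep)
end

print(zip({ "a,b,c", "1,2,3" }, ",", "-", "/"))  --> a-1/b-2/c-3
```

Lists of unequal length are padded with empty strings, which matches the way the module falls back to "" for missing elements.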
Use in another Lua module
All of the above functions can be called from other Lua modules. Use require(); the code below checks for errors while loading it:
    local lucky, Text = pcall( require, "Module:Text" )
    if type( Text ) == "table" then
        Text = Text.Text()
    else
        -- In the event of errors, Text is an error message.
        return "<span class=\"error\">" .. Text .. "</span>"
    end
You may then call:
Text.char( apply, again, accept )
Text.concatParams( args, separator, format )
Text.containsCJK( s )
Text.removeDelimited( s, prefix, suffix )
Text.getPlain( s )
Text.isLatinRange( s )
Text.isQuote( c )
Text.listToText( table, format )
Text.quote( s, lang, mode )
Text.quoteUnquoted( s, lang, mode )
Text.removeDiacritics( s )
Text.sentenceTerminated( s )
Text.split( text, pattern, plain ) – non-Unicode version of mw.text.split
Text.gsplit( text, pattern, plain ) – non-Unicode version of mw.text.gsplit
Text.ucfirstAll( s )
Text.uprightNonlatin( s )
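To give a feel for what a call such as Text.removeDelimited( s, prefix, suffix ) does, here is a standalone plain-Lua sketch. It is an approximation under stated assumptions: both delimiters are searched as plain text (not as patterns) and all positions are byte offsets.

```lua
-- Remove every span that starts with `prefix` and ends with `suffix`.
-- Assumption: both delimiters are literal strings, matched byte-wise.
local function removeDelimited(s, prefix, suffix)
    if prefix == "" or suffix == "" then
        return s
    end
    local r = s
    local i = r:find(prefix, 1, true)
    while i do
        local j = r:find(suffix, i + #prefix, true)
        if j then
            -- Cut from the start of prefix to the end of suffix.
            r = r:sub(1, i - 1) .. r:sub(j + #suffix)
        else
            -- Unterminated span: drop everything from the prefix onwards.
            r = r:sub(1, i - 1)
        end
        i = r:find(prefix, 1, true)
    end
    return r
end

print(removeDelimited("a<!-- x -->b", "<!--", "-->"))  --> ab
print(removeDelimited("x[1]y[2]z", "[", "]"))          --> xyz
```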
Usage
This is a general library; use it anywhere.
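For example, the trim, format and join steps that concatParams is documented to perform can be sketched in plain Lua as follows. The function names are reused for readability only; this is an illustrative stand-in that uses string.format rather than the Unicode-aware mw.ustring.format.

```lua
-- Strip leading and trailing whitespace.
local function trim(s)
    return (s:match("^%s*(.-)%s*$"))
end

-- Join non-empty elements, optionally formatting each one first.
local function concatParams(args, separator, format)
    local items = {}
    for _, v in ipairs(args) do
        v = trim(tostring(v))
        if v ~= "" then
            items[#items + 1] = format and string.format(format, v) or v
        end
    end
    return table.concat(items, separator or "|")
end

print(concatParams({ " a ", "", "b" }))            --> a|b
print(concatParams({ "x", "y" }, ", ", "[[%s]]"))  --> [[x]], [[y]]
```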
Dependencies
- Module:Yesno
- Module:Text/data – Lua patterns and information about quotes
The above documentation is transcluded from Module:Text/doc.
local yesNo = require("Module:Yesno")
local Text = { serial = "2024年09月21日",
               suite  = "Text" }
--[=[
Text utilities
]=]


local function fiatQuote( apply, alien, advance )
    -- Quote text
    -- Parameter:
    --     apply    -- string, with text
    --     alien    -- string, with language code
    --     advance  -- number, with level 1 or 2
    local r = apply and tostring( apply ) or ""
    alien = alien or "en"
    advance = tonumber( advance ) or 0
    local suite
    local data = mw.loadData( 'Module:Text/data' )
    local QuoteLang = data.QuoteLang
    local QuoteType = data.QuoteType
    local slang = alien:match( "^(%l+)-" )
    suite = QuoteLang[ alien ] or slang and QuoteLang[ slang ] or QuoteLang[ "en" ]
    if suite then
        local quotes = QuoteType[ suite ]
        if quotes then
            local space
            if quotes[ 3 ] then
                space = " "
            else
                space = ""
            end
            quotes = quotes[ advance ]
            if quotes then
                r = mw.ustring.format( "%s%s%s%s%s",
                                       mw.ustring.char( quotes[ 1 ] ),
                                       space,
                                       apply,
                                       space,
                                       mw.ustring.char( quotes[ 2 ] ) )
            end
        else
            mw.log( "fiatQuote() " .. suite )
        end
    end
    return r
end -- fiatQuote()


Text.char = function ( apply, again, accept )
    -- Create string from codepoints
    -- Parameter:
    --     apply   -- table (sequence) with numerical codepoints, or nil
    --     again   -- number of repetitions, or nil
    --     accept  -- true, if no error messages to be appended
    -- Returns: string
    local r = ""
    apply = type( apply ) == "table" and apply or {}
    again = math.floor( tonumber( again ) or 1 )
    if again < 1 then
        return ""
    end
    local bad = {}
    local codes = {}
    for _, v in ipairs( apply ) do
        local n = tonumber( v )
        if not n or ( n < 32 and n ~= 9 and n ~= 10 ) then
            table.insert( bad, tostring( v ) )
        else
            table.insert( codes, math.floor( n ) )
        end
    end
    if #bad > 0 then
        if not accept then
            r = tostring( mw.html.create( "span" )
                          :addClass( "error" )
                          :wikitext( "bad codepoints: " .. table.concat( bad, " " ) ) )
        end
        return r
    end
    if #codes > 0 then
        r = mw.ustring.char( unpack( codes ) )
        if again > 1 then
            r = r:rep( again )
        end
    end
    return r
end -- Text.char()


local function trimAndFormat( args, fmt )
    -- Trim every element, drop empty ones, and optionally format the rest
    local result = {}
    if type( args ) ~= 'table' then
        args = { args }
    end
    for _, v in ipairs( args ) do
        v = mw.text.trim( tostring( v ) )
        if v ~= "" then
            table.insert( result, fmt and mw.ustring.format( fmt, v ) or v )
        end
    end
    return result
end


Text.concatParams = function ( args, apply, adapt )
    -- Concat list items into one string
    -- Parameter:
    --     args   -- table (sequence) with numKey=string
    --     apply  -- string (optional); separator (default: "|")
    --     adapt  -- string (optional); format including "%s"
    -- Returns: string
    return table.concat( trimAndFormat( args, adapt ), apply or "|" )
end -- Text.concatParams()


Text.containsCJK = function ( s )
    -- Is any CJK code within?
    -- Parameter:
    --     s  -- string
    -- Returns: true, if CJK detected
    s = s and tostring( s ) or ""
    local patternCJK = mw.loadData( 'Module:Text/data' ).PatternCJK
    return mw.ustring.find( s, patternCJK ) ~= nil
end -- Text.containsCJK()


Text.removeDelimited = function ( s, prefix, suffix )
    -- Remove all text in s delimited by prefix and suffix (inclusive)
    -- Arguments:
    --     s = string to process
    --     prefix = initial delimiter
    --     suffix = ending delimiter
    -- Returns: stripped string
    s = s and tostring( s ) or ""
    prefix = prefix and tostring( prefix ) or ""
    suffix = suffix and tostring( suffix ) or ""
    -- Byte lengths, because string.find and string.sub below index by byte
    local prefixLen = #prefix
    local suffixLen = #suffix
    if prefixLen == 0 or suffixLen == 0 then
        return s
    end
    local i = s:find( prefix, 1, true )
    local r = s
    local j
    while i do
        -- Plain search: a suffix such as "-->" contains pattern magic characters
        j = r:find( suffix, i + prefixLen, true )
        if j then
            r = r:sub( 1, i - 1 ) .. r:sub( j + suffixLen )
        else
            r = r:sub( 1, i - 1 )
        end
        i = r:find( prefix, 1, true )
    end
    return r
end


Text.getPlain = function ( adjust )
    -- Remove wikisyntax from string, except templates
    -- Parameter:
    --     adjust  -- string
    -- Returns: string
    local r = Text.removeDelimited( adjust, "<!--", "-->" )
    r = r:gsub( "(</?%l[^>]*>)", "" )
         :gsub( "'''", "" )
         :gsub( "''", "" )
         :gsub( "&nbsp;", " " )
    return r
end -- Text.getPlain()


Text.isLatinRange = function ( s )
    -- Are characters expected to be latin or symbols within latin texts?
    -- Arguments:
    --     s = string to analyze
    -- Returns: true, if valid for latin only
    s = s and tostring( s ) or "" --- ensure input is always string
    local PatternLatin = mw.loadData( 'Module:Text/data' ).PatternLatin
    return mw.ustring.match( s, PatternLatin ) ~= nil
end -- Text.isLatinRange()


Text.isQuote = function ( s )
    -- Is this character any quotation mark?
    -- Parameter:
    --     s = single character to analyze
    -- Returns: true, if s is quotation mark
    s = s and tostring( s ) or ""
    if s == "" then
        return false
    end
    local SeekQuote = mw.loadData( 'Module:Text/data' ).SeekQuote
    return mw.ustring.find( SeekQuote, s, 1, true ) ~= nil
end -- Text.isQuote()


Text.listToText = function ( args, adapt )
    -- Format list items similar to mw.text.listToText()
    -- Parameter:
    --     args   -- table (sequence) with numKey=string
    --     adapt  -- string (optional); format including "%s"
    -- Returns: string
    return mw.text.listToText( trimAndFormat( args, adapt ) )
end -- Text.listToText()


Text.quote = function ( apply, alien, advance )
    -- Quote text
    -- Parameter:
    --     apply    -- string, with text
    --     alien    -- string, with language code, or nil
    --     advance  -- number, with level 1 or 2, or nil
    -- Returns: quoted string
    apply = apply and tostring( apply ) or ""
    local mode, slang
    if type( alien ) == "string" then
        slang = mw.text.trim( alien ):lower()
    else
        slang = mw.title.getCurrentTitle().pageLanguage
        if not slang then
            -- TODO FIXME: Introduction expected 2017-04
            slang = mw.language.getContentLanguage():getCode()
        end
    end
    if advance == 2 then
        mode = 2
    else
        mode = 1
    end
    return fiatQuote( mw.text.trim( apply ), slang, mode )
end -- Text.quote()


Text.quoteUnquoted = function ( apply, alien, advance )
    -- Quote text, if not yet quoted and not empty
    -- Parameter:
    --     apply    -- string, with text
    --     alien    -- string, with language code, or nil
    --     advance  -- number, with level 1 or 2, or nil
    -- Returns: string; possibly quoted
    local r = mw.text.trim( apply and tostring( apply ) or "" )
    local s = mw.ustring.sub( r, 1, 1 )
    if s ~= "" and not Text.isQuote( s ) then
        -- Last character; sub( r, -1, 1 ) would be empty for longer strings
        s = mw.ustring.sub( r, -1 )
        if not Text.isQuote( s ) then
            r = Text.quote( r, alien, advance )
        end
    end
    return r
end -- Text.quoteUnquoted()


Text.removeDiacritics = function ( adjust )
    -- Remove all diacritics
    -- Parameter:
    --     adjust  -- string
    -- Returns: string; all latin letters should be ASCII
    --                  or basic greek or cyrillic or symbols etc.
    local cleanup, decomposed
    local PatternCombined = mw.loadData( 'Module:Text/data' ).PatternCombined
    decomposed = mw.ustring.toNFD( adjust and tostring( adjust ) or "" )
    cleanup = mw.ustring.gsub( decomposed, PatternCombined, "" )
    return mw.ustring.toNFC( cleanup )
end -- Text.removeDiacritics()


Text.sentenceTerminated = function ( analyse )
    -- Is string terminated by dot, question or exclamation mark?
    --     Quotation, link termination and so on granted
    -- Parameter:
    --     analyse  -- string
    -- Returns: true, if sentence terminated
    local r
    local PatternTerminated = mw.loadData( 'Module:Text/data' ).PatternTerminated
    if mw.ustring.find( analyse, PatternTerminated ) then
        r = true
    else
        r = false
    end
    return r
end -- Text.sentenceTerminated()


Text.ucfirstAll = function ( adjust )
    -- Capitalize all words
    -- Arguments:
    --     adjust = string to adjust
    -- Returns: string with all first letters in upper case
    adjust = adjust and tostring( adjust ) or ""
    local r = mw.text.decode( adjust, true )
    local i = 1
    local c, j, m
    m = ( r ~= adjust )
    r = " " .. r
    while i do
        i = mw.ustring.find( r, "%W%l", i )
        if i then
            j = i + 1
            c = mw.ustring.upper( mw.ustring.sub( r, j, j ) )
            r = string.format( "%s%s%s",
                               mw.ustring.sub( r, 1, i ),
                               c,
                               mw.ustring.sub( r, i + 2 ) )
            i = j
        end
    end -- while i
    r = r:sub( 2 )
    if m then
        r = mw.text.encode( r )
    end
    return r
end -- Text.ucfirstAll()


Text.uprightNonlatin = function ( adjust )
    -- Ensure non-italics for non-latin text parts
    --     One single greek letter might be granted
    -- Precondition:
    --     adjust  -- string
    -- Returns: string with non-latin parts enclosed in <span>
    local r
    local data = mw.loadData( 'Module:Text/data' )
    local PatternLatin = data.PatternLatin
    local RangesLatin = data.RangesLatin
    local NumLatinRanges = data.NumLatinRanges
    if mw.ustring.match( adjust, PatternLatin ) then
        -- latin only, horizontal dashes, quotes
        r = adjust
    else
        local c
        local j = false
        local k = 1
        local m = false
        local n = mw.ustring.len( adjust )
        local span = "%s%s<span dir='auto' style='font-style:normal'>%s</span>"
        local flat = function ( a )
            -- isLatin
            local range
            -- NumLatinRanges has to be precomputed because # does not work from loadData
            for i = 1, NumLatinRanges do
                range = RangesLatin[ i ]
                if a >= range[ 1 ] and a <= range[ 2 ] then
                    return true
                end
            end -- for i
        end -- flat()
        local focus = function ( a )
            -- char is not ambivalent
            local r = ( a > 64 )
            if r then
                r = ( a < 8192 or a > 8212 )
            else
                r = ( a == 38 or a == 60 ) -- '&' '<'
            end
            return r
        end -- focus()
        local form = function ( a )
            return string.format( span,
                                  r,
                                  mw.ustring.sub( adjust, k, j - 1 ),
                                  mw.ustring.sub( adjust, j, a ) )
        end -- form()
        r = ""
        for i = 1, n do
            c = mw.ustring.codepoint( adjust, i, i )
            if focus( c ) then
                if flat( c ) then
                    if j then
                        if m then
                            if i == m then
                                -- single greek letter.
                                j = false
                            end
                            m = false
                        end
                        if j then
                            local nx = i - 1
                            local s = ""
                            for ix = nx, 1, -1 do
                                c = mw.ustring.sub( adjust, ix, ix )
                                if c == " " or c == "(" then
                                    nx = nx - 1
                                    s = c .. s
                                else
                                    break -- for ix
                                end
                            end -- for ix
                            r = form( nx ) .. s
                            j = false
                            k = i
                        end
                    end
                elseif not j then
                    j = i
                    if c >= 880 and c <= 1023 then
                        -- single greek letter?
                        m = i + 1
                    else
                        m = false
                    end
                end
            elseif m then
                m = m + 1
            end
        end -- for i
        if j and ( not m or m < n ) then
            r = form( n )
        else
            r = r .. mw.ustring.sub( adjust, k )
        end
    end
    return r
end -- Text.uprightNonlatin()


Text.test = function ( about )
    local r
    if about == "quote" then
        -- local, so that no global variable is created
        local data = mw.loadData( 'Module:Text/data' )
        r = {}
        r.QuoteLang = data.QuoteLang
        r.QuoteType = data.QuoteType
    end
    return r
end -- Text.test()


-- Non Unicode-aware version of mw.text.split and mw.text.gsplit
-- based on [[phab:diffusion/ELUA/browse/master/includes/Engines/LuaCommon/lualib/mw.text.lua]]
-- These run up to 60 times faster than the Unicode-aware versions
Text.split = function ( text, pattern, plain )
    local ret = {}
    for m in Text.gsplit( text, pattern, plain ) do
        ret[ #ret + 1 ] = m
    end
    return ret
end

Text.gsplit = function ( text, pattern, plain )
    local s, l = 1, string.len( text )
    return function ()
        if s then
            local e, n = string.find( text, pattern, s, plain )
            local ret
            if not e then
                ret = string.sub( text, s )
                s = nil
            elseif n < e then
                -- Empty separator!
                ret = string.sub( text, s, e )
                if e < l then
                    s = e + 1
                else
                    s = nil
                end
            else
                ret = e > s and string.sub( text, s, e - 1 ) or ''
                s = n + 1
            end
            return ret
        end
    end, nil, nil
end


-- Export
local p = {}

-- Predicates: return "1" for true and an empty string for false
for _, func in ipairs( { 'containsCJK', 'isLatinRange', 'isQuote', 'sentenceTerminated' } ) do
    p[ func ] = function ( frame )
        return Text[ func ]( frame.args[ 1 ] or "" ) and "1" or ""
    end
end

-- Simple string transformations of the first unnamed parameter
for _, func in ipairs( { 'getPlain', 'removeDiacritics', 'ucfirstAll', 'uprightNonlatin' } ) do
    p[ func ] = function ( frame )
        return Text[ func ]( frame.args[ 1 ] or "" )
    end
end

function p.char( frame )
    local params = frame:getParent().args
    local story = params[ 1 ]
    local codes, lenient, multiple
    if not story then
        params = frame.args
        story = params[ 1 ]
    end
    if story then
        local items = mw.text.split( mw.text.trim( story ), "%s+" )
        if #items > 0 then
            local j
            lenient = ( yesNo( params.errors ) == false )
            codes = {}
            multiple = tonumber( params[ "*" ] )
            for _, v in ipairs( items ) do
                -- A leading "x" marks a hexadecimal code, e.g. x41
                j = tonumber( ( v:sub( 1, 1 ) == "x" and "0" or "" ) .. v )
                table.insert( codes, j or v )
            end
        end
    end
    return Text.char( codes, multiple, lenient )
end

function p.concatParams( frame )
    local args
    local template = frame.args.template
    if type( template ) == "string" then
        template = mw.text.trim( template )
        template = ( template == "1" )
    end
    if template then
        args = frame:getParent().args
    else
        args = frame.args
    end
    return Text.concatParams( args,
                              frame.args.separator,
                              frame.args.format )
end

function p.listToFormat( frame )
    local lists = {}
    local pformat = frame.args[ "format" ]
    local sep = frame.args[ "sep" ] or ";"

    -- Parse parameters: lists
    for k, v in pairs( frame.args ) do
        local knum = tonumber( k )
        if knum then lists[ knum ] = v end
    end

    -- Split lists
    local maxListLen = 0
    for i = 1, #lists do
        lists[ i ] = mw.text.split( lists[ i ], sep )
        if #lists[ i ] > maxListLen then maxListLen = #lists[ i ] end
    end

    -- Generate result string
    local result = ""
    local result_line = ""
    for i = 1, maxListLen do
        result_line = pformat
        for j = 1, #lists do
            result_line = mw.ustring.gsub( result_line, "%%s", lists[ j ][ i ], 1 )
        end
        result = result .. result_line
    end

    return result
end

function p.listToText( frame )
    local args
    local template = frame.args.template
    if type( template ) == "string" then
        template = mw.text.trim( template )
        template = ( template == "1" )
    end
    if template then
        args = frame:getParent().args
    else
        args = frame.args
    end
    return Text.listToText( args, frame.args.format )
end

function p.quote( frame )
    local slang = frame.args[ 2 ]
    if type( slang ) == "string" then
        slang = mw.text.trim( slang )
        if slang == "" then
            slang = false
        end
    end
    return Text.quote( frame.args[ 1 ] or "",
                       slang,
                       tonumber( frame.args[ 3 ] ) )
end

function p.quoteUnquoted( frame )
    local slang = frame.args[ 2 ]
    if type( slang ) == "string" then
        slang = mw.text.trim( slang )
        if slang == "" then
            slang = false
        end
    end
    return Text.quoteUnquoted( frame.args[ 1 ] or "",
                               slang,
                               tonumber( frame.args[ 3 ] ) )
end

function p.zip( frame )
    local lists = {}
    local seps = {}
    local defaultsep = frame.args[ "sep" ] or ""
    local innersep = frame.args[ "isep" ] or ""
    local outersep = frame.args[ "osep" ] or ""

    -- Parse parameters
    for k, v in pairs( frame.args ) do
        local knum = tonumber( k )
        if knum then lists[ knum ] = v else
            if string.sub( k, 1, 3 ) == "sep" then
                local sepnum = tonumber( string.sub( k, 4 ) )
                if sepnum then seps[ sepnum ] = v end
            end
        end
    end
    -- If no explicit separators are given, use the default separator
    for i = 1, math.max( #seps, #lists ) do
        if not seps[ i ] then seps[ i ] = defaultsep end
    end

    -- Split lists
    local maxListLen = 0
    for i = 1, #lists do
        lists[ i ] = mw.text.split( lists[ i ], seps[ i ] )
        if #lists[ i ] > maxListLen then maxListLen = #lists[ i ] end
    end

    local result = ""
    for i = 1, maxListLen do
        if i ~= 1 then result = result .. outersep end
        for j = 1, #lists do
            if j ~= 1 then result = result .. innersep end
            result = result .. ( lists[ j ][ i ] or "" )
        end
    end
    return result
end

function p.split( frame )
    local text = frame.args.text or frame.args[ 1 ] or ''
    local pattern = frame.args.pattern or frame.args[ 2 ] or ''
    local plain = yesNo( frame.args.plain or frame.args[ 3 ] )
    local index = tonumber( frame.args.index ) or tonumber( frame.args[ 4 ] ) or 1
    local a = Text.split( text, pattern, plain )
    -- Negative indices count from the end
    if index < 0 then index = #a + index + 1 end
    return a[ index ]
end

function p.failsafe()
    return Text.serial
end

p.Text = function ()
    return Text
end -- p.Text

return p