Module:Unicode convert

From Wikipedia, the free encyclopedia
Module documentation
This module is rated as ready for general use. It has reached a mature state, is considered relatively stable and bug-free, and may be used wherever appropriate. It can be mentioned on help pages and other Wikipedia resources as an option for new users. To minimise server load and avoid disruptive output, improvements should be developed through sandbox testing rather than repeated trial-and-error editing.

Usage

Converts Unicode code points, always given in hexadecimal, to their UTF-8 or UTF-16 representation, output as upper-case hexadecimal or as decimal. The conversion can also be reversed for UTF-8. The UTF-16 function accepts unpaired surrogates and passes them through unchanged, e.g. {{#invoke:Unicode convert|getUTF16|D835}} → D835. The reverse function fromUTF8 accepts multiple characters, and both its input and its output can be set to decimal.

When calling these functions from another module, you can pass a plain table in place of a proper frame object, e.g. unicodeConvert.getUTF8{ args = {'1F345'} }.
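The fake-frame pattern can be tried outside Scribunto with a stand-in function. The sketch below is an illustration only: the local getUTF16 is a hypothetical plain-Lua copy of the module's UTF-16 logic (it uses arithmetic instead of bit32 and handles only base=dec), so it shows how a table with just an args member is enough, not how the module itself must be loaded.

```lua
-- Hypothetical stand-in for p.getUTF16, written in plain Lua so that it runs
-- outside Scribunto. Like the module's functions, it reads only frame.args.
local function getUTF16(frame)
    local codepoint = tonumber(frame.args[1] or '0', 16) or 0
    local fmt = frame.args.base == 'dec' and '%d' or '%04X'
    if codepoint <= 0xFFFF then
        -- BMP code points (including lone surrogates) are a single code unit.
        return fmt:format(codepoint)
    elseif codepoint > 0x10FFFF then
        return ''
    end
    codepoint = codepoint - 0x10000
    local high = 0xD800 + math.floor(codepoint / 0x400) -- top 10 bits
    local low = 0xDC00 + codepoint % 0x400              -- bottom 10 bits
    return (fmt .. ' ' .. fmt):format(high, low)
end

-- A "fake frame" is just a table with an args member.
print(getUTF16{ args = { '1F345' } })               -- D83C DF45
print(getUTF16{ args = { '1F345', base = 'dec' } }) -- 55356 57157
print(getUTF16{ args = { 'D835' } })                -- D835
```

Named arguments such as base sit in the same args table as the positional ones, which mirrors how Scribunto exposes template parameters.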

To find the character code of a given symbol (in decimal), use e.g. {{#invoke:ustring|codepoint|\🐱}} → 128049.

Code → Output
{{#invoke:Unicode convert|getUTF8|1F345}} → F0 9F 8D 85
{{#invoke:Unicode convert|getUTF8|1F345|base=dec}} → 240 159 141 133
{{#invoke:Unicode convert|fromUTF8|F0 9F 8D 85}} → 1F345
{{#invoke:Unicode convert|fromUTF8|240 159 141 133|base=dec|basein=dec}} → 127813
{{#invoke:Unicode convert|getUTF16|1F345}} → D83C DF45
{{#invoke:Unicode convert|getUTF16|1F345|base=dec}} → 55356 57157
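To see where the UTF-8 bytes in the examples come from, the encoding can be worked through by hand. The following is a minimal plain-Lua sketch, not the module's implementation (the module delegates to mw.ustring); utf8Bytes and toHex are hypothetical helper names, and the input is assumed to be a valid code point.

```lua
-- Encode one code point as a list of UTF-8 byte values (plain Lua, no mw.*).
-- Hypothetical helper; assumes 0 <= cp <= 0x10FFFF.
local function utf8Bytes(cp)
    if cp < 0x80 then
        return { cp }
    elseif cp < 0x800 then
        return { 0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40 }
    elseif cp < 0x10000 then
        return {
            0xE0 + math.floor(cp / 0x1000),
            0x80 + math.floor(cp / 0x40) % 0x40,
            0x80 + cp % 0x40,
        }
    else
        return {
            0xF0 + math.floor(cp / 0x40000),
            0x80 + math.floor(cp / 0x1000) % 0x40,
            0x80 + math.floor(cp / 0x40) % 0x40,
            0x80 + cp % 0x40,
        }
    end
end

-- Format a list of byte values as space-separated upper-case hex.
local function toHex(bytes)
    local out = {}
    for i, b in ipairs(bytes) do
        out[i] = ('%02X'):format(b)
    end
    return table.concat(out, ' ')
end

print(toHex(utf8Bytes(0x1F345))) -- F0 9F 8D 85
print(toHex(utf8Bytes(0x20AC)))  -- E2 82 AC
print(toHex(utf8Bytes(0x41)))    -- 41
```

Division and modulo by powers of two stand in for bit shifts and masks here so that the sketch runs unchanged on Lua 5.1, the version Scribunto uses.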


 local p = {}

 -- NOTE: all these functions use frame solely for its args member.
 -- Modules using them may therefore call them with a fake frame table
 -- containing only args.

 p.getUTF8 = function (frame)
     local ch = mw.ustring.char(tonumber(frame.args[1] or '0', 16) or 0)
     local bytes = {mw.ustring.byte(ch, 1, -1)}
     local format = ({
         ['10'] = '%d',
         dec = '%d'
     })[frame.args['base']] or '%02X'
     for i = 1, #bytes do
         bytes[i] = format:format(bytes[i])
     end
     return table.concat(bytes, ' ')
 end

 p.getUTF16 = function (frame)
     local codepoint = tonumber(frame.args[1] or '0', 16) or 0
     local format = ({ -- TODO reduce the number of options.
         ['10'] = '%d',
         dec = '%d'
     })[frame.args['base']] or '%04X'
     if codepoint <= 0xFFFF then -- NB this also returns lone surrogate characters
         return format:format(codepoint)
     elseif codepoint > 0x10FFFF then -- There are no codepoints above this
         return ''
     end
     codepoint = codepoint - 0x10000
     -- Keep bit32 local so the module does not create a global.
     local bit32 = require('bit32')
     -- High surrogate carries the top 10 bits, low surrogate the bottom 10.
     return (format .. ' ' .. format):format(
         bit32.rshift(codepoint, 10) + 0xD800,
         bit32.band(codepoint, 0x3FF) + 0xDC00)
 end

 p.fromUTF8 = function (frame)
     local basein = frame.args['basein'] == 'dec' and 10 or 16
     local format = frame.args['base'] == 'dec' and '%d ' or '%02X '
     local bytes = {}
     for byte in mw.text.gsplit(frame.args[1], '%s') do
         table.insert(bytes, tonumber(byte, basein))
     end
     local chars = {mw.ustring.codepoint(string.char(unpack(bytes)), 1, -1)}
     -- Repeat the format once per character, drop the trailing space, then fill it in.
     return format:rep(#chars):sub(1, -2):format(unpack(chars))
 end

 return p
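For readers who want to follow what fromUTF8 computes without the mw.ustring library, the reverse direction can be sketched in plain Lua. This is an illustration under stated assumptions, not the module's code: decodeUTF8 is a hypothetical helper, it assumes well-formed UTF-8 input, and it performs no validation.

```lua
-- Decode a whitespace-separated list of UTF-8 byte values into code points.
-- Hypothetical helper; assumes well-formed input and does no validation.
local function decodeUTF8(text, basein)
    local bytes = {}
    for token in text:gmatch('%S+') do
        bytes[#bytes + 1] = tonumber(token, basein or 16)
    end
    local codepoints = {}
    local i = 1
    while i <= #bytes do
        local b = bytes[i]
        local cp, extra
        -- The lead byte determines how many continuation bytes follow.
        if b < 0x80 then
            cp, extra = b, 0
        elseif b < 0xE0 then
            cp, extra = b - 0xC0, 1
        elseif b < 0xF0 then
            cp, extra = b - 0xE0, 2
        else
            cp, extra = b - 0xF0, 3
        end
        -- Each continuation byte contributes its low 6 bits.
        for j = 1, extra do
            cp = cp * 0x40 + (bytes[i + j] - 0x80)
        end
        codepoints[#codepoints + 1] = cp
        i = i + extra + 1
    end
    return codepoints
end

local cps = decodeUTF8('F0 9F 8D 85 41')
print(('%X %X'):format(cps[1], cps[2]))     -- 1F345 41
print(decodeUTF8('240 159 141 133', 10)[1]) -- 127813
```

The second argument plays the role of basein: 16 by default, 10 when the byte values are given in decimal.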
