Module:Lang/data/iana languages/make

From Wikipedia, the free encyclopedia
This is the current revision of this page, as edited by Trappist the monk at 14:55, 10 July 2024 (fix module names).
Module documentation

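This is a crude tool that reads a local copy of an IANA language-subtag-registry file and extracts the information necessary to create the data tables held by:

  Module:Lang/data/iana languages
  Module:Lang/data/iana regions
  Module:Lang/data/iana scripts
  Module:Lang/data/iana suppressed scripts
  Module:Lang/data/iana variants
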
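The tool skips variant records that contain the text 'Deprecated', 'Preferred-Value', or 'Private use'. For language, script, and region records it skips only those that contain 'Private use'; deprecated language records are collected in a separate table.

At this writing, the tool extracts the subtag code and description(s) from language, script, region, and variant records, plus the Suppress-Script value from language records and the prefix(es) from variant records.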
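
As a rough illustration of the record format the tool works on, here is a minimal sketch in plain Lua. It assumes a tiny made-up sample in the registry's '%%'-delimited layout; split_records and sample_registry are hypothetical names that are not part of the module, and the sketch only pulls out the first Description line of each record.

```lua
-- Sketch only: sample_registry is illustrative input in the registry's record format
local sample_registry = table.concat({
	'%%',
	'Type: language',
	'Subtag: aa',
	'Description: Afar',
	'Added: 2005-10-16',
	'%%',
	'Type: variant',
	'Subtag: bohoric',
	'Description: Slovene in Bohorič alphabet',
	'Added: 2012-06-27',
	'Prefix: sl',
	'',											-- forces a trailing \n on the last line
}, '\n');

-- split_records is a hypothetical helper, not a function of the module
local function split_records(content)
	local records = {};
	for record in string.gmatch(content, '%%%%([^%%]+)') do	-- one '%%' delimited record at a time
		table.insert(records, {
			type = string.match(record, 'Type: (%w+)'),
			subtag = string.match(record, 'Subtag: (%w+)'),
			description = string.match(record, 'Description: ([^\n]+)'),	-- first description only
		});
	end
	return records;
end

local records = split_records(sample_registry);
for _, r in ipairs(records) do
	print(r.type, r.subtag, r.description);
end
```

The real tool goes further than this sketch: it handles multiple Description lines, continuation lines, Prefix lines, and the skip rules described above.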

Usage

To use this tool:

  1. Open a blank sandbox page and paste the following at the top:
    {{#invoke:Lang/data/iana languages/make|iana_extract}}
  2. Go to the current language-subtag-registry file (or any of the copies held by archive.org). Copy the whole file (or just as much as you need) and paste it into the sandbox page below the {{#invoke:}}.
  3. Click Show preview
  4. Wait
  5. Copy result

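The tool does some crude error checking: when it cannot find a subtag code in a record, it inserts an entry keyed "error" that contains the raw record into the output. There is no guarantee that such messages will be helpful. To find them, search for the word 'error' in the tool's output.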
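
To make the shape of normal and error entries concrete, here is a small sketch in plain Lua. make_entry is a hypothetical helper that imitates how the tool formats one table entry; the second input is a made-up record with no Subtag line.

```lua
-- make_entry is a hypothetical helper; it imitates the entry format, it is not the module's code
local function make_entry(code, descriptions, record)
	if code then
		return '["' .. code .. '"] = {' .. descriptions .. '}';
	end
	return '["error"] = {' .. record .. '}';	-- no subtag found: flag the raw record
end

local output = {
	make_entry('aa', '"Afar"'),
	make_entry(nil, nil, '\nType: language\nDescription: Example\n'),	-- made-up record without a Subtag line
};

-- equivalent of searching the tool's output for the word 'error'
local error_count = 0;
for _, entry in ipairs(output) do
	if string.find(entry, '["error"]', 1, true) then	-- plain-text search, no pattern matching
		error_count = error_count + 1;
	end
end

print(output[1]);
print(error_count);
```

Because an error entry embeds the raw record, the pasted result is not valid Lua until the offending entry is fixed or removed by hand.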

The above documentation is transcluded from Module:Lang/data/iana languages/make/doc.

 require('strict');


 --[=[------------------------< G E T _ V A R I A N T _ P A R T S >---------------------------------------------

 We get a record that looks more-or-less like this:
 	%%\n
 	Type: variant\n
 	Subtag: bohoric\n
 	Description: Slovene in Bohorič alphabet\n
 	Added: 2012-06-27\n
 	Prefix: sl\n

 Each line is terminated with a \n character.

 Type, for this function can only be 'variant'

 Subtag is the code of Type

 Prefix is a language code to which this variant applies; one language code per Prefix line. There can be
 more than one prefix line.

 Description associates Subtag with a proper name or names; one name per Description line. There can be more
 than one Description line and Description lines can wrap to the next line. When they do, the first two
 characters of the continuation line are spaces.

 Comments: lines can also be continued so once in a Comments line (which is otherwise ignored) all further
 continuations in the record are also ignored. This is a crude mechanism to prevent comment continuations
 from being concatenated onto the end of descriptions and relies on Description line occurring in the record
 before the Comments line.

 Records with private use subtags are ignored.

 ]=]

 local function get_variant_parts(record)
 	local code;
 	local descriptions = {};
 	local prefixes = {};
 	local in_comments = false;

 	if string.find(record, 'Deprecated', 1, true) or string.find(record, 'Preferred-Value', 1, true)
 		or string.find(record, 'Private use', 1, true) then
 			return 'skip';
 	end

 	for line in string.gmatch(record, '([^\n]+)\n') do	-- get a \n terminated line of text (without the \n)
 		local label = string.match(line, "(.-):")

 		if not label and string.find(line, '^ .+') and not in_comments then	-- if a continuation line but not a comments continuation
 			descriptions[#descriptions] = string.gsub(descriptions[#descriptions], '\"$', '');	-- remove trailing quote mark from previous description
 			descriptions[#descriptions] = descriptions[#descriptions] .. ' ' .. string.match(line, '^ (.+)') .. '\"';	-- extract and save the continuation with new quote mark
 		elseif label == 'Subtag' then	-- if this line is the subtag line
 			code = string.match(line, 'Subtag: (%w+)');	-- extract and save to subtag's code
 		elseif label == 'Description' then	-- if this line is a description line
 			local desc = string.match(line, 'Description: (.+)');	-- extract the description
 			desc = string.gsub(desc, '"', '\\"');	-- in case description contains quote marks (see 1959acad)
 			table.insert(descriptions, '\"' .. desc .. '\"');	-- save the description wrapped in quote marks
 		elseif label == 'Prefix' then	-- if this line is a prefix line
 			table.insert(prefixes, '\"' .. string.match(line, 'Prefix: (.+)'):lower() .. '\"');	-- extract and save the prefix wrapped in quote marks
 		elseif label == 'Comments' then	-- if this line is a comments line
 			in_comments = true;
 		end
 	end

 	return code, table.concat(prefixes, ', '), table.concat(descriptions, ', ');
 end


 --[=[------------------------< G E T _ L A N G _ S C R I P T _ R E G I O N _ P A R T S >-----------------------

 We get a record that looks more-or-less like this:
 	%%\n
 	Type: language\n
 	Subtag: aa\n
 	Description: Afar\n
 	Added: 2005-10-16\n


 Each line is terminated with a \n character.

 Type, for our purposes can be 'language', 'script', or 'region'

 Subtag is the code of Type

 Description associates Subtag with a proper name or names; one name per Description line. There can be more
 than one Description line and Description lines can wrap to the next line. When they do, the first two
 characters of the continuation line are spaces.

 Comments: lines can also be continued so once in a Comments line (which is otherwise ignored) all further
 continuations in the record are also ignored. This is a crude mechanism to prevent comment continuations
 from being concatenated onto the end of descriptions and relies on Description line occurring in the record
 before the Comments line.

 Records with private use subtags are ignored.

 ]=]

 local function get_lang_script_region_parts(record)
 	local code;
 	local suppress;	-- Suppress script for this code if specified
 	local deprecated;	-- boolean; true when subtag is deprecated
 	local descriptions = {};
 	local in_comments = false;

 	if record:find('Private use') then
 		return 'skip';
 	end

 	for line in record:gmatch('([^\n]+)\n') do	-- get a \n terminated line of text (without the \n)
 		local label = line:match('(.-):');
 		if 'Subtag' == label then	-- if this line is the subtag line
 			code = line:match('Subtag: (%w+)');	-- extract and save to subtag's code
 		elseif 'Description' == label then	-- if this line is a description line
 			table.insert(descriptions, '\"' .. line:match('Description: (.+)') .. '\"');	-- extract and save the name wrapped in quote marks
 		elseif 'Deprecated' == label then
 			deprecated = true;	-- subtag is deprecated; set our flag
 		elseif 'Suppress-Script' == label then
 			suppress = line:match('Suppress%-Script: (%S+)');
 		elseif 'Comments' == label then	-- if this line is a comments line
 			in_comments = true;
 		elseif line:find('^ .+') and not in_comments then	-- if a continuation line but not a comments continuation
 			descriptions[#descriptions] = descriptions[#descriptions]:gsub('\"$', '');	-- remove trailing quote mark from previous description
 			descriptions[#descriptions] = descriptions[#descriptions] .. ' ' .. line:match('^ (.+)') .. '\"';	-- extract and save the continuation with new quote mark
 		end
 	end

 	return code, table.concat(descriptions, ', '), suppress, deprecated;
 end


 --[=[------------------------< I A N A _ E X T R A C T >-------------------------------------------------------

 read a local copy of the IANA language-subtag-registry file and from it build tables to replace the tables in:
 	[[Module:Lang/data/iana languages]]
 	[[Module:Lang/data/iana regions]]
 	[[Module:Lang/data/iana scripts]]
 	[[Module:Lang/data/iana suppressed scripts]]
 	[[Module:Lang/data/iana variants]]

 current language-subtag-registry file can be found at: http://www.iana.org/assignments/language-subtag-registry
 archive.org has copies of previous versions see: https://web.archive.org/web/*/http://www.iana.org/assignments/language-subtag-registry

 ]=]

 local function iana_extract(frame)
 	local page = mw.title.getCurrentTitle();	-- get a page object for this page
 	local content = page:getContent();	-- get unparsed content
 	local lang_table = {};	-- languages go here
 	local lang_dep_table = {};	-- deprecated languages go here
 	local script_table = {};	-- scripts go here
 	local region_table = {};	-- regions go here
 	local variant_table = {};	-- variants go here
 	local suppress_table = {};	-- here we collect suppressed scripts and associated language codes
 	local iso_639_1_table = {};	-- ISO 639-1 languages; not used by Module:Lang but included here to ensure Module:Lang/data/ISO_639-1 gets updated
 	local file_date;	-- first line

 	local code;
 	local descriptions;
 	local prefixes;	-- used for language variants only
 	local suppress;	-- a code's suppress script
 	local deprecated;	-- boolean: true when subtag is deprecated

 	file_date = content:match('(File%-Date: %d%d%d%d%-%d%d%-%d%d)');	-- get the file date line from this version of the source file

 	for record in string.gmatch(content, '%%%%([^%%]+)') do	-- get a %% delimited 'record' from the file; leave off the delimiters
 		local record_type = string.match(record, 'Type: (%w+)')
 		if record_type == 'language' then	-- if a language record
 			code, descriptions, suppress, deprecated = get_lang_script_region_parts(record);	-- get the code, description(s), suppress script, and deprecated flag

 			if code and ('skip' ~= code) then
 				if deprecated then
 					table.insert(lang_dep_table, "[\"" .. code .. "\"] = {" .. descriptions .. "}");	-- make table entries
 				else
 					table.insert(lang_table, "[\"" .. code .. "\"] = {" .. descriptions .. "}");	-- make table entries
 					if 2 == code:len() then
 						table.insert(iso_639_1_table, "[\"" .. code .. "\"] = {" .. descriptions .. "}");	-- make table entries
 					end
 				end
 			elseif not code then
 				table.insert(lang_table, "[\"error\"] = {" .. record .. "}");	-- code should never be nil, but inserting an error entry in the final output can be helpful
 			end
 			-- here we collect suppress script tags and their associated language codes;
 			-- prettifying the data in this table must wait until all language codes have been read
 			if suppress then	-- if this code has a suppressed script
 				local suppressed_code = table.concat({'\"', code, '\"'});	-- wrap the code in quotes

 				if suppress_table[suppress] then	-- if there is an entry for this script
 					table.insert(suppress_table[suppress], suppressed_code);	-- insert the new code
 				else
 					suppress_table[suppress] = {};	-- add new script and empty table
 					table.insert(suppress_table[suppress], suppressed_code);	-- insert the new code
 				end
 			end

 		elseif record_type == 'script' then	-- if a script record
 			code, descriptions = get_lang_script_region_parts(record);	-- get the code and description(s)

 			if code and ('skip' ~= code) then
 				table.insert(script_table, "[\"" .. code .. "\"] = {" .. descriptions .. "}");	-- make table entries
 			elseif not code then
 				table.insert(script_table, "[\"error\"] = {" .. record .. "}");	-- code should never be nil, but ...
 			end

 		elseif record_type == 'region' then	-- if a region record
 			code, descriptions = get_lang_script_region_parts(record);	-- get the code and description(s)

 			if code and ('skip' ~= code) then
 				table.insert(region_table, "[\"" .. code .. "\"] = {" .. descriptions .. "}");	-- make table entries
 			elseif not code then
 				table.insert(region_table, "[\"error\"] = {" .. record .. "}");	-- code should never be nil, but ...
 			end

 		elseif record_type == 'variant' then	-- if a variant record
 			code, prefixes, descriptions = get_variant_parts(record);	-- get the code, prefix(es), and description(s)

 			if code and ('skip' ~= code) then
 				table.insert(variant_table,
 					table.concat({
 						"[\"",
 						code,
 						"\"] = {<br />&#9;&#9;[\"descriptions\"] = {",
 						descriptions,
 						"},<br />&#9;&#9;[\"prefixes\"] = {",
 						prefixes,
 						"},<br />&#9;&#9;}"
 					})
 				);
 			elseif not code then
 				table.insert(variant_table, "[\"error\"] = {" .. record .. "}");	-- code should never be nil, but ...
 			end
 		end
 	end
 	-- now prettify the suppressed script table
 	local pretty_suppressed = {};

 	for script, code_tbl in pairs(suppress_table) do
 		local LIMIT = 11;	-- max number of subtags on a line before a line break
 		local fragment_tbl = {};	-- groups of LIMIT number of subtags collected here

 		for i = 1, #code_tbl, LIMIT do
 			local stop = ((i + LIMIT - 1) > #code_tbl) and #code_tbl or i + LIMIT - 1;	-- calculate a table.concat stop position
 			table.insert(fragment_tbl, table.concat(code_tbl, ', ', i, stop));	-- get the fragment and save it
 		end

 		table.insert(pretty_suppressed,	-- and make all pretty
 			table.concat({'[\"', script, '\"] = {', table.concat(fragment_tbl, ',\n\t\t\t\t'), '}'})
 		);
 	end
 	table.sort(pretty_suppressed);

 	-- make final output pretty
 	return '<br /><pre>------------------------------< I A N A L A N G U A G E S >--------------------------------------------------<br />--' ..
 		file_date .. "<br />local active = {<br />&#9;" .. table.concat(lang_table, ',<br />&#9;') .. "<br />&#9;}<br /><br />" ..
 		"local deprecated = {<br />&#9;" .. table.concat(lang_dep_table, ',<br />&#9;') .. "<br />&#9;}<br /><br />" ..
 		"return {<br />&#9;active = active,<br />&#9;deprecated = deprecated,<br />&#9;}<br /><br />" ..
 		'------------------------------< I A N A S C R I P T S >------------------------------------------------------<br />--' ..
 		file_date .. "<br />return {<br />&#9;" .. table.concat(script_table, ',<br />&#9;') .. "<br />&#9;}<br /><br />" ..
 		'------------------------------< I A N A R E G I O N S >------------------------------------------------------<br />--' ..
 		file_date .. "<br />return {<br />&#9;" .. table.concat(region_table, ',<br />&#9;') .. "<br />&#9;}<br /><br />" ..
 		'------------------------------< I A N A V A R I A N T S >----------------------------------------------------<br />--' ..
 		file_date .. "<br />return {<br />&#9;" .. table.concat(variant_table, ',<br />&#9;') .. "<br />&#9;}<br /><br />" ..
 		'------------------------------< I A N A S U P P R E S S E D S C R I P T S >--------------------------------<br />--' ..
 		file_date .. "<br />return {<br />&#9;" .. table.concat(pretty_suppressed, ',<br />&#9;') .. "<br />&#9;}<br /><br />" ..
 		'------------------------------< I S O 6 3 9 - 1 >------------------------------------------------------------<br />--' ..
 		file_date .. "<br />return {<br />&#9;" .. table.concat(iso_639_1_table, ',<br />&#9;') .. "<br />&#9;}<br /><br />" .. "</pre>";
 end


 --[[--------------------------< E X P O R T E D F U N C T I O N >--------------------------------------------
 ]]

 return {
 	iana_extract = iana_extract,
 	}
