Wikipedia:Scripts/mwlink

From Wikipedia, the free encyclopedia

This Ruby program has two modes. It can run as a daemon or text processor (daemon mode is preferred, since it's more efficient).

In text-scanning mode, it interprets its command line (or stdin if no command line is given) as text possibly containing [[wikilinks]]. It preserves the original text and, after each wikilink, adds a plain-text hyperlink (the http: address enclosed in angle brackets, < and >).
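As a rough illustration of that behaviour, here is a minimal sketch in Ruby. It is not the script's own code: the method name annotate is hypothetical, and the sketch assumes every link points at the English Wikipedia, ignoring namespaces, language prefixes and other wikis.

```ruby
require 'cgi'

# Simplified pattern for a [[wikilink]]; the script's own pattern is stricter.
WIKILINK = /\[\[([^\]]+)\]\]/

# Hypothetical helper: keeps the text as-is and appends <URL> after each link.
# Assumption: every link targets en.wikipedia.org.
def annotate(text)
  text.gsub(WIKILINK) do |match|
    # Drop any "|label" part, then turn spaces into underscores.
    title = Regexp.last_match(1).sub(/\|.*$/, '').strip.tr(' ', '_')
    "#{match} <http://en.wikipedia.org/wiki/#{CGI.escape(title)}>"
  end
end

puts annotate('See [[Main Page]] for details.')
```

The original text survives untouched; only the bracketed address is added after each link.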

In daemon mode, it receives HTTP requests like http://localhost:4242/mwlink?page=wiki-page-name and redirects to the appropriate Wikimedia page. It's convenient for scripts to just use that URL rather than constructing one themselves: all they have to do is URL-escape the text between [[ and ]].
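A client script might build such a request URL along these lines. This is only a sketch: the method name mwlink_url is hypothetical, and the host and port defaults are assumptions taken from the example URL above.

```ruby
require 'cgi'

# Hypothetical helper: builds the daemon URL for the text found between
# [[ and ]]. Host and port are assumed to match the example above.
def mwlink_url(link_text, host = 'localhost', port = 4242)
  "http://#{host}:#{port}/mwlink?page=#{CGI.escape(link_text)}"
end

puts mwlink_url('Ashland (disambiguation)')
# prints http://localhost:4242/mwlink?page=Ashland+%28disambiguation%29
```

CGI.escape does all the work here, so the caller never needs to know how namespaces, languages or wikis map to hosts.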

#!/usr/bin/ruby
# This script is dual-licensed under the GPL version 2 or any later
# version, at your option. See http://www.gnu.org/licenses/gpl.txt for more
# details.
=begin
= NAME
mwlink - Linkify mediawiki-style wikilinks in plain text
= SYNOPSIS
  mwlink [options] [text-to-wikilink]
    --daemon[=port]      Run as HTTP daemon
    --encoding           Default character set encoding (utf-8)
    --default-wiki       Default wiki (wikipedia)
    --default-language   Default language (en)
= DESCRIPTION
In text-scanning mode (without the --daemon argument), the mwlink program scans
its arguments (or its standard input, in the event of no arguments) for
wikilinks of the form [[link]]. It expands such links into URLs and inserts
them into the original text after the [[link]] in sharp braces ((({<})) and
(({>}))). Options are provided for specifying a default wiki (the wiki to link
to if no qualifier is given in the link) and a default language (the language
to assume if no qualifier is given) as well as the character set encoding in
use. The built-in defaults are ((*wikipedia*)), ((*en*)) and ((*utf-8*)),
respectively.

In daemon mode (now preferred), it receives HTTP requests of the form
"http://.../mwlink?page=((*wikipedia page*))" (the ((*wikipedia page*)) name is
what would appear within a [[wikilink]]). URL-escaping is required but no other
processing, making it convenient to use from scripts.
== Initialization File
The names of namespaces vary between wikis, especially by language. For
example, "User:" in English is "Benutzer:" in German. You can
specify lists of namespaces to use for particular languages in an
initialization file (({~/.mwlinkrc})). Each entry is simply a line with the
language, a colon, and a space-separated list of namespaces in that
language. When interpreting links for that language (either because
((*--default-language*)) was specified or there is a language qualifier in
the link), mwlink will recognize those names as namespaces. All the
namespaces must appear on one line; line continuation is not supported.
Lines introduced with (({#})) (pound sign) are comments and are ignored,
along with blank lines.

Here is an example configuration containing (only) some namespaces from the
German Wikipedia. ((*Note*)): To be kind to the wiki when this script is
uploaded, I have broken the line, but it ((*must not be broken*)) in order
to work with mwlink.

  de: Spezial Spezial_diskussion Diskussion Benutzer Benutzer_diskussion
  Bild Bild_diskussion Einordnung Einordnung_diskussion Wikipedia
  Wikipedia_talk WP Hilf Hilf_diskussion
= WARNINGS
* The program (like mediawiki) assumes links are not broken across line
  boundaries.
* The mechanism for providing an alternate list of namespaces only works
  per-language; other wikis could have different namespaces, too.
* The list of wikis and their abbreviations is doubtlessly incomplete.
* The initialization file mechanism is not that useful for a shared daemon.
* In command-line mode, it's very difficult to process ASCII em-dashes (--)
  correctly and still honor command-line options. mwlink gets it wrong, and
  that's one reason daemon mode is preferred.
= AUTHOR
Demi @ Wikipedia - http://en.wikipedia.org/wiki/User:Demi
=end
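require 'cgi'
# Note: iconv was removed from Ruby's standard library in Ruby 2.0; on newer
# Rubies this require needs the separately installed iconv gem.
require 'iconv'
require 'getoptlong'
require 'webrick'
include WEBrick

$opt = {
  'default-wiki' => 'wikipedia',
  'default-language' => 'en',
  'encoding' => 'utf-8'
}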
class String
  def initcap()
    new = self.dup
    # Okay, I consider it dumb that a string subscripted produces an
    # integer --Demi
    new[0] = new[0].chr.upcase
    return new
  end

  def initcap!()
    self[0] = self[0].chr.upcase
    return self
  end
end
class Canon
  def initialize()
    @ns = {}
    @ns_array = %w(Media Special Talk User User_talk Project Project_talk
      Image Image_talk MediaWiki MediaWiki_talk Template Template_talk Help
      Help_talk Category Category_talk Wikipedia Wikipedia_talk WP)
    @ns['default'] = {}
    @ns_array.each { |nspc| @ns['default'][nspc] = nspc }
    if File::readable?(ENV['HOME'] + '/.mwlinkrc')
      IO::foreach(ENV['HOME'] + '/.mwlinkrc') { |line|
        next if line =~ /^\s*\#/
        next if line =~ /^\s*$/
        line.chomp!
        if m = line.match(/^(\w+)\:(.*)$/)
          lang = m[1]
          nslist = m[2].split
          @ns[lang] = {}
          nslist.each { |nspc| @ns[lang][nspc] = nspc }
        end
      }
    end
    @wiki = {
      'Wiktionary' => 'wiktionary',
      'Wikt' => 'wiktionary',
      'W' => 'wikipedia',
      'M' => 'meta',
      'N' => 'news',
      'Q' => 'quote',
      'B' => 'books',
      'Meta' => 'meta',
      'Wikibooks' => 'books',
      'Commons' => 'commons',
      'Wikisource' => 'source'
    }
    @wikispec = {
      'wikipedia' => { 'domain' => 'wikipedia.org', 'lang' => 1 },
      'wiktionary' => { 'domain' => 'wiktionary.org', 'lang' => 1 },
      'meta' => { 'domain' => 'meta.wikimedia.org', 'lang' => 0 },
      'books' => { 'domain' => 'wikibooks.org', 'lang' => 1 },
      'commons' => { 'domain' => 'commons.wikimedia.org', 'lang' => 0 },
      'source' => { 'domain' => 'sources.wikimedia.org', 'lang' => 0 },
      'news' => { 'domain' => 'wikinews.org', 'lang' => 1 },
    }
    @cs = Iconv.new("iso-8859-1", $opt['encoding'])
  end
  #TODO The % part of the # section of the URL should become a dot.
  def urlencode(s)
    CGI::escape(s).gsub(/%3[Aa]/, ':').gsub(/%2[Ff]/, '/').gsub(/%23/, '#')
  end

  def canonword(word)
    s = word.strip.squeeze(' ').tr(' ', '_').initcap
    begin
      @cs.iconv(s)
    rescue Iconv::IllegalSequence
      s
    end
  end
  def parselink(link)
    l = {
      'namespace' => '',
      'language' => $opt['default-language'],
      'wiki' => $opt['default-wiki'],
      'title' => ''
    }
    terms = link.split(':')
    l['title'] = canonword(terms.pop)
    terms.each { |term|
      next if term.nil? or term.empty?
      t = canonword(term)
      if @ns[l['language']]
        ns = @ns[l['language']]
      else
        ns = @ns['default']
      end
      if ns.key?(t)
        l['namespace'] = ns[t]
      elsif @wiki.key?(t)
        l['wiki'] = @wiki[t]
      else
        l['language'] = t.downcase
      end
    }
    l
  end
  def canonicalize(link)
    linkdesc = parselink(link.sub(/\|.*$/, ''))
    if @wikispec.key?(linkdesc['wiki'])
      ws = @wikispec[linkdesc['wiki']]
      host = ws['domain']
      if ws['lang'] != 0
        host = linkdesc['language'] + '.' + host
      end
    else
      host = linkdesc['wiki'] + '.' + 'wikimedia.org'
    end
    uri =
      if linkdesc['namespace'].length > 0
        linkdesc['namespace'] + ':' + linkdesc['title']
      else
        linkdesc['title']
      end
    r = urlencode('http://' + host + '/wiki/' + uri)
    r
  end

  def to_s()
    "Namespace sets: " + @ns.keys.join(', ') +
      "; Wikis: " + @wiki.to_a.join(', ')
  end
end
def linkexpand(c, bracketlink)
  linktext =
    if m = /\[\[([^\]]+)\]\]/.match(bracketlink)
      m[1]
    else
      bracketlink
    end
  bracketlink +
    " <" + c.canonicalize(linktext) + ">"
end

c = Canon.new()
re = /\[\[\s*[^\s\\][^\]]+\]\]/
class MwlinkServlet < HTTPServlet::AbstractServlet
  def initialize(server, canonicalizer)
    super(server)
    @c = canonicalizer
  end

  def do_GET(rq, rs)
    p = CGI.parse(rq.query_string)
    # Just for testing
    l = @c.canonicalize(p['page'][0])
    rs.status = 302
    rs['Location'] = l
    rs.body = "<html><body>\n" +
      "<a href=\"#{l}\">#{p['page'][0]}</a>\n" +
      "</body></html>\n"
  end
end
begin
  GetoptLong::new(
    ['--default-wiki', GetoptLong::REQUIRED_ARGUMENT],
    ['--default-language', GetoptLong::REQUIRED_ARGUMENT],
    ['--encoding', GetoptLong::REQUIRED_ARGUMENT],
    ['--daemon', GetoptLong::OPTIONAL_ARGUMENT]
  ).each do |k, v|
    k = k.sub(/^--/, '')
    case k
    when 'default-wiki', 'default-language', 'encoding'
      $opt[k] = v
    when 'daemon'
      $opt['daemon'] = true
      if v.empty?
        $opt['port'] = 4242
      else
        $opt['port'] = v
      end
    end
  end
rescue GetoptLong::InvalidOption
  true
end
if $opt['daemon']
  port = $opt['port'].to_i
  puts "Starting daemon on port #{port}"
  s = HTTPServer.new(:Port => port)
  s.mount("/mwlink", MwlinkServlet, c)
  trap('INT') { s.shutdown }
  s.start
else
  # Note, there are various combinations of -- appearing in normal text that
  # will break this. --daemon is the recommended method.
  if ARGV.empty?
    STDIN.each_line { |line|
      puts line.chomp.gsub(re) { |expr| linkexpand(c, expr) }
    }
  else
    puts ARGV.join(' ').gsub(re) { |expr| linkexpand(c, expr) }
  end
end
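
The ~/.mwlinkrc format described in the script's documentation (one line per language: the language, a colon, then a space-separated list of namespaces) can be read along the following lines. This is a standalone sketch rather than the script's own loader; the method name parse_mwlinkrc and the sample text are made up for illustration.

```ruby
# Hypothetical helper: turns the text of an rc file into a hash that maps a
# language code to its list of namespace names.
def parse_mwlinkrc(text)
  namespaces = {}
  text.each_line do |raw|
    line = raw.strip
    # Blank lines and lines starting with '#' are ignored.
    next if line.empty? || line.start_with?('#')
    m = line.match(/\A(\w+):(.*)\z/)
    namespaces[m[1]] = m[2].split if m
  end
  namespaces
end

# Sample content (an assumption, shortened from the German example).
sample = "# German namespaces\n\nde: Benutzer Benutzer_diskussion Diskussion\n"
p parse_mwlinkrc(sample)
```

Because each language must fit on one line, a line that does not match the language-colon pattern is simply skipped here.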

Example output:

 [[Ashland (disambiguation)]] is an example of a
 [[Wikipedia:Disambiguation]] page.
 [[Ashland (disambiguation)]] <http://en.wikipedia.org/wiki/Ashland_%28disambiguation%29> is an example of a
 [[Wikipedia:Disambiguation]] <http://en.wikipedia.org/wiki/Wikipedia:Disambiguation> page.
 GET http://localhost:4242/mwlink?page=Ashland+%28disambiguation%29
 GET http://localhost:4242/mwlink?page=Ashland+%28disambiguation%29 --> 302 Found
 GET http://en.wikipedia.org/wiki/Ashland_%28disambiguation%29 --> ...(page content)

The GET program is a utility distributed with Perl's libwww-perl (LWP). Note that Wikimedia servers forbid scripts based on the LWP Perl module.
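
To see where the redirect target in the example comes from, the title handling can be approximated as follows. This is a simplified sketch, not the script's code: the method names canonical_title and wiki_url are hypothetical, the host is assumed to be en.wikipedia.org, and the character-set conversion and namespace lookup are left out.

```ruby
require 'cgi'

# Hypothetical helper: trim, collapse runs of spaces, use underscores, and
# capitalize the first character, as MediaWiki titles expect.
def canonical_title(title)
  s = title.strip.squeeze(' ').tr(' ', '_')
  s.empty? ? s : s[0].upcase + s[1..-1]
end

# Hypothetical helper: escape the whole URL, then restore ':', '/' and '#'.
# Assumption: the target is always the given host (English Wikipedia here).
def wiki_url(title, host = 'en.wikipedia.org')
  raw = "http://#{host}/wiki/#{canonical_title(title)}"
  CGI.escape(raw).gsub(/%3A/i, ':').gsub(/%2F/i, '/').gsub(/%23/, '#')
end

puts wiki_url('Ashland (disambiguation)')
# prints http://en.wikipedia.org/wiki/Ashland_%28disambiguation%29
```

Escaping first and restoring the separators afterwards keeps the parentheses encoded while leaving the scheme, path and any namespace colon readable.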
