Data dumps/xml2sql
NOTE: This is not the recommended method of importing XML dumps.
See mw:Manual:Importing XML dumps for an overview.
xml2sql is a tool that converts the XML dumps available at http://download.wikimedia.org/ into SQL dumps that can be imported with mysql, mysqlimport or psql.
This tool is written in ANSI C. Compiling it requires expat and zlib. It was developed on Linux and also works on FreeBSD, NetBSD, MacOS X, and Windows. Feel free to use it. :)
Download
- xml2sql-0.5.tar.gz (source code) 2006-02-08
- MD5SUM: 8a1d905636900e3ea07055dd645276f8
- SHA1SUM: ad4ccb37ccbef1a682a86e4b929b43ac0f578744
- xml2sql-0.5-win32.zip (win32 executable) 2006-02-08
- MD5SUM: 9665424dc6d6f5abf6241298e727a5a3
- SHA1SUM: 403bc96a1f679259bcd904f7c9c9bae92252a266
- GitHub: mediawiki-xml2sql
Patch for recent versions of MediaWiki (>= 1.10)
Since MediaWiki 1.10, the revision table has two new fields (rev_len, rev_parent_id), so the expected format changed slightly. Apply this patch to make xml2sql work again:
--- xml2sql-0.5/xml2sql.c	2008-01-16 15:32:28.000000000 +0100
+++ xml2sql-0.5 (2)/xml2sql.c	2008-02-17 15:06:34.000000000 +0100
@@ -741,6 +741,10 @@
     putcolumnf(&rev_tbl, "%d", revision.minor);
     /* rev_deleted */
     putcolumn(&rev_tbl, "0", 0);
+
+    putcolumn(&rev_tbl, "NULL", 0);
+    putcolumn(&rev_tbl, "NULL", 0);
+
     finrecord(&rev_tbl);
 
     if(page.lastts == 0 || strcmp(page.lastts, revision.timestamp) < 0) {
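The patch simply appends two NULL values after rev_deleted, one for each new column. The sketch below illustrates that idea in isolation. struct row, row_init(), row_put() and put_rev_tail() are hypothetical stand-ins for xml2sql's rev_tbl and putcolumn(), not its actual code, and the sketch assumes a comma-separated VALUES tuple as in the -m INSERT format, where NULL is a plain SQL keyword.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical row buffer standing in for xml2sql's rev_tbl. */
struct row {
    char buf[512];
    size_t len;   /* bytes used in buf, excluding the terminator */
    int ncols;    /* number of fields appended so far */
};

void row_init(struct row *r)
{
    r->buf[0] = '\0';
    r->len = 0;
    r->ncols = 0;
}

/* Append one comma-separated field; returns 0 on success, -1 if full. */
int row_put(struct row *r, const char *value)
{
    size_t room;
    int n;

    assert(r != NULL && value != NULL);
    room = sizeof r->buf - r->len;
    n = snprintf(r->buf + r->len, room, "%s%s", r->ncols ? "," : "", value);
    if (n < 0 || (size_t)n >= room) {
        r->buf[r->len] = '\0';   /* undo any truncated partial write */
        return -1;
    }
    r->len += (size_t)n;
    r->ncols++;
    return 0;
}

/* Tail of a revision row as in the patched code: rev_deleted, then
   (for MediaWiki >= 1.10) NULL for rev_len and rev_parent_id. */
int put_rev_tail(struct row *r, int mw_1_10)
{
    if (row_put(r, "0") != 0)            /* rev_deleted */
        return -1;
    if (mw_1_10 && (row_put(r, "NULL") != 0 || row_put(r, "NULL") != 0))
        return -1;
    return 0;
}
```

Calling row_put(&r, "42") and then put_rev_tail(&r, 1) yields the tuple 42,0,NULL,NULL, two fields longer than the pre-1.10 layout would be.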
Install
*nix, MacOS
The source package contains a standard `configure' script. Just extract the package and run make. (On *BSD, you may need to pass the --with-expat=/usr/local option to configure.)
(On Debian etch, you need gcc, libc6-dev, expat and libexpat1-dev.)
$ ./configure
$ make
# make install
Windows
A Win32 executable is now available. Download it and unzip it.
Easy to use
$ wget http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-meta-current.xml.bz2
$ bunzip2 -c enwiki-20080103-pages-meta-current.xml.bz2 | xml2sql
$ mysqlimport -u root -p --local dbname `pwd`/{page,revision,text}.txt
Note: the last command might not work unless the database has already been initialized with the correct tables. The way to do this is to install the MediaWiki software before doing the import.
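mysqlimport picks the target table from each file name by stripping the extension, which is why page.txt, revision.txt and text.txt are loaded into the page, revision and text tables, and why those tables must already exist. The following sketch approximates that mapping for Unix-style paths. table_from_path() is a hypothetical helper, and it strips only the text after the last dot, which may differ from mysqlimport's own handling of names that contain several dots.

```c
#include <assert.h>
#include <string.h>

/* Simplified approximation of mysqlimport's file-to-table mapping:
   keep the part after the last '/', then drop everything from the
   last '.' on. Returns 0 on success, -1 if out is too small. */
int table_from_path(const char *path, char *out, size_t cap)
{
    const char *base;
    const char *dot;
    size_t len;

    assert(path != NULL && out != NULL);
    base = strrchr(path, '/');
    base = base ? base + 1 : path;          /* strip directories */
    dot = strrchr(base, '.');
    len = dot ? (size_t)(dot - base) : strlen(base);  /* strip extension */
    if (len + 1 > cap)
        return -1;
    memcpy(out, base, len);
    out[len] = '\0';
    return 0;
}
```

For example, /home/me/dump/page.txt maps to page, so a database without a page table cannot accept that file.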
Windows
The GUI frontend can decompress gzip, bzip2 and 7-zip archives. Run xml2sql-fe.exe, choose the XML file and options, optionally choose an output directory, and then press the "START!!" button.
Reference
- usage: xml2sql [options]... [XMLFILE]
Reads a MediaWiki XML dump file from XMLFILE (or standard input) and outputs an SQL dump for MediaWiki 1.5 or later.
Options
| Option | Description |
|---|---|
| -i, --import | mysqlimport format (default). Output files are page.txt, revision.txt and text.txt; import them with the mysqlimport program. |
| -m, --mysql | MySQL's INSERT format. Output files are page.sql, revision.sql and text.sql; import them with the mysql program. |
| -p, --postgresql[=version] | PostgreSQL's COPY format. Output files are page.sql, revision.sql and text.sql. If the version is omitted, PostgreSQL 8.0 or earlier is assumed. Import them with the psql program. |
| -c, --compress[={old,full}] | Compress the text table with deflate (default: old). This option is ignored when the output format is PostgreSQL, because PostgreSQL compresses table data itself. |
| -r, --renumber | Renumber page IDs and revision IDs. |
| -N, --namespace=ns,ns,... | Output only the specified namespaces. Namespaces can be given by number or by name. |
| -t, --no-text | Exclude the text table. |
| -o, --output-dir=OUTDIR | Output directory (default: current directory). |
| -t, --tmpdir=TMPDIR | Temporary directory (default: OUTDIR). A temporary file is used only with --compress=old. |
| -v, --verbose | Show progress. |
| -h, --help | Display help and exit. |
| --version | Display version information and exit. |
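A command-line parser for most of these options might look like the following sketch using getopt_long(), which glibc and the BSD C libraries provide but which is not part of ANSI C. struct options, parse_options() and OPT_TMPDIR are hypothetical names, not xml2sql's code. Because the table lists -t for both --no-text and --tmpdir, this sketch gives the short -t to --no-text and accepts --tmpdir in long form only; -N, -h and --version are left out. It also assumes that no compression is done unless -c is given.

```c
#include <assert.h>
#include <getopt.h>
#include <string.h>

enum out_format { FMT_IMPORT, FMT_MYSQL, FMT_POSTGRESQL };
enum compress_mode { COMPRESS_NONE, COMPRESS_OLD, COMPRESS_FULL };

/* Hypothetical option record for this sketch only. */
struct options {
    enum out_format format;
    const char *pg_version;      /* NULL: 8.0 or earlier assumed */
    enum compress_mode compress; /* assumption: none unless -c given */
    int renumber;
    int no_text;
    int verbose;
    const char *output_dir;
    const char *tmpdir;          /* defaults to output_dir */
    const char *xmlfile;         /* NULL: read standard input */
};

enum { OPT_TMPDIR = 256 };       /* long-only option in this sketch */

static const struct option long_opts[] = {
    {"import",     no_argument,       NULL, 'i'},
    {"mysql",      no_argument,       NULL, 'm'},
    {"postgresql", optional_argument, NULL, 'p'},
    {"compress",   optional_argument, NULL, 'c'},
    {"renumber",   no_argument,       NULL, 'r'},
    {"no-text",    no_argument,       NULL, 't'},
    {"output-dir", required_argument, NULL, 'o'},
    {"tmpdir",     required_argument, NULL, OPT_TMPDIR},
    {"verbose",    no_argument,       NULL, 'v'},
    {NULL, 0, NULL, 0}
};

/* Returns 0 on success, -1 on an unknown option or bad argument. */
int parse_options(int argc, char *argv[], struct options *o)
{
    static const struct options defaults = {
        FMT_IMPORT, NULL, COMPRESS_NONE, 0, 0, 0, ".", NULL, NULL
    };
    int c;

    assert(argv != NULL && o != NULL);
    *o = defaults;
    while ((c = getopt_long(argc, argv, "imp::c::rto:v", long_opts, NULL)) != -1) {
        switch (c) {
        case 'i': o->format = FMT_IMPORT; break;
        case 'm': o->format = FMT_MYSQL; break;
        case 'p': o->format = FMT_POSTGRESQL; o->pg_version = optarg; break;
        case 'c':                /* -c alone means "old" */
            if (optarg == NULL || strcmp(optarg, "old") == 0)
                o->compress = COMPRESS_OLD;
            else if (strcmp(optarg, "full") == 0)
                o->compress = COMPRESS_FULL;
            else
                return -1;
            break;
        case 'r': o->renumber = 1; break;
        case 't': o->no_text = 1; break;
        case 'o': o->output_dir = optarg; break;
        case OPT_TMPDIR: o->tmpdir = optarg; break;
        case 'v': o->verbose = 1; break;
        default: return -1;      /* '?' for unknown options */
        }
    }
    if (o->tmpdir == NULL)
        o->tmpdir = o->output_dir;   /* TMPDIR defaults to OUTDIR */
    o->xmlfile = (optind < argc) ? argv[optind] : NULL;
    return 0;
}
```

With the arguments xml2sql -m --compress=full -o out dump.xml, parse_options() selects the INSERT format and full compression, sets both the output and temporary directories to out, and takes dump.xml as the input file.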
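The -N option takes a comma-separated mix of namespace numbers and names. A hypothetical parser for such a list is sketched below. parse_namespaces(), ns_lookup() and MAX_NS are illustrative names, and the name table holds only a few canonical English namespace names as an assumption, since real dumps declare the site's own names in their <siteinfo> section.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NS 32   /* hypothetical upper bound for this sketch */

/* Assumed subset of canonical English namespace names. */
static const struct { const char *name; int key; } ns_names[] = {
    {"Talk", 1}, {"User", 2}, {"User talk", 3}, {"Project", 4},
    {"Image", 6},            /* "File" in later MediaWiki versions */
    {"MediaWiki", 8}, {"Template", 10}, {"Help", 12}, {"Category", 14},
};

/* Resolve one token of length len to a namespace key, or -1. */
static int ns_lookup(const char *tok, size_t len)
{
    char *end;
    long n;
    size_t i;

    if (len == 0)
        return -1;
    n = strtol(tok, &end, 10);
    if (end == tok + len)                  /* whole token is a number */
        return (n >= 0 && n < MAX_NS) ? (int)n : -1;
    for (i = 0; i < sizeof ns_names / sizeof ns_names[0]; i++)
        if (strlen(ns_names[i].name) == len &&
            strncmp(ns_names[i].name, tok, len) == 0)
            return ns_names[i].key;
    return -1;
}

/* Parse a list such as "0,Talk,10" into selected[]; returns the number
   of distinct namespaces selected, or -1 on an unknown entry. */
int parse_namespaces(const char *list, int selected[MAX_NS])
{
    int count = 0;

    assert(list != NULL && selected != NULL);
    memset(selected, 0, MAX_NS * sizeof selected[0]);
    for (;;) {
        size_t len = strcspn(list, ",");   /* length of the next token */
        int key = ns_lookup(list, len);
        if (key < 0)
            return -1;
        if (!selected[key]) {
            selected[key] = 1;
            count++;
        }
        if (list[len] == '\0')
            break;
        list += len + 1;                   /* skip past the comma */
    }
    return count;
}
```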
COPYRIGHT
xml2sql, MediaWiki XML to SQL converter.
Copyright © Tietew.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
See also
- Data dumps - database dump download and import.