Re: Parsing big compressed XML files


On Apr 4, 2014, at 12:00 AM, Valerio Schiavoni <valerio.schiavoni@gmail.com> wrote:
> 18 hours is the cumulative time for _all_ the files , not 18 hours per file :-)
Aha… makes more sense… ok, so, as of April 4th, there were 161 'pages-meta-history' files, ranging in size from 80 MB to 31 GB…
Looking at the largest compressed file, it takes a whopping 5 hours to inflate on my consumer-grade system:
$ time bzcat < enwiki-20140304-pages-meta-history16.xml-p005043453p005137507.bz2 > /dev/null
real	309m38.471s
user	305m0.095s
sys	1m55.005s
A bit overwhelming for my little setup. I hope you have a big hardware budget :D
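
As a rough sanity check on those timings, here is a small sketch that converts the three `time` figures to seconds and prints the wall-clock hours plus the share of that time spent on the CPU. The helper name `to_seconds` is made up for illustration, and the snippet assumes the `NmS.SSSs` format shown above. If user + sys comes out close to real, the run was most likely CPU-bound rather than waiting on disk.

```shell
# Hypothetical helper: convert a `time`-style duration such as
# "309m38.471s" into seconds. Assumes the "<min>m<sec>s" layout.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Figures copied from the bzcat run above.
real=$(to_seconds 309m38.471s)
user=$(to_seconds 305m0.095s)
sys=$(to_seconds 1m55.005s)

# Wall-clock time in hours, and (user + sys) as a percentage of it.
awk -v r="$real" -v u="$user" -v s="$sys" 'BEGIN {
  printf "wall clock: %.2f hours\n", r / 3600
  printf "cpu share:  %.1f%%\n", (u + s) / r * 100
}'
```

This only does arithmetic on the numbers already posted; it does not measure anything new.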
