[Wikireader] Error on processing the German Wikipedia
David Reyes Samblas Martinez
david at tuxbrain.com
Fri Nov 20 15:45:38 CET 2009
Don't hold your breath :( failing at Count: 832000
David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora, Arduino
Hey, watch out!!! There's a linux in your pocket!!!
2009/11/20 Tilman Baumann <tilman at baumann.name>:
>> David Reyes Samblas Martinez wrote:
>> Well, the Spanish one gave me the same error before, but now it works,
> Any idea what solved it? Or is it just random and will go away if I try it
> again? :)
>>> I'm parsing the German Wikipedia right now (Count: 173000), let's see what
>>> happens :)
>> I would definitely be interested in the results...
>>> Note: parsing the 2009-Nov-11 dump:
>> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2
>>>> Regards
>>>> David Reyes Samblas Martinez
>>>>>>>>>> 2009/11/20 Tilman Baumann <tilman at baumann.name>:
>>> Can you reproduce this with a neutral locale?
>>> export LC_ALL=C
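The neutral-locale check suggested above can also be scripted. This is a minimal Python 3 sketch, not part of the wikireader tools: the helper name `run_with_c_locale` is hypothetical, and it only sets `LC_ALL=C` for the child process, so the calling shell is left unchanged.

```python
import os
import subprocess
import sys


def run_with_c_locale(cmd):
    """Run cmd with LC_ALL=C and return the CompletedProcess.

    Hypothetical helper for illustration; only the child's environment
    is modified, the parent's locale settings stay as they are.
    """
    env = dict(os.environ, LC_ALL="C")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Example: confirm that the child process really sees the neutral locale.
result = run_with_c_locale(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
)
print(result.stdout.strip())
```

The same helper could be pointed at the `make ... parse` invocation from this thread, in which case tool messages (awk, make) should also come out in English, which makes the logs easier to compare.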
>>> I'm trying the same at the moment. I had a lot of hiccups, caused by many
>>> things, among them missing tools and not enough memory.
>>>>>> This is currently where I'm stuck with the German wikipedia.
>>>>>> Count: 823000
>>> Count: 824000
>>> Count: 825000
>>> Count: 826000
>>> Count: 827000
>>> Count: 828000
>>> Count: 829000
>>> Count: 830000
>>> Count: 831000
>>> Count: 832000
>>> Count: 833000
>>> Traceback (most recent call last):
>>>   File "./ArticleParser.py", line 203, in <module>
>>>     main()
>>>   File "./ArticleParser.py", line 168, in main
>>>     process_article_text(title.encode('utf-8'), f.read(length), newf)
>>>   File "./ArticleParser.py", line 197, in process_article_text
>>>     newf.write(text + '\n')
>>> IOError: [Errno 32] Broken pipe
>>> make[1]: *** [parse] Error 1
>>> make[1]: Leaving directory
>>> `/home/tilli/wikireader/host-tools/offline-renderer'
>>> make: *** [parse] Error 2
>>>>>> I suppose it failed somewhere in PARSER_COMMAND
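Some background on the `Broken pipe` error above: errno 32 (EPIPE) is raised in the writing process when the process on the reading end of the pipe has already gone away, so the real failure is usually in the reader, and its exit status is the thing to look at. The following is a minimal Python 3 sketch of that situation, assuming the parser output is piped into a child command. The `feed_child` helper and the child commands are hypothetical and are not taken from ArticleParser.py or its Makefile.

```python
import errno
import subprocess
import sys


def feed_child(cmd, lines):
    """Write lines to cmd's stdin.

    Returns (broken, status): broken is True if the pipe broke while
    writing, status is the child's exit code. Hypothetical helper.
    """
    child = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    broken = False
    try:
        for line in lines:
            child.stdin.write(line.encode("utf-8") + b"\n")
        child.stdin.flush()
    except OSError as exc:
        # EPIPE means the reader exited; anything else is unexpected here.
        if exc.errno != errno.EPIPE:
            raise
        broken = True
    finally:
        try:
            child.stdin.close()
        except OSError:
            # Closing flushes buffered data, which can fail the same way.
            pass
    return broken, child.wait()


# A child that exits at once without reading: the writer sees a broken pipe,
# and the exit status (3 here) points at the process that actually failed.
broken, status = feed_child(
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    ("x" * 100 for _ in range(100000)),
)
print(broken, status)
```

In a setup like this, the exit status and stderr of the reading command are more informative than the traceback of the writer.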
>>>>>>>>> Before that, the following steps went through without fail.
>>> make
>>> make DESTDIR=image WORKDIR=work
>>> XML_FILES=dewiki-20091028-pages-articles.xml index
>>>>>>>>> David Reyes Samblas Martinez wrote:
>>>> After the "success" with the Spanish Wikipedia (the indexing part is
>>>> still pending), I started working on the German Wikipedia
>>>> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2
>>>> but it fails at the first step with the following error:
>>>>>>>> #make DESTDIR=image WORKDIR=work
>>>> XML_FILES=dewiki-latest-pages-meta-current.xml index parse render
>>>> combine
>>>> awk: cmd. line:1: fatal: cannot open file
>>>> `work/counts.text' for reading (No such file or directory)
>>>> cd host-tools/offline-renderer && make index \
>>>>>>>> XML_FILES="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml"
>>>> RENDER_BLOCK="0" \
>>>>>>>> WORKDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work"
>>>> DESTDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image"
>>>> make[1]: Entering directory
>>>> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>>>> ./ArticleIndex.py \
>>>>>>>> --article-index="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/articles.db"
>>>> \
>>>>>>>> --article-offsets="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/offsets.db"
>>>> \
>>>>>>>> --article-counts="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/counts.text"
>>>> \
>>>>>>>> --prefix="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image/pedia"
>>>> /OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml
>>>> Traceback (most recent call last):
>>>>   File "./ArticleIndex.py", line 611, in <module>
>>>>     main()
>>>>   File "./ArticleIndex.py", line 172, in main
>>>>     limit = processor.process(f, limit)
>>>>   File "/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer/FileScanner.py", line 141, in process
>>>>     if '#' == body[0] and 'redirect' == body[1:9].lower():
>>>> IndexError: string index out of range
>>>> Flushing databases
>>>> Writing: files
>>>> Time: 0s
>>>> Writing: articles
>>>> Time: 0s
>>>> Writing: offsets
>>>> Time: 0s
>>>> Loading: articles
>>>> Time: 0s
>>>> Loading: offsets and files
>>>> Time: 0s
>>>> make[1]: *** [index] Error 1
>>>> make[1]: Leaving directory
>>>> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>>>> make: *** [index] Error 2
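The `IndexError` in the traceback above is what `body[0]` raises when `body` is an empty string, which suggests the dump contains at least one article with an empty body. A minimal sketch of a check that tolerates empty input follows. The function name `is_redirect` is hypothetical, `body` is taken from the traceback, and this is not necessarily how FileScanner.py was or should be fixed.

```python
def is_redirect(body):
    """Return True if an article body starts with '#redirect' (any case).

    str.startswith and slicing never raise on an empty string,
    unlike indexing with body[0].
    """
    return body.startswith('#') and body[1:9].lower() == 'redirect'


for sample in ['', '#REDIRECT [[Foo]]', 'Plain article text']:
    print(repr(sample), is_redirect(sample))
```

For non-empty input the test is equivalent to the original condition, so only the empty-body case changes behaviour.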
>>>>>>>> Regards
>>>>>>>> David Reyes Samblas Martinez
>>>>>>>> _______________________________________________
>>>> Openmoko community mailing list
>>>> community at lists.openmoko.org
>>>> http://lists.openmoko.org/mailman/listinfo/community