WP scrape
Simon Worthington edited this page Jan 19, 2026 · 3 revisions
- Bring back content missing from clean.index.wiki: A. References/Footnotes B. Image/Image caption content.
- Figure and Figure caption are used on both images and tables; find a fix in the .wiki files.
- Top and tail the .wiki files: there is content at the start and end of the files that is not needed.
- Fix headers: a number of processes are needed to fix headers. A. Outline numbering B. Boxes C. Figure/Figure caption D. Manual review of completeness and order by two people.
- Re-order header values
- Format Footnotes and References according to MediaWiki Cite rules. This has been explored using VS Code Copilot and there are solutions.
- NEXT PHASES: MediaWiki/Wikibase import issues. This is a new phase, as it is where the requirements of the system (MediaWiki/Wikibase) impact the files.
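
The footnote step in the list above could be sketched roughly as follows. This is only an illustration, not the Copilot solution mentioned: it assumes a hypothetical input convention where inline markers look like `[1]` and each note is defined on its own line as `[1] Note text`. The function name `inline_footnotes` and the `fn1`-style ref names are made up for the example; the `<ref name="...">` and `<references />` tags are standard MediaWiki Cite syntax.

```python
import re

# ASSUMPTION: a note definition is a whole line such as "[1] Note text",
# and an inline marker is a bare "[1]" inside running text.
DEFINITION = re.compile(r"^\[(\d+)\]\s+(.+)$")
MARKER = re.compile(r"\[(\d+)\]")


def inline_footnotes(wikitext: str) -> str:
    """Move footnote text inline as MediaWiki Cite <ref> tags."""
    notes = {}
    body = []
    for line in wikitext.splitlines():
        match = DEFINITION.match(line)
        if match:
            # Collect the definition and drop the line from the body.
            notes[match.group(1)] = match.group(2).strip()
        else:
            body.append(line)

    seen = set()

    def to_ref(match):
        number = match.group(1)
        if number not in notes:
            return match.group(0)  # unknown marker: leave for manual review
        name = f"fn{number}"
        if number in seen:
            # Later uses of the same note reuse the named reference.
            return f'<ref name="{name}" />'
        seen.add(number)
        return f'<ref name="{name}">{notes[number]}</ref>'

    converted = MARKER.sub(to_ref, "\n".join(body))
    # Cite renders the collected notes wherever <references /> appears.
    return converted.rstrip() + "\n\n<references />\n"
```

Markers without a matching definition are left as they are, so they can be picked up in the manual review rather than silently lost.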
| HTML | wikitext | MediaWiki import |
|---|---|---|
| 1. Top & tail | 6. Outline numbering | 10. Headers ToC/No ToC |
| 2. Boxes | 7. ref/foot text inline - MediaWiki Cite | 11. MediaWiki Template |
| 3. Figure/Figure caption images | 8. Header levels | |
| 4. Figure/Figure caption tables | 9. Chopping | |
| 5. Ref/Footnote anomaly | | |
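
Steps 6 and 8 in the wikitext column might look something like the sketch below. It assumes the goal is to drop printed outline numbers (such as `1.2`) from wikitext headers and, where needed, to shift header levels. The function `fix_header` and its `shift` parameter are hypothetical names for illustration only.

```python
import re

# A wikitext header line: "== Title ==" with matching runs of "=".
HEADER = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")
# ASSUMPTION: outline numbers look like "1", "1.2" or "2.1.3." followed by
# a space. A title that really starts with a number (e.g. "2020 Report")
# would also be stripped, so results still need manual review.
OUTLINE = re.compile(r"^\d+(?:\.\d+)*\.?\s+")


def fix_header(line: str, shift: int = 0) -> str:
    """Strip the outline number from a header and shift its level."""
    match = HEADER.match(line)
    if not match or not match.group(2):
        return line  # not a header, or nothing between the "=" runs
    if len(match.group(1)) != len(match.group(3)):
        return line  # unbalanced markers: leave for manual review
    # MediaWiki supports header levels 1 to 6.
    level = min(6, max(1, len(match.group(1)) + shift))
    title = OUTLINE.sub("", match.group(2))
    marks = "=" * level
    return f"{marks} {title} {marks}"
```

For example, `fix_header("=== 2.1.3. Data ===", shift=-1)` returns `== Data ==`, while lines that are not headers pass through unchanged.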