
User:Giantflightlessbirds/Wikisource 2025


World Wikisource Conference • 14–16 February 2025 • Denpasar, Indonesia


Following the example of User:Ainali, this is my Wikisource 2025 experience. Please add notes if we discussed something or if I have misremembered or misrepresented a session. I've tagged things I need to follow up back in NZ with ▶ and things that hit home with 🎯.

I attended this conference, the first Wikisource meetup since , as one of the sixty-odd scholarship recipients amongst the hundred-odd attendees. I was very pleased that my conference experience was Post-It free, especially since a Post-It would probably curl and drop off in the tropical conditions. So it can be done.

Getting to the conference

Wikimedia meetup at the State Library Victoria, Melbourne, Australia

My flight to Indonesia had a long layover in Melbourne, so I contacted Wikimedia Australia about a lunchtime meetup. They kindly organised a meetup event at the State Library Victoria, which over a dozen people attended. This coincided with Kerrie Burns and Ellie Watts beginning their joint Wikipedian in Residence stint at the library, so with Amanda Lawrence we discussed tips, problems, and different strategies around being a Wikimedian in Residence.

After 20 hours of travel I arrived in Indonesia. I had requested a flight to Bali several days before the conference to help recover from jet lag (Bali is five hours behind New Zealand; at a previous conference in Bali I was so tired I ended up sleeping through an entire morning's sessions). I spent four days working in the mountain village of Munduk, which I incidentally photographed for Commons.

Thursday meetup

  • Met Voytek (User:Draco flavus) who shared a tool for overcoming 100 MB upload limits.
  • Checked into hotel and met early attendees.

A persistent problem was getting people's names. The programme and list of participants used usernames only, which made it tricky to connect these with presenters' actual names (the only thing scribbled on name tags) and any contact details. This is a real problem with Wikimedia conferences, which assume complete anonymity and have no options for making names and email addresses available even if participants want to share them. I handed out a lot of business cards.

Friday


Nicolas and Satdeep: the state of Wikisource

  • There are now six million texts, with Polish and English the largest corpora. Challenges are annotation, integrating Wikidata better, lack of indexing by Google, and whether to form a Wikisource Hub to help make tools consistently available across different languages.
  • Presentation by a guest researcher known as Prof Mumu about the state of Indonesian manuscripts: 40% of the 31,000 known are in Bali. Sadly for a conference about reading, his slides were illegible. I asked a question raised by this digitisation: what was the copyright status of the manuscripts (supposedly all public domain, since most of the lontara are 19th century) and were the owners, who treat these as revered objects, aware of how they could be reused once they were in very-profane Commons? (This was apparently "the responsibility of the researcher to explain".)
  • A presentation on the creation of a Balinese keyboard. Not terribly relevant to most of the attendees, and in line with what I've observed at many conferences: invited guest speakers usually give the worst presentations.

Conversation with Nitesh from Nepal (which has over 100 languages!), who was also concerned about sacred texts going into Commons. We had a good talk about language revitalisation in New Zealand through kohanga reo, and the difference between this and Esperanto and Klingon language fan communities. 🎯

Met with Jacqueline Chen, ESEAP program officer with WMF, about a rapid grant we'd asked (twice) for advice on. She later gave a workshop on grant applications, which was very useful and reiterated that we should get feedback from program officers before submitting.

Digitising through Competitions • Rahma from Wikimedia Indonesia

  • "Wikisource Competition" (Kompetisi Wikisource) since 2020. Twelve competitions in 4 Indonesian languages; the previous 2012 winner organised 2020. Prizes: two travel scholarships to Jakarta, 10 million Rp budget (NZ$1000), laptop first prize, tablet second. 77 participated. Quality patrol by committee.
  • 29 members of ID Wikisource community, weekly proofreads, 2 online meetings, 3 workshops, competitions, on Instagram. They need a transclusion skills workshop! Most can only proofread, including the presenter! 🎯
  • ▶ How do they recruit and train? How long does it run? How do you determine winners and award points? Do people cheat, or try to game the system? How much work is it to set up (including scanning and loading to Wikisource) and run? What is the retention rate of participants?

A good chat over lunch about the problems of attending Wikimania in a host country where it's illegal to be gay (like Kenya); had never occurred to me. 🎯

My talk • The Wikisource to Public Libraries pipeline


Went well. Andy Mabbett asked about the step-by-step guide for manually uploading to Overdrive/Libby (which I really must put together with my former boss at Westland District Library, as she has upload privileges). I challenged people to approach their local libraries and engage about replicating this (except in France, where of course there's a single State-run library system).

Bangla Wikisource • Bodhisattva • West Bengal

  • Developed their own open source OCR4Wikisource tool, Google-based.
  • Putting Wikimedians in Residence in libraries just to do scanning; WMUK paid for scanning of Bangla works in the British Library.
    Bookeye 4 commercial book scanner, Sörmlands Museum, Nyköping, Sweden

Conversation with Emre and another Turkish delegate about the book scanning process: walked them through manual scanning, flatbed book scanners, and dedicated book scanners like the Bookeye 4 I saw in Sweden.

Saturday


Early in the morning wrote up my idea for a lightning talk, "Wikisource as a Gateway Drug", an idea which popped up in yesterday's presentation.

Training Wikisource • Nanteza Divine Gabriella • Uganda

  • With Alice Kibombo, trained a dozen university students, 2 from the community and 4 external helpers from the National Library.
  • 101: Basics of editing, 102: Transferring texts from Commons (Sam Wilson helped with this part)
  • Worked collaboratively in pairs, proofreading and validating. Weekly tasks, supposed to be for 6 weeks but went for 3 months. Translated ▶ proofreading guide to Luganda, but Luganda Wikisource is not launched yet.
  • Transcribed 7 works, all in English relating to Uganda. Improved their digital literacy, as only 60% were comfortable editing on a laptop at the start. 70% were still editing Wikisource after the course! 🎯 (That's a great retention rate, but check stats.)
  • Students had course demands, and the Internet Archive hack happened in the middle of this, which halted importing (lesson: get all files ahead of time).

Persian Wikisource • Darafsh


Darafsh came to my Wikisource workshop in Uruguay.

  • Persian: 120 million speakers (10 million abroad), only 20 active Wikisource users, 28,000 texts. Ayatollahs disapprove, can't get WMF funding.
  • Two online training sessions via Google Meet, one-to-one teaching, collaborative editing and small-group projects outside Iran.
  • University of Hamburg has 3000 Persian manuscripts, not digitised. I suggested to Darafsh that he position himself as the Persian Wikisourceror At Large, and work his way through every museum and university collection scanning, transcribing, and running events with expatriates.

Me • Wikisource as a Gateway Drug

  • Space opened up for lightning talks so I quickly ran through my slides explaining why Wikisource is better for beginners than the standard Wikipedia edit-a-thon onboarding experience, and suggested we promote the platform for this. Slides.

Andy Mabbett • Annotate QID

  • Andy showed us his tool AQID that adds a Wikidata description popup annotation to Wikisource text, with the form:

{{AQID|Q123456789|Text}}

  • This has great potential and can be ignored completely on export. However, it does rely on actually-useful Wikidata descriptions, not bare-bones disambiguators like "species of plant" or "river in New Zealand".
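
To make "ignored completely on export" concrete, here is a minimal sketch in Python that wraps text in the template form shown above and strips it back to plain text. The function names are hypothetical, the regex assumes simple non-nested templates, and this is not how AQID or the Wikisource exporter is actually implemented; Q123456789 is the placeholder from the talk.

```python
import re

# Matches {{AQID|Q<digits>|visible text}} for simple, non-nested uses only.
AQID_PATTERN = re.compile(r"\{\{AQID\|(Q\d+)\|([^{}|]*)\}\}")


def annotate(qid: str, text: str) -> str:
    """Wrap text in the AQID template form shown in the talk."""
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return "{{AQID|" + qid + "|" + text + "}}"


def strip_annotations(wikitext: str) -> str:
    """Replace every AQID template with its visible text only."""
    return AQID_PATTERN.sub(lambda m: m.group(2), wikitext)


def list_qids(wikitext: str) -> list[str]:
    """Collect the QIDs referenced by AQID templates, in order."""
    return [m.group(1) for m in AQID_PATTERN.finditer(wikitext)]
```

Stripping leaves the reading text untouched, which is why an annotation layer like this costs nothing downstream; listing the QIDs first would be one way to audit which Wikidata descriptions still need improving.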

Wikisource Translation • Alberto from Brazil

  • Portuguese has an "In Translation" space in Wikisource; Alberto demonstrated mirror translation mode.

Wikidata and Wikisource • Bodhi

  • Wikisource is not a digital library: it's a transcription and proofreading platform. The descriptive metadata is usually unstructured (in Commons book template, Wikisource Index pages). Better: to store book data in a Q ID and fetch it everywhere; avoids redundancy, better search, better visualisations, regular updates.
  • The ontology is the FRBR book model (see Wikidata Books): WORK (written work in our case) → EXPRESSION/MANIFESTATION (which we collapse as ms drafts, editions, or translations) → ITEM (an individual book). He used a Drake meme to drive this home.
  • Properties to remember: Edition number, Document file on Wikimedia Commons, Wikisource index page URL
  • Wikipedia is about Works, Wikidata is about Editions
  • Wikidata periodicals have newspaper, part of the series (volume), part of (issue), published in (article) (so every newspaper story is a Wikidata item? Check)
  • So we can use {{book|wikidata = Q123456789}} to populate Commons.
  • English Index and main namespace just enter QID at first field. Author info from Wikidata too? Same with Publisher, Anniversary pages (check these)
  • Annotation connecting through Wikidata: {{wdl}}
  • Tools: Luthor (adds example sentences to Wikidata lexemes), Kartographer for maps, Inventaire. (Don't use Listeria.)
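
As a way of keeping the Work → Edition → Item hierarchy straight, here is a minimal sketch in Python. The class and field names are my own illustration (loosely echoing the three properties listed above), not Wikidata's actual schema, and every value used with it should be treated as a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One individual copy, e.g. the physical book that was scanned."""
    holder: str


@dataclass
class Edition:
    """Expression and manifestation collapsed: a draft, edition, or translation."""
    qid: str
    edition_number: str
    commons_file: str          # cf. "Document file on Wikimedia Commons"
    wikisource_index_url: str  # cf. "Wikisource index page URL"
    items: list[Item] = field(default_factory=list)


@dataclass
class Work:
    """The abstract written work that all editions belong to."""
    qid: str
    title: str
    editions: list[Edition] = field(default_factory=list)


def commons_book_template(edition: Edition) -> str:
    """Build the Commons template call in the form noted in the talk."""
    return "{{book|wikidata = " + edition.qid + "}}"
```

The point of the sketch is that the template only needs the edition's QID: everything else (edition number, file, index page) lives on the edition record and is fetched from there rather than retyped on each project.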

Wiktionary • Noé

  • 79 Wikisources, 131 Wiktionaries. One of 13 "sister projects"; he didn't accept the centrality of Wikidata.
  • Using Wiktionary definitions in Wikisource? Hovercards displaying definitions in EN and FR.
  • Tools: Lingua Libre (FR) – a web app to record sounds. SpellWiki – mobile app to record pronunciation (Tamil?). Discothèque – side-by-side Wikisource text examples vs traditional dictionary.

Google • Cassie Chan • APAC Search Partnerships Manager

  • I got brownie points for calling out "AltaVista" as a pre-Google search engine.
  • Cassie was mostly showing off cool Google tools, like Google Lens, and Art Selfie. She kept referring to Wikipedia as "Wiki".
  • She quizzed us with an image question, supposedly "Mozart's baton". I said false, pointing out conductors didn't much use batons until the 19th century. She said yes, false, but because Mozart was a composer not a conductor. I objected: composers often conducted their own works; had she not seen Amadeus?
  • This was a bit petty, but I was annoyed she wasted our time not addressing the elephant in the room: why does Google bury Wikisource results? Even Project Gutenberg is often on page 1. Not answered!

The RATs finally arrived and I tested negative. But was feeling crappy so skipped the cultural excursion, and ate overpriced nasi campur across the road from the hotel. Heard Savanna nightjars calling from the rooftops.

Sunday

  • Andy Mabbett sent me the link for his talk on voice-recording and the Diff post for our pēpēhā project.

Transkribus, the Future of OCR • Andy Stauder

  • Read-Coop SCE (non-profit co-op with 250 members) created in 2013 from Transcriptorium, to avoid grant cycles.
  • A pile of sponsor logos on a screen is called a "logo cemetery" in German.
  • Handwriting or early printing without standardised fonts. Uses word shapes to figure out characters. Can train models to recognise two-column layouts.
  • ScanTent, a portable book scanning setup with an Android phone app perched on top.
  • LLMs can extract text too, but they add in some complete fabrications. I asked about LLMs in Transkribus, pointing out they are the kiss of death for consumers; he denied they'd be integrated (or maybe they would host their own).

The Wikimedia Board of Trustees • Victoria Doronina

  • A board of trustees in a non-profit hires the CEO and does oversight, not day-to-day planning. They approve the annual plan and do the performance assessment for the CEO.
  • Three pilots: Product and Tech Advisory Council (more mobile tools); Global Funds Dissemination (call for candidates is out, deadline end of March); and Affiliates.
  • Adding two CAST trustees in 2025; the current 6 are all in Europe. One to two meetings a week, a meeting in New York, and at Wikimania.
  • AffCom issues: the Lusophone Wikimedia User Group was rejected by AffCom. New Zealand has been waiting for over a year for chapter status.
  • Sister projects: see the Community Affairs Committee, Jan 2023. Reviewing Wikinews in 2025.

The Digitisation Process • Sam Wilson

  • His workflow: scan all the page images, zip with "_images" filename, upload to the Internet Archive which generates a PDF (this takes an hour), and import from there to Commons. He's made a video of the process. Could do with a workflow recipe book!
  • Talked about calibration cards: black, white, and 18% grey (not 50%) as the visual midpoint. Geologists use them; it's hard to get the grey correctly calibrated. ▶ I happened to have my photography calibration card from the Dunedin meeting in my pocket, and realised we could easily print off Wikisource-branded cards or rulers for scanning. (A long strip is probably better.)
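
As a start on that recipe book, the zipping step of the workflow above could be sketched in Python like this. The function name, the accepted file extensions, and the "example-book" identifier are my assumptions; only the "_images" filename suffix comes from the talk, and the upload to the Internet Archive is left as a manual step.

```python
import zipfile
from pathlib import Path

# Assumed set of scan formats; adjust to whatever the scanner produces.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def bundle_page_images(image_dir: Path, identifier: str, out_dir: Path) -> Path:
    """Zip page scans, in filename order, into <identifier>_images.zip."""
    pages = sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not pages:
        raise ValueError(f"no page images found in {image_dir}")
    archive = out_dir / f"{identifier}_images.zip"
    # Store files without recompressing; this is a choice, not a requirement.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for page in pages:
            # Flat archive: keep only the file name, not the directory path.
            zf.write(page, arcname=page.name)
    return archive
```

For example, `bundle_page_images(Path("scans"), "example-book", Path("."))` would write `example-book_images.zip`. Page order follows sorted filenames, so zero-padded names such as 0001.jpg, 0002.jpg matter.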

To follow up

  • Ideas for better conference swag: embroidered Wikimedia patches; metal keychains; a Wikipedia tie.
  • Ask James Grant about any Wiki work done with Opera Australia
  • Create step-by-step guide for manually uploading to Overdrive/Libby
  • Get an HDMI to USB-C adapter of my own rather than a stick-on hub
  • Darafsh: send him Wikipedian in Residence contracts, MOU, best version of the Wikisource cheat sheet.

General notes


On Saturday I was feeling quite crappy, and was worried that my cold was COVID. The conference had basic surgical masks but no RATs, a change from Wikimania which had thousands. The hotel had no RATs, and suggested I try a pharmacy. I trudged to two in the morning heat, but neither stocked them. The nearest one that might have some, they said, was a 40-minute walk away, so I gave up and trudged back. Eventually one of the organisers went out for a small bag of tests (negative), but I was not impressed. From the website, the conference seems to have had no COVID policy. Although the rooms were generally ventilated, and breaks were outside, there were no CO2 monitors. It seemed like the organisers would have been unprepared if an attendee had tested positive and possibly infected others. Next time I'll be bringing my own masks, tests, and CO2 monitor.
