At Wikicite 2017,
discussions revolved around an ambitious goal to use Wikidata to create
a central citation database for all the references on Wikipedia.
Citations are the essential building blocks of verifiability, a core tenet of Wikipedia. This project aims to give citations the first-class treatment they deserve.
We saw three important questions emerge at the conference:
What does a good citation database look like?
How can we build this on Wikidata?
How can we integrate this with Wikipedia?
These are hard questions. To answer them, Wikicite brought together experts from many communities, including ontologists and librarians specializing in citation and reference modeling.
Wikicite may be young, but clear progress has been made already. Wikidata now
boasts some great collections of bibliographic data, like the Zika corpus and data from the Gene Wiki project. Some Wikipedias, like French and Russian, are experimenting with generating citations using Wikidata. Some citation databases are integrated with VisualEditor to make it easy to add rich citations on Wikipedia, citations that can hopefully one day be added to Wikidata for further reuse and tracking.
There are still a few features that Wikidata needs in order to be a first-class host for citation data. Even the best-structured data takes time to define in Wikidata's precise terms of items, properties, qualifiers, and references. Although there are some handy tools on Wikidata for bulk actions, using them often means reshaping your dataset to match a tool's specific format, or writing bespoke code for your dataset. It's still challenging to ensure data is high quality, well-sourced, and ready for long-term maintenance.
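To make those precise terms concrete, here is a rough sketch (not a real API payload) of how a single journal-article citation decomposes into Wikidata-style statements. The property and item IDs are the ones commonly used for scholarly articles, but treat them as assumptions and double-check them before relying on this.

```python
# Rough sketch: one citation expressed as Wikidata-style statements.
# Property/item IDs are the ones commonly used for scholarly articles;
# verify them before use. Values marked "placeholder" are made up.
citation_statements = {
    "P31": "Q13442814",                   # instance of: scholarly article
    "P1476": "Example article title",     # title (placeholder)
    "P577": "2016",                       # publication date (placeholder)
    "P1433": "Q_JOURNAL",                 # published in: the journal's item (placeholder QID)
    "P356": "10.1000/EXAMPLE",            # DOI (placeholder)
    "P2093": ["A. Author", "B. Author"],  # author name strings (placeholders)
}
```

Each key is a property and each value a claim; a real upload also needs qualifiers (page numbers, the ordinal of each author) and references, which is where much of the modeling time goes.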
In
listening to researchers’ talks, discussing with experts in working
groups, and workshopping code with some of Wikidata’s soon-to-be biggest
users, we determined that Wikidata needs seven features for true
Wikicite readiness:
Bulk data import. There must be an easy process for loading large amounts of data onto Wikidata. There are a few partial tools, like QuickStatements,
which, while itself aptly-named, is just one part of an often-arduous
workflow. Other people have written custom bots to import their specific datasets, building on libraries like Wikidata Integrator or pywikibot. Without help from an experienced Wikidata contributor, there is no easy self-service way to move data in bulk.
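For a sense of what such a custom bot looks like today, here is a minimal sketch of a bulk-import loop using pywikibot. It assumes a configured user-config.py and login, and uses placeholder (item, property, target) triples; it also skips the reconciliation, deduplication, and referencing work that makes real imports arduous.

```python
import pywikibot

# Minimal bulk-import loop (sketch). Assumes pywikibot is already
# configured (user-config.py) and logged in. Triples are placeholders;
# Q4115189 is the Wikidata sandbox item.
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

triples = [
    ("Q4115189", "P31", "Q13442814"),  # sandbox item -> instance of: scholarly article
]

for subject, prop, target in triples:
    item = pywikibot.ItemPage(repo, subject)
    claim = pywikibot.Claim(repo, prop)
    claim.setTarget(pywikibot.ItemPage(repo, target))
    item.addClaim(claim, summary="example bulk import")
```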
Sets. Wikidata needs a feature to track and curate specific groups of items. Sets are a necessary concept to answer questions about a complete group. Right now, you can use Wikidata to tell you facts about the states in Austria, but it cannot tell you the complete list of all states in Austria. Sets are key for curators to perform this sort of cross-sectional data management.
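To illustrate the gap: you can ask the Wikidata Query Service which states of Austria it knows about, but nothing in the answer tells you whether the list is complete. A minimal sketch, assuming Austria is Q40 and P150 ("contains the administrative territorial entity") is the relevant property:

```python
import requests

# Ask the Wikidata Query Service for the states of Austria (Q40) via P150.
# The result is whatever statements happen to exist; there is no built-in
# way to assert, or check, that the returned set is complete.
QUERY = """
SELECT ?state ?stateLabel WHERE {
  wd:Q40 wdt:P150 ?state .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikicite-example/0.1 (demo)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["stateLabel"]["value"])
```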
Data management tools.
Data curators need tools to monitor data of interest. Wikidata is big.
Basic tools like watchlists were designed for Wikipedia articles, on a much smaller scale and with a much coarser granularity than the
Wikidata statement or qualifier. An institution that donates data to
Wikidata may want to monitor thousands (maybe hundreds of thousands) of
items and properties. Donors of complete datasets will want to watch
their data for deletions, additions, and edits.
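Today, the closest self-service approach is to poll the MediaWiki API for the latest revision of each item you care about. A rough sketch (with placeholder QIDs) shows why that falls short: it is per-item and per-revision, it says nothing about which statement or qualifier changed, and it does not scale to hundreds of thousands of items.

```python
import requests

# Poll the latest revision of a few watched items (placeholder QIDs).
# Per-item, per-revision only: no statement-level granularity, and
# impractical for donors watching very large numbers of items.
WATCHED = ["Q42", "Q4115189"]

resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "query",
        "prop": "revisions",
        "titles": "|".join(WATCHED),
        "rvprop": "timestamp|user|comment",
        "format": "json",
        "formatversion": "2",
    },
    headers={"User-Agent": "wikicite-example/0.1 (demo)"},
)
for page in resp.json()["query"]["pages"]:
    rev = page["revisions"][0]
    print(page["title"], rev["timestamp"], rev["user"], rev["comment"])
```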
Grouping edit events.
At the moment, many community members are adding data to Wikidata in
bulk, but this is a fact that Wikidata's user interface struggles to represent. Wikidata currently offers a piecemeal history of users'
individual edits, and encourages editors to add citations and references
for individual statements. These features are vital, but we need a
higher-level grouping feature for higher-level data uploads. For
instance, it would be helpful to have an “upload ID” for associated
edits across many claims. It would also be useful to have a dedicated
namespace for human- and machine-readable documentation of the data load
process, a kind of README that addresses the whole action. This kind of
documentation not only helps community members get answers to questions
before, during, and after large-scale activity, but it also helps
future data donors learn about and follow best practices.
A draft or staging space.
There should be a way for people to add content to Wikidata without
directly modifying “live” data. Currently, when something is added to
Wikidata it is immediately mixed in with everything else. It’s daunting
for new users to have to get it right on the first try, let alone take
quick corrective action in the face of inevitable mistakes. Modeling a
dataset in Wikidata’s terms requires using Wikidata’s specific
collection of items and properties. You may not see how your data fits
into Wikidata—perhaps requiring new properties and items—until you begin
to add it. Experienced Wikidata volunteers may review data to ensure
it’s high quality, but it would be better to enable this collaborative
process before data is part of the project’s official collection. You
should be able to upload your data to a staging space on Wikidata,
ensure it’s high quality and properly structured, and then publish it
when it’s ready. The PrimarySources tool is
a community-driven start to this, but such a vital feature needs
support from the core. In the longer term, this feature is a small step
toward maximizing Wikidata consistency, by setting the stage to
transactionally add and modify large-scale data. It would be helpful to have data cleanup tools, similar to OpenRefine, available for data staging.
Data models.
Wikidata needs new ways to collaborate on new kinds of items.
Specifically, we need a better way to reach consensus on models for
certain standard types of data. Currently, it's possible to describe the
same entity in multiple ways, and lacking a forum for this process, it’s
hard to discuss the differences. See, for instance, the drastically
different ways that various subway lines are
described as Wikidata items. Additionally, some models may want to
impose certain constraints on instances, or at least indicate if an item
complies with its model. Looking to the future, tools for collaborative data modeling could grow to include a library of data models unlike any other.
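As a purely hypothetical illustration of the problem (labels only, no real items or properties), two subway lines might be modeled along these lines today, with no agreed place to decide which shape is right:

```python
# Hypothetical sketch: the same kind of thing modeled two different ways.
line_a = {
    "instance of": "rapid transit line",
    "part of": "Example City Metro",
    "terminus": ["North Station", "South Station"],
}
line_b = {
    "instance of": "railway line",
    "transport network": "Example City Metro",
    "connects with": ["Line A"],
    # termini not recorded here at all, only on the station items
}
```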
Point in time links.
There should be a way to share a dataset from Wikidata at a given point
in time. Wikidata, like Wikipedia, is continuously changing. Wikipedia
supports linking to a specific revision of an article at a point in time
using a permalink, and you can do the same for a specific Wikidata
item. However, Wikidata places special emphasis on relationships between
items, yet does not extend the permalink feature to these
relationships. If you run a query on the Wikidata Query Service (the SPARQL endpoint for Wikidata), and then share the query with someone else, they may see different results.
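For a single item you can already pin a point in time: the wiki's permalinks use a revision ID, and at the time of writing Special:EntityData accepts a revision parameter. There is no comparable way to pin a query result over many items. A sketch, with a placeholder revision ID:

```python
import requests

# Fetch one item as it looked at a specific revision (ID is a placeholder).
# Nothing comparable exists for pinning an entire SPARQL result set in time.
resp = requests.get(
    "https://www.wikidata.org/wiki/Special:EntityData/Q42.json",
    params={"revision": "1234567"},
    headers={"User-Agent": "wikicite-example/0.1 (demo)"},
)
entity = resp.json()["entities"]["Q42"]
print(entity["labels"]["en"]["value"])
```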
These
seven features came up consistently across several groups and
discussions at Wikicite. The conference was a room full of problem solvers, and several good projects are already underway to provide community-based solutions in these areas. Among the handful started at the conference, we are pleased to share that we've started work on Handcart, a tool for simplifying medium-sized bulk imports, for citation data and much, much more. We believe trying to fix a problem is the best way
to learn its details and nuances.
Wikicite
made a strong case that Wikidata has a lot of valuable potential for
citations, and citations are crucial for Wikipedia. As we work to
address these missing features in Wikidata, we are happy to be part of
the Wikicite movement to build a more verifiable world.