
Research:ReferenceRisk

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by FNavas-WMF (talk | contribs) at 15:36, 6 March 2024. It may differ significantly from the current version.

This research documentation page is currently under construction.

Created
15:22, 22 February 2024 (UTC)
Duration: February 2024 – August 2024
References, Knowledge Integrity, Disinformation

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


A typical Wikipedia article has three atomic units that combine to craft its claims: the editor who creates the edit, the edit itself, and the reference that informs the edit. This project focuses on the last of the three. Wikipedia's verifiability principle expects all editors to be responsible for the content they add, ruling that the "burden to demonstrate verifiability lies with the editor who adds or restores material".

Should this edict be followed to the letter, every claim across Wikipedia would be dutifully cited inline. Of course, life falls short of perfection, and it is exactly that inherently imperfect participation of the human editor that leads to change, debate and flux, creating "quality" by any standard in the long term.

Then, there is the additional task of understanding the quality of a reference.

A basic visualization of this ML model

A collaboration between Wikimedia Enterprise and Research, this project has the set goal of refining and productionizing the Research team's citation quality ML model from the paper "Longitudinal Assessment of Reference Quality on Wikipedia". It aims to cater to everyone from individual volunteer editors to high-volume third-party reusers. We seek to lessen the burden of understanding the quality of a reference.

Both Research and Enterprise understand that a broad range of actors in the online knowledge environment stand to benefit from the ability to evaluate citations at scale and in near real time. Manually inspecting sources or developing external algorithmic methods is costly for reusers, so Research and Enterprise would like to host a scoring model that customers and the community can leverage to automatically identify low- and high-quality citation data.
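To make the idea of scoring references concrete, the sketch below computes a toy article-level "reference need" ratio: the fraction of citation-needing sentences that lack an inline reference. This is an illustrative assumption of what such a score could look like, not the project's actual model; the function name, the input schema, and the flat dictionary representation of sentences are all hypothetical.

```python
# Hypothetical sketch of an article-level reference-quality signal.
# All names and the input format are illustrative assumptions, not the
# actual ReferenceRisk model or its API.

def reference_need_score(sentences):
    """Return the fraction of citation-needing sentences without a citation.

    A higher score suggests more unverified claims; 0.0 means every
    sentence that needs a citation has one (or none need a citation).
    """
    needing = [s for s in sentences if s["needs_citation"]]
    if not needing:
        return 0.0
    uncited = sum(1 for s in needing if not s["has_citation"])
    return uncited / len(needing)

# Toy article: two citation-needing claims, one of them uncited.
article = [
    {"text": "Claim A.", "needs_citation": True, "has_citation": True},
    {"text": "Claim B.", "needs_citation": True, "has_citation": False},
    {"text": "Intro sentence.", "needs_citation": False, "has_citation": False},
]
print(reference_need_score(article))  # 0.5
```

A production model would of course go further, for example by also weighing the reliability of each cited source, but even this simple ratio illustrates how citation data could be evaluated automatically at scale.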
