
Research:ReferenceRisk

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by FNavas-WMF (talk | contribs) at 15:36, 6 March 2024. It may differ significantly from the current version.

This research documentation page is currently under construction.

Created
15:22, 22 February 2024 (UTC)
Duration: February 2024 – August 2024
References, Knowledge Integrity, Disinformation

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


A typical Wikipedia article has three atomic units that combine to craft its claims: the editor who creates the edit, the edit itself, and the reference that informs the edit. This project focuses on the last of the three. Wikipedia's verifiability principle expects all editors to be responsible for the content they add, ruling that the "burden to demonstrate verifiability lies with the editor who adds or restores material".

Should this edict be followed to the letter, every claim across Wikipedia would be dutifully cited inline. Of course, life falls short of perfection, and it is exactly that inherently imperfect participation of the human editor that leads to change, debate and flux, creating "quality" by any standard in the long term.

Then, there is the additional task of understanding the quality of a reference.

A basic visualization of this ML model

A collaboration between Wikimedia Enterprise and Research, this project has the set goal of refining and productionizing the Research team's citation quality ML model from the paper "Longitudinal Assessment of Reference Quality on Wikipedia". It aims to cater to everyone from individual volunteer editors to high-volume third-party reusers. We seek to lessen the burden of understanding the quality of a reference.

Both Research and Enterprise understand that a broad range of actors in the online knowledge environment stand to benefit from the ability to evaluate citations at scale and in near real time. Manually inspecting sources or developing external algorithmic methods is costly for reusers, so Research and Enterprise would like to host a scoring model that customers and the community can leverage to automatically identify low- and high-quality citation data.
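To make the idea of scoring references concrete, the sketch below computes a toy article-level "reference need" ratio: the fraction of citation-needing sentences that lack an inline reference. This is an illustrative assumption of what such a score could look like, not the project's actual model; the function name, the input schema, and the flat dictionary representation of sentences are all hypothetical.

```python
# Hypothetical sketch of an article-level reference-quality signal.
# All names and the input format are illustrative assumptions, not the
# actual ReferenceRisk model or its API.

def reference_need_score(sentences):
    """Return the fraction of citation-needing sentences without a citation.

    A higher score suggests more unverified claims; 0.0 means every
    sentence that needs a citation has one (or none need a citation).
    """
    needing = [s for s in sentences if s["needs_citation"]]
    if not needing:
        return 0.0
    uncited = sum(1 for s in needing if not s["has_citation"])
    return uncited / len(needing)

# Toy article: two citation-needing claims, one of them uncited.
article = [
    {"text": "Claim A.", "needs_citation": True, "has_citation": True},
    {"text": "Claim B.", "needs_citation": True, "has_citation": False},
    {"text": "Intro sentence.", "needs_citation": False, "has_citation": False},
]
print(reference_need_score(article))  # 0.5
```

A production model would of course go further, for example by also weighing the reliability of each cited source, but even this simple ratio illustrates how citation data could be evaluated automatically at scale.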
