Community Wishlist Survey 2023/Larger suggestions/Allow querying the Commons tabular data with the Wikidata Query Service to better support large numerical datasets
From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Mahir256 (talk | contribs) at 00:29, 13 February 2023. It may differ significantly from the current version.
This proposal is a larger suggestion that is out of scope for the Community Tech team. Participants are welcome to vote on it, but please note that regardless of popularity, there is no guarantee this proposal will be implemented. Supporting the idea helps communicate its urgency to the broader movement.
Allow querying the Commons tabular data with the Wikidata Query Service to better support large numerical datasets
- Problem: Wikidata is notoriously bad at storing large numerical datasets. The user interface of Wikidata and some downstream applications may currently fail on large items (items with too many statements). Therefore, some potentially useful quantitative data, such as annual average temperature records or precise population data split by ethnicity, cannot currently be accessed by Wikidata users. The Wikidata community maintains that large numerical datasets should instead go to tabular data files [1][2][3], CSV-like tables stored on Wikimedia Commons. There are also plenty of types of data that will never have properties on Wikidata but could be stored on Wikimedia Commons and would still be useful to query or reuse on Wikipedia. One of the reasons these tables are not so widely used is their inaccessibility to the Wikidata Query Service.
- [1] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2022/03#Pandemie_covidu-19_v_Rakousku_(Q86847911)_is_full_(no,_literally)
- [2] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2022/02#Population:_P1082_or_P4179?
- [3] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2022/11#Tabular_style_data_on_items?
- Proposed solution: The Wikidata Query Service should be extended to read from tabular data on Wikimedia Commons. This will likely require some standardization of the field names in the CSV-like files on Wikimedia Commons, and a community discussion on that should be part of the overall process.
- Who would benefit: WikiProject Tabular data, the whole Wikidata community, and, by extension, the whole world profiting from a better open data infrastructure.
- Phabricator tickets: phab:T181319
- Proposer: Vojtěch Dostál (talk) 07:54, 30 January 2023 (UTC) and ♥Ainali talk contributions 08:03, 30 January 2023 (UTC)
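To make the field-name standardization mentioned in the proposed solution more concrete, here is a minimal sketch of what a convention check could look like. It assumes the JSON layout of Commons tabular data pages (a `schema` with a `fields` list, plus `data` rows). The required column names `wikidata_id` and `year`, and the helper `schema_problems`, are hypothetical and only for illustration; no such convention has been agreed on.

```python
# Hypothetical convention (an assumption, not an agreed standard):
# every table declares these columns with these types.
REQUIRED_FIELDS = {"wikidata_id": "string", "year": "number"}


def schema_problems(table, required=REQUIRED_FIELDS):
    """Return a list of problems; an empty list means the table conforms."""
    declared = {f["name"]: f["type"] for f in table["schema"]["fields"]}
    problems = []
    for name, expected in required.items():
        if name not in declared:
            problems.append(f"missing field: {name}")
        elif declared[name] != expected:
            problems.append(
                f"field {name} has type {declared[name]}, expected {expected}"
            )
    return problems


# Two made-up tables, reduced to the parts the check looks at.
conforming = {
    "schema": {"fields": [
        {"name": "wikidata_id", "type": "string"},
        {"name": "year", "type": "number"},
        {"name": "population", "type": "number"},
    ]},
    "data": [],
}
nonconforming = {
    "schema": {"fields": [
        {"name": "city", "type": "string"},
        {"name": "year", "type": "string"},
    ]},
    "data": [],
}

print(schema_problems(conforming))
print(schema_problems(nonconforming))
```

A check like this only validates column names and types; which columns to require is exactly the question the community discussion would need to settle.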
Discussion
- This is likely to be too big to be in scope for Community Tech. It is a valid proposal though, so I will move it to Larger Suggestions. Thanks for participating. DWalden (WMF) (talk) 13:29, 30 January 2023 (UTC)
- Both structured data from Wikidata (via WDQS) and tabular data (from Commons) can be read in a machine-readable form, so it is up to the user to use common spreadsheet software to combine both datasets (LibreOffice Calc, MS Excel, Google Docs, but also DataFrames as in Python/pandas etc.). I don't think we should impose format constraints to make both worlds "compatible", and I don't think that WDQS should be loaded with even more secondary functionality. It is useful for the rather simple stuff, but it is not the one tool to solve problems of arbitrary complexity. -- MisterSynergy (talk) 21:14, 10 February 2023 (UTC)
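The client-side combination described in the comment above can be sketched as follows. This is an illustration under stated assumptions, not an existing tool: it uses only the Python standard library rather than pandas, both inputs are inlined with made-up values instead of being fetched, and the column name `wikidata_id` and the helper names are hypothetical. The first input follows the JSON layout of Commons tabular data pages, the second the standard SPARQL JSON results format that WDQS returns.

```python
import json

# Made-up sample in the shape of a Commons tabular data page.
TABULAR_JSON = """
{
  "license": "CC0-1.0",
  "schema": {"fields": [
    {"name": "wikidata_id", "type": "string"},
    {"name": "year", "type": "number"},
    {"name": "population", "type": "number"}
  ]},
  "data": [
    ["Q1", 2021, 100],
    ["Q1", 2022, 110],
    ["Q2", 2022, 50],
    ["Q3", 2022, 7]
  ]
}
"""

# Made-up sample in the SPARQL JSON results format.
SPARQL_JSON = """
{
  "head": {"vars": ["item", "itemLabel"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"},
     "itemLabel": {"type": "literal", "value": "Example A"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"},
     "itemLabel": {"type": "literal", "value": "Example B"}}
  ]}
}
"""


def tabular_rows(doc):
    """Turn the positional data rows into dicts keyed by field name."""
    names = [field["name"] for field in doc["schema"]["fields"]]
    return [dict(zip(names, row)) for row in doc["data"]]


def sparql_labels(doc):
    """Map each item's QID (last path segment of its URI) to its label."""
    labels = {}
    for binding in doc["results"]["bindings"]:
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        labels[qid] = binding["itemLabel"]["value"]
    return labels


def join_on_qid(rows, labels, key="wikidata_id"):
    """Inner join: keep rows whose QID has a label, and attach that label."""
    return [dict(row, label=labels[row[key]]) for row in rows if row[key] in labels]


rows = tabular_rows(json.loads(TABULAR_JSON))
labels = sparql_labels(json.loads(SPARQL_JSON))
joined = join_on_qid(rows, labels)
for row in joined:
    print(row)
```

The join only works because the sample table happens to carry a QID column, which is the kind of shared key that either approach, client-side or inside WDQS, depends on.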
Voting
- Support Strainu (talk) 20:19, 10 February 2023 (UTC)
- Support EpicPupper (talk) 05:38, 11 February 2023 (UTC)
- Support OwenBlacker (Talk) 14:44, 11 February 2023 (UTC)
- Support Bluerasberry (talk) 15:04, 11 February 2023 (UTC)
- Support CROIX (talk) 15:20, 11 February 2023 (UTC)
- Support Novak Watchmen (talk) 17:52, 11 February 2023 (UTC)
- Support Matěj Suchánek (talk) 18:39, 11 February 2023 (UTC)
- Support Moebeus (talk) 00:21, 13 February 2023 (UTC)
- Support Mahir256 (talk) 00:29, 13 February 2023 (UTC)