Wikimedia Meta-Wiki

Wikimedia Forum

From Meta, a Wikimedia project coordination wiki

The Wikimedia Forum is a central place for questions, announcements and other discussions about the Wikimedia Foundation and its projects. (For discussion about the Meta wiki, see Meta:Babel.)
This is not the place to make technical queries regarding the MediaWiki software; please ask such questions at the MediaWiki support desk. Technical questions about Wikimedia wikis, however, can be posted on the Tech page.

You can reply to a topic by clicking the "[edit]" link beside that section, or you can start a new discussion.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} and sections whose most recent comment is older than 30 days.

Contradictions within Wikimedia projects

Latest comment: 25 days ago · 3 comments · 2 people in discussion

Identifiable contradictions between Wikipedia articles in different languages and other Wikimedia pages are a phenomenon that could be used for improving accuracy, spotting errors, fixing outdated content, improving categorization, and more. It's a subject of ongoing research and development. Take a look at the new page about this, and if you know of more types of examples, please add them to the page. Prototyperspective (talk) 15:20, 26 November 2025 (UTC) Reply

Link? So9q (talk) 04:24, 12 December 2025 (UTC) Reply
It's in the section header, Contradictions within Wikimedia projects. Prototyperspective (talk) 11:12, 12 December 2025 (UTC) Reply

Proposed mass import of court documents to Chinese Wikisource

Latest comment: 16 days ago · 19 comments · 8 people in discussion

A longer explanation is available here. In a nutshell: a user proposed running a bot to mass-import Chinese court documents to Chinese Wikisource. This could potentially result in 85 million new content pages being created on Chinese Wikisource. He is asking other Wikimedia community users and the WMF for opinions on that. This would also have a potential impact on Wikidata, since another user is running a bot (MidleadingBot) to create an item for each page, which would result in 85 million new items on Wikidata. GZWDer (talk) 12:58, 6 December 2025 (UTC) Reply

Interesting. I'm curious whether personally identifiable information (PII) is planned to be scrubbed before upload? So9q (talk) 04:26, 12 December 2025 (UTC) Reply
The data is from China Judgments Online, which, by China's court publicity standards, is already compliant with China's privacy laws. That being said, if a court rules that some public documents, once published, are no longer suitable and need further redaction or takedown, we could ask them to file a DMCA complaint, just like other mirrors of China Judgments Online do (there are already some mirrors on the internet, but most have paywalls; and, as I said in the original proposal, they also take down documents due to political censorship, which we won't follow along with). SuperGrey (talk) 16:49, 16 December 2025 (UTC) Reply
You didn't answer my question. Will the documents you intend to upload to Wikisource contain names, addresses and other PII? So9q (talk) 16:07, 17 December 2025 (UTC) Reply
  1. These public documents don't have a very specific standard for their redaction levels, but from what I observe: names are occasionally redacted; personal addresses are always removed; government IDs are always removed or redacted; phone numbers are always redacted; other PII is most likely redacted. The situation above generally follows the Provisions mentioned below.
  2. I don't plan to manually remove or redact PII myself if it is already present in these public documents, except in the following situation: according to The Supreme People's Court Provisions on People's Courts Release of Judgments on the Internet,

Article 8: When releasing judgment documents on the internet, people's courts shall redact the names of the following personnel:
(1) parties and their legally-designated representatives in marriage and family cases, or inheritance disputes;
(2) Victims and their legally-designated representatives, incidental civil action plaintiffs and their legally-designated representatives, witnesses, and expert evaluators in criminal cases;
(3) Minors and their legally-designated representatives;
Article 9: The redaction of names in accordance with Article 8 of these Provisions shall be handled according to the following circumstances:
(1) Retain the surname and replace the given name with "X" [某];
(2) For the names of ethnic minorities, retain the first character and replace the rest with "X" [某];
(3) For the Chinese translations of the names of foreigners and stateless persons, retain the first character and replace the rest with "X" [某]; for the English names of foreigners and stateless persons, retain the first English letter and delete the rest;
Where different names become identical after redaction, differentiate between them by adding Arabic numerals.
Article 10: When releasing judgment documents on the internet, people's courts shall delete the following information:
(1) Natural persons' home addresses, contact information, ID numbers, bank account numbers, health conditions, vehicle license plate numbers, movable or immovable property ownership certificate numbers, and other personal information;
(2) Legal persons' and other organizations' bank account numbers, vehicle license plate numbers, movable or immovable property ownership certificate numbers and other information;
(3) Information involving commercial secrets;
(4) Information involving personal privacy in family disputes, personality rights and interests disputes and other such disputes;
(5) Information involving technical investigation measures;
(6) Other information that people's courts find inappropriate to release.
Where deleting information in accordance with the first paragraph of this Article interferes with correctly understanding the judgment document, use the symbol "×" as a partial substitute.
Article 11: When releasing judgment documents on the internet, people's courts shall retain the following information of parties, legally-designated representative, entrusted representatives, and defenders:
(1) Except where names are redacted in accordance with Article 8 of these Provisions, where the parties and their legally-designated representatives are natural persons, retain their names, dates of birth, sexes, and the districts or counties to which their domiciles belong; where the parties and their legally-designated representatives are legal persons or other organizations, retain their names, domiciles, organization codes, and the names and positions of their legally-designated representatives or principal responsible persons.
(2) Where the entrusted representatives or defenders are lawyers or basic level legal service workers, retain their names, license numbers, and the names of their law firms or basic level legal service organizations; where the entrusted representatives or defenders are other personnel, retain their names, dates of birth, sexes, the districts or counties to which their domiciles belong, and their relationship with the parties.

The above articles all begin with "people's courts shall", so naturally respecting what they have released is good enough. Still, we should be open to their complaints in case they want to mend their mistakes if they find some documents unsuitable to release or in need of further redaction, as required by the law.
SuperGrey (talk) 07:06, 18 December 2025 (UTC) Reply
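Articles 8 and 9 quoted above effectively specify a small redaction algorithm. A minimal Python sketch follows, under loudly stated assumptions: single-character Han surnames, one 某 per redacted character (the Provisions leave both points open), and function names that are mine, not from any actual court tooling.

```python
from collections import Counter

def redact_name(name: str) -> str:
    """Redact one name roughly per Article 9 (sketch only).

    Assumes single-character Han surnames and one 某 per redacted
    character; English names keep only their first letter.
    """
    if name.isascii():  # English name of a foreigner or stateless person
        return name[0]
    return name[0] + "某" * (len(name) - 1)

def redact_names(names: list[str]) -> list[str]:
    """Apply Article 9's tie-breaker: add Arabic numerals to names
    that become identical after redaction."""
    redacted = [redact_name(n) for n in names]
    collisions = Counter(redacted)
    numbered: Counter = Counter()
    result = []
    for r in redacted:
        if collisions[r] > 1:
            numbered[r] += 1
            result.append(f"{r}{numbered[r]}")
        else:
            result.append(r)
    return result

print(redact_names(["张伟", "张勇", "李娜"]))  # ['张某1', '张某2', '李某']
```

Two-character surnames, the ethnic-minority rule, and transliterated foreign names would all need extra handling that this sketch deliberately omits.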
85 million is not a normal scale. This is not what the website is for, and it would likely bring Wikimedia's infrastructure down completely. —TheDJ (talkcontribs) 12:15, 15 December 2025 (UTC) Reply
Oh wait, I figured these were PDFs, but it seems they are only the OCR'ed content? That is slightly better, I guess. Still, 85 million is a lot. It likely requires significant scaling up of the wiki, and moving it to a separate database cluster. And I'm not sure whether Wikidata can scale that far at this moment either. Regardless, I think it is up to the Wikidata project to determine whether they even want that many entries. Even for journals and a few other categories, people are already discussing potentially splitting them off from Wikidata, and the added value of all those entries to them would be near 0. —TheDJ (talkcontribs) 12:28, 15 December 2025 (UTC) Reply
+1, I started a discussion in the main Wikidata Telegram channel with this as input. My view: basically, Wikidata has serious problems with scaling, and the community and team have not been able to communicate well or effectively about them for years nor come up with good solutions (but this might be changing now, which would be great!).
Interestingly enough, there are few Wikibases, in or outside of Wikibase.cloud, that have >1000 items, from what I have heard.
It might be worth studying why Wikibase is so "slow" to take off despite being freely available and taking only minutes to set up.
Maybe new developments like federated values are needed? So9q (talk) 21:03, 15 December 2025 (UTC) Reply
User:Lydia Pintscher (WMDE) has said multiple times over the past few years that Wikidata can't handle imports on that scale, neither technically nor socially. The query service already had to be split in two because of the extremely large (and, for some reason, still ongoing) import of scientific articles. User:ASarabadani (WMF) has said that the size of the database is problematic as well and can't keep growing at the rate it currently is (d:User:ASarabadani (WMF)/Growth of databases of Wikidata).
I already responded in the Wikidata Telegram group (on the 8th of December) to the user proposing the import saying that Wikidata can't really handle that many new items and Lydia agreed with what I wrote. - Nikki (talk) 05:19, 16 December 2025 (UTC) Reply
I am in that email thread and I support this project and anyone else who has 100,000,000 things to share.
Step 1 is uploading about 10 examples and doing the data modeling. People who do this typically take 6 months for that process. @SuperGrey: is an experienced Wikimedia editor who is obviously here for the long term.
I think it is an error to dismiss a project because of its size without checking what value it can have to build out other Wikimedia content. While I do greatly doubt that the Wikimedia platform can find a use for 100 million legal documents, there are lots of examples of institutions which use Wikibase instances which exchange data with Wikidata. It could happen that we only want ~500,000 of these documents, but a Wikibase holds 100 million, and the data modeling improves Wikidata processes and our Linked Open Data infrastructure by improving modeling of courts, cities, laws, and subject matter of cases.
If anyone says they want to do a big project, then I always support them doing a pilot with 10 examples. Bluerasberry (talk) 16:02, 16 December 2025 (UTC) Reply
+1 So9q (talk) 16:21, 16 December 2025 (UTC) Reply
Thanks for the compliment and ping. I’ll upload 10 sample documents and write a proposal page here on Meta-Wiki. SuperGrey (talk) 16:38, 16 December 2025 (UTC) Reply
What about setting up a separate Wikibase on wikibase.cloud and doing the modelling there?
You can link to Commons, for example. So9q (talk) 15:58, 17 December 2025 (UTC) Reply
I just want to add that d:User:ASarabadani (WMF) was worried about the size of the DB tables because, at the time the report was written, we ran Wikidata's MariaDB cluster on small commodity servers with limited RAM. We try to store everything in RAM to keep the query servers fast. We had one master and about 10 read-only replica clones back then.
Since then 2 things have happened:
  • Silently, WMF/WMDE beefed up the servers to match those of the OSM DB server. I don't know who made that decision or when; I can just see in Grafana that there is no ongoing issue right now.
  • Elsewhere, I have multiple times proposed a revision of the "keep every edit ever made to Wikidata in one big history table in MariaDB" strategy that WMDE has been running for years. I have seen no such revision yet. My proposal is to create a new archive MariaDB cluster with cheaper servers for all history over 2 years old. I think this would greatly reduce the memory requirement on the master MariaDB cluster. Unfortunately, I have not heard back from anyone at WMDE about this.
So9q (talk) 16:27, 16 December 2025 (UTC) Reply
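The archive-cluster idea can be illustrated with a toy hot/archive split. This sketch uses SQLite in place of MariaDB and a made-up three-column revision table (the real MediaWiki schema is much richer); only the age-based routing is the point.

```python
import sqlite3
from datetime import datetime, timedelta

# Two stores standing in for the hot cluster and the cheaper archive cluster.
hot = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
for db in (hot, archive):
    db.execute(
        "CREATE TABLE revision ("
        "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_timestamp TEXT)"
    )

# MediaWiki-style timestamps (YYYYMMDDHHMMSS) sort lexicographically.
now = datetime(2025, 12, 16)
cutoff = (now - timedelta(days=2 * 365)).strftime("%Y%m%d%H%M%S")

hot.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
    (1, 10, "20200101000000"),  # older than 2 years: should move to archive
    (2, 10, "20251201000000"),  # recent: stays on the hot cluster
])

# Move everything older than the cutoff into the archive store.
old_rows = hot.execute(
    "SELECT rev_id, rev_page, rev_timestamp FROM revision WHERE rev_timestamp < ?",
    (cutoff,),
).fetchall()
archive.executemany("INSERT INTO revision VALUES (?, ?, ?)", old_rows)
hot.execute("DELETE FROM revision WHERE rev_timestamp < ?", (cutoff,))

print(hot.execute("SELECT COUNT(*) FROM revision").fetchone()[0])      # 1
print(archive.execute("SELECT COUNT(*) FROM revision").fetchone()[0])  # 1
```

In production this would be an online migration with replication and read fallback to the archive cluster, none of which the sketch attempts.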
I encourage WMDE to develop Wikibase further to handle the data the community wants to store in an efficient and scalable way. That's hard to do. But we could start lifting out all the scientific data that the rest of Wikimedia does not need to have in Wikidata because they don't need it for interwikilinks.
I think we need to move from:
  • Just upload to WD because it might be useful to someone
  • Let's just keep and improve the large scholarly graph with overall low quality data that no other Wikimedia wiki needs to link to
->
  • Set up a community wikibase for any new dataset >1000 items that you would perhaps later like to be included in Wikidata
  • Make sure the modeling is solid and approach the Wikibase community if you think other Wikimedia wikis have a need to link to parts of it.
  • If the community judges that part of the collection of items fits in Wikidata, then hooray, let's import it after approval. You might end up in a split situation where only some items are welcome in Wikidata.
So9q (talk) 16:33, 16 December 2025 (UTC) Reply
Just here to confirm everything @Nikki said. Lydia Pintscher (WMDE) (talk) 11:04, 17 December 2025 (UTC) Reply
SuperGrey, DMCA is for copyright issues, not for personal rights issues. Also, the issue is not PII (a USA concept) but personal data (an EU concept). How well-indexed are these documents? If you search for a string they contain, how likely is it to turn up on Google/Bing/Baidu?
There's a risk that we surface personal data which is currently relatively obscure. People mentioned in court rulings in China may be EU residents now, which would entitle them to GDPR rights.
The idea is generally interesting, and it's definitely appropriate for Wikisource to host court rulings, though it's perhaps not the best project to provide a comprehensive database of court rulings. I recommend implementing it gradually, so that you can find issues along the way. All projects which tried something like this before have run into issues of insufficient redaction in the official databases (see JurisWiki.it in Italy, Carl Malamud in the USA), so you must assume that the same will happen here and that you will need to be in contact with the source database to help them fix mistakes. Have you already established some contacts with them? Nemo 16:18, 18 December 2025 (UTC) Reply
Most of them are not indexed by any search engines or available on websites searchable by search engines. GZWDer (talk) 14:01, 19 December 2025 (UTC) Reply

Wikidata Query Service Graph Split complete 7 January 2026

Latest comment: 1 hour ago · 10 comments · 3 people in discussion

Wikidata will be split into two graphs in January 2026. This will especially affect anyone who curates scientific papers in Wikidata or engages with the WikiCite project. The split covers about 40 million of Wikidata's 100 million items, so it is a major undertaking. The reason is that Wikidata is overtaxed with querying the full data, and the purpose of the split is to buy time until we have a new social system to decide what and how much data we want, and a technical system to manage that amount of data.

@So9q: "But we could start lifting out all the scientific data"

The WDQS programmed graph split (Q130342413) will be complete on 7 January 2026. Is that sufficient for you? This split will separate instances of scholarly publications, which make up the WikiCite project, from the Wikidata main graph.

Questions from anyone? Thanks. Bluerasberry (talk) 17:07, 16 December 2025 (UTC) Reply
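For readers unfamiliar with the split: conceptually, each item is routed to one of the two graphs based on what it is an instance of. The sketch below is hypothetical (the WDQS team defines the actual criteria, and real Wikibase JSON is more complex); Q13442814 is the "scholarly article" class and P31 is "instance of".

```python
# Assumed split criterion: items whose P31 includes a scholarly class
# go to the scholarly graph; everything else stays in the main graph.
SCHOLARLY_CLASSES = {"Q13442814"}  # "scholarly article"

def target_graph(item: dict) -> str:
    """Route a (simplified) item to 'scholarly' or 'main' by its P31 values."""
    p31_values = {claim["value"] for claim in item.get("claims", {}).get("P31", [])}
    return "scholarly" if p31_values & SCHOLARLY_CLASSES else "main"

paper = {"claims": {"P31": [{"value": "Q13442814"}]}}
human = {"claims": {"P31": [{"value": "Q5"}]}}
print(target_graph(paper))  # scholarly
print(target_graph(human))  # main
```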

Today, in the Wikibase channel on Telegram, we discussed what a real exclusion of science items with no interwiki links could look like.
My suggestion is to create a collaboration/partnership between WMDE/WMF and another organisation and then move the items out to at least 3 separate Wikibases.
Then in QLever they could be joined with Wikidata, and all is well from then on.
@Nikki might want to chime in on-wiki also. So9q (talk) 15:49, 17 December 2025 (UTC) Reply
My top 3 reasons to move the items rather than just split them in Blazegraph:
1) Wikibase is not built to scale in one instance to include all metadata of all knowledge, and the WMDE team does not plan for it to. The DB team is trying to keep the whole edit history of all items in memory, which is not ideal. Two practical solutions exist: make an archive DB cluster for old edits, or get rid of the science data that the Wikimedia wikis neither asked for nor use on a greater scale.
2) The social cost for the smallish Wikidata community is huge for 40 million low-quality items. Few want to watch them, basically.
3) The science items clutter the search function on Wikidata, making it next to unusable for finding niche but important concepts. It's not great, and it makes it harder to see which items are missing because of all the noise. So9q (talk) 15:55, 17 December 2025 (UTC) Reply
Regarding the technical system, I would say Wikibase Suite is pretty ideal for modelling tons of data like the Chinese dump.
But the Wikibase ecosystem != Wikidata.
So importing millions of whatever into a Wikimedia wiki will create repercussions for Wikidata. But importing millions of whatever into a community Wikibase is fine! So9q (talk) 16:03, 17 December 2025 (UTC) Reply
@So9q: I am happy to talk through more but we will need a few exchanges.
For your thoughts on technical management of the database, post to d:Wikidata:SPARQL query service/WDQS backend update where @BTracy-WMF: will do the Wikidata Graph Split of those 40 million items on 7 January 2026.
On the WikiCite community side, there is not anyone who does database architecture or who works with Wikidata platform development. We are content creators. Over the years we have become aware of the d:Wikidata:WikiProject Limits of Wikidata and how much content Wikidata can hold. WikiCite / science papers is currently the only 40 million item dataset, but over the years, many other projects have come to Wikidata with datasets in the millions and not found a way to participate because of our data limits. In the links I have already shared, there are discussions about how to manage the amount of data and querying that we need.
The part of this that confuses and worries me the most is that we hit these limits nearly 10 years ago, and in that time, we have lacked the community social structure to have conversations about what is going to happen on what schedule. Your idea about a Wikibase suite with connected side projects is a possible solution, but federation is a major technical, labor, and financial commitment. I do not think WMF / WMDE have shared plans to support federation for a network of multiple Wikibase instances.
It is probably the case that more universities have invested money into WikiCite than the Wikimedia Foundation / WMDE have invested into Wikidata/Wikibase platform development. So9q, you say of WikiCite, "get rid of the science data", "Wikimedia wikis neither asked for nor use", "science items clutter the search", "It's not great", "all the noise". Please consider your tone; content creators are not the problem here. It has been really nice to have lots of universities inspired to pay their staff to edit Wikidata through WikiCite d:Wikidata:WikiProject_PCC_Wikidata_Pilot/Participants, and it is an inspiring dream to imagine being able to connect the Wikimedia ecosystem to the best sources and research. If WikiCite were not clogging the pipes, then it would have been Wikidata:Lexicographical data, OpenStreetMap data, every artwork in the world, every law including Chinese law, or any of the countless other projects that want to interconnect with Wikidata and add tens of millions of items. Nobody - not WMF or WMDE - has given a clear plan of how much capacity Wikidata could or should have in the last 10 years or the next 10 years. It is certain that if we had pathways for institutions to participate in Wikidata, then we would have more organizations paying their staff to connect Wikidata to all sorts of popular, university, NGO, cultural sector, and government databases.
I encourage you to probe and criticize and find solutions. I am very ignorant of all of this and, for my own part, just want to curate content in Wikidata and organize WikiCite as a community content curation project. I am sorry that, as a content creator in the Wikidata / WikiCite project, I have taken up excessive resources. Wikidata's backend has been using Blazegraph, which has not been updated since 2012 and was outdated as FOSS when we adopted it. I thought Wikidata would grow to keep up with the times. I really was not expecting Wikidata to have such limits compared to what other databases can hold. Bluerasberry (talk) 17:01, 17 December 2025 (UTC) Reply
I have been at this for years and, honestly, my conclusion is that Wikidata is broken in a few ways that hinder sustainable growth.
When I last compared the shabby servers of WMF and their server strategy and asked for transparency, I didn't really see any constructive response.
The comparison with AW I shared a few days ago in the Wikidata channel details the differences between community-team interactions.
I cannot tell you the root cause of ten years of wasted time.
But I'll give you a few hints:
  • no external evaluation of the Wikibase database architecture has ever been conducted to my knowledge
  • no clear development goals
  • no plan to solve the issues
  • The IT personnel of WMDE have not been clear about what they are and are not responsible for.
  • E.g. server hosting? Database architecture? Server budget? Wikibase development?
Because of uncertainties and a lack of clear communication between the community and the team, maybe we have ended up in a pitted conflict?
On one side, a community where most are completely oblivious to the technical and social issues?
On the other, a team that has seen all their energy go into band-aid solutions that do not really address the root cause?
Maybe people on both sides are tired of endless fruitless discussions?
Wikidata is both a resounding success story and a failure.
Talk about the latter is not appreciated.
My assessment is that this climate is, in the long run, healthy for neither the community nor the team.
Maybe we need a new team to take over? A new PO? A new community liaison that can help repair and reduce the friction?
I don't know.
But if nothing is done, then judging by yesterday's weather you can probably expect another 10 years of complaints from both sides, with fingers pointed at the WikiCite community for highlighting the less-than-optimal database design decisions by continuing a very slow rate of upload of new scientific articles that endangers the whole project, as is stated over and over.
Perhaps it rather risks revealing things the team is not particularly proud of and does not want to talk about publicly?
Note: I have met very few people from WMDE (all of them kind) despite having participated in two hackathons, so I might have misunderstood a lot of things.
Maybe it's time for the board to step in and help find a constructive way forward?
Send a team to help WMDE communicate more clearly and in a healthy way? Make a plan together with all stakeholders and commit to it? (including universities, scientists eager to mass upload, etc)
I dare say the technical challenges are most probably smaller than the social/organizational ones, but again, I could be wrong. I understand the former better than the latter, to be honest. So9q (talk) 20:38, 17 December 2025 (UTC) Reply
Oh, we have a new Wikidata platform manager freshly hired? This could be a turning point! I'll try to reach out to him if I can find the time. So9q (talk) 20:44, 17 December 2025 (UTC) Reply
@So9q: In the context of the graph split, I am writing a Signpost article on that event and the state of Wikidata. Thanks for your thoughts on this. This is now a 10-year story, but I wish it could have been a story that was documented yearly, with smaller updates and evidence of collaborative discussion and decision making. I regret the outcome where people who curated scientific content, like me, end up with the blame.
Yes, so far as I know, there is no documented public discussion where WMDE/Wikidata team and the WMF/endpoint team talk about their shared vision of the future with the community of editors. The problems we have now are 10 years old.
I want to be optimistic because there are two WMF staff hired recently who are planning an update to Wikidata's backend as documented in mw:Wikidata_Platform. If you talk to them, could you try to get some ideas for what I should report in a news update? They have one update of their own so far - mw:Wikidata:Wikidata Platform team/Newsletter. As I understand, they will benchmark alternatives to Blazegraph and plan a backend replacement soon. Bluerasberry (talk) 21:25, 20 December 2025 (UTC) Reply
Oh, thanks, that is really great news. :)
=== WDQS ===
Since Scholia is already planning to set up its own QLever instance and has thus basically already forked and fixed WDQS, I'm not sure that really needs any urgent attention from the platform team right now (assuming the Scholia team can get the resources they requested from WM Cloud and are successful).
=== SQL ===
The SQL scalability problems, on the other hand, need some love IMO. I started a discussion a few days ago in the Wikibase Telegram channel and came up with a new scalable backend architecture for Wikidata.
The architecture proposal possibly solves all the current MariaDB scaling issues using production-grade open-source components (Vitess and new internal entity IDs for effective sharding, and S3 for cheap append-only storage), while reusing as much as possible of the current infrastructure and being scalable to 1bn+ items.
The MariaDB MW and Wikibase tables have basically grown beyond what the current master-slave setup can tolerate, and the proposal also addresses the SQL read/write bottlenecks. Wikidata has very high read and pretty high write pressure, which is a challenging combination for any operations team.
Maybe @ATsay-WMF and the team would be willing to review my proposal? I'm happy to attend a meeting and present it if anybody wants that. :)
See https://www.wikidata.org/wiki/Wikidata:Project_chat#Request_for_review_of_new_Wikibase_backend_architecture So9q (talk) 22:12, 26 December 2025 (UTC) Reply
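To illustrate the sharding part of the proposal: with stable internal entity IDs, each entity can be routed to a shard by a deterministic hash. This is only a sketch of the idea; Vitess computes keyspace IDs through its own vindex mechanism, and the shard count and hash choice here are illustrative assumptions.

```python
import hashlib

NUM_SHARDS = 16  # illustrative; a real deployment would configure this

def shard_for(entity_id: str) -> int:
    """Map an entity ID to a shard with a stable hash, so every writer
    and reader agrees on where an entity lives."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Deterministic: the same ID always lands on the same shard.
print(shard_for("Q42") == shard_for("Q42"))  # True
print(0 <= shard_for("Q64") < NUM_SHARDS)    # True
```

Hash-based routing spreads both the read and the write pressure evenly across shards, which is what makes the high read plus high write combination tractable.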
Hi @So9q, thanks for sharing your proposal. We are starting a recurring office hour series focused on the Blazegraph backend migration, and you are welcome to drop by with questions there. We are a very small team focused on stabilizing Wikidata's infrastructure, but can help point questions to the right places where needed.
Also, thanks @Bluerasberry for pointing to the team newsletter! @So9q I see you've already subscribed. I noticed the link isn’t quite going to the right place, and should point to this one instead: https://www.wikidata.org/wiki/Wikidata:Wikidata_Platform_team/Newsletter. Subscribing there is a good way to stay informed about our work. ATsay-WMF (talk) 18:59, 6 January 2026 (UTC) Reply

Shouldn't temporary accounts be abolished?

Latest comment: 10 hours ago · 1 comment · 1 person in discussion

Users who contribute without logging in are useful for weeding-type maintenance work, but when bold edits are made, dialogue can become difficult. Creating an account requires no special review, only an account name and a password, so it would not be difficult for temporary-account editors to create one. I propose restricting the "everyone" rights in Special:ListGroupRights to only "read pages", "create new user accounts", "create short URLs", and "merge own account". 航空ファン (talk) 10:23, 6 January 2026 (UTC) Reply

New research project idea

Latest comment: 25 minutes ago · 1 comment · 1 person in discussion

I have an idea for a new research project: Contributor vesting. Information about the phenomenon I intend to cover can be found on the linked page. This research project will run for the next year. I'm starting a discussion here in case anyone is interested in gathering data with me, among other thoughts. Faster than Thunder (talk) 20:26, 6 January 2026 (UTC) Reply
