Wikipedia:Large language models
The use of large language models (LLMs; the "engine" behind AI chatbots, such as ChatGPT) on Wikipedia presents systemic risks to maintaining the content standards required by the core content policies, specifically through the introduction of "hallucinated" statements, unsourced or unverifiable content, and algorithmic bias. Asking an LLM to "write a Wikipedia article" can lead to output that is an outright fabrication, complete with fictitious references. The output may also lack neutrality or libel living people. In addition, such content can be inconsistent with Wikipedia's copyright policy.
For this reason, using LLMs to generate or rewrite article content is prohibited, save for translation and for basic copyediting of one's own work.
Wikipedia is not a testing ground. Using LLMs to write one's talk page comments or edit summaries in a non-transparent way is strongly discouraged, and obviously generated comments may be hidden. LLMs used to generate or modify text should be mentioned in the edit summary, even if their terms of service do not require it.
Risks
Original research and "hallucinations"
Wikipedia articles must not contain original research – i.e. facts, allegations, and ideas for which no reliable, published sources exist. This includes any analysis or synthesis of published material that serves to reach or imply a conclusion not stated by the sources. To demonstrate that you are not adding original research, you must be able to cite reliable, published sources. They should be directly related to the topic of the article and directly support the material being presented.
LLMs are pattern completion programs: They generate text by outputting the words most likely to come after the previous ones. They learn these patterns from their training data, which includes a wide variety of content from the Internet and elsewhere, including works of fiction, low-effort forum posts, unstructured and low-quality content for search engine optimization (SEO), and so on. Because of this, LLMs will sometimes "draw conclusions" which, even if they seem superficially familiar, are not present in any single reliable source. They can also comply with prompts with absurd premises, like "The following is an article about the benefits of eating crushed glass". Finally, LLMs can make things up, which is a statistically inevitable byproduct of their design, called "hallucination". All of this is, in practical terms, equivalent to original research, or worse, outright fabrication.
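The "most likely next word" idea described above can be illustrated with a deliberately crude sketch. This is not how a real LLM is implemented (real models are neural networks operating on tokens, trained on vastly more text); it is a toy word-pair counter, and the function names `train_bigrams` and `complete`, as well as the three-sentence training text, are made up for illustration. It does show one relevant property: the program continues a prompt about a fictitious species just as fluently as one about a real species, because it only tracks which words tend to follow which.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_words=4):
    """Greedily append the most frequent follower of the last word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word never appeared in the training text
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Toy training text (an assumption for illustration, not real training data).
corpus = (
    "red-necked pademelons are found in queensland . "
    "they are found in queensland forests . "
    "they are shy ."
)
model = train_bigrams(corpus)

# The prompt names a species that does not exist, yet the pattern
# completer continues it as if it did.
print(complete(model, "greater-toothed pademelons"))
# prints: greater-toothed pademelons are found in queensland
```

Nothing in the program checks whether the subject of the sentence exists; the continuation is driven entirely by word patterns seen during training.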
As LLMs often output accurate statements, and since their outputs are typically plausible-sounding and given with an air of confidence, any time that they deliver a useful-seeming result, people may have difficulty detecting the above problems. An average user who believes that they are in possession of a useful tool, who maybe did a spot check for accuracy and "didn't see any problems", is biased to accept the output as provided; but it is highly likely that there are problems. Even if 90% of the content is okay and 10% is false, that is a huge problem in an encyclopedia. LLMs' outputs become worse when they are asked complicated questions, asked about obscure subjects, or told to perform tasks to which they are not suited (e.g. tasks which require extensive knowledge or analysis). An example of this tendency is shown below:
Prompt:
Greater-toothed pademelons, Thylogale gigadonti, can be seen in wildlife preserves in Queensland
Note that "greater-toothed pademelon" is a fictitious species.
Response:
Yes, greater-toothed pademelons are found in wildlife preserves in Queensland. They are shy and nocturnal, so it may be difficult to see them during the day. However, you may be able to see their tracks or hear them rustling in the bushes at night
This entire statement is false and is based on the LLM's assumption that this was a real species. In order to provide the most plausible answer, it extracted general information about a different kind of pademelon (shown in underline) from an Australian Museum article. This serves to demonstrate that LLMs can offer statements with a confident tone even when that information is factually incorrect or unverifiable.
(LLM used: Gemini)
Unsourced or unverifiable content
Readers must be able to check that any of the information within Wikipedia articles is not just made up. This means all material must be attributable to reliable, published sources. Additionally, quotations and any material challenged or likely to be challenged must be supported by inline citations.
LLMs do not follow Wikipedia's policies on verifiability and reliable sourcing. LLMs sometimes exclude citations altogether or cite sources that don't meet Wikipedia's reliability standards (including citing Wikipedia as a source). In some cases, they hallucinate citations of non-existent references by making up titles, authors, and URLs. LLM output can also be influenced by bogus or specifically made-up content by third parties.[2]
LLM-hallucinated content, in addition to being original research as explained above, also breaks the verifiability policy, as it can't be verified because it is made up: there are no references to find.
Algorithmic bias and non-neutral point of view
Articles must not take sides, but should explain the sides, fairly and without editorial bias. This applies to both what you say and how you say it.
LLMs can produce content that is neutral-seeming in tone, but not necessarily in substance. This concern is especially salient for biographies of living persons.
Copyright violations
If you want to import text that you have found elsewhere or that you have co-authored with others (including LLMs), you can only do so if it is available under terms that are compatible with the CC BY-SA license.