Wikistats csv
From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Fjarlq at 14:27, 2 March 2006 (clarify, add External links section at bottom). It may differ significantly from the current version.
Statistics data created by Erik Zachte's Wikistats script are available in comma-separated values (CSV) format at http://stats.wikimedia.org/csv/csv.zip
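Below is a minimal Python sketch for getting at the individual files. It assumes csv.zip has already been downloaded to a local path (the URL above may no longer serve it) and that the member files are UTF-8 text. The helper names list_csv_members and read_member are hypothetical; only the standard zipfile module is used.

```python
import io
import zipfile


def list_csv_members(zip_source):
    """Return the sorted names of all .csv members of the archive.

    zip_source may be a local path such as "csv.zip" or a binary file object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(n for n in archive.namelist() if n.lower().endswith(".csv"))


def read_member(zip_source, name, encoding="utf-8"):
    """Read one member of the archive and return it as decoded text.

    Assumes (hypothetically) that the members are UTF-8 encoded.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return archive.read(name).decode(encoding)
```

For example, `list_csv_members("csv.zip")` would show which of the files described on this page are actually present in a given copy of the archive.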
Single measures by language (column) and month (row)
Most of these files contain the same data as StatisticsMonthly.csv
(the matching column number is given in parentheses). Additionally, the second column contains
the sum over all languages (tot).
- WikipediansContributors.csv (3)
- WikipediansNew.csv (4)
- WikipediansEditsGt5.csv (5)
- WikipediansEditsGt100.csv (6)
- ArticlesTotal.csv (7)
- ArticlesTotalAlt.csv (8)
- ArticlesNewPerDay.csv (9)
- ArticlesEditsPerArticle.csv (10)
- ArticlesBytesPerArticle.csv (11)
- ArticlesGt500Bytes.csv (12, but as a rounded percentage)
- ArticlesGt1500Bytes.csv (13?, but as a rounded percentage)
- DatabaseEdits.csv (14)
- DatabaseSize.csv (15)
- DatabaseWords.csv (16)
- DatabaseLinks.csv (17)
- DatabaseWikiLinks.csv (18)
- DatabaseImageLinks.csv (19)
- DatabaseExternalLinks.csv (20)
- DatabaseRedirects.csv (21)
and
- UsagePageRequest.csv
- UsageVisits.csv
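The single-measure files above can be read into a month-by-language lookup table. This Python sketch rests on assumptions not stated on this page: a header row whose first cell labels the month column and whose remaining cells are tot followed by language codes, plain numeric cells, and empty cells meaning "no data". The function name parse_measure is hypothetical.

```python
import csv
import io


def parse_measure(text):
    """Parse one single-measure file into {month: {column: value}}.

    Hypothetical layout: header row = month label, 'tot', language codes;
    each data row = month, total, then one value per language.
    Empty cells are skipped; all other cells are parsed as floats.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = [h.strip() for h in header[1:]]  # 'tot' plus language codes
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        month, values = row[0].strip(), row[1:]
        table[month] = {
            col: float(val)
            for col, val in zip(columns, values)
            if val.strip() != ""
        }
    return table
```

Keeping tot as an ordinary column means it can be looked up like any language, e.g. `parse_measure(text)[month]["tot"]`.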
Special statistics
Ploticus
Input files for Ploticus: InputPloticus_X.csv with X = A, C, D, E, F, K, L, M, N, O, P, U, and InputPloticusTemp.csv.
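The file names in this group follow a simple pattern, so they can be generated and checked against an archive listing. This Python sketch only encodes the suffix letters listed above; the helper names are hypothetical, and member names are compared without any leading directory, which is an assumption about how the archive is laid out.

```python
PLOTICUS_SUFFIXES = "ACDEFKLMNOPU"  # the X values listed on this page


def expected_ploticus_files():
    """Build the list of Ploticus input file names described above."""
    names = [f"InputPloticus_{x}.csv" for x in PLOTICUS_SUFFIXES]
    names.append("InputPloticusTemp.csv")
    return names


def missing_ploticus_files(member_names):
    """Return the expected Ploticus files that are absent from member_names.

    Any directory part of a member name (e.g. 'csv/') is ignored.
    """
    present = {name.rsplit("/", 1)[-1] for name in member_names}
    return [n for n in expected_ploticus_files() if n not in present]
```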
Categories
There is one file CategoriesXX.csv per language, where XX is the language code in uppercase (e.g. CategoriesDE.csv for de).
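The naming rule can be applied in both directions. In this Python sketch the function names are hypothetical, and the set of characters accepted in the code part (letters, digits, underscore, hyphen) is an assumption meant to cover codes such as zh-min-nan; the page does not say how such codes are written.

```python
import re

# Assumed form of the code part: uppercase letters, digits, '_' or '-'.
_CATEGORY_FILE = re.compile(r"^Categories([A-Z0-9_-]+)\.csv$")


def category_file_name(language_code):
    """Return the CategoriesXX.csv file name for a language code."""
    return f"Categories{language_code.upper()}.csv"


def language_from_category_file(file_name):
    """Return the lowercase language code, or None if the name doesn't match."""
    match = _CATEGORY_FILE.match(file_name)
    return match.group(1).lower() if match else None
```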
Other files
LanguageCodes.csv contains one line for each language with:
- language code
- encoding, written as (utf-8); this value is always utf-8, even for Wikipedias that do not support UTF-8!
- category namespace prefix, or "(Category)" if not modified
- image namespace prefix, or "(Image)" if not modified
- user namespace prefix, or "(User)" if not modified
For instance:
de,(utf-8),Kategorie,Bild,Benutzer
en,(utf-8),(Category),(Image),(User)
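A Python sketch for loading LanguageCodes.csv into per-language records, using the two example lines above as its test data. It assumes, based on those examples, that a value wrapped in parentheses is the unmodified default and that the parentheses are only a marker; the class and function names are hypothetical. Since the encoding field is always utf-8, it is kept for completeness but should not be relied on.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LanguageInfo:
    code: str
    encoding: str          # always "utf-8" per this page, so not informative
    category_prefix: str
    image_prefix: str
    user_prefix: str


def _unwrap(field):
    """Strip the parentheses that (by assumption) mark an unmodified value."""
    field = field.strip()
    if field.startswith("(") and field.endswith(")"):
        return field[1:-1]
    return field


def parse_language_codes(text):
    """Map language code -> LanguageInfo for each line of LanguageCodes.csv."""
    result = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        info = LanguageInfo(*(_unwrap(f) for f in row[:5]))
        result[info.code] = info
    return result
```

If it matters whether a prefix was modified, check the raw field for parentheses before calling _unwrap.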
Layout per file
- StatisticsEditsPerArticle.csv, with the following fields:
  - language code
  - total number of revisions
  - total number of revisions by registered users
  - total size in bytes of all revisions together, uncompressed (this explains why the dump XML files are so large)
  - number of unique registered contributors to this article
  - number of unique IP addresses of anonymous users who contributed (not exactly the same as the number of unique anonymous contributors, which cannot be known)
  - full title in UTF-8 (Unicode)
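A Python sketch for parsing one line of StatisticsEditsPerArticle.csv into a record. It assumes the fields are comma-separated in the order listed above with no quoting; because the title is the last field and may itself contain commas, the line is split on the first six commas only. The class, function, and property names are hypothetical. The anonymous_revisions property is simply total revisions minus revisions by registered users.

```python
from dataclasses import dataclass


@dataclass
class ArticleEditStats:
    language: str
    revisions: int
    registered_revisions: int
    total_bytes: int               # all revisions together, uncompressed
    registered_contributors: int
    anonymous_ips: int             # unique IPs, not unique anonymous people
    title: str

    @property
    def anonymous_revisions(self):
        """Revisions not made by registered users."""
        return self.revisions - self.registered_revisions


def parse_edits_per_article(line):
    """Parse one line; the title keeps any commas it contains."""
    parts = line.rstrip("\r\n").split(",", 6)  # at most 7 fields
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    language, *numbers, title = parts
    return ArticleEditStats(language, *(int(n) for n in numbers), title)
```

When reading the file itself, open it with encoding="utf-8" so that titles decode correctly.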