Wikimedia Meta-Wiki

Wikistats csv

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Fjarlq at 14:27, 2 March 2006 (clarify, add External links section at bottom). It may differ significantly from the current version.

Statistics data created by Erik Zachte's Wikistats script are available in comma-separated values (CSV) format at http://stats.wikimedia.org/csv/csv.zip
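Once the archive has been downloaded, its contents can be inspected with a few lines of Python. This is a minimal sketch and not part of Wikistats: the helper names `list_csv_members` and `build_demo_archive` are hypothetical, and the demo builds a tiny stand-in archive in memory (with placeholder contents) instead of fetching the real csv.zip.

```python
import io
import zipfile


def list_csv_members(zip_bytes: bytes) -> list[str]:
    """Return the sorted names of all .csv members in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".csv")
        )


def build_demo_archive() -> bytes:
    """Build a small in-memory zip; member contents are placeholders only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("StatisticsMonthly.csv", "placeholder\n")
        archive.writestr("LanguageCodes.csv", "placeholder\n")
        archive.writestr("readme.txt", "not a csv file\n")  # invented member
    return buffer.getvalue()


if __name__ == "__main__":
    print(list_csv_members(build_demo_archive()))
```

To use it on the real data, pass the bytes of the downloaded csv.zip to `list_csv_members` instead of the demo archive.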


Single measures by language (column) and month (row)

Most of these files contain the same data as StatisticsMonthly.csv (the corresponding column number is given in parentheses). Additionally, the second column of each file contains the sum over all languages (tot).

  • WikipediansContributors.csv (3)
  • WikipediansNew.csv (4)
  • WikipediansEditsGt5.csv (5)
  • WikipediansEditsGt100.csv (6)
  • ArticlesTotal.csv (7)
  • ArticlesTotalAlt.csv (8)
  • ArticlesNewPerDay.csv (9)
  • ArticlesEditsPerArticle.csv (10)
  • ArticlesBytesPerArticle.csv (11)
  • ArticlesGt500Bytes.csv (12, but as a rounded percentage)
  • ArticlesGt1500Bytes.csv (13?, but as a rounded percentage)
  • DatabaseEdits.csv (14)
  • DatabaseSize.csv (15)
  • DatabaseWords.csv (16)
  • DatabaseLinks.csv (17)
  • DatabaseWikiLinks.csv (18)
  • DatabaseImageLinks.csv (19)
  • DatabaseExternalLinks.csv (20)
  • DatabaseRedirects.csv (21)

and

  • UsagePageRequest.csv
  • UsageVisits.csv
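Because the second column of a single-measure file holds the sum over all languages, a row can be sanity-checked by adding up the per-language columns. The sketch below assumes a layout that this page does not fully specify: the first column is the month, the second is the total, all remaining columns are integer per-language counts, and there is no header row. The function name `check_totals` and the sample values are invented for illustration.

```python
import csv
import io

# Invented sample rows: month, total, then two per-language counts.
SAMPLE = "01/31/2006,150,100,50\n02/28/2006,180,120,60\n"


def check_totals(text: str):
    """Yield (month, stated_total, computed_total) for each data row."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        month, total, *per_language = row
        # Empty cells (language without data for that month) are ignored.
        computed = sum(int(value) for value in per_language if value.strip())
        yield month, int(total), computed


if __name__ == "__main__":
    for month, stated, computed in check_totals(SAMPLE):
        status = "ok" if stated == computed else "mismatch"
        print(month, stated, computed, status)
```

If the real files carry a header row or non-integer values, the parsing step would need to be adapted accordingly.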

Special statistics

Ploticus

InputPloticus_X.csv with X = A,C,D,E,F,K,L,M,N,O,P,U and InputPloticusTemp.csv.

Categories

There is a CategoriesXX.csv file for each language (replace XX with the uppercase language code).
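The naming rule can be expressed as a one-line helper. This is only a sketch: `categories_filename` is a hypothetical name, and how language codes containing hyphens are written in the real file names is not stated here, so the function simply uppercases whatever it is given.

```python
def categories_filename(language_code: str) -> str:
    """Build the per-language categories file name, e.g. 'de' -> 'CategoriesDE.csv'."""
    return f"Categories{language_code.strip().upper()}.csv"


if __name__ == "__main__":
    for code in ("de", "en"):
        print(categories_filename(code))
```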

Other files

LanguageCodes.csv contains one line for each language with:

  1. language code
  2. encoding, in parentheses: this value is always (utf-8), even for Wikipedias that do not support UTF-8
  3. category namespace prefix, or "(Category)" if not modified
  4. image namespace prefix, or "(Image)" if not modified
  5. user namespace prefix, or "(User)" if not modified

For instance:

de,(utf-8),Kategorie,Bild,Benutzer
en,(utf-8),(Category),(Image),(User)
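The two example lines above can be parsed as follows. This is a hedged sketch rather than the Wikistats implementation: the names `LanguageEntry`, `strip_parens` and `parse_language_line` are hypothetical, and the code assumes that a value wrapped in parentheses marks an unmodified default, as the field list suggests.

```python
from typing import NamedTuple


class LanguageEntry(NamedTuple):
    code: str
    encoding: str
    category_prefix: str
    image_prefix: str
    user_prefix: str
    localized: bool  # True if at least one namespace prefix was modified


def strip_parens(value: str) -> str:
    """Remove one pair of surrounding parentheses, if present."""
    if value.startswith("(") and value.endswith(")"):
        return value[1:-1]
    return value


def parse_language_line(line: str) -> LanguageEntry:
    """Parse one LanguageCodes.csv line into a LanguageEntry."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    code, encoding, category, image, user = fields
    # A prefix without parentheses is taken to be a localized namespace name.
    localized = any(not p.startswith("(") for p in (category, image, user))
    return LanguageEntry(
        code,
        strip_parens(encoding),
        strip_parens(category),
        strip_parens(image),
        strip_parens(user),
        localized,
    )


if __name__ == "__main__":
    for sample in ("de,(utf-8),Kategorie,Bild,Benutzer",
                   "en,(utf-8),(Category),(Image),(User)"):
        print(parse_language_line(sample))
```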

Layout per file

  • StatisticsEditsPerArticle.csv
  1. language code
  2. total number of revisions
  3. total number of revisions by registered users
  4. total size in bytes of all revisions together, uncompressed (this explains why the dump and XML files are so huge)
  5. number of unique registered contributors to this article
  6. number of unique IP addresses of anonymous users who contributed (not exactly the same as the number of unique anonymous contributors, which cannot be known)
  7. full article title, UTF-8 encoded
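A record with this seven-field layout could be read as sketched below. Several points are assumptions, not facts from this page: the title is taken to be unquoted, so the line is split on the first six commas only and any commas inside the title are preserved; the names `ArticleEditStats` and `parse_record` are hypothetical; and the sample line is invented. The derived `anonymous_revisions` value is simply field 2 minus field 3.

```python
from dataclasses import dataclass


@dataclass
class ArticleEditStats:
    language: str
    revisions: int
    registered_revisions: int
    total_bytes: int
    registered_contributors: int
    anonymous_ips: int
    title: str

    @property
    def anonymous_revisions(self) -> int:
        """Revisions not made by registered users (field 2 minus field 3)."""
        return self.revisions - self.registered_revisions


def parse_record(line: str) -> ArticleEditStats:
    """Parse one line; the title is the remainder after six commas."""
    parts = line.rstrip("\r\n").split(",", 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    language, revs, reg_revs, size, reg_users, anon_ips, title = parts
    return ArticleEditStats(
        language, int(revs), int(reg_revs), int(size),
        int(reg_users), int(anon_ips), title,
    )


if __name__ == "__main__":
    # Invented sample line; the title deliberately contains a comma.
    record = parse_record("en,120,95,2048000,40,25,Example, with comma\n")
    print(record.title, record.anonymous_revisions)
```

If the real file quotes titles, Python's `csv` module would be the more appropriate parser.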
