Wikimedia Meta-Wiki

Talk:India Program/Indic Languages/Statistics/2011 Annual Update

From Meta, a Wikimedia project coordination wiki

Nice work


Thanks Shiju for a detailed analysis of the 2011 stats. I am sure that each Indic Wikipedia can benefit from this and develop or fine-tune its plans. --Arjunaraoc (talk) 07:57, 16 February 2012 (UTC)

Thanks Arjuna for commenting. I hope the different Indic wiki communities will benefit from this. --Shiju Alex (WMF) (talk) 03:09, 17 February 2012 (UTC)
Good work, Shiju. Having three major sections with color codes helps a lot. An annual study is quite enough. Quarterly will be a waste of time :) I agree that with the introduction of bots and machine-translated articles, the overall size of a Wikipedia and its average article size don't matter now. One really useful metric would be to normalize the stats by the population of speakers. --Ravidreams (talk) 17:48, 17 February 2012 (UTC)
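
The per-speaker normalization suggested above could be sketched as follows. This is only an illustration: the speaker populations are the figures quoted on this page (9 crore for Marathi, 6.6 crore for Tamil), the editor count of 100 per wiki is a placeholder taken from the rough "about 100 or so" mentioned in this discussion rather than a measured value, and the function name `per_million_speakers` is made up for the example.

```python
CRORE = 10_000_000  # 1 crore = 10 million


def per_million_speakers(count, speakers):
    """Scale a raw count (articles, editors, page views) to one million speakers."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    return count / speakers * 1_000_000


# Speaker populations as quoted in this discussion.
speakers = {"Marathi": 9 * CRORE, "Tamil": 6.6 * CRORE}

# Placeholder value only: the thread mentions roughly 100 active contributors per wiki.
active_editors = {"Marathi": 100, "Tamil": 100}

normalized = {
    lang: per_million_speakers(active_editors[lang], speakers[lang])
    for lang in speakers
}

for lang, value in normalized.items():
    print(f"{lang}: {value:.2f} active editors per million speakers")
```

With identical raw counts, the language with fewer speakers comes out ahead after normalization, which is the kind of difference a raw table would hide.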

My comments - some I've posted in Tamil Wikipedia


Hi Shiju, thanks for the detailed analysis. It is a good start, but it needs some further improvement. Although we briefly discussed this, we could not meet or talk during my recent trip to India in Nov-Dec 2011. I was visiting some 12 different cities in a little over 30 days, which was indeed too hectic. I have posted some of my comments about this annual update in Tamil Wikipedia here. The main points I'm making there, after congratulating you on your efforts and analysis, are the following:

  • I don't understand what you mean by "medium-sized" in your statement, "Marathi for the medium-sized communities, and Tamil amongst the larger communities", while in the Table you give the population of Marathi speakers as 9 crores and Tamil speakers as 6.6 crores!
  • It is quite misleading to give just the "raw" article count. You can give it, but I feel you should also give the count of articles that have more than 200 characters. Even with 200-250 characters, not much can be said in an article! So the raw count can be misleading, especially for those who may not know how an article is counted. Even reputed Indian media outlets don't seem to know, or care to report, the difference.
  • When comparing Indic Wikis, I strongly feel that the total size of the Wikipedia (in bytes) and the average article size in bytes are useful parameters to report. Even the percentages of articles that exceed 0.5K and 2K in size are important. In Tamil Wikipedia, we have done such comparisons (for a bunch of Indic languages) for many years now. See Quality factors - a Comparison
  • When assessing the number of Wikipedians (new etc.), an estimate of the ratio of Female/Male Wikipedians would also be a very useful measure to monitor.

I entirely concur with your viewpoint that although crores of people speak almost any of these major languages, only about 100 or so people are active in contributing to any one of the Indic Wikipedias, and there is definitely a huge disconnect (or lack of awareness). Although we in Tamil Wikipedia (and, I'm sure, other language Wikipedians as well) were aware of this disconnect and tried various ways to correct this shortcoming, I think we have not yet succeeded in attracting at least 1,000 regular contributors. I "believe" we may succeed in this sooner or later! I'm hopeful.

Again, congratulations on your painstaking efforts to compile this report and provide this update. It will be very useful to a lot of people, I'm sure. I hope you will expand the scope of your analysis in your next installment. I hope I can assist you next time.

--C.R.Selvakumar (talk) 23:41, 16 February 2012 (UTC)
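
The size-based measures proposed in the comment above (total bytes, average article size, and the share of articles above the 0.5K and 2K marks) can be computed from a list of per-article sizes. The sketch below is illustrative only: the sample sizes are invented, the thresholds assume 0.5K = 512 bytes and 2K = 2048 bytes, and `size_profile` is a hypothetical helper, not a Wikistats function.

```python
def size_profile(article_sizes, thresholds=(512, 2048)):
    """Summarize a wiki by article size instead of raw article count.

    article_sizes: iterable of article sizes in bytes.
    thresholds: byte limits for which the share of larger articles is reported.
    """
    sizes = list(article_sizes)
    total = len(sizes)
    if total == 0:
        raise ValueError("need at least one article")

    profile = {
        "articles": total,
        "total_bytes": sum(sizes),
        "mean_bytes": sum(sizes) / total,
    }
    for limit in thresholds:
        larger = sum(1 for size in sizes if size > limit)
        profile[f"pct_over_{limit}"] = 100.0 * larger / total
    return profile


# Invented sample: two stubs, one short article, two longer ones.
sample_sizes = [150, 300, 800, 2500, 6000]
profile = size_profile(sample_sizes)

for key, value in profile.items():
    print(key, value)
```

Two wikis with the same raw article count can produce very different profiles here, which is the point the comment makes about stubs inflating the count.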


Thank you very much, Selva, for your valuable comments. My replies to your comments are below.
//*I don't understand what you mean by "medium-sized" in your statement, "Marathi for the medium-sized communities, and Tamil amongst the larger communities", while you give in the Table the population of Marathi speakers as 9 crores and Tamil speakers as 6.6 crores!//
Sorry for not writing it clearly. In this report, by "communities" I mean the current strength of the wiki community (NOT the total number of language speakers). I rewrote that sentence to make the meaning clearer.
//*It is quite misleading just to give the "raw" article count. You can give it, but I feel you should also give article count which has more than 200 Characters. Even with 200-250 characters, not much can be said in an article! So, the raw count can be misleading, especially for those who may not know how an article is counted. Even reputed Indian media outlets don't seem to know or care to report.//


That was true two years ago. With the introduction of Google's and Microsoft's automatic translation tools and the heavy usage of bots in some wikis, that parameter is becoming increasingly irrelevant now (as is the raw article count). Because of this, it is becoming increasingly difficult to compare apples with apples.


//*When assessing the number of Wikipedians (new etc.), an estimate of the ratio of Female/Male Wikipedians would also be a very useful measure to monitor.//


As of now we do not have data for this. The only way individual communities can find this out is through interaction between active members in the respective wiki. Even then, there are many Wikipedians who do not like to reveal their identity. So this is very difficult to track.
//I hope you would expand the scope of your analysis for your next installment. I hope I can assist you next time. //
You are most welcome to convert this into a team effort. Over the past 2 or 3 years I have been doing this alone. I will be very happy if this can be extended to a team effort. In that case we can think about releasing quarterly reports also. --Shiju Alex (WMF) (talk) 03:06, 17 February 2012 (UTC)
Thanks Shiju for your clarification. I partially agree with your other comments. It is possible, I believe, to separate out Google-translated articles or to weight them in some manner. For example, in Tamil Wikipedia, the Google-translated articles are saved in a separate Category, and the total bytes can be estimated by simply sampling a few of them (or can be estimated more substantially through a program). You can ask for help from each Wiki as well. Now, a tiny fraction of these articles are being corrected, Tamil-wikified and added. In Tamil, we have about 1210 such articles, and the size varies roughly from 50K to 80K each. This would add a solid 70M or so to the total size (just a guess, an unscientific guess). That is a 1/3 increase (assuming something like 210M was the size before). So you can weight the total byte size by 2/3 (for example), subtract the articles created this way from the total number, etc. I don't say it is an easy task, but if we want to get some realistic numbers for comparisons, or to measure progress without even making comparisons, such methods may have to be used. In any case, more than the number, the quality of articles is what matters (my personal view). --C.R.Selvakumar (talk) 21:07, 17 February 2012 (UTC)
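
The back-of-the-envelope estimate in the comment above can be written out as a short calculation. This is a sketch under stated assumptions: K is taken as 1,000 bytes and M as 1,000,000 bytes, the 1210 articles, the 50K-80K range, the 70M guess and the 210M prior size are the figures from the comment, and the function names are invented for illustration.

```python
K = 1_000      # assumption: 1K = 1,000 bytes
M = 1_000_000  # assumption: 1M = 1,000,000 bytes


def translated_bytes_range(n_articles, low_k, high_k):
    """Lower and upper bound (in bytes) for a set of translated articles."""
    return n_articles * low_k * K, n_articles * high_k * K


def original_share(original_bytes, translated_bytes):
    """Fraction of the combined size that predates the translated articles."""
    return original_bytes / (original_bytes + translated_bytes)


# Figures quoted in the comment.
low, high = translated_bytes_range(1210, 50, 80)
guess = 70 * M    # the comment's rough guess
before = 210 * M  # assumed size before the translated articles

growth = guess / before
share = original_share(before, guess)

print(f"translated articles: {low / M:.1f}M to {high / M:.1f}M")
print(f"growth over prior size: {growth:.3f}")
print(f"original share of new total: {share:.2f}")
```

The 70M guess falls inside the 60.5M to 96.8M range, and 70M on top of 210M is the 1/3 increase mentioned in the comment. Under the same figures, the pre-existing content makes up 0.75 of the combined total; the 2/3 weight in the comment is offered there only as an example.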

To take seasonal variations into account, I have included data for December 2011. The sections have been updated to reflect the updated statistics. Not much variation is noted in the community statistics, but there are variations in the number of readers. --Shiju Alex (WMF) (talk) 06:46, 29 February 2012 (UTC)
