27
\$\begingroup\$

Create a program that finds the latest 50 challenges with the code-golf tag that have at least 20 answers. Then extract the scores for each language in each of the challenges. If more than one answer uses the same language, count all of their scores. Then take the 20 most common languages and output a list with the language names, the number of answers, the average byte counts and the median byte counts. The list should be sorted by number of answers, in descending order.

You must account for variations in capitalization (for instance: Matlab = MATLAB).

In languages with many different version numbers (e.g. Python), count them as unique languages, so: Python != Python 2 != Python 2.7 != Python 3.x
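The two naming rules above could be handled with a single normalization step. This is only a sketch: the function name `normalize_language` is made up for illustration, and collapsing repeated whitespace is an extra assumption that the challenge does not ask for.

```python
def normalize_language(name):
    """Case-fold a language name while keeping version suffixes distinct."""
    # Assumption: runs of whitespace are collapsed so "Python  2" == "Python 2".
    # Upper-casing makes "Matlab" and "MATLAB" compare equal.
    return " ".join(name.split()).upper()


same = normalize_language("Matlab") == normalize_language("MATLAB")
versions = {normalize_language(n)
            for n in ["Python", "Python 2", "Python 2.7", "Python 3.x"]}
```

Here `same` is true, while `versions` still holds four separate keys, so the Python variants are counted as different languages.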

Example output (output format is optional):

cJam, 66, 12.4, 8.5
Pyth, 58, 15.2, 19
Ruby, 44, 19.2, 22.5
Python, 34, 29.3, 32
Python 2.7, 22, 31.2, 40
...
...
Java, 11, 115.5, 94.5
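The counting and sorting described above can be sketched as follows, assuming the headers have already been parsed into `(language, byte_count)` pairs. The function name `summarize` and the sample data in the usage note are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(entries, top=20):
    """Return (language, answers, average, median) rows, most answers first."""
    scores = defaultdict(list)
    for language, byte_count in entries:
        # Upper-case so that capitalization variants share one bucket
        scores[language.upper()].append(byte_count)
    rows = [(lang, len(v), mean(v), median(v)) for lang, v in scores.items()]
    # Sort by number of answers, descending, and keep the most common ones
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

For example, `summarize([("CJam", 8), ("cjam", 12), ("Ruby", 20)])` puts both CJam answers in one row with 2 answers, ahead of the single Ruby answer.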

Header formats that must be supported:

  • Starts with # Language name, or #Language name
  • Ends with xx bytes, xx Bytes or just xx
  • There can be a lot of garbage between the first comma and the last number.
  • If the language name is a link ([Name](link)), it can be skipped

If the answer has another header format, you may choose to skip it (or include it if your code can handle it).

As an example, all of the below headers must be supported:

# Language Name, N bytes
# Ruby, <s>104</s> <s>101</s> 96 bytes 
# Perl, 43 + 2 (-p flag) = 45 Bytes
# MATLAB, 5
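One possible way to parse such headers is a single regular expression that takes everything up to the first comma as the language and the last number on the line as the score. The pattern and the name `parse_header` below are illustrative assumptions, not a reference solution; headers whose language name is a link are skipped, which the rules allow.

```python
import re

# Hypothetical pattern: "#", optional spaces, language up to the first comma,
# any garbage, then the last number, optionally followed by "bytes"/"Bytes".
HEADER = re.compile(r"^#\s*([^,]+),.*?(\d+)(?:\s*[Bb]ytes)?\s*$")


def parse_header(line):
    """Return (language, byte_count), or None if the header is not supported."""
    match = HEADER.match(line)
    if match is None:
        return None
    language = match.group(1).strip()
    if language.startswith("["):
        # Language name is a link such as [Name](link); skip it
        return None
    return language, int(match.group(2))
```

With this pattern, struck-through scores and arithmetic such as `43 + 2 (-p flag) = 45` are treated as garbage before the final number.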

Rules:

  • It's OK to use the API or just the website URL
    • The following can be subtracted from the byte count (nothing else), so there is no need to use a URL shortener (maximum 44 bytes):
      • https:// (or http://)
      • codegolf
      • .stackexchange.com
      • /questions
  • The program can take input. The input will be included in the byte count.

Other than that, standard rules apply.
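The 44-byte maximum in the URL rule is the combined length of the four listed parts (8 + 8 + 18 + 10). The sketch below shows one reading of the rule, in which each part is counted at most once if it appears in the source; the function name `url_discount` and that reading are assumptions.

```python
def url_discount(source):
    """Bytes that may be subtracted from the score for URL parts in source."""
    discount = 0
    # Scheme: count https:// if present, otherwise http://
    for scheme in ("https://", "http://"):
        if scheme in source:
            discount += len(scheme)
            break
    # Remaining parts from the rules, each counted at most once
    for part in ("codegolf", ".stackexchange.com", "/questions"):
        if part in source:
            discount += len(part)
    return discount


max_discount = url_discount('x="https://codegolf.stackexchange.com/questions"')
```

Here `max_discount` comes out at the full 44 bytes, while a program that uses `http://` instead would get 43 under this reading.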

asked Oct 27, 2015 at 20:28
\$\endgroup\$
11
  • 11
    \$\begingroup\$ I could tell you it's Pyth without having to do this challenge at all. \$\endgroup\$ Commented Oct 27, 2015 at 20:43
  • 1
    \$\begingroup\$ is the " bytes" suffix common, let alone universal, enough to require it? \$\endgroup\$ Commented Oct 27, 2015 at 20:54
  • \$\begingroup\$ @StewieGriffin I think Sparr is saying that, while it is common, it's not always used. \$\endgroup\$ Commented Oct 27, 2015 at 21:00
  • \$\begingroup\$ As far as I can see, xx bytes is very common on recent challenges (at least since the leaderboard snippet was created). \$\endgroup\$ Commented Oct 27, 2015 at 21:02
  • 2
    \$\begingroup\$ I usually use "chars" or "characters" instead of "bytes" \$\endgroup\$ Commented Oct 27, 2015 at 21:38

2 Answers

14
\$\begingroup\$

R, 821 - 44 = 777 bytes

Updated results: please see the edit history to make sense of all the comments below.

           language num_answers avg_count median_count
 1             RUBY          49  49.97959         30.0
 2             CJAM          48  32.64583         22.0
 3             PYTH          48  21.02083         14.0
 4         PYTHON 2          46  86.78261         77.0
 5            JULIA          43  58.90698         45.0
 6          HASKELL          41  74.65854         56.0
 7              PHP          40  73.52500         48.0
 8             PERL          36  53.30556         34.0
 9         PYTHON 3          34  90.91176         90.5
10       POWERSHELL          33  60.24242         44.0
11                C          32 221.84375         79.5
12                R          32  77.40625         62.5
13             JAVA          29 170.68966        158.0
14 JAVASCRIPT (ES6)          29  90.79310         83.0
15       JAVASCRIPT          28  68.39286         61.0
16               C#          25 193.92000        130.0
17      MATHEMATICA          23  56.04348         47.0
18           MATLAB          22  67.45455         55.0
19         TI-BASIC          19  47.05263         37.0
20              APL          18  16.55556         15.0

The code, which I could shorten a bit more:

W=library;W(XML);W(plyr)
X=xpathSApply;Y=xmlValue;D=data.frame;H=htmlParse;S=sprintf
Z="http://codegolf.stackexchange.com/"
R=function(FUN,...)do.call(rbind,Map(FUN,...))
G=function(url){d=H(url)
a=as.double(sub(".*?(\\d+)a.*","\\1",X(d,"//div[starts-with(@class,'status')]",Y)))
u=paste0(Z,X(d,"//*[contains(@class,'question-hyperlink')]",xmlGetAttr,"href"))
D(u,a)}
u=S("%s/questions/tagged/code-golf?page=%i",Z,1:50)
q=R(G,u)
u=with(q,head(u[a>20],50))
A=function(url){u=S("%s?page=%i",url,1:10)
f=function(u){d=H(u)
h=X(d, "//div[@class='post-text']//h1",Y)
p="^(.*?),.*? (\\d+)( [Bb]ytes)?$"
k=grep(p,h,v=T)
l=toupper(sub(p,"\\1",k))
c=as.double(sub(p,"\\2",k))
D(l,c)}
R(f,u)}
a=R(A,u)
L=names(tail(sort(table(a$l)),20))
x=subset(a,l%in%L)
arrange(ddply(x, "l",summarise,n=length(c),a=mean(c),m=quantile(c,0.5)),-n)

De-golfed:

library(XML)
library(plyr)

# Row-bind the data frames returned by applying FUN to each argument
LoopBind <- function(FUN, ...) do.call(rbind, Map(FUN, ...))

# Scrape one listing page: question URLs (u) and their answer counts (a)
GetQuestions <- function(url) {
  d <- htmlParse(url)
  a <- as.double(sub(".*?(\\d+)a.*", "\\1", xpathSApply(d, "//div[starts-with(@class, 'status')]", xmlValue)))
  u <- paste0("http://codegolf.stackexchange.com/", xpathSApply(d, "//*[contains(@class, 'question-hyperlink')]", xmlGetAttr, "href"))
  data.frame(u, a)
}

u <- sprintf("http://codegolf.stackexchange.com/questions/tagged/code-golf?page=%i", 1:50)
q <- do.call(rbind, Map(GetQuestions, u))
# Keep the first 50 questions with more than 20 answers
u <- with(q, head(u[a > 20], 50))

# Scrape up to 10 pages of answers for one question and parse the headers
GetAnswers <- function(url) {
  u <- sprintf("%s?page=%i", url, 1:10)
  f <- function(u) {
    d <- htmlParse(u)
    h <- xpathSApply(d, "//div[@class='post-text']//h1", xmlValue)
    # Language up to the first comma, then the last number, optionally "bytes"
    p <- "^(.*?),.*? (\\d+)( [Bb]ytes)?$"
    k <- grep(p, h, value = TRUE)
    l <- toupper(sub(p, "\\1", k))
    c <- as.double(sub(p, "\\2", k))
    data.frame(language = l, c)
  }
  LoopBind(f, u)
}

a <- LoopBind(GetAnswers, u)
# Names of the 20 most common languages
L <- names(tail(sort(table(a$language)), 20))
x <- subset(a, language %in% L)
arrange(ddply(x, "language", summarise, num_answers = length(c), avg_count = mean(c), median_count = quantile(c, 0.5)),
        -num_answers)
answered Oct 28, 2015 at 5:05
\$\endgroup\$
9
  • 6
    \$\begingroup\$ How is the average length for C# over 6000 bytes? \$\endgroup\$ Commented Oct 28, 2015 at 13:16
  • \$\begingroup\$ @SuperJedi224 - There might be some extremely long submissions that are skewing the average. That's why median is a useful statistic because it is resistant to outliers. \$\endgroup\$ Commented Oct 28, 2015 at 15:17
  • 1
    \$\begingroup\$ I read somewhere that C# is the least golfable language. Now I know why... \$\endgroup\$ Commented Oct 28, 2015 at 19:03
  • \$\begingroup\$ @ev3commander - C# pales in comparison to Unary... \$\endgroup\$ Commented Oct 29, 2015 at 4:42
  • \$\begingroup\$ @Comintern: Eek... \$\endgroup\$ Commented Oct 29, 2015 at 19:06
7
\$\begingroup\$

Python 2, 934 - 44 (url stuff) = 890 bytes

Using the API:

from urllib2 import urlopen as u
from gzip import GzipFile as f
from StringIO import StringIO as s;x="https://api.stackexchange.com/2.2%s&site=codegolf"
import re;j=u(x%'/search/advanced?pagesize=50&order=desc&sort=creation&answers=20&tagged=code-golf');q=s(j.read());g=f(fileobj=q);true=1;false=0;l=';'.join(str(a['question_id'])for a in eval(g.read())['items']);w=[]
def r(p):
 j=u(x%('/questions/%s/answers?page=%s&filter=!9YdnSMlgz&pagesize=100'%(l,p)));g.seek(0);q.truncate();q.write(j.read());q.seek(0);k=eval(g.read());w.extend(a['body_markdown']for a in k['items'])
 if k['has_more']:r(p+1)
r(1);x={};s=sorted
for m in w:
 try:
  l,n=re.match("(.*?),.*?([0-9]+)[^0-9]*$",m.splitlines()[0]).groups();l=re.subn("# ?","",l,1)[0].upper()
  if l not in x:x[l]=[]
  x[l]+=[(l,int(n))]
 except:pass
for l in s(x,cmp,lambda a:len(x[a]),1)[:20]:
 v=s(x[l])
 print l,len(v),sum(map(lambda a:a[1],v))/len(v),v[len(v)/2][1]

Note that this code does not pay attention to the API throttling.
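For anyone who wants to respect throttling, the Stack Exchange API response wrapper can carry a `backoff` field giving the number of seconds to wait before calling the same method again. Below is a minimal, non-golfed Python 3 sketch of a paging loop that honors it. The names `fetch_all` and `fetch_page` are hypothetical, and `fetch_page` is assumed to return the already decoded JSON wrapper as a dict.

```python
import time


def throttle_delay(response):
    """Seconds the API asked us to wait (0 if no backoff field is present)."""
    return int(response.get("backoff", 0))


def fetch_all(fetch_page, sleep=time.sleep):
    """Collect items from every page, pausing whenever a backoff is requested.

    fetch_page(page) is a hypothetical callable returning the decoded wrapper.
    """
    items, page = [], 1
    while True:
        response = fetch_page(page)
        items.extend(response.get("items", []))
        if not response.get("has_more"):
            return items
        delay = throttle_delay(response)
        if delay:
            sleep(delay)  # wait before requesting the next page
        page += 1
```

Passing `sleep` as a parameter keeps the loop easy to try out with canned responses instead of real requests.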

Output:

RUBY 60 430 32
PYTH 57 426 16
CJAM 56 35 23
C 52 170 76
PYTHON 2 51 88 79
JULIA 42 63 48
HASKELL 42 81 63
JAVASCRIPT (ES6) 41 96 83
PERL 40 44 27
PYTHON 3 37 91 89
PHP 36 98 59
JAVASCRIPT 36 743 65
POWERSHELL 35 86 44
JAVA 32 188 171
R 30 73 48
MATLAB 25 73 51
MATHEMATICA 24 57 47
APL 22 14 13
SCALA 21 204 59
TI-BASIC 21 42 24
answered Nov 1, 2015 at 13:53
\$\endgroup\$
10
  • 1
    \$\begingroup\$ @StewieGriffin Interestingly, I had to add one extra slash to the second recursive query to qualify for the /questions reduction. \$\endgroup\$ Commented Nov 1, 2015 at 14:14
  • 2
    \$\begingroup\$ The differences are because @flodel disallows suffixes other than bytes, while mine will handle other suffixes like chars. \$\endgroup\$ Commented Nov 2, 2015 at 14:14
  • \$\begingroup\$ Is it possible that your code combines C, C# and possibly C++? It seems unlikely that there are 73 C-answers. \$\endgroup\$ Commented Nov 2, 2015 at 20:57
  • \$\begingroup\$ No, I don't think so. I end the language name on the first comma. \$\endgroup\$ Commented Nov 2, 2015 at 20:58
  • 1
    \$\begingroup\$ Looks like l=re.sub("# ?|,","",l) is what replaces C# with C. \$\endgroup\$ Commented Nov 3, 2015 at 1:17
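Both points raised in these comments can be checked in isolation. The sketch below copies the two header patterns from the answers (the R one transcribed to Python's `re` syntax, on the assumption that both engines treat it the same way) and tries them on a made-up header with a "chars" suffix. It then compares the `re.sub` call quoted in the last comment with the `re.subn` call in the posted code.

```python
import re

# Patterns taken from the two answers above
R_PATTERN = r"^(.*?),.*? (\d+)( [Bb]ytes)?$"
PY_PATTERN = r"(.*?),.*?([0-9]+)[^0-9]*$"

header = "# Foo, 12 chars"  # hypothetical header with a non-"bytes" suffix
r_match = re.match(R_PATTERN, header)    # the R pattern rejects this header
py_match = re.match(PY_PATTERN, header)  # the Python pattern accepts it

# Removing every "#" turns C# into C; removing only the first keeps it
stripped_all = re.sub("# ?|,", "", "# C#")
stripped_first = re.subn("# ?", "", "# C#", 1)[0]
```

Here `r_match` is `None` while `py_match` captures the score 12, and `stripped_all` is `C` while `stripped_first` is `C#`.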
