Two years ago I asked an adorable question: is 180,000 unanswered questions too many?
Now there are 704,563 questions with no up-voted answers and counting. I've had the feeling that this flood was starting to overwhelm the site, but I went looking for some concrete data.
Percentage of open questions with answers, by quarter:
[Chart: Answered Question Percentage]
This is probably the most telling. The percentage with any answer is dropping steadily, while the percentage with a good answer (Score > 0) is in freefall. Voters don't seem very impressed with all the new questions, either:
[Chart: Question Votes]
[Chart: Question Quality]
This also suggests an increase in bad questions (score < 0), but just as importantly, 60% now have a score of 0. If the flood is too much even for simple actions like voting which almost everyone can do, what chance do editors and moderators have of keeping up? I'd also be interested in seeing the percentage of users with close-vote privileges over time, as well as the ratio of those users to new questions over time. I don't think SEDE has the required data though.
Eventually the site will simply cease to work if these trends continue. Community moderation will be affecting such a small percentage of questions that it might as well not be happening at all.
I can see three avenues of attacking the problem: increasing the number of (good) moderators, increasing the amount of moderating individuals can do, and building more flood gates. I'm hoping to trigger some brainstorming and draw attention to any other proposals that attempt to address parts of this problem.
Related Reading to help your brainstorming:
- Does SO need social networking features to improve the experience for expert users?
- Improve tools for closing as duplicate
- Not just Closing Duplicate, nor Canonical Answers, but Optimizing for Pearls - How do we reward Pearl-Discovery?
- 1.5 million questions on SO - organization beyond tags?
- What do you think of an SO for beginners, only?
Assorted other things to ponder:
- How can we reduce the feeling of futility when trying to moderate such a large and growing backlog?
- If we were to hypothetically "raise the bar"/narrow the scope of the site, what would we exclude? How would we draw that line in a non-arbitrary way?
- (4) Don't older questions and answers have a bias though? With time on the site, old questions could find answers, and posts could accumulate votes. Time is a factor, surely. – Martijn Pieters, Aug 11, 2012 at 9:42
- (3) @MartijnPieters only considering answers that came within two weeks made surprisingly little difference: 0.5%-2% lower. – Brad Mace, Aug 11, 2012 at 10:27
- (1) Thanks for verifying, it always pays to double-check assumptions. :-) – Martijn Pieters, Aug 11, 2012 at 11:59
- (11) This analysis also needs to take the automatic deletion of zero-score unanswered questions after 365 days into account. Else this will make the old days look better than they actually were. – Mad Scientist, Aug 11, 2012 at 12:28
- (1) @Mad that's a good point but I may have to leave that to someone else. tsql is not my home turf. If we compare 2011 Qs 2&3 (one year ago), that drop is larger than most, but we can still see the trend on both sides of the divide. – Brad Mace, Aug 11, 2012 at 18:39
- (1) @YannisRizos: Clearly you spend way too much time on Meta reading adorable questions. – user102937, Aug 11, 2012 at 19:12
- (1) The SO conceptual model & implementation of a Q&A site is flawed and not scaling well but the grey beards aren't interested and simply double down with their rigid views. That's fine - but it is not attracting new people who would be interested in actually building a knowledge base. A pity since there are so many talented people here with good intentions. Unfortunately their tasks (as @RobertHarvey noted in a "Summer of Love" blog comment) are primarily janitorial. Cheers + "thanks in advance" ;-) – spring, Aug 12, 2012 at 1:45
- (3) @skinnyTOD - "The SO conceptual model & implementation of a Q&A site is flawed and not scaling well" - OK, provide examples of other sites that have done better with the kind of posting traffic this site is now receiving. Every other forum, mailing list, or newsgroup I've ever participated in fell apart long before it reached the volume we're seeing now. I don't see the rigidity that you describe, as many of the policies and tools of the site have changed significantly since 2008. Simply look at what we accept for questions now vs. what we did then for one example. – Brad Larson (Mod), Aug 12, 2012 at 3:11
- @BradLarson - That's the wrong end of the stick: not a matter of "what other sites do it better." It is what SO is doing. And sorry, but the rigidity is rampant - see the earnest suggestions for improvements on meta with resulting downvotes (+ snark). An example of what is broken: I see your name on lots of edits (btw- I voted for you as moderator), doing trivial copy editing tasks when I know you could answer the question. Is SO a community generated knowledge base or more about grading student papers? Not enough room to say more but it saddens me to see all the pointless busywork. – spring, Aug 12, 2012 at 3:24
- (3) @skinnyTOD - My point was that we are in uncharted territory, but the fundamental Q&A approach and the mechanisms around it are why we've been able to get this far. I agree that we'll need to think of new ways to address the scale we're at, but I'm not as pessimistic about their reception by the community. The new /review system is one such attempt at dealing with the scale, and the community has had significant input in that. In regards to edits, I don't mind taking a little time to make things a little clearer. Sometimes that can have as great an impact as providing an answer. – Brad Larson (Mod), Aug 12, 2012 at 3:42
6 Answers
Updated to include the last two quarters of 2012.
I think what's most interesting about the update is how the numbers have changed over time. For instance, when I originally answered this question, quarter 1 of 2012 was complete. There were 409,490 undeleted, open questions that quarter, of which 85.61% were answered and 70.71% answered "well". Those numbers today are 405,131 questions, 87.47% answered and 74.30% answered "well", which implies that as time goes by the Stack Overflow community is doing something about the older answered questions, just not very much. The increase in the number of questions with a score of 1 or more also indicates that SO users are upvoting (viewing/using?) older questions, which is good.
The most noticeable consistency between the current and previous results is that all these downward trends have continued: things are getting worse. On current trends, quarter 3 of 2013 will have fewer than 50% of questions where an answer is either upvoted or accepted. If the trend in answering/deleting older questions continues then that quarter will be rescued later, but the same will happen again, permanently, around quarter 1 of 2014.
This is a "problem" that needs to be fixed, somehow.
I know this isn't an answer but it's far too long for a comment. There's a problem with your SQL, caused by the LEFT OUTER JOIN; you're counting questions multiple times so the problem is both slightly better and slightly worse than what you think.
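To illustrate the double counting, here is a minimal sketch using an in-memory SQLite database from Python. The two-table schema (`questions`, `answers`) is a hypothetical toy, not the SEDE schema and not the query from the question; it only shows how a LEFT OUTER JOIN produces one row per answer, so a plain row count inflates the number of questions.

```python
import sqlite3

# Hypothetical toy schema, for illustration only (not the SEDE schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE answers (id INTEGER PRIMARY KEY, question_id INTEGER, score INTEGER);
    INSERT INTO questions (id) VALUES (1), (2), (3);
    -- Question 1 has three answers, question 2 has one, question 3 has none.
    INSERT INTO answers (id, question_id, score) VALUES
        (10, 1, 2), (11, 1, 0), (12, 1, 1), (13, 2, 0);
""")

join = "FROM questions q LEFT OUTER JOIN answers a ON a.question_id = q.id"

# COUNT(*) counts joined rows, so question 1 is counted three times.
naive_count = conn.execute(f"SELECT COUNT(*) {join}").fetchone()[0]

# COUNT(DISTINCT q.id) counts each question exactly once.
distinct_count = conn.execute(f"SELECT COUNT(DISTINCT q.id) {join}").fetchone()[0]

# Questions with at least one answer, again counted once each.
answered_count = conn.execute(
    f"SELECT COUNT(DISTINCT q.id) {join} WHERE a.id IS NOT NULL"
).fetchone()[0]

print(naive_count, distinct_count, answered_count)
```

With this toy data the script prints `5 3 2`: five joined rows, but only three questions, two of which are answered. Dividing by the inflated row count is what shifts the percentages.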
I also disagree with your definition of a "good" answer. By definition an accepted answer is "good" as it has helped the OP (unless of course they've been pushed into accepting it by loads of comments but that's another matter). I've excluded Community owned questions and answers as I don't really think they are relevant (and it helps the query to work!).
Your first set of results on the number of open questions by quarter now returns the following:
    Year Quarter Questions Answered GoodAnswer
    ---- ------- --------- -------- ----------
    2008       3     17508   99.99%     99.83%
    2008       4     38790   99.87%     98.11%
    2009       1     53441   99.73%     96.23%
    2009       2     75339   99.56%     94.24%
    2009       3     98426   99.22%     92.49%
    2009       4    113136   99.02%     91.72%
    2010       1    142909   98.62%     90.76%
    2010       2    159213   98.01%     88.50%
    2010       3    187222   97.56%     87.32%
    2010       4    205903   97.41%     86.87%
    2011       1    267895   96.98%     85.83%
    2011       2    298140   96.16%     84.30%
    2011       3    312239   95.30%     82.52%
    2011       4    316593   94.46%     81.55%
    2012       1    405131   87.47%     74.30%
    2012       2    431420   85.41%     71.38%
    2012       3    452507   83.44%     68.07%
    2012       4    461118   80.62%     64.02%
As you can see the percentage of answered questions is a little worse than you thought, but the percentage of questions with a "good" answer is a little better. Personally I think the telling point here is not necessarily the number of questions but where the differences lie between my results and your own. For Q4 2008 you have 122,616 and I've got 39,557 questions, which implies that every question received over 3 answers. For Q2 2012 the difference is minimal, on average questions receiving about 1.2 answers.
I've also run this for questions with a score >= 0:
    Year Quarter Questions Answered GoodAnswer
    ---- ------- --------- -------- ----------
    2008       3     17454   99.99%     99.84%
    2008       4     38618   99.87%     98.13%
    2009       1     53133   99.73%     96.24%
    2009       2     74795   99.55%     94.26%
    2009       3     97569   99.22%     92.52%
    2009       4    111776   99.01%     91.77%
    2010       1    141491   98.60%     90.82%
    2010       2    157757   98.00%     88.54%
    2010       3    185404   97.54%     87.34%
    2010       4    203733   97.38%     86.91%
    2011       1    265103   96.95%     85.88%
    2011       2    293660   96.11%     84.35%
    2011       3    305355   95.20%     82.55%
    2011       4    308130   94.31%     81.54%
    2012       1    392376   87.07%     74.14%
    2012       2    417341   84.92%     71.18%
    2012       3    436875   82.85%     67.81%
    2012       4    443642   80.31%     64.00%
I think the surprise here is how little difference it makes. It reflects well on Stack Overflow that no matter if the question is not as good as it could be you are just as likely to get a "good" answer. Obviously, closed questions would skew this massively and as a number of "poor" questions get closed not too much can be read into this.
Lastly, here is the same query for questions with a score >= 1:

    Year Quarter Questions Answered GoodAnswer
    ---- ------- --------- -------- ----------
    2008       3     15797   99.99%     99.88%
    2008       4     32945   99.90%     98.87%
    2009       1     41381   99.74%     98.18%
    2009       2     53922   99.63%     97.38%
    2009       3     65645   99.53%     96.71%
    2009       4     69626   99.35%     95.97%
    2010       1     99493   99.02%     94.66%
    2010       2    104134   98.50%     93.04%
    2010       3    116514   98.10%     92.16%
    2010       4    122588   98.00%     91.82%
    2011       1    152765   97.79%     91.31%
    2011       2    169152   96.96%     89.90%
    2011       3    167683   95.92%     88.18%
    2011       4    165138   94.92%     86.80%
    2012       1    183482   93.92%     85.34%
    2012       2    180243   92.93%     83.66%
    2012       3    173618   91.26%     80.88%
    2012       4    188466   87.52%     74.33%

As you can see, the percentage of answered questions and the percentage answered "well" both improve significantly, though the same drop-off is observable.
My own conclusion from these statistics is that a finesse to the system to remove unanswered questions, or whatever it might be, is not what is required. The number of answered "good" questions at over 91% is, in my opinion, a pretty high number.
What seems to be needed is an increase in the number of people who answer questions. Whilst Stack Overflow has had an ever increasing number of people asking questions there hasn't been a commensurate increase in the number of people answering them.
I ran a little query to test this hypothesis:
    Year Quarter Questioning Answering
    ---- ------- ----------- ---------
    2008       3        6411      9007
    2008       4       10724     13276
    2009       1       13713     15995
    2009       2       18929     21276
    2009       3       24474     25631
    2009       4       37388     36124
    2010       1       47454     41411
    2010       2       56895     46868
    2010       3       66340     52398
    2010       4       73162     59578
    2011       1       95347     74931
    2011       2      108828     78798
    2011       3      118150     85501
    2011       4      117156     89839
    2012       1      151337    107792
    2012       2      167394    116139
    2012       3      182767    125379
    2012       4      201461    133558
As you can see in the "early" days the number of users answering questions was more than the number of people asking them. This has now been completely reversed and the questioners are in the ascendant.
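The reversal is easier to see as a ratio of answering users to asking users. The sketch below takes four rows copied from the table above; the function and variable names are only illustrative.

```python
# (year, quarter, users asking, users answering), copied from the table above.
rows = [
    (2008, 3, 6411, 9007),
    (2009, 3, 24474, 25631),
    (2009, 4, 37388, 36124),
    (2012, 4, 201461, 133558),
]


def answerers_per_asker(asking: int, answering: int) -> float:
    """Return how many answering users there are for each asking user."""
    return answering / asking


# Map (year, quarter) to the ratio, rounded to two decimals for display.
ratios = {
    (year, quarter): round(answerers_per_asker(asking, answering), 2)
    for year, quarter, asking, answering in rows
}

for (year, quarter), ratio in ratios.items():
    print(f"{year} Q{quarter}: {ratio} answerers per asker")
```

The ratio falls from roughly 1.4 in Q3 2008 to below 1 in Q4 2009, and to roughly 0.66 by Q4 2012.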
What the solution is, I'm not entirely sure. What seems certain though is that Stack Overflow needs to find a way of converting question askers into question answerers. Without flooding the place with crap answers.
- (1) Nice work fixing my science. Your point about needing more answerers is a good one but we'll still need some way to help moderators keep up. – Brad Mace, Aug 11, 2012 at 18:44
- (3) I strongly disagree with your statement that the measure of a good answer is acceptance by the OP. The only thing the check mark indicates is "fixed my problem", and sometimes, it doesn't even indicate that. Looking at other viewers' opinions as expressed through up/down-votes (while granting that they too can be misused) is a much better way to find quality answers. Indeed, that's one of the fundamentals of the site's functionality -- crowd vetting of answers. – jscs, Aug 11, 2012 at 18:55
- (6) "What seems to be needed is an increase in the number of people who answer questions." I'm not sure I can agree with that; I think the overwhelming factor is more important than the number of people. Too many duplicates are being asked. I think a better site search/suggestions could help the problem more than more people looking at an already daunting stream of questions. – Zelda, Aug 11, 2012 at 19:02
- (1) @Josh That had occurred to me too. Possibly if an answer is accepted with score 0, this indicates that it's Too Localized and isn't useful to anyone else. It's a pretty hard thing to verify though. – Brad Mace, Aug 11, 2012 at 19:59
- The other problem is the criteria for a 'good' answer. I didn't see it in your answer, but an accepted answer with no upvotes can be just as good as an accepted answer with one upvote. Or even a question with answers and no upvotes at all. The fact that questions are getting answered at all is important. I'd be more interested in questions that have lingered with no answers for days. – user139168, Aug 14, 2012 at 18:17
- @0A0D, it's in the second paragraph... (and all the queries!) I count accepted answers but to be honest it doesn't make much difference. Look at my percentages compared to the OP's. – ben is uǝq backwards, Aug 14, 2012 at 18:36
- "which implies that as time goes by the Stack Overflow community is doing something about the older (un?)answered question" Actually, this is probably due to the increased inflow of new questions, together with the greater number of people answering them. This decreases the proportion of unanswered questions, without actually making a dent in the quantity. – user200500, Feb 10, 2013 at 20:46
- Why @asad? That sentence was discussing a static, past, period of time. There can't have been any new questions during that period, though they could have been answered by new answerers. – ben is uǝq backwards, Feb 10, 2013 at 21:10
- @benisuǝqbackwards Oh, I might have misunderstood then. Do 87.4 and 74.3 percentages apply to the same batch of questions? – user200500, Feb 10, 2013 at 21:22
- Undeleted? Did you mean nondeleted? – Cole Tobin, Jul 28, 2013 at 17:16
- I meant not-deleted yes @Cole... – ben is uǝq backwards, Jul 28, 2013 at 17:19
Yes, this is a radical suggestion. Don't panic. We're just brainstorming.
A pressure relief valve for the backlog:
One possible option would be to expire more questions automatically, such as:
- ignore the view count (as Pekka says, they don't get any better just from being viewed)
- questions with score <= 1 and no answers
- questions with score < 1 and answers that all have score < 1
- Tie the score required to remain on the site to a question's age. For example, delete unanswered questions whose age in months is greater than their score
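The candidate rules above can be sketched as three independent predicates. This is only an illustration of the brainstormed rules, not anything the site implements; the `Question` class and all function names are hypothetical, and treating the second rule as requiring at least one answer is an assumption. The view count is deliberately absent, per the first bullet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """Hypothetical record; view count is intentionally not tracked."""
    score: int
    age_months: int
    answer_scores: List[int] = field(default_factory=list)


def low_score_unanswered(q: Question) -> bool:
    """Rule: score <= 1 and no answers."""
    return q.score <= 1 and not q.answer_scores


def nothing_upvoted(q: Question) -> bool:
    """Rule: score < 1 and every answer has score < 1.

    Assumption: at least one answer must exist, since the
    unanswered case is covered by the previous rule.
    """
    return (
        q.score < 1
        and bool(q.answer_scores)
        and all(s < 1 for s in q.answer_scores)
    )


def older_than_score(q: Question) -> bool:
    """Rule: unanswered and age in months is greater than the score."""
    return not q.answer_scores and q.age_months > q.score


# A 4-month-old unanswered question with score 3 expires under the age rule only.
example = Question(score=3, age_months=4)
print(low_score_unanswered(example), nothing_upvoted(example), older_than_score(example))
```

Keeping the rules separate makes it possible to evaluate each one against the backlog on its own before deciding whether to combine them.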
This could significantly reduce the backlog of questions that need voting, answering, editing, and closing.
There's a risk of losing some wheat with all this chaff, but perhaps the extremely high traffic would make this acceptable and/or necessary. In this case, if someone is still interested in the question, they could ask it again with no penalty. Or perhaps they could just click something on the original question (still accessible from their profile) indicating that they still care about it. Knowing that question scores really matter might also encourage more voting.
- (2) Can I suggest that the questions with negative answers be deleted instead of zero score answers? It is possible some good answers never get any vote and I wouldn't want that to happen. But, a bad or a wrong answer has a decent chance of accruing a downvote, making your suggestion a little bit on the sensible side. – jokerdino, Aug 11, 2012 at 10:19
- (5) Editing the question could serve to indicate they still care about it. – Aug 11, 2012 at 12:18
- (1) As long as it doesn't affect stackoverflow.com/badges/95/reversal or TumbleWeed. – Jeremy Thompson, Aug 12, 2012 at 2:00
- (2) @jokerdino Some bad questions have good answers and I read recently on Meta that these should not be deleted. – Remou, Aug 12, 2012 at 20:19
- (2) In the more obscure areas, single vote, or even zero vote questions and answers need not be bad, just not much frequented. Some may even be valuable as the only answer to some obscure point. – Remou, Aug 12, 2012 at 20:19
- (1) This would affect the Tumbleweed badge. If you change it from a score of <= 1 to -1 and remove the time span month clause, I'll rescind my downvote... – Cole Tobin, Jul 28, 2013 at 17:20
Increase the vote limit
Let users do more voting, possibly tied to reputation.
I don't regularly run out of votes, but that's largely because I know they're limited, so I save them for the really good and the really bad. This leaves a lot of stuff in the middle with no indication of quality.
Currently 46% of questions and 37% of answers have a score of zero, ignoring closed questions.
- (6) I used to run out of votes (last year), but don't do that anymore. Could it be that the amount of not-bad-but-not-very-interesting questions has increased? That would explain the high number of zero votes. – Bo Persson, Aug 11, 2012 at 9:58
- (2) @Bo I'm pretty sure that is exactly what's happening. The really bad stuff gets filtered out; it's the "meh" mediocre stuff that feels like the majority nowadays. – Pekka, Aug 12, 2012 at 20:13
- 30-40 votes a day is a lot on a typical SE site, but it's very little on Stack Overflow, with so many questions asked each day. – Cjxcz Odjcayrwl, Jul 28, 2013 at 16:56
- @ŁukaszLech agreed. It should be increased. – Cole Tobin, Jul 28, 2013 at 17:17
Prior review
Require posts from new users to be moderated before they show up on the site, until they reach 50 rep or so. Do the same for users whose last post was closed or deleted.
This would help the people who are answering questions, as they won't have to wade through so much junk to find the decent questions. Plus the junk won't be stinking up the website at large.
The incentives for askers would also be improved, as they'll have more reason to fix their questions if they can't just shoot out a garbage one and get an answer anyway.
- (1) Who would pre-moderate those posts? The same people who are going to moderate them as soon as they're posted anyway. What would that accomplish, other than hiding the question from other low-rep users for a bit? – Kevin, Aug 12, 2012 at 2:37
- But the ones who answer the most questions have the most rep and would therefore be the ones doing the vetting in the first place. – Kevin, Aug 12, 2012 at 2:43
- @Kevin, yes but only two would have to look at it, versus dozens that will typically see it under the current system. – Brad Mace, Aug 12, 2012 at 2:45
Sub-portals
Create sub-portals for popular languages and platforms, similar to facebook.stackoverflow.com. This would be akin to a View in a database where you would see only a subset of the site, whether browsing, searching, reviewing, handling flags, etc. For example, a Java portal could include not just java, but also jsf, spring, playframework, etc.
Allowing people to focus on a specific area of interest divides the problem into more manageable chunks, and cleaning up their particular slice feels like a more achievable goal, just as Facebook.SO's 16000 unanswered questions seems a lot less daunting than the 700,000 on the full site.
Likewise for reviewing, seeing only a slice of the 40.5k low quality posts and 58.7k close votes would make it seem a lot less futile.
- (1) What I'm suggesting here isn't anything like tag wikis. I've edited to try to make it clearer. – Brad Mace, Aug 12, 2012 at 19:44
- Because this idea worked so well with facebook. – User1000547, Oct 24, 2013 at 17:58
- Brad: it's totally unclear how this is any different from browsing a specific tag or set of tags or tag wiki. All it seems to say to me is that one language portal/master-tag/whatever would correspond to a set of tags, and those would rigidly silo discussion of a concept across languages. But we already allow users to define their own favorite tag lists, and without the partitioning, so this seems to have downsides and none of the upsides? – smci, Apr 11, 2014 at 9:35
Implement an invite system
Only allow users to join if they're invited by an existing user, similar to how Gmail worked early on. We could require a user to have, say, 100 rep before they can invite others. This would slow the tide and give us more time to get each new user up to speed on the site's rules and standards.
Current users would hopefully explain a bit about how the site works when they invite someone. Or if they're not following the rules, they won't get enough rep to be able to invite their friends who probably wouldn't follow them either.
- Isn't this how Careers works? – Cole Tobin, Jul 28, 2013 at 17:20
- Also, don't be invite only, but they have to ask to join. That would be better. – Cole Tobin, Jul 28, 2013 at 17:20