Finding which users have the most questions with unaccepted answers

Question 1

I was looking at a few profiles this afternoon and noticed a few users had plenty of open questions. This led me to wonder which users actually had the highest amount of questions with unaccepted answers (both in size and percent). So with all the hoopla I kept seeing in the 2nd monitor about SEDE queries, I decided to try my hand at writing one.

with unanswered as (
 select
 posts.Id as PostId
 from
 posts
 where posts.PostTypeId = 1 -- questions
 and posts.OwnerUserId is not null -- user exists
 and posts.AcceptedAnswerId is null -- no answer selected
 and posts.ClosedDate is null -- still open
 group by posts.Id
),
percentages as (
 select
 users.Id as UserId,
 count(posts.Id) as Questions,
 count(unanswered.PostId) as UnansweredQuestions,
 (count(unanswered.PostId) * 100.0 / count(posts.Id)) as UnansweredPct
 from
 users left outer join
 posts on users.Id = posts.OwnerUserId
 left outer join
 unanswered on posts.Id = unanswered.PostId
 where
 posts.PostTypeId = 1
 group by
 users.Id
)
select top ##MaxRowsToSelect:int?100##
 UserId as [User Link],
 UnansweredQuestions as [Unanswered Questions],
 Questions,
 round(UnansweredPct, 1) as [Unanswered %]
from percentages
where UnansweredPct > ##MinUnansweredPct:int?40##
 and UnansweredQuestions > ##MinUnansweredQuestions:int?10##
order by UnansweredPct desc;

Question 2

This is essentially a good, clear and effective query. The only feedback I can offer is minor and somewhat subjective.

In the first CTE, unanswered, I personally would use distinct rather than grouping by posts.id. I group by when I am using an aggregate function; since there's no aggregate, distinct expresses the intention clearer to me.
I'm not intimately familiar with SEDE but I think you can inner join between users and posts in the percentages CTE. This is likely to be more performant, and again it expresses the intention clearer. The left outer join to unanswered still makes sense because you want to include the count of answered as well as unanswered posts. (Personally, I write left join rather than left outer join - less typing, totally equivalent).
I'm not a huge fan of your layout and indentation style. Wholly subjective (and I don't know if I'm in the minority or you are) but I usually start each new query clause (select, from, joins, where, etc) on a new line, and leave a blank line between each clause.
I recommend aliasing all tables/rowsets for clarity, e.g. ...from users as U..., and referring to the alias e.g. select U.Id.... So long as you choose sensible aliases (never just A, B, C etc) this is clear and more compact.
Your final ordering, by UnansweredPct, doesn't guarantee the same order between re-runs (as two users could and do have the same value in that column). I'd order by more columns to ensure consistent ordering - it isn't the end of the world in this particular query, but in some cases consistency can be really important, so it's a good habit to have.
I am not a fan of using top in any "long term" SQL queries (in the context of my day job, any query which will be used in production rather than just ad-hoc code). Whether you consider your query to be the former or the latter is up to you ;-) My "production" approach would be to add a Row_Number() function, ordering by UnansweredPct (and other columns, see above), then restrict where RowNo <= ##MaxRowsToSelect:int?100## instead of the top. This also means you can just order by RowNo.

Stuart J Cuthbertson Stuart J Cuthbertson 3312 silver badges9 bronze badges · Accepted Answer · 2016-01-02 18:07:01Z

This is essentially a good, clear and effective query. The only feedback I can offer is minor and somewhat subjective.

In the first CTE, unanswered, I personally would use distinct rather than grouping by posts.id. I group by when I am using an aggregate function; since there's no aggregate, distinct expresses the intention clearer to me.
I'm not intimately familiar with SEDE but I think you can inner join between users and posts in the percentages CTE. This is likely to be more performant, and again it expresses the intention clearer. The left outer join to unanswered still makes sense because you want to include the count of answered as well as unanswered posts. (Personally, I write left join rather than left outer join - less typing, totally equivalent).
I'm not a huge fan of your layout and indentation style. Wholly subjective (and I don't know if I'm in the minority or you are) but I usually start each new query clause (select, from, joins, where, etc) on a new line, and leave a blank line between each clause.
I recommend aliasing all tables/rowsets for clarity, e.g. ...from users as U..., and referring to the alias e.g. select U.Id.... So long as you choose sensible aliases (never just A, B, C etc) this is clear and more compact.
Your final ordering, by UnansweredPct, doesn't guarantee the same order between re-runs (as two users could and do have the same value in that column). I'd order by more columns to ensure consistent ordering - it isn't the end of the world in this particular query, but in some cases consistency can be really important, so it's a good habit to have.
I am not a fan of using top in any "long term" SQL queries (in the context of my day job, any query which will be used in production rather than just ad-hoc code). Whether you consider your query to be the former or the latter is up to you ;-) My "production" approach would be to add a Row_Number() function, ordering by UnansweredPct (and other columns, see above), then restrict where RowNo <= ##MaxRowsToSelect:int?100## instead of the top. This also means you can just order by RowNo.

Stack Exchange Network

Finding which users have the most questions with unaccepted answers

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding which users have the most questions with unaccepted answers

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions