I was looking at a few profiles this afternoon and noticed that some users had plenty of open questions. This led me to wonder which users actually had the most questions without an accepted answer (both by count and by percentage). So with all the hoopla I kept seeing in the 2nd monitor about SEDE queries, I decided to try my hand at writing one.
    with unanswered as (
      select
        posts.Id as PostId
      from
        posts
      where posts.PostTypeId = 1 -- questions
        and posts.OwnerUserId is not null -- user exists
        and posts.AcceptedAnswerId is null -- no answer selected
        and posts.ClosedDate is null -- still open
      group by posts.Id
    ),
    percentages as (
      select
        users.Id as UserId,
        count(posts.Id) as Questions,
        count(unanswered.PostId) as UnansweredQuestions,
        (count(unanswered.PostId) * 100.0 / count(posts.Id)) as UnansweredPct
      from
        users left outer join
        posts on users.Id = posts.OwnerUserId
        left outer join
        unanswered on posts.Id = unanswered.PostId
      where
        posts.PostTypeId = 1
      group by
        users.Id
    )
    select top ##MaxRowsToSelect:int?100##
      UserId as [User Link],
      UnansweredQuestions as [Unanswered Questions],
      Questions,
      round(UnansweredPct, 1) as [Unanswered %]
    from percentages
    where UnansweredPct > ##MinUnansweredPct:int?40##
      and UnansweredQuestions > ##MinUnansweredQuestions:int?10##
    order by UnansweredPct desc;
1 Answer
This is essentially a good, clear and effective query. The only feedback I can offer is minor and somewhat subjective.
In the first CTE, `unanswered`, I personally would use `distinct` rather than grouping by `posts.Id`. I use `group by` when I am using an aggregate function; since there's no aggregate here, `distinct` expresses the intention more clearly to me.

I'm not intimately familiar with SEDE, but I think you can use an `inner join` between `users` and `posts` in the `percentages` CTE. This is likely to be more performant, and again it expresses the intention more clearly. The `left outer join` to `unanswered` still makes sense because you want to include the count of answered as well as unanswered posts. (Personally, I write `left join` rather than `left outer join` - less typing, totally equivalent.)

I'm not a huge fan of your layout and indentation style. This is wholly subjective (and I don't know if I'm in the minority or you are), but I usually start each new query clause (`select`, `from`, `join`s, `where`, etc.) on a new line, and leave a blank line between each clause.

I recommend aliasing all tables/rowsets for clarity, e.g. `...from users as U...`, and referring to the alias, e.g. `select U.Id...`. So long as you choose sensible aliases (never just `A`, `B`, `C` etc.), this is clear and more compact.

Your final ordering, by `UnansweredPct`, doesn't guarantee the same order between re-runs (as two users could and do have the same value in that column). I'd order by more columns to ensure consistent ordering - it isn't the end of the world in this particular query, but in some cases consistency can be really important, so it's a good habit to have.

I am not a fan of using `top` in any "long term" SQL queries (in the context of my day job, any query which will be used in production rather than just ad-hoc code). Whether you consider your query to be the former or the latter is up to you ;-) My "production" approach would be to add a `Row_Number()` function, ordering by `UnansweredPct` (and other columns, see above), then restrict `where RowNo <= ##MaxRowsToSelect:int?100##` instead of the `top`. This also means you can just `order by RowNo`.
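To make the points above concrete, here is a rough sketch of how the query might look with them applied. It is one possible reading of the suggestions, not a tested replacement. The aliases (`P`, `U`, `UA`, `PCT`, `R`), the extra `ranked` CTE, and the tie-breaker columns (`UnansweredQuestions desc`, then `UserId`) are my own assumptions; any columns that make the ordering unique would do. The `inner join` should return the same rows as the original, because `where posts.PostTypeId = 1` already discards users who have no posts.

```sql
-- Sketch only: the question's query with the review suggestions applied.
with unanswered as (
    -- distinct instead of group by, since nothing is aggregated here
    select distinct
        P.Id as PostId
    from posts as P
    where P.PostTypeId = 1               -- questions
      and P.OwnerUserId is not null      -- user exists
      and P.AcceptedAnswerId is null     -- no answer selected
      and P.ClosedDate is null           -- still open
),
percentages as (
    select
        U.Id as UserId,
        count(P.Id) as Questions,
        count(UA.PostId) as UnansweredQuestions,
        count(UA.PostId) * 100.0 / count(P.Id) as UnansweredPct
    from users as U
    inner join posts as P                -- inner join: only users with posts
        on U.Id = P.OwnerUserId
    left join unanswered as UA           -- keep answered questions in the count
        on P.Id = UA.PostId
    where P.PostTypeId = 1
    group by U.Id
),
ranked as (
    select
        PCT.UserId,
        PCT.Questions,
        PCT.UnansweredQuestions,
        PCT.UnansweredPct,
        -- tie-breaker columns are an assumption; UserId makes the order unique
        row_number() over (
            order by PCT.UnansweredPct desc,
                     PCT.UnansweredQuestions desc,
                     PCT.UserId
        ) as RowNo
    from percentages as PCT
    where PCT.UnansweredPct > ##MinUnansweredPct:int?40##
      and PCT.UnansweredQuestions > ##MinUnansweredQuestions:int?10##
)
select
    R.UserId as [User Link],
    R.UnansweredQuestions as [Unanswered Questions],
    R.Questions,
    round(R.UnansweredPct, 1) as [Unanswered %]
from ranked as R
where R.RowNo <= ##MaxRowsToSelect:int?100##   -- replaces top
order by R.RowNo;
```

The filters on `UnansweredPct` and `UnansweredQuestions` sit inside `ranked` so that the row numbers are assigned only to users who pass them; numbering first and filtering afterwards would leave gaps and could return fewer rows than requested.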