5
\$\begingroup\$

I am trying to efficiently use the SEDE tool to get a list of tags a user has answered questions on, and their stats on each Tag.

My code works for my account in 3185 ms, but if I try to run it on @jon-skeet's, user ID (22656) then it times out.

Are there ways to make this more efficient / better code, and allow it to be used for any user ID?

Stack Explorer Link

-- The tags and scores of my posts
CREATE TABLE #MyPosts (
 _tags nvarchar(250) COLLATE SQL_Latin1_General_CP1_CS_AS, 
 _score int
);
INSERT INTO #MyPosts
SELECT
 Question.Tags,
 Answers.Score
FROM
 Posts as Answers
INNER JOIN
 Posts AS Question
ON
 Question.Id = Answers.ParentId
WHERE
 Answers.PostTypeId = 2 AND
 Answers.OwnerUserId = ##UserId:int?8041461##;
--------------------------------------------------------------------------
-- All of the tags with the total score and the amount of times used
CREATE TABLE #TagCounts (
 _tagName nvarchar(35) COLLATE SQL_Latin1_General_CP1_CS_AS, 
 _score int,
 _count int
);
INSERT INTO #TagCounts
SELECT
 Tags.TagName,
 sum(_score),
 count(*)
FROM 
 #MyPosts
INNER JOIN Tags ON #MyPosts._tags LIKE '%<'+Tags.TagName +'>%'
GROUP BY Tags.Id,Tags.TagName;
--------------------------------------------------------------------------
-- Outputs the tags with links, total score, count and average score
SELECT
 _tagName as [TagName],
 _score as 'Total Score',
 _count as 'Answer Count',
 cast(_score as FLOAT) / _count as 'Average Score'
FROM
 #TagCounts
WHERE
 _score IS NOT NULL
ORDER BY
 _score DESC
asked May 11, 2018 at 0:50
\$\endgroup\$
3
  • \$\begingroup\$ What flavour of SQL is this? Why are you creating temporary tables? \$\endgroup\$ Commented May 11, 2018 at 20:44
  • \$\begingroup\$ It's the SEDE version of TSQL, I am creating the temporary tables to pass results between the sections, filtering down to try to keep under the timeout limit. \$\endgroup\$ Commented May 11, 2018 at 20:50
  • \$\begingroup\$ Please show the results of an EXPLAIN on each of your queries. \$\endgroup\$ Commented May 11, 2018 at 21:07

1 Answer 1

2
\$\begingroup\$

This TagName column holding all the tags is denormalized and the non-equi-join ON #MyPosts._tags LIKE '%<'+Tags.TagName +'>%' kills performance. There's no efficient way for the optimizer to do this join, thus it's like a Cross-Join to the 50000+ rows in Tags.

But the base data model is normalized, there's an associative table PostTags between Posts and Tags and then it's simple equi-joins:

with MyPosts as 
 (
-- The tags and scores of my posts
 SELECT
 Question.Id AS PostId,
 Answers.Score
 FROM
 Posts as Answers
 INNER JOIN
 Posts AS Question
 ON
 Question.Id = Answers.ParentId
 WHERE
 Answers.PostTypeId = 2 AND
 Answers.OwnerUserId = ##UserId:int?8041461##
 )
-- All of the tags with the total score and the amount of times used
SELECT
 t.TagName,
 sum(score) as "Total Score",
 count(*) as "Answer Count",
 cast(sum(score) as FLOAT) / count(*) as 'Average Score'
FROM 
 MyPosts AS p
JOIN PostTags AS pt
 ON p.PostId = pt.PostId
JOIN Tags AS t
 ON pt.TagId = t.Id
GROUP BY t.Id,t.TagName
ORDER BY "Total Score" DESC;

Your HAVING sum(score) IS NULL is not needed, there's no NULL score.

See SEDE query which runs fine with John Skeet, too :-)

answered May 12, 2018 at 9:44
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.