Stack Overflow Tag Usage

Question 1

I am trying to efficiently use the SEDE tool to get a list of tags a user has answered questions on, and their stats on each Tag.

My code works for my account in 3185 ms, but if I try to run it on @jon-skeet's, user ID (22656) then it times out.

Are there ways to make this more efficient / better code, and allow it to be used for any user ID?

Stack Explorer Link

-- The tags and scores of my posts
CREATE TABLE #MyPosts (
 _tags nvarchar(250) COLLATE SQL_Latin1_General_CP1_CS_AS, 
 _score int
);
INSERT INTO #MyPosts
SELECT
 Question.Tags,
 Answers.Score
FROM
 Posts as Answers
INNER JOIN
 Posts AS Question
ON
 Question.Id = Answers.ParentId
WHERE
 Answers.PostTypeId = 2 AND
 Answers.OwnerUserId = ##UserId:int?8041461##;
--------------------------------------------------------------------------
-- All of the tags with the total score and the amount of times used
CREATE TABLE #TagCounts (
 _tagName nvarchar(35) COLLATE SQL_Latin1_General_CP1_CS_AS, 
 _score int,
 _count int
);
INSERT INTO #TagCounts
SELECT
 Tags.TagName,
 sum(_score),
 count(*)
FROM 
 #MyPosts
INNER JOIN Tags ON #MyPosts._tags LIKE '%<'+Tags.TagName +'>%'
GROUP BY Tags.Id,Tags.TagName;
--------------------------------------------------------------------------
-- Outputs the tags with links, total score, count and average score
SELECT
 _tagName as [TagName],
 _score as 'Total Score',
 _count as 'Answer Count',
 cast(_score as FLOAT) / _count as 'Average Score'
FROM
 #TagCounts
WHERE
 _score IS NOT NULL
ORDER BY
 _score DESC

Question 2

What flavour of SQL is this? Why are you creating temporary tables?

Question 3

It's the SEDE version of TSQL, I am creating the temporary tables to pass results between the sections, filtering down to try to keep under the timeout limit.

Question 4

Please show the results of an EXPLAIN on each of your queries.

Question 5

This TagName column holding all the tags is denormalized and the non-equi-join ON #MyPosts._tags LIKE '%<'+Tags.TagName +'>%' kills performance. There's no efficient way for the optimizer to do this join, thus it's like a Cross-Join to the 50000+ rows in Tags.

But the base data model is normalized, there's an associative table PostTags between Posts and Tags and then it's simple equi-joins:

with MyPosts as 
 (
-- The tags and scores of my posts
 SELECT
 Question.Id AS PostId,
 Answers.Score
 FROM
 Posts as Answers
 INNER JOIN
 Posts AS Question
 ON
 Question.Id = Answers.ParentId
 WHERE
 Answers.PostTypeId = 2 AND
 Answers.OwnerUserId = ##UserId:int?8041461##
 )
-- All of the tags with the total score and the amount of times used
SELECT
 t.TagName,
 sum(score) as "Total Score",
 count(*) as "Answer Count",
 cast(sum(score) as FLOAT) / count(*) as 'Average Score'
FROM 
 MyPosts AS p
JOIN PostTags AS pt
 ON p.PostId = pt.PostId
JOIN Tags AS t
 ON pt.TagId = t.Id
GROUP BY t.Id,t.TagName
ORDER BY "Total Score" DESC;

Your HAVING sum(score) IS NULL is not needed, there's no NULL score.

See SEDE query which runs fine with John Skeet, too :-)

dnoeth 1,2227 silver badges5 bronze badges · Accepted Answer · 2018-05-12 09:44:49Z

This TagName column holding all the tags is denormalized and the non-equi-join ON #MyPosts._tags LIKE '%<'+Tags.TagName +'>%' kills performance. There's no efficient way for the optimizer to do this join, thus it's like a Cross-Join to the 50000+ rows in Tags.

But the base data model is normalized, there's an associative table PostTags between Posts and Tags and then it's simple equi-joins:

with MyPosts as 
 (
-- The tags and scores of my posts
 SELECT
 Question.Id AS PostId,
 Answers.Score
 FROM
 Posts as Answers
 INNER JOIN
 Posts AS Question
 ON
 Question.Id = Answers.ParentId
 WHERE
 Answers.PostTypeId = 2 AND
 Answers.OwnerUserId = ##UserId:int?8041461##
 )
-- All of the tags with the total score and the amount of times used
SELECT
 t.TagName,
 sum(score) as "Total Score",
 count(*) as "Answer Count",
 cast(sum(score) as FLOAT) / count(*) as 'Average Score'
FROM 
 MyPosts AS p
JOIN PostTags AS pt
 ON p.PostId = pt.PostId
JOIN Tags AS t
 ON pt.TagId = t.Id
GROUP BY t.Id,t.TagName
ORDER BY "Total Score" DESC;

Your HAVING sum(score) IS NULL is not needed, there's no NULL score.

See SEDE query which runs fine with John Skeet, too :-)

Stack Exchange Network

Stack Overflow Tag Usage

1 Answer 1

You must log in to answer this question.

Hot Network Questions

Stack Overflow Tag Usage

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions