5
\$\begingroup\$

Sometimes I like to use word count to describe someone's contributions to Stack Exchange, so I wrote a query to state the word count for a given Stack Exchange user using SEDE. click to run it online

DECLARE @total_count int, @post_id int, @post_body nvarchar(max); 
SET @total_count = 0;
DECLARE vendor_cursor CURSOR FOR
SELECT id, Body
FROM posts 
WHERE posts.OwnerUserId = 9872;
OPEN vendor_cursor
FETCH NEXT FROM vendor_cursor 
INTO @post_id, @post_body
PRINT FORMATMESSAGE(' id | count | total count')
WHILE @@FETCH_STATUS = 0
BEGIN
 DECLARE @count int;
 SELECT @count = (LEN(@post_body ) - LEN(replace(@post_body , ' ', '')))
 SELECT @total_count = @total_count + @count
 PRINT FORMATMESSAGE('%6d | %5d | %d', @post_id, @count, @total_count)
 FETCH NEXT FROM vendor_cursor
 INTO @post_id, @post_body
END
CrSb0001
6192 silver badges17 bronze badges
asked Mar 7, 2020 at 6:18
\$\endgroup\$
1
  • \$\begingroup\$ Could you add a general explanation of how your code works, as well as adding what you want specifically to be reviewed in this question? \$\endgroup\$ Commented Mar 7, 2020 at 13:28

1 Answer 1

5
\$\begingroup\$

Reviewing your T-SQL script I see that you have a cursor to iterate over Posts from a specific owner. For each body you use a neat trick to determine the word count of the body and use an extra variable to keep the running sum. You use FORMATMESSAGE to shape the data in each row.

Do know that the neat trick for the word count doesn't compensate for cases where two or more spaces are used, or where words are not separated by spaces but by other characters, like line-breaks for example. I'll ignore the fact here that in case of Posts.Body not only the text is stored but also the HTML markup.

I find the script clear and well structured.
If you're looking for an alternative way to accomplish the same output (I ignore the formatting here) I can offer this SEDE query:

DECLARE @UserId int = ##UserId## -- SDE parameter syntax
SELECT Id
 , LEN(Body ) - LEN(replace(Body , ' ', '')) [count]
 , SUM(LEN(Body ) - LEN(replace(Body , ' ', ''))) 
 OVER (ORDER BY id) [total count]
FROM posts 
WHERE posts.OwnerUserId = @UserId
ORDER BY Id

I have started with your base query as found in the CURSOR but instead added the third column with an OVER( ORDER BY id) clause which enables the calculation of running totals over sets. I'm a firm believer that when there is a feature available in the language it should be the preferred option, instead of running your own version in script. There is also a higher chance this gets better optimized by the query plan, if not today then possible in the future. Looking at wall-time and the query plan for your version and mine seems to support my gut feeling.

If you want to address the neat trick there are different ways to solve that, for example by evaluation the answers on Removing repeated duplicated characters but all of those have pros and cons and without confirmation if this is an area you want to have addressed I assume for now you're fine with your method.

answered Jul 3, 2020 at 14:57
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.