What's the opposite of CodeLess Questions? Perhaps CodeOnly Answers? (Also known as code dumps.)
Inspired by the opposite question, here's the opposite query:
SELECT Id AS [Post Link], Score, CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND Body LIKE '<pre><code>%'
AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') = LEN(Body)
AND ClosedDate IS NULL
ORDER BY CreationDate
Checking a couple of the results, this successfully finds the code only answers, and pretty fast.
I'm wondering if there's a better, cleaner approach that the hack with the CHARINDEX
.
1 Answer 1
If your aim is to find code dumps, then you can do better.
Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.
In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.
Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.
Generally speaking, a code dump suspect looks like this:
- Contains mostly code
- Too short (or no) introductory text
- Too short (or no) closing text
What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.
In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.
Something like this:
DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
Id AS [Post Link],
Score,
CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('<pre><code>', Body) > 0
AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
-
\$\begingroup\$ Now we're talking! Do you feel any performance difference between these queries? \$\endgroup\$Simon Forsberg– Simon Forsberg2015年06月10日 12:13:13 +00:00Commented Jun 10, 2015 at 12:13
-
\$\begingroup\$ Nope, feels just as fast (but obviously it should be slower) \$\endgroup\$janos– janos2015年06月11日 20:14:21 +00:00Commented Jun 11, 2015 at 20:14
<codedump>
" \$\endgroup\$