8
\$\begingroup\$

What's the opposite of CodeLess Questions? Perhaps CodeOnly Answers? (Also known as code dumps.)

Inspired by the opposite question, here's the opposite query:

SELECT Id AS [Post Link], Score, CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
 AND Body LIKE '<pre><code>%'
 AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') = LEN(Body)
 AND ClosedDate IS NULL
ORDER BY CreationDate

Checking a couple of the results, this successfully finds the code only answers, and pretty fast.

I'm wondering if there's a better, cleaner approach that the hack with the CHARINDEX.

asked Jun 9, 2015 at 19:13
\$\endgroup\$
1
  • 4
    \$\begingroup\$ Feature request: Include answers like "Here's what I would do: <codedump>" \$\endgroup\$ Commented Jun 9, 2015 at 19:19

1 Answer 1

3
\$\begingroup\$

If your aim is to find code dumps, then you can do better.

Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.

In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.

Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.

Generally speaking, a code dump suspect looks like this:

  • Contains mostly code
  • Too short (or no) introductory text
  • Too short (or no) closing text

What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.

In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.

Something like this:

DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
 Id AS [Post Link],
 Score,
 CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
 LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
 LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
 RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
 CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
 AND CHARINDEX('<pre><code>', Body) > 0
 AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
 AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
 AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
answered Jun 10, 2015 at 11:58
\$\endgroup\$
2
  • \$\begingroup\$ Now we're talking! Do you feel any performance difference between these queries? \$\endgroup\$ Commented Jun 10, 2015 at 12:13
  • \$\begingroup\$ Nope, feels just as fast (but obviously it should be slower) \$\endgroup\$ Commented Jun 11, 2015 at 20:14

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.