If your aim is to find code dumps, then you can do better.
Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.
In particular, answers that start with "I prefer:" "I prefer:", or "Try:" "Try:", or "Yes:" "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.
Similarly, answers that start with a code dump suspect and end with "etc..." "etc...", "and so on" "and so on", or "?" "?" are also code dump suspects.
Generally speaking, a code dump suspect looks like this:
- Contains mostly code
- Too short (or no) introductory text
- Too short (or no) closing text
What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.
In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.
Something like this:
DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
Id AS [Post Link],
Score,
CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('<pre><code>', Body) > 0
AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
If your aim is to find code dumps, then you can do better.
Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.
In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.
Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.
Generally speaking, a code dump suspect looks like this:
- Contains mostly code
- Too short (or no) introductory text
- Too short (or no) closing text
What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.
In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.
Something like this:
DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
Id AS [Post Link],
Score,
CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('<pre><code>', Body) > 0
AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
If your aim is to find code dumps, then you can do better.
Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.
In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.
Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.
Generally speaking, a code dump suspect looks like this:
- Contains mostly code
- Too short (or no) introductory text
- Too short (or no) closing text
What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.
In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.
Something like this:
DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
Id AS [Post Link],
Score,
CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('<pre><code>', Body) > 0
AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
If your aim is to find code dumps, then you can do better.
Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.
In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.
Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.
Generally speaking, a code dump suspect looks like this:
- Contains mostly code
- Too short (or no) introductory text
- Too short (or no) closing text
What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.
In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.
Something like this:
DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
Id AS [Post Link],
Score,
CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('<pre><code>', Body) > 0
AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate