Skip to main content
Code Review

Return to Answer

replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/
Source Link

If your aim is to find code dumps, then you can do better.

Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.

In particular, answers that start with "I prefer:" "I prefer:", or "Try:" "Try:", or "Yes:" "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.

Similarly, answers that start with a code dump suspect and end with "etc..." "etc...", "and so on" "and so on", or "?" "?" are also code dump suspects.

Generally speaking, a code dump suspect looks like this:

  • Contains mostly code
  • Too short (or no) introductory text
  • Too short (or no) closing text

What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.

In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.

Something like this:

DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
 Id AS [Post Link],
 Score,
 CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
 LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
 LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
 RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
 CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
 AND CHARINDEX('<pre><code>', Body) > 0
 AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
 AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
 AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate

If your aim is to find code dumps, then you can do better.

Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.

In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.

Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.

Generally speaking, a code dump suspect looks like this:

  • Contains mostly code
  • Too short (or no) introductory text
  • Too short (or no) closing text

What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.

In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.

Something like this:

DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
 Id AS [Post Link],
 Score,
 CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
 LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
 LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
 RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
 CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
 AND CHARINDEX('<pre><code>', Body) > 0
 AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
 AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
 AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate

If your aim is to find code dumps, then you can do better.

Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.

In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.

Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.

Generally speaking, a code dump suspect looks like this:

  • Contains mostly code
  • Too short (or no) introductory text
  • Too short (or no) closing text

What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.

In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.

Something like this:

DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
 Id AS [Post Link],
 Score,
 CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
 LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
 LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
 RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
 CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
 AND CHARINDEX('<pre><code>', Body) > 0
 AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
 AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
 AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
Source Link
janos
  • 112.9k
  • 15
  • 154
  • 396

If your aim is to find code dumps, then you can do better.

Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.

In particular, answers that start with "I prefer:", or "Try:", or "Yes:", and then followed by a code dump, won't be matched by the posted query, but effectively they are still code dump suspects.

Similarly, answers that start with a code dump suspect and end with "etc...", "and so on", or "?" are also code dump suspects.

Generally speaking, a code dump suspect looks like this:

  • Contains mostly code
  • Too short (or no) introductory text
  • Too short (or no) closing text

What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.

In addition to adding these parameters to widen the search, it's natural to include the text before and after in the query result, to make it easier to judge code dump suspects at a glance.

Something like this:

DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
 Id AS [Post Link],
 Score,
 CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
 LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
 LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
 RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
 CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
 AND CHARINDEX('<pre><code>', Body) > 0
 AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
 AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
 AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
lang-sql

AltStyle によって変換されたページ (->オリジナル) /