I'am trying to build a PHP/MySQL/jQuery comment system. Off late I started to realize that manual spamming is a serious issue that need to be addressed carefully. So I thought of building a PHP function (initially, later can be implemented in OOPS concept) that will sniff the content and give points (spam points) based on some criteria. Getting a spam point less than will make the content inactive and send for moderation, whereas posts with spam points greater than 2 won't make the content eligible to be inserted in DB. This is a very basic piece of codes, so I want you to give me your valuable suggestion as how to make this code much better and if I am doing something wrong here.
<?php
#spam points < 2 { will be send for moderation}
#spam points > 2 { wont be posted , ie wont be inserted in DB}
function sniffSpams($content)
{
$spam_points = 0;
$url_pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($url_pattern, $content, $matches, PREG_PATTERN_ORDER);
if(! empty($matches))
{
//url is/are present
$get_number_of_urls = count($matches[0]); //get the number of urls/emails present in content
if($get_number_of_urls > 2)
$spam_points += $get_number_of_urls; //1 point per link
else
{
$spam_words_uri = array('free', 'vote', 'play');
//if less thamn 2 , check for length of url
foreach ($matches[0] as $url)
{
if(strlen($url) > 150) //long url mostly are spam
$spam_points += 1;
foreach($spam_words_uri as $spam)
{
if(stripos($url, $spam) !== false )
$spam_points += 1;
}
}
}
}
$spam_words = array('Levitra', 'viagra', 'casino', '*');
foreach($spam_words as $spam)
{
if(stripos($content, $spam) !== false )
$spam_points += 1;
}
return $spam_points;
}
echo sniffSpams('the * dsjdjsd ');
-
1\$\begingroup\$ You could pull the spam words from a database or text file. That way you don't have to keep change the code. \$\endgroup\$James– James2014年02月13日 22:28:38 +00:00Commented Feb 13, 2014 at 22:28
-
\$\begingroup\$ @James yep , that is the next thing , will be pulling strings (banned etc) form text file or so . Next will be wrapping this thing into a class . But what do you thing about the function ? Feasible or need some more changes ? \$\endgroup\$Dimag Kharab– Dimag Kharab2014年02月14日 06:50:59 +00:00Commented Feb 14, 2014 at 6:50
1 Answer 1
Minor (and subjective) remarks:
- Function name: The function does not sniff spam but calculate a score for the given content. Therefore I'd rename the function to something like
getSpamScore
- The variable
$matches
is not declared. Declaring it before using in inpreg_match_all
allows IDEs to perform code analysis and other developers don't have a hard time looking for its declaration. - Two empty lines just take up space. Your code should be clear enough from structure not to need any extra indentions (which you don't have) or double blank lines. As the
preg_match_all
is directly related to theif
and only to it, I'd not have any lines at all between the match and the if-statement. (This applies to all other lines too) //url is/are present
comment: more of a personal thing maybe: I don't like obvious comments. Personally I prefer theif it needs a comment it needs to be refactored
way of writing comments (non in about almost every case). Comments need maintenance time (updating comments for changed code). But time should only be spent here if it provides value. Additionally comments tend to loose their relation to the code they originally were attached to (mostly by refactoring). This greatly confuses other developers.$get_number_of_urls
naming: this variable contains a verb. Verbs indicate actions (e.g. a method). In PHP I'd expect this variable to be a closur, not an integer. Yet you don't actually get the number or urls, you already have them. A proper name might benumber_of_urls
instead.- Stick to one convention of styling code: in your case this especially applies for where to have whitespaces. In your
if
andforeach
lines you mix upforeach(
foreach (
and yourif
conditions sometimes have whitespaces in between and sometimes not. I suppose this is some kind of work-in-progress code. Yet later styling costs time. Code you write always should meetproduction ready
requirements (regarding style). It takes some time at the beginning to stick with someones conventions, but you'll get used to it and do it automatically after some time. - In my opinion this function has too many responsibilities (at least two). I'd suggest splitting this function into two sub-functions. This function itself only calls and sums of the results of the independent tests. When going OO this of course wants a design pattern ;)
-
\$\begingroup\$ Thanks a ton for the remarks . Will be working on those you pointed out ! \$\endgroup\$Dimag Kharab– Dimag Kharab2014年02月17日 04:49:22 +00:00Commented Feb 17, 2014 at 4:49