3
\$\begingroup\$

I'am trying to build a PHP/MySQL/jQuery comment system. Off late I started to realize that manual spamming is a serious issue that need to be addressed carefully. So I thought of building a PHP function (initially, later can be implemented in OOPS concept) that will sniff the content and give points (spam points) based on some criteria. Getting a spam point less than will make the content inactive and send for moderation, whereas posts with spam points greater than 2 won't make the content eligible to be inserted in DB. This is a very basic piece of codes, so I want you to give me your valuable suggestion as how to make this code much better and if I am doing something wrong here.

<?php 
#spam points < 2 { will be send for moderation}
#spam points > 2 { wont be posted , ie wont be inserted in DB}
function sniffSpams($content)
{
 $spam_points = 0;
 $url_pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
 preg_match_all($url_pattern, $content, $matches, PREG_PATTERN_ORDER);
 if(! empty($matches))
 {
 //url is/are present
 $get_number_of_urls = count($matches[0]); //get the number of urls/emails present in content
 if($get_number_of_urls > 2)
 $spam_points += $get_number_of_urls; //1 point per link
 else
 {
 $spam_words_uri = array('free', 'vote', 'play');
 //if less thamn 2 , check for length of url
 foreach ($matches[0] as $url) 
 {
 if(strlen($url) > 150) //long url mostly are spam
 $spam_points += 1;
 foreach($spam_words_uri as $spam)
 {
 if(stripos($url, $spam) !== false ) 
 $spam_points += 1;
 }
 }
 }
 }
 $spam_words = array('Levitra', 'viagra', 'casino', '*');
 foreach($spam_words as $spam)
 {
 if(stripos($content, $spam) !== false ) 
 $spam_points += 1;
 }
 return $spam_points;
}
echo sniffSpams('the * dsjdjsd ');
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Feb 11, 2014 at 12:59
\$\endgroup\$
2
  • 1
    \$\begingroup\$ You could pull the spam words from a database or text file. That way you don't have to keep change the code. \$\endgroup\$ Commented Feb 13, 2014 at 22:28
  • \$\begingroup\$ @James yep , that is the next thing , will be pulling strings (banned etc) form text file or so . Next will be wrapping this thing into a class . But what do you thing about the function ? Feasible or need some more changes ? \$\endgroup\$ Commented Feb 14, 2014 at 6:50

1 Answer 1

3
\$\begingroup\$

Minor (and subjective) remarks:

  • Function name: The function does not sniff spam but calculate a score for the given content. Therefore I'd rename the function to something like getSpamScore
  • The variable $matches is not declared. Declaring it before using in in preg_match_all allows IDEs to perform code analysis and other developers don't have a hard time looking for its declaration.
  • Two empty lines just take up space. Your code should be clear enough from structure not to need any extra indentions (which you don't have) or double blank lines. As the preg_match_all is directly related to the if and only to it, I'd not have any lines at all between the match and the if-statement. (This applies to all other lines too)
  • //url is/are present comment: more of a personal thing maybe: I don't like obvious comments. Personally I prefer the if it needs a comment it needs to be refactored way of writing comments (non in about almost every case). Comments need maintenance time (updating comments for changed code). But time should only be spent here if it provides value. Additionally comments tend to loose their relation to the code they originally were attached to (mostly by refactoring). This greatly confuses other developers.
  • $get_number_of_urls naming: this variable contains a verb. Verbs indicate actions (e.g. a method). In PHP I'd expect this variable to be a closur, not an integer. Yet you don't actually get the number or urls, you already have them. A proper name might be number_of_urls instead.
  • Stick to one convention of styling code: in your case this especially applies for where to have whitespaces. In your if and foreach lines you mix up foreach( foreach ( and your if conditions sometimes have whitespaces in between and sometimes not. I suppose this is some kind of work-in-progress code. Yet later styling costs time. Code you write always should meet production ready requirements (regarding style). It takes some time at the beginning to stick with someones conventions, but you'll get used to it and do it automatically after some time.
  • In my opinion this function has too many responsibilities (at least two). I'd suggest splitting this function into two sub-functions. This function itself only calls and sums of the results of the independent tests. When going OO this of course wants a design pattern ;)
answered Feb 14, 2014 at 18:22
\$\endgroup\$
1
  • \$\begingroup\$ Thanks a ton for the remarks . Will be working on those you pointed out ! \$\endgroup\$ Commented Feb 17, 2014 at 4:49

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.