I have a small function for filtering data from users. I'd like a review on logical mistakes I've made and if I'm somewhat protected using it.
public static function filtru($str, $filtre = []){
if(is_array($str)){//verifica daca datele trimise spre curatare sant array
$str_curat = [];
foreach($str as $key => $value){//selectam fiecare valoare din array-ul trimis spre curatare
foreach($filtre as $fitru){//aplicam fiecare filtru, daca se folosesc mai multe
switch($fitru){
case 'trim':
$value = htmlentities(trim(strip_tags($value)), ENT_QUOTES, 'utf-8');
break;
case 'int':
$value = intval($value);
break;
case 'nu':
$text = $value;$striptags = true;
$search = ["40","41","58","65","66","67","68","69","70",
"71","72","73","74","75","76","77","78","79","80","81",
"82","83","84","85","86","87","88","89","90","97","98",
"99","100","101","102","103","104","105","106","107",
"108","109","110","111","112","113","114","115","116",
"117","118","119","120","121","122"
];
$replace = ["(",")",":","a","b","c","d","e","f","g","h",
"i","j","k","l","m","n","o","p","q","r","s","t","u",
"v","w","x","y","z","a","b","c","d","e","f","g","h",
"i","j","k","l","m","n","o","p","q","r","s","t","u",
"v","w","x","y","z"
];
$entities = count($search);
for ($i=0; $i < $entities; $i++) {
$text = preg_replace("#(&\#)(0*".$search[$i]."+);*#si", $replace[$i], $text);
}
$text = preg_replace('#(&\#x)([0-9A-F]+);*#si', "", $text);
$text = preg_replace('#(<[^>]+[/\"\'\s])(onmouseover|onmousedown|onmouseup|onmouseout|onmousemove|onclick|ondblclick|onfocus|onload|xmlns)[^>]*>#iU', ">", $text);
$text = preg_replace('#([a-z]*)=([\`\'\"]*)script:#iU', '1ドル=2ドルnojscript...', $text);
$text = preg_replace('#([a-z]*)=([\`\'\"]*)javascript:#iU', '1ドル=2ドルnojavascript...', $text);
$text = preg_replace('#([a-z]*)=([\'\"]*)vbscript:#iU', '1ドル=2ドルnovbscript...', $text);
$text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*expression\([^>]*>#iU', "1ドル>", $text);
$text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*behaviour\([^>]*>#iU', "1ドル>", $text);
if ($striptags) {
do {
$thistext = $text;
$text = preg_replace('#</*(applet|meta|xml|blink|link|style|script|embed|object|iframe|frame|frameset|ilayer|layer|bgsound|title|base|body)[^>]*>#i', "", $text);
} while ($thistext != $text);
}
$value = $text;
$value = preg_replace('#(alert|cmd|passthru|eval|exec|expression|system|fopen|fsockopen|file|file_get_contents|file_put_contents|readfile|unlink|shell_exec)(\s*)\((.*?)\)#si', "\1円\2円(\3円)", $value);
break;
default:
$value = $fitru($value);
}// switch
}//bucla pt fiecare filtru
$str_curat[$key] = $value;
}//bucla pt fiecare valoare
}else{
foreach($filtre as $fitru){
switch($fitru){
case 'trim':
$str = htmlentities(trim(strip_tags($str)), ENT_QUOTES, 'utf-8');
break;
case 'int':
$str = intval($str);
break;
case 'nu':
$text = $str;$striptags = true;
$search = ["40","41","58","65","66","67","68","69","70",
"71","72","73","74","75","76","77","78","79","80","81",
"82","83","84","85","86","87","88","89","90","97","98",
"99","100","101","102","103","104","105","106","107",
"108","109","110","111","112","113","114","115","116",
"117","118","119","120","121","122"
];
$replace = ["(",")",":","a","b","c","d","e","f","g","h",
"i","j","k","l","m","n","o","p","q","r","s","t","u",
"v","w","x","y","z","a","b","c","d","e","f","g","h",
"i","j","k","l","m","n","o","p","q","r","s","t","u",
"v","w","x","y","z"
];
$entities = count($search);
for ($i=0; $i < $entities; $i++) {
$text = preg_replace("#(&\#)(0*".$search[$i]."+);*#si", $replace[$i], $text);
}
$text = preg_replace('#(&\#x)([0-9A-F]+);*#si', "", $text);
$text = preg_replace('#(<[^>]+[/\"\'\s])(onmouseover|onmousedown|onmouseup|onmouseout|onmousemove|onclick|ondblclick|onfocus|onload|xmlns)[^>]*>#iU', ">", $text);
$text = preg_replace('#([a-z]*)=([\`\'\"]*)script:#iU', '1ドル=2ドルnojscript...', $text);
$text = preg_replace('#([a-z]*)=([\`\'\"]*)javascript:#iU', '1ドル=2ドルnojavascript...', $text);
$text = preg_replace('#([a-z]*)=([\'\"]*)vbscript:#iU', '1ドル=2ドルnovbscript...', $text);
$text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*expression\([^>]*>#iU', "1ドル>", $text);
$text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*behaviour\([^>]*>#iU', "1ドル>", $text);
if ($striptags) {
do {
$thistext = $text;
$text = preg_replace('#</*(applet|meta|xml|blink|link|style|script|embed|object|iframe|frame|frameset|ilayer|layer|bgsound|title|base|body)[^>]*>#i', "", $text);
} while ($thistext != $text);
}
$str = $text;
$str = preg_replace('#(alert|cmd|passthru|eval|exec|expression|system|fopen|fsockopen|file|file_get_contents|file_put_contents|readfile|unlink|shell_exec)(\s*)\((.*?)\)#si', "\1円\2円(\3円)", $str);
break;
default:
$str = $fitru($value);
}// switch
}//foreach pt filtre
$str_curat = $str;
}
return $str_curat;
}//end filtru
-
3\$\begingroup\$ This function isn't small; by modern standards, it's huge and should be split up. \$\endgroup\$Adam– Adam2012年11月11日 23:12:47 +00:00Commented Nov 11, 2012 at 23:12
3 Answers 3
DRY - Write less handle more
You can make your function half the size by designing it to work with scalar vars only, and handling arrays with a loop:
public static function filtru($var) {
if (is_array($var)) {
foreach ($var as &$val) {
$val = static::filtru($val);
}
return $var;
}
// $var is not an array - it is number/string etc.
...
return $var;
}
In the above example, an array, a scalar or even a nested array can all be handled by the same function.
Use PHP's built in filters
Rather than write your own filters, you're better off using PHP's sanitization filters. For common cases, your code becomes therefore (assuming this is implemented):
$arrayIntOut = Foo::filtru($arrayMixedIn, FILTER_SANITIZE_NUMBER_INT);
But, especially with the above example, it's more work to use the filtru function than it is to just write:
$arrayIntOut = array_map('intval', $arrayMixedIn);
Putting all your logic in one place is not necessarily the best idea, when the alternative is to write concise (and more efficient) code using php's function directly in functions. As such, consider 'just' using PHP instead of wrapping php functions in some more logic.
Make "nu" a separate function
Each case you want to handle should really be a protected specific function. Once you do that, consider deleting any functions that are just one line of code as it's better to just use php's own functions. The 'nu' function however, is likely to warrants it's own function.
It looks like the main purpose is to strip out XSS attacks. Unless you have exhaustive tests for this logic, it is very likely it doesn't cater for all possibilities - therefore make it a separate function and if you haven't already it'd be in your interest to write some tests to at least cover the most common XSS attacks.
-
\$\begingroup\$ that's a really long list :D and it's just the common ones \$\endgroup\$danutz0501– danutz05012012年11月12日 11:04:00 +00:00Commented Nov 12, 2012 at 11:04
-
\$\begingroup\$ actually it's probably quite complete being owasp, but there can never truly be a complete xss list as new attacks are found/invented as time goes by. \$\endgroup\$AD7six– AD7six2012年11月12日 11:47:50 +00:00Commented Nov 12, 2012 at 11:47
- Your function should be broken into many more functions. Each different type of escaping should be its own function.
- You should either use separate functions where the array version calls the non-array version, or youo should try to fold your array and non array logic into one
$wasArray = is_array($str); if (!$wasArray) { $str = array($str); } foreach ($str as $s) { ... } return ($wasArray) ? $str : array_shift($str);
- Your
trim
method is a lie -- it doesn't just trim. What if you want to trim without html escaping? - If you did leave them all in one method, consider using class constants instead of strings to control the behavior. That's a lot less error prone, and it makes it easier to trace usage through code (and if you rename something later, it will be a million times easier).
- What is your
nu
sanitation doing? It's like it's processing HTML, PHP and JavaScript?- With HTML, you're typically better off with a whitelist than trying to sanitize, and then escaping everything else.
- With PHP, you're typically better off just not running code that you don't 100% trust
- JS falls into the HTML category. If HTML is whitelisted, it should be impossible for a user to get JS through.
You should consider if you really need all of this. In a lot of situations, it's sufficient to simply escape and not sanitize (though one could argue that escape is a form of sanitizing).
(There are of course situations where you really do have to process the hell out of user input though. Consider the StackExchange questions/answers/comments. They allow a very limited subset of HTML which means that their processor has to be aware of what to escape, what to leave alone.)
-
\$\begingroup\$ I will remove each filter as a separate function, i avoid using constants knowing that they are slower. Nice explanation with array_shift i'll use it.I need to rethink all logic. Thank you very much for response \$\endgroup\$danutz0501– danutz05012012年11月12日 10:43:28 +00:00Commented Nov 12, 2012 at 10:43
- First, break your function into pieces. One function should be doing one thing and correctly.
- Try to give functions, variables a meaningful name. Here the function name "filteru" doesn't make any sense. same goes for "$str_curat".
- you have huge if block
if (is_array($str)) { // lot of code... }
Instead you can write a simple check at the beginning of the functionif (!is_array($str)) { return; } // rest of the code goes here...
This way you can avoid a huge if block. - At the end of the curly braces you have comments like
}// switch
}//foreach pt filtre
}//end filtru
These are useless. - you have two foreach blocks. one foreach inside another foreach
foreach($str as $key => $value) { foreach($filtre as $fitru) { // code ... } }
Try rewriting the logic so that you can avoid this nested loop.
-
1\$\begingroup\$ The variable names and comments do make sense in romanian. \$\endgroup\$Diego Agulló– Diego Agulló2012年11月12日 07:18:29 +00:00Commented Nov 12, 2012 at 7:18
-
1\$\begingroup\$ Then It is fine if the variable names make sense in romanian. For the comments, what I wanted to say is that it is useless to comment the end of curly braces the way it is done here - "} //switch", "}//foreach pt filtre" etc \$\endgroup\$Kinjal– Kinjal2012年11月12日 07:30:15 +00:00Commented Nov 12, 2012 at 7:30
-
\$\begingroup\$ I used a poorly text editor when i made the function. It didn't had matching curly braces. I forgot to remove comments I'll rethink the function(logic). str_curat = clean string \$\endgroup\$danutz0501– danutz05012012年11月12日 10:50:49 +00:00Commented Nov 12, 2012 at 10:50