Filtering data from users

Question 1

I have a small function for filtering data from users. I'd like a review on logical mistakes I've made and if I'm somewhat protected using it.

public static function filtru($str, $filtre = []){
 if(is_array($str)){//verifica daca datele trimise spre curatare sant array
 $str_curat = [];
 foreach($str as $key => $value){//selectam fiecare valoare din array-ul trimis spre curatare
 foreach($filtre as $fitru){//aplicam fiecare filtru, daca se folosesc mai multe
 switch($fitru){
 case 'trim':
 $value = htmlentities(trim(strip_tags($value)), ENT_QUOTES, 'utf-8');
 break;
 case 'int':
 $value = intval($value);
 break;
 case 'nu':
 $text = $value;$striptags = true;
 $search = ["40","41","58","65","66","67","68","69","70",
 "71","72","73","74","75","76","77","78","79","80","81",
 "82","83","84","85","86","87","88","89","90","97","98",
 "99","100","101","102","103","104","105","106","107",
 "108","109","110","111","112","113","114","115","116",
 "117","118","119","120","121","122"
 ];
 $replace = ["(",")",":","a","b","c","d","e","f","g","h",
 "i","j","k","l","m","n","o","p","q","r","s","t","u",
 "v","w","x","y","z","a","b","c","d","e","f","g","h",
 "i","j","k","l","m","n","o","p","q","r","s","t","u",
 "v","w","x","y","z"
 ];
 $entities = count($search);
 for ($i=0; $i < $entities; $i++) {
 $text = preg_replace("#(&\#)(0*".$search[$i]."+);*#si", $replace[$i], $text);
 }
 $text = preg_replace('#(&\#x)([0-9A-F]+);*#si', "", $text);
 $text = preg_replace('#(<[^>]+[/\"\'\s])(onmouseover|onmousedown|onmouseup|onmouseout|onmousemove|onclick|ondblclick|onfocus|onload|xmlns)[^>]*>#iU', ">", $text);
 $text = preg_replace('#([a-z]*)=([\`\'\"]*)script:#iU', '1ドル=2ドルnojscript...', $text);
 $text = preg_replace('#([a-z]*)=([\`\'\"]*)javascript:#iU', '1ドル=2ドルnojavascript...', $text);
 $text = preg_replace('#([a-z]*)=([\'\"]*)vbscript:#iU', '1ドル=2ドルnovbscript...', $text);
 $text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*expression\([^>]*>#iU', "1ドル>", $text);
 $text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*behaviour\([^>]*>#iU', "1ドル>", $text);
 if ($striptags) {
 do {
 $thistext = $text;
 $text = preg_replace('#</*(applet|meta|xml|blink|link|style|script|embed|object|iframe|frame|frameset|ilayer|layer|bgsound|title|base|body)[^>]*>#i', "", $text);
 } while ($thistext != $text);
 }
 $value = $text;
 $value = preg_replace('#(alert|cmd|passthru|eval|exec|expression|system|fopen|fsockopen|file|file_get_contents|file_put_contents|readfile|unlink|shell_exec)(\s*)\((.*?)\)#si', "\1円\2円&#40;\3円&#41;", $value);
 break;
 default:
 $value = $fitru($value);
 }// switch
 }//bucla pt fiecare filtru
 $str_curat[$key] = $value;
 }//bucla pt fiecare valoare
 }else{
 foreach($filtre as $fitru){
 switch($fitru){
 case 'trim':
 $str = htmlentities(trim(strip_tags($str)), ENT_QUOTES, 'utf-8');
 break;
 case 'int':
 $str = intval($str);
 break;
 case 'nu':
 $text = $str;$striptags = true;
 $search = ["40","41","58","65","66","67","68","69","70",
 "71","72","73","74","75","76","77","78","79","80","81",
 "82","83","84","85","86","87","88","89","90","97","98",
 "99","100","101","102","103","104","105","106","107",
 "108","109","110","111","112","113","114","115","116",
 "117","118","119","120","121","122"
 ];
 $replace = ["(",")",":","a","b","c","d","e","f","g","h",
 "i","j","k","l","m","n","o","p","q","r","s","t","u",
 "v","w","x","y","z","a","b","c","d","e","f","g","h",
 "i","j","k","l","m","n","o","p","q","r","s","t","u",
 "v","w","x","y","z"
 ];
 $entities = count($search);
 for ($i=0; $i < $entities; $i++) {
 $text = preg_replace("#(&\#)(0*".$search[$i]."+);*#si", $replace[$i], $text);
 }
 $text = preg_replace('#(&\#x)([0-9A-F]+);*#si', "", $text);
 $text = preg_replace('#(<[^>]+[/\"\'\s])(onmouseover|onmousedown|onmouseup|onmouseout|onmousemove|onclick|ondblclick|onfocus|onload|xmlns)[^>]*>#iU', ">", $text);
 $text = preg_replace('#([a-z]*)=([\`\'\"]*)script:#iU', '1ドル=2ドルnojscript...', $text);
 $text = preg_replace('#([a-z]*)=([\`\'\"]*)javascript:#iU', '1ドル=2ドルnojavascript...', $text);
 $text = preg_replace('#([a-z]*)=([\'\"]*)vbscript:#iU', '1ドル=2ドルnovbscript...', $text);
 $text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*expression\([^>]*>#iU', "1ドル>", $text);
 $text = preg_replace('#(<[^>]+)style=([\`\'\"]*).*behaviour\([^>]*>#iU', "1ドル>", $text);
 if ($striptags) {
 do {
 $thistext = $text;
 $text = preg_replace('#</*(applet|meta|xml|blink|link|style|script|embed|object|iframe|frame|frameset|ilayer|layer|bgsound|title|base|body)[^>]*>#i', "", $text);
 } while ($thistext != $text);
 }
 $str = $text;
 $str = preg_replace('#(alert|cmd|passthru|eval|exec|expression|system|fopen|fsockopen|file|file_get_contents|file_put_contents|readfile|unlink|shell_exec)(\s*)\((.*?)\)#si', "\1円\2円&#40;\3円&#41;", $str);
 break;
 default:
 $str = $fitru($value);
 }// switch
 }//foreach pt filtre
 $str_curat = $str;
 }
return $str_curat;
}//end filtru

Question 2

This function isn't small; by modern standards, it's huge and should be split up.

Question 3

DRY - Write less handle more

You can make your function half the size by designing it to work with scalar vars only, and handling arrays with a loop:

public static function filtru($var) {
 if (is_array($var)) {
 foreach ($var as &$val) {
 $val = static::filtru($val);
 } 
 return $var;
 } 
 // $var is not an array - it is number/string etc.
 ...
 return $var; 
}

In the above example, an array, a scalar or even a nested array can all be handled by the same function.

Use PHP's built in filters

Rather than write your own filters, you're better off using PHP's sanitization filters. For common cases, your code becomes therefore (assuming this is implemented):

$arrayIntOut = Foo::filtru($arrayMixedIn, FILTER_SANITIZE_NUMBER_INT);

But, especially with the above example, it's more work to use the filtru function than it is to just write:

$arrayIntOut = array_map('intval', $arrayMixedIn);

Putting all your logic in one place is not necessarily the best idea, when the alternative is to write concise (and more efficient) code using php's function directly in functions. As such, consider 'just' using PHP instead of wrapping php functions in some more logic.

Make "nu" a separate function

Each case you want to handle should really be a protected specific function. Once you do that, consider deleting any functions that are just one line of code as it's better to just use php's own functions. The 'nu' function however, is likely to warrants it's own function.

It looks like the main purpose is to strip out XSS attacks. Unless you have exhaustive tests for this logic, it is very likely it doesn't cater for all possibilities - therefore make it a separate function and if you haven't already it'd be in your interest to write some tests to at least cover the most common XSS attacks.

Question 4

that's a really long list :D and it's just the common ones

Question 5

actually it's probably quite complete being owasp, but there can never truly be a complete xss list as new attacks are found/invented as time goes by.

Question 6

Your function should be broken into many more functions. Each different type of escaping should be its own function.
You should either use separate functions where the array version calls the non-array version, or youo should try to fold your array and non array logic into one
- $wasArray = is_array($str); if (!$wasArray) { $str = array($str); } foreach ($str as $s) { ... } return ($wasArray) ? $str : array_shift($str);
Your trim method is a lie -- it doesn't just trim. What if you want to trim without html escaping?
If you did leave them all in one method, consider using class constants instead of strings to control the behavior. That's a lot less error prone, and it makes it easier to trace usage through code (and if you rename something later, it will be a million times easier).
What is your nu sanitation doing? It's like it's processing HTML, PHP and JavaScript?
- With HTML, you're typically better off with a whitelist than trying to sanitize, and then escaping everything else.
- With PHP, you're typically better off just not running code that you don't 100% trust
- JS falls into the HTML category. If HTML is whitelisted, it should be impossible for a user to get JS through.

You should consider if you really need all of this. In a lot of situations, it's sufficient to simply escape and not sanitize (though one could argue that escape is a form of sanitizing).

(There are of course situations where you really do have to process the hell out of user input though. Consider the StackExchange questions/answers/comments. They allow a very limited subset of HTML which means that their processor has to be aware of what to escape, what to leave alone.)

Question 7

I will remove each filter as a separate function, i avoid using constants knowing that they are slower. Nice explanation with array_shift i'll use it.I need to rethink all logic. Thank you very much for response

Question 8

First, break your function into pieces. One function should be doing one thing and correctly.
Try to give functions, variables a meaningful name. Here the function name "filteru" doesn't make any sense. same goes for "$str_curat".
you have huge if block
```
if (is_array($str)) {
 // lot of code...
}
```
Instead you can write a simple check at the beginning of the function
```
if (!is_array($str)) { return; } 
// rest of the code goes here...
```
This way you can avoid a huge if block.
At the end of the curly braces you have comments like
```
 }// switch 
```
```
 }//foreach pt filtre 
```
```
 }//end filtru 
```
These are useless.
you have two foreach blocks. one foreach inside another foreach
```
foreach($str as $key => $value) {
 foreach($filtre as $fitru) {
 // code ...
 }
}
```
Try rewriting the logic so that you can avoid this nested loop.

Question 9

The variable names and comments do make sense in romanian.

Question 10

Then It is fine if the variable names make sense in romanian. For the comments, what I wanted to say is that it is useless to comment the end of curly braces the way it is done here - "} //switch", "}//foreach pt filtre" etc

Question 11

I used a poorly text editor when i made the function. It didn't had matching curly braces. I forgot to remove comments I'll rethink the function(logic). str_curat = clean string

AD7six AD7six 1,3769 silver badges11 bronze badges · Accepted Answer · 2012-11-12 10:43:44Z

DRY - Write less handle more

You can make your function half the size by designing it to work with scalar vars only, and handling arrays with a loop:

public static function filtru($var) {
 if (is_array($var)) {
 foreach ($var as &$val) {
 $val = static::filtru($val);
 } 
 return $var;
 } 
 // $var is not an array - it is number/string etc.
 ...
 return $var; 
}

In the above example, an array, a scalar or even a nested array can all be handled by the same function.

Use PHP's built in filters

Rather than write your own filters, you're better off using PHP's sanitization filters. For common cases, your code becomes therefore (assuming this is implemented):

$arrayIntOut = Foo::filtru($arrayMixedIn, FILTER_SANITIZE_NUMBER_INT);

But, especially with the above example, it's more work to use the filtru function than it is to just write:

$arrayIntOut = array_map('intval', $arrayMixedIn);

Putting all your logic in one place is not necessarily the best idea, when the alternative is to write concise (and more efficient) code using php's function directly in functions. As such, consider 'just' using PHP instead of wrapping php functions in some more logic.

Make "nu" a separate function

Each case you want to handle should really be a protected specific function. Once you do that, consider deleting any functions that are just one line of code as it's better to just use php's own functions. The 'nu' function however, is likely to warrants it's own function.

It looks like the main purpose is to strip out XSS attacks. Unless you have exhaustive tests for this logic, it is very likely it doesn't cater for all possibilities - therefore make it a separate function and if you haven't already it'd be in your interest to write some tests to at least cover the most common XSS attacks.

actually it's probably quite complete being owasp, but there can never truly be a complete xss list as new attacks are found/invented as time goes by.

Stack Exchange Network

Filtering data from users

3 Answers 3

DRY - Write less handle more

Use PHP's built in filters

Make "nu" a separate function

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Filtering data from users

3 Answers 3

DRY - Write less handle more

Use PHP's built in filters

Make "nu" a separate function

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions