Some context
I've been tasked with supplying an escaping function to arbitrary CSS values that are entered through a form. The goals and caveats are:
- I know it's bad practice to let users input CSS. Deal with it.
- Data will be injected either to a
style
attribute, or to an external stylesheet. - This is run on PHP 5.1 (Again, I know, deal with it).
- I'm trying to follow this cheat sheet as closely as possible.
The Code
/**
* ord() alternative that works with UTF8 characters
* @param string $c
*
* @return int UTF-8 character code value
*/
function getUTF8CharCode($c) {
$h = ord($c{0});
if ($h <= 0x7F) {
return $h;
} else if ($h < 0xC2) {
return false;
} else if ($h <= 0xDF) {
return ($h & 0x1F) << 6 | (ord($c{1}) & 0x3F);
} else if ($h <= 0xEF) {
return ($h & 0x0F) << 12 | (ord($c{1}) & 0x3F) << 6
| (ord($c{2}) & 0x3F);
} else if ($h <= 0xF4) {
return ($h & 0x0F) << 18 | (ord($c{1}) & 0x3F) << 12
| (ord($c{2}) & 0x3F) << 6
| (ord($c{3}) & 0x3F);
} else {
return -1;
}
}
/**
* Escape a single character for CSS context.
* @param $c
* @return string
*/
function escapeCSSCharacter($c) {
return "\\" . base_convert(getUTF8CharCode($c), 10, 16) . " ";
}
/**
* Escape CSS rule
*
* @param string $data The CSS rule
* @param array $immuneChars Array of immune character. These characters will not be escaped.
*
* @return string Escaped string
*/
function escapeCSSValue($data, array $immuneChars = array()) {
$result = "";
for ($i = 0; $i < mb_strlen($data); $i++) {
$currChar = mb_substr($data, $i, 1);
if (getUTF8CharCode($currChar) < 256 && //Character value is less than 256
!preg_match("/^\w$/", $currChar) && //Character is not alphanumeric (underscore is considered safe too)
!in_array($currChar, $immuneChars) //Character is not immune
) {
$result .= escapeCSSCharacter($currChar);
}
else {
$result .= $currChar;
}
}
return $result;
}
Usage
$colorRule = "color: " . escapeCSSValue("#BADA55;}*{display:none;}/*", array("#")) . ";"; //Will be obviously broken, but will not break the rest of the document.
echo $colorRule;
My worries
- Is this a good way to escape CSS? Is this safe? Will this method be impossible to break out of?
- Am I doing this relatively efficiently? Given that the strings I'm going to be encoding are arbitrary, I'm worried about attacks that involve thousands of characters.
Any review will be welcomed.
3 Answers 3
Your code appears to be inconsistent with respect to UTF-8 characters.
Two significant issues I can see are:
You are intending to return an int value, yet you return a
false
for the block between0x80
and0xC2
. In PHP, false is not an int, and0
is also false-ey. Then, in the 'else' block, you return-1
, which is 'true-ey'.I am uncertain that your char ranges are correct. In UTF-8, the invalid blocks are inconsistent with
0xc2
, there are multiple valid UTF-8 encodings in that range. I am not certain you have your conditions right. Is there something I am unaware of?
-
\$\begingroup\$ TBH I found that function on Stack Overflow and arbitrarily checked it on all the ranges in that function. Are you familiar with a better function to do this job? \$\endgroup\$Madara's Ghost– Madara's Ghost2014年06月30日 11:24:53 +00:00Commented Jun 30, 2014 at 11:24
There's not much to say. Overall, the code is small and quite readable.
Just a small nitpick:
if (getUTF8CharCode($currChar) < 256 && //Character value is less than 256
!preg_match("/^\w$/", $currChar) && //Character is not alphanumeric (underscore is considered safe too)
!in_array($currChar, $immuneChars) //Character is not immune
These comments say the what, not the why. They're akin to:
$i++; // increments i by one
Your if
condition should be commented, but not like this.
I can't comment on the security-side of things.
-
2\$\begingroup\$ Code Review is often the right place for "Is this code OK with regards to security?" We've had many security-related questions here already. \$\endgroup\$Simon Forsberg– Simon Forsberg2014年06月30日 11:05:49 +00:00Commented Jun 30, 2014 at 11:05
-
\$\begingroup\$ @SimonAndréForsberg my bad then! Deleted this part. \$\endgroup\$Florian Margaine– Florian Margaine2014年06月30日 12:19:27 +00:00Commented Jun 30, 2014 at 12:19
Don't know if it will be cleaner or not (depends a bit on the length of immuneChars), but you could look into using preg_replace_callback together with Unicode categories in your regular expression (with the u
modifier to enable PCRE_UTF8
). Especially since I see you're already using a regular expression anyways.
If I'm reading your if
correctly, you're trying to escape all characters below 256, that are not word characters or immune characters. Don't know what the immune characters are, but you should be able to get pretty far with the categories. The following should for example escape all control characters and any kind of punctuation.
preg_replace_callback('%\p{C}|\p{P}%u', 'escape_function', $subject);
You should also be able to match specific Unicode characters using \x{1234}
, which would match the unicode character U+1234
.
-
\$\begingroup\$ Immune characters change with context, for example, if I allow hex color, I would add
"#"
to my immune characters. If I allow the value foropacity
I would add'.'
to my immune characters. \$\endgroup\$Madara's Ghost– Madara's Ghost2014年06月30日 13:05:24 +00:00Commented Jun 30, 2014 at 13:05 -
\$\begingroup\$ @MadaraUchiha Right. Shouldn't be too difficult to adjust for that, but I'll leave that as an exercise for the reader :p If the user can only specify values for certain CSS properties though, and not full complete CSS, then I think I'd rather just specify specific allowed patterns for those properties. For example for opacity,
%(\d+\.)?\d+%
, or something like that. The patterns could be reused for other similar properties of course. \$\endgroup\$Svish– Svish2014年07月01日 12:57:22 +00:00Commented Jul 1, 2014 at 12:57