The following page simulates XSS attacks and successfully (?) prevents them. I want to know if I've missed any other major attack vectors (or small ones) and/or if anyone has suggestions as to improving my escaping methods.
There are three contexts in which I attack:
- direct PHP echo into an HTML tag
- Attack: an HTML element with an onclick
- Solution: convert everything inside the tag into its HTML entity code
- inside a JavaScript string
- Attack: closing script tag followed by arbitrary code
- Solution: escape quotes as well as forward slashes
- setting innerHTML with JavaScript
- Attack: an HTML element with an onclick
- Solution: convert to HTML entities when setting innerHTML (unless it's supposed to be an element)
- inside an onclick attribute
- Attack: closing quote and adding another bit of code
- Solution: convert into HTML entities
To implement the solutions detailed above, I made three functions (2 in PHP, 1 in JS):
escapeForJSString
: escapes anything that would break out of the string, as well as forward slashes, to prevent closing script tagsescapeHTMLSpecialChars
: a wrapper aroundhtmlspecialchars
that escapes single quotesescapeForInnerHTML
: basically does whathtmlspecialchars
does in PHP but in JS
xss-attack-prevention.php:
<?php
require_once "xss-attack-prevention-helpers.php";
$userInput1 = "</script><script>alert('y0uv3b33nh4ck3d');</script><script>\"";
$userInput2 = "<div onclick=\"alert('such l33t!')\"></div>";
$userInput3 = "\"); alert(\"pwn3d!\\\")";
?>
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<title>XSS Attack Prevention</title>
<script type="text/javascript" src="xss-attack-prevention-helpers.js"></script>
<script type="text/javascript">
var n00bz = "<?= escapeForJSString($userInput1) ?>";
var l33t = "<?= escapeForJSString($userInput2) ?>";
</script>
</head>
<body>
<div id='div1'><?= escapeHTMLSpecialChars($userInput1) ?></div>
<div id='div2'
onclick='alert("<?= escapeHTMLSpecialChars(escapeForJSString($userInput3)) ?>")'
>Click me to see a message!</div>
<div id='div3' style='border: 1px solid black;'></div>
<script type="text/javascript">
document.getElementById('div3').innerHTML = "Click me and nothing will happen!"
+ escapeForInnerHTML(l33t);
</script>
</body>
</html>
xss-attack-prevention-helpers.php:
<?php
// returns false if $var is not a string
// otherwise, returns a string with \n, ', ", ,円 0,円 / (to prevent XSS) escaped
function escapeForJSString($var) {
if(!is_string($var)) return false;
return str_replace(
"/" // escape forward slash to prevent XSS
, "\\/"
, str_replace(
"\n" // escape newline
, "\\n"
, addslashes($var) // escape sq, dq, backslash, and null byte
)
);
}
// wrapper for htmlspecialchars so we escape single and double quotes
function escapeHTMLSpecialChars($text="") {
return htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8', true);
}
?>
xss-attack-prevention-helpers.js:
function getType(val) { return Object.prototype.toString.call(val).slice(8, -1).toLowerCase(); }
function isString(val) { return getType(val) === "string"; }
function isArray(val) { return getType(val) === "array"; }
// returns false on unexpected input
// input must be
// - text : a string
// - replacements : an array of arrays, where each subarray contains 2 strings
function replaceAll(text, reps) {
if(!isString(text) || !isArray(reps)) return false;
for(var i = 0; i < reps.length; i++) {
if(!isArray(reps[i]) || reps[i].length !== 2 || !isString(reps[i][0])
|| !isString(reps[i][1])
) return false;
text = text.split(reps[i][0]).join(reps[i][1]);
}
return text;
}
function escapeForInnerHTML(text) {
if(!isString(text)) return false;
return replaceAll(text, [["&", "&"], ["<", "<"],
[">", ">"], ["/", "/"], ["\"", """], ["'", "'"]]
);
}
2 Answers 2
There are already native functions to escape for HTML and JS strings: htmlspecialchars()
and json_encode()
. See this related question on Stack Overflow
As for innerHTML
, simply don't use it. Use textContent
instead. If you wish to allow for formatting (for example, in comments or posts), I recommend Markdown
-
\$\begingroup\$ Regarding
htmlspecialchars
, you'll see in my code that it's simply a wrapper for htmlspecialchars, setting the ENT_QUOTES option so I don't forget. \$\endgroup\$AmadeusDrZaius– AmadeusDrZaius2014年12月01日 15:19:48 +00:00Commented Dec 1, 2014 at 15:19
My other answer explains the general best practice, let's go over your code.
You can pass an array to
str_replace()
so that it replaces every occurence of matching substrings with their replacement array counterparts with the same index:return str_replace(["/", "\n"], ["\\/", "\\n"], $subject)
You aren't escaping
;
for JavaScript strings, that can be used to break out of the context.You shouldn't really care about JavaScript replacement. Data comes from the server, that's where all escaping should be.
ES5 has the
Array.isArray()
static method to check if a given parameter is an array. For strings,typeof str
will returnstring
. So your getType function is a bit redundant:function isString(str) { return typeof str === 'string'; } function isArray(arr) { return Array.isArray(arr); }
Also, ES5 has
Array.prototype.forEach
for iterating over an array, and is considered a better alternative tofor
.
-
\$\begingroup\$ I have a couple questions.
You aren't escaping ; for JavaScript strings, that can be used to break out of the context.
Can you give an example of this? I don't see how that's true.So your getType function is a bit redundant
Won't theArray.isArray
fail on inner frames? \$\endgroup\$AmadeusDrZaius– AmadeusDrZaius2014年12月01日 15:22:28 +00:00Commented Dec 1, 2014 at 15:22 -
\$\begingroup\$
var variable = <?= jsEscape($valueFromPHP); ?>;
In this instance,;
followed by JS code will allow you to execute any JS code you want. \$\endgroup\$Madara's Ghost– Madara's Ghost2014年12月01日 15:23:50 +00:00Commented Dec 1, 2014 at 15:23 -
\$\begingroup\$ But not when there are quotes around it, right? \$\endgroup\$AmadeusDrZaius– AmadeusDrZaius2014年12月01日 17:12:15 +00:00Commented Dec 1, 2014 at 17:12
-
1\$\begingroup\$ @AmadeusDrZaius correct, with quotes the only thing that can break through is the ending quote or
</script>
. If you have both escaped, you're clear. \$\endgroup\$Madara's Ghost– Madara's Ghost2014年12月01日 17:45:05 +00:00Commented Dec 1, 2014 at 17:45
Explore related questions
See similar questions with these tags.
Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute
. \$\endgroup\$onclicks
- even if the data is properly escaped you are still vulnerable. See the bit in my link where it saysEVEN IF YOU ESCAPE UNTRUSTED DATA YOU ARE XSSED HERE
for more info. \$\endgroup\$window.setInterval
- an outdated syntax in the first place. The bit about escaping all characters less than\xHH
seems like overkill to me unless there is a good reason for it. \$\endgroup\$setInterval
can take a string that will be evaluated, much likeonclick
so the same rule applies. Hex entity escaping (\xHH
) is necessary and there are very good reasons for doing so. This will stop the context being changed back to HTML by ending a script tag with</script>
, for example. It can also stop items such as CDATA closing tags changing the context. You mentioned just escaping forward slashes (it is back slashes that are more important), but if you don't encode properly an attacker will find a way round. \$\endgroup\$