Allowed characters for CSS identifiers

Question 1

What are the (full) valid / allowed ~~charset~~ characters for CSS identifiers id and class?

Is there a regular expression that I can use to validate against? Is it browser agnostic?

Question 2

This question appears to be a duplicate of s.o. Q448981: What characters are valid in CSS class names?

Question 3

possible duplicate of What characters are valid in CSS class names?

Question 4

@mercator: Also voting to close. =)

Question 5

The charset doesn't matter. The allowed characters matter more. Check the CSS specification. Here's the relevant passage:

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".
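To make the escaping rule in the passage above concrete, here is a minimal Java sketch that decodes both escape forms back to the plain identifier. The class name `CssUnescape` and its `unescape` method are hypothetical, not part of any library, and the handling of out-of-range code points (replaced with U+FFFD) is an assumption of this sketch rather than something the quoted text specifies.

```java
public class CssUnescape {
    /** Decodes backslash escapes in a CSS identifier (hypothetical helper). */
    public static String unescape(String ident) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < ident.length()) {
            char c = ident.charAt(i);
            // Ordinary character, or a trailing backslash with nothing after it.
            if (c != '\\' || i + 1 >= ident.length()) {
                out.append(c);
                i++;
                continue;
            }
            i++; // skip the backslash
            int start = i;
            // A numeric escape is one to six hexadecimal digits.
            while (i < ident.length() && i - start < 6
                    && Character.digit(ident.charAt(i), 16) >= 0) {
                i++;
            }
            if (i > start) {
                int cp = Integer.parseInt(ident.substring(start, i), 16);
                // Assumption: substitute U+FFFD for values that are not code points.
                out.appendCodePoint(Character.isValidCodePoint(cp) ? cp : 0xFFFD);
                // One whitespace character (or a CR LF pair) ends the escape.
                if (i + 1 < ident.length() && ident.charAt(i) == '\r'
                        && ident.charAt(i + 1) == '\n') {
                    i += 2;
                } else if (i < ident.length()
                        && " \t\r\n\f".indexOf(ident.charAt(i)) >= 0) {
                    i++;
                }
            } else {
                // Escaped literal such as \& or \?
                out.append(ident.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("B\\&W\\?"));
        System.out.println(unescape("B\\26 W\\3F"));
    }
}
```

Both calls in `main` decode to the same identifier, which is why the space after `\26` matters: it ends the numeric escape so that the following `W` is not read as part of it.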

Update: As to the regex question, you can find the grammar here:

ident -?{nmstart}{nmchar}*

Which consists of these parts:

nmstart [_a-z]|{nonascii}|{escape}
nmchar [_a-z0-9-]|{nonascii}|{escape}
nonascii [\240-\377]
escape {unicode}|\\[^\r\n\f0-9a-f]
unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?
h [0-9a-f]

This can be translated to a Java regex as follows (I only added parentheses to parts containing the OR and escaped the backslashes):

String h = "[0-9a-f]";
String unicode = "\\\\{h}{1,6}(\\r\\n|[ \\t\\r\\n\\f])?".replace("{h}", h);
String escape = "({unicode}|\\\\[^\\r\\n\\f0-9a-f])".replace("{unicode}", unicode);
String nonascii = "[\\240-\\377]";
String nmchar = "([_a-z0-9-]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String nmstart = "([_a-z]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String ident = "-?{nmstart}{nmchar}*".replace("{nmstart}", nmstart).replace("{nmchar}", nmchar);
System.out.println(ident); // The full regex.

Update 2: Oh, you're more of a PHP'er. Well, I think you can figure out how and where to use str_replace?

Question 6

"the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F"" - But nobody does that, and I'm glad they don't. :-)

Question 7

THANK YOU! That's just awesome! :D I thought it was very limited, but I didn't know I could use `\` as an escape character. Has anyone ever built a regex to validate the allowed chars?

Question 8

That's perfect, and yes I can figure it out. =) Thanks again!

Question 9

You're welcome. Don't forget to make it case insensitive or to lowercase the identifier beforehand.

Question 10

If I evaluate your Java, I get the following regex pattern:

-?([_a-z]|[\x200-\x377]|(\\[0-9a-f]{1,6}(\r\n|[ \t\r\n\f])?|\\[^\r\n\f0-9a-f]))([_a-z0-9-]|[\x200-\x377]|(\\[0-9a-f]{1,6}(\r\n|[ \t\r\n\f])?|\\[^\r\n\f0-9a-f]) )*

Yet that matches the string "2thisshouldfail", which is not a valid CSS identifier.
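One likely explanation is that the pattern was applied unanchored, in which case it can match the substring "thisshouldfail" and skip the leading digit. The following is a minimal Java sketch, not the answer's own code, that assembles the same grammar and requires the whole string to match via `Matcher.matches()`. The class name `CssIdentValidator` is hypothetical, and the `NONASCII` class is an assumption: it follows the "U+00A0 and higher" wording of the specification text rather than the grammar's octal byte range.

```java
import java.util.regex.Pattern;

public class CssIdentValidator {
    // Building blocks named after the grammar productions quoted in the answer.
    private static final String H = "[0-9a-fA-F]";
    private static final String UNICODE =
        "\\\\" + H + "{1,6}(?:\\r\\n|[ \\t\\r\\n\\f])?";
    private static final String ESCAPE =
        "(?:" + UNICODE + "|\\\\[^\\r\\n\\f0-9a-fA-F])";
    // Assumption: any character at U+00A0 or above, per the spec prose.
    private static final String NONASCII = "[^\\x00-\\x9F]";
    private static final String NMSTART =
        "(?:[_a-zA-Z]|" + NONASCII + "|" + ESCAPE + ")";
    private static final String NMCHAR =
        "(?:[_a-zA-Z0-9-]|" + NONASCII + "|" + ESCAPE + ")";

    // Letter classes list both cases explicitly, so no case-insensitive flag is needed.
    private static final Pattern IDENT =
        Pattern.compile("-?" + NMSTART + NMCHAR + "*");

    /** Returns true only if the entire string is an identifier under this grammar. */
    public static boolean isValid(String candidate) {
        return IDENT.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "foo", "-foo", "_bar9", "2thisshouldfail", "-2x", "--x", "B\\&W\\?"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Using `matches()` rather than `find()` is the design choice that matters here: it behaves as if the pattern were wrapped in `^` and `$`, so a leading digit can no longer be skipped.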

Question 11

For anyone looking for something a little more turn-key. The full expression, replaced and all, from @BalusC's answer is:

/-?([_a-z]|[\240-\377]|([0-9a-f]{1,6}(\r\n|[ \t\r\n\f])?|[^\r\n\f0-9a-f]))([_a-z0-9-]|[\240-\377]|([0-9a-f]{1,6}(\r\n|[ \t\r\n\f])?|[^\r\n\f0-9a-f]))*/

And using DEFINE, which I find a little more readable:

/(?(DEFINE)
 (?P<h> [0-9a-f] )
 (?P<unicode> (?&h){1,6}(\r\n|[ \t\r\n\f])? )
 (?P<escape> ((?&unicode)|[^\r\n\f0-9a-f])* )
 (?P<nonascii> [\240-\377] )
 (?P<nmchar> ([_a-z0-9-]|(?&nonascii)|(?&escape)) )
 (?P<nmstart> ([_a-z]|(?&nonascii)|(?&escape)) )
 (?P<ident> -?(?&nmstart)(?&nmchar)* )
) (?:
 (?&ident)
)/x

Incidentally, the original regular expression (and @human's contribution) had a few rogue escape characters that allow [ in the name.

Also, it should be noted that the raw regex without DEFINE runs about 2x as fast as the DEFINE expression, taking only ~23 steps to identify a single unicode character, while the latter takes ~40.

Question 12

This is merely a contribution to @BalusC's answer: a PHP version of the Java code he provided. I converted it and thought someone else might find it helpful.

$h = "[0-9a-f]";
$unicode = str_replace( "{h}", $h, "\{h}{1,6}(\r\n|[ \t\r\n\f])?" );
$escape = str_replace( "{unicode}", $unicode, "({unicode}|\[^\r\n\f0-9a-f])");
$nonascii = "[\240-\377]";
$nmchar = str_replace( array( "{nonascii}", "{escape}" ), array( $nonascii, $escape ), "([_a-z0-9-]|{nonascii}|{escape})");
$nmstart = str_replace( array( "{nonascii}", "{escape}" ), array( $nonascii, $escape ), "([_a-z]|{nonascii}|{escape})" );
$ident = str_replace( array( "{nmstart}", "{nmchar}" ), array( $nmstart, $nmchar ), "-?{nmstart}{nmchar}*");
echo $ident; // The full regex.

BalusC · Accepted Answer · 2010-05-11 15:41:39Z

