
I needed to create a method that would sanitize DOM IDs based on the HTML 4 criteria (yeah, HTML 5 is a lot looser). Does this make sense? Did I get too cute with making it concise? Am I totally misinterpreting what a DOM id is? I presumed it meant something like <p id="annoying_paragraph"></p>.

def sanitize_dom_id(candidate_id)
  # The HTML 4.01 spec states that ID tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]),
  # hyphens (-), underscores (_), colons (:), and periods (.).
  prefix = candidate_id.slice!(0)
  # replace invalid prefix with Z_
  prefix = "Z_" if [/[a-zA-Z]/].nil?
  # replace invalid internal characters with underscore "_"
  candidate_id.gsub!(/[^a-zA-Z0-9\.:_-]/,"_")
  "#{prefix}#{candidate_id}"
end

Sample input and expected output:

sanitize_dom_id("1htmlid")  # should return "Z_htmlid"
sanitize_dom_id("html id")  # should return "html_id"
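To check results like these against the HTML 4.01 rule quoted in the code comment, the rule can be written as a single anchored regular expression. This is a minimal sketch; `HTML4_ID` and `valid_html4_id?` are hypothetical names, not part of the question or of Rails.

```ruby
# HTML 4.01 ID token: one ASCII letter, then any number of letters,
# digits, hyphens, underscores, colons, or periods.
# HTML4_ID and valid_html4_id? are hypothetical names for illustration.
HTML4_ID = /\A[A-Za-z][A-Za-z0-9_:.-]*\z/

def valid_html4_id?(id)
  HTML4_ID.match?(id)
end

p valid_html4_id?("annoying_paragraph") # prints true
p valid_html4_id?("Z_htmlid")           # prints true
p valid_html4_id?("1htmlid")            # prints false
p valid_html4_id?("html id")            # prints false
```

Using `\A` and `\z` anchors the whole string; `^` and `$` would only anchor a line, so a string containing a newline could slip through.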
200_success
asked Mar 5, 2012 at 21:49
  • Are you using Rails? You know there are such helpers in Rails? Commented Mar 6, 2012 at 14:05
  • @jipiboily.com Can you link to them? Commented Mar 7, 2012 at 13:44
  • There is this exact method in Rails: sanitize_dom_id Commented Mar 7, 2012 at 18:08
  • @Cygal The helper he mentioned just returns the same id it is given. I am working on the TODO to add the cleansing process to that helper. Commented Mar 7, 2012 at 18:44

2 Answers


Here is my implementation; the early return and the extra variable are probably a matter of taste. The early return doesn't feel quite idiomatic.

It strips invalid characters from the start of candidate_id until it reaches a letter. It only adds a prefix if no valid ID can be found anywhere in the candidate.

require "active_support"
require "active_support/inflector"

def sanitize_dom_id(candidate_id)
  # Replace non-ASCII chars with an ASCII version
  # (ActiveSupport::Inflector.transliterate returns a new string)
  sanitized_id = ActiveSupport::Inflector.transliterate(candidate_id)
  # Replace invalid characters with underscore "_"
  sanitized_id.gsub!(/[^a-zA-Z0-9\.:_-]/, "_")
  # Remove invalid (non-alpha) leading characters
  valid_id = sanitized_id.sub(/\A[^a-zA-Z]+/, "")
  return valid_id unless valid_id.empty?
  # Prefix the ID with a known valid prefix
  "n_" + sanitized_id
end

Example output:

"100-1012foo foo-bar f-=-=- ba9 --dash 9999 66-66".split.map { |id| sanitize_dom_id id }
# => ["foo", "foo-bar", "f-_-_-", "ba9", "dash", "n_9999", "n_66-66"]
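If pulling in ActiveSupport just for `transliterate` is undesirable, the same algorithm can be sketched with Ruby's built-in `String#unicode_normalize`. Note that this swaps in a different technique: NFD decomposition plus removal of combining marks only handles accented letters such as "é"; any other non-ASCII character falls through to the "_" replacement, so results can differ from `transliterate`.

```ruby
def sanitize_dom_id(candidate_id)
  # Stand-in for ActiveSupport's transliterate (an assumption, not equivalent):
  # decompose accented letters (NFD) and drop the combining marks, so "é"
  # becomes "e". Other non-ASCII characters are left for the next step.
  ascii_ish = candidate_id.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
  # Replace characters that HTML 4.01 does not allow in IDs with "_".
  sanitized_id = ascii_ish.gsub(/[^a-zA-Z0-9.:_-]/, "_")
  # Strip leading characters that are not letters.
  valid_id = sanitized_id.sub(/\A[^a-zA-Z]+/, "")
  return valid_id unless valid_id.empty?
  # Nothing valid left: fall back to a fixed prefix.
  "n_" + sanitized_id
end

ids = "100-1012foo foo-bar f-=-=- ba9 --dash 9999 66-66".split.map { |id| sanitize_dom_id(id) }
p ids
# prints ["foo", "foo-bar", "f-_-_-", "ba9", "dash", "n_9999", "n_66-66"]
```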
answered Mar 13, 2012 at 12:13
  • About the early return: why don't you declare a prefix, as the question does? Commented Mar 13, 2012 at 12:30
  • I could probably replace the last 2 lines with a ternary operator, but I don't think it would be as clear: valid_id.empty? ? "n_" + sanitized_id : valid_id Commented Mar 13, 2012 at 13:22
  • What about the same thing with if/else? :) Commented Mar 13, 2012 at 13:28
  • On second thought, I think I prefer the early return, but that is also a matter of taste. Using if/else to determine a return value without an explicit return always throws me. Commented Mar 15, 2012 at 0:35
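For comparison, the if/else form discussed in these comments might look like the sketch below. `finish_dom_id` is a hypothetical name for just the last few lines of the answer's method; it assumes it receives an already-sanitized string.

```ruby
# Hypothetical helper: only the tail of the answer's method, written as an
# if/else expression instead of an early return. Whichever branch runs
# supplies the method's (implicit) return value.
def finish_dom_id(sanitized_id)
  valid_id = sanitized_id.sub(/\A[^a-zA-Z]+/, "")
  if valid_id.empty?
    "n_" + sanitized_id
  else
    valid_id
  end
end

p finish_dom_id("--dash") # prints "dash"
p finish_dom_id("9999")   # prints "n_9999"
```

This works because `if`/`else` in Ruby is an expression, which is exactly the implicit return that the last comment finds hard to read.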

I don't believe that your 'Z_' prefix replacement works: [/[a-zA-Z]/] is an array containing one Regexp, and that array is never nil. The condition never even looks at prefix, so prefix is never replaced.
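A quick way to see this, plus one possible repair of the check on its own terms. `question_condition` and `fixed_prefix` are hypothetical helpers written for this illustration; `fixed_prefix` assumes it receives the single character removed by `slice!(0)`.

```ruby
# The question's condition: an array literal holding one Regexp.
# An array literal is never nil, so this is always false.
def question_condition
  [/[a-zA-Z]/].nil?
end

# Hypothetical fix: test the removed first character itself.
def fixed_prefix(first_char)
  first_char.match?(/\A[a-zA-Z]\z/) ? first_char : "Z_"
end

p question_condition # prints false
p fixed_prefix("1")  # prints "Z_"
p fixed_prefix("h")  # prints "h"
```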

Your implementation using #slice! is tricky to follow. I suggest writing it this way:

def sanitize_dom_id(name)
  # http://www.w3.org/TR/html4/types.html#type-name
  #
  # ID and NAME tokens must begin with a letter ([A-Za-z]) and
  # may be followed by any number of letters, digits ([0-9]),
  # hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
  # Replace a leading non-letter with "Z_", then any other invalid character with "_".
  name.sub(/\A[^A-Z]/i, 'Z_').gsub(/[^A-Z0-9.:_-]/i, '_')
end

Note that inside a character class, . is taken literally and does not need to be preceded by a backslash.
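A small check of that claim (the two patterns below are made up for the demo):

```ruby
# Inside a character class, "." is already a literal period,
# so escaping it makes no difference.
escaped   = /\A[a-z\.]+\z/
unescaped = /\A[a-z.]+\z/

p unescaped.match?("a.b") # prints true
p unescaped.match?("a!b") # prints false: "." here does not mean "any character"
p escaped.match?("a!b")   # prints false
```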

You haven't explained why you need this function, which makes it hard to assess whether it is suitably designed. I'd like to point out that multiple input strings can map to the same sanitized output; for example, "html id" and "html_id" sanitize to the same ID. That might be an undesirable property, in which case you should think of this more as an escaping function than a sanitizing function.
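If collisions matter, one option is a reversible encoding rather than a lossy replacement. The sketch below is a hypothetical scheme, not a standard: it always prepends "x" so the ID starts with a letter, keeps letters, digits, "-", ":" and ".", and writes every other character, including "_" itself, as "_" + hex code point + "_". Because "_" only ever appears as an escape delimiter, the output can be decoded, so distinct inputs give distinct IDs.

```ruby
# Hypothetical reversible ("escaping") alternative to sanitize_dom_id.
# Every output starts with "x", so it always begins with a letter.
def escape_dom_id(name)
  body = name.each_char.map do |ch|
    if ch.match?(/[A-Za-z0-9.:-]/)
      ch # allowed by HTML 4.01 and never used as an escape delimiter
    else
      # "_" is escaped too, so every "_" in the output opens an escape
      "_#{ch.ord.to_s(16)}_"
    end
  end.join
  "x" + body
end

# Inverse of escape_dom_id: drop the "x" and decode each "_hex_" group.
def unescape_dom_id(id)
  id.delete_prefix("x").gsub(/_([0-9a-f]+)_/) { $1.to_i(16).chr(Encoding::UTF_8) }
end

p escape_dom_id("html id")       # prints "xhtml_20_id"
p escape_dom_id("html_id")       # prints "xhtml_5f_id"
p unescape_dom_id("xhtml_20_id") # prints "html id"
```

The trade-off is longer, less readable IDs; whether that is worth it depends on what the IDs are used for.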

answered Dec 3, 2015 at 1:59