I've written a function for converting strings to upper case. It currently works by passing a Regex character class and a replacement hash to gsub:

# Special upcase function that handles several Norwegian symbols.
def norwegian_upcase(string)
  string.upcase.gsub(/[æøåäâãàáëêèéïîìíöôõòóüûùúý]/, {
    'æ' => 'Æ',
    'ø' => 'Ø',
    'å' => 'Å',
    'ä' => 'Ä',
    'â' => 'Â',
    'ã' => 'Ã',
    'à' => 'À',
    'á' => 'Á',
    'ë' => 'Ë',
    'ê' => 'Ê',
    'è' => 'È',
    'é' => 'É',
    'ï' => 'Ï',
    'î' => 'Î',
    'ì' => 'Ì',
    'í' => 'Í',
    'ö' => 'Ö',
    'ô' => 'Ô',
    'õ' => 'Õ',
    'ò' => 'Ò',
    'ó' => 'Ó',
    'ü' => 'Ü',
    'û' => 'Û',
    'ù' => 'Ù',
    'ú' => 'Ú',
    'ý' => 'Ý'
  })
end

I have the feeling that it's horribly inefficient to be using Regex like this, and that I should probably be using another method. Can anyone suggest a better way to replace single characters, or is Regex fit for the task?
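Whether the regex is actually slow is easy to measure rather than guess. A minimal sketch using Ruby's standard Benchmark module, timing the question's hash-based gsub against String#tr over the same 26 characters. The helper names, sample text and iteration count are illustrative assumptions, Array#to_h assumes Ruby 2.1 or later, and the upcase call is left out so only the substitution step is timed:

```ruby
require 'benchmark'

# Characters from the question, lowercase and uppercase in matching order.
LOWER_CHARS = 'æøåäâãàáëêèéïîìíöôõòóüûùúý'
UPPER_CHARS = 'ÆØÅÄÂÃÀÁËÊÈÉÏÎÌÍÖÔÕÒÓÜÛÙÚÝ'

# Same shape as the question's hash and character class, built from the strings above.
UPCASE_MAP = LOWER_CHARS.chars.zip(UPPER_CHARS.chars).to_h
UPCASE_PATTERN = /[#{LOWER_CHARS}]/

# Hypothetical helper: regex match, then hash lookup for each matched character.
def upcase_with_gsub(text)
  text.gsub(UPCASE_PATTERN, UPCASE_MAP)
end

# Hypothetical helper: positional one-to-one character translation.
def upcase_with_tr(text)
  text.tr(LOWER_CHARS, UPPER_CHARS)
end

# Illustrative input only; real timings depend on your data and Ruby version.
SAMPLE = 'blåbærsyltetøy på brødskiva, café crème' * 20
ITERATIONS = 2_000

Benchmark.bm(6) do |bm|
  bm.report('gsub') { ITERATIONS.times { upcase_with_gsub(SAMPLE) } }
  bm.report('tr')   { ITERATIONS.times { upcase_with_tr(SAMPLE) } }
end
```

Spelling the characters out explicitly, rather than as a range, keeps the tr mapping easy to audit against the hash. Also note that on Ruby 2.4 and later, String#upcase performs full Unicode case mapping by default, so the extra substitution step may not be needed at all.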

Johntron
asked Jun 5, 2013 at 17:51

  • I'd suggest looking over stackoverflow.com/questions/1020568/…, which covers .downcase, .upcase, .capitalize and the unicode_utils gem for i18n. (commented Jun 5, 2013 at 18:25)

2 Answers

One-to-one substitution is precisely what tr is for.

def norwegian_upcase(string)
  # Two ranges: à-ö and ø-ý skip U+00F7 (÷), which would otherwise become ×
  string.upcase.tr('à-öø-ý', 'À-ÖØ-Ý')
end
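Because tr ranges are code point ranges, it is worth checking what a range actually covers. A minimal sketch (expand_range and latin1_upcase are hypothetical helper names) showing that 'æ-ý' begins at U+00E6, so it misses à through å (U+00E0 to U+00E5) from the question's list, and that it spans ÷ (U+00F7), which it turns into × (U+00D7). Splitting the range around U+00F7 avoids both problems:

```ruby
# Hypothetical helper: list every character a tr-style range covers, by code point.
def expand_range(first, last)
  (first.ord..last.ord).map { |cp| cp.chr(Encoding::UTF_8) }.join
end

# 'æ' is U+00E6, so this range lacks à..å and includes ÷ (U+00F7).
puts expand_range('æ', 'ý')
# The division sign falls inside the range and is translated to ×.
puts '÷'.tr('æ-ý', 'Æ-Ý')

# Split ranges that cover U+00E0..U+00FD but leave U+00F7 alone.
LOWER_RANGES = 'à-öø-ý'
UPPER_RANGES = 'À-ÖØ-Ý'

# Hypothetical helper: upcase the Latin-1 letters à..ý via tr.
def latin1_upcase(text)
  text.tr(LOWER_RANGES, UPPER_RANGES)
end

puts latin1_upcase('æøåäâãàáëêèéïîìíöôõòóüûùúý')
```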

That being said, the unicode_utils gem provides a method for this:

require 'unicode_utils'

def norwegian_upcase(string)
  UnicodeUtils.upcase(string, :no)
end

I don't know Norwegian, but you can supply the language subtags :nb (Norwegian Bokmål) or :nn (Norwegian Nynorsk) if their casing rules differ from the general :no (Norwegian).

answered Jun 6, 2013 at 12:05

  • I tried UnicodeUtils without success; it simply doesn't upcase æøå to ÆØÅ no matter what language I pass to it. I tried :en, :no, :nb and :nn. I haven't tried tr, but I'll check that out. (commented Jun 6, 2013 at 13:58)

After a cursory glance at the character codes, it looks like each lowercase letter's code point is 32 (decimal) higher than its uppercase counterpart, e.g. 'é'.ord - 32 == 'É'.ord.

You could try something like this:

string.upcase.gsub(/[æøåäâãàáëêèéïîìíöôõòóüûùúý]/) { |s| (s.ord - 32).chr(Encoding::UTF_8) }

answered Jun 5, 2013 at 19:13
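The offset claim can be checked mechanically. A small sketch (shift_upcase is a hypothetical helper wrapping the block above) that prints each character from the question's list with its code point and the result of subtracting 32:

```ruby
QUESTION_CHARS = 'æøåäâãàáëêèéïîìíöôõòóüûùúý'

# Hypothetical helper: upcase one Latin-1 lowercase letter by subtracting 32
# from its code point, as the gsub block above does.
def shift_upcase(char)
  (char.ord - 32).chr(Encoding::UTF_8)
end

QUESTION_CHARS.each_char do |c|
  upper = shift_upcase(c)
  puts format('%s U+%04X -> %s U+%04X', c, c.ord, upper, upper.ord)
end
```

The rule only holds for U+00E0 to U+00FE, excluding ÷ (U+00F7), which would become × (U+00D7). Outside that span it breaks: ÿ (U+00FF) uppercases to Ÿ (U+0178), not to U+00DF. So the character class must stay restricted to letters inside that span.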