I've written a function for converting strings to upper case. It currently works by replacing each character using a Regex pattern with a hash:
# Special upcase function that handles several Norwegian symbols.
def norwegian_upcase(string)
  string.upcase.gsub(/[æøåäâãàáëêèéïîìíöôõòóüûùúý]/, {
    'æ' => 'Æ',
    'ø' => 'Ø',
    'å' => 'Å',
    'ä' => 'Ä',
    'â' => 'Â',
    'ã' => 'Ã',
    'à' => 'À',
    'á' => 'Á',
    'ë' => 'Ë',
    'ê' => 'Ê',
    'è' => 'È',
    'é' => 'É',
    'ï' => 'Ï',
    'î' => 'Î',
    'ì' => 'Ì',
    'í' => 'Í',
    'ö' => 'Ö',
    'ô' => 'Ô',
    'õ' => 'Õ',
    'ò' => 'Ò',
    'ó' => 'Ó',
    'ü' => 'Ü',
    'û' => 'Û',
    'ù' => 'Ù',
    'ú' => 'Ú',
    'ý' => 'Ý'
  })
end
I have the feeling that it's horribly inefficient to be using Regex like this, and that I should probably be using another method. Can anyone suggest a better way to replace single characters, or is Regex fit for the task?
I'd suggest looking over stackoverflow.com/questions/1020568/… which covers .downcase, .upcase and .capitalize, as well as the unicode_utils gem for i18n. – JustinC, Jun 5, 2013 at 18:25
2 Answers
One-to-one substitution is precisely what tr is for.
def norwegian_upcase(string)
  # Two ranges (à-ö and ø-ý) cover å, ä, à, etc. and skip ÷, which has no uppercase form.
  string.upcase.tr('à-öø-ý', 'À-ÖØ-Ý')
end
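If character ranges feel too implicit, tr also accepts plain character lists. The following is a minimal sketch, not a definitive implementation: the constant names and the method name norwegian_upcase_explicit are made up for illustration, and the two lists are simply the keys and values of the hash in the question, in the same order.

```ruby
# encoding: utf-8

# Hypothetical constants: the keys and values of the question's hash,
# written out as two equally long strings in matching order.
NORWEGIAN_LOWER = 'æøåäâãàáëêèéïîìíöôõòóüûùúý'
NORWEGIAN_UPPER = 'ÆØÅÄÂÃÀÁËÊÈÉÏÎÌÍÖÔÕÒÓÜÛÙÚÝ'

# Hypothetical name, to avoid clashing with norwegian_upcase.
def norwegian_upcase_explicit(string)
  # tr maps the n-th character of the first list to the n-th of the second;
  # anything not listed (digits, punctuation, ÷) is left alone.
  string.upcase.tr(NORWEGIAN_LOWER, NORWEGIAN_UPPER)
end

puts norwegian_upcase_explicit('blåbærsyltetøy')
```

Listing the characters explicitly is longer than a range, but it only touches the characters you name, so symbols that happen to sit between them in the code table are never translated by accident.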
That being said, the unicode_utils gem provides a method for this:
require 'unicode_utils'

def norwegian_upcase(string)
  UnicodeUtils.upcase(string, :no)
end
I don't know Norwegian, but you can supply the language subtags :nb (Norwegian Bokmål) or :nn (Norwegian Nynorsk) if they would behave differently from general :no (Norwegian).
I tried UnicodeUtils without success; it simply doesn't upcase æøå to ÆØÅ no matter what language I pass to it. I tried :en, :no, :nb and :nn. I haven't tried tr, but I'll check that out. – Hubro, Jun 6, 2013 at 13:58
After a cursory glance at the character codes for these, it looks like each lowercase code point is 32 (decimal) higher than its uppercase counterpart, e.g. 'é'.ord - 32 == 'É'.ord
You could try something like this:
string.upcase.gsub(/[æøåäâãàáëêèéïîìíöôõòóüûùúý]/) { |s| (s.ord - 32).chr(Encoding::UTF_8) }
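The offset of 32 can be checked for every character in the question's list rather than just 'é'. This is a rough sketch under that assumption: the constant names and the method name offset_upcase are hypothetical, and the two sample strings are the keys and values of the question's hash.

```ruby
# encoding: utf-8

# Hypothetical constants: the lowercase/uppercase pairs from the question.
LOWERCASE_SAMPLE = 'æøåäâãàáëêèéïîìíöôõòóüûùúý'
UPPERCASE_SAMPLE = 'ÆØÅÄÂÃÀÁËÊÈÉÏÎÌÍÖÔÕÒÓÜÛÙÚÝ'

# Hypothetical name: upcases the listed characters by subtracting 32
# from their code point, as suggested above.
def offset_upcase(string)
  string.upcase.gsub(/[#{LOWERCASE_SAMPLE}]/) do |char|
    (char.ord - 32).chr(Encoding::UTF_8)
  end
end

# Collect the distinct code point differences between each pair.
offsets = LOWERCASE_SAMPLE.chars.zip(UPPERCASE_SAMPLE.chars)
                          .map { |lower, upper| lower.ord - upper.ord }
                          .uniq

puts offsets.inspect # prints [32]
puts offset_upcase('smørbrød')
```

The offset holds for the characters listed here, but it is not a general rule for the whole block: 'ÿ' (U+00FF) has its uppercase form 'Ÿ' at U+0178, and subtracting 32 from '÷' would yield '×'. Keeping the explicit character class in the pattern is what keeps the trick safe.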