Problem: Write a method that takes in a string. Your method should return the most common letter in the array, and a count of how many times it appears.
Note: I solved it with Ruby, but the solution will be almost the same with other OOP language which have arrays and hashes.
Solution:
Split the string into an array of its characters/letters.
Put each char in a hash as key and starting value = 1.
When you find a character again, increase its value by 1.
Iterate your hash to find the most common character.
My solution works, but, is there a simpler & better way to do it?
What are the disadvantages of my solution?
def most_common_letter(string)
hash = {}
arr = string.split("")
i = 0
ge = "" # the char which occurs the most times in string
max = 0 # number of times ge occurs in string
while i < arr.length
e = arr[i]
if hash.has_key? e
hash[e] += 1
else
hash[e] = 1
end
i = i + 1
end
# Find most common character
hash.each do |k,v|
if(v > max)
max = v
ge = k
end
end
return ge.to_s + ":" + max.to_s
end
#Test
puts most_common_letter("abca")
puts most_common_letter("g")
-
\$\begingroup\$ Welcome to Code Review! Please only use language tags that apply to the language(s) used in your code. We have removed the Java and JavaScript tags. I hope you get some good reviews! \$\endgroup\$Phrancis– Phrancis2015年07月05日 22:51:56 +00:00Commented Jul 5, 2015 at 22:51
-
\$\begingroup\$ @SirPython - Actually, its language agnostic. For me, the algorithm is more important than the language used. \$\endgroup\$stack1– stack12015年07月05日 22:54:01 +00:00Commented Jul 5, 2015 at 22:54
-
\$\begingroup\$ you have chosen to use Ruby, so tag it with Ruby. If you create another question with the same algorithm but use another language, please use the tag for that language. It's just the way we do things around here. Thank you \$\endgroup\$Malachi– Malachi2015年07月05日 22:59:14 +00:00Commented Jul 5, 2015 at 22:59
-
\$\begingroup\$ Thanks. Is there a way to do the same thing with less steps and/or less memory ? Is using hashes an overkill ? \$\endgroup\$stack1– stack12015年07月06日 02:19:46 +00:00Commented Jul 6, 2015 at 2:19
-
\$\begingroup\$ Hashes are exactly what you need. They associate keys with values, and here you need to associate characters with their frequencies. It's the right tool. With anything else (like an array), you'd have to jump through hoops and use it in a hacky way, as it wouldn't have been meant for this use case. \$\endgroup\$FrontierPsycho– FrontierPsycho2015年07月06日 09:44:59 +00:00Commented Jul 6, 2015 at 9:44
5 Answers 5
while i < arr.length
In your loop, we are only accessing the current element and not doing any index arithmetic. Use arr.each{}
.
if hash.has_key? e hash[e] += 1 else hash[e] = 1 end
When we encounter a key, we just want to increment by 1. If a key doesn't exist, use the default value and increment by 1. You can initialize your hash to have a default value of 0 by hash = Hash.new(0)
.
# Find most common character hash.each do |k,v| if(v > max) max = v ge = k end end
This can simply be written using hash.max_by{}
and it does the work for you of returning a key-value pair.
Putting it all together, you get
def most_common_letter(string)
hash = Hash.new(0)
string.split('').each{ |ch| hash[ch] += 1 }
hash.max_by{ |ch,count| count }
end
And if you want to clean it up so its a single chained statement.
def most_common_letter(string)
string.split('')
.each_with_object(Hash.new(0)) { |ch,hash| hash[ch] += 1 }
.max_by{ |ch,count| count }
end
What are the disadvantages of my solution ?
The biggest issue is that it only returns 1 letter/count pair. What happens if you call most_common_letter("aabb")
. Your function returns the pair that is first encountered that sets max
{'a', 2}
rather than {{'a', 2}, {'b', 2}}
. If you want to return all of the max pairs, build your count hash, find the max value, then return all elements from your hash that has that max value using Hash#select{}
.
-
\$\begingroup\$ Nice catch about multiple possible pairs. \$\endgroup\$FrontierPsycho– FrontierPsycho2015年07月06日 09:42:50 +00:00Commented Jul 6, 2015 at 9:42
First off, after doing a little reading, the style for ruby seems say that you should indent your code by two spaces, not four, like Python.
Secondly the line i = i + 1
can be shortened to this: i += 1
.
Rather than using a while
loop, and a variable, i
, as an iterator variable as shown in the below code:
while i < arr.length e = arr[i] if hash.has_key? e hash[e] += 1 else hash[e] = 1 end i = i + 1 end
You could use a for ... in
loop instead. By using a for ... in
loop, you don't need to declare an iterator variable, i
, and it looks cleaner as well.
for i in 0..arr.length
e = arr[i]
if hash.has_key? e
hash[e] += 1
else
hash[e] = 1
end
end
Finally, you could use some better names for your variables. For example, naming a variable name hash
is completely useless because I already know that the variable is a hash. It should have a better name. Same goes for arr
, ge
, and others.
Personally (maybe because of my C++ background) I wouldn't use a hash here. I'd just have an array with 26 integers in it, one per letter, initialized with all 0
s. Arrays are simpler than hashes, and in this case they act equivalently, I'm fairly sure that ought to make it faster if you had very large inputs (though realistically you'd likely be I/O limited anyway in that case).
Then, this:
while i < arr.length
e = arr[i]
if hash.has_key? e
hash[e] += 1
else
hash[e] = 1
end
i = i + 1
end
Would become:
while i < arr.length
hash_that_is_now_an_array[arr[i].ord - 97] += 1 #ord get's the ascii value, google says the ascii of 'a' is 97
i += 1
Or with for each as was suggested by others
arr.each do |c|
hash_that_is_now_an_array[c.ord - 97] += 1
That also gets rid of an if statement. Which also helps algorithm-wise.
NOTE: As Snowhawk04 pointed out, what I wrote assumes your input is all lowercase letters. If not, replace c.ord - 97
with c.ord
and make the array 128 long to handle all of ASCII. Or, say, just add spaces as a special case. Or, if you really want, make it the size of the maximum number of characters in unicode. Whatever applies to what you want to do.
-
1\$\begingroup\$
c.ord - 97
is making an assumption on the input that is not necessarily true. \$\endgroup\$Snowhawk– Snowhawk2015年07月06日 02:50:09 +00:00Commented Jul 6, 2015 at 2:50 -
\$\begingroup\$ True, not sure why I assumed lowercase letters. The algorithmic idea holds though. \$\endgroup\$WanderingMathematician– WanderingMathematician2015年07月06日 05:06:03 +00:00Commented Jul 6, 2015 at 5:06
-
\$\begingroup\$ In general, I think
.ord - 97
is a very bad idea, coming from the C* world. Also, arrays are not "simpler than hashes", they are a different tool for a different purpose, and they are the right tool for this job. \$\endgroup\$FrontierPsycho– FrontierPsycho2015年07月06日 09:53:33 +00:00Commented Jul 6, 2015 at 9:53 -
1\$\begingroup\$ What would your algorithm return for
most_common_letter('µµµ')
? \$\endgroup\$Jörg W Mittag– Jörg W Mittag2015年07月06日 10:33:04 +00:00Commented Jul 6, 2015 at 10:33 -
\$\begingroup\$ @FrontierPsycho Arrays represent memory more closely, though maybe that doesn't actually matter in ruby. OP said language-agnostic anyway. In an array, getting the nth element translates pretty directly into getting something from memory, whereas in a hash you have to call the hashing function and maybe resolve collisions and all sorts of fun stuff. OP asked in one of the comments whether there was a way to do it with less steps/memory - arrays are the way. If there's only 26, or even 128 known possible values in your hash that map to integers directly, hashes are unnecessary overhead. \$\endgroup\$WanderingMathematician– WanderingMathematician2015年07月06日 15:15:13 +00:00Commented Jul 6, 2015 at 15:15
I think there's a number of things the ruby standard library can do for you. This will make the code more concise, and I also believe that as a rule, it tends to be faster (as some things are written in C under the hood - I might be mistaken).
First, instead of a while, you could use #each_char, like this:
string.each_char do |char|
# ...
end
Also, I think it's unnecessary to find the max yourself. You can just use #sort_by (a method of Enumerable
, which is included in Hash
), and sort like this:
hash.sort_by { |char, freq| freq }
This will create an array of arrays like this:
[["c", 2], ["b", 4], ["z", 5]]
and the last element contains the character you want (as sort is, by default, ascending, I believe). You can just retrieve it like this:
(hash.sort_by { |char, freq| freq })[-1]
although you might want to check what happens in cases the input string is empty and so on.
Also, I think it would be better if the method just returned the most frequent character and its frequency as a tuple, rather than as a string, so that you can then use the frequency as a number, without extracting it. The method's purpose is to discover and report the most frequent character, so it should only do that.
Finally, I think variable names could be more descriptive. For example, instead of hash
, you could have frequencies_hash
or just frequencies
. If you follow my suggestions, I don't think you need any more variables.
Let me know if you'd like more complete code. I wrote it like this because I think it's good for people to refactor their own code based on suggestions, rather than copy paste stuff.
Your code does not make use of Ruby capabilities, it is like C++ or Java. I propose this slower but much higher level implementation:
def most_common_letter(string, alpha_string="a".."z")
most_common = string
.split('')
.select{|char| alpha_string.include?(char)}
.max_by{|letter| string.count(letter)}
most_common, string.count(most_common)
end