Most common letter in string

Question 1

Problem: Write a method that takes in a string. Your method should return the most common letter in the array, and a count of how many times it appears.

Note: I solved it with Ruby, but the solution will be almost the same with other OOP language which have arrays and hashes.

Solution:

Split the string into an array of its characters/letters.
Put each char in a hash as key and starting value = 1.
When you find a character again, increase its value by 1.
Iterate your hash to find the most common character.

My solution works, but, is there a simpler & better way to do it?

What are the disadvantages of my solution?

def most_common_letter(string)
 hash = {}
 arr = string.split("")
 i = 0
 ge = "" # the char which occurs the most times in string
 max = 0 # number of times ge occurs in string
 while i < arr.length
 e = arr[i]
 if hash.has_key? e
 hash[e] += 1
 else
 hash[e] = 1
 end
 i = i + 1
 end
 # Find most common character
 hash.each do |k,v|
 if(v > max)
 max = v
 ge = k
 end
 end
 return ge.to_s + ":" + max.to_s
end
#Test
puts most_common_letter("abca")
puts most_common_letter("g")

Question 2

Welcome to Code Review! Please only use language tags that apply to the language(s) used in your code. We have removed the Java and JavaScript tags. I hope you get some good reviews!

Question 3

@SirPython - Actually, its language agnostic. For me, the algorithm is more important than the language used.

Question 4

you have chosen to use Ruby, so tag it with Ruby. If you create another question with the same algorithm but use another language, please use the tag for that language. It's just the way we do things around here. Thank you

Question 5

Thanks. Is there a way to do the same thing with less steps and/or less memory ? Is using hashes an overkill ?

Question 6

Hashes are exactly what you need. They associate keys with values, and here you need to associate characters with their frequencies. It's the right tool. With anything else (like an array), you'd have to jump through hoops and use it in a hacky way, as it wouldn't have been meant for this use case.

Question 7

 while i < arr.length

In your loop, we are only accessing the current element and not doing any index arithmetic. Use arr.each{}.

 if hash.has_key? e
 hash[e] += 1
 else
 hash[e] = 1
 end

When we encounter a key, we just want to increment by 1. If a key doesn't exist, use the default value and increment by 1. You can initialize your hash to have a default value of 0 by hash = Hash.new(0).

# Find most common character
hash.each do |k,v|
 if(v > max)
 max = v
 ge = k
 end
end

This can simply be written using hash.max_by{} and it does the work for you of returning a key-value pair.

Putting it all together, you get

def most_common_letter(string)
 hash = Hash.new(0)
 string.split('').each{ |ch| hash[ch] += 1 }
 hash.max_by{ |ch,count| count }
end

And if you want to clean it up so its a single chained statement.

def most_common_letter(string)
 string.split('')
 .each_with_object(Hash.new(0)) { |ch,hash| hash[ch] += 1 }
 .max_by{ |ch,count| count }
end

What are the disadvantages of my solution ?

The biggest issue is that it only returns 1 letter/count pair. What happens if you call most_common_letter("aabb"). Your function returns the pair that is first encountered that sets max {'a', 2} rather than {{'a', 2}, {'b', 2}}. If you want to return all of the max pairs, build your count hash, find the max value, then return all elements from your hash that has that max value using Hash#select{}.

Question 8

Nice catch about multiple possible pairs.

Question 9

First off, after doing a little reading, the style for ruby seems say that you should indent your code by two spaces, not four, like Python.

Secondly the line i = i + 1 can be shortened to this: i += 1.

Rather than using a while loop, and a variable, i, as an iterator variable as shown in the below code:

while i < arr.length
 e = arr[i]
 if hash.has_key? e
 hash[e] += 1
 else
 hash[e] = 1
 end
 i = i + 1
end

You could use a for ... in loop instead. By using a for ... in loop, you don't need to declare an iterator variable, i, and it looks cleaner as well.

for i in 0..arr.length
 e = arr[i]
 if hash.has_key? e
 hash[e] += 1
 else
 hash[e] = 1
 end
end

Finally, you could use some better names for your variables. For example, naming a variable name hash is completely useless because I already know that the variable is a hash. It should have a better name. Same goes for arr, ge, and others.

Question 10

Personally (maybe because of my C++ background) I wouldn't use a hash here. I'd just have an array with 26 integers in it, one per letter, initialized with all 0s. Arrays are simpler than hashes, and in this case they act equivalently, I'm fairly sure that ought to make it faster if you had very large inputs (though realistically you'd likely be I/O limited anyway in that case).

Then, this:

while i < arr.length
 e = arr[i]
 if hash.has_key? e
 hash[e] += 1
 else
 hash[e] = 1
 end
 i = i + 1
 end

Would become:

while i < arr.length
 hash_that_is_now_an_array[arr[i].ord - 97] += 1 #ord get's the ascii value, google says the ascii of 'a' is 97
 i += 1

Or with for each as was suggested by others

arr.each do |c|
 hash_that_is_now_an_array[c.ord - 97] += 1

That also gets rid of an if statement. Which also helps algorithm-wise.

NOTE: As Snowhawk04 pointed out, what I wrote assumes your input is all lowercase letters. If not, replace c.ord - 97 with c.ord and make the array 128 long to handle all of ASCII. Or, say, just add spaces as a special case. Or, if you really want, make it the size of the maximum number of characters in unicode. Whatever applies to what you want to do.

Question 11

c.ord - 97 is making an assumption on the input that is not necessarily true.

Question 12

True, not sure why I assumed lowercase letters. The algorithmic idea holds though.

Question 13

In general, I think .ord - 97 is a very bad idea, coming from the C* world. Also, arrays are not "simpler than hashes", they are a different tool for a different purpose, and they are the right tool for this job.

Question 14

What would your algorithm return for most_common_letter('µµµ')?

Question 15

@FrontierPsycho Arrays represent memory more closely, though maybe that doesn't actually matter in ruby. OP said language-agnostic anyway. In an array, getting the nth element translates pretty directly into getting something from memory, whereas in a hash you have to call the hashing function and maybe resolve collisions and all sorts of fun stuff. OP asked in one of the comments whether there was a way to do it with less steps/memory - arrays are the way. If there's only 26, or even 128 known possible values in your hash that map to integers directly, hashes are unnecessary overhead.

Question 16

I think there's a number of things the ruby standard library can do for you. This will make the code more concise, and I also believe that as a rule, it tends to be faster (as some things are written in C under the hood - I might be mistaken).

First, instead of a while, you could use #each_char, like this:

string.each_char do |char|
 # ...
end

Also, I think it's unnecessary to find the max yourself. You can just use #sort_by (a method of Enumerable, which is included in Hash), and sort like this:

hash.sort_by { |char, freq| freq }

This will create an array of arrays like this:

[["c", 2], ["b", 4], ["z", 5]]

and the last element contains the character you want (as sort is, by default, ascending, I believe). You can just retrieve it like this:

(hash.sort_by { |char, freq| freq })[-1]

although you might want to check what happens in cases the input string is empty and so on.

Also, I think it would be better if the method just returned the most frequent character and its frequency as a tuple, rather than as a string, so that you can then use the frequency as a number, without extracting it. The method's purpose is to discover and report the most frequent character, so it should only do that.

Finally, I think variable names could be more descriptive. For example, instead of hash, you could have frequencies_hash or just frequencies. If you follow my suggestions, I don't think you need any more variables.

Let me know if you'd like more complete code. I wrote it like this because I think it's good for people to refactor their own code based on suggestions, rather than copy paste stuff.

Question 17

Your code does not make use of Ruby capabilities, it is like C++ or Java. I propose this slower but much higher level implementation:

def most_common_letter(string, alpha_string="a".."z")
 most_common = string
 .split('')
 .select{|char| alpha_string.include?(char)}
 .max_by{|letter| string.count(letter)}
 most_common, string.count(most_common)
end

Snowhawk Snowhawk 6,7701 gold badge19 silver badges37 bronze badges · Accepted Answer · 2015-07-05 23:48:37Z

 while i < arr.length

In your loop, we are only accessing the current element and not doing any index arithmetic. Use arr.each{}.

 if hash.has_key? e
 hash[e] += 1
 else
 hash[e] = 1
 end

When we encounter a key, we just want to increment by 1. If a key doesn't exist, use the default value and increment by 1. You can initialize your hash to have a default value of 0 by hash = Hash.new(0).

# Find most common character
hash.each do |k,v|
 if(v > max)
 max = v
 ge = k
 end
end

This can simply be written using hash.max_by{} and it does the work for you of returning a key-value pair.

Putting it all together, you get

def most_common_letter(string)
 hash = Hash.new(0)
 string.split('').each{ |ch| hash[ch] += 1 }
 hash.max_by{ |ch,count| count }
end

And if you want to clean it up so its a single chained statement.

def most_common_letter(string)
 string.split('')
 .each_with_object(Hash.new(0)) { |ch,hash| hash[ch] += 1 }
 .max_by{ |ch,count| count }
end

What are the disadvantages of my solution ?

The biggest issue is that it only returns 1 letter/count pair. What happens if you call most_common_letter("aabb"). Your function returns the pair that is first encountered that sets max {'a', 2} rather than {{'a', 2}, {'b', 2}}. If you want to return all of the max pairs, build your count hash, find the max value, then return all elements from your hash that has that max value using Hash#select{}.

\$\begingroup\$ Nice catch about multiple possible pairs. \$\endgroup\$

FrontierPsycho
– FrontierPsycho

2015年07月06日 09:42:50 +00:00
Commented Jul 6, 2015 at 9:42

Stack Exchange Network

Most common letter in string

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Most common letter in string

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions