After reading a certain question on the Stack Overflow website, I tried to write a solution to the problem just for fun. I'm, however, left with the nagging feeling there is a beautiful one-liner that can be used instead.
The question's premises:
Create a function that will receive a string. The function will count each vowel in the string, and return a count of all the vowels, whether found or not, each in a tuple pair. Each tuple parir will be stored in a list.
Example:
>>> vowels_finder("This has some vowels")
[('a', 1), ('o', 2), ('u', 0), ('e', 2), ('i', 1)] # tuple pair of each vowel.
>>>
My attempt:
def vowels_finder(s):
vowels = {'a':0, 'e':0, 'i':0, 'o':0, 'u':0}
for el in s:
if el in {'a', 'e', 'i', 'o', 'u'}:
vowels[el]+=1
vowels = [(key, pair) for key, pair in vowels.items()]
return vowels
My code above is not commented, but I'm confident that its brevity will allow me to pass on this.
Questions:
- Is there _any way, besides using a python library, that this can be condensed into a one-liner?
- Would there be a way to not have to convert the
vowels
key's back into tuple pairs, and just have them be tuples in the beginning. eg:vowels = [('a', 0), ('e', 0), ('i', 0), ('o', 0), ('u', 0)]
?
language: Python 3.4
4 Answers 4
With Python it can be tempting to want to write one-liners; but, short code does not necessarily make for better code, and I'd like to review your code in a way that I think would make it more maintainable, flexible and professional, rather than shorter. (maybe someone else will address the one-liner request)
Type hints
Since you are using Python 3.x you could take advantage of the new type hints. According to PEP 484:
This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and (perhaps, in some contexts) code generation utilizing type information.
Of these goals, static analysis is the most important. This includes support for off-line type checkers such as mypy, as well as providing a standard notation that can be used by IDEs for code completion and refactoring.
Even if you don't use static code analysis at the moment, type hints still have the advantage of making the code easier to read and understand.
In your case:
def vowels_finder(s: str) -> list:
# ...
Reusable functions
The first thought I had looking at your function is that some logic could be extracted for more general reuse. For instance, this function could come in handy for other things:
def is_vowel(ch: chr, include_y: bool=False) -> bool:
if include_y:
return ch in ('a', 'e', 'i', 'o', 'u', 'y')
else:
return ch in ('a', 'e', 'i', 'o', 'u')
You will notice I also added support for optionally including "Y" as a vowel, which can be useful for certain contexts.
Note that I also used a tuple instead of set, since vowels won't change anyways and tuples are generally faster since they are immutable, and we don't need set operations in this case other than in
membership, which tuples support as well.
Now we can simply do this in your vowels_finder
function:
for el in s:
if is_vowel(el):
vowels[el]+=1
Main function improvements
I would expect a function named vowels_finder
to do just that: look for a vowel, and return True
if it finds one. Furthermore, I would expect a name like this to be an object/class "Thing", rather than a function which is usually named like "do something".
Let's call it count_individual_vowels
instead. Also, now that we have a function for vowels with added functionality for "Y", we can very easily add this option to this function. Note that I have changed some of the variable names a bit to make them more clear:
def count_individual_vowels(input_str: str, include_y: bool = False) -> list:
vowel_counts = {'a':0, 'e':0, 'i':0, 'o':0, 'u':0}
if include_y:
vowel_counts['y'] = 0
for el in input_str:
if is_vowel(el, include_y):
vowel_counts[el] += 1
return [(key, pair) for key, pair in vowel_counts.items()]
Bug / overlooked problem
After refactoring this I noticed a problem, see for illustration:
string1 = "My phrase has some vowels, pretty cool don't you think?"
print(count_individual_vowels(string1))
print(count_individual_vowels(string1, True))
string2 = string1.upper()
print('UPPER CASE')
print(count_individual_vowels(string2))
print(count_individual_vowels(string2, True))
Results:
[('u', 1), ('o', 6), ('i', 1), ('e', 4), ('a', 2)] [('e', 4), ('a', 2), ('u', 1), ('y', 3), ('o', 6), ('i', 1)] UPPER CASE [('u', 0), ('o', 0), ('i', 0), ('e', 0), ('a', 0)] [('e', 0), ('a', 0), ('u', 0), ('y', 0), ('o', 0), ('i', 0)]
This could of course cause problems, but thankfully the fix is extremely simple, just use .lower()
method on strings in the functions
Here in the new helper function...
return ch.lower() in ('a', 'e', 'i', 'o', 'u')
And also when adding it into the dict, so that uppercase vowels get grouped into the lowercase count:
for el in input_str:
if is_vowel(el, include_y):
# here:
vowel_counts[el.lower()] += 1
OrderedDict to keep vowels in order
As you've noticed, using a regular dict
to store the counts results in the output vowels returns the values in arbitrary order. You could make it return always in the same order by using from collections import OrderedDict
and just replacing this:
vowel_counts = {'a':0, 'e':0, 'i':0, 'o':0, 'u':0}
if include_y:
vowel_counts['y'] = 0
By this a bit more verbose, but much nicer output:
vowel_counts = OrderedDict()
for vow in 'a', 'e','i', 'o', 'u':
vowel_counts[vow] = 0
if include_y:
vowel_counts['y'] = 0
See:
[('a', 2), ('e', 4), ('i', 1), ('o', 6), ('u', 1)] [('a', 2), ('e', 4), ('i', 1), ('o', 6), ('u', 1), ('y', 3)]
Finally, here is a working demo on repl.it with all the above suggestions applied.
-
\$\begingroup\$ Na, don't worry about my request. It's really is just an after thought. I know very well the "I can slap every single thing on one line!" ideology. It really was more just to see if it could be done. Regardless, You have given an excellent answer. Thanks! \$\endgroup\$Chris– Chris2016年10月13日 05:51:47 +00:00Commented Oct 13, 2016 at 5:51
-
3\$\begingroup\$ You can do your contains checks using
in 'aeiou'
rather than building a tuple for the occasion. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2016年10月13日 07:42:47 +00:00Commented Oct 13, 2016 at 7:42 -
2\$\begingroup\$ This is a great answer, and I +1d, but I'd like to be picky about your
if include_y == False:
line. Direct comparisons to bools are almost never useful, and I don't think this one is an exception to that. \$\endgroup\$ymbirtt– ymbirtt2016年10月13日 08:14:47 +00:00Commented Oct 13, 2016 at 8:14 -
2\$\begingroup\$ @Graipher. Pure conjecture, haven't looked too deeply into the internals or actually benchmarked this, but a Python
set
is similar to adict
under the hood. In the worst case,in tuple()
will be 6chr
comparisons, whilstin set
will be a hash compute (over an internedchr
, granted, so actually just a cache lookup) followed by a hash lookup. I wouldn't be overly surprised if these lookups actually took longer than the 6 comparisons. That said, if we're optimising that hard, we're solving the wrong problem. \$\endgroup\$ymbirtt– ymbirtt2016年10月13日 09:54:23 +00:00Commented Oct 13, 2016 at 9:54 -
3\$\begingroup\$ If you're going to include optional vowels, you should probably add w, just in case someone wants to run this code in a Welsh locale. ;) \$\endgroup\$TRiG– TRiG2016年10月13日 11:53:53 +00:00Commented Oct 13, 2016 at 11:53
Whenever you want to count elements, you can use a collections.Counter
to do so rather than building the dictionnary yourself. What's left in the end is extracting the count of letters you're interested in out of said Counter
:
from collections import Counter
def vowels_finder(sentence, vowels='aeiou'):
count = Counter(sentence)
return [(letter, count[letter]) for letter in vowels]
-
\$\begingroup\$
Counter
is adict
, so it has.items()
. Thus, you can returnlist(count.items())
instead of building it manually. \$\endgroup\$Daerdemandt– Daerdemandt2016年10月13日 12:40:46 +00:00Commented Oct 13, 2016 at 12:40 -
\$\begingroup\$ @Daerdemandt Except that doing so you will get all the consonnants too, defeating the purpose of the function. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2016年10月13日 12:43:42 +00:00Commented Oct 13, 2016 at 12:43
-
\$\begingroup\$ Oh, bummer. I've confused it with other code that only counts vowels. \$\endgroup\$Daerdemandt– Daerdemandt2016年10月13日 12:46:51 +00:00Commented Oct 13, 2016 at 12:46
MathiasEttinger's answer already went into the right direction of suggesting to use a Counter
and making the vowels a function parameter with default value. Here are some more improvements:
vowels_finder
is not a particularly good name for the function. Functions are typically named after the result they return, or the side effect they invoke. The function also doesn't only "find" vowels, its main purpose is to count them. I propose the namevowel_count
.Instead of counting all letters and then filtering out the vowels afterwards, filter the vowels first and count only these. This makes the function one line shorter and about ×ばつ faster (depending on how common vowels are in the input text).
Add a docstring explaining the purpose of the function, and giving you the opportunity to include short examples which are automatically testable using doctest.
def vowel_count(sentence, vowels='aeiou'):
'''Return a Counter of all vowels contained in the sentence.
>>> vowels = vowel_count('This has some vowels, ALSO IN UPPERCASE.')
>>> vowels['a']
3
>>> vowels.most_common()
[('e', 4), ('a', 3), ('o', 3), ('i', 2), ('u', 1)]
'''
return Counter(c for c in sentence.lower() if c in vowels)
(I also incorporated the conversion to lower-case, as suggested in Phrancis' answer.)
Returning the Counter
object directly gives you the freedom of converting it to the "list of pairs" format, using the most_common
method, or keeping the dictionary interface.
-
\$\begingroup\$ Yeah the name I gave to my function sucked, but in my defense it was midnight! Seriously though, thanks for the great suggestions. About my naming though; I usually think I'll be to verbose if I choose a name that fully describes what my function returns/does. But in this case, the name you proposed does not seem overly wordy, so I'll use it. Cheers! \$\endgroup\$Chris– Chris2016年10月13日 15:36:21 +00:00Commented Oct 13, 2016 at 15:36
-
\$\begingroup\$ It might be interesting to use
in set(vowels)
and check if it increases speed with large amount of vowels in sentence. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2016年10月13日 17:32:48 +00:00Commented Oct 13, 2016 at 17:32
Apart from the awesome answer @Phrancis just gave, I'll just address the part of the question where it asks for a one-liner. You can have something like this, if that's really necessary:
def find_vowels_1(data):
return [(k, v) for k, v in {v: data.lower().count(v) for v in 'aeiou'}.items()]
print(find_vowels_1('dexter'))
Which will print:
[('o', 0), ('e', 2), ('i', 0), ('a', 0), ('u', 0)]
Bear with the fact that I don't recommend this solution as it won't bring anything good. I'd just follow everything it was stated in the above answer.
As others mentioned, is redundant to have a dict which will then be converted to a list. So you can build a list from the start. More, you can also make another function argument vowels
in which you can pass the vowels specific to a different region:
def find_vowels_1(data, vowels='aeiou'):
# return the number of occurrences of each vowel in a string
return [(v, data.count(v)) for v in vowels]
print(find_vowels_1('dexter'))
-
2\$\begingroup\$ Why make a dict, to then make a dict_items, and then a list, why not make a list off the bat? For example
[(v, data.count(v)) for v in 'aeiou']
. Also the special y case can be achieved using a ternary,'aeiouy' if include_y else 'aeiou'
. \$\endgroup\$2016年10月13日 08:42:42 +00:00Commented Oct 13, 2016 at 8:42 -
\$\begingroup\$ I think the y special case is easier in this example. Just take 'aeiou' as a paramenter. This allows me to pass in 'aeiouyæøå', which is nice. \$\endgroup\$Taemyr– Taemyr2016年10月13日 13:21:59 +00:00Commented Oct 13, 2016 at 13:21
return vowels.items()
instead of using that list comprehension. \$\endgroup\$