I'm trying to create a program that creates a record of how many times letters appear in a given string (ignoring blank spaces) using recursion.
s = "Hello there this is the string"
new_s = ""
for i in s:
if i != " ":
new_s += i
new_s = new_s.lower()
com_dict = {}
def merge_count(s):
global com_dict
if len(s) == 1:
if s[0] in com_dict:
com_dict[s[0]] += 1
else:
com_dict[s[0]] = 1
else:
merge_count(s[:int(len(s)/2)])
merge_count(s[int(len(s)/2):])
merge_count(new_s)
for i in com_dict:
print i, com_dict[i]
Is the global variable necessary? Should I be using a different data structure instead of a dictionary? I will need to rank the results based on the most frequently appearing letters. Is using recursion in this scenerio generally a bad idea? Would iterating be a better choice?
-
8\$\begingroup\$ Any reason you're not using Counter? \$\endgroup\$mjolka– mjolka2015年06月25日 03:58:26 +00:00Commented Jun 25, 2015 at 3:58
-
\$\begingroup\$ I would say use a dict instead. \$\endgroup\$Ryan Dougherty– Ryan Dougherty2015年06月25日 03:58:49 +00:00Commented Jun 25, 2015 at 3:58
-
2\$\begingroup\$ Is this an exercise where you use only the bit of Python you already know, or are you interested in getting the absolute best way to do it using the standard library? (as that would be Counter) \$\endgroup\$RemcoGerlich– RemcoGerlich2015年06月25日 09:03:11 +00:00Commented Jun 25, 2015 at 9:03
2 Answers 2
Is the global variable necessary?
No. Global variables are almost never necessary, and rarely the best solution. Consider this:
def merge_count(s, com_dict):
if len(s) == 1:
if s[0] in com_dict:
com_dict[s[0]] += 1
else:
com_dict[s[0]] = 1
else:
merge_count(s[:int(len(s)/2)], com_dict)
merge_count(s[int(len(s)/2):], com_dict)
def main():
s = "Hello there this is the string"
new_s = ""
for i in s:
if i != " ":
new_s += i
new_s = new_s.lower()
com_dict = {}
merge_count(new_s, com_dict)
for i in com_dict:
print i, com_dict[i]
main()
In this, you pass a dictionary to merge into into to the function.
You do
new_s = ""
for i in s:
if i != " ":
new_s += i
Addition of strings in loops is always almost bad, because it may reallocate the string on every addition. Instead, one should do a join:
new_s = "".join(i for i in s if i != " ")
However, in this case you might as well just do
new_s = s.replace(" ", "")
Recursively splitting in merge_count
is pointless. Just do
def merge_count(s, com_dict):
for i in s:
if s in com_dict:
com_dict[s[0]] += 1
else:
com_dict[s[0]] = 1
which affords the better interface of
def merge_count(s):
com_dict = {}
for i in s:
if s in com_dict:
com_dict[i] += 1
else:
com_dict[i] = 1
return com_dict
One can simplify things with defaultdict
:
from collections import defaultdict
def merge_count(s):
com_dict = defaultdict(int)
for i in s:
com_dict[i] += 1
return com_dict
Or even further with Counter
:
from collections import Counter
def merge_count(s):
return Counter(s)
Instead of
for i in com_dict:
print i, com_dict[i]
One should do
for i, v in com_dict.iteritems():
print i, v
Then one fixes up the naming
from collections import Counter
def main():
string = "Hello there this is the string"
lower_chars = string.replace(" ", "").lower()
char_counts = Counter(lower_chars)
for char, count in char_counts.items():
print(char, count)
main()
-
3\$\begingroup\$ Use
print char_counts.most_common()
instead to print the frequencies in descending order. \$\endgroup\$user1016274– user10162742015年06月25日 12:25:13 +00:00Commented Jun 25, 2015 at 12:25
Is the global variable necessary?
No. You don't need the global
keyword there, because you don't reassign the variable, you modify its content. On the other hand, right now the dictionary is too exposed. It might be modified outside of merge_counts
. Another problem is that if I don't see the implementation of the function, it's not clear that it expects this dictionary to be defined. It would be better to rewrite in a way that eliminates these issues, for example:
def merge_counts(s):
com_dict = {}
def merge_counts_recurse(s):
# ...
return com_dict
In this version the dictionary is no longer exposed in global scope, it cannot be modified outside of merge_counts
, and the function no longer depends on the dictionary defined outside.
Should I be using a different data structure instead of a dictionary?
The data structure is fine for the purpose, but there's a higher level utility class that you could use instead of reinventing the wheel, Counter, as @mjolka pointed out.
I will need to rank the results based on the most frequently appearing letters. Is using recursion in this scenerio generally a bad idea? Would iterating be a better choice?
Recursion is not bad, it's just a bit unusual for this scenario. Iteration is the obvious choice, and a lot easier to implement. In the first loop where you excluded the space characters, you could at the same time build the counts. That would have been the obvious, easiest implementation.