Decrease time of a function (Python)

Question 1

I'm trying to write a Python function that takes a list of strings and returns a dict in which each key is an index and each value is the most frequent character at that index across all the strings. For example, list1 = ['one', 'two', 'twin', 'who'] should return {0: 't', 1: 'w', 2: 'o', 3: 'n'}; the most frequent character at index 1 across all the strings is 'w'. I found a solution, but for lists with thousands of strings it takes too long to run. How can I reduce the execution time?

Here is what I tried, but it is too slow for lists containing thousands of strings:

list1 = ['one', 'two', 'twin', 'who']
width = len(max(list1, key=len))
chars = {}
for i, item in enumerate(zip(*[s.ljust(width) for s in list1])):
    set1 = set(item)
    if ' ' in set1:
        set1.remove(' ')
    chars[i] = max(set1, key=item.count)
print(chars)

Question 2

Not even the standard library? "collections.Counter" would help.

Question 3

Nope, I need it without any libraries, like the code I posted above, but faster.

Question 4

What are you using ljust for? You are adding spaces and then removing them again. This is most likely a waste and makes your code slow.

Question 5

I used ljust to establish the max width between all strings

Question 6

What do you need the maximum width for if you want to count characters?

Question 7

Whether something is quick enough is a matter of use case, but this solution uses a couple of seconds to go through the default wordlist available under OS X.

Python's collections.Counter implements a counter object for you, so you don't have to keep track of the counts of multiple possible values yourself.

I've paired it with defaultdict, which initializes a missing key by calling a factory function - so that if we haven't already seen the index we're updating the count for, it gets initialized to a Counter object that we then update.

from collections import defaultdict, Counter
with open("/usr/share/dict/words") as f:
    words = f.read().splitlines()
letters = defaultdict(Counter)
for word in words:
    for idx, letter in enumerate(word):
        letters[idx].update((letter, ))
for idx, counter in letters.items():
    print(idx, counter.most_common(1))
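As a minimal sketch of how the defaultdict(Counter) pairing behaves on the small list from the question (the names sample_words and top are only for illustration, and the dict comprehension at the end is my assumption about the output format the asker wants):

```python
from collections import Counter, defaultdict

# Sample input taken from the question.
sample_words = ['one', 'two', 'twin', 'who']

letters = defaultdict(Counter)
for word in sample_words:
    for idx, letter in enumerate(word):
        # Looking up a missing index creates an empty Counter first,
        # so no explicit "is this index known yet?" check is needed.
        letters[idx][letter] += 1

# most_common(1) returns a one-element list of (letter, count) pairs;
# [0][0] picks out just the letter.
top = {idx: counter.most_common(1)[0][0] for idx, counter in letters.items()}
print(top)
```

Strings shorter than the longest one simply contribute nothing to the later indexes, so no padding is required.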

Whether this is quick enough depends on your use case as mentioned; it can be done a lot quicker if necessary, but it's probably quick enough. For 235 886 words the runtime is:

python3 letterfreq.py 2.67s user 0.04s system 99% cpu 2.734 total

This assumes that every word is lowercase; if not, lowercase it before adding it to your Counter object.
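One hedged way to fold case while counting is to call str.lower() on each word before enumerating it. The function name count_letters_by_index and the mixed-case sample list are assumptions made for this sketch:

```python
from collections import Counter, defaultdict


def count_letters_by_index(words):
    """Count letters per index, treating upper and lower case as equal."""
    letters = defaultdict(Counter)
    for word in words:
        # Lowercase once per word rather than once per letter.
        for idx, letter in enumerate(word.lower()):
            letters[idx][letter] += 1
    return letters


counts = count_letters_by_index(['One', 'two', 'TWIN', 'who'])
print(counts[0].most_common(1))
```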

If you want to implement it without using the Counter or defaultdict parts of the standard library (which are just helper functionality to avoid reimplementing the same small code repeatedly), you can do exactly the same thing manually:

with open("/usr/share/dict/words") as f:
    words = f.read().splitlines()
letter_positions = {}
for word in words:
    for idx, letter in enumerate(word):
        if idx not in letter_positions:
            letter_positions[idx] = {}
        if letter not in letter_positions[idx]:
            letter_positions[idx][letter] = 0
        letter_positions[idx][letter] += 1
final_dict = {}
for idx, counts in letter_positions.items():
    most_popular = sorted(counts.items(), key=lambda v: v[1], reverse=True)
    print(idx, most_popular)
    final_dict[idx] = most_popular[0][0]
print(final_dict)

Then pick as many entries as necessary from most_popular when going through the list afterwards.
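If only the single most popular character per index is needed, the sort can be skipped. The following is a sketch of the same manual counting wrapped in a function with no imports; the name most_common_per_index is hypothetical, and ties are resolved in favour of whichever character was seen first, which is an assumption about what is acceptable:

```python
def most_common_per_index(strings):
    """Return {index: most frequent character at that index}."""
    counts_by_index = {}
    for s in strings:
        for idx, ch in enumerate(s):
            # setdefault creates the per-index dict the first time it is seen.
            counts = counts_by_index.setdefault(idx, {})
            counts[ch] = counts.get(ch, 0) + 1
    # max() with counts.get as the key finds the top character in one pass,
    # instead of sorting every per-index dict.
    return {idx: max(counts, key=counts.get)
            for idx, counts in counts_by_index.items()}


print(most_common_per_index(['one', 'two', 'twin', 'who']))
```

Each character is visited exactly once, so the work grows with the total number of characters rather than with repeated item.count calls per column.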

Since we're no longer using the defaultdict and Counter abstractions, our running time is now roughly 40% of the previous one:

python3 letterfreq2.py 1.08s user 0.03s system 98% cpu 1.124 total
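The figures above come from timing two separate scripts in the shell. As a rough sketch of how the two approaches could be compared inside one script, time.perf_counter can be wrapped around each call. The function names and the synthetic word list are assumptions for illustration, and the measured times will differ by machine and Python version:

```python
import time
from collections import Counter, defaultdict


def with_counter(words):
    """Count with defaultdict(Counter), one update call per letter."""
    letters = defaultdict(Counter)
    for word in words:
        for idx, letter in enumerate(word):
            letters[idx].update((letter,))
    return {idx: c.most_common(1)[0][0] for idx, c in letters.items()}


def with_plain_dicts(words):
    """Count with nested plain dicts and no imports."""
    positions = {}
    for word in words:
        for idx, letter in enumerate(word):
            counts = positions.setdefault(idx, {})
            counts[letter] = counts.get(letter, 0) + 1
    return {idx: max(counts, key=counts.get)
            for idx, counts in positions.items()}


# Synthetic input: the question's list repeated many times.
words = ['one', 'two', 'twin', 'who'] * 20_000

for func in (with_counter, with_plain_dicts):
    start = time.perf_counter()
    result = func(words)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: {elapsed:.3f}s -> {result}")
```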

It's usually a good idea to go through what you're trying to do and formulate a strategy - i.e. "ok, I need to keep track of how many times a letter has appeared in this location .. so for that I need some way to keep values for each index .. and then for each letter ..".

Question 8

thanks for the answer but I need a solution without importing any libraries

Question 9

Those are part of the standard library, but sure, you can just do the same thing as those functions/objects do manually; I've added an example of that as well.

Question 10

if I try it using the list1 = ['one', 'two', 'twin', 'who'] it doesn't return me a dictionary with {0:'t', 1:'w', 2:'o', 3:'n'} and that's what I need to get returned from the function when I print the dict

Question 11

@terrier99uk In that case, iterate over the sorted dict and assign it to a secondary dict. This outputs a sorted list with the characters sorted in popularity - feel free to do that last part to get the exact format you need; but I've added an example of that as well.

Question 12

@terrier99uk Depending on what you're asking about this for, if this is for a class assignment or similar (or any other reason, really), you need to spend the time to understand what is happening and why something works the way it does.

Question 13

I just made some improvements based on your algorithm.

First, you can use itertools.zip_longest() instead of zip() to remove the need for ljust() and the width variable:

from itertools import zip_longest
list1 = ['one', 'two', 'twin', 'who']
chars = {}
for i, item in enumerate(zip_longest(*list1)):
    set1 = set(item)
    if None in set1:
        set1.remove(None)
    chars[i] = max(set1, key=item.count)
print(chars)

Then, replace max(set1, key=item.count) with the more efficient Counter(item).most_common(1)[0][0], combined with or set1.most_common(2)[1][0] to fall back to the second most common entry when the most common one is the None fill value:

from itertools import zip_longest
from collections import Counter
list1 = ['one', 'two', 'twin', 'who']
chars = {}
for i, item in enumerate(zip_longest(*list1)):
    set1 = Counter(item)
    chars[i] = set1.most_common(1)[0][0] or set1.most_common(2)[1][0]
print(chars)
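A possible variation, sketched here under the assumption that None never occurs as real data, is to drop the fill values before counting so that the or fallback is not needed. The function name most_common_per_index is only illustrative:

```python
from collections import Counter
from itertools import zip_longest


def most_common_per_index(strings):
    """Return {index: most frequent character}, ignoring zip_longest padding."""
    result = {}
    for i, column in enumerate(zip_longest(*strings)):
        # zip_longest pads short strings with None; skip those entries so
        # they can never win the count.
        counts = Counter(ch for ch in column if ch is not None)
        result[i] = counts.most_common(1)[0][0]
    return result


print(most_common_per_index(['one', 'two', 'twin', 'who']))
```

Note that zip_longest(*strings) still builds one tuple per index containing an entry for every string, so for very long lists the per-string counting loop may use less memory.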

As itertools and collections are part of Python's standard library, you can import them directly without installing anything with pip.

Question 14

can you give me another solution without importing any libraries? because I need it without importing any libraries

MatsLindh · Accepted Answer · 2022-11-12 11:36:57Z

