0

I'm trying to create a function in python that from a list of strings will return me a dict where the key(index) shows the most repetitive character for each index between all the strings. for example a list1 = ['one', 'two', 'twin', 'who'] should return index 0=t index 1=w index 2=o index 3=n in fact the most frequent character at the index 1 between all the string is 'w'. I found a solution but if I have lists with thousands of strings inside it will require too much time to perform. I would like to know if you can give me some help to decrease the time of execution.

Here is what I tried to do but seems too slow to perform with lists of thousands strings inside

list1 = ['one', 'two', 'twin', 'who']
width = len(max(list1, key=len))
chars = {}
for i, item in enumerate(zip(*[s.ljust(width) for s in list1])):
 set1 = set(item)
 if ' ' in set1:
 set1.remove(' ')
 chars[i] = max(set1, key=item.count)
print(chars)
asked Nov 12, 2022 at 11:08
7
  • Not even the standard library? "collections.Counter" would help. Commented Nov 12, 2022 at 11:26
  • nope, I need it without any libraries, like the code I poste above, but with a more efficient time Commented Nov 12, 2022 at 11:45
  • What are using ljust for? You are adding spaces and then remove them again. This is most likely a waste and makes your code slow. Commented Nov 12, 2022 at 11:45
  • I used ljust to establish the max width between all strings Commented Nov 12, 2022 at 11:47
  • What do you need the maximum width for if you want to count characters? Commented Nov 12, 2022 at 11:50

2 Answers 2

2

Whether something is quick enough is a matter of use case, but this solution uses a couple of seconds to go through the default wordlist available under OS X.

Python's collections.Counter implements a counter object for you, so you don't have keep track of the counts of multiple possible values yourself.

I've paired it with defaultdict, which intializes a key with a function if the key is undefined - so that if we haven't already seen the index we're updating the count for, it gets initialized to a Counter object that we then update.

from collections import defaultdict, Counter
with open("/usr/share/dict/words") as f:
 words = f.read().splitlines()
 letters = defaultdict(Counter)
 for word in words:
 for idx, letter in enumerate(word):
 letters[idx].update((letter, ))
for idx, counter in letters.items():
 print(idx, counter.most_common(1))

Whether this is quick enough depends on your use case as mentioned; it can be done a lot quicker if necessary, but it's probably quick enough. For 235 886 words the runtime is:

python3 letterfreq.py 2.67s user 0.04s system 99% cpu 2.734 total

This assumes that every word is lowercased, if not, lowercase it before adding it to your Counter object.

If you want to implement it without using the Counter or defaultdict parts of the standard library (which are just helper functionality to avoid reimplementing the same small code repeatedly), you can do the exact thing yourself manually:

with open("/usr/share/dict/words") as f:
 words = f.read().splitlines()
 letter_positions = {}
 for word in words:
 for idx, letter in enumerate(word):
 if idx not in letter_positions:
 letter_positions[idx] = {}
 if letter not in letter_positions[idx]:
 letter_positions[idx][letter] = 0
 letter_positions[idx][letter] += 1
final_dict = {}
for idx, counts in letter_positions.items():
 most_popular = sorted(counts.items(), key=lambda v: v[1], reverse=True)
 print(idx, most_popular)
 final_dict[idx] = most_popular[0][0]
print(final_dict)

Then pick as many entries as necessary from most_popular when going through the list afterwards.

Since we're no longer using the defaultdict and Counter abstractions, our running time is now about a third of the previous one:

python3 letterfreq2.py 1.08s user 0.03s system 98% cpu 1.124 total

It's usually a good idea to go through what you're trying to do and formulate a strategy - i.e. "ok, I need to keep track of how many times a letter has appeared in this location .. so for that I need some way to keep values for each index .. and then for each letter ..".

answered Nov 12, 2022 at 11:36
Sign up to request clarification or add additional context in comments.

12 Comments

thanks for the answer but I need a solution without importing any libraries
Those are part of the standard library, but sure, you can just do the same thing as those functions/objects do manually; I've added an example of that as well.
if I try it using the list1 = ['one', 'two', 'twin', 'who'] it doesn't return me a dictionary with {0:'t', 1:'w', 2:'o', 3:'n'} and that's what I need to get returned from the function when I print the dict
@terrier99uk In that case, iterate over the sorted dict and assign it to a secondary dict. This outputs a sorted list with the characters sorted in popularity - feel free to do that last part to get the exact format you need; but I've added an example of that as well.
@terrier99uk Depending on what you're asking about this for, if this is for a class assignment or similar (or any other reason, really), you need to spend to time to understand what is happening and why something works the way to do.
|
0

I just make some improvements based on your algorithm.

First, you can use itertools.zip_longest() instead of zip() to remove the need of ljust() and the width variable:

from itertools import zip_longest
list1 = ['one', 'two', 'twin', 'who']
chars = {}
for i, item in enumerate(zip_longest(*list1)):
 set1 = set(item)
 if None in set1:
 set1.remove(None)
 chars[i] = max(set1, key=item.count)
print(chars)

Then, replace max(set1, key=item.count) with a more efficent way Counter(item).most_common(1)[0][0], combined with or set1.most_common(2)[1][0] to filter None values

from itertools import zip_longest
from collections import Counter
list1 = ['one', 'two', 'twin', 'who']
chars = {}
for i, item in enumerate(zip_longest(*list1)):
 set1 = Counter(item)
 chars[i] = set1.most_common(1)[0][0] or set1.most_common(2)[1][0]
print(chars)

As itertools and collections are Python built-in modules, you can import them directly without pip install them.

answered Nov 12, 2022 at 11:55

1 Comment

can you give me another solution without importing any libraries? because I need it without importing any libraries

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.