Find first unique character in string

Question 1

I intentionally avoided Python tools which would make this even more trivial. I would like to be reviewed on efficiency, style, and obviously if there is a bug I would like to know.

def unique_char(s):
 ''' Find first non-repeated char in a string '''
 myDict = {}
 first = len(s)
 for c in s:
 if c in myDict.keys():
 myDict[c] += 1
 else:
 myDict[c] = 1
 if 1 in myDict.values():
 for k,v in myDict.items():
 if v == 1:
 if s.index(k) < first:
 first = s.index(k)
 return s[first]
 return(False)

Question 2

That's 4 posts in a very short period of time. Had you allowed a day between each, you would have 4 posts counting toward a nice participation/question badge... and could possibly have applied feedback from the older posts into the newer ones. I hope you get good reviews!

Question 3

The docstring is not quite right: "Find first non-repeated char in a string" — and then what? It should say explicitly that it returns the character.
The function returns False if all characters are repeated. This is a bad idea: it would be easy for the caller to forget to check. It's better to raise an exception in exceptional cases.
myDict is an uninformative choice of variable name. This dictionary contains counts of characters, so it should have name like counts or character_counts.
c in myDict.keys() can be simplified to c in myDict.
Building a dictionary of counts can more easily done using the built-in collections.Counter. It's not clear why you would want to avoid using this. Trivial functions should be trivial.
There's no point in testing if 1 in myDict.values(). Dictionaries have efficient lookup by key, but not by value, so the in operator here has to look through all the values. Since you're going to look through all the values anyway, this doesn't save you anything.

The runtime is \$Θ(n^2)\$ because s.index(k) has to search through the string for the character k. But there is a \$Θ(n)\$ algorithm:

from collections import Counter
def unique_char(s):
 """Return the first non-repeated character in the string s, or raise
 ValueError if all characters are repeated.
 """
 counts = Counter(s)
 for c in s:
 if counts[c] == 1:
 return c
 raise ValueError("all characters are repeated")

Question 4

Gareth has a lot of good points, though I disagree about raising an error. str.find doesn't raise an error when it doesn't find the search term, it returns -1. Errors aren't always the best way to signal absence or failure. In your case, returning a boolean is a datatype mismatch. If you're going to return a different type, return None, to indicate no result. Alternatively, return ''. Which is a string type, but empty. And of length zero, so when turned to a boolean it would be False anyway. eg. if unique_char("banbanana"): would be evaluated as False.

Also note you don't need to put brackets around the value that you return, return False works fine.

This may be irrelevant for your use case but one problem with this approach is that you're processing the whole string before you start looking for unique characters. Instead, why not check each string as it comes.

So, you'd want a set to store characters found to be duplicates. Then you just want to loop over the string, testing first for previously found duplicates, then checking if the new character is a duplicate. You can save time by ignoring all characters before the current one, as you've already found all of them to have duplicates. So this is how it could look:

def unique_chars(s):
 """Return the first non-repeated character in the string s.
 Returns an empty string if no character is found"""
 dupes = set()
 for i, c in enumerate(s):
 if c in dupes:
 continue
 # Only search the characters ahead of this point
 if c in s[i + 1:]:
 dupes.add(c)
 else:
 return c
 return ''

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2015-11-02 00:12:24Z

The docstring is not quite right: "Find first non-repeated char in a string" — and then what? It should say explicitly that it returns the character.
The function returns False if all characters are repeated. This is a bad idea: it would be easy for the caller to forget to check. It's better to raise an exception in exceptional cases.
myDict is an uninformative choice of variable name. This dictionary contains counts of characters, so it should have name like counts or character_counts.
c in myDict.keys() can be simplified to c in myDict.
Building a dictionary of counts can more easily done using the built-in collections.Counter. It's not clear why you would want to avoid using this. Trivial functions should be trivial.
There's no point in testing if 1 in myDict.values(). Dictionaries have efficient lookup by key, but not by value, so the in operator here has to look through all the values. Since you're going to look through all the values anyway, this doesn't save you anything.

The runtime is \$Θ(n^2)\$ because s.index(k) has to search through the string for the character k. But there is a \$Θ(n)\$ algorithm:

from collections import Counter
def unique_char(s):
 """Return the first non-repeated character in the string s, or raise
 ValueError if all characters are repeated.
 """
 counts = Counter(s)
 for c in s:
 if counts[c] == 1:
 return c
 raise ValueError("all characters are repeated")

Stack Exchange Network

Find first unique character in string

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Find first unique character in string

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions