I intentionally avoided Python tools which would make this even more trivial. I would like to be reviewed on efficiency, style, and obviously if there is a bug I would like to know.
def unique_char(s):
''' Find first non-repeated char in a string '''
myDict = {}
first = len(s)
for c in s:
if c in myDict.keys():
myDict[c] += 1
else:
myDict[c] = 1
if 1 in myDict.values():
for k,v in myDict.items():
if v == 1:
if s.index(k) < first:
first = s.index(k)
return s[first]
return(False)
-
5\$\begingroup\$ That's 4 posts in a very short period of time. Had you allowed a day between each, you would have 4 posts counting toward a nice participation/question badge... and could possibly have applied feedback from the older posts into the newer ones. I hope you get good reviews! \$\endgroup\$Mathieu Guindon– Mathieu Guindon2015年11月01日 23:38:41 +00:00Commented Nov 1, 2015 at 23:38
2 Answers 2
The docstring is not quite right: "Find first non-repeated char in a string" — and then what? It should say explicitly that it returns the character.
The function returns
False
if all characters are repeated. This is a bad idea: it would be easy for the caller to forget to check. It's better to raise an exception in exceptional cases.myDict
is an uninformative choice of variable name. This dictionary contains counts of characters, so it should have name likecounts
orcharacter_counts
.c in myDict.keys()
can be simplified toc in myDict
.Building a dictionary of counts can more easily done using the built-in
collections.Counter
. It's not clear why you would want to avoid using this. Trivial functions should be trivial.There's no point in testing
if 1 in myDict.values()
. Dictionaries have efficient lookup by key, but not by value, so thein
operator here has to look through all the values. Since you're going to look through all the values anyway, this doesn't save you anything.The runtime is \$Θ(n^2)\$ because
s.index(k)
has to search through the string for the characterk
. But there is a \$Θ(n)\$ algorithm:from collections import Counter def unique_char(s): """Return the first non-repeated character in the string s, or raise ValueError if all characters are repeated. """ counts = Counter(s) for c in s: if counts[c] == 1: return c raise ValueError("all characters are repeated")
Gareth has a lot of good points, though I disagree about raising an error. str.find
doesn't raise an error when it doesn't find the search term, it returns -1. Errors aren't always the best way to signal absence or failure. In your case, returning a boolean is a datatype mismatch. If you're going to return a different type, return None
, to indicate no result. Alternatively, return ''
. Which is a string type, but empty. And of length zero, so when turned to a boolean it would be False anyway. eg. if unique_char("banbanana"):
would be evaluated as False
.
Also note you don't need to put brackets around the value that you return, return False
works fine.
This may be irrelevant for your use case but one problem with this approach is that you're processing the whole string before you start looking for unique characters. Instead, why not check each string as it comes.
So, you'd want a set to store characters found to be duplicates. Then you just want to loop over the string, testing first for previously found duplicates, then checking if the new character is a duplicate. You can save time by ignoring all characters before the current one, as you've already found all of them to have duplicates. So this is how it could look:
def unique_chars(s):
"""Return the first non-repeated character in the string s.
Returns an empty string if no character is found"""
dupes = set()
for i, c in enumerate(s):
if c in dupes:
continue
# Only search the characters ahead of this point
if c in s[i + 1:]:
dupes.add(c)
else:
return c
return ''