I recently tried to solve the first non-repeating character problem. Please tell me if my solution is valid. I'm aiming for O(n) with a single loop.
My thinking is, it would be easy to tell you what the last non repeated character is, just:
loop each char
if map[char] doesn't exist
output = char
store char in map
So if you can get the last non repeated character, wouldn't you be able to run the same algorithm to get the first one, if the loop went through the string backwards? Here's my (python) code:
def firstNonRepeater(s):
smap = {}
out = ''
for i in range(len(s)-1, -1, -1):
if s[i] not in smap:
out = s[i]
if s[i] in smap:
smap[s[i]] += 1
else:
smap[s[i]] = 1
if smap[out] > 1:
return None
return out
There is some extra code to catch when there are no unique chars.
This seems to work, am I missing something?
EDIT:
@Alex Waygood has raised that my solution has an oversight, in that it will return None if it encounters a repeating character, where all instances are before the first non-repeater. See his in-depth explanation below.
4 Answers 4
Python has a Counter
data structure that is handy whenever you are ...
counting things. Since strings are directly iterable, you can easily get a
dict-like tally for how many times each character occurs in a string. And since
modern Python dicts keep track of insertion order, you can just iterate over
that tally, returning the first valid character.
from collections import Counter
def first_non_repeater(s):
for char, n in Counter(s).items():
if n == 1:
return char
return None
print(first_non_repeater("xxyzz")) # y
print(first_non_repeater("xxyzzy")) # None
print(first_non_repeater("xxyzazy")) # a
And if you don't want to use Counter
, just add your
own counting logic to the function:
tally = {}
for char in s:
try:
tally[char] += 1
except KeyError:
tally[char] = 1
-
2\$\begingroup\$ Even shorter:
def first_non_repeater(s): return next((char for char, n in Counter(s).items() if n == 1), None)
\$\endgroup\$200_success– 200_success2021年07月27日 06:32:53 +00:00Commented Jul 27, 2021 at 6:32 -
2\$\begingroup\$ This is a great solution, pythons Counter seems very elegant here. I also had no idea that later python versions maintained insertion order in dicts, so I wasn't aware that writing your own tally was an option(since I thought there is no way to check the first occurrence). Thanks for your help. Although your answer doesn't mention that my algorithm is flawed(feel free to update), I'll accept it because I like the solution best. \$\endgroup\$zing– zing2021年07月27日 07:24:54 +00:00Commented Jul 27, 2021 at 7:24
As has been pointed out in the comments and @Kraigolas's answer, the existing algorithm doesn't work. Here's why.
In the string "xxy"
:
- The algorithm first considers the third character,
"y"
. It finds it has not encountered it before, so records it as a preliminaryout
value. It adds it tosmap
with value1
. - The algorithm next considers the second character,
"x"
. It finds it has not encountered this character before, either. It discards its previous preliminaryout
value,"y"
, and adopts"x"
as its new preliminaryout
value. It adds"x"
tosmap
with value1
. - The algorithm now moves on to the first character,
"x"
. It realises that it has seen this one before, so adds1
to the associated value insmap
. - The algorithm has now finished iterating through the loop. Its current
out
value is"x"
, but when it looks it up insmap
, it finds it has encountered"x"
more than once. As a result, it assumes that there are 0 characters that it has seen only once, and returnsNone
.
It is impossible to fix the current implementation of the algorithm while keeping to the condition that there should be "only a single loop". If you were to relax this restriction, you could "fix" your current algorithm like this:
def first_non_repeater(string):
smap = {}
for char in string:
if char in smap:
smap[char] += 1
else:
smap[char] = 1
return next((k for k, v in smap.items() if v == 1), None)
Though it's generally both more idiomatic and more efficient, in python, to "ask for forgiveness" rather than "ask for permission":
def first_non_repeater(string):
smap = {}
for char in string:
try:
smap[char] += 1
except KeyError:
smap[char] = 1
return next((k for k, v in smap.items() if v == 1), None)
And you can get there more cleanly by just using a defaultdict
:
from collections import defaultdict
def first_non_repeater(string):
smap = defaultdict(int)
for char in string:
smap[char] += 1
return next((k for k, v in smap.items() if v == 1), None)
And I like @FMc's answer using collections.Counter
more than any of these three — I think using Counter
is probably the most pythonic way of doing this.
But at I say, all of these solutions break the condition that only one loop is allowed (whether that's the key concern when it comes to an efficient algorithm in python is another question).
The following would be my attempt at a solution that only uses builtin data structures (nothing from collections
), and that only uses a single loop. The solution only works on python 3.6+, as it relies on dict
s being ordered. Feedback welcome:
from typing import TypeVar, Union, Optional
T = TypeVar('T')
def first_non_repeating_char(
string: str,
default: Optional[T] = None
) -> Union[str, T, None]:
"""Return the first non-repeating character in a string.
If no character in the string occurs exactly once,
return the default value.
Parameters
----------
string: str
The string to be searched.
default: Any, optional
The value to be returned if there is no character
in the string that occurs exactly once.
By default, None.
Returns
-------
Either a string of length 1, or the default value.
"""
# Using a dictionary as an ordered set,
# for all the characters we've seen exactly once.
# The values in this dictionary are irrelevant.
uniques: dict[str, None] = {}
# a set for all the characters
# that we've already seen more than once
dupes: set[str] = set()
for char in string:
if char in dupes:
continue
try:
del uniques[char]
except KeyError:
uniques[char] = None
else:
dupes.add(char)
# return the first key in the dictionary
# if the dictionary isn't empty.
# If the dictionary is empty,
# return the default.
return next(iter(uniques), default)
-
1\$\begingroup\$ Using
return next((k for k in uniques.keys()), default)
would allow you to skip thetry ... except
. \$\endgroup\$AJNeufeld– AJNeufeld2021年07月27日 01:48:10 +00:00Commented Jul 27, 2021 at 1:48 -
\$\begingroup\$ Nice! Forgot about that @AJNeufeld — have edited it into my answer. \$\endgroup\$Alex Waygood– Alex Waygood2021年07月27日 02:00:04 +00:00Commented Jul 27, 2021 at 2:00
-
1\$\begingroup\$ @AJNeufeld I guess? But you don't need a generator to do that; just call
iter
. \$\endgroup\$Reinderien– Reinderien2021年07月27日 03:43:23 +00:00Commented Jul 27, 2021 at 3:43 -
1\$\begingroup\$ I have edited my docstring. \$\endgroup\$Alex Waygood– Alex Waygood2021年07月27日 12:44:46 +00:00Commented Jul 27, 2021 at 12:44
-
1\$\begingroup\$ @Reinderien Yes,
iter(uniques)
is sufficient; you don't need the.keys()
\$\endgroup\$AJNeufeld– AJNeufeld2021年07月28日 03:16:52 +00:00Commented Jul 28, 2021 at 3:16
Style
From PEP 8, function names should be lowercase and use underscores to separate words:
def first_non_repeater(s):
...
Implementation
In Python 3.6+ dictionaries maintain insertion order and if you're using an earlier version, PEP 372 adds collections.OrderedDict
. If collections.Counter
used an OrderedDict
, you could solve this problem with that, but it doesn't so we will do that work instead:
from collections import OrderedDict
def first_non_repeater(s):
smap = OrderedDict()
for char in s:
if char in smap:
smap[char] += 1
else:
smap[char] = 1
for char, count in smap.items():
if count == 1:
return char
return None
In the best case, this is O(N), worst case it is O(2N), and average case you would expect O(3N/2), all of which is just O(N).
The room for improvement here would be to eliminate this:
for char, count in smap.items():
if count == 1:
return char
But removing this requires more complicated code and makes it more bug-prone with little benefit. As pointed out by @AJNeufeld, your code won't work for "xxyzz"
and many other cases. Avoiding a second pass through the characters in your text will add a considerable amount of complexity to your code, and might not even make the solution faster as it will require extra checks in your initial for
loop, and in the best case it still doesn't change the complexity from O(N).
-
\$\begingroup\$ O(N), O(2N), and O(3N/2) are technically all the same... It doesn't make much sense to use Big-O this way. \$\endgroup\$idmean– idmean2021年07月29日 08:57:48 +00:00Commented Jul 29, 2021 at 8:57
The main issue I see with this code is the lack of any unit tests. It's quite simple to invoke the test framework at the end of the module:
if __name__ == "__main__":
import doctest
doctest.testmod()
Now, just add some test cases:
def firstNonRepeater(s):
"""
Returns the first character in the input string
which is not repeated later in the string
>>> firstNonRepeater('')
>>> firstNonRepeater('a')
'a'
>>> firstNonRepeater('aa')
>>> firstNonRepeater('ab')
'a'
>>> firstNonRepeater('aab')
'b'
>>> firstNonRepeater('aba')
'b'
"""
Even this small set of tests reveals problems that need to be fixed:
**********************************************************************
File "264433.py", line 8, in __main__.firstNonRepeater
Failed example:
firstNonRepeater('')
Exception raised:
Traceback (most recent call last):
File "/usr/lib/python3.9/doctest.py", line 1336, in __run
exec(compile(example.source, filename, "single",
File "<doctest __main__.firstNonRepeater[0]>", line 1, in <module>
firstNonRepeater('')
File "264433.py", line 32, in firstNonRepeater
if smap[out] > 1:
KeyError: ''
**********************************************************************
File "264433.py", line 16, in __main__.firstNonRepeater
Failed example:
firstNonRepeater('aab')
Expected:
'b'
Got nothing
**********************************************************************
1 items had failures:
2 of 6 in __main__.firstNonRepeater
***Test Failed*** 2 failures.
Explore related questions
See similar questions with these tags.
collections
/itertools
etc okay? \$\endgroup\$