First non-repeating Character, with a single loop in Python

Question 1

I recently tried to solve the first non-repeating character problem. Please tell me if my solution is valid. I'm aiming for O(n) with a single loop.

My thinking is, it would be easy to tell you what the last non repeated character is, just:

loop each char
    if map[char] doesn't exist
        output = char
    store char in map

So if you can get the last non-repeated character, wouldn't you be able to run the same algorithm to get the first one, if the loop went through the string backwards? Here's my (Python) code:

def firstNonRepeater(s):
    smap = {}
    out = ''
    for i in range(len(s)-1, -1, -1):
        if s[i] not in smap:
            out = s[i]
        if s[i] in smap:
            smap[s[i]] += 1
        else:
            smap[s[i]] = 1

    if smap[out] > 1:
        return None
    return out

There is some extra code to catch when there are no unique chars.

This seems to work, am I missing something?

EDIT:

@Alex Waygood has raised that my solution has an oversight, in that it will return None if it encounters a repeating character, where all instances are before the first non-repeater. See his in-depth explanation below.

Question 2

Also, what are the constraints here? Are you aiming for a solution that just uses builtin types, or are solutions that use collections/itertools etc okay?

Question 3

Python has a Counter data structure that is handy whenever you are ... counting things. Since strings are directly iterable, you can easily get a dict-like tally for how many times each character occurs in a string. And since modern Python dicts keep track of insertion order, you can just iterate over that tally, returning the first valid character.

from collections import Counter

def first_non_repeater(s):
    for char, n in Counter(s).items():
        if n == 1:
            return char
    return None

print(first_non_repeater("xxyzz"))    # y
print(first_non_repeater("xxyzzy"))   # None
print(first_non_repeater("xxyzazy"))  # a

And if you don't want to use Counter, just add your own counting logic to the function:

tally = {}
for char in s:
    try:
        tally[char] += 1
    except KeyError:
        tally[char] = 1
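Put together, the hand-rolled tally might slot into the function roughly like this. It is a minimal sketch that assumes Python 3.7+ (where dict insertion order is guaranteed); the name first_non_repeater_manual is hypothetical, chosen only to keep it distinct from the Counter version.

```python
def first_non_repeater_manual(s):
    # Hypothetical name; intended to behave like the Counter version,
    # assuming dicts preserve insertion order (Python 3.7+).
    tally = {}
    for char in s:
        try:
            tally[char] += 1
        except KeyError:
            tally[char] = 1
    # Keys come back in first-seen order, so the first count of 1 wins.
    for char, n in tally.items():
        if n == 1:
            return char
    return None


print(first_non_repeater_manual("xxyzz"))    # y
print(first_non_repeater_manual("xxyzzy"))   # None
print(first_non_repeater_manual("xxyzazy"))  # a
```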

Question 4

Even shorter: def first_non_repeater(s): return next((char for char, n in Counter(s).items() if n == 1), None)

Question 5

This is a great solution; Python's Counter seems very elegant here. I also had no idea that later Python versions maintained insertion order in dicts, so I wasn't aware that writing your own tally was an option (since I thought there was no way to check the first occurrence). Thanks for your help. Although your answer doesn't mention that my algorithm is flawed (feel free to update), I'll accept it because I like the solution best.

Question 6

As has been pointed out in the comments and @Kraigolas's answer, the existing algorithm doesn't work. Here's why.

In the string "xxy":

The algorithm first considers the third character, "y". It finds it has not encountered it before, so records it as a preliminary out value. It adds it to smap with value 1.
The algorithm next considers the second character, "x". It finds it has not encountered this character before, either. It discards its previous preliminary out value, "y", and adopts "x" as its new preliminary out value. It adds "x" to smap with value 1.
The algorithm now moves on to the first character, "x". It realises that it has seen this one before, so adds 1 to the associated value in smap.
The algorithm has now finished iterating through the loop. Its current out value is "x", but when it looks it up in smap, it finds it has encountered "x" more than once. As a result, it assumes that there are 0 characters that it has seen only once, and returns None.
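The walkthrough above can be reproduced with an instrumented copy of the original function. This is only a sketch: trace_original is a hypothetical helper that mirrors the posted backwards loop and records (character, preliminary out, count) after each step. The empty-string guard is an addition so that the helper itself never raises.

```python
def trace_original(s):
    """Mirror the posted backwards loop, recording the state after each step.

    Hypothetical helper for illustration; returns (result, steps).
    """
    smap = {}
    out = ''
    steps = []
    for char in reversed(s):
        if char not in smap:
            out = char  # any first sighting overwrites the candidate
        smap[char] = smap.get(char, 0) + 1
        steps.append((char, out, smap[char]))
    # Added guard: the posted code raises KeyError on an empty string.
    if not s or smap[out] > 1:
        return None, steps
    return out, steps


result, steps = trace_original("xxy")
for char, out, count in steps:
    print(f"saw {char!r}: out={out!r}, count={count}")
print(result)  # None, although 'y' occurs exactly once
```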

It is impossible to fix the current implementation of the algorithm while keeping to the condition that there should be "only a single loop". If you were to relax this restriction, you could "fix" your current algorithm like this:

def first_non_repeater(string):
    smap = {}
    for char in string:
        if char in smap:
            smap[char] += 1
        else:
            smap[char] = 1
    return next((k for k, v in smap.items() if v == 1), None)

Though in Python it's generally considered more idiomatic, and is often more efficient when most lookups succeed, to "ask for forgiveness" rather than "ask for permission":

def first_non_repeater(string):
    smap = {}
    for char in string:
        try:
            smap[char] += 1
        except KeyError:
            smap[char] = 1
    return next((k for k, v in smap.items() if v == 1), None)
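Whether the try/except form is actually faster depends on how often the KeyError path is taken, so it may be worth measuring on representative input. A rough sketch using timeit follows; the function names and the sample string are made up for illustration, and no timings are claimed here since they vary by machine and Python version.

```python
import timeit


def tally_lbyl(string):
    # "Look before you leap": test membership first.
    smap = {}
    for char in string:
        if char in smap:
            smap[char] += 1
        else:
            smap[char] = 1
    return smap


def tally_eafp(string):
    # "Easier to ask forgiveness": try the update, handle the miss.
    smap = {}
    for char in string:
        try:
            smap[char] += 1
        except KeyError:
            smap[char] = 1
    return smap


if __name__ == "__main__":
    # Illustrative input: few distinct characters, so misses are rare.
    sample = "the quick brown fox jumps over the lazy dog" * 100
    for func in (tally_lbyl, tally_eafp):
        seconds = timeit.timeit(lambda: func(sample), number=200)
        print(f"{func.__name__}: {seconds:.4f}s")
```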

And you can get there more cleanly by just using a defaultdict:

from collections import defaultdict

def first_non_repeater(string):
    smap = defaultdict(int)
    for char in string:
        smap[char] += 1
    return next((k for k, v in smap.items() if v == 1), None)

And I like @FMc's answer using collections.Counter more than any of these three — I think using Counter is probably the most pythonic way of doing this.

But as I say, all of these solutions break the condition that only one loop is allowed (whether that's the key concern when it comes to an efficient algorithm in Python is another question).

The following would be my attempt at a solution that only uses builtin data structures (nothing from collections), and that only uses a single loop. The solution only works on Python 3.6+, as it relies on dicts being ordered (an implementation detail of CPython 3.6 and a language guarantee from 3.7). Feedback welcome:

from typing import TypeVar, Union, Optional

T = TypeVar('T')

def first_non_repeating_char(
    string: str,
    default: Optional[T] = None
) -> Union[str, T, None]:
    """Return the first non-repeating character in a string.

    If no character in the string occurs exactly once,
    return the default value.

    Parameters
    ----------
    string: str
        The string to be searched.
    default: Any, optional
        The value to be returned if there is no character
        in the string that occurs exactly once.
        By default, None.

    Returns
    -------
    Either a string of length 1, or the default value.
    """

    # Using a dictionary as an ordered set,
    # for all the characters we've seen exactly once.
    # The values in this dictionary are irrelevant.
    uniques: dict[str, None] = {}

    # a set for all the characters
    # that we've already seen more than once
    dupes: set[str] = set()

    for char in string:
        if char in dupes:
            continue
        try:
            del uniques[char]
        except KeyError:
            uniques[char] = None
        else:
            dupes.add(char)

    # return the first key in the dictionary
    # if the dictionary isn't empty.
    # If the dictionary is empty,
    # return the default.
    return next(iter(uniques), default)
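One way to gain some confidence in a single-loop version like this is to compare it against the simpler two-pass Counter approach on many random inputs. The sketch below does that. Here single_pass is a condensed restatement of the function above (no type hints, and an explicit membership test instead of try/except), while reference, check, the alphabet and the trial count are all arbitrary choices made for illustration. It assumes Python 3.7+ dict ordering.

```python
import random
from collections import Counter


def single_pass(string, default=None):
    # Condensed restatement of the uniques/dupes idea.
    uniques = {}   # dict used as an ordered set of characters seen once
    dupes = set()  # characters seen more than once
    for char in string:
        if char in dupes:
            continue
        if char in uniques:
            del uniques[char]
            dupes.add(char)
        else:
            uniques[char] = None
    return next(iter(uniques), default)


def reference(string, default=None):
    # Two-pass version: count everything, then scan in first-seen order.
    return next((c for c, n in Counter(string).items() if n == 1), default)


def check(trials=1000, alphabet="abcd", max_len=8, seed=0):
    # Compare both versions on short random strings over a small alphabet,
    # which makes repeats (and therefore the interesting cases) likely.
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        s = "".join(rng.choice(alphabet) for _ in range(length))
        assert single_pass(s) == reference(s), s
    return True


print(check())  # True if the two versions agreed on every generated string
```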

Question 7

Using return next((k for k in uniques.keys()), default) would allow you to skip the try ... except.

Question 8

Nice! Forgot about that @AJNeufeld — have edited it into my answer.

Question 9

@AJNeufeld I guess? But you don't need a generator to do that; just call iter.

Question 10

I have edited my docstring.

Question 11

@Reinderien Yes, iter(uniques) is sufficient; you don't need the .keys()
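
To illustrate the point made in these comments, the three spellings below should behave identically, since iterating over a dict yields its keys. The sample dictionaries are made up for the example.

```python
uniques = {"y": None, "a": None}  # illustrative contents
empty = {}

# A generator over .keys(), a generator over the dict, and plain iter():
via_keys = next((k for k in uniques.keys()), None)
via_genexp = next((k for k in uniques), None)
via_iter = next(iter(uniques), None)
print(via_keys, via_genexp, via_iter)  # y y y

# With nothing to yield, next() falls back to the default instead of raising.
print(next(iter(empty), "no unique character"))  # no unique character
```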

Question 12

Style

From PEP 8, function names should be lowercase and use underscores to separate words:

def first_non_repeater(s):
    ...

Implementation

In Python 3.6+, dictionaries maintain insertion order (a language guarantee from 3.7 onward); if you're using an earlier version, PEP 372 adds collections.OrderedDict. If collections.Counter used an OrderedDict, you could solve this problem with that, but it doesn't, so we will do that work instead:

from collections import OrderedDict

def first_non_repeater(s):
    smap = OrderedDict()
    for char in s:
        if char in smap:
            smap[char] += 1
        else:
            smap[char] = 1
    for char, count in smap.items():
        if count == 1:
            return char
    return None

In the best case, this is O(N), worst case it is O(2N), and average case you would expect O(3N/2), all of which is just O(N).
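
To make those estimates concrete, the sketch below counts loop iterations rather than timing anything. count_steps is a hypothetical instrumented copy of the function above (using a plain dict, which assumes Python 3.7+ ordering); it returns the result together with the number of iterations of each loop.

```python
def count_steps(s):
    """Return (result, first_loop_steps, second_loop_steps).

    Hypothetical helper for illustration only.
    """
    smap = {}
    first = 0
    for char in s:
        first += 1
        smap[char] = smap.get(char, 0) + 1
    second = 0
    for char, count in smap.items():
        second += 1
        if count == 1:
            return char, first, second
    return None, first, second


print(count_steps("abc"))    # ('a', 3, 1): the second loop stops immediately
print(count_steps("aabbc"))  # ('c', 5, 3): the second loop visits every distinct character
print(count_steps("aabb"))   # (None, 4, 2)
```

The second loop runs over distinct characters rather than over the whole string, so its length is bounded by the number of distinct characters in the input.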

The room for improvement here would be to eliminate this:

for char, count in smap.items():
    if count == 1:
        return char

But removing this requires more complicated code and makes it more bug-prone with little benefit. As pointed out by @AJNeufeld, your code won't work for "xxyzz" and many other cases. Avoiding a second pass through the characters in your text will add a considerable amount of complexity to your code, and might not even make the solution faster as it will require extra checks in your initial for loop, and in the best case it still doesn't change the complexity from O(N).

Question 13

O(N), O(2N), and O(3N/2) are technically all the same... It doesn't make much sense to use Big-O this way.

Question 14

The main issue I see with this code is the lack of any unit tests. It's quite simple to invoke the test framework at the end of the module:

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Now, just add some test cases:

def firstNonRepeater(s):
    """
    Returns the first character in the input string
    which is not repeated later in the string

    >>> firstNonRepeater('')

    >>> firstNonRepeater('a')
    'a'
    >>> firstNonRepeater('aa')

    >>> firstNonRepeater('ab')
    'a'
    >>> firstNonRepeater('aab')
    'b'
    >>> firstNonRepeater('aba')
    'b'
    """

Even this small set of tests reveals problems that need to be fixed:

**********************************************************************
File "264433.py", line 8, in __main__.firstNonRepeater
Failed example:
    firstNonRepeater('')
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python3.9/doctest.py", line 1336, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest __main__.firstNonRepeater[0]>", line 1, in <module>
        firstNonRepeater('')
      File "264433.py", line 32, in firstNonRepeater
        if smap[out] > 1:
    KeyError: ''
**********************************************************************
File "264433.py", line 16, in __main__.firstNonRepeater
Failed example:
    firstNonRepeater('aab')
Expected:
    'b'
Got nothing
**********************************************************************
1 items had failures:
   2 of   6 in __main__.firstNonRepeater
***Test Failed*** 2 failures.
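
For comparison, here is a sketch of the same examples attached to a Counter-based implementation, which should run without reporting failures. The function body is borrowed from the accepted answer rather than from the code under review, and the snake_case name is an assumption following PEP 8.

```python
from collections import Counter


def first_non_repeater(s):
    """Return the first character that occurs exactly once in s, else None.

    >>> first_non_repeater('')
    >>> first_non_repeater('a')
    'a'
    >>> first_non_repeater('aa')
    >>> first_non_repeater('ab')
    'a'
    >>> first_non_repeater('aab')
    'b'
    >>> first_non_repeater('aba')
    'b'
    """
    # Counter keeps first-seen order on Python 3.7+ (assumed here).
    for char, n in Counter(s).items():
        if n == 1:
            return char
    return None


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

A doctest example whose call returns None expects no output, which is why the '' and 'aa' cases have no result line.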

Accepted answer (the Counter-based solution above): FMc, 2021-07-26 23:40:14Z

