5
\$\begingroup\$

Here is my code which tests whether a given string has unique characters. The function is_unique which does a linear search for each character in the string is \$O(n^2)\$ (if I am right). Is there any more efficient way of doing the same with more efficient DS that takes less time (or space)?

I know with some library functions it can be done more easily, such as len(set(strr)) == len(strr)

def is_unique(strr):
 """Without using any library functions"""
 def _contains(string, char):
 for c in string:
 if c == char: return True
 return False
 for index in range(len(strr)):
 if _contains(strr[:index], strr[index]): return False
 return True 
def main():
 for string in ['abc', 'abcc', 'aabc']:
 print is_unique(string)
if __name__ == '__main__':
 main()

Edit 1 (Adds constraints)

  1. Don't change the original string and memory used should be \$O(1)\$ hence sorting the string may not be the valid solution.
asked Nov 18, 2014 at 5:48
\$\endgroup\$

2 Answers 2

4
\$\begingroup\$

You can do it in linear time via frequency counting. Make an array that is indexable by characters and initialise it with 0. Run over the string, incrementing the frequency for each character until that results in a value greater than 1 or the string is exhausted. Done.

EXTENDED EXPLANATION

Here is the algorithm expressed in pseudo code, which makes it easy to assess its complexity:

var seen: array [char] of boolean
seen = false
for each c in string_to_be_tested
 if seen[c]
 return false // string is not free duplicates
 seen[c] = true
return true // no character occurs more than once

That is the algorithm as such. CodeYogi has already posted a short and succinct pythonic approximation of it (first as a post scriptum, later it was moved into the question body):

len(set(strr)) == len(strr)

This compares the cardinality of the set of characters contained in the string to its length.

The set contains each character at most once, regardless how often it was added. This means that the cardinality of the set can only be equal to the length of the string if each of the characters occurred exactly once. This is not an algorithm as such, it is a functional expression whose algorithmic complexity is more difficult to assess than that of the simpler - if more verbose - imperative pseudo code.

When I wrote the first paragraph of the answer I was thinking along the lines of its most natural/efficient expression, which is in terms of the operation BTS ('bit test and set', or _bittestandset() as a C intrinsic). Here is a rendering of the algorithm in C, for the usual null-terminated strings:

// allocation and zeroing of the bitmap elided - too ugly and not relevant
bts(bitmap, 0); // pretend we've seen the terminator, so that we drop out at the end
while (!bts(bitmap, *s++))
 ;
return s[-1] == '0円'; // arrived at the end - unique, no duplicate characters

However, the BTS operation tends to be available only in assembly languages and C/C++; hence the description of the algorithm with an array of integer counts, which is much more widely available and which allows a compact approximation of the 'test and set' as 'increment and test':

if (++counts[string[i]] > 1)
 return false

That array is a (partial) frequency count, of course. It is possible to do it exactly like this in python, but that would do neither the algorithm nor the language any justice; it would be ugly and inefficient.

answered Nov 18, 2014 at 6:07
\$\endgroup\$
3
  • 1
    \$\begingroup\$ "Make an array that is indexable by characters and initialise it with 0" I think you mean python dictionaries here ({'a': 0, 'b': 0, ...}) right? \$\endgroup\$ Commented Nov 18, 2014 at 8:33
  • \$\begingroup\$ Yes and no. The algorithm as such uses a vector of booleans indexed by character to remember which characters have already been seen; a good approximation for this in python is a set of characters. I'll amend my post to make things clearer. Thanks for pointing out the problem! \$\endgroup\$ Commented Nov 18, 2014 at 8:42
  • 1
    \$\begingroup\$ Also, an early return if we know the character encoding can be helpful too. Eg: if (len(strr) > 256) return False \$\endgroup\$ Commented Nov 19, 2014 at 2:35
1
\$\begingroup\$

Obviously, if the given string is sorted, finding an answer is a matter of a linear search (\$O(n)\$). Sorting a string is \$O(n \log n)\,ドル which surely beats \$O(n^2)\$.

answered Nov 18, 2014 at 6:43
\$\endgroup\$
2
  • \$\begingroup\$ No, its not sure if the string is sorted. \$\endgroup\$ Commented Nov 18, 2014 at 8:30
  • \$\begingroup\$ I think you mean O(nlogn) + O(n) = O(nlogn) \$\endgroup\$ Commented Nov 18, 2014 at 9:09

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.