I was given a task where given a string of '0's and '1's, return the longest substring where the count of '0's and '1's are equal. I was told also that the code should be O(n) and space O(1).
Below is what I came up with:
def max_equal_sub(s):
'''
I get the substrings by iterating over the starts and the ends for each start. For each
of these, check if the counts are equal.
'''
return max((s[b:e] for b in range(len(s)) for e in range(b, len(s)) if s[b:e].count('1') == s[b:e].count('0')), key=len)
def main():
TESTS = ('000', '1111', '001010101', '0001010101', '01010101', '000000111010101',
'0000001110101011', '00000011101010111111111111111')
for binstr in TESTS:
print(max_equal_sub(binstr))
if __name__ == '__main__':
main()
This code is all working fine, but I was told that this runs at O(n2), not O(n).
Following that I have some questions:
- What makes it O(n2), and how can I make it so that it is O(n).
- I was told that it needed to be 'space O(1)'. Does it run at that? What would make it run at space O(1) and why?
- In the function
max_equal_sub
, would it be more efficient to putlen(s)
into a variable to avoid calling it again and again? - In the function
max_equal_sub
, I dos[b:e]
three times. Surely there is a more efficient way?
(Optional, but favoured):
- Are there any other things I could do to make it faster?
- You could make it PEP8-compliant along the way if you wish.
- Any other recommendations, perhaps regarding program logic.
3 Answers 3
Taking this from an algorithmic perspective, rather than a python perspective (I don't know python well enough to write something sensible in it).
Your algorithm can be expressed as the logic:
- Create a virtual list of balanced strings
- Starting from every position in the input string...
- find every substring from that point.
- check to see whether that substring is balanced
- if it is balanced, add it to the virtual list.
- from the virtual list, pull the longest substring.
Assuming the list remains virtual (yielded instead of real), then your algorithm is essentially space-complexity O(1). If the virtual list is really generated, then the space complexity is O(n2)
The time complexity is something more than O(n2). Probably closer to O(n3). For each b
value ( O(n) ) you find all e
values (also O(n) ), and then after doing that, you count each 1 and each 0, which is another two O(n) operations.
So, your code runs at O(n3) not O(n2)
Code Review is not really about helping you rewrite the code, but, there are some changes I can recommend...
Firstly, in complexity analysis, doing two O(n) operations one after the other, is still time-complexity of O(n) ( time complexity of O(n) means that if you double the size of the input data, the time to execute will double.... and that is true if you have 1 O(n) operation or 100 O(n) operations one after the other).
Use this to your advantage.
I can't spend the time to do it all for you, and I am not even certain how to arrive at a fully O(n) solution, but the process you should probably follow is:
- scan all the characters once, and count the 1's and count the 0's - O(n)
- find out which one is more common (difference between overall counts) - O(1)
- call the common character C
- call the uncommon character U
- scan all characters again, and calculate - O(n)
- where the longest sequence of C's is, and how long it is
scan all the characters again, and count the number of U characters before, and after the longest C sequence. O(n)
Decide whether it is possible to contain the complete longest C sequence in the result.
- ..... from here it gets fuzzy, and I m not sure you can keep it to
O(n)
, but you get the idea.
You Could just loop once over the array, starting with a net count 0 and then incrementing by 1 if 1, else decrementing it by 1.
First create an empty map, then do the loop: At each point in the loop, after updating the net count check if it exists in the map,
If it does not exist add it to the map with key =net_count and value=the current index.
If it does exist, then calculate
dist = current_index - map[net_count]
If dist > current_maximum
current_maximum = dist
This would be O(n) in time and in space O(1) in the best case (a 0 1 0 1) in the best case, in the worst case it would be O(n) when all zeroes our ones, on average it is as Janne Karila points out approximately 1.6 * sqrt(n)
-
\$\begingroup\$ That would probably be log n memory on average. \$\endgroup\$Suor– Suor2014年04月07日 15:06:48 +00:00Commented Apr 7, 2014 at 15:06
-
\$\begingroup\$ Thank you Suor, I am not sure, you are possible right indeed. It would be great if someone could share more light on this. \$\endgroup\$Leopoldo Salvo– Leopoldo Salvo2014年04月07日 16:20:27 +00:00Commented Apr 7, 2014 at 16:20
-
1\$\begingroup\$ My experimental result is that the average final size of the dict is approximately 1.6 * sqrt(n) \$\endgroup\$Janne Karila– Janne Karila2014年04月07日 17:08:29 +00:00Commented Apr 7, 2014 at 17:08
-
\$\begingroup\$ Square root makes sense if you compare this to brownian motion. \$\endgroup\$Suor– Suor2014年04月10日 07:39:35 +00:00Commented Apr 10, 2014 at 7:39
This is time O(n) and space O(1). I was going to describe the algorithm, but seeing the code is easier.
(forgive any clumsy python - I'm a Java dev):
def longestBalancedSubstring(input) :
zerocount = onecount = max = 0
previous = input[0]
i=0
while i < len(input):
current = input[i]
if(current==previous):
if(current=='0'):
zerocount = zerocount + 1
else:
onecount = onecount + 1
if(min(zerocount, onecount) > max):
max = min(zerocount, onecount)
result = input[i-(max*2)+1:i+1]
if(current!=previous):
if(current=='0'):
zerocount = 1
else:
onecount = 1
previous = input[i]
i = i+1
return result
-
\$\begingroup\$ Returns
11
for0110
. \$\endgroup\$Janne Karila– Janne Karila2014年04月08日 06:01:08 +00:00Commented Apr 8, 2014 at 6:01 -
\$\begingroup\$ This solution achieves O(1) space complexity by only keeping count of consecutive zeros and ones before the current position. It cannot find substrings with interspersed zeros and ones. (Also, there are bugs in corner cases, but those could be fixed) \$\endgroup\$Janne Karila– Janne Karila2014年04月09日 07:23:51 +00:00Commented Apr 9, 2014 at 7:23
Explore related questions
See similar questions with these tags.