Longest 'balanced' substring

Question 1

I was given a task where given a string of '0's and '1's, return the longest substring where the count of '0's and '1's are equal. I was told also that the code should be O(n) and space O(1).

Below is what I came up with:

def max_equal_sub(s):
 '''
 I get the substrings by iterating over the starts and the ends for each start. For each 
 of these, check if the counts are equal.
 '''
 return max((s[b:e] for b in range(len(s)) for e in range(b, len(s)) if s[b:e].count('1') == s[b:e].count('0')), key=len)
def main():
 TESTS = ('000', '1111', '001010101', '0001010101', '01010101', '000000111010101', 
 '0000001110101011', '00000011101010111111111111111')
 for binstr in TESTS:
 print(max_equal_sub(binstr))
if __name__ == '__main__':
 main()

This code is all working fine, but I was told that this runs at O(n²), not O(n).

Following that I have some questions:

What makes it O(n²), and how can I make it so that it is O(n).
I was told that it needed to be 'space O(1)'. Does it run at that? What would make it run at space O(1) and why?
In the function max_equal_sub, would it be more efficient to put len(s) into a variable to avoid calling it again and again?
In the function max_equal_sub, I do s[b:e] three times. Surely there is a more efficient way?

(Optional, but favoured):

Are there any other things I could do to make it faster?
You could make it PEP8-compliant along the way if you wish.
Any other recommendations, perhaps regarding program logic.

Question 2

Taking this from an algorithmic perspective, rather than a python perspective (I don't know python well enough to write something sensible in it).

Your algorithm can be expressed as the logic:

Create a virtual list of balanced strings
Starting from every position in the input string...
1. find every substring from that point.
2. check to see whether that substring is balanced
3. if it is balanced, add it to the virtual list.
from the virtual list, pull the longest substring.

Assuming the list remains virtual (yielded instead of real), then your algorithm is essentially space-complexity O(1). If the virtual list is really generated, then the space complexity is O(n²)

The time complexity is something more than O(n²). Probably closer to O(n³). For each b value ( O(n) ) you find all e values (also O(n) ), and then after doing that, you count each 1 and each 0, which is another two O(n) operations.

So, your code runs at O(n³) not O(n²)

Code Review is not really about helping you rewrite the code, but, there are some changes I can recommend...

Firstly, in complexity analysis, doing two O(n) operations one after the other, is still time-complexity of O(n) ( time complexity of O(n) means that if you double the size of the input data, the time to execute will double.... and that is true if you have 1 O(n) operation or 100 O(n) operations one after the other).

Use this to your advantage.

I can't spend the time to do it all for you, and I am not even certain how to arrive at a fully O(n) solution, but the process you should probably follow is:

scan all the characters once, and count the 1's and count the 0's - O(n)
find out which one is more common (difference between overall counts) - O(1)
- call the common character C
- call the uncommon character U
scan all characters again, and calculate - O(n)
- where the longest sequence of C's is, and how long it is
scan all the characters again, and count the number of U characters before, and after the longest C sequence. O(n)
Decide whether it is possible to contain the complete longest C sequence in the result.
..... from here it gets fuzzy, and I m not sure you can keep it to O(n), but you get the idea.

Question 3

You Could just loop once over the array, starting with a net count 0 and then incrementing by 1 if 1, else decrementing it by 1.

First create an empty map, then do the loop: At each point in the loop, after updating the net count check if it exists in the map,

If it does not exist add it to the map with key =net_count and value=the current index.

If it does exist, then calculate

dist = current_index - map[net_count]
If dist > current_maximum
 current_maximum = dist

This would be O(n) in time and in space O(1) in the best case (a 0 1 0 1) in the best case, in the worst case it would be O(n) when all zeroes our ones, on average it is as Janne Karila points out approximately 1.6 * sqrt(n)

Question 4

That would probably be log n memory on average.

Question 5

Thank you Suor, I am not sure, you are possible right indeed. It would be great if someone could share more light on this.

Question 6

My experimental result is that the average final size of the dict is approximately 1.6 * sqrt(n)

Question 7

Square root makes sense if you compare this to brownian motion.

Question 8

This is time O(n) and space O(1). I was going to describe the algorithm, but seeing the code is easier.

(forgive any clumsy python - I'm a Java dev):

def longestBalancedSubstring(input) :
 zerocount = onecount = max = 0
 previous = input[0]
 i=0
 while i < len(input):
 current = input[i]
 if(current==previous):
 if(current=='0'):
 zerocount = zerocount + 1
 else:
 onecount = onecount + 1
 if(min(zerocount, onecount) > max):
 max = min(zerocount, onecount)
 result = input[i-(max*2)+1:i+1]
 if(current!=previous):
 if(current=='0'):
 zerocount = 1
 else:
 onecount = 1
 previous = input[i]
 i = i+1
 return result

Question 9

Returns 11 for 0110.

Question 10

This solution achieves O(1) space complexity by only keeping count of consecutive zeros and ones before the current position. It cannot find substrings with interspersed zeros and ones. (Also, there are bugs in corner cases, but those could be fixed)

rolfl rolfl 98.1k17 gold badges219 silver badges419 bronze badges · Answer 1 · 2014-04-06 15:09:50Z

Taking this from an algorithmic perspective, rather than a python perspective (I don't know python well enough to write something sensible in it).

Your algorithm can be expressed as the logic:

Create a virtual list of balanced strings
Starting from every position in the input string...
1. find every substring from that point.
2. check to see whether that substring is balanced
3. if it is balanced, add it to the virtual list.
from the virtual list, pull the longest substring.

Assuming the list remains virtual (yielded instead of real), then your algorithm is essentially space-complexity O(1). If the virtual list is really generated, then the space complexity is O(n²)

The time complexity is something more than O(n²). Probably closer to O(n³). For each b value ( O(n) ) you find all e values (also O(n) ), and then after doing that, you count each 1 and each 0, which is another two O(n) operations.

So, your code runs at O(n³) not O(n²)

Code Review is not really about helping you rewrite the code, but, there are some changes I can recommend...

Firstly, in complexity analysis, doing two O(n) operations one after the other, is still time-complexity of O(n) ( time complexity of O(n) means that if you double the size of the input data, the time to execute will double.... and that is true if you have 1 O(n) operation or 100 O(n) operations one after the other).

Use this to your advantage.

I can't spend the time to do it all for you, and I am not even certain how to arrive at a fully O(n) solution, but the process you should probably follow is:

scan all the characters once, and count the 1's and count the 0's - O(n)
find out which one is more common (difference between overall counts) - O(1)
- call the common character C
- call the uncommon character U
scan all characters again, and calculate - O(n)
- where the longest sequence of C's is, and how long it is
scan all the characters again, and count the number of U characters before, and after the longest C sequence. O(n)
Decide whether it is possible to contain the complete longest C sequence in the result.
..... from here it gets fuzzy, and I m not sure you can keep it to O(n), but you get the idea.

Leopoldo Salvo Leopoldo Salvo 4562 silver badges3 bronze badges · Answer 2 · 2014-04-07 12:39:10Z

You Could just loop once over the array, starting with a net count 0 and then incrementing by 1 if 1, else decrementing it by 1.

First create an empty map, then do the loop: At each point in the loop, after updating the net count check if it exists in the map,

If it does not exist add it to the map with key =net_count and value=the current index.

If it does exist, then calculate

dist = current_index - map[net_count]
If dist > current_maximum
 current_maximum = dist

This would be O(n) in time and in space O(1) in the best case (a 0 1 0 1) in the best case, in the worst case it would be O(n) when all zeroes our ones, on average it is as Janne Karila points out approximately 1.6 * sqrt(n)

Thank you Suor, I am not sure, you are possible right indeed. It would be great if someone could share more light on this.
My experimental result is that the average final size of the dict is approximately 1.6 * sqrt(n)
Square root makes sense if you compare this to brownian motion.

James James 111 bronze badge · Answer 3 · 2014-04-07 22:47:49Z

This is time O(n) and space O(1). I was going to describe the algorithm, but seeing the code is easier.

(forgive any clumsy python - I'm a Java dev):

def longestBalancedSubstring(input) :
 zerocount = onecount = max = 0
 previous = input[0]
 i=0
 while i < len(input):
 current = input[i]
 if(current==previous):
 if(current=='0'):
 zerocount = zerocount + 1
 else:
 onecount = onecount + 1
 if(min(zerocount, onecount) > max):
 max = min(zerocount, onecount)
 result = input[i-(max*2)+1:i+1]
 if(current!=previous):
 if(current=='0'):
 zerocount = 1
 else:
 onecount = 1
 previous = input[i]
 i = i+1
 return result

This solution achieves O(1) space complexity by only keeping count of consecutive zeros and ones before the current position. It cannot find substrings with interspersed zeros and ones. (Also, there are bugs in corner cases, but those could be fixed)

Stack Exchange Network

Longest 'balanced' substring

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Longest 'balanced' substring

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions