I'm wondering about the optimal complexity - or at the very least, some way of achieving non-terrible complexity - of a particular stack variant that I'm calling a 'nonrepetitive stack'.
A nonrepetitive stack is like an ordinary stack, except that a push that would result in a repeated subsequence fails, not updating the stack but instead returning the location of the subsequence that would be repeated. (To disambiguate, because "repeated subsequence" can mean multiple things: I mean multiple contiguous copies of a contiguous subsequence; if you treat the stack as a string, something matching the regex `.*(.+)1円.*`.)
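For concreteness, here is a tiny brute-force illustration of that definition on plain strings, using Python's regex backreferences (the name has_square is just for this example, not part of the proposed data structure):

import re

def has_square(s: str) -> bool:
    # A "square": two adjacent copies of the same nonempty substring.
    return re.search(r"(.+)1円", s) is not None

assert not has_square("ABA")
assert has_square("ABAB")  # "AB" "AB"
assert has_square("ABB")   # "B" "B"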
(Assume the usual model, e.g. comparing two individual items for equality is $O(1)$.)
The completely naive approach would be to check the entire stack for any repeated subsequences after each push, and undo the push and fail if one is found. Each check, and thus push, appears to be $O(n^3)$ in the current size of the stack, worst-case.
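A minimal sketch of that naive whole-stack check, just to pin down where the $O(n^3)$ comes from: there are $O(n^2)$ candidate (start, length) pairs, and each one costs $O(\text{length})$ to compare (the function name is mine):

def find_square_anywhere(s):
    # Naive scan: try every block length and every start position.
    n = len(s)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            if s[start:start + length] == s[start + length:start + 2 * length]:
                # (repeated block, start of first copy, start of second copy)
                return s[start:start + length], start, start + length
    return None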
We can do somewhat better by instead noting that the stack can never actually include any repeated substrings (as you cannot introduce a repeated substring by popping from a stack, and pushes that would introduce a repeated substring instead fail), and so a push only needs to check potential repeats that include the top of stack. This gets us down to $O(n^2)$ in the size of the stack per push.
Some (terrible) Python pseudocode for this approach, to illustrate (again: this code is just to illustrate; please don't focus on the exact code):
def push(s, x):
    s.append(x)
    for i in range(1, len(s) // 2 + 1):
        # this comparison is _not_ O(1) time.
        if s[-2*i:-i] == s[-i:]:
            ret = s[-i:], len(s) - 2*i, len(s) - i
            s.pop()
            return True, *ret
    return False, None

def pop(s):
    return s.pop()

def check(act, exp):
    assert act == exp, (act, exp)
s = []
check(push(s, "A"), (False, None))
check(push(s, "B"), (False, None))
check(push(s, "A"), (False, None))
check(s, ["A", "B", "A"])
# ABAB would have a repeated subsequence AB AB
check(push(s, "B"), (True, ["A", "B"], 0, 2))
check(pop(s), "A")
# ABB would have a repeated subsequence B B
check(push(s, "B"), (True, ["B"], 1, 2))
(Worst-case complexity here is $\sum_{i=1}^{\lfloor n/2 \rfloor} i$, which is $O(n^2)$. Best case for a successful push is when each comparison looks at one element and then short-circuits, which works out to $O(n)$. [n.b. I know the slice comparisons I showed don't actually work that way in CPython: each slice is copied in full before being compared.] Best case for an unsuccessful push is, of course, $O(1)$.)
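Spelling that out: iteration $i$ of the loop in push costs $O(i)$ for the slice comparison, so a single push costs at most
$$\sum_{i=1}^{\lfloor n/2 \rfloor} i = \frac{\lfloor n/2 \rfloor\,(\lfloor n/2 \rfloor + 1)}{2} = \Theta(n^2)$$
in the worst case.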
Is there a way of doing better here? In particular, is it possible to achieve (amortized) worst-case complexity for pushes that is sublinear in the size of the stack?
1 Answer
Kosolobov [1] solved this exact problem. The first algorithm in the paper supports stack operations on a string while detecting a repeated substring, and each operation takes amortized $O(\log m)$ time where $m$ is the maximum string length so far.
The algorithm works for unordered alphabets (only equality comparisons between characters are allowed), and the time complexity is essentially optimal in this setting, because detecting a repeated substring requires $\Omega(n \log n)$ time for unordered alphabets [2].
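To connect this back to the interface in the question: push/pop on a nonrepetitive stack map directly onto the paper's "append a character" and "backtrack (remove the last character)" operations. A rough sketch of that wiring, assuming a detector object with hypothetical append/undo methods (placeholder names, not the paper's or any library's API):

class NonrepetitiveStack:
    def __init__(self, detector):
        # `detector` is assumed to be an online square detector that supports
        # backtracking (e.g. an implementation of Kosolobov's first algorithm).
        # Assumed interface, for illustration only:
        #   detector.append(x) -> None if still square-free, else info about the square
        #   detector.undo()    -> remove the most recently appended character
        self._items = []
        self._detector = detector

    def push(self, x):
        self._items.append(x)
        square = self._detector.append(x)  # amortized O(log m) per the paper
        if square is not None:
            self._detector.undo()          # roll back the rejected character
            self._items.pop()
            return True, square
        return False, None

    def pop(self):
        self._detector.undo()
        return self._items.pop()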
- [1]: Kosolobov, Dmitry. "Online detection of repetitions with backtracking." Annual Symposium on Combinatorial Pattern Matching. Springer, Cham, 2015. https://arxiv.org/abs/1412.4471
- [2]: Main, Michael G., and Richard J. Lorentz. "An O(n log n) algorithm for finding all repetitions in a string." Journal of Algorithms 5.3 (1984): 422-432.
- Great! I'll have to take a look through, but this appears to be exactly what I need. Out of curiosity, did you go hunting for this, or had you already seen it? (I'm trying to improve my skills at seeking such things out...) – TLW, Oct 1, 2022
- Yep, that looks promising. Thank you! – TLW, Oct 2, 2022
- @TLW I found the paper using Google Scholar. I used search terms such as "online" (en.wikipedia.org/wiki/Online_algorithm), "square" (repeated substring), and "algorithm" (to filter out non-CS usages of "square" and "online"), and also used the "Cited by" list. – pcpthm, Oct 2, 2022
- Mm. Looks like "square" was the key term there, which is a term I'd forgotten. I also kept looking for data structures as opposed to online algorithms. Thanks for the answer and tips! – TLW, Oct 2, 2022
- Consider, e.g., ABaABbABcABdABeABfABgABhABi.... So even assuming one somehow stores and updates the set of substrings efficiently, one goes to look up AB and gets $O(n)$ matches that one must somehow sort through. There very well might be a way to get an overall speedup even so; I have as of yet been unable to find one.
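To make that concrete, a quick (assumed) reconstruction of the kind of stack being described: the string stays square-free even though "AB" occurs $\Theta(n)$ times, so a substring lookup alone still leaves linearly many candidates to sift through:

import re
import string

# "AB" interleaved with distinct, single-use separators: ABaABbABc...ABz
s = "".join("AB" + c for c in string.ascii_lowercase)

assert re.search(r"(.+)1円", s) is None  # square-free
print(s.count("AB"))  # 26 occurrences of "AB"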