4
$\begingroup$

I'm wondering about the optimal complexity - or at the very least, some way of achieving non-terrible complexity - of a particular stack variant, that I'm calling a 'nonrepetitive stack'.

A nonrepetitive stack is like an ordinary stack, except that a push that would result in a repeated subsequence fails, not updating the stack but instead returning the location of the subsequence that would be repeated. (To disambiguate, because "repeated subsequence" can mean multiple things: I mean multiple contiguous copies of contiguous subsequences. If you treat the stack as a string, something matching .*(.+)1円.*.)

(Assume the usual model, e.g. comparing two individual items for equality is $O(1)$.)

The completely naive approach would be to check the entire stack for any repeated subsequences after each push, and undo the push and fail if one is found. Each check, and thus push, appears to be $O(n^3)$ in the current size of the stack, worst-case.

We can do somewhat better by instead noting that the stack can never actually include any repeated substrings (as you cannot introduce a repeated substring by popping from a stack, and pushes that would introduce a repeated substring instead fail), and so a push only needs to check potential repeats that include the top of stack. This gets us down to $O(n^2)$ in the size of the stack per push.

Some (terrible) Python pseudocode for this approach, to illustrate (again: this code is just to illustrate. Please don't focus on the exact code.)

def push(s, x):
 s.append(x)
 for i in range(1, len(s)//2 + 1):
 # this comparison is _not_ O(1) time.
 if s[-2*i:-i] == s[-i:]:
 ret = s[-i:], len(s)-2*i, len(s)-i
 s.pop()
 return True, *ret
 return False, None
def pop(s):
 return s.pop()
def check(act, exp):
 assert act == exp, (act, exp)
s = []
check(push(s, "A"), (False, None))
check(push(s, "B"), (False, None))
check(push(s, "A"), (False, None))
check(s, ["A", "B", "A"])
# ABAB would have a repeated subsequence AB AB
check(push(s, "B"), (True, ["A", "B"], 0, 2))
check(pop(s), "A")
# ABB would have a repeated subsequence B B
check(push(s, "B"), (True, ["B"], 1, 2))

(Worst-case complexity here is $\sum_{i=1}^{n/2 + 1}i$, which is $O(n^2)$. Best-case for a successful push is if each array comparison compares one element and then short-circuits, which works out to $O(n)$. [n.b. I know that CPython's array comparison with slices I showed doesn't actually work that way.] Best-case for an unsuccessful push is, of course, $O(1)$.)

Is there a way of doing better here? In particular, is it possible to achieve (amortized) worst-case complexity for pushes that is sublinear in the size of the stack?

喜欢算法和数学
39.2k4 gold badges35 silver badges95 bronze badges
asked Sep 30, 2022 at 2:41
$\endgroup$
4
  • 1
    $\begingroup$ Brainstorming: Can you get anywhere by storing the set of substrings in a trie? (reading out symbols in the stack starting from the top of stack and proceeding to bottom of stack) Would a dynamic suffix array be useful? $\endgroup$ Commented Sep 30, 2022 at 5:21
  • $\begingroup$ I've been thinking / attempting to use something involving storing suffixes or substrings; the major difficulty I keep running into is that "every" suffix changes every time you add /remove a character. I haven't been able to figure out a way of handling this efficiently, either with a standard suffix tree (or suffix array) or in general. $\endgroup$ Commented Sep 30, 2022 at 6:54
  • $\begingroup$ An arbitrary subsequence can occur an arbitrary number of times in the stack without being repetitive. E.g. ABaABbABcABdABeABfABgABhABi.... So even assuming one somehow stores and updates the set of substrings efficiently, one goes to look up AB and gets O(n) matches that one must somehow sort through. There very well might be a way to get an overall speedup even so; I have as of yet been unable to find one. $\endgroup$ Commented Sep 30, 2022 at 7:07
  • 1
    $\begingroup$ There are apparently sophisticated algorithms to maintain a suffix array even as the underlying string is changing, with polylog time per update to the string. So if a dynamic suffix array would help, there might be solutions for how to update it. That said, I can't quite tell how to use a suffix array, even if there were a way to update it efficiently. $\endgroup$ Commented Sep 30, 2022 at 7:48

1 Answer 1

4
$\begingroup$

Kosolobov [1] solved this exact problem. The first algorithm in the paper supports stack operations on a string while detecting a repeated substring, and each operation takes amortized $O(\log m)$ time where $m$ is the maximum string length so far.

The algorithm works for unordered alphabets (only equality comparison between characters are allowed), and the time complexity is essentially optimal in this setting because detecting a repeated substring requires $\Omega(n \log n)$ time for unordered alphabets [2].

  • [1]: Kosolobov, Dmitry. "Online detection of repetitions with backtracking." Annual Symposium on Combinatorial Pattern Matching. Springer, Cham, 2015. https://arxiv.org/abs/1412.4471
  • [2]: Main, Michael G., and Richard J. Lorentz. "An O(n log n) algorithm for finding all repetitions in a string." Journal of Algorithms 5.3 (1984): 422-432.
answered Oct 1, 2022 at 9:06
$\endgroup$
4
  • $\begingroup$ Great! I'll have to take a look through, but this appears to be exactly what I need. Out of curiosity, did you go hunting for this, or had you already seen it? (I'm trying to improve my skills at seeking such things out...) $\endgroup$ Commented Oct 1, 2022 at 18:51
  • $\begingroup$ Yep, that looks promising. Thank you! $\endgroup$ Commented Oct 2, 2022 at 1:23
  • 3
    $\begingroup$ @TLW I found the paper using Google Scholar. I used search terms such as "online" (en.wikipedia.org/wiki/Online_algorithm ) "square" (repeated substring) "algorithm" (filter out non-CS usage of "square" and "online" words) and also used "Cited by" list. $\endgroup$ Commented Oct 2, 2022 at 1:31
  • $\begingroup$ Mm. Looks like square was the key term there, which is a term I'd forgotten. I also kept looking for data structures as opposed to online algorithms. Thanks for the answer and tips! $\endgroup$ Commented Oct 2, 2022 at 1:43

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.