I'm wondering about the optimal complexity - or at the very least, some way of achieving non-terrible complexity - of a particular stack variant that I'm calling a 'nonrepetitive stack'.
A nonrepetitive stack is like an ordinary stack, except that a push that would result in a repeated subsequence fails, not updating the stack but instead returning the location of the subsequence that would be repeated. (To disambiguate, because "repeated subsequence" can mean multiple things: I mean multiple contiguous copies of a contiguous subsequence; if you treat the stack as a string, something matching the regex `.*(.+)1円.*`.)
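For concreteness, here is a tiny brute-force illustration of that definition on plain strings, using Python's regex backreferences (the name has_square is just for this example, not part of the proposed data structure):

import re

def has_square(s: str) -> bool:
    # A "square": two adjacent copies of the same nonempty substring.
    return re.search(r"(.+)1円", s) is not None

assert not has_square("ABA")
assert has_square("ABAB")  # "AB" "AB"
assert has_square("ABB")   # "B" "B"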
(Assume the usual model, e.g. comparing two individual items for equality is $O(1)$.)
The completely naive approach would be to check the entire stack for any repeated subsequences after each push, and undo the push and fail if one is found. Each check, and thus push, appears to be $O(n^3)$ in the current size of the stack, worst-case.
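A minimal sketch of that naive whole-stack check, just to pin down where the $O(n^3)$ comes from: there are $O(n^2)$ candidate (start, length) pairs, and each one costs $O(\text{length})$ to compare (the function name is mine):

def find_square_anywhere(s):
    # Naive scan: try every block length and every start position.
    n = len(s)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            if s[start:start + length] == s[start + length:start + 2 * length]:
                # (repeated block, start of first copy, start of second copy)
                return s[start:start + length], start, start + length
    return None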
We can do somewhat better by instead noting that the stack can never actually include any repeated substrings (as you cannot introduce a repeated substring by popping from a stack, and pushes that would introduce a repeated substring instead fail), and so a push only needs to check potential repeats that include the top of stack. This gets us down to $O(n^2)$ in the size of the stack per push.
Some (terrible) Python pseudocode for this approach, to illustrate (again: this code is just to illustrate; please don't focus on the exact code):
def push(s, x):
    s.append(x)
    for i in range(1, len(s) // 2 + 1):
        # this comparison is _not_ O(1) time.
        if s[-2*i:-i] == s[-i:]:
            ret = s[-i:], len(s) - 2*i, len(s) - i
            s.pop()
            return True, *ret
    return False, None

def pop(s):
    return s.pop()

def check(act, exp):
    assert act == exp, (act, exp)
s = []
check(push(s, "A"), (False, None))
check(push(s, "B"), (False, None))
check(push(s, "A"), (False, None))
check(s, ["A", "B", "A"])
# ABAB would have a repeated subsequence AB AB
check(push(s, "B"), (True, ["A", "B"], 0, 2))
check(pop(s), "A")
# ABB would have a repeated subsequence B B
check(push(s, "B"), (True, ["B"], 1, 2))
(Worst-case complexity here is $\sum_{i=1}^{\lfloor n/2 \rfloor} i$, which is $O(n^2)$. Best case for a successful push is when each comparison looks at one element and then short-circuits, which works out to $O(n)$. [n.b. I know the slice comparisons I showed don't actually work that way in CPython: each slice is copied in full before being compared.] Best case for an unsuccessful push is, of course, $O(1)$.)
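Spelling that out: iteration $i$ of the loop in push costs $O(i)$ for the slice comparison, so a single push costs at most
$$\sum_{i=1}^{\lfloor n/2 \rfloor} i = \frac{\lfloor n/2 \rfloor\,(\lfloor n/2 \rfloor + 1)}{2} = \Theta(n^2)$$
in the worst case.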
Is there a way of doing better here? In particular, is it possible to achieve (amortized) worst-case complexity for pushes that is sublinear in the size of the stack?
1 Answer
Kosolobov [1] solved this exact problem. The first algorithm in the paper supports stack operations on a string while detecting a repeated substring, and each operation takes amortized $O(\log m)$ time where $m$ is the maximum string length so far.
The algorithm works for unordered alphabets (only equality comparisons between characters are allowed), and the time complexity is essentially optimal in this setting, because detecting a repeated substring requires $\Omega(n \log n)$ time for unordered alphabets [2].
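To connect this back to the interface in the question: push/pop on a nonrepetitive stack map directly onto the paper's "append a character" and "backtrack (remove the last character)" operations. A rough sketch of that wiring, assuming a detector object with hypothetical append/undo methods (placeholder names, not the paper's or any library's API):

class NonrepetitiveStack:
    def __init__(self, detector):
        # `detector` is assumed to be an online square detector that supports
        # backtracking (e.g. an implementation of Kosolobov's first algorithm).
        # Assumed interface, for illustration only:
        #   detector.append(x) -> None if still square-free, else info about the square
        #   detector.undo()    -> remove the most recently appended character
        self._items = []
        self._detector = detector

    def push(self, x):
        self._items.append(x)
        square = self._detector.append(x)  # amortized O(log m) per the paper
        if square is not None:
            self._detector.undo()          # roll back the rejected character
            self._items.pop()
            return True, square
        return False, None

    def pop(self):
        self._detector.undo()
        return self._items.pop()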
- [1]: Kosolobov, Dmitry. "Online detection of repetitions with backtracking." Annual Symposium on Combinatorial Pattern Matching. Springer, Cham, 2015. https://arxiv.org/abs/1412.4471
- [2]: Main, Michael G., and Richard J. Lorentz. "An O(n log n) algorithm for finding all repetitions in a string." Journal of Algorithms 5.3 (1984): 422-432.
- Great! I'll have to take a look through, but this appears to be exactly what I need. Out of curiosity, did you go hunting for this, or had you already seen it? (I'm trying to improve my skills at seeking such things out...) – TLW, Oct 1, 2022
- Yep, that looks promising. Thank you! – TLW, Oct 2, 2022
- @TLW I found the paper using Google Scholar. I used search terms such as "online" (en.wikipedia.org/wiki/Online_algorithm), "square" (repeated substring), and "algorithm" (to filter out non-CS usages of "square" and "online"), and also used the "Cited by" list. – pcpthm, Oct 2, 2022
- Mm. Looks like "square" was the key term there, which is a term I'd forgotten. I also kept looking for data structures as opposed to online algorithms. Thanks for the answer and tips! – TLW, Oct 2, 2022
- Consider, e.g., ABaABbABcABdABeABfABgABhABi.... So even assuming one somehow stores and updates the set of substrings efficiently, one goes to look up AB and gets $O(n)$ matches that one must somehow sort through. There very well might be a way to get an overall speedup even so; I have as of yet been unable to find one.
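To make that concrete, a quick (assumed) reconstruction of the kind of stack being described: the string stays square-free even though "AB" occurs $\Theta(n)$ times, so a substring lookup alone still leaves linearly many candidates to sift through:

import re
import string

# "AB" interleaved with distinct, single-use separators: ABaABbABc...ABz
s = "".join("AB" + c for c in string.ascii_lowercase)

assert re.search(r"(.+)1円", s) is None  # square-free
print(s.count("AB"))  # 26 occurrences of "AB"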