4
$\begingroup$

I am interested in the fine-grained complexity of the following NFA infix rejection problem:

  • Input: a word $w$, a nondeterministic finite automaton $A$
  • Output: is there an infix of $w$ (i.e., some $u$ with $w = sut$ for some $s,t$) that is rejected by $A$

The naive algorithm for NFA infix rejection is to consider every infix of $w$ and run a state-set simulation of $A$ on each infix: this gives complexity $O(|w|^2 \times |w| \cdot |A|)$. Less naively, we can run $|w|$ state-set simulations from every starting position and see if we ever encounter a set of states that contains no final state (witnessing that some infix is rejected): this gives complexity $O(|w|^2 \cdot |A|)$.

In terms of lower bounds, it is easy to adapt the result of Backurs and Indyk. They show that, assuming SETH, we cannot determine whether an NFA $A'$ accepts a word $w'$ in time $O((|A'| \cdot |w'|)^{1-\epsilon})$ for any $\epsilon$. Up to adding unique delimiters at the beginning and end of the word, this easily implies that NFA infix rejection also does not have an $O((|w| \cdot |A|)^{1-\epsilon})$ algorithm assuming SETH.

Hence the question: it is possible to show a better conditional lower bound on NFA infix rejection, or can one somehow improve on the upper bound above?

Note that the problem where we want a infix accepted by $A$ is easily seen to be in $O(|w| \cdot |A|)$, simply by building $A'$ from $A$ that recognizes the language $\Sigma^* L(A) \Sigma^*$ and testing whether $A'$ accepts $w$ by state-set simulation.

(This problem is from a recent paper where we investigate related questions. We thought a bit about this NFA infix rejection problem specifically, but did not see what could be said to pinpoint its precise complexity. I am asking the question here in case there are insights from neighboring communities.)

asked Oct 8 at 12:35
$\endgroup$

2 Answers 2

1
$\begingroup$

Yes, there is a strictly stronger SETH-based conditional lower bound for NFA infix rejection, and the best general upper bound remains quadratic in $n$ with word-parallel speedups. In particular, when $m \ge n$ no general improvement below $O(n^{2-\varepsilon})$ is possible under SETH. The following binary and $\varepsilon$-free construction, together with a word-RAM upper bound, gives a complete picture.

Let $n = |w|$ and $m = |Q|$. Fix any $\alpha \in (0,1]$ and $\varepsilon > 0$. Unless SETH fails, there is no deterministic or randomized algorithm that, on all instances with $m = \Theta(n^\alpha)$, decides whether there exists a nonempty infix $u$ of $w$ with $\Delta_u(I) \cap F = \varnothing$ in time

$$O\big(n^{1+\alpha-\varepsilon}\big).$$

For $\alpha = 1$, that is $m = \Theta(n)$, this rules out $O(n^{2-\varepsilon})$. For $\alpha > 1$ the $\alpha = 1$ case already implies the same barrier. The claim holds even for binary, $\varepsilon$-free NFAs.

Reduce from orthogonal vectors on $U, V \subseteq \{0,1\}^d$ with $|U| = |V| = N$ and $d = c\log N$, which is the standard SETH setting. Operate over the binary alphabet $\Sigma = \{0,1\}$. Encode each bit with the 2-bit cell $\mathrm{enc}(0) = 01$ and $\mathrm{enc}(1) = 10$. Any concatenation of such cells may contain 00ドル$ or 11ドル$ across boundaries but never 000ドル$ or 111ドル$. Use self-synchronizing markers that contain a forbidden triple. Take headers $H_U = 0000$ and $H_V = 0001$ and a unique barrier $B = 1110$. Since cell streams avoid 000ドル$ and 111ドル$, neither a header nor the barrier can appear inside cells or straddling a cell boundary. Any substring containing 000ドルx$ aligns to a header and any substring containing 1110ドル$ aligns to the barrier. For $x \in \{0,1\}^d$ define $E_U(x) = H_U \cdot \mathrm{enc}(x_1)\cdots\mathrm{enc}(x_d)$ and $E_V(x) = H_V \cdot \mathrm{enc}(x_1)\cdots\mathrm{enc}(x_d)$, and form the text

$$w = E_U(u^{(1)})\cdots E_U(u^{(N)}) \cdot B \cdot E_V(v^{(1)})\cdots E_V(v^{(N)}).$$

By synchronization, every infix spanning $B$ decomposes uniquely as $E_U(u^{(p)}) \cdot B \cdot E_V(v^{(q)})$, and any other infix is malformed because it chops a header or has the wrong number of cells or the wrong number of barriers.

Build a binary, $\varepsilon$-free NFA $A$ as a disjoint union $A = A_{\mathrm{syn}} \uplus A_{\mathrm{pair}}$ with multiple initial states, so no $\varepsilon$ is used for the union. The component $A_{\mathrm{syn}}$ is an $\varepsilon$-free binary DFA with $O(d)$ states that accepts exactly the malformed strings. It checks that there is exactly one barrier $B$, that headers align, and that there are exactly $d$ cells after each header by maintaining alternation and a small counter. The component $A_{\mathrm{pair}}$ handles the well-formed case. It recognizes exactly strings of the shape $H_U \cdot \text{cells} \cdot B \cdot H_V \cdot \text{cells}$ and otherwise rejects. Upon $H_U$ it nondeterministically chooses $i \in \{1,\dots,d\}$, consumes exactly $d$ cells while checking that the $i$-th $U$-cell is $\mathrm{enc}(1) = 10$, then requires $B$ and $H_V$, then consumes exactly $d$ cells and accepts iff the $i$-th $V$-cell is $\mathrm{enc}(1) = 10$. Accepting states are placed only at end of input so strict superstrings do not accept. All transitions consume one input bit and there are no $\varepsilon$-moves. For a well-formed infix $x = E_U(u^{(p)}) \cdot B \cdot E_V(v^{(q)})$, $A_{\mathrm{pair}}$ accepts iff there exists $i$ with $u^{(p)}_i = v^{(q)}_i = 1$, hence $A$ rejects $x$ iff $\langle u^{(p)}, v^{(q)}\rangle = 0$. For malformed infixes $A_{\mathrm{syn}}$ accepts and thus the union accepts, so

$$\Big(\exists\ \text{nonempty infix } u \text{ of } w\ \text{with}\ u \notin L(A)\Big)\ \Longleftrightarrow\ \exists p,q\ \text{with}\ \langle u^{(p)}, v^{(q)}\rangle = 0.$$

The instance sizes satisfy $n = \Theta(N \cdot d) = \Theta(N\log N)$ and $|A| = \Theta(d)$ before padding. To realize $m = \Theta(n^\alpha)$ for any fixed $\alpha \in (0,1]$, pad by the disjoint union of $\varepsilon$-free binary DFA gadgets that never accept any substring of $w$, for example tiny DFAs for the pattern 1111ドル$, which never appears because cell streams avoid 111ドル$ and the unique barrier is 1110ドル$. This preserves the answer. The construction, including padding, takes time linear in the output size $O(n+m)$ with $n = \Theta(N\log N)$ and $m = \Theta(d)$ before padding. If an algorithm ran in $O(n^{1+\alpha-\varepsilon})$ time for all inputs with $m = \Theta(n^\alpha)$, plugging in $n = \Theta(N\log N)$ gives an algorithm for orthogonal vectors in

$$O\big((N\log N)^{1+\alpha-\varepsilon}\big) = O\big(N^{1+\alpha-\varepsilon/2}\big),$$

which contradicts SETH.

For the upper bound, use the quadratic-frontier dynamic program with bit-parallelism. Let $\Gamma$ be the reflexive $\varepsilon$-closure on $Q$. Define $\Delta_a(S) = \Gamma(\delta(\Gamma(S),a))$ and for $u = a_1\cdots a_k$ define $\Delta_u = \Delta_{a_k} \circ \cdots \circ \Delta_{a_1}$. These keep sets $\varepsilon$-closed. Compute $\varepsilon$-SCCs and the condensation and let $\pi: 2^Q \to \{0,1\}^s$ project to SCC ids. Let $F_{\mathrm{SCC}}$ be the reflexive $\varepsilon$-upward closure of SCCs containing finals. For each $a \in \Sigma$ precompute an SCC-level transfer mapping that from $\pi(T)$ produces $\pi(\Delta_a(T))$ in $O(s/W)$ operations on a word-RAM with size $W = \Theta(\log(n+m))$. Process starts sequentially to minimize space. For each $i$ from 1ドル$ to $n$ set $S \leftarrow \Gamma(I)$ and for $j$ from $i$ to $n$ update $S \leftarrow \Delta_{w[j]}(S)$ using the precomputed SCC transfer, and after each update test rejection of the current factor via $(\pi(S) \wedge F_{\mathrm{SCC}}) = 0$. The one-step predicate using a mask $\mathrm{Good}[a]$ answers whether $\Delta_a(T) \cap F = \varnothing$ for the next letter and is not used to certify the current factor. Each update takes $O(m/W)$ operations, there are $\Theta(n^2)$ updates overall, so the running time is

$$O\left(n^2 \cdot \frac{m}{W}\right).$$

The frontier uses $O(m/W)$ memory, and the precomputed tables are near-linear in $m$ and in the number of letter edges. When $m \ge n$ this matches the lower bound up to the factor $W$ and therefore cannot be improved to $O(n^{2-\varepsilon})$ in general even for binary and $\varepsilon$-free NFAs. When $m < n$ the same algorithm remains the best general bound without extra structure, and any improvement to $o(n^{1+\alpha})$ for $m = \Theta(n^\alpha)$ would contradict the lower bound at $\alpha$.

If empty infixes are allowed, first check whether $\Gamma(I)\cap F=\varnothing$. If it holds, the empty infix is already a witness and the answer is "yes". Otherwise continue, and from then on consider only nonempty infixes.

answered Oct 9 at 17:35
$\endgroup$
5
  • $\begingroup$ Hi, thanks for the answer! I see the proposed coding. I wonder if it couldn't be simplified a bit by having beginning and ending markers for the vectors (and different beginning and ending markers for the u and v vectors) so that $A_{\text{syn}}$ can accept anything that does not start by an u start marker and end by a v end marker or that contains no boundary $B$ (no need to count). I also think there may be a mistake in what $A_{\text{pairs}}$ accepts (the infix will span several $H_U,ドル not just one followed by cells, likewise after $B$). $\endgroup$ Commented Oct 9 at 22:10
  • $\begingroup$ Fortunately I think you can still define $A_{\text{pairs}}$ to have linear size. Code the vectors of $v$ as their mirror image. Now $A_{\text{pairs}}$ reads $H_U,ドル read some number of cells, guess $i$ and read a 10, move to a self-loop $q_i$ that skips until it nondeterminically chooses to read a 10, count $i$ characters, and then must see a v end marker and stop. Crucially the ending part can be mutualized between the various $q_i,ドル i.e., the nondetermininstic choices to read a 10 can jump at some point at distance $i$ from the end of one single path shared by all the $q_i$ $\endgroup$ Commented Oct 9 at 22:15
  • $\begingroup$ So up to these adjustments I think this can work. I think it's a great coding to use infix choice to guess the pair of vectors, use nondeterminism to check non-orthogonality, and take union with an automaton accepting malformed infixes. Thanks a lot for these insights! $\endgroup$ Commented Oct 9 at 22:16
  • $\begingroup$ PS: in fact as $d$ is logarithmic it's not so crucial that $A_{\text{pairs}}$ is in $O(d)$. The naive construction in $O(d^2)$ would also work (or indeed anything polynomial in $d$). $\endgroup$ Commented Oct 9 at 22:52
  • $\begingroup$ In the end, I thought the lower bound in this answer could be rewritten more clearly (and I didn't fact-check the padding argument or the log speedup in the answer), so I wrote my own answer which focuses on what I cared about in the question. Thanks a lot again! $\endgroup$ Commented Oct 13 at 10:21
0
$\begingroup$

For clarity let me write a self-contained answer inspired by the arguments by @AlexKarpy in this answer.

Let $n$ be the size of the input word and $m$ the size of the input NFA. The lower bound applies for NFAs even without epsilon-transitions. We show:

Conditionally to the OV-conjecture (i.e., conditionally to SETH), there do not exist $q \in \mathbb{N}$ and $\epsilon > 0$ such that we can solve the NFA infix rejection problem on alphabet $\Sigma = \{0,1\}$ in time $O(n^{2-ε} m^q)$.

We reduce from the orthogonal vectors problem. Given 2ドルn$ Boolean $d$-dimensional vectors $U = u_1, \ldots, u_n$ and $V = v_1, \ldots, v_n$ with $d = \log n$, we want to determine if there exist 1ドル \leq i, j \leq n$ such that $u_i$ and $v_j$ are orthogonal. Let $h \colon \Sigma^* \to \Sigma^*$ be the morphism defined by $h(0) := 01$ and $h(1) := 10$, and let $B := 11011$ and $E := 11111$ be special markers. For simplicity we assume without loss of generality that $d \geq 6$, which is the case for sufficiently large $n$. We write $u^R$ to denote the reverse of a word $u$.

Build from the input $U,V$ the following word $w$:

$$w = (B h(u_1)) \cdots (B h(u_n)) (h(v_1)^R E) \cdots (h(v_n)^R E)$$

And build an NFA $A$ as the disjunction of three NFAs $A_B, A_E, A'$:

  • $A_B$ accepts precisely the words that do not start with $B$.
  • $A_E$ accepts precisely the words that do not end with $E$.
  • $A'$ accepts precisely the words having the following form for some 0ドル \leq k < d$:

$$B ~ \Sigma^{2k} ~ 10 ~ \Sigma^* ~ (10)^R ~\Sigma^{2k} ~E$$

The word $w$ is build from $U,V$ in time $O(n d) = \tilde{O}(n)$, and the NFA $A$ can be naively built in time $O(d^2) = \tilde{O}(1)$. (Note that we can in fact build $A$ in $O(d)$, but this is not needed for the proof to work.)

We will show the following correctness claim (*): $A$ rejects some infix of $w$ iff the input features two orthogonal vectors. Assuming (*), we can conclude the proof by proceeding by contradiction: let us assume that we solve NFA infix rejection in time $O(n^{2-\epsilon} m^q)$ for some $q \in \mathbb{N}$ and $\epsilon > 0$, then we can solve the orthogonal vectors problem in time $\tilde{O}(n^{2-\epsilon})$, contradicting the OV-conjecture. So all that remains is to show claim (*).

In the forward direction, assume that $u_i$ and $v_j$ are orthogonal, and consider the infix $w_{ij} := B h(u_i) \cdots (h(v_j)^R) E$ of $w$. The infix $w_{ij}$ starts with $B$ and ends with $E$ so it is rejected by $A_B$ and $A_E$. To show that it is rejected by $A'$ (hence by $A$), let us assume by contradiction that $A'$ accepts it. Then it would witness that there is 0ドル \leq k \leq d$ such that the $k+1$-th letter of $u_i$ is 1ドル$ and the $k+1$-th letter of $v_j^R$ from the end is 1ドル$, contradicting the fact that $u_i$ and $v_j$ are orthogonal. This establishes the forward direction of (*).

In the backward direction, assume that some infix $w'$ of $w$ is rejected by $A$. As $w'$ is rejected by $A_B$ and $A_E$, we know that $w' = B w'' E$. From the definition of $h$ we know that the images of $h$ do not feature two contiguous 1ドル$'s, and from the assumption that $d \geq 6$ we know that the $B$'s and $E$'s in the definition of $w$ above are far apart. This implies that the only infixes of the form $w' = B w'' E$ in $w$ are infixes of the form $w_{ij}$ as defined in the previous paragraph. So the rejected infix $w' = B w'' E$ gives us 1ドル \leq i, j \leq n$ such that $w_{ij}$ is rejected by $A'$. This implies that, for each choice of 0ドル \leq k < d$, it cannot be the case that the $k+1$-th letter of $u_i$ and the $k+1$-th letter of $v_j$ are both 1ドル$, i.e., $u_i$ and $v_j$ are orthogonal. This establishes the backward direction of (*) and concludes the proof.

answered Oct 13 at 10:19
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.