Say you have an array of integers like [1, 2, 3, 4, 5, 6]. The problem is to break the array up into the smallest number of sub-arrays, where each sub-array satisfies the following requirement:
- sub-array.first_integer and sub-array.last_integer must have a common divisor that is not 1.
So for [7, 2, 3, 4, 5, 6] the answer would be 2, because you can break it up into [7] and [2, 3, 4, 5, 6], where 2 and 6 have a common divisor of 2 (which is > 1, so it meets the requirement).
You can assume the array can be huge but the numbers are not too big. Is there a way to do this in n or n*log(n) time? I think n^2 is possible with DP and caching, but I'm not sure how to do it faster.
Here is my current attempt with a BFS, but it's still N^2:
https://onlinegdb.com/ryZwkYnA8
Edit: OK, I think I found an improvement, which is a polynomial times the square root of a pseudo-polynomial: n * sqrt(magnitude of the values). It seems much faster; it basically builds an adjacency list, and can be done with BFS, DP, and caching all the same. Still, I wonder if anything better is available?
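For reference, the adjacency-list idea from the edit could be sketched roughly like this (illustrative Python only; `divisor_buckets` is a made-up name and this is not the code behind the link):

```python
from collections import defaultdict

def divisor_buckets(arr):
    """Group array positions by each divisor > 1 of their value.

    Two positions can be the endpoints of the same sub-array iff they
    share a bucket, which is the adjacency structure the edit describes.
    Enumerating divisors by trial division costs O(sqrt(value)) per
    element, hence roughly n * sqrt(M) work overall.
    """
    buckets = defaultdict(list)
    for i, v in enumerate(arr):
        d = 2
        while d * d <= v:
            if v % d == 0:
                buckets[d].append(i)
                if v // d != d:
                    buckets[v // d].append(i)
            d += 1
        buckets[v].append(i)  # v divides itself (inputs are assumed > 1)
    return buckets
```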
- How does the first subarray satisfy the condition? There is no divisor $> 1$ of 1ドル$. – Jakube, Jul 3, 2020 at 9:24
- Also, what do you mean by smallest way? Do you mean the smallest number of subarrays in the decomposition? – Jakube, Jul 3, 2020 at 9:26
- "How does the first subarray satisfy the condition? There is no divisor >1 of 1" My bad, 1 will not be in the input. Edited. – Dave, Jul 3, 2020 at 9:51
- Smallest way = how to split the original array into as small a number of subarrays as possible, where all the subarrays satisfy the requirement. I updated to include my current attempt, which is a working N^2. – Dave, Jul 3, 2020 at 9:52
1 Answer
In my answer I will assume that all numbers are at most $M$ big, i.e. $A[i] \le M$.
First, some comments about both of your approaches.
Computing a GCD takes $O(\log M)$ time, so your first approach is actually $O(N^2 \log M)$. I also don't think that your second approach is better than $O(N^2)$. For instance, assume that all the numbers in the first half of the array are equal. Then each of those numbers incurs at least $\Theta(N)$ effort per divisor, which means that the complexity for such a test case is something like $O(N^2 \sqrt M)$.
Let's first discuss a very trivial (but slower) dynamic programming solution, on which I will later base a better one.
Let's define the function $f$ such that $f(i)$ is the smallest number of subarrays into which you can split the prefix of size $i$ of the array (the first $i$ numbers). Also let $f(i) := \infty$ if it is not possible to split that prefix, and $f(0) := 0$.
It's easy to see that you can compute $f(i)$ using the recursion:
$$f(i) = \min_{\substack{1 \le j \le i\\ \gcd(A[i], A[j]) > 1}} f(j-1) + 1$$
This formula is straightforward to implement with dynamic programming, and has complexity $O(N^2 \log M)$:
f(0) = 0
for i = 1 to N:
    f(i) = infinity
    for j = 1 to i:
        if gcd(A[i], A[j]) > 1:
            f(i) = min(f(i), f(j-1) + 1)
print f(N)
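Translated directly into Python (the function name `min_split` and the 0-based indexing are my own), this quadratic DP might look like:

```python
import math

def min_split(A):
    """Quadratic DP: f[i] = fewest valid sub-arrays covering A[0:i]."""
    N = len(A)
    INF = float("inf")
    f = [INF] * (N + 1)
    f[0] = 0
    for i in range(1, N + 1):
        for j in range(1, i + 1):
            # A sub-array from position j to i is valid iff its first
            # and last elements share a divisor > 1.
            if math.gcd(A[i - 1], A[j - 1]) > 1:
                f[i] = min(f[i], f[j - 1] + 1)
    return f[N]
```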
Now to a better approach. We need two mathematical facts that will help us:
- If two numbers have a common divisor greater than one, then they have a common prime factor.
- A number $\le M$ has at most $\log M$ prime factors.
Let the set of prime factors of $x$ be $P(x)$.
Then we can also rewrite the recursion for $f$ as:
$$f(i) = \min_{p \in P(A[i])} \left( \min_{\substack{1 \le j \le i\\ p ~|~ A[j]}} f(j-1) \right) + 1$$ In other words, if $A[i] = 20 = 2^2 \cdot 5$, then we look at all previous numbers that are divisible by 2ドル$ and take the minimum of $f(j-1)$, and at all previous numbers that are divisible by 5ドル$ and take the minimum of $f(j-1)$. The optimal value $f(i)$ will be one more than the smaller of these two minima.
If we define $g(p, i)$ as the minimum of $f(j-1)$ with $p ~|~ A[j]$ and 1ドル \le j \le i$, then the formula simplifies to:
$$f(i) = \min_{p \in P(A[i])} g(p, i) + 1$$
We can actually apply some sort of dynamic programming to the function $g$ as well. We store the values in a hash table. First we initialize $g(p) = \infty$ for every possible prime factor $p$, and whenever an $f(j-1)$ value becomes available, we update $g(p)$ for every $p \in P(A[j])$.
This means, after we update $f(i)$, we only need to update $O(\log M)$ different values of $g$.
This gives us the following algorithm:
# initialize g
for all p:
    g(p) = infinity
# set the first value f(0)
f(0) = 0
# update g
for p in P(A[1]):
    g(p) = min(g(p), f(0))
for i = 1 to N:
    # first compute f(i)
    f(i) = infinity
    for p in P(A[i]):
        f(i) = min(f(i), g(p) + 1)
    # and then update g
    if i < N:
        for p in P(A[i+1]):
            g(p) = min(g(p), f(i))
print f(N)
It's easy to see that this algorithm runs in $O(N \log M)$ time.
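As a sketch, assuming a non-empty array with all values greater than 1, the algorithm could be written in Python like this (`min_split_fast` is my own name; the loop folds the $g$ updates into the main pass, and prime factors are found by plain trial division here):

```python
from collections import defaultdict

def prime_factors(x):
    """Distinct prime factors of x by trial division."""
    ps = set()
    d = 2
    while d * d <= x:
        if x % d == 0:
            ps.add(d)
            while x % d == 0:
                x //= d
        d += 1
    if x > 1:
        ps.add(x)
    return ps

def min_split_fast(A):
    """O(N log M) DP: g[p] = min f(j-1) over positions j seen so far with p | A[j]."""
    INF = float("inf")
    g = defaultdict(lambda: INF)
    f_prev = 0          # f(0) = 0
    f_i = INF
    for v in A:
        ps = prime_factors(v)
        for p in ps:
            g[p] = min(g[p], f_prev)   # the current element contributes f(i-1)
        f_i = min((g[p] for p in ps), default=INF) + 1
        f_prev = f_i
    return f_i
```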
A shorter variation, with the exact same approach and complexity, would be:
for all p:
    g(p) = infinity
f(0) = 0
for i = 1 to N:
    f(i) = infinity
    for p in P(A[i]):
        g(p) = min(g(p), f(i-1))
        f(i) = min(f(i), g(p) + 1)
print f(N)
The only thing left to discuss is how we get the prime factorization of each number. There are lots of different approaches. For instance, if $M$ is not too big, you can run the Sieve of Eratosthenes in $O(M \log M)$ and, during the computation, store a prime factor for each number $\le M$. This allows computing the prime factorization of each number in $O(\log M)$. Another option is to compute the prime factors on the fly, which takes an additional $O(N \sqrt{M})$ time if you use the typical trial division up to the square root.
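The sieve-based option can be sketched as follows: store, for every number up to $M$, its smallest prime factor, then peel factors off one prime at a time (the names `build_spf` and `prime_factors_spf` are mine):

```python
def build_spf(M):
    """spf[x] = smallest prime factor of x, for all 2 <= x <= M."""
    spf = list(range(M + 1))
    for p in range(2, int(M ** 0.5) + 1):
        if spf[p] == p:                    # p is prime
            for m in range(p * p, M + 1, p):
                if spf[m] == m:            # not yet marked by a smaller prime
                    spf[m] = p
    return spf

def prime_factors_spf(x, spf):
    """Distinct prime factors of x in O(log x) divisions using the table."""
    ps = set()
    while x > 1:
        p = spf[x]
        ps.add(p)
        while x % p == 0:
            x //= p
    return ps
```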