How to optimize algorithm to solve the following time series problem?

Question 1

I am working on a project and came across the following problem I have to solve.

Imagine we have a time series data Ts which is an array of pair, say each element is (Xi, Yi). The length of this array is constant N but is updated regularly. Each update consists of two steps,

We remove Ts[0]
Add a new pair (X, Y) to the end of Ts

Essentially like an online update to a sliding window of size N.

Now here comes the problem. At any given times (could be after an update or before), we are given a nonzero input integer A and B. The goal is to find such i between 0 and N-1 inclusive where the term (A-Xi)/(B-Yi) is maximized.

Another constraints I would like to mention here,

(X0, X1, X2, ... X(N-1)) is non-decreasing, Xi is non-zero non-negative integer for all i.
(Y0, Y1, Y2, ... Y(N-1)) is non-decreasing, Yi is non-zero non-negative integer for all i.
A and B are both non-zero non-negative integer.
For any query, we can safely assume B > Yi for every possible i. Technically for the original problem, B could be equal to the last Y(N-1). But this is a boundary condition and we would just eliminate N-1 in this case.
We can also safely assume A > Xi for every possible i. Unlike constraint (4). It is not possible for A - Xi to be 0 in any case.

A naive approach would be to iterate through all N possibilities and easily find the solution (This is what I am doing). But I am thinking of a way to optimize this in the case of very large N. A query could come at any time for however many number of times, for example,

Query, Update, Query (1000 times), Update. Or could also be,
Query, Update, Query, Update, ...

Is there any room for optimization here? I am interested to see a proof from more mathematical standpoint if there is none as well.

Question 2

If I understand you right, it could be that B-Yi <= 0 ? And do you need abs(A+Xi / B-Yi) to be maximized, or A+Xi / B-Yi??

Question 3

B-Yi couldn't be <= 0. B > Yn (which is the largest Yi since Ys is non-decreasing).

Question 4

And so we can also assume A > Xn ? Both is fundamental so should be part of the question. Please edit

Question 5

Please include all information in the question. Don't just put clarifications in the comments -- edit the question so it contains all information relevant or needed to solve the problem. We don't want people to have to read the comments to understand the problem. Thank you!

Question 6

For sure there is room for optimisation, but you need to clarify the open points first

Question 7

First, apply the standard reduction: to solve a ratio maximization problem (sometimes called "fractional programming"), binary search on the solution $t$:

$$ \max_x \frac{f(x)}{g(x)} \geq t \iff \max_x f(x) - t g(x) \geq 0 \\ \text{assuming } g(x) > 0 \; \forall x. $$

Applying that to this problem,

$$ \begin{align} \max_i (A - X_i) - t (B - Y_i) &= \max_i A - X_i - t B + t Y_i \\ &= \max_i (-X_i + t Y_i ) + (A - t B) \end{align} $$

so it is sufficient to maintain a data structure on $X, Y$ to answer queries of form $Q_{X,Y}(t) := \max_i (-X_i + t Y_i)$.

When data is considered as 2-D points $\{(X_i,Y_i)\}$, a query is a linear function maximization. It is possible to maintain a convex hull of dynamic points to support such query in $O(\log N)$ time using $O(N)$ space [1].

Let $M$ be the maximum integer value of input then we only need $O(\log M)$ iterations of the binary search. Therefore, the final runtime is $O(\log N \log M)$ per query, and $O(\log N)$ time per update.

(Time complexity doesn't change but it is possible to use a simpler incremental convex hull data structure by utilizing the sliding window update. It may also be possible to use the non-decreasing constraint (thinking of Convex hull trick) but I'm not sure.)

[1] Brodal, Gerth Stølting, and Riko Jacob. "Dynamic planar convex hull." The 43rd Annual IEEE Symposium on Foundations of Computer Science (2002).

Question 8

Thanks, this looks very promising. Could you please elaborate on what t represents here?

Question 9

@askyfullofstars $t$ is a parameter used in binary search. It represents a potential maximum value for the ratio being maximized, to check if a solution exists above this target value.

Question 10

I am still trying to wrap my head around this concept of t here. If t is given, then it makes sense that we are able to run a O(log N) query by looking at X and Y as 2-D points. But I guess since t is also a free parameter, are we saying it is bounded by max(A, B)? Hence we can do a binary search to locate the most optimal t?

Question 11

Yes, we can do binary search to locate the most optimal $t$ because the condition is monotonic. The optimal value is always in the range $[0, A],ドル so we can use this range for the initial bounds of the binary search. The optimal value can be a non-integer, but $~ 2 \log_2 B$ precision is enough because differences between different ratios are $\gt 1/(B^2)$.

Question 12

doesn't that imply the most optimal t will always be A if we are maximizing maxi(−Xi+tYi) ?

pcpthm pcpthm 2,9626 silver badges16 bronze badges · Answer 1 · 2024-02-13 19:52:50Z

First, apply the standard reduction: to solve a ratio maximization problem (sometimes called "fractional programming"), binary search on the solution $t$:

$$ \max_x \frac{f(x)}{g(x)} \geq t \iff \max_x f(x) - t g(x) \geq 0 \\ \text{assuming } g(x) > 0 \; \forall x. $$

Applying that to this problem,

$$ \begin{align} \max_i (A - X_i) - t (B - Y_i) &= \max_i A - X_i - t B + t Y_i \\ &= \max_i (-X_i + t Y_i ) + (A - t B) \end{align} $$

so it is sufficient to maintain a data structure on $X, Y$ to answer queries of form $Q_{X,Y}(t) := \max_i (-X_i + t Y_i)$.

When data is considered as 2-D points $\{(X_i,Y_i)\}$, a query is a linear function maximization. It is possible to maintain a convex hull of dynamic points to support such query in $O(\log N)$ time using $O(N)$ space [1].

Let $M$ be the maximum integer value of input then we only need $O(\log M)$ iterations of the binary search. Therefore, the final runtime is $O(\log N \log M)$ per query, and $O(\log N)$ time per update.

(Time complexity doesn't change but it is possible to use a simpler incremental convex hull data structure by utilizing the sliding window update. It may also be possible to use the non-decreasing constraint (thinking of Convex hull trick) but I'm not sure.)

[1] Brodal, Gerth Stølting, and Riko Jacob. "Dynamic planar convex hull." The 43rd Annual IEEE Symposium on Foundations of Computer Science (2002).

Thanks, this looks very promising. Could you please elaborate on what t represents here?
@askyfullofstars $t$ is a parameter used in binary search. It represents a potential maximum value for the ratio being maximized, to check if a solution exists above this target value.
I am still trying to wrap my head around this concept of t here. If t is given, then it makes sense that we are able to run a O(log N) query by looking at X and Y as 2-D points. But I guess since t is also a free parameter, are we saying it is bounded by max(A, B)? Hence we can do a binary search to locate the most optimal t?
Yes, we can do binary search to locate the most optimal $t$ because the condition is monotonic. The optimal value is always in the range $[0, A],ドル so we can use this range for the initial bounds of the binary search. The optimal value can be a non-integer, but $~ 2 \log_2 B$ precision is enough because differences between different ratios are $\gt 1/(B^2)$.
doesn't that imply the most optimal t will always be A if we are maximizing maxi(−Xi+tYi) ?

Stack Exchange Network

How to optimize algorithm to solve the following time series problem?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to optimize algorithm to solve the following time series problem?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions