Data Structures and Algorithms
Showing new listings for Thursday, 2 July 2026
- [1] arXiv:2607.00118 [pdf, html, other]
Title: Temporal Path Covers: Dilworth Properties and Parameterized Complexity
Authors: Lapo Cioni, Sotiris Kanellopoulos, Edouard Nemery, Aris Pagourtzis, Christos Pergaminelis, Manolis Vasilakis
Subjects: Data Structures and Algorithms (cs.DS)
The Minimum Temporal Path Cover (TPC) and Minimum Temporally Disjoint Path Cover (TDPC) problems were introduced by [Chakraborty, Dailly, Foucaud, Klasing, MFCS '24]. Both were shown to be NP-hard on temporal DAGs, while the latter is also NP-hard on temporal oriented trees. All tractable cases for T(D)PC established in that paper satisfy a temporal Dilworth property, namely that the size of the minimum T(D)PC is equal to the size of the maximum antichain. This raises a natural question: is T(D)PC polynomial-time solvable under the promise that the respective Dilworth property holds? In this work, we answer this question in the affirmative for both problems, proving in fact that, under the respective promise, the size of the minimum T(D)PC is exactly equal to the Lovász number of the connectivity graph.
In another direction, we establish parameterized algorithms and hardness results for TPC and TDPC. Our main result is that TPC is W[1]-hard parameterized by the deletion distance to linear forest even for temporal graphs with two time-steps, answering in the negative an open question by Chakraborty et al. about whether an XP algorithm parameterized by treewidth plus number of time-steps can be improved to FPT. On the other hand, we prove that an FPT algorithm does exist if the vertex cover number is used as parameter instead of the treewidth in the above parameterization. We complement this with a proof that including the number of time-steps in the parameter is necessary to yield tractability, as, otherwise, both TPC and TDPC remain NP-hard even for constant vertex cover size. Along the way, we establish various other para-NP-hardness results involving structural parameters such as the pathwidth and the maximum degree of the underlying graph.
- [2] arXiv:2607.00204 [pdf, other]
Title: Computing Smallest Suffixient Arrays in Sublinear Time
Authors: Hiroto Fujimaru, Gonzalo Navarro, Francisco Olivares, Jakub Radoszewski, Giuseppe Romana, Cristian Urbina
Subjects: Data Structures and Algorithms (cs.DS)
A suffixient array is a novel data structure that, when combined with an index providing direct access on a text $T$, allows us to answer a variety of pattern matching queries. In this work, we show how to compute a smallest suffixient array for $T[1\dots n]$ in $O(\frac{n\log \sigma}{\sqrt{\log n}}+\min(r,\bar{r})\log^\epsilon n)$ time for any $\epsilon > 0$, where $\sigma$ is the alphabet size of $T$ and $r$ and $\bar{r}$ are the numbers of equal-letter runs of the Burrows-Wheeler transforms of $T$ and its reverse $\overline{T}$, respectively. This time complexity becomes sublinear when $\sigma$ is small enough and $\min(r,\bar{r})=o(\frac{n}{\log^\epsilon n})$, yielding an asymptotic improvement over state-of-the-art algorithms. We also present a series of connected algorithmic results.
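To make the parameters $r$ and $\bar{r}$ in the abstract above concrete, here is a minimal sketch that counts equal-letter runs of the Burrows-Wheeler transform of a text and of its reverse. It is not the paper's algorithm: it builds the transform by sorting all rotations, which is far slower than linear, and the function names and the `$` end-of-text sentinel are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations (illustrative only).

    The sentinel is an assumed end-of-text marker that must not occur in
    `text` and must sort before every character of `text`.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal equal-letter runs in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def bwt_runs(text: str) -> int:
    """The quantity r: equal-letter runs in the BWT of `text`."""
    return count_runs(bwt(text))


def bwt_runs_reverse(text: str) -> int:
    """The quantity r-bar: equal-letter runs in the BWT of the reversed text."""
    return count_runs(bwt(text[::-1]))
```

For example, `bwt("banana")` is `"annb$aa"`, which has five runs, so `bwt_runs("banana")` is 5 under this sentinel convention.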
- [3] arXiv:2607.00389 [pdf, html, other]
Title: Efficient LCE Queries and Lexicographic Minimizers on Sliding Suffix Trees
Subjects: Data Structures and Algorithms (cs.DS)
We study longest-common-extension (LCE) queries and lexicographic minimizer maintenance on the suffix tree of a sliding window. The main difficulty is that a sliding suffix tree is maintained in an implicit Ukkonen-style form: some suffixes of the current window are not represented by leaves. We show that the longest implicit (i.e., non-leaf) suffix induces a periodic representative map that folds every implicit suffix to an explicit suffix leaf in constant time. Combined with leaf pointers [Leonard et al., PSC 2026] and a dynamic LCA data structure [Cole & Hariharan, SICOMP 2005], this yields a linear-space data structure with amortized constant-time window shifts and worst-case constant-time LCE queries over a constant-size alphabet. For minimizers, the LCE structure gives a direct exact solution, but it uses more machinery than fixed-depth comparisons require. We therefore give an alternative LCE-free algorithm that reports minimizers in constant time per window shift, which is built on BP-linked suffix trees [Sumiyoshi et al., SPIRE 2024] and a standard order maintenance data structure (e.g., [Bender et al., ESA 2002]).
- [4] arXiv:2607.00536 [pdf, html, other]
Title: Online Matching with Size-Based and Convex Delays
Comments: To appear in APPROX 2026
Subjects: Data Structures and Algorithms (cs.DS)
We study the online min-cost perfect matching with delay (MPMD) problem where $m$ requests arrive in a metric space of $n$ points. In MPMD, an algorithm can choose to match a request or to delay, and the objective is to minimise the sum of connection and delay costs. The connection cost of a match is the distance between the locations of two matched requests in the metric, and the increase of the delay cost is a function of the set of unmatched requests at every moment. In this paper, we study two different types of delay functions, size-based (MPMD-Size) and convex delays (MPMD-Convex).
The study of MPMD-Size was initiated by Deryckere and Umboh (APPROX/RANDOM 2023) where the instantaneous delay increment is a non-negative monotone function of the number of unmatched requests. Our bounds are in terms of $n$, as opposed to Deryckere and Umboh's bounds that depend on $m$. Our results settle the deterministic competitive ratio (up to constants). At the heart of these results is a succinct encoding scheme of MPMD-Size on a given $n$-point metric as a metrical task system problem on a $2^{n-1}$-point metric.
We also consider MPMD-Convex proposed by Liu et al. (ISAAC 2018) where the delay cost incurred by each request is a uniform convex delay function of the time difference between its arrival time and the moment that it is matched by the algorithm. They focused on delay functions $f$ that are unbounded, non-decreasing, continuous, and satisfy $f(0)=f'(0)=0$, and showed that the deterministic competitive ratio is $\Omega(n)$ for $n$-point uniform metrics. We show that, surprisingly, when $f$ is a non-negative, monotone polynomial with $f'(0)>0$, there is an $O(1)$-competitive deterministic algorithm for uniform metrics. Our result completes our understanding of MPMD-Convex on uniform metrics for a broad class of functions.
- [5] arXiv:2607.00612 [pdf, html, other]
Title: Online computation of maximal closed substrings
Subjects: Data Structures and Algorithms (cs.DS)
A non-empty string is closed if its length is one or its longest border appears exactly twice in the string. An occurrence of a closed substring is a maximal closed substring (MCS) if it cannot be extended to the left or to the right while preserving closedness. MCSs can be regarded as a general class of maximal repetitive structures including runs. In this paper, we study the computation of MCSs of a string given in an online manner, where one character is appended to the string at a time. Our algorithm detects newly formed MCSs after each append operation by using the rightmost previous occurrences of suffixes. To support this efficiently, we introduce the link-cut suffix tree (LCST), a novel data structure combining an online suffix tree with a link-cut tree. The LCST maintains rightmost occurrence information for substrings represented in the suffix tree in $O(n \log n)$ total time and $O(n)$ space, where $n$ is the length of the input string. Using the LCST, we obtain an $O(n \log n)$-time online algorithm for computing all MCSs, which is worst-case optimal. As further direct applications of the LCST, we obtain online algorithms for rightmost LZ77 factorizations and most recent match queries.
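The closedness definition at the start of the abstract above can be checked directly. The following is a minimal sketch, assuming the usual reading that a string of length at least two with an empty longest border is not closed; the function names are hypothetical, and the quadratic occurrence count is for clarity only and has nothing to do with the online algorithm or the link-cut suffix tree described in the paper.

```python
def longest_border(s: str) -> int:
    """Length of the longest proper border of s, via the KMP failure function."""
    if not s:
        return 0
    fail = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return fail[-1]


def count_occurrences(s: str, p: str) -> int:
    """Number of (possibly overlapping) occurrences of p in s."""
    return sum(1 for i in range(len(s) - len(p) + 1) if s.startswith(p, i))


def is_closed(s: str) -> bool:
    """True if s has length one, or its longest border occurs exactly twice in s."""
    if not s:
        return False  # the definition only covers non-empty strings
    if len(s) == 1:
        return True
    b = longest_border(s)
    if b == 0:
        return False  # assumption: an empty border does not make s closed
    return count_occurrences(s, s[:b]) == 2
```

For instance, `is_closed("ababa")` holds because the longest border `"aba"` occurs only as a prefix and as a suffix, whereas `is_closed("abaa")` fails because the border `"a"` occurs three times.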
- [6] arXiv:2607.00843 [pdf, html, other]
Title: Submodular Maximization over Many Matroids via Ordered Local Search
Subjects: Data Structures and Algorithms (cs.DS)
Given a monotone submodular function, we consider the problem of finding a maximum-valued set in the intersection of $k$ matroids. Our main result is a polynomial time local search based algorithm achieving a $\frac{k}{2} + o(k)$ approximation guarantee. This asymptotically matches the best-known guarantee of $\frac{k}{2} + \epsilon$ in the unweighted setting by Lee, Sviridenko, and Vondrák (2009). Prior to this work, the state-of-the-art was a $\frac{\ln(4)k}{1+\ln(2)} + o(k)$-approximation algorithm obtained by Feldman and Ward (2026). Our approach extends to Matroid $k$-Parity yielding the same approximation guarantee.
In contrast to the weight bucketing approach underlying the recent advances of Singer and Thiery (2025) and Feldman and Ward (2026), our algorithm processes elements greedily in decreasing order of marginal value and searches for sufficiently profitable swaps, whose gain exceeds a parameter $\alpha$ given as a function of $k$. We further combine this idea with the weight bucketing approach to obtain improved guarantees for weighted $k$-Set Packing. Our second main result is a $\frac{\ln(4)k}{3} + o(k)$-approximation algorithm for weighted $k$-Set Packing, improving on the state of the art $\frac{k}{2.00561} + O(1)$-approximation by Neuwohner (2023).
- [7] arXiv:2607.00857 [pdf, html, other]
Title: Warm-Starting All-Pairs Shortest Paths with Predictions
Subjects: Data Structures and Algorithms (cs.DS)
One of the three key hypotheses of fine-grained complexity asserts that computing All-Pairs Shortest Paths (APSP) requires cubic time, up to subpolynomial factors, in the worst case. We initiate the study of APSP in the paradigm of algorithms with predictions, also known as learning-augmented algorithms. We propose an APSP algorithm that takes as additional input a prediction (e.g., given by a model learned from similar instances seen in the past) consisting of sets of vertices causing the shortest detour for each pair of vertices. The algorithm runs in time $\mathcal{O}(n^{2.83} + \eta n)$, where $\eta$ denotes the prediction error defined as the number of pairs of vertices for which, informally speaking, the prediction was not sufficient to compute and certify optimality of the shortest path length. This is already subcubic when the prediction error is (polynomially) smaller than its maximum possible value $n^2$, i.e., whenever the prediction is at least slightly better than terrible.
We build on the co-nondeterministic algorithm for the Exact Triangle problem by Chan, Vassilevska Williams, and Xu (STOC 2023), essentially enabling this algorithm to detect mistakes in the nondeterministic certificate and recover from them.
Our result constitutes the first necessary step towards designing learning-augmented algorithms for problems with known fine-grained lower bounds conditioned on the APSP Hypothesis.
- [8] arXiv:2607.00876 [pdf, html, other]
Title: The Binary Tree Mechanism is Optimal for Approximate Differentially Private Continual Counting
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Private continual counting is a fundamental problem in differential privacy: given a binary stream of length $n$, where each $1$ corresponds to the contribution of one individual, the goal is to release all running counts while protecting the privacy of each individual. The standard algorithm is the binary tree mechanism, whose Gaussian-noise variant achieves expected $\ell_\infty$ error proportional to $\log^{3/2} n$ for approximate differential privacy. Whether this dependence on the stream length is necessary has remained a central open problem.
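For readers unfamiliar with the binary tree mechanism mentioned above, the following is a minimal sketch of its general shape: every dyadic block of the stream receives one independent Gaussian noise draw, and each running count is assembled from the at most $\lfloor \log_2 n \rfloor + 1$ blocks that tile the prefix. The function name, the `sigma` parameter, and the lazy bookkeeping are assumptions for illustration; the calibration of `sigma` to a concrete privacy guarantee is deliberately omitted, so this sketch makes no privacy claim by itself.

```python
import random


def binary_tree_counts(stream, sigma, rng=None):
    """Noisy running counts of a 0/1 stream via dyadic (binary tree) aggregation.

    Each dyadic block [i * 2^j, (i + 1) * 2^j) gets its true sum plus one
    Gaussian draw N(0, sigma^2), reused by every prefix that contains the block.
    Choosing sigma for a target privacy level is outside this sketch.
    """
    rng = rng or random.Random()
    n = len(stream)
    prefix = [0]
    for x in stream:
        prefix.append(prefix[-1] + x)

    noisy = {}  # (level, index) -> noisy block sum, created on first use

    def block(level, index):
        key = (level, index)
        if key not in noisy:
            lo = index << level
            hi = lo + (1 << level)
            noisy[key] = prefix[hi] - prefix[lo] + rng.gauss(0.0, sigma)
        return noisy[key]

    releases = []
    for t in range(1, n + 1):
        total, start = 0.0, 0
        # Tile [0, t) with dyadic blocks following the binary expansion of t.
        for level in range(t.bit_length() - 1, -1, -1):
            if t & (1 << level):
                total += block(level, start >> level)
                start += 1 << level
        releases.append(total)
    return releases
```

With `sigma=0` the function returns the exact running counts, which is a convenient sanity check; with `sigma > 0` each release is the sum of logarithmically many noisy blocks rather than of up to $n$ independently perturbed stream elements.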
In this work, we resolve the dependence on $n$ by proving that every differentially private mechanism for continual counting must incur expected $\ell_\infty$ error $\Omega(\log^{3/2} n)$. This shows that the binary tree mechanism is asymptotically optimal in the approximate-DP setting.
As a consequence, we also obtain a largest-possible separation between hereditary discrepancy and private $\ell_\infty$ error for linear queries, showing that the known general upper bound in terms of hereditary discrepancy has the optimal dependence on the number of queries.
- [9] arXiv:2607.00878 [pdf, html, other]
Title: Improved Approximation Algorithms for Parallel Task Scheduling and Multiple Cluster Scheduling
Comments: to appear in SPAA 2026
Subjects: Data Structures and Algorithms (cs.DS)
In the problem of Parallel Task Scheduling (PTS), we are asked to schedule $n$ jobs, each with a fixed processing time and machine requirement, such that the completion time of the last job is minimized. Jansen and Rau (2019) presented an algorithm for PTS that achieves an approximation ratio of $(3/2)\text{OPT} + p_{\max}$. They additionally posed the open question whether an approximation ratio of $(4/3)\text{OPT} + p_{\max}$ is possible. In this work, we present such an algorithm with a running time of $O(n\log n)$.
The problem of Multiple Cluster Scheduling (MCS) is a natural extension of PTS where we are given $N$ clusters each of $m$ machines to schedule jobs. Jansen and Rau (2019) adapted their PTS algorithm to MCS with the following results: (1) a 2-approximation, and (2) a near-linear 9/4-approximation if $N$ is divisible by 3. We improve the running time of their 2-approximation and generalize the 9/4-approximation to the general case. The 2-approximation for MCS is tight, since one cannot hope for an approximation ratio better than 2, unless P=NP [Zhuk, 2006].
In addition to our theoretical results, we implement our algorithm and show its practical applicability.
- [10] arXiv:2607.00894 [pdf, html, other]
Title: Space-Optimal Sensitivity Oracles for Single-Source Mincuts
Subjects: Data Structures and Algorithms (cs.DS)
We study Single-Source Mincut Sensitivity Oracles: compact data structures that, when queried with an edge $e$, report those affected vertices whose mincut value to source $s$ changes upon the insertion or failure of $e$. Insertion queries were treated by Baswana, Gupta, and Knollmann [Algorithmica '22], who showed an extremely compact oracle with only $O(n)$ space. In this work, we consider edge failure queries, which are of even greater interest, but far more challenging. The current-best approaches give $O(n^2)$ space: either using $n-1$ fixed-pair oracles of $O(n)$ space each, based on the Picard-Queyranne representation [MPS '80], or using the $O(n^2)$ space all-pairs oracle by Baswana and Pandey [SODA '22].
- Our key result is an optimal $O(n)$ space single-source mincut sensitivity oracle for edge failure queries. It reports the set of affected vertices in $O(n)$ time, thus matching the state-of-the-art bounds for the insertion case.
- Additionally, we provide oracles with near-optimal query times at the cost of increasing the space to $O(n^{1.5})$. They can determine if any given vertex is affected by an insertion/failure of an edge in $O(\log n)$ time, or report all affected vertices in amortized $O(\log^3 n)$ time per vertex. Such oracles of subquadratic space were previously unknown, even for insertion.
Our main technical contribution is in establishing novel and intricate connections between two seemingly distant objects, representing two different families of mincuts. The first is the DAG representation of farthest mincuts to the source, which was the central tool introduced by Baswana, Gupta, and Knollmann. The second is the Connectivity Carcass for Steiner mincuts of Dinitz and Vainshtein [STOC '94], which generalizes well-known cactus representations of global mincuts. Our work demonstrates the relatively unexplored potential of the carcass beyond its obvious Steiner mincuts scope.
- [11] arXiv:2607.00938 [pdf, other]
Title: Tighter bounds for weighted and unweighted shortest cycle approximation
Subjects: Data Structures and Algorithms (cs.DS)
We study the problem of approximating the length of a shortest cycle in a given graph, known as the girth of the graph. The state-of-the-art approximation algorithms for unweighted graphs by Kadria et al. [SODA'22] and Roditty and Trabelsi [arXiv'25] achieve the following trade-off: for every integer $k\geq 2$, there is an $\tilde{O}(n^{1+2/k})$ time algorithm that achieves a $(2k/3)$-approximation for the girth in unweighted $n$-node graphs. The first result of this paper is to achieve the same trade-off for $m$-edge, $n$-node graphs with non-negative real edge weights: a $2k/3$-approximation algorithm running in $\tilde{O}(m+n^{1+2/k})$ time. The dependence on $m$ is unavoidable in weighted graphs. Our result improves on the work of Kadria et al. [SODA'23] and Ducoffe [ICALP'19 and SIDMA'21], who were only able to achieve such a trade-off for some values of $k$. We also prove new fine-grained lower bounds for girth approximation and related problems in unweighted graphs.
- [12] arXiv:2607.01007 [pdf, html, other]
Title: Tighter Bounds for Wheeler Determinization
Comments: 6 pages main body, 1 figure
Subjects: Data Structures and Algorithms (cs.DS)
Given a Wheeler NFA $\mathcal{A}$, the Wheeler determinization problem is to construct a Wheeler DFA $\mathcal{D}$ that accepts the same language as $\mathcal{A}$. We use the notation $n_{\mathcal{A}},m_{\mathcal{A}}$ for the number of vertices and edges of $\mathcal{A}$, and equivalently $n_{\mathcal{D}},m_{\mathcal{D}}$ for $\mathcal{D}$. Alanko et al. [SODA 2020, Inf. Comp. 2021] show that we can solve this problem in $O(n_{\mathcal{A}}^3)$ time. In this paper, we show how to improve the running time to $O(n_{\mathcal{A}} + m_{\mathcal{A}} + n_{\mathcal{D}} + m_{\mathcal{D}})$ when given the Wheeler order of $\mathcal{A}$ (which can be computed in $O(m_{\mathcal{A}}\log n_{\mathcal{A}})$ time with an algorithm by Becker et al. [ESA 2023]).
Our running time is a factor $n_{\mathcal{A}}^2/\sigma$ faster than the state of the art, where $\sigma$ is the size of the alphabet. Furthermore, for $\sigma=O(1)$ we have the first linear time algorithm for this problem. We show that our bound is tight for sorted inputs with any combination of $n$ and $\sigma$, by giving a family of inputs for which our output $\mathcal{D}$ is minimum, and of maximum size $\Theta(n\sigma)$.
- [13] arXiv:2607.01216 [pdf, html, other]
Title: Query Complexity of Hypergraph Connectivity and Learnability using CUT Oracles
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
We investigate the power of CUT queries to reveal the structure of unknown hypergraphs. While simple graphs allow for optimal $O(n)$-query connectivity algorithms, hypergraphs face a fundamental identifiability barrier in that distinct hypergraphs can share identical cut-profiles, making exact edge learning impossible in general, a primitive crucial in the graph connectivity algorithms.
We first present a zero-error randomized algorithm that identifies the connected components of any weighted hypergraph using $O(n)$ expected queries, matching the $\Omega(n)$ lower bound. This approach bypasses the reconstruction barrier by introducing the notion of "independent families" -- vertex subpartitions that do not share hyperedges -- and iteratively coarsening them using auxiliary weighted graph connectivity techniques [Liao-Chakrabarty, 2024].
Second, we demonstrate that the impossibility of exact learning depends on hyperedge parity. For even-parity hypergraphs, we show that the structure is reconstructible using a Möbius transform on the CUT function to implement binary-search-style vertex identification. This yields deterministic algorithms for obtaining $k$-connectivity certificates for $r$-bounded even hypergraphs in $\tilde{O}_r(kn)$ queries. Finally, we bypass parity and rank constraints for linear hypergraphs, achieving a subquadratic $\tilde{O}(kn^{1.5})$ query complexity for $k$-connectivity. This significantly improves upon the general $\tilde{O}(n^2)$ bound derived via symmetric submodular function minimization.
New submissions (showing 13 of 13 entries)
- [14] arXiv:2607.00244 (cross-list from cs.CC) [pdf, html, other]
Title: Independent Set Hardness in Graphs of Bounded Twin-Width and Low-Radius Merge-Width
Comments: 18 pages, 2 figures
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
For every $\varepsilon > 0$, Max Independent Set admits a polynomial-time $n^\varepsilon$-approximation algorithm on $n$-vertex graphs of effectively bounded twin-width [Bergé et al., STACS '23]. The approximation factor actually obtained is more precisely $n^{O(1/ \log \log n)}$. Prior to the current paper, no approximation hardness was known for this problem, and the existence of a polynomial-time approximation scheme (PTAS) was repeatedly raised as an open question. We answer this question in a strong sense: We show that there is a constant $\gamma > 0$ such that a polynomial-time $n^{\gamma/ (\log \log n)^2}$-approximation algorithm for Max Independent Set on graphs of twin-width at most 4 would refute the Exponential-Time Hypothesis (ETH). This lower bound further holds if a 4-sequence is provided as part of the input. We show the same hardness of approximation for Min Coloring, which also has a nearly matching $n^{O(1/ \log \log n)}$-approximation algorithm on graphs of effectively bounded twin-width.
We also clarify the parameterized complexity of $k$-Independent Set on graphs of bounded radius-$r$ merge-width when the range of $r$ is limited. There is a fixed-parameter tractable algorithm for $k$-Independent Set on graphs given with radius-$2^{O(k^2)}$ merge sequences of bounded width [Dreier and Toruńczyk, STOC '25]. We complement this result by showing that $k$-Independent Set is W[1]-hard on graphs given with radius-$o(k)$ merge sequences of bounded width. We further show that this result also holds for $k$-Dominating Set.
- [15] arXiv:2607.00252 (cross-list from cs.LG) [pdf, html, other]
Title: Distributionally Robust Linear Regression With Block Lewis Weights
Comments: ICLR 2026. Comments welcome!
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
We present an algorithm for the group distributionally robust (GDR) least squares problem. Given $m$ groups, a parameter vector in $\mathbb{R}^d$, and stacked design matrices and responses $\mathbf{A}$ and $\mathbf{b}$, our algorithm obtains a $(1+\varepsilon)$-multiplicative optimal solution using $\widetilde{O}(\min\{\mathsf{rank}(\mathbf{A}),m\}^{1/3}\varepsilon^{-2/3})$ linear-system-solves of matrices of the form $\mathbf{A}^{\top}\mathbf{B}\mathbf{A}$ for block-diagonal $\mathbf{B}$. Our technical methods follow from a recent geometric construction, block Lewis weights, that relates the empirical GDR problem to a carefully chosen least squares problem and an application of accelerated proximal methods. Our algorithm improves over known interior point methods for moderate accuracy regimes and matches the state-of-the-art guarantees for the special case of $\ell_{\infty}$ regression. We also give algorithms that smoothly interpolate between minimizing the average least squares loss and the distributionally robust loss.
- [16] arXiv:2607.00263 (cross-list from math.CO) [pdf, html, other]
Title: Determining the Complexity of Chromatic Sum in Classes Defined by a Set of Forbidden Graphs
Subjects: Combinatorics (math.CO); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
The Chromatic Sum problem asks, given a graph $G$ and an integer $k$, whether $G$ admits a colouring $c$ with sum $\sum_{v\in V}c(v) \leq k$. We study the complexity of Chromatic Sum on graph classes defined by some set of forbidden graphs. First, we show that three known frameworks fully classify the complexity of Chromatic Sum on $\mathcal{H}$-minor-free graphs and $\mathcal{H}$-topological-minor-free graphs for any set of graphs $\mathcal{H}$, and on $\mathcal{H}$-subgraph-free graphs for any finite set of graphs $\mathcal{H}$. To show this, we prove a new NP-completeness result for Chromatic Sum on certain subdivisions of planar subcubic graphs. Next, we consider other containment relations. We formalise a novel framework of problems that are NP-complete for planar graphs as well as for graphs of bounded independence number. For every problem in this framework, we obtain an almost complete complexity classification on $H$-induced-minor-free graphs, $H$-induced-topological-minor-free graphs, and $H$-free graphs for every graph $H$. We show that Chromatic Sum belongs to this framework, as do several other problems. We also define a more fine-grained framework for the induced subgraph relation. We apply this to obtain a complete complexity classification for Chromatic Sum on $H$-free graphs, as well as for several other problems. We justify the choice of this framework by proving that Chromatic Sum is NP-complete for graphs of clique-width at most $3$. This result complements a known polynomial-time result for graphs of clique-width at most $2$.
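The problem statement at the start of the abstract above can be made concrete on tiny graphs. The following is a minimal exhaustive-search sketch, assuming colours are positive integers and vertices are numbered $0,\dots,n-1$; the function names are hypothetical, the running time is exponential, and the sketch is unrelated to the classification results of the paper.

```python
def chromatic_sum(n, edges):
    """Minimum of sum(c(v)) over proper colourings c: V -> {1, 2, ...}.

    Exhaustive backtracking over colours 1..n, which suffices because a
    vertex has at most n - 1 neighbours, so some colour in 1..n is free.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    best = n * (n + 1) // 2  # colouring vertex i with i + 1 is always proper
    colour = [0] * n

    def extend(v, partial):
        nonlocal best
        if v == n:
            best = min(best, partial)
            return
        # Each uncoloured vertex costs at least 1, so prune hopeless branches.
        if partial + (n - v) >= best:
            return
        used = {colour[u] for u in adj[v] if u < v}
        for c in range(1, n + 1):
            if c not in used:
                colour[v] = c
                extend(v + 1, partial + c)

    extend(0, 0)
    return best


def has_colouring_with_sum_at_most(n, edges, k):
    """Decision version: is there a proper colouring with sum at most k?"""
    return chromatic_sum(n, edges) <= k
```

On the star with centre 0 and three leaves, giving the centre colour 2 and every leaf colour 1 yields sum 5, which beats the sum 7 obtained by giving the centre colour 1.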
- [17] arXiv:2607.00328 (cross-list from math.OC) [pdf, html, other]
Title: Killing the Case for Randomization in Dynamic Assortment Optimization
Comments: 21 pages in main text, 8 pages in appendix, 2 figures in appendix
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS)
One of the traditional approaches for constructing approximate policies for dynamic assortment optimization problems is to use sampling-based inventory-agnostic policies. Such policies are called sampling-based, as they sample an assortment of products from a fixed distribution at each time period to offer to a customer of each type. Such policies are called inventory-agnostic, as the sampled assortments may include products without remaining inventories, so if a customer chooses a product without remaining inventories, then she leaves without a purchase. Inventory-agnostic nature of a policy is not a concern, because it is known that if the policy samples an assortment that includes products without remaining inventories, then dropping the products without remaining inventories does not degrade the performance. However, sampling-based nature of a policy is a concern, because sampling brings another source of uncertainty in the performance. In this paper, we give an algorithm to de-randomize any sampling-based inventory-agnostic policy, so the de-randomized policy offers a deterministic sequence of assortments within the support of the original policy without degrading the performance. Furthermore, we give a variation of our de-randomization algorithm that searches for a deterministic sequence of assortments beyond the support of the original policy. We show that we can implement the latter variation efficiently as long as we can solve the static assortment optimization problem under the choice model governing the choice process of the customers. As our crowning technical contribution, we study locally-optimal deterministic policies, where changing any single one of the assortments in the policy does not improve the total expected revenue. We show that any locally-optimal policy has a performance guarantee of 1/2 - epsilon when compared with the best sampling-based policy.
- [18] arXiv:2607.00773 (cross-list from cs.LG) [pdf, html, other]
Title: Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling
Comments: 33 pages, 10 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
Discrete diffusion models are widely used for learning and generating discrete distributions. As the generation process is inherently sequential, the acceleration of sampling is of significant importance. In this work, we parallelize the mainstream $\tau$-leaping algorithm for absorbing discrete diffusion in a Continuous-Time Markov Chain (CTMC) framework. By leveraging the continuous-time stochastic integral form of the $\tau$-leaping algorithm and the Picard iteration method, we achieve parallel-in-time sampling acceleration and provide a proof of exponential-factorial convergence for our algorithm. We improve the overall time complexity of $\tau$-leaping under absorbing settings from ${\mathcal{O}}(d \log S)$ to ${\mathcal{O}}(\log (d\log S)\cdot \log d)$ with respect to NFE. Empirically, our method shows consistent acceleration across synthetic and real-data settings. The new sampler achieves at most $7$--$9\times$ runtime speedup for synthetic distribution, and maintains the same quality with $50\%$ fewer NFE and $1.45$--$1.86\times$ runtime speedups in image/text tasks on a single GPU. Our research expands the potential of discrete diffusion models for efficient parallel inference, with broader implications for applications such as molecular structure and language generation.
- [19] arXiv:2607.00966 (cross-list from math.PR) [pdf, html, other]
Title: Sharp Bounds for Dynamic Averaging on Cycles
Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS)
We study a dynamic averaging process on the cycle \(C_n\). At each discrete time, an edge is chosen uniformly at random, one unit of load is introduced, and the two endpoint loads are replaced by their common average after the new unit has been added. Starting from the zero configuration, we prove that the expected gap between the largest and smallest loads is \(O(\sqrt n)\), uniformly in time. Building on the lower-bound argument of Alistarh, Nadiradze, and Sabour for the expected square of the gap, we further show that the expected gap is \(\Omega(\sqrt n)\) in the long run. This confirms their conjecture that the expected gap is of order \(\sqrt n\).
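The process described in the abstract above is easy to simulate. The following is a minimal sketch of one reading of it, assuming vertices $0,\dots,n-1$ with edges $\{i, i+1 \bmod n\}$ and a unit of load that is added to the chosen edge before its two endpoints are set to their common average; the function names and the `seed` parameter are assumptions for illustration, and a simulation of this kind says nothing rigorous about the $\Theta(\sqrt n)$ bound.

```python
import random


def simulate_loads(n, steps, seed=None):
    """Run the dynamic averaging process on the cycle C_n from the zero configuration.

    At each step a uniformly random edge {u, u+1 mod n} receives one unit of
    load, and both endpoints are set to the average of their loads plus that unit.
    """
    rng = random.Random(seed)
    load = [0.0] * n
    for _ in range(steps):
        u = rng.randrange(n)  # picks one of the n cycle edges uniformly
        v = (u + 1) % n
        average = (load[u] + load[v] + 1.0) / 2.0
        load[u] = average
        load[v] = average
    return load


def gap(load):
    """Difference between the largest and smallest loads."""
    return max(load) - min(load)
```

Averaging `gap(simulate_loads(n, steps, seed))` over many seeds and several values of `n` gives a rough empirical picture that can be compared against $\sqrt n$; note that the total load always equals the number of steps, since each step adds exactly one unit.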
- [20] arXiv:2607.01059 (cross-list from cs.GT) [pdf, html, other]
Title: Fair Allocation under Conflict Constraints via Strong Colorability
Comments: 32 pages
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
In the fair allocation problem under conflict constraints, the goal is to partition the vertices of a graph among agents in a fair manner, such that no two adjacent vertices are assigned to the same agent. We study this problem for agents with common preferences through the lens of three fairness criteria: stochastic-dominance envy-freeness up to one item for preference orders (SD-EF1), envy-freeness up to one item for monotone additive valuations (EF1), and envy-freeness up to one item from each side for general additive valuations (EF[1,1]). To do so, we introduce a hierarchy of variants of the strong chromatic number, a graph quantity introduced independently by Alon and Fellows in the early nineties. Our results reveal a close connection between fair allocation under conflict constraints and the first two levels of this hierarchy, providing a unified route to both existential and algorithmic results.
For SD-EF1, we fully characterize the number of agents needed to guarantee a fair allocation of a given graph for every common preference order. For EF1 and EF[1,1], we provide analogous sufficient conditions, extending a result on path graphs due to Equbal, Gurjar, Igarashi, Kumar, Manurangsi, Nath, Saxena, Vaish, and Yoneda. We also show that, unlike in the SD-EF1 setting, the sufficient conditions for EF1 and EF[1,1] are not necessary in general. Our framework yields existential and algorithmic consequences in terms of the maximum degree. We obtain that every graph with maximum degree $\Delta$ admits SD-EF1, EF1, and EF[1,1] allocations for common preferences whenever the number of agents is at least $3\Delta-1$. We further provide, for any $\varepsilon>0$, deterministic polynomial-time algorithms that find such allocations whenever the number of agents is at least $(3+\varepsilon)\Delta$. These guarantees strengthen earlier work by Barman and Viswanathan on equitable colorings.
- [21] arXiv:2607.01159 (cross-list from cs.GT) [pdf, html, other]
-
Title: Online Fair Division Meets Reordering Buffers
Subjects: Computer Science and Game Theory (cs.GT); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
We study the online fair division of indivisible mixed manna among agents with additive valuation functions. Under the standard online model, at each time step an indivisible item arrives; each agent may assign it a positive, negative, or zero value, and it must be irrevocably allocated before the arrival of the next item. At the same time, we also wish to maintain some fairness guarantee, and in this work we focus on envy-freeness (EF) and one of its most prominent relaxations, envy-freeness up to one item (EF1). Given the strong negative and the scarce positive results for this problem without additional assumptions, we augment our algorithms with buffers that can store and rearrange a limited number of items. This setting interpolates naturally between the fully online case (no buffer) and the fully offline case (a buffer large enough to hold all items). We show that algorithms equipped with reasonably sized buffers can achieve strong guarantees for personalized $k$-value instances, i.e., instances in which each agent assigns at most $k$ distinct values to items. In particular, we construct allocations that are EF1 at every time step and EF at most time steps, using a buffer of size linear in $k$ and in the number of agents. Our approach relies on novel combinatorial arguments and on constructing a sequence of envy-free matchings that allocates most items. Finally, we extend our results to general additive valuation functions, with a dependence on the largest per-agent ratio between two values of the same sign, and we also identify limitations of our approach via impossibility results on the use of buffers of smaller size.
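As a point of reference for the fairness notion named above, the following is a minimal sketch of an EF1 checker for mixed manna with additive valuations. It assumes one commonly used definition (agent $i$'s envy toward agent $j$ must vanish after removing a single item from either bundle); the abstract does not spell out the exact definition used in the paper, and the function name and data layout are hypothetical.

```python
def is_ef1(valuations, bundles):
    """Check EF1 for mixed manna under additive valuations.

    valuations[i][o] is agent i's value (possibly negative) for item o.
    bundles[i] is the set of items held by agent i.
    Assumed definition: for all i != j, either i does not envy j, or
    removing one item from bundle i or bundle j eliminates i's envy.
    """
    n = len(bundles)
    for i in range(n):
        v = valuations[i]
        own = sum(v[o] for o in bundles[i])
        for j in range(n):
            if i == j:
                continue
            other = sum(v[o] for o in bundles[j])
            if own >= other:
                continue  # no envy at all
            # Try dropping one item (typically a chore) from i's own bundle.
            fixed_by_own = any(own - v[o] >= other for o in bundles[i])
            # Try dropping one item (typically a good) from j's bundle.
            fixed_by_other = any(own >= other - v[o] for o in bundles[j])
            if not (fixed_by_own or fixed_by_other):
                return False
    return True


# Two agents, three items; item "b" is a chore for both agents.
vals = [{"a": 3, "b": -2, "c": 1}, {"a": 2, "b": -1, "c": 2}]
print(is_ef1(vals, [{"a", "b"}, {"c"}]))   # balanced split
print(is_ef1(vals, [{"b"}, {"a", "c"}]))   # agent 0 holds only the chore
```

An online algorithm in this setting would need the check to hold after every irrevocable allocation, which is what makes the buffered model in the abstract useful.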
Cross submissions (showing 8 of 8 entries)
- [22] arXiv:2106.14969 (replaced) [pdf, other]
-
Title: Hop-Constrained Metric Embeddings and their Applications
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
In network design problems, such as compact routing, the goal is to route packets between nodes using (approximate) shortest paths. A desirable property of these routes is a small number of hops, which makes them more reliable and reduces transmission costs. Following the overwhelming success of stochastic tree embeddings for algorithmic design, Haeupler, Hershkowitz, and Zuzic (STOC'21) studied hop-constrained Ramsey-type metric embeddings into trees. Specifically, an embedding $f:G(V,E)\rightarrow T$ has Ramsey hop-distortion $(t,M,\beta,h)$ (here $t,\beta,h\ge1$ and $M\subseteq V$) if $\forall u,v\in M$, $d_G^{(\beta\cdot h)}(u,v)\le d_T(u,v)\le t\cdot d_G^{(h)}(u,v)$. Here $t$ is called the distortion, $\beta$ is called the hop-stretch, and $d_G^{(h)}(u,v)$ denotes the minimum weight of a $u-v$ path with at most $h$ hops. Haeupler et al. constructed an embedding where $M$ contains a $1-\epsilon$ fraction of the vertices and $\beta=t=O(\frac{\log^2 n}{\epsilon})$. They used their embedding to obtain multiple bicriteria approximation algorithms for hop-constrained network design problems.
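To make the hop-constrained distance $d_G^{(h)}$ concrete, here is a minimal sketch that computes it from one source by running $h$ rounds of Bellman-Ford relaxation. This is a textbook technique, not the embedding construction from the paper; it assumes an undirected graph with non-negative edge weights (so a minimum-weight walk with at most $h$ edges is also a path), and the function name and edge-list format are hypothetical.

```python
import math


def hop_constrained_distances(n, edges, source, h):
    """Return a list d with d[v] = minimum weight of a source-v path
    using at most h edges (math.inf if no such path exists).

    n: number of vertices, labelled 0..n-1.
    edges: list of (u, v, weight) for an undirected graph, weight >= 0.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(h):
        # Relax only from the previous round, so round r allows r hops.
        new_dist = dist[:]
        for u, v, w in edges:
            if dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
            if dist[v] + w < new_dist[u]:
                new_dist[u] = dist[v] + w
        dist = new_dist
    return dist


# Triangle where the direct edge 0-2 is heavy but the 2-hop route is light.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 5.0)]
print(hop_constrained_distances(3, edges, 0, 1)[2])  # one hop allowed
print(hop_constrained_distances(3, edges, 0, 2)[2])  # two hops allowed
```

The example shows why $d_G^{(h)}$ is not a metric in general: tightening the hop budget from 2 to 1 raises the distance between vertices 0 and 2 from 2 to 5, which is what the hop-stretch $\beta$ in the definition accounts for.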
In this paper, we first improve the Ramsey-type embedding to obtain parameters $t=\beta=\frac{\tilde{O}(\log n)}{\epsilon}$, and generalize it to an arbitrary distortion parameter $t$ (at the cost of reducing the size of $M$). This embedding immediately implies polynomial improvements for all the approximation algorithms from Haeupler et al. Further, we construct hop-constrained clan embeddings (where each vertex has multiple copies), and use them to construct bicriteria approximation algorithms for the group Steiner tree problem, matching the state of the art of the non-constrained version. Finally, we use our embedding results to construct hop-constrained distance oracles, distance labeling, and most prominently, the first hop-constrained compact routing scheme with provable guarantees.
- [23] arXiv:2312.11873 (replaced) [pdf, html, other]
-
Title: Nearly Optimal Internal Dictionary Matching
Comments: To appear in ESA'26
Subjects: Data Structures and Algorithms (cs.DS)
We study the internal dictionary matching (IDM) problem where a dictionary $\mathcal{D}$ containing $d$ substrings of a text $T$ is given, and each query concerns the occurrences of patterns in $\mathcal{D}$ in another substring of $T$. We propose a novel $O(n)$-sized data structure named Basic Substring Structure (BASS), where $n$ is the length of the text $T$. With BASS, we are able to handle all types of queries in the IDM problem in nearly optimal query and preprocessing time. Specifically, our results include:
$\bullet$ The first algorithm that answers the CountDistinct query in $\tilde{O}(1)$ time with $\tilde{O}(n+d)$ preprocessing, where we need to compute the number of distinct patterns that exist in $T[l,r]$. Previously, the best result was $\tilde{O}(m)$ time per query after $\tilde{O}(n^2/m+d)$ or $\tilde{O}(nd/m+d)$ preprocessing, where $m$ is a chosen parameter.
$\bullet$ Faster algorithms for two other types of internal queries. We improve the runtime for (1) Occurrence counting (Count) queries to $O(\log n/\log\log n)$ time per query with $O(n+d\sqrt{\log n})$ preprocessing from $O(\log^2 n/\log\log n)$ time per query with $O(n\log n/\log \log n+d\log^{3/2} n)$ preprocessing. (2) Distinct pattern reporting (ReportDistinct) queries to $O(1+|\text{output}|)$ time per query from $O(\log n+|\text{output}|)$ per query.
In addition, we match the optimal runtime in the remaining two types of queries, pattern existence (Exists), and occurrence reporting (Report). We also show that BASS is more generally applicable to other internal query problems.
- [24] arXiv:2511.15849 (replaced) [pdf, html, other]
-
Title: Connectivity-Preserving Important Separators: A Framework for Cut-Uncut Problems
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Important separators are a cornerstone of parameterized algorithms for graph separation: they reduce an a priori enormous search space of separators to a small, structured family that can be enumerated efficiently. This principle has been remarkably successful for parameterized separation problems, but it does not address cut-uncut problems, where one must cut some connections while preserving the connectivity of a given set of terminals. These connectivity-preservation requirements create a qualitatively different type of structure, and the classical important-separator machinery no longer gives the right objects to enumerate.
We introduce connectivity-preserving important separators: separators that disconnect $s$ from $t$, keep a prescribed terminal set connected to $s$, and are extremal among separators with this property. Our main result shows that, despite the additional connectivity constraints, the number of such separators of size at most $k$ is bounded by $2^{O(k^2\log k)}$, and they can be enumerated in $O(2^{O(k^2\log k)}\cdot n\cdot T(n,m))$ time, where $T(n,m)$ is the time for computing a minimum-cardinality $s,t$-separator.
This gives a systematic extension of the important-separator method with connectivity constraints. The quadratic dependence on $k$ reflects a real phenomenon: in directed graphs, we construct instances with at least $\frac{2^{k^2-1}}{k}$ connectivity-preserving important separators of size at most $k$.
As applications, we obtain an FPT algorithm for optimizing over all minimal $s,t$-separators whose source component must contain a prescribed set $A$ and avoid a prescribed set $B$, a constraint pattern not expressible as a standard cut-uncut instance. We also apply the framework to Node Multiway Cut-Uncut.
- [25] arXiv:2512.08392 (replaced) [pdf, html, other]
-
Title: Finding All Bounded-Length Simple Cycles in a Directed Graph -- Revisited
Comments: 16 pages, 9 figures
Subjects: Data Structures and Algorithms (cs.DS)
In 2021, Gupta and Suzumura proposed a novel algorithm for enumerating all bounded-length simple cycles in directed graphs (arXiv:2105.10094). In this work, we present a concrete counter-example demonstrating that the proposed algorithm fails to enumerate certain valid cycles. Analyzing it, we pinpoint the precise step at which the original correctness proof breaks down. We also identify a gap in the original proof of the claimed delay bound. Finally, we propose the algorithm SimpleSearch, which avoids these flaws by construction while achieving a delay bound of $O(k(n + m))$ per cycle output or termination, where $k$ is the length bound, $n$ the number of nodes, and $m$ the number of edges in the finite simple directed graph $G$.
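For readers unfamiliar with the enumeration task, the following is a minimal brute-force sketch that lists every simple directed cycle with at most $k$ edges. It is a plain depth-first search used only to illustrate the problem statement; it is not SimpleSearch and not the Gupta-Suzumura algorithm, and it makes no attempt to meet the $O(k(n+m))$ delay bound. The function name and adjacency format are hypothetical, and node labels are assumed to be mutually comparable.

```python
def bounded_simple_cycles(adj, k):
    """Enumerate simple directed cycles with at most k edges.

    adj maps each node to an iterable of its out-neighbours.
    Each cycle is returned once, as a node list starting at its
    smallest node (the closing edge back to that node is implicit).
    """
    cycles = []
    if k < 1:
        return cycles

    def extend(start, path, on_path):
        u = path[-1]
        for v in adj.get(u, ()):
            if v == start:
                # Closing edge found: the cycle has len(path) <= k edges.
                cycles.append(list(path))
            elif v > start and v not in on_path and len(path) < k:
                # Only visit nodes larger than start so that each cycle
                # is discovered from its smallest node exactly once.
                path.append(v)
                on_path.add(v)
                extend(start, path, on_path)
                path.pop()
                on_path.remove(v)

    for s in sorted(adj):
        extend(s, [s], {s})
    return cycles


adj = {0: [1], 1: [2, 0], 2: [0]}
print(sorted(bounded_simple_cycles(adj, 2)))
print(sorted(bounded_simple_cycles(adj, 3)))
```

A search like this can spend exponential time in dead ends between two outputs, which is the kind of behaviour a bounded-delay algorithm is designed to rule out.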
- [26] arXiv:2606.01330 (replaced) [pdf, html, other]
-
Title: On Thin Perfect Matchings up to Polylogarithmic Factors
Subjects: Data Structures and Algorithms (cs.DS)
We resolve the thin matching problem proposed by Anari, Charikar and Ramakrishnan [ACR23] up to polylogarithmic factors. Given a fractional perfect matching $x$, we say a perfect matching $M$ is $\alpha$-thin w.r.t. $x$ if for any cut $(S,\overline{S})$, we have $$ |M \cap E(S,\overline{S})| \leq \alpha\cdot x(S,\overline{S}).$$ [ACR23] conjectured that for any fractional perfect matching $x$, there exists a perfect matching $M$ which is $O(1)$-thin w.r.t. $x$.
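The thinness definition above can be checked directly on small instances. The sketch below computes, by brute force over all cuts, the smallest $\alpha$ for which a given perfect matching is $\alpha$-thin with respect to a fractional perfect matching $x$. It runs in time exponential in $n$ and is meant only to illustrate the definition; the function name and the edge representation (`frozenset` pairs) are hypothetical.

```python
import math
from itertools import combinations


def thinness(n, x, matching):
    """Smallest alpha such that `matching` is alpha-thin w.r.t. x.

    n: number of vertices, labelled 0..n-1.
    x: dict mapping frozenset({u, v}) to its fractional value.
    matching: set of frozenset({u, v}) edges forming a perfect matching.
    Returns math.inf if some cut has matching edges but zero x-weight.
    """
    alpha = 0.0
    for size in range(1, n):
        for side in combinations(range(n), size):
            if 0 not in side:
                continue  # (S, complement) and (complement, S) are the same cut
            S = set(side)
            # An edge crosses the cut iff exactly one endpoint lies in S.
            m_cut = sum(1 for e in matching if len(e & S) == 1)
            x_cut = sum(val for e, val in x.items() if len(e & S) == 1)
            if m_cut == 0:
                continue
            if x_cut == 0:
                return math.inf
            alpha = max(alpha, m_cut / x_cut)
    return alpha


# 4-cycle 0-1-2-3-0 with x = 1/2 on every edge (each vertex sums to 1).
x = {frozenset(e): 0.5 for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
M = {frozenset((0, 1)), frozenset((2, 3))}
print(thinness(4, x, M))
```

In this example the cut $S=\{0,3\}$ is the tight one: both matching edges cross it, while only $x$-weight $1$ crosses it, so the matching is $2$-thin and no better.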
First, we show that if $M$ is restricted to be in the support of $x$, then $\alpha \geq \Omega(n)$ and we complement this by designing an efficient algorithm that outputs an $O(n\log n)$-thin perfect matching where $n$ is the number of vertices.
Then, we relax this constraint and show that for any fractional perfect matching $x$, there is a perfect matching $M$ (which is not necessarily in the support of $x$) such that $M$ is $\text{polylog}(n)$-thin w.r.t. $x$. All results work for both bipartite and non-bipartite graphs. We also discuss applications to the metric distortion problem.
- [27] arXiv:2312.15427 (replaced) [pdf, html, other]
-
Title: Semi-Bandit Learning for Monotone Stochastic Optimization
Comments: Full version (and extension) of FOCS 2024 paper. Fixes some missing assumptions in our results for continuous distributions. Also adds extensions to censored and binary feedback settings (along with applications). Revision: We improved the $k$ dependence
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T\log(T)}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the random variables that were actually probed. Moreover, our result extends to settings with censored and binary feedback, where the policy only observes truncated or thresholded versions of the probed variables. Our framework applies to several fundamental problems such as prophet inequality, Pandora's box, stochastic knapsack, single-resource revenue management and sequential posted pricing.
- [28] arXiv:2602.09948 (replaced) [pdf, other]
-
Title: Non-Additive Discrepancy: Coverage Functions in a Beck-Fiala Setting
Comments: To appear at ESA'26
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Recent concurrent work by Dupré la Tour and Fujii and by Hollender, Manurangsi, Meka, and Suksompong [ITCS'26] introduced a generalization of classical discrepancy theory to non-additive functions, motivated by applications in fair division. As many classical techniques from discrepancy theory seem to fail in this setting, including linear algebraic methods like the Beck-Fiala Theorem [Discrete Appl. Math '81], it remains widely open whether comparable non-additive bounds can be achieved.
Towards a better understanding of non-additive discrepancy, we study coverage functions in a sparse setting comparable to the classical Beck-Fiala Theorem. Our setting generalizes the additive Beck-Fiala setting, rank functions of partition matroids, and edge coverage in graphs. More precisely, assuming each of the $n$ items covers only $t$ elements across all functions, we prove a constructive discrepancy bound that is polynomial in $t$, the number of colors $k$, and $\log n$.
- [29] arXiv:2606.31759 (replaced) [pdf, html, other]
-
Title: Decoupling Trust in Byzantine CRDTs: Fine-grained Post-Compromise Handling without Breaking Causality
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Conflict-free Replicated Data Types (CRDTs) provide strong eventual consistency without coordination, but classical approaches assume benign participants. In Byzantine settings, convergence is typically enforced through agreement on update validity, often relying on identity-based filtering. However, such approaches struggle in post-compromise scenarios, where a previously correct participant becomes malicious: retroactive exclusion of its updates may break causal dependencies and invalidate subsequent computations. In this paper, we decouple identity-based trust from content-based trust and introduce a fine-grained trust model that combines both dimensions. Building on deterministic reconstruction, our approach allows replicas to preserve previously accepted updates while enabling selective inclusion or exclusion based on both the originating identity (e.g., public keys) and the semantics of individual updates. Trust decisions can incorporate application-level policies, enabling precise control over the impact of each update on the system state. Our approach preserves causal consistency and enables robust and flexible handling of both Byzantine and faulty behavior in decentralized CRDT systems.