Shaving Logs via Large Sieve Inequality: Faster Algorithms for Sparse Convolution and More

Speakers

Ce Jin & Yinzhan Xu
MIT

Host

Noah Golowich
MIT
Abstract:
In sparse convolution-type problems, a common technique is to hash the input integers modulo a random prime p\in [Q/2,Q] for some parameter Q, which reduces the range of the input integers while preserving their additive structure. However, this hash family suffers from two drawbacks, which have led to bottlenecks in many state-of-the-art algorithms: (1) The collision probability of two distinct elements of [N] is O(\frac{\log N}{Q}) rather than O(\frac{1}{Q}); (2) It is difficult to derandomize the choice of p; known derandomization techniques incur super-logarithmic overhead [Chan, Lewenstein STOC'15].
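
For background, the standard calculation behind the O(\frac{\log N}{Q}) collision bound goes as follows (a sketch, not quoted from the paper): two distinct x,y\in[N] collide exactly when p divides x-y; since 0<|x-y|<N, this difference has at most \log_{Q/2}N=O(\frac{\log N}{\log Q}) prime divisors lying in [Q/2,Q], while by the prime number theorem this interval contains \Theta(\frac{Q}{\log Q}) primes, so \Pr_p[x\equiv y\pmod{p}]\le O(\frac{\log N}{\log Q})\cdot O(\frac{\log Q}{Q})=O(\frac{\log N}{Q}).
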
In this paper, we partially overcome these drawbacks in certain scenarios, via novel applications of the large sieve inequality from analytic number theory (a standard form of this inequality is recalled after the result list below). Consequently, we obtain the following improved algorithms for various problems (in the standard word RAM model):

- Sparse Nonnegative Convolution: We obtain an O(t\log t)-time Las Vegas algorithm that computes the convolution A\star B of two nonnegative integer vectors A,B, where t is the output sparsity \|A\star B\|_0 (a brute-force reference sketch of this problem appears after this list). Moreover, our algorithm terminates in O(t\log t) time with 1-1/\mathrm{poly}(t) probability. This simultaneously improves the O(t\log t \log \log t)-time Las Vegas algorithm [Bringmann, Fischer, Nakos SODA'22] and the Monte Carlo O(t\log t)-time algorithm with failure probability 2^{-\sqrt{\log t}} [Bringmann, Fischer, Nakos STOC'21].

- Text-to-Pattern Hamming Distances: Given a length-m pattern P and a length-n text T, we obtain an O(n\sqrt{m\log \log m})-time deterministic algorithm that exactly computes the Hamming distance between P and every length-m substring of T. This improves the previous O(n\sqrt{m}(\log m\log \log m)^{1/4})-time deterministic algorithm [Chan, Jin, Vassilevska Williams, Xu FOCS'23] and nearly matches their O(n\sqrt{m})-time Las Vegas algorithm.

- Sparse General Convolution: For sparse convolution with possibly negative entries, all previous approaches required \Omega(t\log^2 t) time, where t is the maximum of the input and output sparsities, and an important question left open by [Bringmann, Fischer, Nakos STOC'21] is whether this can be improved. We make partial progress toward resolving this question by giving a Monte Carlo O(t\log t)-time algorithm in the restricted case where the length N of the input vectors satisfies N\le t^{1.99}.

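For reference, one standard form of the large sieve inequality, stated here only as background (the paper may use a different variant): for any complex numbers a_1,\dots,a_N and S(\alpha)=\sum_{n\le N} a_n e^{2\pi i n\alpha}, one has \sum_{q\le Q}\sum_{1\le a\le q,\ \gcd(a,q)=1} |S(a/q)|^2 \le (N+Q^2)\sum_{n\le N}|a_n|^2.
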
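To make the problem statements above concrete, here is a minimal brute-force sketch of sparse nonnegative convolution in Python (an illustrative reference only, not the paper's algorithm; the function name and dictionary representation are our own choices):

```python
# Brute-force reference for sparse nonnegative convolution (illustration only).
# Vectors are represented sparsely as dictionaries {index: value}; the fast
# algorithms discussed in the talk compute the same output in time near-linear
# in the output sparsity t = |supp(A * B)|, rather than the quadratic number of
# dictionary updates used here.
def sparse_convolution(A: dict[int, int], B: dict[int, int]) -> dict[int, int]:
    """Return C = A * B, where C[k] is the sum of A[i] * B[j] over all i + j = k."""
    C: dict[int, int] = {}
    for i, a in A.items():
        for j, b in B.items():
            C[i + j] = C.get(i + j, 0) + a * b
    return {k: v for k, v in C.items() if v != 0}


# Example: both inputs have 2 nonzero entries; the output sparsity t here is 3.
A = {0: 1, 5: 2}
B = {0: 3, 5: 4}
print(sparse_convolution(A, B))  # {0: 3, 5: 10, 10: 8}
```

The algorithms in the first item above compute exactly this output C, but in O(t\log t) time rather than the O(\|A\|_0\cdot\|B\|_0) dictionary updates performed by this sketch.
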
To appear in STOC 2024. Link: https://arxiv.org/abs/2403.20326