## Select Seminar Series

All seminars- Home
- ›
- Studying at the Faculty
- ›
- Seminars ›
- Foundations of Computer Science Seminar

# Foundations of Computer Science Seminar

We prove the first super-logarithmic lower bounds on the cell probe complexity of dynamic *boolean* (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new technique and use it to prove a ~ log^{1.5}(n) lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting *over F_2* ([Patrascu07]). Proving a super-logarithmic lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Patrascu's obituary [Tho13]. This result also implies the first super-logarithmic lower bound for the classical 2D range counting problem,one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D "rectangle stabbing", and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient *one-way* communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebychev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].

Joint work with Kasper Green-Larsen and Huacheng Yu.

We study the classic bipartite matching problem in the online setting, first introduced in the seminal work of Karp, Vazirani and Vazirani. Specifically, we consider the problem for the well-studied class of regular graphs. Matching in this class of graphs was studied extensively in the offline setting. In the online setting, an optimal deterministic algorithm, as well as efficient algorithms under stochastic input assumptions were known. In this work, we present a novel randomized algorithm with competitive ratio tending to one on this family of graphs, under adversarial arrival order. Our main contribution is a novel algorithm which achieves competitive ratio 1-O(\sqrt{\log d}/\sqrt{d}) in expectation on d-regular graphs. In contrast, we show that all previously-known online algorithms, such as the generally worst-case optimal ranking algorithm of Karp et al., are restricted to a competitive ratio strictly bounded away from one, even as d grows. Moreover, we show the convergence rate of our algorithm's competitive ratio to one is nearly tight, as no algorithm achieves competitive ratio better than 1-O(1/\sqrt{d}). Finally, we show that our algorithm yields a similar competitive ratio with high probability, as well as guaranteeing each offline vertex a probability of being matched tending to one.

Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people -- not software agents -- make joint decisions. I will illustrate this theme through two recent endeavors: Spliddit.org, a website that offers provably fair solutions to everyday problems; and Robovote.org, which provides optimization-driven voting methods.

Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.

We show that there exist binary locally testable codes (for all rates) and locally correctable codes (for low rates) with rate and distance approaching the Gilbert-Varshamov bound (which is the best rate-distance tradeoff known for general binary error-correcting codes). Our constructions use a number of ingredients: Thommesen's random concatenation technique, the Guruswami-Sudan-Indyk strategy for list-decoding concatenated codes, the Alon-Edmonds-Luby distance amplification method, and the local list-decodability and local testability of Reed-Muller codes. Interestingly, this seems to be the first time that local testability is used in the construction of locally correctable codes.

Joint work with Sivakanth Gopi, Rafael Oliveira, Noga Ron-Zewi and Shubhangi Saraf

Graphs that are prevalent in current applications (the Internet, Facebook etc.) are not only very large and highly dynamic, but also distributed between many servers, none of which sees the graph in its entirety. The distributed monitoring problem deals with the question of imposing conditions on the local graphs, such that as long as they hold, it is guaranteed that some desired property holds for the global graph.

While defining local conditions for linear properties (e.g. average degree) is relatively easy, they are more difficult to derive for non-linear functions over the graph. We propose a solution and a general definition of solution optimality, and demonstrate how to apply it to two important graph properties -- spectral gap and number of triangles. We also define an absolute lower bound on the communication overhead for distributed monitoring, and compare our algorithm to it, with good results. Performance improves as the graph becomes larger and denser -- that is, when distributing it is more important.

I will talk about my recent adventures with ants. Together with biologists we study P. longicornis ants as they collaboratively transport a large food item to their nest. This collective navigation process is guided by pheromones which are laid by individual ants. Using a new methodology to detect scent marks, we identify a new kind of ant trail characterized by very short and dynamic pheromone markings and highly stochastic navigation response to them. We argue that such a trail can be highly beneficial in conditions in which knowledge of individual ants regarding the underlying topological structure is unreliable. This gives rise to a new theoretical search model on graphs under unreliable guiding instructions, which is of independent computational interest. To illustrate the model, imagine driving a car in an unknown country that is in the aftermath of a major hurricane which has randomly flipped a certain small fraction of the road-signs. Under such conditions of unreliability, how can you still reach your destination fast? I will discuss the limits of unreliability that allow for efficient navigation. In trees, for example, there is a phase transition phenomenon that occurs roughly around the inverse of the square root of the maximal degree. That is, if noise is above this threshold then any algorithm cannot avoid finding the target in exponential time (in the original distance), while below the threshold we identify an optimal, almost linear, walking algorithm. Finally, I will discuss algorithms that under such a noisy model aim to minimize the number of queries to find a target (rather than the number of moves).

This talk is based on joint works with biologists from the Weizmann Institute: Ofer Feinerman, Udi Fonio, and others, and with CS researchers: Lucas Bockowski, Adrian Kosowski, and Yoav Rodeh.

While there has been a lot of progress in designing efficient custom protocols for computing Private Set Intersection (PSI), there has been less research on using generic MPC protocols for this task. However, there are many variants of the set intersection functionality which seem hard to compute with existing custom protocols and are easy to compute with generic MPC based solutions (for example comparing the cardinality of the intersection with a threshold or measuring ad conversion rates). Generic protocols work over circuits which compute the intersection. For sets of size n the best known circuit constructions compute O(n log n) comparisons. In this work we propose new circuit-based protocols for computing variants of the intersection, with circuits computing only O(n) comparisons. Our constructions are based on a new variant of Cuckoo hashing in two dimensions. We employ several optimizations and determine experimentally the required sizes of tables and circuits, and measure the runtime, showing that our protocol is more efficient in concrete terms than existing constructions. The proof technique is new and can be generalized to analyzing simple Cuckoo hashing as well as new variants.

Joint work with Srikanth Srinivasan.

The algorithm referred to in the title builds on Luks's powerful group-theoretic divide-and-conquer method (1980) and addresses the bottleneck situation where Luks's method fails to "divide".

Luks will continue to "conquer" if an alternative method "divides"; we develop such a partitioning technique.

In the talk we shall outline the algorithm and explain in some detail its group theoretic core, the "Unaffected Stabilizers Lemma" and the "Local Certificates" routine. The Lemma is used to construct, somewhat implausibly, global automorphisms out of local information -- a key step toward the construction of combinatorial structures to which the partitioning method from the previous day's lecture will be applied, providing the required "divide" step.

I will describe two branches of my work related to algorithms for distributed networks. The main focus will be devoted for Fault-Tolerant (FT) Network Structures.

The undisrupted operation of structures and services is a crucial requirement in modern day communication networks. As the vertices and edges of the network may occasionally fail or malfunction, it is desirable to make those structures robust against failures.

FT Network Structures are low cost highly resilient structures, constructed on top of a given network, that satisfy certain desirable performance requirements concerning, e.g., connectivity, distance or capacity. We will overview some results on fault tolerant graph structures with a special focus on FT Breadth-First-Search.

The second part of the talk will discuss distributed models and algorithms for large-scale networks. Towards the end, we will see some connections between distributed computing and other areas such as EE and Biology.

Pseudo-deterministic algorithms are randomized search algorithms that on different executions on the same input, output the same solution with high probability.

We will discuss how pseudo-deterministic algorithms bridge the gap between randomized search and decision problems for problems in P and in NC. Next, we will show a pseudo-deterministic NC algorithm for bipartite matching. Finally, we will show how pseudo-determinism can be used to save on random bits used by classical randomized algorithms, and apply the method to obtain an algorithm for RNC depth first search using only O(log^2 n) random bits. This is joint work with Shafi Goldwasser.

The Operator Scaling problem asks whether a set of complex matrices can be jointly moved to a certain canonical (isotropic) position. This problem has a remarkable number of myriad incarnations: non-commutative algebra, invariant theory, arithmetic complexity, quantum information theory, analytic inequalities and more. We will describe an efficient algorithm solving all these related problems, and explain how their analysis combines ideas from all these areas.

Through these connections, the algorithm can be shown to solve some non-convex optimization problems, some systems of quadratic equations, and some linear programs with exponentially many inequalities - all these, and concrete examples we will give, suggest that it might be a powerful algorithmic tool via reductions to these problems.

No special background will be assumed!

Joint on two joint works with Ankit Garg, Leonid Gurvits and Rafael Olivera.

This talk is longer than usual and has a two-hour slot.

The sensitivity conjecture is a famous open problem in the theory of boolean functions. Let f be a boolean function defined on the hypercube. The sensitivity of a node x is the number of its neighbours in the hypercube, for which f give the opposite value as that it does on x. The sensitivity conjecture speculates that if all nodes have low sensitivity, then the function f must be simple. Concretely, all its Fourier mass is supported on levels with low hamming weight.

Recently, Gopalan et al [CCC 2016] conjectured a robust analogue of the sensitivity conjecture: if most of the nodes have low sensitivity, then most of the Fourier mass is supported on levels with low hamming weight. They also prove it under the stronger assumption that all nodes have low sensitivity. In this work, we prove this conjecture, with near tight quantitative bounds.

Joint work with Avishay Tal (IAS) and Jiapeng Zhang (UCSD).

We show an efficient method for converting a logic circuit of gates with fan-out 1 into an equivalent circuit that works even if some fraction of its gates are short-circuited, i.e., their output is short-circuited to one of their inputs. Our conversion can be applied to any circuit with fan-in k>= 2, yielding a resilient circuit whose size is polynomial in the size of the (non-resilient) input circuit.

The resilient circuit gives the correct output as long as less than 1/3 of the gates in any of its input-to-output paths are corrupted. Furthermore, we prove that a resilience level of 1/3 is optimal (maximal) for this type of faulty gates. This fully answers an open question by Kalai et al. (FOCS 2012).

Joint work with Mark Braverman and Michael Yitayew.

We discuss recent progress on hardness of 2-to-2 games, with applications to the inapproximability of the Vertex-Cover problem.

A 2-to-2 game (which is a variant of Khot's well known unique games), is defined by a graph where there is a variable in each node, and a constraint of a specific structure defined on each edge. While in unique games each edge- constraint must be a one-to-one correspondence -- i.e. for each assignment to one node there is exactly one assignent to the other node that satisfies the constraint -- in 2-to-2 games the correspondence has a "two-to-two" structure.

The goal is to distinguish between instances in which almost all of the edge- constraints can be satisfied, and instances in which almost none of them can be satisfied simultaneously.

We present a new combinatorial hypothesis regarding Grassmann graphs, and show that it implies that 2-to-2 games are NP-hard *in a certain sense*. As a consequence, the hypothesis implies that it is NP-hard to distinguish between graphs that have an independent set of fractional size (1- 1/sqrt{2}), and graphs with no independent sets of any constant fractional size. This easily implies that it is NP-hard to approximate the Vertex Cover problem within a factor \sqrt{2} - o(1).

The talk is mostly based on a joint work with Subhash Khot and Muli Safra, nevertheless, we will also mention results from a more recent extension, which is a joint work with Irit Dinur, Subhash Khot, Guy Kindler and Muli Safra.

We characterize the communication complexity of truthful mechanisms. Our departure point is the well known taxation principle. The taxation principle asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu that consists of a price for each bundle (the prices depend only on the valuations of the other players). Each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player.

Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in "pathological" cases and provide a formal description of these extreme cases.

Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.

In the classical Node-Disjoint Paths (NDP) problem, the input consists of an undirected n-vertex graph G, and a collection M of pairs of its vertices, called source-destination, or demand, pairs. The goal is to route the largest possible number of the demand pairs via node-disjoint paths. The best current approximation for the problem is achieved by a simple greedy algorithm, whose approximation factor is O(\sqrt n), while the best current negative result is a roughly \Omega(log^{1/2}n)-hardness of approximation. Even seemingly simple special cases of the problem are still poorly understood: when the input graph is a grid, the best current algorithm achieves a \tilde{O}(n^{1/4})- approximation, and when it is a general planar graph, the best current approximation ratio of an efficient algorithm is \tilde{O}(n^{9/19}). The best currently known lower bound for both these versions of the problem is APX- hardness.

In this talk we will show that NDP is 2^{\Omega(\log n)}-hard to approximate, unless all problems in NP have algorithms with running time n^{O(\log n)}. Our result holds even when the underlying graph is a planar graph with maximum vertex degree 3, and all source vertices lie on the boundary of a single face. We extend this result to the closely related Edge-Disjoint Paths problem, showing the same hardness of approximation ratio even for sub-cubic planar graphs with all sources lying on the boundary of a single face.

This is joint work with David H.K. Kim and Rachit Nimavat.

One of the most common tasks in cryptography and cryptanalysis is to find some interesting event (a needle) in an exponentially large collection (haystack) of N=2^n possible events, or to demonstrate that no such event is likely to exist. In particular, we are interested in finding needles which are defined as events that happen with an unusually high probability of p>>1/N in a haystack which is an almost uniform distribution on N possible events. When the search algorithm can only sample values from this distribution, the best known time/memory tradeoff for finding such an event requires O(1/Mp^2) time given O(M) memory.

In this talk I will describe much faster needle searching algorithms in the common cryptographic setting in which the distribution is defined by applying some deterministic function f to random inputs. Such a distribution can be modeled by a random directed graph with N vertices in which almost all the vertices have O(1) predecessors while the vertex we are looking for has an unusually large number of O(pN) predecessors. When we are given only a constant amount of memory, we propose a new search methodology which we call **NestedRho**. As p increases, such random graphs undergo several subtle phase transitions, and thus the log-log dependence of the time complexity T on p becomes a piecewise linear curve which bends four times. Our new algorithm is faster than the O(1/p^2) time complexity of the best previous algorithm in the full range of 1/N < p < 1 , and in particular it improves the previous time complexity by a significant factor of \sqrt{N} for any p in the range N^(- 0.75) < p < N^(-0.5). When we are given more memory, we show how to combine the **NestedRho** technique with the parallel collision search technique in order to further reduce its time complexity. Finally, we show how to apply our new search technique to more complicated distributions with multiple peaks when we want to find all the peaks whose probabilities are higher than p.

Joint work with Itai Dinur, Orr Dunkelman and Nathan Keller.

Suppose that you have n truly random bits X=(X1,...,Xn) and you wish to use them to generate m>>n pseudorandom bits Y=(Y1,..., Ym) using a local mapping, i.e., each Yi should depend on at most d=O(1) bits of x. In the polynomial regime of m=n^s, s>1, the only known solution, originates from (Goldreich, ECCC 2000), is based on Random Local Functions: Compute Yi by applying some fixed (public) d-ary predicate P to a random (public) tuple of distinct inputs. In this talk, we will try to understand, for any value of s, how the pseudorandomness of the resulting sequence depends on the choice of the underlying predicate.

Based on joint work with Shachar Lovett.

Interactive proofs have had a dramatic impact on Complexity Theory and Cryptography. The celebrated IP=PSPACE Theorem [LFKN92,Shamir92] allows an all-powerful but untrusted prover to convince a polynomial-time verifier of the validity of extremely complicated statements (as long as they can be evaluated using polynomial space). The interactive proof system designed for this purpose requires a polynomial number of communication rounds. This talk will focus on studying the power of more efficient interactive proof systems.

Our main result is that for every statement that can be evaluated in polynomial time and bounded-polynomial space, there exists an interactive proof that satisfies the following strict efficiency requirements:

(1) The honest prover runs in polynomial time.

(2) The verifier is almost linear time (and under some conditions even sub linear).

(3) The interaction consists of only a constant number of communication rounds.

To obtain this result, we introduce and study several new notions for interactive proofs, which may be of independent interest. One of these notions is that of unambiguous interactive proofs, where the prover has a unique successful strategy. Another notion is that of probabilistically checkable interactive proofs (PCIPs) where the verifier only reads a few bits of the transcript in checking the proof (this could be viewed as an interactive extension of PCPs).

Joint work with Omer Reingold and Ron Rothblum.

Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message:

1. Fast Δ-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ≥55 is Θ(log_Δ log n), whereas its deterministic complexity is Θ(log_Δ n) for any Δ≥3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees.

2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in O(1)+o(log_Δ n) rounds can be transformed to run in O(log*n −log*Δ+1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log_Δ n) time deterministically.

3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √ log n. This shows that a deterministic Ω(log_Δ n) lower bound for any problem implies a randomized Ω(log_Δ log n) lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.

Joint work with Tsvi Kopelowitz and Seth Pettie. http://arxiv.org/abs/1602.08166

In distributed systems, communication between the participants in the computation is usually the most expensive part of the computation. Theoretical models of distributed systems usually reflect this by neglecting the cost of local computation, and charging only for messages sent between the participants; in particular, we usually assume that the computation proceeds in rounds, and in each round, each participant can send only a limited number of bits. We are interested in characterizing the number of rounds required to perform various tasks.

In this talk I will describe two sets of results. The first concerns the complexity of distributed subgraph detection: we have n servers, each representing a node in an undirected graph, and each server receives as input its adjacent edges in the graph. The goal of the computation is to determine whether the global input graph contains some fixed subgraph. I will describe upper and lower bounds for several classes of subgraphs, through a connection to Turan numbers. The general case remains open.

In the second part of the talk I will describe recent work on multi- party number-in-hand communication and information complexity, and show a tight upper and lower bound for set disjointness in the shared blackboard model.

Joint work with Mark Braverman, Andrew Drucker and Fabian Kuhn.

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In a sense, the strongest guarantee available is the L2 guarantee, which requires finding all items that occur at least eps*||f|| times in the stream, where the i-th coordinate of the vector f is the number of occurrences of i in the stream. The first algorithm to achieve the L2 guarantee was the CountSketch (Charikar, Chen, and Farach-Colton ICALP'02), which, for constant eps, requires O(log n) words of memory and O(log n) update time. It is known to be space-optimal if the stream includes deletions.

In this talk I will discuss recent improvements that allow us to find L2 heavy hitters in O(1) memory and O(1) update time in insertion only streams. The improvements rely on a deeper understanding of the AMS sketch (Alon, Matias, and Szegedy STOC'96) and similar sketches and draw on the theory of Gaussian processes. This talk is based on joint work with Vladimir Braverman, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff in arxiv:1511.00661 and arxiv:1603.00759.

In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, we propose the methodology of sampling correctors. Such algorithms use structure that the distribution is purported to have, in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms may then be used as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks.Warning: This talk contains more questions than answers...

Joint work with Clement Canonne and Themis Gouleakis.

We present a randomized algorithm that computes a Minimum Spanning Tree (MST) in O(log^* n) rounds, with high probability, in the Congested Clique model of distributed computing. In this model, the input is a graph on n nodes, initially each node knows only its incident edges, and per round each two nodes can exchange O(log n) bits.

Our key technical novelty is an O(log^* n) Graph Connectivity algorithm, the heart of which is a (recursive) forest growth method, based on a combination of two ideas: a sparsity-sensitive sketching aimed at sparse graphs and a random edge sampling aimed at dense graphs.

Our result improves significantly over the $O(\log \log \log n)$ algorithm of Hegeman et al. [PODC 2015] and the $O(\log \log n)$ algorithm of Lotker et al. [SPAA 2003; SICOMP 2005].

Join work with Mohsen Ghaffari.

Estimating the amount of distinct elements in a dataset by examining only a fraction of the data is known to be a hard problem, both theoretically and in practice.

Our work explores a breakthrough theoretical result by Valiant and Valiant from 2011 that presents a provably accurate method for doing such estimations.

Our goal is to put this theory into practice for the important task of estimating the deduplication ratio of a very large dataset. However, deploying this technique in a real world setting runs into significant obstacles.

In the talk I will describe new techniques that help bridging the gap and enable the use of this exciting approach. Our work achieves a major improvement over the current state of the art practical solutions.

The talk is for a general audience, no prior knowledge is assumed.

Based on joint work with Dmitry Sotnikov and Ety Khaitzin that appeared at Usenix FAST 2016.

Expander graphs are widely studied, and various methods are known to obtain bounded degree expander graphs. Recently, there is a growing interest in understanding combinational expansion in higher dimensions (higher dimensional simplicial complexes). However, bounded degree combinatorial expanders (random or explicit) were not known till our work.

We present a local to global criterion on a complex that implies combinatorial expansion. We use our criterion to present explicit bounded degree high dimensional expanders. This solves in the affirmative an open question raised by Gromov, who asked whether bounded degree high dimensional expanders could at all exist.

We expect that the emerging theory of high dimensional expansion is likely to have various application in the theory of computation. Thus, one of the goals of this talk in to introduce this concept to the theory community.

Based on joint works with David Kazhdan and Alex Lubotzky, and with Shai Evra.

Raz's celebrated Parallel Repetition Theorem shows that the probability of simultaneously winning n independent instances of a two-player one-round game G is exponentially small in n, when the maximum success probability of G is less than 1. Though the statement is intuitive, the proof is rather nontrivial and has found important application in hardness of approximation, cryptography, and communication complexity.

There are two major open problems regarding the parallel repetition of games: does an analogue of Raz's theorem hold for (a) games with more than two players, and (b) games with quantumly entangled players? Extending Raz’s theorem to these settings is a challenging problem for a number of reasons: techniques for attacking direct sum/direct product problems in multiparty settings are lacking, and our understanding of quantum entanglement as an information theoretic resource is quite limited.

In this work, we show to sidestep these barriers and make progress on the two open problems. We first prove exponential-decay parallel repetition theorems for a class of games we called "anchored games" in the multiplayer and entangled-player settings. Then, we show how to efficiently transform any game into an equivalent anchored game. Together, our results provide a simple hardness-amplification technique for games in both the classical multiplayer and quantum settings.

Joint work with Mohammad Bavarian and Thomas Vidick.

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) initiated the formal study of this problem, and gave the first upper bounds on the achievable generalization error for adaptive data analysis.

The results of Dwork et al. are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stable algorithms of the kind guaranteed by differential privacy imply low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

Joint work with Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, and Jonathan Ullman.

For the past 40 years computer scientists generally believed that NP-complete problems are intractable. In particular, Boolean satisfiability (SAT), as a paradigmatic NP-complete problem, has been considered to be intractable. Over the past 20 years, however, there has been a quiet, but dramatic, revolution, and very large SAT instances are now being solved routinely as part of software and hardware design.

In this talk I will review this amazing development and show that we can leverage SAT solving to accomplish other Boolean reasoning tasks. Counting the number of satisfying truth assignments of a given Boolean formula or sampling such assignments uniformly at random are fundamental computational problems in computer science with numerous applications. While the theory of these problems has been thoroughly investigated in the 1980s, approximation algorithms developed by theoreticians do not scale up to industrial-sized instances. Algorithms used by the industry offer better scalability, but give up certain correctness guarantees to achieve scalability. We describe a novel approach, based on universal hashing and Satisfiability Modulo Theory, that scales to formulas with hundreds of thousands of variable without giving up correctness guarantees.

In today’s world there are huge amounts of data that need to get reliably stored or transmitted. However, some amount of noise or corruption is inevitable. An error-correcting code is a scheme for robustly representing data in the form of a codeword that allows one to detect and correct errors in transmission. Locally-testable and locally-decodable codes are special families of error-correcting codes that admit highly efficient algorithms that detect and correct errors in sublinear time with high probability, probing only a small number of entries of the corrupted codeword. While locally-testable and locally-decodable codes have been intensely studied in the past 2 decades, in recent years there has been even further incentive for their study due to their relevance for transmission and storage of massive data and the successful implementation of local codes in cloud storage systems.

In this talk, I will show an exponential improvement on the best-known running time of error detection and correction algorithms for locally-testable and locally-decodable codes. Specifically, I will describe new families of locally-testable codes with constant rate that can detect a constant fraction of errors in time (log n)^{O(log log n)} and new families of locally-decodable codes of constant rate that can correct a constant fraction of errors in time exp(\sqrt{log n}). Prior to that, the best known running time for such codes was n^{epsilon} (for a constant epsilon) using several, quite different, constructions.

(Based on joint work with Swastik Kopparty, Or Meir and Shubhangi Saraf)

Locality-Sensitive Hashing (LSH) is a powerful technique for the approximate nearest neighbor search (ANN) in high dimensions. In this talk I will present two recent results:

1) I will show a data structure for ANN for the Euclidean distance that provably outperforms the best possible LSH-based data structure. We proceed via designing a good *data-dependent* hash family.

2) I will show a practical and optimal LSH family for the cosine similarity (a.k.a. Euclidean distance on a sphere). It substantially outperforms the celebrated Hyperplane LSH family. Along the way, I will try to debunk two popular myths about LSH:

* LSH-based data structures consume too much memory and are thus impractical;

* Optimal LSH constructions are too complicated to be made practical.

The talk is based on two papers: arXiv: 1501.01062 (joint with Alexandr Andoni, STOC 2015) and arXiv: 1509.02897 (joint with Alexandr Andoni, Piotr Indyk, Thijs Laarhoven and Ludwig Schmidt, NIPS 2015).

We show techniques for decreasing the error probability of randomized algorithms and for converting randomized algorithms to deterministic (non-uniform) algorithms. Unlike most existing techniques that involve repetition of the randomized algorithm and hence a slowdown, our techniques produce algorithms with a similar run-time to the original randomized algorithms.

The amplification technique is related to a certain stochastic multi-armed bandit problem. The derandomization technique -- which is the main contribution of this work -- points to an intriguing connection between derandomization and sketching/sparsification.

We demonstrate the techniques by showing applications to max-cut on dense graphs, approximate clique, constraint satisfaction problems on dense bipartite graphs, and list decoding to unique decoding for Reed-Muller code.

This is joint work with Ofer Grossman.

The class P attempts to capture the efficiently solvable computational tasks. It is full of practically relevant problems, with varied and fascinating combinatorial structure.

In this talk, I will give an overview of a rapidly growing body of work that seeks a better understanding of the structure within P. Inspired by NP-hardness, the main tool in this approach are combinatorial reductions. Combining these reductions with a small set of plausible conjectures, we obtain tight lower bounds on the time complexity of many of the most important problems in P.

I will present the most recent landscape of P and the conjectures on which this project is based on (e.g. the Strong Exponential Time Hypothesis). I will discuss recent attempts on identifying new conjectures: either more reliable ones, or ones that will get us closer to a full classification of the important problems in P.

Finally, I will highlight a surprising new reduction from Circuit-SAT to natural problems in P like Edit-Distance which proves that minor improvements over the quadratic running time of Edit-Distance are enough to prove major complexity separations.

We study the two-party communication complexity of the geometric problem of finding an approximate Brouwer fixed-point of a composition of two Lipschitz functions g*f, where Alice knows f and Bob knows g.

We prove an essentially tight communication lower bound on this problem, using a novel adaptation of the Raz-McKenzie simulation theorem into geometric settings.

We show that a slightly stronger version of this communication problem would imply an (essentially) tight communication lower bounds on the problem of finding an approximate Nash equilibrium in 2-player (and n-player) games, where each player initially knows only his own payoff matrix.

Joint work with Tim Roughgarden.

Online d-dimensional vector packing models many settings such as minimizing resources in data centers where jobs have multiple resource requirements (CPU, Memory, etc.). However, no online d-dimensional vector packing algorithm can achieve a competitive ratio better than d. Fortunately, in many natural applications, vectors are relatively small, and thus the lower bound does not hold. For sufficiently small vectors, an O(log d)-competitive algorithm was known. We improve this to a constant competitive ratio, arbitrarily close to e (where e is the base of the natural logarithm), given that vectors are sufficiently small.

We give improved results for the two dimensional case. For arbitrarily small vectors, the First Fit algorithm for two dimensional vector packing is no better than 2-competitive. We present a natural family of First Fit variants, and for optimized parameters get a competitive ratio of approximately 1.48 for sufficiently small vectors.

We improve upon the 1.48 competitive ratio - not via a First Fit variant - and give a competitive ratio arbitrarily close to 4/3 for packing small, two dimensional vectors. We show that no algorithm can achieve better than a 4/3 competitive ratio for two dimensional vectors, even if one allows the algorithm to split vectors among arbitrarily many bins.

Much of the theory of mathematical programs for combinatorial optimization can be described in the following way: A polytope of interest has exponentially many (in the dimension) facets, but can be written as the linear projection of a simpler convex body in a higher-dimensional space. Simple might mean a polytope with a much smaller number of facets, or a spectrahedron (the intersection of an affine subspace with the PSD cone) of small dimension. This allows one to optimize linear functionals over the base polytope by instead optimizing a lifted functional over the lifted body.

Unless P=NP, one does not expect certain polytopes--like the convex hull of indicators of traveling salesman tours in a graph--to have a small lift. But it remained open to prove any non-trivial lower bound on the necessary dimension for a spectrahedral lift, i.e. to prove that semi-definite programs do not yield efficient optimization procedures over these polytopes.

We show that the cut, TSP, and stable set polytopes on n-vertex graphs are not the linear image of a spectrahedron of dimension less than exp(n^c) for some constant c > 0. In the process, many interesting phenomena emerge: Factorization of operators through the PSD cone, quantum information theory, discrete Fourier analysis, and real algebraic geometry.

This is based joint work with Prasad Ragahvendra and David Steurer.

An Oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky (JACM 1996), is a (probabilistic) RAM that hides its access pattern, i.e. for every input the observed locations accessed are similarly distributed. Great progress has been made in recent years in minimizing the overhead of ORAM constructions, with the goal of obtaining the smallest overhead possible.

We revisit the lower bound on the overhead required to obliviously simulate programs, due to Goldreich and Ostrovsky. While the lower bound is fairly general, including the offline case, when the simulator is given the reads and writes ahead of time, it does assume that the simulator behaves in a "balls and bins" fashion. That is, the simulator must act by shuffling data items around, and is not allowed to have sophisticated encoding of the data.

We prove that for the OFFLINE case, showing a lower bound without the above restriction is related to the size of the circuits for sorting. Our proof is constructive, and uses a bit-slicing approach which manipulates the bit representations of data in the simulation. This implies that without obtaining yet unknown superlinear lower bounds on the size of such circuits, we cannot hope to get lower bounds on offline (unrestricted) ORAMs.

The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n is one of the most fundamental algorithmic tasks in string algorithms. Unfortunately, there is evidence that for a given text and pattern, one cannot compute the exact Hamming distance for all locations in the text in time which is polynomially less than o(n\sqrt m). Nevertheless, Karloff showed that if one is willing to suffer a 1+-\epsilon approximation, then it is possible to solve the problem with high probability in O~(n / \epsilon^2) time.

Due to related lower bounds for computing the Hamming distance of two strings in the one-way communication complexity model, it is strongly believed that obtaining an algorithm for solving the approximation version cannot be done much faster as a function of 1 / \epsilon. We will show that this belief is false by introducing a new O~(n / \epsilon) time algorithm that succeeds with high probability.

The main idea behind our algorithm, which is common in sparse recovery problems, is to reduce the variance of a specific randomized experiment by (approximately) separating heavy hitters from non-heavy hitters. However, while known sparse recovery techniques work very well on vectors, they do not seem to apply here, where we are dealing with mismatches between pairs of characters. We introduce two main algorithmic ingredients. The first is a new sparse recovery method that applies for pair inputs (such as in our setting). The second is a new construction of hash/projection functions, which allows to count the number of projections that induce mismatches between two characters exponentially faster than brute force. We expect that these algorithmic techniques will be of independent interest.

We show a 2^{n+o(n)}-time algorithm for the Shortest Vector Problem on n-dimensional lattices (improving on the previous best-known algorithm of Micciancio and Voulgaris, which runs in time 4^{n+o(n)}). The algorithm uses the elementary yet powerful observation that, by properly combining samples from a Gaussian distribution over the lattice, we can produce exact samples from a narrower Gaussian distribution on the lattice. We use such a procedure repeatedly to obtain samples from an arbitrarily narrow Gaussian distribution over the lattice, allowing us to find a shortest vector.

Both the algorithm and the analysis are quite simple in hindsight. (The main technical tool is an identity on Gaussian measures with a simple geometric proof originally due to Riemann.) If time allows and interest merits, we will discuss a more technical variant of this algorithm that solves the Closest Vector Problem (a seemingly harder problem) in the same asymptotic running time.

Based on joint work with Divesh Aggarwal, Daniel Dadush, and Oded Regev. (See http://arxiv.org/abs/1412.7994 and http://arxiv.org/abs/1504.01995.)

The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings x,y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertion, deletion, and bit flips needed for converting x into y. Whereas, the Hamming distance between x and y is the number of bit flips needed for converting x to y.

In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. This question was studied by Jowhari (ESA 2012) and is mainly motivated by two questions in communication complexity: the document exchange problem and deciding edit distance using a sketching protocol.

We show a randomized embedding with quadratic distortion. Namely, for any $x,y$ satisfying that their edit distance equals $k$, the Hamming distance between the embedding of $x$ and $y$ is $O(k^2)$ with high probability. This improves over the distortion ratio of $O(\log n \log^* n)$ obtained by Jowhari for small values of $k$. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input.

Assortment planning is a major operational issue that arises in many industries, such as retailing, airlines and consumer electronics. Given a set of products that are differentiated by price, quality and possibly other attributes, one has to decide on the subset of products and the respective quantities that will be stocked and offered to heterogeneous customers, who exhibit substitution behavior.

The general problem can be shown to be NP-hard to approximate better than a factor linear in the number of products. In this talk we discuss how for a range of practically interesting special cases, one could design conceptually simple policies that admit provably near-optimal solutions. The analysis reveals interesting structural properties, including hidden submodularity and decomposition properties.

The talk is based on several papers which are Joint work with Ali Aouad, Vineet Goyal and Danny Segev

The theory of structure and pseudo-randomness has been very influential in several areas of mathematics, such as number theory, graph theory and harmonic analysis. It is also been influential in theoretical computer science, with applications in complexity theory, cryptography and property testing. At a high level, it allows to analyze arbitrary objects by decomposing them to a "structural" component and a "pseudo-random" component. The pseudo-random component behaves in many ways like random noise, while the structural component has a concise representation which makes it amenable to analysis and algorithmic manipulation.

In this talk, I will describe applications of this paradigm to coding theory. I will describe a new general approach to list decoding, which follows by decomposing an arbitrary received word to a structural received word and pseudo-random noise. This allows for a simplified analysis of the list decoding problem. In particular, I will describe how this approach leads to a resolution of a conjecture by Gopalan, Klivans and Zuckerman [STOC 2008], that the list decoding radius of Reed-Muller codes (in certain regimes) is equal to the minimal distance of the code.

Based on joint work with Abhishek Bhowmick.

There are at least 2 aspects to learning: predicting the outcome of unseen events, and finding simple explanations of observed systems. We shall discuss 2 formal abstractions of these aspects: PAC learning and sample compression schemes. We shall start with an introduction to these notions, and then discuss the equivalence between them.

Based on a joint project with Shay Moran, Amir Shpilka and Avi Wigderson.

We generalize the technique of smoothed analysis to distributed algorithms in dynamic network models. Whereas standard smoothed analysis studies the impact of small random perturbations of input values on algorithm performance metrics, dynamic graph smoothed analysis studies the impact of random perturbations of the underlying changing network graph topologies. Similar to the original application of smoothed analysis, our goal is to study whether known strong lower bounds in dynamic network models are robust or fragile: do they withstand small (random) perturbations, or do such deviations push the graphs far enough from a precise pathological instance to enable much better performance? Fragile lower bounds are likely not relevant for real-world deployment, while robust lower bounds represent a true difficulty caused by dynamic behavior. We apply this technique to three standard dynamic network problems with known strong worst-case lower bounds: random walks, flooding, and aggregation. We prove that these bounds provide a spectrum of robustness when subjected to smoothing---some are extremely fragile (random walks), some are moderately fragile / robust (flooding), and some are extremely robust (aggregation).

Joint work with Jeremy Fineman (Georgetown), Seth Gilbert (National University of Singapore), and Calvin Newport (Georgetown).

Given an input $x$, and a search problem $F$, local computation algorithms (LCAs) implement access to specified locations of $y$ in a legal output $y \in F(x)$, using polylogarithmic time and space. Previous work on LCAs restricted its focus to graphs of bounded degree, or degree of bounded expectation that is distributed binomially.

Using a new palette of techniques, we show that it is possible to obtain LCAs for maximal independent set (MIS) and maximal matching (MM) on trees with degree bounded by $\polylog{n}$. Our result immediately extends to all graphs with degree bounded by $\polylog{n}$, as long as they do not contain short cycles (of length $\polylog{n}$).

We define a family of graphs called $d$-light graphs, and show how to convert a large class of online algorithms (including MIS and MM) to LCAs on these graphs. We then show that applying the MIS (or MM) LCA on appropriately selected $d$-light subgraphs, allows us to quickly address all of the vertices of the $\polylog{n}$-degree graph.

In addition to expanding the range of LCAs to graphs of polylogarithmic degree, our new techniques also significantly improve the running times and space requirements, expand the family of graphs, and better define the family of algorithms to which previous results apply. Furthermore our proofs are simpler and more concise than the previous proof methods.

Joint work with Omer Reingold.

The interplay between **algorithms** and **proofs** has been one of the most fascinating and fruitful themes in theoretical Computer Science. In this talk I will describe a different connection between these two concepts - a general methodology for algorithm design using an automatic transformation of a proof into an algorithm. This methodology yields a systematic way to design and analyze algorithms across many domains. In particular we and others have used it for problems in combinatorial optimization, machine learning, and quantum information theory, and it shows promise for making progress on important open questions such as settling Khot's Unique Games Conjecture. I will demonstrate this methodology by presenting an algorithm for the Sparse Coding / Dictionary Learning problem that handles much more general inputs than was known before, and an algorithm for the Unique Games problem that can solve all the previously known candidate hard instances. Key to our approach is the notion of "pseudo-distributions", which are objects that are emphatically different than actual distributions but behave like them in the eyes of low degree polynomials. We use these pseudo-distributions to "lift" a proof into an algorithm via the Shor-Parrilo-Lasserre "Sum-of-Squares" semidefinite programming hierarchy. The talk will be based on several joint works with (varying subsets of) Fernando Brandao, Aram Harrow, Jonathan Kelner, David Steurer and Yuan Zhou.

In many settings, people exhibit behavior that is inconsistent across time — we allocate a block of time to get work done and then procrastinate, or put effort into a project and then later fail to complete it. An active line of research in behavioral economics and related fields has developed and analyzed models for this type of time-inconsistent behavior.

Here we propose a graph-theoretic model of tasks and goals, in which dependencies among actions are represented by a directed graph, and a time-inconsistent agent constructs a path through this graph. We first show how instances of this path-finding problem on different input graphs can reconstruct a wide range of qualitative phenomena observed in the literature on time-inconsistency, including procrastination, abandonment of long-range tasks, and the benefits of reduced sets of choices. We then explore a set of analyses that quantify over the set of all graphs; among other results, we find that in any graph, there can be only polynomially many distinct forms of time-inconsistent behavior; and any graph in which a time-inconsistent agent incurs significantly more cost than an optimal agent must contain a large “procrastination” structure as a minor. Finally, we use this graph-theoretic model to explore ways in which tasks can be designed to help motivate agents to reach designated goals.

This is joint work with Jon Kleinberg.

I will describe recent applications of "birthday repetition" that can give (conditional) quasi-polynomial time hardness results for Densest k-Subgraph and for $\epsilon$-Nash in 2-player games. (For the latter result we use a non-standard "PCP for PPAD" assumption which I will also discuss.) Both results are tight by [FS97], [LMM03], [Barman15].

Based on:

DkS- http://eccc.hpi-web.de/report/2015/074/

(joint work with Mark Braverman, Young Kun Ko, and Omri Weinstein)

Nash- http://arxiv.org/abs/1504.02411

(joint work with Yakov Babichenko and Christos Papadimitriou)

Research can be viewed as a search of an accepting-state in an exponential-space. When we look in hind-sight after we find an accepting-state, can we identify a much shorter path than the one that was discovered by trial and error as the search actually proceeded?

In this talk I'll show that this is the case for Distributed-Computability: The discovery that different distributed problems have different levels of difficulty, and identifying the weakest model of distributed-computation that allows to solve a problem. I'll explain the essence of 40 years of research in an hour, by showing that if the right questions were asked at the right time, all the results could have been had in a span of time order-of-magnitude shorter.

Some of the major ideas in the talk were developed in works with Afek (TAU), and Borowsky (Akamai), Lynch (MIT), and Rajsbaum (UNAM).

The PCP theorem (AS,ALMSS 1991) guarantees that every NP language has a Probabilistically Checkable Proof (PCP) system allowing a verifier to check a witness very efficiently using randomness, and allowing for small error.

Most of the talk will not assume prior knowledge, but I will also devote some time to some recent work joint with Harsha and Kindler.

In this work we make (some) progress towards proving the so-called "sliding-scale conjecture". This is a conjecture of BGLR from 1993 about the tradeoff between the number of bits read from the PCP proof and the error of the verifier.

Our work revisits older constructions and analyzes them using the more modern "modular-composition" approach.

Based on joint work with Prahladh Harsha and Guy Kindler.

Motivated by the goal of securely searching and updating distributed data, we introduce the notion of function secret sharing (FSS), a form of “additive secret sharing” for {\em functions} f: {0,1}^n → G, where G is an abelian group.

An m-party FSS scheme for function class F allows one to split any function f from F into m succinctly described functions f_i, such that: (1) for every input x, f(x) is equal to the sum of evaluations \sum_i f_i(x), and (2) any strict subset of "share functions" f_i hides f. FSS provides a natural generalization of distributed point functions, as introduced by (Gilboa-Ishai Eurocrypt 2014), which coincide with the special case of two parties and the class F of point functions (which evaluate to 0 at all but one point).

We present two types of results:

- We obtain efficiency improvements and extensions of the original distributed point function construction.

- We then initiate a systematic study of general FSS, providing constructions for richer function classes, and establishing relations with other cryptographic primitives.

Joint work with Niv Gilboa and Yuval Ishai.

We introduce and construct a pseudo-random object which we call a local correlation breaker (LCB). This is an algorithm that gets as input a sequence of (possibly correlated) random variables and an independent weak source of randomness. The output of the LCB is a sequence of random variables with the following property. If the i'th input random variable is uniform then the i'th output variable is uniform even if a bounded number of any other output variables are given. That is, an LCB uses the weak-source to "break" local correlations between random variables.

Based on LCBs we obtain improved constructions of mergers with weak-seeds and multi-source extractors. In particular, we construct a 3-source extractor for entropies delta*n, O(log n) and O(loglog n), for any constant delta. We further construct a 7-source extractor for poly-logarithmic entropy.

Joint work with Guy Rothblum.

No prior knowledge is assumed.

Bloom filters and Counting Bloom Filters (CBFs) are widely used in networking device algorithms. They implement fast set representations to support membership queries with limited error. Unlike Bloom filters, CBFs also support element deletions. In the first part of the talk, I will introduce a new general method based on variable increments to improve the efficiency of CBFs and their variants. I will demonstrate that this method can always achieve a lower false positive rate and a lower overflow probability bound than CBFs in practical systems.

Next, I will present ongoing research on data center networking. In particular, I will introduce a new approach to providing network isolation, so customers can feel alone in shared clouds, without any network contention from other customers. I will also demonstrate theoretical conditions for the isolation algorithm.

An instance of the E2-Lin(2) problem is a system of equations of the form "x_i + x_j = b (mod 2)". Given such a system in which it is possible to satisfy all but an epsilon fraction of the equations, we would like to find an assignment that violates as few equations as possible. In this paper, we show that it is NP-hard to satisfy all but a C*epsilon fraction of the equations, for any C< 11/8 and 0 < epsilon <= 1/8. Our result holds also for the special case of Max-Cut. The previous best NP-hardness result, standing for over 15 years, had 5/4 in place of 11/8.

Our proof is by a modified gadget reduction from a predicate that supports a pairwise independent distribution. We also show an inherent limitation to this type of gadget reduction. In particular, we show that no such reduction can establish a hardness factor C greater than ~2.54.

Joint work with Johan Hastad, Rajsekar Manokaran, Ryan O'Donnell, John Wright.