## Select Seminar Series

All seminars- Home
- ›
- Studying at the Faculty
- ›
- Seminars ›
- Previous Seminars

# Previous Seminars

In analysis, a convolution of two functions usually results in a smoother, better behaved function. Given two morphisms f,g from algebraic varieties X,Y to an algebraic group G, one can define a notion of convolution of these morphisms. Analogously to the analytic situation, this operation yields a morphism (from X x Y to G) with improved smoothness properties.

In this talk, I will define a convolution operation and discuss some of its properties. I will then present a recent result; if G is an algebraic group, X is smooth and absolutely irreducible, and f:X-->G is a dominant map, then after finitely many self convolutions of f, we obtain a morphism with the property of being flat with fibers of rational singularities (a property which we call (FRS)).

Moreover, Aizenbud and Avni showed that the (FRS) property has an equivalent analytic characterization, which leads to various applications such as counting points of schemes over finite rings, representation growth of certain compact p-adic groups and arithmetic groups of higher rank, and random walks on (algebraic families of) finite groups. We will discuss some of these applications, and maybe some of the main ideas of the proof of the above result.

Joint with Yotam Hendel.

Random surfaces in statistical physics are commonly modeled by a real-valued function phi on a lattice, whose probability density penalizes nearest-neighbor fluctuations. Precisely, given an even function V, termed the potential, the energy H(phi) is computed as the sum of V over the nearest-neighbor gradients of phi, and the probability density of phi is set proportional to exp(-H(phi)). The most-studied case is when V is quadratic, resulting in the so-called Gaussian free field. Brascamp, Lieb and Lebowitz initiated in 1975 the study of the global fluctuations of random surfaces for other potential functions and noted that understanding is lacking even in the case of the quartic potential, V(x)=x^4. We will review the state of the art for this problem and present recent work with Alexander Magazinov which finally settles the question of obtaining upper bounds for the fluctuations for the quartic and many other potential functions.

Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications in computer vision and graphics. We propose to leverage the power of convolutional neural networks (CNNs) to produce highly detailed face reconstruction directly from a single image. For this purpose, we introduce an end-to-end CNN framework which constructs the shape in a coarse-to-fine fashion. The proposed architecture is composed of two main blocks, a network which recovers the coarse facial geometry (CoarseNet), followed by a CNN which refines the facial features of that geometry (FineNet). To alleviate the lack of training data for face reconstruction, we train our model using synthetic data as well as unlabeled facial images collected from the internet. The proposed model successfully recovers facial geometries from real images, even for faces with extreme expressions and under varying lighting conditions. In this talk, I will summarize three papers that were published at 3DV 2016, CVPR 2017 (as an oral presentation), and ICCV 2017.

Bio: Matan Sela holds a Ph.D in Computer Science from the Technion - Israel Institute of Technology. He received B.Sc. and M.Sc. (both with honors) in electrical engineering, both from The Technion - Israel Institute of Technology in 2012 and 2015, respectively. During summer 2017, he was a research intern at Google, Mountain View, California. His interests are Machine Learning, Computer Vision, Computer Graphics, Geometry Processing and any combination of thereof.

The Baum-Bott indexes are important local invariants for singular holomorphic foliations by curves with isolated singularities. On a compact complex manifold, we consider foliations with fixed cotangent bundle. Then, the Baum-Bott map associates to a foliation its Baum-Bott indexes on its singularities. We focus on foliations on the projective space and we are interested in the generic rank of that map. The generic rank for foliations on the projective plane is known. For high-dimensional projectives spaces, we give an upper bound and in some cases we determine the generic rank.

This paper makes progress on the problem of explicitly constructing a binary tree code with constant distance and constant alphabet size.

For every constant delta < 1 we give an explicit binary tree code with distance delta and alphabet size poly(log n), where n is the depth of the tree. This is the first improvement over a two-decade-old construction that has an exponentially larger alphabet of size poly(n).

As part of the analysis, we prove a bound on the number of positive integer roots a real polynomial can have in terms of its sparsity with respect to the Newton basis---a result of independent interest.

Joint work with G. Cohen and B. Haeupler

**David Jerison:** Localization of eigenfunctions via an effective potential

We discuss joint work with Douglas Arnold, Guy David, Marcel Filoche and Svitlana Mayboroda. Consider the Neumann boundary value problem for the operator $L u = - \mbox{div} (A \nabla u) + Vu$ on a Lipschitz domain $\Omega$ and, more generally, on a manifold with or without boundary. The eigenfunctions of $L$ are often localized, as a result of disorder of the potential $V$, the matrix of coefficients $A$, irregularities of the boundary, or all of the above. In earlier work, Filoche and Mayboroda introduced the function $u$ solving $Lu = 1$, and showed numerically that it strongly reflects this localization. Here, we deepen the connection between the eigenfunctions and this {\em landscape} function $u$ by proving that its reciprocal $1/u$ acts as an {\em effective potential}. The effective potential governs the exponential decay of the eigenfunctions of the system and delivers information on the distribution of eigenvalues near the bottom of the spectrum.

**Ron Rosenthal:** Eigenvector correlation in the complex Ginibre ensemble

The complex Ginibre ensemble is a non-Hermitian random matrix on C^N with i.i.d. complex Gaussian entries normalized to have mean zero and variance 1=N. Unlike the Gaussian unitary ensemble, for which the eigenvectors are orthogonal, the geometry of the eigenbases of the Ginibre ensemble are not particularly well understood. We will discuss a some results regarding the analytic and algebraic structure of eigenvector correlations in this matrix ensemble. In particular, we uncover an extended algebraic structure which describes the asymptotic behavior (as N goes to infinity) of these correlations. Our work extends previous results of Chalker and Mehlig [CM98], in which the correlation for pairs of eigenvectors was computed. Based on a joint work with Nick Crawford.

Recent work has shown impressive success in automatically creating new images with desired properties such as transferring painterly style, modifying facial expressions or manipulating the center of attention of the image. In this talk I will discuss two of the standing challenges in image synthesis and how we tackle them:

- I will describe our efforts in making the synthesized images more photo-realistic.

- I will further show how we can broaden the scope of data that can be used for training synthesis networks, and with that provide a solution to new applications.

This talk will include an introduction to the topic of V(D)J rearrangements of particular subsets of T cells and B cells of the adaptive human immune system, in particular of IgG heavy chains. There are many statistical problems that arise in trying to understand these cells. They involve estimating aspects of functionals of discrete probabilities on (random) finite sets. Topics include but are not limited to exchangeability, estimating non-centrality parameters, and estimating covariance matrices from what are called "replicates" that have been amplified by the PCR process and (partially) sequenced.

I have received considerable assistance from Lu Tian, and also Yi Liu; as well, I have been helped considerably by Andrew Fire and Scott Boyd, and also Jorg Goronzy

Single cells and cells in a tissue respond to stimuli by deforming, changing their shape, and/or moving. Some of these responses can be understood from the underlying biochemical signaling, a topic that has been of interest to both biologists and modelers. In our recent work, my group has studied the link between mechanical tension on cells and their internal chemical signaling. (Our primary focus has been, and remains, that of Rho proteins.) Here I will describe a simple "toy model" that captures key aspects of what is known biologically. The model is simple enough to understand mathematically, and yet capable fo displaying several regimes of behavior consistent with experimental observations. I describe how we investigated the model in a single cell, and how this was then used to capture multiple cells that interact with one another mechanically. We find waves of expansion and contraction that sweep through the model "tissue" is certain parameter regimes. This work is joint with Cole Zmurchok and Dhanajay Bhaskar.

We study the problem of identifying correlations in multivariate data, under information constraints:

Either on the amount of memory that can be used by the algorithm, or the amount of communi- cation when the data is distributed across several machines. We prove a tight trade-off between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required mem-ory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir (2014), which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication trade-offs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.

Joint work with Ohad Shamir

Non-signaling strategies are collections of distributions with certain non-local correlations that have been studied recently in the context of delegation of computation.

In this talk I will discuss a recent result showing that the Hadamard based PCP of [ALMSS'92] is sound against non-signaling strategies. As part of the proof, we study the classical problem of linearity testing [BLR'93] in the setting of non-signaling strategies, and prove that any no-signaling strategy that passes the linearity test with high probability must be close to a quasi-distribution over linear functions.

Joint work with Alessandro Chiesa and Peter Manohar (UC Berkeley).

Consider a sequence of growing graphs converging locally weakly to an infinite (possibly random) tree. As there are uncountably many Ising and Potts Gibbs measures on the limiting tree in the low-temperature regime it is not apriori clear whether the local weak limit of such measures exists and if so, identifying the limit remains a challenge. In this talk, I will describe these limits. The talk is based on joint works with Amir Dembo and Allan Sly.

Recently, there has been increasing interest to leverage the competence of neural networks to analyze data. In particular, new clustering methods that employ deep embeddings have been presented.

In this talk, we depart from centroid-based models and suggest a new framework, called Clustering-driven deep embedding with PAirwise Constraints (CPAC), for non-parametric clustering using a neural network. We present a clustering-driven embedding based on a Siamese network that encourages pairs of data points to output similar representations in the latent space. Our pair-based model allows augmenting the information with labeled pairs to constitute a semi-supervised framework. Our approach is based on analyzing the losses associated with each pair to refine the set of constraints.

We show that clustering performance increases when using this scheme, even with a limited amount of user queries.

We present state-of-the-art results on different types of datasets and compare our performance to parametric and non-parametric techniques.

Our first theorem is a hierarchy theorem for the query complexity of testing graph properties with one-sided error; more precisely, we show that for every sufficiently fast-growing function f from (0,1) to the natural numbers, there is a graph property whose one-sided-error query complexity is precisely f(\Theta(\epsilon)). No result of this type was previously known for any f which is super-polynomial. Goldreich [ECCC 2005] asked to exhibit a graph property whose query complexity is exponential in 1/\epsilon. Our hierarchy theorem partially resolves this problem by exhibiting a property whose one-sided-error query complexity is exponential in 1/\epsilon. We also use our hierarchy theorem in order to resolve a problem raised by Alon and Shapira [STOC 2005] regarding testing relaxed versions of bipartiteness.

Our second theorem states that for any function f there is a graph property whose one-sided-error query complexity is at least f(\epsilon) while its two-sided-error query complexity is only polynomial in 1/\epsilon. This is the first indication of the surprising power that two-sided-error testing algorithms have over one-sided-error ones, even when restricted to properties that are testable with one-sided error. Again, no result of this type was previously known for any f that is super-polynomial.

The above theorems are derived from a graph theoretic result which we think is of independent interest, and might have further applications. Alon and Shikhelman [JCTB 2016] introduced the following generalized Turan problem: for fixed graphs H and T, and an integer n, what is the maximum number of copies of T, denoted by ex(n,T,H), that can appear in an n-vertex H-free graph? This problem received a lot of attention recently, with an emphasis on T = C_3, H=C_{2m+1}. Our third theorem gives tight bounds for ex(n,C_k,C_m) for all the remaining values of k and m.

Joint work with Asaf Shapira.

Non-Euclidean, or incompatible elasticity is an elastic theory for bodies that do not have a reference (stress-free) configuration. It applies to many systems, in which the elastic body undergoes inhomogeneous growth (e.g. plants, self-assembled molecules). Mathematically, it is a question of finding the "most isometric" immersion of a Riemannian manifold (M,g) into Euclidean space of the same dimension, by minimizing an appropriate energy functional.

Much of the research in non-Euclidean elasticity is concerned with elastic bodies that have one or more slender dimensions (such as leaves), and finding appropriate dimensionally-reduced models for them.

In this talk I will give an introduction to non-Euclidean elasticity, and then focus on thin bodies and present some recent results on the relations between their elastic behavior and their curvature.

Based on a joint work with Asaf Shachar.

We prove a dichotomy theorem for two-party protocols, and show that for every poly-time two-party protocol with single-bit output, at least one of following holds:

- The protocol can be used to construct a key-agreement protocol.
- For every constant ρ > 0 the parties' output is ρ -uncorrelated: let (X; Y; T) denote the parties' outputs and the protocol's transcript respectively. A protocol is &rho -uncorrelated if there exists an efficient "decorralizer" algorithm Decor, that when given a random transcript T, produces two numbers P
_{A}; P_{B}, such that no efficient algorithm can distinguish (U_{PS};U_{PB}; T) (where Up denotes a biassed coin with bias ρ from (X; Y; T), with distinguishing advantage larger than ρ.

Namely, if the protocol cannot be used to construct key-agreement, then its output distribution (X; Y; T) is trivial: it can be simulated non-interactively by the parties given public randomness (used to sample T). (The precise statement also has qualifiers of the form: "on infinitely many choices of the security parameter").

We use the above characterization to prove that (α= 24ε^{2})-correct differentially private symmetric protocol for computing XOR, implies the existence of key-agreement protocol. The above dependency between α and &epsilon is tight since an θ( ε^{2})-correct "-differentially private protocol for computing XOR is known to exists unconditionally. It also improves, in the ( ε,α)dependency aspect, upon Goyal et al. [ICALP '16] who showed that, for some constant c > 0, a c-correct "-differentially private protocol for computing XOR implies oblivious transfer. Our result extends to a weaker notion of di erential privacy in which the privacy only requires to hold against external observer. Interestingly, the reductions used for proving the above results are non black box.

Joint work with: Eran Omri and Kobbi Nissim and Ronen Shaltiel and Jad Silbak

Talk 1: David Ellis (Queen Mary U)

**Title: The edge-isoperimetric problem for antipodally symmetric subsets of the discrete cube.**

Abstract: A major open problem in geometry is to solve the isoperimetric problem for n-dimensional real projective space, i.e. to determine, for each real number V, the minimum possible size of the boundary of a (well-behaved) set of volume V, in n-dimensional real projective space. We study a discrete analogue of this question: namely, among all antipodally symmetric subsets of {0,1}^n of fixed size, which sets have minimal edge-boundary? We obtain a complete answer to the second question. This is joint work with Imre Leader (Cambridge)

Talk 2: Benjamin Fehrman (Max Planck Institute)

**Title: Well-posedness of stochastic porous media equations with nonlinear, conservative noise.**

Abstract: In this talk, which is based on joint work with Benjamin Gess, I will describe a pathwise well-posedness theory for stochastic porous media equations driven by nonlinear, conservative noise. Such equations arise in the theory of mean field games, as an approximation to the Dean-Kawasaki equation in fluctuating hydrodynamics, to describe the fluctuating hydrodynamics of a zero range process, and as a model for the evolution of a thin film in the regime of negligible surface tension. Our methods are loosely based on the theory of stochastic viscosity solutions, where the noise is removed by considering a class of test functions transported along underlying stochastic characteristics. We apply these ideas after passing to the equation's kinetic formulation, for which the noise enters linearly and can be inverted using the theory of rough paths.

The harnessing of modern computational abilities for many-body wave-function representations is naturally placed as a prominent avenue in contemporary condensed matter physics. Specifically, highly expressive computational schemes that are able to efficiently represent the entanglement properties which characterize many-particle quantum systems are of interest. In the seemingly unrelated field of machine learning, deep network architectures have exhibited an unprecedented ability to tractably encompass the convoluted dependencies which characterize hard learning tasks such as image classification or speech recognition. However, theory is still lagging behind these rapid empirical advancements, and key questions regarding deep learning architecture design have no adequate answers. In the presented work, we establish a Tensor Network (TN) based common language between the two disciplines, which allows us to offer bidirectional contributions. By showing that many-body wave-functions are structurally equivalent to mappings of convolutional and recurrent arithmetic circuits, we construct their TN descriptions in the form of Tree and Matrix Product State TNs, and bring forth quantum entanglement measures as natural quantifiers of dependencies modeled by such networks. Accordingly, we propose a novel entanglement based deep learning design scheme that sheds light on the success of popular architectural choices made by deep learning practitioners, and suggests new practical prescriptions. Specifically, our analysis provides prescriptions regarding connectivity (pooling geometry) and parameter allocation (layer widths) in deep convolutional networks, and allows us to establish a first of its kind theoretical assertion for the exponential enhancement in long term memory brought forth by depth in recurrent networks. In the other direction, we identify that an inherent re-use of information in state-of-the-art deep learning architectures is a key trait that distinguishes them from TN based representations. Therefore, we suggest a new TN manifestation of information re-use, which enables TN constructs of powerful architectures such as deep recurrent networks and overlapping convolutional networks. This allows us to theoretically demonstrate that the entanglement scaling supported by state-of-the-art deep learning architectures can surpass that of commonly used expressive TNs in one dimension, and can support volume law entanglement scaling in two dimensions with an amount of parameters that is a square root of that required by Restricted Boltzmann Machines. We thus provide theoretical motivation to shift trending neural-network based wave-function representations closer to state-of-the-art deep learning architectures.

Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. In this talk I will introduce our neural relational inference model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks.

I will describe work with C.E.Wayne in which we study how dissipation leads to a very slow drift in secular parameters for the Non-linear Schroedinger equation. The nice thing about this model is that one can describe a relatively complex phenomenon in almost explicit terms as a perturbation from an integrable Hamiltonian system

The advancement in quantum computing, where Google, IBM, Microsoft, Intel are competing in the (exponentially growing) number of qubits in their (some already) commercial quantum computers that they produce, requires the reexamination of the Internet Security, and the public key infrastructure. The talk will describe the concept of overlay security together with blockchain based directories for establishing symmetric keys. When combined with nested Lamport signature and Merkle trees for digital signatures the result is a complete, easily implementable architecture with information theoretically secure communication, and hash based signatures.

**Mark Rudelson (UMich)**

**Title**: Invertibility of the adjacency matrices of random graphs.

**Abstract**: Consider an adjacency matrix of a bipartite, directed, or undirected Erdos-Renyi random graph. If the average degree of a vertex is large enough, then such matrix is invertible with high probability. As the average degree decreases, the probability of the matrix being singular increases, and for a sufficiently small average degree, it becomes singular with probability close to 1. We will discuss when this transition occurs, and what the main reason for the singularity of the adjacency matrix is.

This is a joint work with Anirban Basak.

**Yinon Spinka (TAU)**

**Title**: Finitary codings of Markov random fields

**Abstract**: Let X be a stationary Z^d-process. We say that X can be coded by an i.i.d. process if there is a(deterministic and translation-invariant) way to construct a realization of X from i.i.d. variables associated to the sites of Z^d. That is, if there is an i.i.d. process Y and a measurable map F from the underlying space of Y to that of X, which commutes with translations of Z^d and satisfies that F(Y)=X in distribution. Such a coding is called finitary if, in order to determine the value of X at a given site, one only needs to look at a finite (but random) region of Y.

It is known that a phase transition (existence of multiple Gibbs states) is an obstruction for the existence of such a finitary coding. On the other hand, we show that when X is a Markov random field satisfying certain spatial mixing conditions, then X can be coded by an i.i.d. process in a finitary manner. Moreover, the coding radius has exponential tails, so that typically the value of X at a given site is determined by a small region of Y.

We give applications to models such as the Potts model, proper colorings and the hard-core model.

Image restoration algorithms are typically evaluated by some distortion measure (e.g. PSNR, SSIM, IFC, VIF) or by human opinion scores that quantify perceived perceptual quality. In this work, we prove mathematically that distortion and perceptual quality are at odds with each other. Specifically, we study the optimal probability for correctly discriminating the outputs of an image restoration algorithm from real images. We show that as the mean distortion decreases, this probability must increase (indicating worse perceptual quality). As opposed to the common belief, this result holds true for any distortion measure, and is not only a problem of the PSNR or SSIM criteria. However, as we show experimentally, for some measures it is less severe (e.g. distance between VGG features). We also show that generative-adversarial-nets (GANs) provide a principled way to approach the perception-distortion bound. This constitutes theoretical support to their observed success in low-level vision tasks. Based on our analysis, we propose a new methodology for evaluating image restoration methods, and use it to perform an extensive comparison between recent super-resolution algorithms. Our study reveals which methods are currently closest to the theoretical perception-distortion bound.

* Joint work with Yochai Blau.

We present novel oblivious routing algorithms for the splittable and the unsplittable multicommodity flow settings. Our algorithms for both models improve upon the state-of-the-art, in terms of running time and performance with respect to graphs that exhibit good expansion guarantees. As an intermediate step towards the unsplittable setting, we present a novel generalization of Valiant's classical load balancing scheme for packet-switched networks to arbitrary graphs, which is of independent interest. Our approach relies on diffusing traffic throughout the network and then regathering it to its destination, via iterative applications of the random walk operator. Consequently, the performance guarantees of our algorithms are derived from the convergence of the random walk operator to the stationary distribution and are expressed in terms of the spectral gap of the graph (which dominates the mixing time).

Genome-wide forces of mutation and selection create local and global sequence patterns that carry information on functional role of genomic DNA. I will describe algorithmic approaches to genome analysis that decode sequence patterns and identify features of structural gene organization and gene expression (e.g. leaderless transcription). The algorithms make unsupervised inference of the structure of underlying graphical model and its parameters.

The subsequently developed software tools were, among other applications, used at NCBI (Bethesda, MD) to annotate and re-annotate 2,500+ fungal genomes and 130,000+ prokaryotic genomes. Yet another algorithm was employed by DOE JGI (Walnut Creek, CA) to annotate the largest collection of metagenomes.

Speaker s short Bio

Mark Borodovsky, PhD, a Regents' Professor at the Joint Georgia Tech and Emory University Department of Biomedical Engineering and the School of Computational Science and Engineering. Since 1990 when he arrived to Atlanta from Moscow, Dr Borodovsky had also joint faculty appointments at the School of Biology and the School of Mathematics. Dr. Borodovsky is a Founder of the Georgia Tech Bioinformatics graduate program.

Suppose a language L can be decided by a bounded-error randomized algorithm that runs in space S and time n * poly(S). We give a randomized algorithm for L that still runs in space O(S) and time n * poly(S) that uses only O(S) random bits; our algorithm has a low failure probability on all but a negligible fraction of inputs of each length. An immediate corollary is a deterministic algorithm for L that runs in space O(S) and succeeds on all but a negligible fraction of inputs of each length. We also discuss additional complexity-theoretic applications of our technique.

Points of infinite multiplicity are particular points which the Brownian motion visits infinitely often. Following a work of Bass, Burdzy and Khoshnevisan, we construct and study a measure carried by these points. Joint work with Yueyun Hu and Zhan Shi.

Scattering effects in images, including those related to haze, fog, and appearance of clouds, are fundamentally dictated by microphysical characteristics of the scatterers. We define and derive recovery of these characteristics, in a three-dimensional heterogeneous medium. Recovery is based on a novel tomography approach. Multiview (multi-angular) and multi-spectral data are linked to the underlying microphysics using 3D radiative transfer, accounting for multiple-scattering. Despite the nonlinearity of the tomography model, inversion is enabled using a few approximations that we describe. As a case study, we focus on passive remote sensing of the atmosphere, where scatterer retrieval can benefit modeling and forecasting of weather, climate, and pollution.

We present a new way to derive the replica symmetric solution for the free energy in mean-field spin glasses. Only the Sherrington-Kirpatrick case has been worked out in details, but the method also works in other cases, for instance for the perceptron (work in progress), and probably also for the Hopfield net. The method is closely related to the TAP equations (for Thouless-Anderson-Palmer). It does not give any new results, presently, but it gives a new viewpoint, and it looks to be quite promising. As the TAP equations are widely discussed in the physics literature, also at low temperature, it is hoped that the method could be extended to this case, too. But this is open, and probably very difficult.

We will introduce the (new) notion of approximability in triangulated categories and show its power.

The brief summary is that the derived category of quasicoherent sheaves on a separated, quasicompact scheme is an approximable triangulated category.

As relatively easy corollaries one can: (1) prove an old conjecture of Bondal and Van den Bergh, about strong generation in D^{perf}(X), (2) generalize an old theorem of of Rouquier about strong generation in D^b_{coh}(X). Rouquier proved the result only in equal characteristic, we can extend to mixed characteristic, and (3) generalize a representability theorem of Bondal and Van den Bergh,from proper schemes of finite type over fields to proper schemes of finite type over any noetherian rings.

After stating these results and explaining what they mean, we will (time permitting) also mention structural theorems. It turns out that approximable triangulated categories have a fair bit of intrinsic, internal structure that comes for free.

Recent success stories of using machine learning for diagnosing skin cancer, diabetic retinopathy, pneumonia, and breast cancer may give the impression that artificial intelligence (AI) is on the cusp of radically changing all aspects of health care. However, many of the most important problems, such as predicting disease progression, personalizing treatment to the individual, drug discovery, and finding optimal treatment policies, all require a fundamentally different way of thinking. Specifically, these problems require a focus on *causality* rather than simply prediction. Motivated by these challenges, my lab has been developing several new approaches for causal inference from observational data. In this talk, I describe our recent work on the deep Markov model (Krishnan, Shalit, Sontag AAAI '17) and TARNet (Shalit, Johansson, Sontag, ICML '17).

Recent success stories of using machine learning for diagnosing skin cancer, diabetic retinopathy, pneumonia, and breast cancer may give the impression that artificial intelligence (AI) is on the cusp of radically changing all aspects of health care. However, many of the most important problems, such as predicting disease progression, personalizing treatment to the individual, drug discovery, and finding optimal treatment policies, all require a fundamentally different way of thinking. Specifically, these problems require a focus on *causality* rather than simply prediction. Motivated by these challenges, my lab has been developing several new approaches for causal inference from observational data. In this talk, I describe our recent work on the deep Markov model (Krishnan, Shalit, Sontag AAAI '17) and TARNet (Shalit, Johansson, Sontag, ICML '17).

We show that any area-preserving C^{r}-diffeomorphism of a two-dimensional surface displaying an elliptic fxed point can be C^{r}-perturbed to one exhibiting a chaotic island whose metric entropy is positive, for every 1 ≤ r≤ 1. This proves a conjecture of Herman stating that the identity map of the disk can be C^{ ∞}-perturbed to a conservative di eomorphism with positive metric entropy. This implies also that the Chirikov standard map for large and small parameter values can be C^{∞}- approximated by a conservative diffeomorphisms displaying a positive metric entropy (a weak version of Sinai's positive metric entropy conjecture). Finally, this sheds light onto a Herman's question on the density of C^{r}-conservative di eomorphisms displaying a positive metric entropy: we show the existence of a dense set formed by conservative diffeomorphisms which either are weakly stable (so, conjecturally, uniformly hyperbolic) or display a chaotic island of positive metric entropy.

This is a joint work with Pierre Berger.

The generalized Jacobian Jac_m(C ') of a smooth hyperelliptic curve C' associated with a module m is an algebraic group that can be described by using lines bundle of the curve C' or by using a symmetric product of the curve C' provided with a law of composition. This second definition of the Jacobian Jac_m(C') is directly related to the fibres of a Mumford system. To be precise it is a subset of the compactified Jac_m(C') which is related to the fibres. This presentation will help us to demystify the relationship of these two mathematical objects.

Multiple wireless sensing tasks, e.g. radar detection for driver safety, involve estimating the "channel" or relationship between signal transmitted and received.

In this talk I will tell about the standard math model for the radar channel. In the case where the channel is sparse, I will demonstrate a channel estimation algorithm that is sub-linear in sampling and arithmetic complexity (and convince you of the need for such).

The main ingredients in the algorithm will be the use of an intrinsic algebraic structure known as the Heisenberg group and recent developments in the theory of the sparse Fast Fourier Transform (sFFT, due to Indyk et al.)

The talk will assume minimal background knowledge.

Understanding the ways in which the Brain gives rise to the Mind (memory, behavior, cognition, intelligence, language) is the most challengingproblem facing science today. As the answer seems likely to be at least partly computational, computer scientists should work on this problem --- except there is no obvious place to start. I shall recount recent work (with W. Maass and S. Vempala) on a model for the formation and association of memories in humans, and reflect on why it may be a clue about language.

Variational problems in geometry, fabrication, learning, and related applications lead to structured numerical optimization problems for which generic algorithms exhibit inefficient or unstable performance. Rather than viewing the particularities of these problems as barriers to scalability, in this talk we embrace them as added structure that can be leveraged to design large-scale and efficient techniques specialized to applications with geometrically structure variables. We explore this theme through the lens of case studies in surface parameterization, optimal transport, and multi-objective design

A long line of research studies the space complexity of estimating a norm l(x) in the data-stream model, i.e., when x is the frequency vector of an input stream consisting of insertions and deletions of items of n types. I will focus on norms l (in R^n) that are *symmetric*, meaning that l is invariant under sign-flips and coordinate-permutations, and show that the streaming space complexity is essentially determined by the measure-concentration characteristics of l. These characteristics are known to govern other phenomena in high-dimensional spaces, such as the critical dimension in Dvoretzky's Theorem.

The family of symmetric norms contains several well-studied norms, such as all l_p norms, and indeed we provide a new explanation for the disparity in space complexity between p<=2 and p>2. We also obtain bounds for other norms that are useful in applications.

Joint work with Jaroslaw Blasiok, Vladimir Braverman, Stephen R. Chestnut, and Lin F. Yang.

Multi-finger caging offers a robust approach to object grasping. To securely grasp an object, the fingers are first placed in caging regions surrounding a desired immobilizing grasp. This prevents the object from escaping the hand, and allows for great position uncertainty of the fingers relative to the object. The hand is then closed until the desired immobilizing grasp is reached.

While efficient computation of two-finger caging grasps for polygonal objects is well developed, the computation of three-finger caging grasps has remained a challenging open problem. We will discuss the caging of polygonal objects using three-finger hands that maintain similar triangle finger formations during the grasping process. While the configuration space of such hands is four dimensional, their contact space which represents all two and three finger contacts along the grasped object's boundary forms a two-dimensional stratified manifold.

We will present a caging graph that can be constructed in the hand's relatively simple contact space. Starting from a desired immobilizing grasp of the object by a specific triangular finger formation, the caging graph is searched for the largest formation scale value that ensures a three-finger cage about the object. This value determines the caging regions, and if the formation scale is kept below this value, any finger placement within the caging regions will guarantee a robust object grasping.

An extremal metric, as defined by Calabi, is a canonical Kahler metric: it minimizes the curvature within a given Kahler class. According to the Yau-Tian-Donaldson conjecture, polarized Kahler manifolds admitting an extremal metric should correspond to stable manifolds in a Geometric Invariant Theory sense.

In this talk, we will explain that a projective extremal Kahler manifold is asymptotically relatively Chow stable. This fact was conjectured by Apostolov and Huang, and its proof relies on quantization techniques. We will explain various implications, such that unicity or splitting results for extremal metrics.

Joint work with Yuji Sano ( Fukuoka University).

A curious property of randomized log-space search algorithms is that their outputs are often longer than their workspace. One consequence is that there is no clear way to reproduce the same output when running the algorithm twice on the same input. It is not feasible to store the random bits (or the output) of the previous run in log-space, and using new random bits in another execution can result in a different output. This leads to the question: how can we reproduce the results of a randomized log space computation of a search problem?

We will give a precise definition of this notion of "reproducibility". Then we will show that every problem in search-RL has a randomized log-space algorithm where the output can be reproduced. Reproducibility can be thought of as an extension of pseudo-determinism. Indeed, for some problems in search-RL we show pseudo-deterministic algorithms whose running time significantly improve on known deterministic algorithms.

Joint work with Yang Liu.

Oren Louidor (Technion)

**Title: Dynamical freezing in a spin-glass with logarithmic correlations**.

Abstract: We consider a continuous time random walk on the 2D torus, governed by the exponential of the discrete Gaussian free field acting as potential. This process can be viewed as Glauber dynamics for a spin-glass system with logarithmic correlations. Taking temperature to be below the freezing point, we then study this process both at pre-equilibrium and in-equilibrium time scales. In the former case, we show that the system exhibits aging and recover the arcsine law as asymptotics for a natural two point temporal correlation function. In the latter case, we show that the dynamics admits a functional scaling limit, with the limit given by a variant of Kolmogorov's K-process, driven by the limiting extremal process of the field, or alternatively, by a super-critical Liouville Brownian motion. Joint work with A. Cortines, J. Gold and A. Svejda.

Alexander Glazman (Tel Aviv)

**Title: Level lines of a random Lipschitz function**

Abstract: We consider the uniform distribution on Lipschitz functions on the triangular lattice, i.e. all integer-valued functions which differ by 0 or 1 on any two adjacent vertices. We show that with a positive probability such a function exhibits macroscopic level lines. Instead of working directly with Lipschitz functions we map this model to the loop $O(2)$ model with parameter $x=1$. The loop $O(n)$ model is a model for a random collection of non-intersecting loops on the hexagonal lattice, which is believed to be in the same universality class as the spin $O(n)$ model.

A main tool in the proof is a positive association (FKG) property that was recently shown to hold when $n \ge 1$ and $0<x\le\frac{1}{\sqrt{n}}$. Though the case $n=2$, $x=1$ is not in the FKG regime, it turns out that when loops are assigned one of two colours independently the marginal on loops of either of the colours does satisfy the FKG property. The colouring of loops allows to view the loop $O(2)$ model with $x=1$ as coupling of two percolation configurations. Studying each of them independently and using the XOR operation we establish existence of macroscopic loops, i.e. level lines in the original setting.

(Based on a joint work with H. Duminil-Copin, and I. Manolescu)

In unsupervised domain mapping, the learner is given two unmatched datasets A and B. The goal is to learn a mapping G_AB that translates a sample in A to the analog sample in B. Recent approaches have shown that when learning simultaneously both G_AB and the inverse mapping G_BA, convincing mappings are obtained. In this work, we present a method of learning G_AB without learning G_BA. This is done by learning a mapping that maintains the distance between a pair of samples. Moreover, good mappings are obtained, even by maintaining the distance between different parts of the same sample before and after mapping. We present experimental results that the new method not only allows for one sided mapping learning, but also leads to preferable numerical results over the existing circularity-based constraint.

In formal verification, one uses mathematical tools in order to prove that a system satisfies a given specification.

In this talk I consider three limitations of traditional formal verification:

The first is the fact that treating systems as finite-state machines is problematic, as systems become more complex, and thus we must consider infinite-state machines.

The second is that Boolean specifications are not rich enough to specify properties required by today's systems. Thus, we need to go beyond the Boolean setting.

The third concerns certifying the results of verification: while a verifier may answer that a system satisfies its specification, a designer often needs some palpable evidence of correctness.

I will present several works addressing the above limitations, and the mathematical tools involved in overcoming them.

No prior knowledge is assumed. Posterior knowledge is guaranteed.

We develop the theory of semisimplifications of tensor categories defined by Barrett and Westbury. By definition, the semisimplification of a tensor category is its quotient by the tensor ideal of negligible morphisms, i.e., morphisms f such that Tr(fg)=0 for any morphism g in the opposite direction. In particular, we compute the semisimplification of the category of representations of a finite group in characteristic p in terms of representations of the normalizer of its Sylow p-subgroup. This allows us to compute the semisimplification of the representation category of the symmetric group S_{n+p} in characteristic p, where n=0,...,p-1, and of the Deligne category Rep^{ab} S_t, t in N. We also compute the semisimplification of the category of representations of the Kac-De Concini quantum group of the Borel subalgebra of sl_2. Finally, we study tensor functors between Verlinde categories of semisimple algebraic groups arising from the semisimplification construction, and objects of finite type in categories of modular representations of finite groups (i.e., objects generating a fusion category in the semisimplification).

This is joint work with Victor Ostrik.

In the area of distributed graph algorithms a number of network's entities with local views solve some computational task by exchanging messages with their neighbors. Quite unfortunately, an inherent property of most existing distributed algorithms is that throughout the course of their execution, the nodes get to learn not only their own output but rather learn quite a lot on the inputs or outputs of many other entities. This leakage of information might be a major obstacle in settings where the output (or input) of network's individual is a private information (e.g. distributed networks of selfish agents, decentralized digital currency such as Bitcoin, voting systems).

While being quite unfamiliar notion in the classical distributed setting, the notion of secure multi-party computation (MPC) is one of the main themes in the Cryptography community. Yet despite all extensive work in the area, no existing algorithm fits the framework of classical distributed models in which there are no assumptions on the graph topologies and only messages of bounded size are sent on the edges in each round.

In this work, we introduce a new framework for \emph{secure distributed graph algorithms} and provide the first \emph{general compiler} that takes any "natural" non-secure distributed algorithm that runs in $r$ rounds, and turns it into a secure algorithm that runs in $\widetilde{O}(r \cdot D \cdot poly(\Delta))$ rounds where $\Delta$ is the maximum degree in the graph and $D$ is its diameter. This round complexity is nearly optimal for bounded degree graphs.

The main technical part of our compiler is based on a new cycle cover theorem: We show that the edges of every bridgeless graph $G$ of diameter $D$ can be covered by a collection of cycles such that each cycle is of length $\widetilde{O}(D)$ and each edge of the graph $G$ appears in $\widetilde{O}(1)$ many cycles. This provides the basis for additional combinatorial constructions required by our compiler and might be of independent combinatorial and algorithmic interest.

Joint work with Merav Parter.

In recent years, robots have played an active role in everyday life: medical robots assist in complex surgeries, low-cost commercial robots clean houses and fleets of robots are used to efficiently manage warehouses. A key challenge in these systems is motion planning, where we are interested in planning a collision-free path for a robot in an environment cluttered with obstacles. While the general problem has been studied for several decades now, these new applications introduce an abundance of new challenges.

In this talk I will describe some of these challenges as well as algorithms developed to address them. I will overview general challenges such as compression and graph-search algorithms in the context of motion planning. I will show why traditional Computer Science tools are ill-suited for these problems and introduce alternative algorithms that leverage the unique characteristics of robot motion planning. In addition, I will describe domains-specific challenges such as those that arise when planning for assistive robots and for humanoid robots and overview algorithms tailored for these specific domains.

Language technology has become pervasive in everyday life, powering applications like Apple's Siri or Google's Assistant. Neural networks are a key component in these systems thanks to their ability to model large amounts of data. Contrary to traditional systems, models based on deep neural networks (a.k.a. deep learning) can be trained in an end-to-end fashion on input-output pairs, such as a sentence in one language and its translation in another language, or a speech utterance and its transcription. The end-to-end training paradigm simplifies the engineering process while giving the model flexibility to optimize for the desired task. This, however, often comes at the expense of model interpretability: understanding the role of different parts of the deep neural network is difficult, and such models are often perceived as "black-box". In this work, we study deep learning models for two core language technology tasks: machine translation and speech recognition. We advocate an approach that attempts to decode the information encoded in such models while they are being trained. We perform a range of experiments comparing different modules, layers, and representations in the end-to-end models. Our analyses illuminate the inner workings of end-to-end machine translation and speech recognition systems, explain how they capture different language properties, and suggest potential directions for improving them. The methodology is also applicable to other tasks in the language domain and beyond.

In this talk, I present an analogue of the Hardy-Littlewood conjecture on the asymptotic distribution of prime constellations in the setting of short intervals in function fields of smooth projective curves over finite fields.

I will discuss the definition of a "short interval" on a curve as an additive translation of the space of global sections of a sufficiently positive divisor E by a suitable rational function f, and show how this definition generalizes the definition of a short interval in the polynomial setting.

I will give a sketch of the proof which includes a computation of a certain Galois group, and a counting argument, namely, Chebotarev density type theorem.

This is a joint work with Tyler Foster.

Computational problems whose input is a program are central in Cryptography, as well as Complexity, Learning, and Optimization. The nature of such problems crucially depends on the way the program is accessed -- as a black box or explicitly by its implementation.

In which settings can we exploit code to gain an advantage over black-box access? In Cryptography, we explore this question from two opposing perspectives:

Protecting Code: Can we obfuscate a program's code so that its functionality is preserved but it is otherwise unintelligible? Intuitively, such obfuscated code reveals nothing more than black-box access to the program. Obfuscation is, therefore, a powerful tool with numerous applications in software protection and Cryptography.

Exploiting Code: Most security proofs in cryptography consist of a reduction that translates any potential attacker into an algorithm solving some underlying hard problem. While most security reductions only require black-box access to the attacker, for many applications black-box reductions are provably insufficient. Can we exploit the attacker's code to prove security where black-box reductions fail?

In this talk, I will describe new techniques for protecting and exploiting code, taking advantage of the inherent tension between these two tasks. I will also demonstrate applications of these techniques in and beyond cryptography.

Sebastien Bubeck (Microsoft Research)

**Title: k-server via multiscale entropic regularization**

Abstract: I will start by describing how mirror descent is a natural strategy for online decision making, specifically in online learning and metrical task systems. To motivate the k-server problem I will also briefly recall what we know and what we don't know for structured state/action spaces in these models. Using the basic mirror descent calculations I will show how to easily obtain a log(k)-competitive algorithm for k-paging. I will then introduce our new parametrization of fractional k-server on a tree, and explain how to analyze the movement cost of entropy-regularized mirror descent on this parametrization. This leads to a depth*log(k)-competitive (fractional) algorithm for general trees, and log^2(k) for HSTs. I will also briefly mention dynamic embeddings to go beyond the standard log(n) loss in the reduction from general metrics to HSTs.

Joint work with Michael B. Cohen, James R. Lee, Yin Tat Lee, and Aleksander Madry.

Percy Deift (Courant Institute )

**Title: Universality in numerical analysis with some examples of cryptographic algorithms.**

Abstract: We show that a wide variety of numerical algorithms with random data exhibit universality. Most of the results are computational, but in some important cases universality is established rigorously. We also discuss universality for some cryptographic algorithms.

Joint work with C. Pfrany, G. Menon, S. Olver, T. Trogdan and S. Miller.

Given a p-adic group G, number theorists are interested in producing admissible representations of G: representations which have a well-defined character functional. One way to produce such representations is by "Jacquet induction" from smaller groups, whose characters can be understood inductively. The complementary space of "new" characters which are not obtained by induction (complementary with respect to a natural metric on the space of characters) is given by what is called "elliptic" characters. Given a representation V of G, the "new" input from its character is captured by the operator Ax(V), with A (the Bernstein-Deligne-Kazhdan A-operator) the projector to the elliptic component (note that this is different from the component of the character lattice valued in elliptic elements). I will talk about my ongoing work with Xuhua He on extending this operator to a trace functional Ax(V) for V a finitely-generated representation (whose Grothendieck group is well understood), which works by first constructing a virtual elliptic admissible representation from any finitely generated representation.

Being the gradient flow of the area functional, the mean curvature flow can be thought of as a greedy algorithm for simplifying embedded shapes. But how successful is this algorithm?

In this talk, I will describe three examples for how mean curvature flow, as well as its variants and weak solutions, can be used to achieve this desired simplification.

The first is a short time smoothing effect of the flow, allowing to smooth out some rough, potentially fractal initial data.

The second is an application of mean curvature flow with surgery to smooth differential topology, allowing to conclude Schoenflies-type theorems about the moduli space of smooth embedded spheres and tori, satisfying some curvature conditions.

The third is an application of (weak,modified) mean curvature flow to differential geometry, allowing to relate bounds on the gaussian entropy functional to the topology of a closed hypersurface.

In this talk, which will assume no prior knowledge in PDE or mean curvature flow, I will try to highlight the relation between the analysis of the flow and in particular, its singularity formation, to both ''time dependent'' and "classical" geometry.

Some of the results described in this talk are joint works with Reto Buzano, Robert Haslhofer and Brian White.

We recall the notion of a hall algebra associated to a category, and explain how this construction can be done in a way that naturally includes a higher algebra structure, motivated by work of Toen and Dyckerhoff-Kapranov. We will then explain how this leads to new insights about the bi-algebra structure and related concepts.

The problem of computational super-resolution asks to recover high-frequency features of an object from the noisy and blurred/band-limited samples of its Fourier transform, based on some a-priori information about the object class. A core theoretical question is to quantify the possible amount of bandwidth extension and the associated stability of recovering the missing frequency components, as a function of the sample perturbation level and the object prior.

In this work we assume that the object has a compact space/time extent in one dimension (but otherwise can be fairly arbitrary), while the low-pass window can have a super-exponentially decaying "soft" shape (such as a Gaussian). In contrast, previously known results considered only the ideal "hard" window (a characteristic function of the band) and objects of finite energy. The super-resolution problem in this case is equivalent to a stable analytic continuation of an entire function of finite exponential type. We show that a weighted least-squares polynomial approximation with equispaced samples and a judiciously chosen number of terms allows one to have a super-resolution factor which scales logarithmically with the noise level, while the pointwise extrapolation error exhibits a Holder-type continuity with an exponent derived from weighted potential theory. The algorithm is asymptotically minimax, in the sense that there is essentially no better algorithm yielding meaningfully lower error over the same smoothness class.

The results can be considered as a first step towards analyzing the much more realistic model of a sparse combination of compactly-supported "approximate spikes", which appears in applications such as synthetic aperture radar, seismic imaging and direction of arrival estimation, and for which only limited special cases are well-understood.

Joint work with L.Demanet and H.Mhaskar.

The use of a computational PIR scheme has been instrumental in reducing interaction from interactive proofs, and in converting multi-prover interactive proofs to (single prover) 2-message computationally sound proofs (also known as arguments).

In this talk we will focus on the secrecy guarantees of this transformation.

We show that if we start with an interactive proof which is only *honest-verifier* zero-knowledge, and we use a quasi-poly secure *symmetric* PIR scheme (or a 2-message OT scheme) to reduce interaction, then the resulting 2-message argument is witness indistinguishable, and in the delayed-input setting it is distributional weak zero-knowledge (which implies strong witness indistinguishable and witness hiding in the delayed input setting). Moreover, under the same assumption (which can be instantiated from quasi-poly DDH/QR/N'th residuosity assumption), we construct a two-message argument with (similar) *statistical* secrecy guarantees. For the latter, we apply the PIR heuristic on a computationally sound proof, which is honest-verifier statistical zero-knowledge.

This is based on joint works with Abhishek Jain, Dakshita Khurana, Ron Rothblum and Amit Sahai.

Graph matching is one of the most well-studied problems in combinatorial optimization, with applications ranging from scheduling and object recognition to numerical analysis and computational chemistry.

Nevertheless, until recently very little was unknown about this problem in real-life **dynamic networks**, which aim to model the constantly changing physical world.

In the first part of the talk we'll discuss our work on dynamic graph matching, and in the second part we'll highlight our work on a few related problems.

Talk 1: Amir Dembo (Stanford)

**Title: Large deviations theory for chemical reaction networks.**

The microscopic dynamics of well-stirred networks of chemical reactions are modeled as jump Markov processes. At large volume, one may expect in this framework to have a straightforward application of large deviation theory. This is not at all true, for the jump rates are typically neither globally Lipschitz, nor bounded away from zero, with both blowup and absorption as quite possible scenarios. In joint work with Andrea Agazzi and Jean-Pierre Eckman, we utilize Lyapunov stability theory to bypass this challenge and to characterize a large class of network topologies that satisfy the full Wentzell-Freidlin theory of asymptotic rates of exit from domains of attraction.

Talk 2: Yuval Peres (Microsoft)

Title: **Trace reconstruction for the deletion channel**

Abstract: In the trace reconstruction problem, an unknown string $x$ of $n$ bits is observed through the deletion channel, which deletes each bit with some constant probability q, yielding a contracted string. How many independent outputs (traces) of the deletion channel are needed to reconstruct $x$ with high probability?

The best lower bound known is linear in $n$. Until 2016, the best upper bound was exponential in the square root of $n$. We improve the square root to a cube root using statistics of individual output bits and some inequalities for Littlewood polynomials on the unit circle. This bound is sharp for reconstruction algorithms that only use this statistical information. (Similar results were obtained independently and concurrently by De O'Donnell and Servedio). If the string $x$ is random, we can show a subpolynomial number of traces suffices by comparison to a random walk. (Joint works with Fedor Nazarov, STOC 2017, with Alex Zhai, FOCS 2017 and with Nina Holden & Robin Pemantle, preprint (2017).)

Image captioning -- automatic production of text describing a visual scene -- has received a lot of attention recently. However, the objective of captioning, evaluation metrics, and the training protocol remain somewhat unsettled. The general goal seems to be for machines to describe visual signal like humans do. We pursue this goal by incorporating a discriminability loss in training caption generators. This loss is explicitly "aware" of the need for the caption to convey information, rather than appear fluent or reflect word distribution in the human captions. Specifically, the loss in our work is tied to discriminative tasks: producing a referring expression (text that allows a recipient to identify a region in the given image) or producing a discriminative caption which allows the recipient to identify an image within a set of images. In both projects, use of the dscriminability loss does not require any additional human annotations, and relies on collaborative training between the caption generator and a comprehension model, which is a proxy for a human recipient. In experiments on standard benchmarks, we show that adding discriminability objectives not only improves the discriminative quality of the generated image captions, but, perhaps surprisingly, also makes the captions better under a variety of traditional metrics.

Joint work with Ruotian Luo (TTIC), Brian Price and Scott Cohen (Adobe).

Let K be a commutative ring. Consider the groups GLn(K). Bernstein and Zelevinsky have studied the representations of the general linear groups in case the ring K is a nite eld. Instead of studying the representations of GLn(K) for each n separately, they have studied all the representations of all the groups GLn(K) si- multaneously. They considered on R := nR(GLn(K)) structures called parabolic (or Harish-Chandra) induction and restriction, and showed that they enrich R with a structure of a so called positive self adjoint Hopf algebra (or PSH algebra). They use this structure to reduce the study of representations of the groups GLn(K) to the following two tasks:

1. Study a special family of representations of GLn(K), called cuspidal representa- tions. These are representations which do not arise as direct summands of parabolic induction of smaller representations.

2. Study representations of the symmetric groups. These representation also has a nice combinatorial description, using partitions.

In this talk I will discuss the study of representations of GLn(K) where K is a nite quotient of a discrete valuation ring (such as Z=pr or k[x]=xr, where k is a nite eld). One reason to study such representation is that all continuous complex representations of the groups GLn(Zp) and GLn(k[[x]]) (where Zp denotes the p-adic integers) arise from these nite quotients. I will explain why the natural generalization of the Harish-Chandra functors do not furnish a PSH algebra in this case, and how is this related to the Bruhat decomposition and Gauss elimination. In order to overcome this issue we have constructed a generalization of the Harish-Chandra functors. I will explain this generalization, describe some of the new functors properties, and explain how can they be applied to studying complex representations.

The talk will be based on a joint work with Tyrone Crisp and Uri Onn.

Today's technological world is increasingly dependent upon the reliability, robustness, quality of service and timeliness of networks including those of power distribution, financial, transportation, communication, biological, and social. For the time-critical functionality in transferring resources and information, a key requirement is the ability to adapt and reconfigure in response to structural and dynamic changes, while avoiding disruption of service and catastrophic failures. We will outline some of the major problems for the development of the necessary theory and tools that will permit the understanding of network dynamics in a multiscale manner.

Many interesting networks consist of a finite but very large number of nodes or agents that interact with each other. The main challenge when dealing with such networks is to understand and regulate the collective behavior. Our goal is to develop mathematical models and optimization tools for treating the ** Big Data** nature of large scale networks while providing the means to understand and regulate the collective behavior and the dynamical interactions (short and long-range) across such networks.

The key mathematical technique will be based upon the use optimal mass transport theory and resulting notions of curvature applied to weighted graphs in order to characterize network robustness. Examples will be given from biology, finance, and transportation.

I shall review the framework of algebraic families of Harish-Chandra modules, introduced recently, by Bernstein, Higson, and the speaker. Then, I shall describe three of their applications.

The first is contraction of representations of Lie groups. Contractions are certain deformations of representations with applications in mathematical physics.

The second is the Mackey bijection, this is a (partially conjectural) bijection between the admissible dual of a real reductive group and the admissible dual of its Cartan motion group.

The third is the hidden symmetry of the hydrogen atom as an algebraic family of Harish-Chandra modules.

Consider a random sequence of n bits that has entropy at least n-k, where k << n. A commonly used observation is that an average coordinate of this random sequence is close to being uniformly distributed, that is, the coordinate "looks random''. In this work, we prove a stronger result that says, roughly, that the average coordinate looks random to an adversary that is allowed to query about n/k other coordinates of the sequence, even if the adversary is non-deterministic.

As an application of this result, we prove a new result on depth-3 circuits, which recovers as a direct corollary the known lower bounds for the parity and majority functions, as well as a lower bound on sensitive functions due to Boppana.

In this talk we discuss the fine scale $L^2$-mass distribution of toral Laplace eigenfunctions with respect to random position. For the 2-dimensional torus, under certain flatness assumptions on the Fourier coefficients of the eigenfunctions and generic restrictions on energy levels, both the asymptotic shape of the variance and the limiting Gaussian law are established, in the optimal Planck-scale regime. We also discuss the 3-dimensional case, where the asymptotic behaviour of the variance is analysed in a more restrictive scenario. This is joint work with Igor Wigman.

Deep Learning has led to a dramatic leap in Super-Resolution (SR) performance in the past few years. However, being supervised, these SR methods are restricted to specific training data, where the acquisition of the low-resolution (LR) images from their high-resolution (HR) counterparts is predetermined (e.g., bicubic downscaling), without any distracting artifacts (e.g., sensor noise, image compression, non-ideal PSF, etc). Real LR images, however, rarely obey these restrictions, resulting in poor SR results by SotA (State of the Art) methods. In this paper we introduce "Zero-Shot" SR, which exploits the power of Deep Learning, but does not rely on prior training. We exploit the internal recurrence of information inside a single image, and * train a small image-specific CNN at test time, on examples extracted solely from the input image itself.* As such, it can adapt itself to different settings per image. This allows to perform SR of real old photos, noisy images, biological data, and other images where the acquisition process is unknown or non-ideal. On such images, our method outperforms SotA CNN-based SR methods, as well as previous unsupervised SR methods. To the best of our knowledge, this is the first unsupervised CNN-based SR method.

Abstract. As is well known and easy to prove the Weyl algebras A_n over a field of characteristic zero are simple. Hence any non-zero homomorphism from A_n to A_m is an embedding and m \geq n. V. Bavula conjectured that the same is true over the fields with finite characteristic. It turned out that exactly one half of his conjecture is correct (m \geq n but there are homomorphisms which are not embeddings).

If we replace the Weyl algebra by its close relative symplectic Poisson algebra (polynomial algebra with F[x_1, ..., x_n; y_1, ..., y_n] variables and Poisson bracket given by {x_i, y_i} =1 and zero on the rest of the pairs), then independently of characteristic all homomorphisms are embeddings.

In this talk I will describe a family of integral representations for the standard twisted L-function of a cuspidal representation of the exceptional group of type G_2. This integral representations. These integral representations are unusual in the sense that they unfold with a non-unique model. A priori this integral is not Eulerian but using remarkable machinery proposed by I. Piatetski-Shapiro and S. Rallis we prove that in fact the integral does factor. In the course of the plocal unramified calculation we use another non-standard method, approximations of generating functions. I will then describe a few applications of these integral representations to the study of the analytic behaviour of the this L-function and to various functorial lifts associated with the group G_2.

We present an explicit pseudorandom generator with polylog(n) seed length for read-once constant-width branching programs that can read their $n$ input bits in any order. This improves upon the work of Impagliazzo, Meka, and Zuckerman (FOCS, 2012), where they required seed length $n^{1/2+o(1)}$.

A central ingredient in our work is a bound on the Fourier spectrum of constant-width branching programs, settling a conjecture posed by Reingold, Steinke, and Vadhan (RANDOM, 2013).

Our analysis crucially uses a notion of local monotonicity on the edge labeling of the branching program. We carry critical parts of our proof under the assumption of local monotonicity and show how to deduce our results for unrestricted branching programs.

(Joint work with Eshan Chattopadhyay, Pooya Hatami, and Omer Reingold)

Machine learning has recently been revolutionized by the introduction of Deep Neural Networks. However, from a theoretical viewpoint these methods are still poorly understood. Indeed the key challenge in Machine Learning today is to derive rigorous results for optimization and generalization in deep learning. In this talk I will present several tractable approaches to training neural networks. At the second part I will discuss a new sequential algorithm for decision making that can take into account the structure in the action space and is more tuned with realistic decision making scenarios.

I will present our work that provides some of the first positive results and yield new, provably efficient, and practical algorithms for training certain types of neural networks. In a second work I will present a new online algorithm that learns by sequentially sampling random networks and asymptotically converges, in performance, to the optimal network. Our approach improves on previous random features based learning in terms of sample/computational complexity, and expressiveness. In a more recent work we take a different perspective on this problem. I will provide sufficient conditions that guarantee tractable learning, using the notion of refutation complexity. I will then discuss how this new idea can lead to new interesting generalization bounds that can potentially explain generalization in settings that are not always captured by classical theory.

In the setting of reinforcement learning I will present a recently developed new algorithm for decision making in a metrical action space. As an application, we consider a dynamic pricing problem in which a seller is faced with a stream of patient buyers. Each buyer buy at the lowest price in a certain time window. We use our algorithm to achieve an optimal regret, improving on previously known regret bound.

Schwartz functions are classically defined as smooth functions such that they, and all their (partial) derivatives, decay at infinity faster than the inverse of any polynomial. This was formulated on $\mathbb{R}^n$ by Laurent Schwartz, and later on Nash manifolds (smooth semi-algebraic varieties) by Fokko du Cloux and by Rami Aizenbud and Dima Gourevitch. In a joint work with Boaz Elazar we have extended the theory of Schwartz functions to the category of (possibly singular) real algebraic varieties. The basic idea is to define Schwartz functions on a (closed) algebraic subset of $\mathbb{R}^n$ as restrictions of Schwartz functions on $\mathbb{R}^n$.

Both in the Nash and the algebraic categories there exists a very useful characterization of Schwartz functions on open subsets, in terms of Schwartz functions on the embedding space: loosely speaking, Schwartz functions on an open subset are exactly restrictions of Schwartz functions on the embedding space, which are zero "to infinite order" on the complement to this open subset. This characterization suggests a very intuitive way to attach a space of Schwartz functions to an arbitrary (not necessarily semi-algebraic) open subset of $\mathbb{R}^n$.

In this talk, I will explain this construction, and more generally the construction of the category of tempered smooth manifolds. This category is in a sense the "largest" category whose objects "look" locally like open subsets of $\mathbb{R}^n$ (for some $n$), and on which Schwartz functions may be defined. In the development of this theory some classical results of Whitney are used, mainly Whitney type partition of unity (this will also be explained in the talk). As time permits, I will show some properties of Schwartz functions, and describe some possible applications. This is a work in progress.

Over the past two decades, machine learning has rapidly evolved and emerged as a highly influential discipline of computer science and engineering. One of the pillars of machine learning is mathematical optimization, and the connection between the two fields has been a primary focus of research. In this talk, I will present two recent works that contribute to this study, focusing on online learning---a central model in machine learning for sequential decision making and learning under uncertainty. In the first part of the talk, I will describe a foundational result concerned with the power of optimization in online learning, and give answer to the question: does there exist a generic and efficient reduction from online learning to black-box optimization? In the second part, I will discuss a recent work that employs online learning techniques to design a new efficient and adaptive preconditioned algorithm for large-scale optimization. Despite employing preconditioning, the algorithm is practical even in modern optimization scenarios such as those arising in training state-of-the-art deep neural networks. I will present the new algorithm along with its theoretical guarantees and demonstrate its performance empirically.

The random-cluster model is a dependent percolation model where the weight of a configuration is proportional to q to the power of the number of connected components. It is highly related to the ferromagnetic q-Potts model, where every vertex is assigned one of q colors, and monochromatic neighbors are encouraged. Through non-rigorous means, Baxter showed that the phase transition is first-order whenever $q > 4$ - i.e. there are multiple Gibbs measures at criticality. We provide a rigorous proof of this claim. Like Baxter, our proof uses the correspondence between the above models and the six-vertex model, which we analyze using the Bethe ansatz and transfer matrix techniques. We also prove Baxter's formula for the correlation length of the models at criticality.

This is joint work with Hugo Duminil-Copin, Maxime Gangebin, Ioan Manolescu, and Vincent Tassion.

We present a general approach to video understanding, inspired by semantic transfer techniques that have been successfully used for 2D image analysis. Our method considers a video to be a 1D sequence of clips, each one associated with its own semantics. The nature of these semantics -- natural language captions or other labels -- depends on the task at hand. A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, following which, reference semantics can be transferred to the test video. We describe two matching methods, both designed to ensure that (a) reference clips appear similar to test clips and (b), taken together, the semantics of the selected reference clips is consistent and maintains temporal coherence. We use our method for video captioning on the LSMDC'16 benchmark, video summarization on the SumMe and TVSum benchmarks, Temporal Action Detection on the Thumos2014 benchmark, and sound prediction on the Greatest Hits benchmark. Our method not only surpasses the state of the art, in four out of five benchmarks, but importantly, it is the only single method we know of that was successfully applied to such a diverse range of tasks.

Digital geometry processing (DGP) is one of the core topics of computer graphics, and has been an active line of research for over two decades. On one hand, the field introduces theoretical studies in topics such as vector-field design, preservative maps and deformation theory. On the other hand, the tools and algorithms developed by this community are applicable in fields ranging from computer-aided design, to multimedia, to computational biology and medical imaging. Throughout my work, I have sought to bridge the gap between the theoretical aspects of DGP and their applications. In this talk, I will demonstrate how DGP concepts can be leveraged to facilitate real-life applications with the right adaptation. More specifically, I will portray how I have employed deformation theory to support problems in animation and augmented reality. I will share my thoughts and first taken steps to enlist DGP to the aid of machine learning, and perhaps most excitingly, I will discussion my own and the graphics community's contributions to computational fabrication field, as well as my vision for its future.

Bio: Dr. Amit H. Bermano is a postdoctoral Researcher at the Princeton Graphics Group, hosted by Professor Szymon Rusinkiewicz and Professor Thomas Funkhouser. Previously, he was a postdoctoral researcher at Disney Research Zurich in the computational materials group, led by Dr. Bernhard Thomaszewski. He conducted his doctoral studies at ETH Zurich under the supervision of Prof. Dr. Markus Gross, in collaboration with Disney Research Zurich. His Masters and Bachelors degrees were obtained at The Technion - Israel Institute of Technology under the supervision of Prof. Craig Gotsman. His research focuses on connecting the geometry processing field with other fields in computer graphics and vision, mainly by using geometric methods to facilitate other applications. His interests in this context include computational fabrication, animation, augmented reality, medical imaging and machine learning.

We present new heavy-hitters algorithms satisfying local-differential-privacy, with optimal or near-optimal worst-case error, running time, and memory. In our algorithms, the server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $O(n^{5/2})$ server time and $O(n^{3/2})$ user time. With a typically large number of participants in local algorithms ($n$ in the millions), this reduction in time complexity is crucial for making locally-private heavy-hitters algorithms usable in practice.

Joint work with Raef Bassily, Kobbi Nissim, and Abhradeep Thakurta.

In this talk, I will try to present some techniques to handle the problem of large deviations of the spectrum of random matrices. I will focus on the case of macroscopic statistics of the spectrum of Hermitian matrices - in particular Wigner matrices - as the empirical distribution of the eigenvalues, the largest eigenvalue or the traces of powers.

In a first part, I will be concerned with the so-called "objective method''. Coined by David Aldous, this method consists in introducing, given a sequence of random objects, like random finite graphs, a new infinite random object from which one can deduce asymptotic properties of the original sequence. In the context of random matrices, this method has been mainly advertised by Balint Virag, and proven effective in showing universality results for the so-called beta-ensembles. Regarding large deviations of random matrices, this "objective method'' amounts to embed our sequence of matrices with growing size into an appropriate space on which one is able to understand the large deviations, and carry out a contraction principle. I will review several large deviations principles obtained by this method, given by interpretations of random matrices as either dense or sparse graphs, and point out the limits of this strategy.

Cryo-EM is an imaging technology that is revolutionizing structural biology; the Nobel Prize in Chemistry 2017 was recently awarded to Jacques Dubochet, Joachim Frank and Richard Henderson "for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution".

Cryo-electron microscopes produce a large number of very noisy two-dimensional projection images of individual frozen molecules. Unlike related methods, such as computed tomography (CT), the viewing direction of each image is unknown. The unknown directions, together with extreme levels of noise and additional technical factors, make the determination of the structure of molecules challenging.

While other methods for structure determination, such as x-ray crystallography and nuclear magnetic resonance (NMR), measure ensembles of molecules together, cryo-EM produces measurements of individual molecules. Therefore, cryo-EM could potentially be used to study mixtures of different conformations of molecules. Indeed, current algorithms have been very successful at analyzing homogeneous samples, and can recover some distinct conformations mixed in solutions, but, the determination of multiple conformations, and in particular, continuums of similar conformations (continuous heterogeneity), remains one of the open problems in cryo-EM.

I will discuss a one-dimensional discrete model problem, Heterogeneous Multireference Alignment, which captures many of the group properties and other mathematical properties of the cryo-EM problem. I will then discuss different components which we are introducing in order to address the problem of continuous heterogeneity in cryo-EM: 1. "hyper-molecules", the first mathematical formulation of truly continuously heterogeneous molecules, 2. The optimal representation of objects that are highly concentrated in both the spatial domain and the frequency domain using high-dimensional prolate spheroidal functions, and 3. Bayesian algorithms for inverse problems with an unsupervised-learning component for recovering such hyper-molecules in cryo-EM.

Weyl group multiple Dirichlet series are Dirichlet series in r complex variables which initially converge on a cone in C^r, possess analytic continuation to a meromorphic function on the whole complex space, and satisfy functional equations whose action on C^r is isomorphic to the Weyl group of a reduced root system. I will review different constructions of such series and discuss the relations between them.

The talk will discuss informational lower bounds of approximate Nash equilibrium in two complexity models: Query Complexity and Communication Complexity.

For both models we prove exponential (in the number of players) lower bound on the complexity of computing ε -Nash equilibrium, for constant value of approximation ε .

Consider a real Gaussian stationary process, either on Z or on R.

What is the probability that it remains positive on [0,N] for large N?

The relation between this probability, known as the persistence probability, and the covariance kernel of the process has been investigated since the 1950s with motivations stemming from probability, engineering and mathematical physics. Nonetheless, until recently, good estimates were known only for particular cases, or when the covariance kernel is either non-negative or summable.

In the first hour of the talk we will discuss new spectral methods which greatly simplify the analysis of persistence. We will then describe its qualitative behavior in a very general setting.

In the second hour, we will describe (very) recent progress. In particular we will show the proof of the "spectral gap conjecture'', which states: if the spectral measure vanishes on an interval containing 0 then the persistence is less then e^{-cN^2}, and this bound is tight if the measure is non-singular and compactly supported.

Time permitting, we will also discuss "tiny persistence'' phenomena (of the order of e^{-e^{cN}}).

Based on joint works with Ohad Feldheim, Benjamin Jaye, Fedor Nazarov and Shahaf Nitzan.

Isolating the voice of a specific person while filtering out other voices or background noises is challenging when video is shot in noisy environments, using a single microphone. For example, video conferences from home or office are disturbed by other voices, TV reporting from city streets is mixed with traffic noise, etc. We propose audio-visual methods to isolate the voice of a single speaker and eliminate unrelated sounds. Face motions captured in the video are used to estimate the speaker's voice, which is applied as a filter on the input audio. This approach avoids using mixtures of sounds in the learning process, as the number of such possible mixtures is huge, and would inevitably bias the trained model.

In the first part of this talk, I will describe a few techniques to predict speech signals by a silent video of a speaking person. In the second part of the talk, I will propose a method to separate overlapping speech of several people speaking simultaneously (known as the cocktail-party problem), based on the speech predictions generated by video-to-speech system.

Assume that $(X,f)$ is a dynamical system and $\phi$ is a real non negative potential such that the corresponding $f$-invariant measure $\mu_\phi$ measure is infinite. Under assumptions of good inducing schemes, we give conditions under which the pressure of $f$ for a perturbed potential $\phi+s\psi$ relates to the pressure of the induced system term.

This extends some of Sarig's results to the setting of infinite "equilibrium states".

In addition, limit properties of the family of measures $\mu_{\phi+s\psi}$ as $s\to 0$ are studied and statistical properties (e.g. correlation coefficients) under the limit measure are derived. I will discuss several examples.

This is based on joint work with H. Bruin and M. Todd.

As a concrete variant of motivic integration, I will discuss uniform p-adic integration and constructive aspects of results involved. Uniformity is in the p-adic fields, and, for large primes p, in the fields F_p((t)) and all their finite field extensions. Using real-valued Haar measures on such fields, one can study integrals, Fourier transforms, etc. We follow a line of research that Jan Denef started in the eighties, with in particular the use of model theory to study various questions related to p-adic integration. A form of uniform p-adic quantifier elimination is used. Using the notion of definable functions, one builds constructively a class of complex-valued functions which one can integrate (w.r.t. any of the variables) without leaving the class. One can also take Fourier transforms in the class. Recent applications in the Langlands program are based on Transfer Principles for uniform p-adic integrals, which allow one to get results for F_p((t)) from results for Q_p, once p is large, and vice versa. These Transfer Principles are obtained via the study of general kinds of loci, some of them being zero loci. More recently, these loci are playing a role in the uniform study of p-adic wave front sets for (uniformly definable) p-adic distributions, a tool often used in real analysis.

This talk contains various joint works with Gordon, Hales, Halupczok, Loeser, and Raibaut, and may mention some work in progress with Aizenbud about WF-holonomicity of these distributions, in relation to a question raized by Aizenbud and Drinfeld.

Covariance matrix estimation is essential in many areas of modern Statistics and Machine Learning including Graphical Models, Classification/Discriminant Analysis, Principal Component Analysis, and many others. Classical statistics suggests using Sample Covariance Matrix (SCM) which is a Maximum Likelihood Estimator (MLE) in the Gaussian populations. Real world data, however, usually exhibits heavy-tailed behavior and/or contains outliers, making the SCM non-efficient or even useless. This problem and many similar ones gave rise to the Robust Statistics field in early 60s, where the main goal was to develop estimators stable under reasonable deviations from the basic Gaussian assumptions. One of the most prominent robust covariance matrix estimators was introduced and thoroughly studied by D. Tyler in the mid-80s. This important representative of the family of M-estimators can be defined as an MLE of a certain population. The problem of robust covariance estimation becomes even more involved in the high-dimensional scenario, where the number of samples n is of the order of the dimension p, or even less. In such cases, prior knowledge, often referred to as structure, is utilized to decrease the number of degrees of freedom and make the estimation possible. Unlike the Gaussian setting, in Tyler's case even imposition of linear structure becomes challenging due to the non-convexity of the negative log-likelihood. Recently, Tyler's target function was shown to become convex under a certain change of metric (geodesic convexity), which stimulated further investigation of the estimator.

In this work, we focus on the so-called group symmetry structure, which essentially means that the true covariance matrix commutes with a group of unitary matrices. In engineering applications such structures appear due to the natural symmetries of the physical processes; examples include circulant, perHermitian, proper quaternion matrices, etc. Group symmetric constraints are linear, and thus convex in the regular Euclidean metric. We show that they are also convex in the geodesic metric. These properties allow us to develop symmetric versions of the SCM and Tyler's estimator and build a general framework for their performance analysis. The classical results claim that at least n = p and n = p+1 samples in general position are necessary to ensure the existence and uniqueness of the SCM and Tyler's estimator, respectively. We significantly improve the sample complexity requirements for both estimators under the symmetry structure and show that in some cases even 1 or 2 samples are enough to guarantee the existence and uniqueness regardless of the ambient dimension.

In this paper we study the adaptive complexity of submodular optimization. Informally, the adaptive complexity of a problem is the minimal number of sequential rounds required to achieve a constant factor approximation when polynomially-many queries can be executed in parallel at each round. Adaptivity is a fundamental concept that is heavily studied in computer science, largely due to the need for parallelizing computation. Somewhat surprisingly, very little is known about adaptivity in submodular optimization. For the canonical problem of maximizing a monotone submodular function under a cardinality constraint, to the best of our knowledge, all that is known to date is that the adaptive complexity is between 1 and Ω(n).Our main result in this paper is a tight characterization showing that the adaptive complexity of maximizing a monotone submodular function under a cardinality constraint is, up to lower order terms, θ(log n):We describe an algorithm which requires O(log n) sequential rounds and achieves an approximation that is arbitrarily close to 1/3; We show that no algorithm can achieve an approximation better than O(1 / log n) with fewer than O(log n / log log n) rounds. Thus, when allowing for parallelization, our algorithm achieves a constant factor approximation exponentially faster than any known existing algorithm for submodular maximization. Importantly, the approximation algorithm is achieved via adaptive sampling and complements a recent line of work on optimization of functions learned from data. In many cases, we do not know the functions we optimize and learn them from labeled samples. Recent results show that no algorithm can obtain a constant factor approximation guarantee using polynomially-many labeled samples as in the PAC and PMAC models, drawn from any distribution. Since learning with non-adaptive samples over any distribution results in a sharp impossibility, we consider learning with adaptive samples where the learner obtains poly(n) samples drawn from a distribution of her choice in every round. Our result implies that in the realizable case, where there is a true underlying function generating the data, θ(log n) batches, up to lower order terms, of adaptive samples are necessary and sufficient to approximately "learn to optimize" a monotone submodular function under a cardinality constraint. This is joint work with Yaron Singer.

Given an unknown D-dimensional quantum state rho, as well as M two-outcome measurements E_1,...,E_M, how many copies of rho do we need, if we want to learn the approximate probability that E_i accepts rho for *every* i? In this talk, I'll prove the surprising result --I didn't believe it myself at first -- that one can achieve this using a number of copies that's polylogarithmic in both M and D. So, e.g., one can learn whether *every* size-n^3 quantum circuit accepts or rejects an n-qubit state, given only poly(n) copies of the state. To prove this will require first surveying previous results on measuring quantum states and succinctly describing them, including my 2004 postselected learning theorem, and my 2006 "Quantum OR Bound" (with an erroneous proof fixed in 2016 by Harrow, Lin, and Montanaro).

As time permits, I'll also discuss new joint work with Xinyi Chen, Elad Hazan, and Ashwin Nayak, which takes my 2006 result on PAC-learnability of quantum states, and extends to the setting of online learning. Here we show that, given a sequence of T two-outcome measurements on an n-qubit state, even if the sequence is chosen adversarially, one can still learn to predict the outcomes of those measurements with total regret O(n) (in the "realizable" case) or O(sqrt(Tn)) (in the "non-realizable" case).

No quantum computing background will be assumed.

Calabi conjectured that the complex Monge-Ampere equation on compact Kaehler manifolds has a unique solution.

This was solved by Yau in 1978. In this talk, we present a non-archimedean version on projective Berkovich spaces.

In joint work with Burgos, Jell, Kunnemann and Martin, we improve a result of Boucksom, Favre and Jonsson in the equicharacteristic 0 case. We give also a result in positive equicharacteristic using test ideals.

We study the non-vanishing gradient-like vector fields $v$ on smooth compact manifolds $X$ with boundary. We call such fields traversing. With the help of a boundary generic field $v$, we divide the boundary $\d X$ of $X$ into two complementary compact manifolds, $\d^+X(v)$ and $\d^-X(v)$. Then we introduce the causality map $C_v: \d^+X(v) \to \d^-X(v)$, a distant relative of the Poincare return map. Let $\mathcal F(v)$ denote the oriented 1-dimensional foliation on $X$, produced by a traversing $v$-flow.

Our main result, the Holography Theorem, claims that, for boundary generic traversing vector fields $v$, the knowledge of the causality map $C_v$ is allows for a reconstruction of the pair $(X, \mathcal F(v))$, up to a homeomorphism $\Phi: X \to X$ which is the identity on the boundary $\d X$. In other words, for a massive class of ODE's, we show that the topology of their solutions, satisfying a given boundary value problem, is rigid. We call these results ``holographic" since the $(n+1)$-dimensional $X$ and the un-parameterized dynamics of the flow on it are captured by a single correspondence $C_v$ between two $n$-dimensional screens, $\d^+X(v)$ and $\d^-X(v)$.

This holography of traversing flows has numerous applications to the dynamics of general flows. Time permitting, we will discuss some applications of the Holography Theorem to the geodesic flows and the inverse scattering problems on Riemannian manifolds with boundary.

Let Rep(GL(m|n)) denote the category of finite-dimensional algebraic representations of the supergroup Gl(m|n). Nowadays the abelian structure (Ext^1 between irreducibles, block description,...) is well understood. Kazhdan-Lusztig theory gives an algorithmic solution for the character problem, and in special cases even explicit character formulas. However we understand the monoidal structure hardly at all (e.g. the decomposition of tensor products into the indecomposable constituents). I will talk about the problem of decomposing tensor products "up to superdimension 0", i.e. about the structure of Rep(GL(m|n))/N where N is the ideal of indecomposable representations of superdimension 0.

When dealing with a highly variable problem such as action recognition, focusing on a small area, such as the hand's region, makes the problem more manageable, and enables us to invest relatively high amount of resources needed for interpretation in a small but highly informative area of the image. In order to detect this region of interest in the image and properly analyze it, I have built a process that includes several steps, starting with a state of the art hand detector, incorporating both detection of the hand by appearance and by estimation of human body pose. The hand detector is built upon a fully convolutional neural network, detecting hands efficiently and accurately. The human body pose estimation starts with a state of the art head detector and continues with a novel approach where each location in the image votes for the position of each body keypoint, utilizing information from the whole image. Using dense, multi-target votes enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting, and accurately estimates the body pose. Once the detection of the hands is complete, an additional step of segmentation of the hand and fingers is made. In this step each hand pixel in the image is labeled using a dense fully convolutional network. Finally, an additional step is made to segment and identify the held object. Understanding the hand-object interaction is an important step toward understanding the action taking place in the image. These steps enable us to perform fine interpretation of hand-object interaction images as an essential step towards understanding the human-object interaction and recognizing human activities.

Recent progress in imaging technologies leads to a continuous growth in biomedical data, which can provide better insight into important clinical and biological questions. Advanced machine learning techniques, such as artificial neural networks are brought to bear on addressing fundamental medical image computing challenges such as segmentation, classification and reconstruction, required for meaningful analysis of the data. Nevertheless, the main bottleneck, which is the lack of annotated examples or 'ground truth' to be used for training, still remains.

In my talk, I will give a brief overview on some biomedical image analysis problems we aim to address, and suggest how prior information about the problem at hand can be utilized to compensate for insufficient or even the absence of ground-truth data. I will then present a framework based on deep neural networks for the denoising of Dynamic contrast-enhanced MRI (DCE-MRI) sequences of the brain. DCE-MRI is an imaging protocol where MRI scans are acquired repetitively throughout the injection of a contrast agent, that is mainly used for quantitative assessment of blood-brain barrier (BBB) permeability. BBB dysfunctionality is associated with numerous brain pathologies including stroke, tumor, traumatic brain injury, epilepsy. Existing techniques for DCE-MRI analysis are error-prone as the dynamic scans are subject to non-white, spatially-dependent and anisotropic noise. To address DCE-MRI denoising challenges we use an ensemble of expert DNNs constructed as deep autoencoders, where each is trained on a specific subset of the input space to accommodate different noise characteristics and dynamic patterns. Since clean DCE-MRI sequences (ground truth) for training are not available, we present a sampling scheme, for generating realistic training sets with nonlinear dynamics that faithfully model clean DCE-MRI data and accounts for spatial similarities. The proposed approach has been successfully applied to full and even temporally down-sampled DCE-MRI sequences, from two different databases, of stroke and brain tumor patients, and is shown to favorably compare to state-of-the-art denoising methods.

We consider the problem of hidden common manifold extraction from multiple data sets, which have observation-specific distortions and artifacts. A new manifold learning method is presented based on alternating products of diffusion operators and local kernels. We provide theoretical analysis showing that our method is able to build a variant of the Laplacian of the hidden common manifold, while suppressing the observation-specific artifacts. The generality of this method is demonstrated in data analysis applications, where different types of devices are used to measure the same activity. In particular, we present applications to problems in biomedicine, neuroscience, and audio analysis.

This is joint work with Roy Lederman and Hau-tieng Wu.

We prove the first super-logarithmic lower bounds on the cell probe complexity of dynamic *boolean* (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new technique and use it to prove a ~ log^{1.5}(n) lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting *over F_2* ([Patrascu07]). Proving a super-logarithmic lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Patrascu's obituary [Tho13]. This result also implies the first super-logarithmic lower bound for the classical 2D range counting problem,one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D "rectangle stabbing", and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient *one-way* communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebychev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].

Joint work with Kasper Green-Larsen and Huacheng Yu.

We study the classic bipartite matching problem in the online setting, first introduced in the seminal work of Karp, Vazirani and Vazirani. Specifically, we consider the problem for the well-studied class of regular graphs. Matching in this class of graphs was studied extensively in the offline setting. In the online setting, an optimal deterministic algorithm, as well as efficient algorithms under stochastic input assumptions were known. In this work, we present a novel randomized algorithm with competitive ratio tending to one on this family of graphs, under adversarial arrival order. Our main contribution is a novel algorithm which achieves competitive ratio 1-O(\sqrt{\log d}/\sqrt{d}) in expectation on d-regular graphs. In contrast, we show that all previously-known online algorithms, such as the generally worst-case optimal ranking algorithm of Karp et al., are restricted to a competitive ratio strictly bounded away from one, even as d grows. Moreover, we show the convergence rate of our algorithm's competitive ratio to one is nearly tight, as no algorithm achieves competitive ratio better than 1-O(1/\sqrt{d}). Finally, we show that our algorithm yields a similar competitive ratio with high probability, as well as guaranteeing each offline vertex a probability of being matched tending to one.

The recent success of convolutional neural networks (CNNs) for image processing tasks is inspiring research efforts attempting to achieve similar success for geometric tasks. One of the main challenges in applying CNNs to surfaces is defining a natural convolution operator on surfaces. In this paper we present a method for applying deep learning to sphere-type shapes using a global seamless parameterization to a planar flat-torus, for which the convolution operator is well defined. As a result, the standard deep learning framework can be readily applied for learning semantic, high-level properties of the shape. An indication of our success in bridging the gap between images and surfaces is the fact that our algorithm succeeds in learning semantic information from an input of raw low-dimensional feature vectors.

We demonstrate the usefulness of our approach by presenting two applications: human body segmentation, and automatic landmark detection on anatomical surfaces. We show that our algorithm compares favorably with competing geometric deep-learning algorithms for segmentation tasks, and is able to produce meaningful correspondences on anatomical surfaces where hand-crafted features are bound to fail.

Joint work with: Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir G. Kim and Yaron Lipman.

The past five years have seen a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for feature learning and classification. However, the mathematical reasons for this success remain elusive. In this talk we will briefly survey some existing theory of deep learning. In particular, we will focus on data structure based theory and discuss two recent developments.

The first work studies the generalization error of deep neural network. We will show how the generalization error of deep networks can be bounded via their classification margin. We will also discuss the implications of our results for the regularization of the networks. For example, the popular weight decay regularization guarantees the margin preservation, but it leads to a loose bound to the classification margin. We show that a better regularization strategy can be obtained by directly controlling the properties of the network's Jacobian matrix.

The second work focuses on solving minimization problems with neural networks. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set leads to faster convergence of certain iterative algorithms with an error related to the accuracy of the set approximation. Our theory ties to recent advances in sparse recovery, compressed sensing and deep learning. In particular, it provides an explanation for the successful approximation of the ISTA (iterative shrinkage and thresholding algorithm) solution by neural networks with layers representing iterations.

Joint work with Guillermo Sapiro, Miguel Rodrigues, Jure Sokolic, Alex Bronstein and Yonina Eldar.

Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people -- not software agents -- make joint decisions. I will illustrate this theme through two recent endeavors: Spliddit.org, a website that offers provably fair solutions to everyday problems; and Robovote.org, which provides optimization-driven voting methods.

Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.

The theory of hypergeometric functions with matrix argument was developed in the 1950s by S. Bochener for Hermitian matrices, and by C. Herz for symmetric matrices. This theory admits a common generalization to the setting of symmetric cones, which is discussed in the book by Faraut-Koranyi. It also has applications to the study of non-central distributions in statistics and to the theory of random matrices.

In the 1980s, I.G. Macdonald introduced a one parameter family of multivariate hypergeometric functions, which, for special values of the parameter, are the *radial* parts of the matrix hypergeometric functions. He also formulated a number of natural conjectures about these functions, which in the matrix case can be proved by appropriate integral formulas. However this technique is unavailable in the general setting and as a result these conjectures have remained open.

In recent work with G. Olafsson we have solved most of these conjectures, using ideas from the theory of Cherednik algebras and Jack polynomials. Among other results we obtain sharp estimates for the exponential kernel that allow us to establish a rigorous theory of the Fourier and Laplace transforms, and we prove an explicit formula for the Laplace transform of a Jack polynomial conjectured by Macdonald. This opens the door for several future developments in the associated harmonic analysis, some of which we also treat. This includes (1) the Paley-Wiener theorem, (2) Laplace transform identities for hypergeometric functions, and (3) the "so-called" Ramanujan master theorem.

**Note the unusual room [De Picciotto ****Building, Room 25]**

We show that there exist binary locally testable codes (for all rates) and locally correctable codes (for low rates) with rate and distance approaching the Gilbert-Varshamov bound (which is the best rate-distance tradeoff known for general binary error-correcting codes). Our constructions use a number of ingredients: Thommesen's random concatenation technique, the Guruswami-Sudan-Indyk strategy for list-decoding concatenated codes, the Alon-Edmonds-Luby distance amplification method, and the local list-decodability and local testability of Reed-Muller codes. Interestingly, this seems to be the first time that local testability is used in the construction of locally correctable codes.

Joint work with Sivakanth Gopi, Rafael Oliveira, Noga Ron-Zewi and Shubhangi Saraf

This is a joint work with Bezrukavnikov and Kazhdan. The goal of my talk is to give an explicit formula for the Bernstein projector to representations of depth $\leq r$. As a consequence, we show that the depth zero Bernstein projector is supported on topologically unipotent elements and it is equal to the restriction of the character of the Steinberg representation. As another application, we deduce that the depth $r$ Bernstein projector is stable. Moreover, for integral depths our proof is purely local.

Image denoising is the most fundamental problem in image enhancement, and it is largely solved: It has reached impressive heights in performance and quality -- almost as good as it can ever get. But interestingly, it turns out that we can solve many other problems using the image denoising "engine". I will describe the Regularization by Denoising (RED) framework: using the denoising engine in defining the regularization of any inverse problem. The idea is to define an explicit image-adaptive regularization functional directly using a high performance denoiser. Surprisingly, the resulting regularizer is guaranteed to be convex, and the overall objective functional is explicit, clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED is capable of incorporating any image denoising algorithm as a regularizer, treat general inverse problems very effectively, and is guaranteed to converge to the globally optimal result.

* Joint work with Peyman Milanfar (Google Research) and Yaniv Romano (EE-Technion).

We propose a procedure (the first of its kind) for computing a fully data-dependent interval that traps the mixing time t_mix of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge.

The interval is constructed around the relaxation time t_relax, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a sqrt{n} rate, where n is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least \Omega(t_relax) times on the average. Future directions of research are identified. Time permitting, we will mention some recent further developments by D. Levin and Y. Peres.

Joint work with Daniel Hsu and Csaba Szepesvari.

Graphs that are prevalent in current applications (the Internet, Facebook etc.) are not only very large and highly dynamic, but also distributed between many servers, none of which sees the graph in its entirety. The distributed monitoring problem deals with the question of imposing conditions on the local graphs, such that as long as they hold, it is guaranteed that some desired property holds for the global graph.

While defining local conditions for linear properties (e.g. average degree) is relatively easy, they are more difficult to derive for non-linear functions over the graph. We propose a solution and a general definition of solution optimality, and demonstrate how to apply it to two important graph properties -- spectral gap and number of triangles. We also define an absolute lower bound on the communication overhead for distributed monitoring, and compare our algorithm to it, with good results. Performance improves as the graph becomes larger and denser -- that is, when distributing it is more important.

I will talk about my recent adventures with ants. Together with biologists we study P. longicornis ants as they collaboratively transport a large food item to their nest. This collective navigation process is guided by pheromones which are laid by individual ants. Using a new methodology to detect scent marks, we identify a new kind of ant trail characterized by very short and dynamic pheromone markings and highly stochastic navigation response to them. We argue that such a trail can be highly beneficial in conditions in which knowledge of individual ants regarding the underlying topological structure is unreliable. This gives rise to a new theoretical search model on graphs under unreliable guiding instructions, which is of independent computational interest. To illustrate the model, imagine driving a car in an unknown country that is in the aftermath of a major hurricane which has randomly flipped a certain small fraction of the road-signs. Under such conditions of unreliability, how can you still reach your destination fast? I will discuss the limits of unreliability that allow for efficient navigation. In trees, for example, there is a phase transition phenomenon that occurs roughly around the inverse of the square root of the maximal degree. That is, if noise is above this threshold then any algorithm cannot avoid finding the target in exponential time (in the original distance), while below the threshold we identify an optimal, almost linear, walking algorithm. Finally, I will discuss algorithms that under such a noisy model aim to minimize the number of queries to find a target (rather than the number of moves).

This talk is based on joint works with biologists from the Weizmann Institute: Ofer Feinerman, Udi Fonio, and others, and with CS researchers: Lucas Bockowski, Adrian Kosowski, and Yoav Rodeh.

By analytical and numerical studies of Deep Neural Networks (using standard TensorFlow) in the "Information Plane" - the Mutual Information the network layers preserve on the input and the output variables - we obtain the following new insights.

- The training epochs, for each layer, are divided into two phases: (1) fitting the training data - increasing the mutual information on the labels; (2) compressing the representation - reducing the mutual information on the inputs. The layers are learnt hierarchically, from the bottom to the top layer, with some overlaps.
- Most (~80%) of the training time - optimization with SGD - is spent on compressing the representation (the second phase) - NOT on fitting the training data labels, even when the training has no regularization or terms that directly aim at such compression.
- The convergence point, FOR EVERY HIDDEN LAYER, lies on or very close to the Information Bottleneck IB) theoretical bound. Thus, the mappings from the input to the hidden layer and from the hidden layer to the output obey the IB self-consistent equations for some value of the compression-prediction tradeoff.
- The main benefit of adding more hidden layers is in the optimization/training time, as the compression phase for each layer amounts to relaxation to a Maximum conditional Entropy state, subject to the proper constraints on the error/information on the labels. As such relaxation takes super-linear time in the compressed entropy, adding more hidden layers dramatically reduces the training time. There is also benefit in sample complexity to adding hidden layers, but this is a smaller effect.

I will explain these new observations and the benefits of exploring Deep Learning in the "Information Plane", and discuss some of the exciting theoretical and practical consequences of our analysis.

Joint work with Ravid Ziv and Noga Zaslavsky.

We present a novel approach to template matching that is efficient, can handle partial occlusions, and is equipped with provable performance guarantees. A key component of the method is a reduction that transforms the problem of searching a nearest neighbor among N high-dimensional vectors, to searching neighbors among two sets of order sqrt(N) vectors, which can be done efficiently using range search techniques. This allows for a quadratic improvement in search complexity, that makes the method scalable when large search spaces are involved.

For handling partial occlusions, we develop a hashing scheme based on consensus set maximization within the range search component. The resulting scheme can be seen as a randomized hypothesize-and-test algorithm, that comes with guarantees regarding the number of iterations required for obtaining an optimal solution with high probability.

The predicted matching rates are validated empirically and the proposed algorithm shows a significant improvement over the state-of-the-art in both speed and robustness to occlusions.

Joint work with Stefano Soatto.

While there has been a lot of progress in designing efficient custom protocols for computing Private Set Intersection (PSI), there has been less research on using generic MPC protocols for this task. However, there are many variants of the set intersection functionality which seem hard to compute with existing custom protocols and are easy to compute with generic MPC based solutions (for example comparing the cardinality of the intersection with a threshold or measuring ad conversion rates). Generic protocols work over circuits which compute the intersection. For sets of size n the best known circuit constructions compute O(n log n) comparisons. In this work we propose new circuit-based protocols for computing variants of the intersection, with circuits computing only O(n) comparisons. Our constructions are based on a new variant of Cuckoo hashing in two dimensions. We employ several optimizations and determine experimentally the required sizes of tables and circuits, and measure the runtime, showing that our protocol is more efficient in concrete terms than existing constructions. The proof technique is new and can be generalized to analyzing simple Cuckoo hashing as well as new variants.

We study the ecological use of analogies in AI. Specifically, we address the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given representation function f, which accepts inputs in either domains, would remain unchanged. Other than f, the training data is unsupervised and consist of a set of samples from each domain, without any mapping between them. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f preserving component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

Joint work with Yaniv Taigman and Adam Polyak

Joint work with Srikanth Srinivasan.

A determinantal point process governed by a locally trace class Hermitian contraction kernel on a measure space $E$ remains determinantal when conditioned on its configuration on an arbitrary measurable subset $B \subset E$. Moreover, the conditional kernel can be chosen canonically in a way that is "local" in a non-commutative sense, i.e. invariant under "restriction" to closed subspaces $L^2(B) \subset P \subset L^2(E)$.

Using the properties of the canonical conditional kernel we establish a conjecture of Lyons and Peres: if $K$ is a projection then almost surely all functions in its image can be recovered by sampling at the points of the process.

I will discuss the asymptotic behaviour (both on and off the diagonal) of the spectral function of a Schroedinger operator with smooth bounded potential when energy becomes large. I formulate the conjecture that the local density of states (i.e. the spectral function on the diagonal) admits the complete asymptotic expansion and discuss the known results, mostly for almost-periodic potentials.

Let X be a regular scheme, projective and flat over Spec Z. We give a conjectural formula in terms of motivic cohomology, singular cohomology and de Rham cohomology for the special value of the zeta-function of X at any rational integer. We will explain how this reduces to the standard formula for the residue of the Dedekind zeta-function at s = 1.

A classical theorem by Anosov states that the slow motion of a slow-fast system where the fast subsystem is ergodic with respect to a smooth invariant measure can be approximated, in a well-defined sense, by the slow subsystem averaged over the fast variables. We address the question of what happens if the fast system is not ergodic. We discuss a theory which is developing in joint works with V. Gelfreich, T. Pereira, V. Rom-Kedar and K. Shah, and suggest that in the non-ergodic case the behavior of the slow variables is approximated by a random process, and not a single, deterministic averaged system. We also discuss the question of the relevance of ergodicity to the foundations of statistical mechanics.

Image processing algorithms often involve a data fidelity penalty, which encourages the solution to comply with the input data. Existing fidelity measures (including perceptual ones) are very sensitive to slight misalignments in the locations and shapes of objects. This is in sharp contrast to the human visual system, which is typically indifferent to such variations. In this work, we propose a new error measure, which is insensitive to small smooth deformations and is very simple to incorporate into existing algorithms. We demonstrate our approach in lossy image compression. As we show, optimal encoding under our criterion boils down to determining how to best deform the input image so as to make it "more compressible". Surprisingly, it turns out that very minor deformations (almost imperceptible in some cases) suffice to make a huge visual difference in methods like JPEG and JPEG2000. Thus, by slightly sacrificing geometric integrity, we gain a significant improvement in preservation of visual information.

We also show how our approach can be used to visualize image priors. This is done by determining how images should be deformed so as to best conform to any given image model. By doing so, we highlight the elementary geometric structures to which the prior resonates. Using this method, we reveal interesting behaviors of popular priors, which were not noticed in the past.

Finally, we illustrate how deforming images to possess desired properties can be used for image "idealization" and for detecting deviations from perfect regularity.

Joint work with Tamar Rott Shaham, Tali Dekel, Michal Irani, and Bill Freeman.

Given a quadratic form Q in d variables over the integers, e.g. Q(x,y,z) = xy - z^2, and a set of positive density E in Z^d, we investigate what kind of structure can be found in the set Q(E-E).

We will see that if d >= 3, and Q is indefinite, then the measure rigidity, due to Bourgain-Furman-Lindenstrauss-Mozes or Benoist-Quint, of the action of the group of the symmetries of Q implies that there exists k >=1 such that k^2*Q(Z^d) is a subset of Q(E-E).

We will give an alternative proof of the theorem for the case Q(x,y,z) = xy - z^2 that uses more classical equidistribution results of Vinogradov, and Weyl, as well as a more recent result by Frantzikinakis-Kra. The latter proof extends the theorem to other polynomials having a much smaller group of symmetries. Based on joint works with M. Bjorklund (Chalmers), and K. Bulinski (Sydney).

The algorithm referred to in the title builds on Luks's powerful group-theoretic divide-and-conquer method (1980) and addresses the bottleneck situation where Luks's method fails to "divide".

Luks will continue to "conquer" if an alternative method "divides"; we develop such a partitioning technique.

In the talk we shall outline the algorithm and explain in some detail its group theoretic core, the "Unaffected Stabilizers Lemma" and the "Local Certificates" routine. The Lemma is used to construct, somewhat implausibly, global automorphisms out of local information -- a key step toward the construction of combinatorial structures to which the partitioning method from the previous day's lecture will be applied, providing the required "divide" step.

Symmetry is defined in terms of structure-preserving transformations (automorphisms); regularity in terms of numerical invariants. Symmetry always implies regularity but there are many highly regular combinatorial objects (such as "strongly regular graphs") with no symmetry. The opposite of irregularity is regularity, not symmetry. Yet we show that in a well-defined sense, *the opposite of hidden irregularity is hidden symmetry*, and in fact hidden symmetry of a particularly robust kind.

The symmetry of a circle is easily destroyed: just "individualize" two non-opposite points -- color one of them red, the other blue -- and all the symmetry is gone. In fact, the resulting structure is completely irregular: every point is uniquely identified by a pair of numerical invariants, namely, its pair of distances to the two individualized points. We shall say that the circle has a high degree of hidden irregularity.

In contrast, *Johnson graphs* are objects with robust symmetry: individualizing a small number of vertices of a Johnson graph hardly makes a dent in its symmetry.

Recent work on the algorithmic problem of Graph Isomorphism has revealed that Johnson graphs are unique in this regard: Every finite relational structure of small arity either has a measurable (say 10%) hidden irregularity (revealed by individualizing a polylogarithmic number of elements) or has a large degree of hidden symmetry, manifested in a canonically embedded Johnson graph on more than 90% of the underlying set.

This dichotomy is the key Divide-and-Conquer tool in recent progress on the worst-case complexity of the Graph Isomorphism problem.

This subject is purely combinatorial and does not require advanced mathematical apparatus. The group theoretic aspects of the new Graph Isomorphism test will be discussed in a follow-up seminar on January 30.

Within the wide field of sparse approximation, convolutional sparse coding (CSC) has gained increasing attention in recent years. This model assumes a structured-dictionary built as a union of banded Circulant matrices. Most attention has been devoted to the practical side of CSC, proposing efficient algorithms for the pursuit problem, and identifying applications that benefit from this model. Interestingly, a systematic theoretical understanding of CSC seems to have been left aside, with the assumption that the existing classical results are sufficient.

In this talk we start by presenting a novel analysis of the CSC model and its associated pursuit. Our study is based on the observation that while being global, this model can be characterized and analyzed locally. We show that uniqueness of the representation, its stability with respect to noise, and successful greedy or convex recovery are all guaranteed assuming that the underlying representation is locally sparse. These new results are much stronger and informative, compared to those obtained by deploying the classical sparse theory.

Armed with these new insights, we proceed by proposing a multi-layer extension of this model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers. This, in turn, is shown to be tightly connected to Convolutional Neural Networks (CNN), so much so that the forward-pass of the CNN is in fact the Thresholding pursuit serving the ML-CSC model. This connection brings a fresh view to CNN, as we are able to attribute to this architecture theoretical claims such as uniqueness of the representations throughout the network, and their stable estimation, all guaranteed under simple local sparsity conditions. Lastly, identifying the weaknesses in the above scheme, we propose an alternative to the forward-pass algorithm, which is both tightly connected to deconvolutional and recurrent neural networks, and has better theoretical guarantees.

Many problems about finite groups (e.g., convergence of random walks, properties of word maps, spectrum of Cayley graphs, etc.) can be approached in terms of sums of group characters. More precisely, what intervenes in such sums are the character ratios:

X_r(g) / dim(r), g in G,

where r is an irreducible representation of G, and X_r is its character. This leads to the quest for good estimates on the character ratios.

In this talk I will introduce a precise notion of "**size**" for representations of finite classical groups and show that it tends to put together those with character ratios of the same order of magnitude.

As an application I will show how one might generalize to classical groups the following result of Diaconis-Shahshahani (for k=2) and Berestycki -Schramm -Zeitouni (for general k): The mixing time for the random walk on the group G=S_n using the cycles of length k is (1/k) n log(n).

The talk should be accessible for beginning graduate students, and is part from our joint project with **Roger Howe (Yale and Texas A&M).**

Detecting genetic adaptation in recent human history is a major challenge of population genetics. The fundamental problem is to infer historical changes in the frequency of genetic variants (alleles), from data of contemporary genomes. With this we can identify unusual changes that are unlikely to have occurred in the absence of selective pressures. However, a generally applicable method to infer recent allele frequency changes is lacking. Instead, present methods can only detect frequency changes under very restrictive assumptions on the model of selection. Moreover, their time resolution is generally limited to prehistoric scales, on the order of the past 25-75 thousand years. To address these gaps we developed a novel statistical method, Singleton Density Score (SDS), that infers the recent changes in allele frequencies from local variation in contemporary genome sequences with specificity to historical timescales. Applied to data of ~3000 genomes from the UK10K project, SDS reveals that human genetic adaptation continued well into historical times. Over the past ~2000-3000 years, ancestors of modern Britons genetically evolved over a range of phenotypes related to diet, immunity, and physical appearance. Notably, we found that polygenic adaptation, whereby selection acting on many small-effect variants across the genome that together determine a single trait, has played a pervasive, previously undetected role in shaping present genotypic and phenotypic variation.

Reference:

Field et al, Science 2016, Detection of human adaptation during the past 2000 years. https://www.ncbi.nlm.nih.gov/pubmed/27738015

I will describe two branches of my work related to algorithms for distributed networks. The main focus will be devoted for Fault-Tolerant (FT) Network Structures.

The undisrupted operation of structures and services is a crucial requirement in modern day communication networks. As the vertices and edges of the network may occasionally fail or malfunction, it is desirable to make those structures robust against failures.

FT Network Structures are low cost highly resilient structures, constructed on top of a given network, that satisfy certain desirable performance requirements concerning, e.g., connectivity, distance or capacity. We will overview some results on fault tolerant graph structures with a special focus on FT Breadth-First-Search.

The second part of the talk will discuss distributed models and algorithms for large-scale networks. Towards the end, we will see some connections between distributed computing and other areas such as EE and Biology.

Let M be a smooth, compact, connected two-dimensional, Riemannian manifold without boundary, and let C_epsilon be the amount of time needed for the Brownian motion to come within (Riemannian) distance epsilon of all points in M. The first order asymptotics of C_epsilon as epsilon goes to 0 are known. We show that for the two dimensional sphere

\sqrt{C_epsilon}-2\sqrt{2}\( \log \epsilon^{-1}- \frac{1}{4}\log\log \epsilon^{-1}\) is tight.

Joint work with David Belius and Ofer Zeitouni.

**First Speaker: Ran Tessler (ETH)**

Time: 11:00

Title: A sharp threshold for Hamiltonian spheres in a random 2-complex.

Abstract: We define the notion of Hamiltonian sphere - a 2-complex homeomorphic to a sphere which uses all vertices. We prove an explicit sharp threshold for the appearance of Hamiltonian spheres in the Linial-Meshulam model for random 2-complexes. The proof combines combinatorial, probabilistic and geometric arguments. Based on a joint work with Zur Luria.

**Second Speaker: Assaf Naor (Princeton)**

Time: 12:00

Title: A new vertical-versus-horizontal isoperimetric inequality on the Heisenberg group, with applications to metric geometry and approximation algorithms

Abstract: In this talk we will show that for every measurable subset of the Heisenberg group of dimension at least 5, an appropriately defined notion of its "vertical perimeter" is at most a constant multiple of its horizontal (Heisenberg) perimeter. We will explain how this new isoperimetric-type inequality solves open questions in analysis (an endpoint estimate for a certain singular integral on W^{1,1}), metric geometry (sharp nonembeddability into L_1) and approximation algorithms (asymptotic evaluation of the performance of the Goemans-Linial algorithm for the Sparsest Cut problem). Joint work with Robert Young.

Let X be a set definable in some o-minimal structure. The Pila-Wilkie theorem (in its basic form) states that the number of rational points in the transcendental part of X grows sub-polynomially with the height of the points. The Wilkie conjecture stipulates that for sets definable in $R_\exp$, one can sharpen this asymptotic to polylogarithmic.

I will describe a complex-analytic approach to the proof of the Pila-Wilkie theorem for subanalytic sets. I will then discuss how this approach leads to a proof of the "restricted Wilkie conjecture", where we replace $R_\exp$ by the structure generated by the restrictions of $\exp$ and $\sin$ to the unit interval (both parts are joint work with Dmitry Novikov). If time permits I will discuss possible generalizations and applications.

Pseudo-deterministic algorithms are randomized search algorithms that on different executions on the same input, output the same solution with high probability.

We will discuss how pseudo-deterministic algorithms bridge the gap between randomized search and decision problems for problems in P and in NC. Next, we will show a pseudo-deterministic NC algorithm for bipartite matching. Finally, we will show how pseudo-determinism can be used to save on random bits used by classical randomized algorithms, and apply the method to obtain an algorithm for RNC depth first search using only O(log^2 n) random bits. This is joint work with Shafi Goldwasser.

Consider the Gaussian Entire Function (GEF) whose Taylor coefficients are independent complex-valued Gaussian variables, and the variance of the k-th coefficient is 1/k!. This random Taylor series is distinguished by the invariance of its zero set with respect to the isometries of the complex plane.

I will show that the law of the zero set, conditioned on the GEF having no zeros in a disk of radius r, and properly normalized, converges to an explicit limiting Radon measure in the plane, as r goes to infinity. A remarkable feature of this limiting measure is the existence of a large 'forbidden region' between a singular part supported on the boundary of the (scaled) hole and the equilibrium measure far from the hole. This answers a question posed by Nazarov and Sodin, and is in stark contrast to the corresponding result known to hold in the random matrix setting, where such a gap does not appear.

The talk is based on a joint work with S. Ghosh.

The Operator Scaling problem asks whether a set of complex matrices can be jointly moved to a certain canonical (isotropic) position. This problem has a remarkable number of myriad incarnations: non-commutative algebra, invariant theory, arithmetic complexity, quantum information theory, analytic inequalities and more. We will describe an efficient algorithm solving all these related problems, and explain how their analysis combines ideas from all these areas.

Through these connections, the algorithm can be shown to solve some non-convex optimization problems, some systems of quadratic equations, and some linear programs with exponentially many inequalities - all these, and concrete examples we will give, suggest that it might be a powerful algorithmic tool via reductions to these problems.

No special background will be assumed!

Joint on two joint works with Ankit Garg, Leonid Gurvits and Rafael Olivera.

This talk is longer than usual and has a two-hour slot.

If f, g are two polynomials in C[x,y] such that J(f,g)=1, but C[f,g] does not coincide with C[x,y], then the mapping given by these polynomials ( (x,y) maps to (f(x,y), g(x,y)) ) has a rather unexpected property which will be discussed in the talk.

The proliferation of data collection in the health, commercial, and economic spheres, brings with it opportunities for extracting new knowledge with concrete policy implications. Examples include individualizing medical practices based on electronic healthcare records, and understanding the implications of job training programs on employment and income.

The scientific challenge lies in the fact that standard prediction models such as supervised machine learning are often not enough for decision making from this so-called "observational data": Supervised learning does not take into account causality, nor does it account for the feedback loops that arise when predictions are turned into actions. On the other hand, existing causal-inference methods are not adapted to dealing with the rich and complex data now available, and often focus on populations, as opposed to individual-level effects.

The problem is most closely related to reinforcement learning and bandit problems in machine learning, but with the important property of having no control over experiments and no direct access to the actor's model.

In my talk I will discuss how we apply recent ideas from machine learning to individual-level causal-inference and action. I will introduce a novel generalization bound for estimating individual-level treatment effect, and further show how we use representation learning and deep temporal generative models to create new algorithms geared towards this problem. Finally, I will show experimental results using data from electronic medical records and data from a job training program.

The sensitivity conjecture is a famous open problem in the theory of boolean functions. Let f be a boolean function defined on the hypercube. The sensitivity of a node x is the number of its neighbours in the hypercube, for which f give the opposite value as that it does on x. The sensitivity conjecture speculates that if all nodes have low sensitivity, then the function f must be simple. Concretely, all its Fourier mass is supported on levels with low hamming weight.

Recently, Gopalan et al [CCC 2016] conjectured a robust analogue of the sensitivity conjecture: if most of the nodes have low sensitivity, then most of the Fourier mass is supported on levels with low hamming weight. They also prove it under the stronger assumption that all nodes have low sensitivity. In this work, we prove this conjecture, with near tight quantitative bounds.

Joint work with Avishay Tal (IAS) and Jiapeng Zhang (UCSD).

Given a dynamical system, a partition of the space induces a mapping to the space of sequences of the partition elements (a point is mapped to the partition elements containing its orbit terms). Such a duality is called Symbolic Dynamics, Markov partitions are an important tool, as the symbolic dynamics they induce enfold many of the important dynamical properties of the original system, and they allow an easier studying of them.

We show that general non uniformly hyperbolic C^{1+epsilon} diffeomorphism on compact manifolds of any dimension admit countable Markov partitions. Previously this was only known in dimension 2.

We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is \emph{never fully revealed} to the learner. We show a large gap between the adversarial and the stochastic cases. In the adversarial case, we prove that even for dense feedback graphs, the learner cannot improve upon a trivial regret bound obtained by ignoring any additional feedback besides her own loss. In contrast, in the stochastic case we give an algorithm that achieves $\widetilde \Theta(\sqrt{\alpha T})$ regret over $T$ rounds, provided that the independence numbers of the hidden feedback graphs are at most $\alpha$. completely unlearnable. We also extend our results to a more general feedback model, in which the learner does not necessarily observe her own loss, and show that, even in simple cases, concealing the feedback graphs might render the problem unlearnable.

Cauchy-Riemann maps (shortly: CR-maps) occur in complex analysis as boundary values of maps holomorphic in a domain in complex space. As a rule, CR-mappings of real-analytic hypersurfaces appear to be analytic as well. However, we recently showed in a joint work with Rasul Shafikov the existence of Stokes Phenomenon in CR-geometry: there exist real-analytic hypersurfaces, which are equivalent formally, but not holomorphically.

Despite of this, it appears that in complex dimension 2, CR-maps necessarily posses appropriate weaker regularity properties. Namely, components of such maps necessarily belong to the well known Gevrey classes. The latter statement has the following remarkable application: if two real-analytic hypersurfaces in complex two-space are equivalent formally, then they are also equivalent smoothly.

The proof of all these facts employs the recent multi-summability theory in Dynamical Systems. It as well employs the recent CR-DS technique that we developed, which connects CR-manifolds and certain Dynamical Systems. In this talk, I will outline the technique, as well as some details of the proof.

Liouville's rigidity theorem (1850) states that a map $f:\Omega\subset R^d \to R^d$ that satisfies $Df \in SO(d)$ is an affine map. Reshetnyak (1967) generalized this result and showed that if a sequence $f_n$ satisfies $Df_n \to SO(d)$ in $L^p$, then $f_n$ converges to an affine map.

In this talk I will discuss generalizations of these theorems to mappings between manifolds, present some open questions, and describe how these rigidity questions arise in the theory of elasticity of pre-stressed materials (non-Euclidean elasticity).

If time permits, I will sketch the main ideas of the proof, using Young measures and harmonic analysis techniques, adapted to Riemannian settings.

Based on a joint work with Asaf Shachar and Raz Kupferman.

We show an efficient method for converting a logic circuit of gates with fan-out 1 into an equivalent circuit that works even if some fraction of its gates are short-circuited, i.e., their output is short-circuited to one of their inputs. Our conversion can be applied to any circuit with fan-in k>= 2, yielding a resilient circuit whose size is polynomial in the size of the (non-resilient) input circuit.

The resilient circuit gives the correct output as long as less than 1/3 of the gates in any of its input-to-output paths are corrupted. Furthermore, we prove that a resilience level of 1/3 is optimal (maximal) for this type of faulty gates. This fully answers an open question by Kalai et al. (FOCS 2012).

Joint work with Mark Braverman and Michael Yitayew.

Everyone wants to program with "high-level concepts", rather than meddle with the fine details of the implementation, such as pointers, network packets, and asynchronous callbacks. This is usually achieved by introducing layers of abstraction - but every layer incurs some overhead, and when they accumulate, this overhand becomes significant and sometimes prohibitive. Optimizing the program often requires to break abstractions, which leads to suboptimal design, higher maintenance costs, and subtle hard-to-trace bugs.

I will present two recent projects that attempt to address this difficulty. STNG is an automated lifting compiler that can synthesize high-level graphics code for computing stencils over matrices, from low-level legacy code written in Fortran. Its output is expressed in Halide, a domain-specific language for image processing that can take advantage of modern GPUs. The result is therefore code that is both easier to understand and more efficient than the original code.

Bellmania is a specialized tool for accelerating dynamic-programming algorithms by generating recursive divide-and-conquer implementations of them. Recursive divide-and-conquer is an algorithmic technique that was developed to obtain better memory locality properties. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and-conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based back-end verifies each step and takes care of low-level reasoning required for parallelism.

The lesson is that synthesis techniques are becoming mature enough to play a role in the design and implementation of realistic software systems, by combining the elegance of abstractions with the performance gained by optimizing and tuning the fine details.

We discuss recent progress on hardness of 2-to-2 games, with applications to the inapproximability of the Vertex-Cover problem.

A 2-to-2 game (which is a variant of Khot's well known unique games), is defined by a graph where there is a variable in each node, and a constraint of a specific structure defined on each edge. While in unique games each edge- constraint must be a one-to-one correspondence -- i.e. for each assignment to one node there is exactly one assignent to the other node that satisfies the constraint -- in 2-to-2 games the correspondence has a "two-to-two" structure.

The goal is to distinguish between instances in which almost all of the edge- constraints can be satisfied, and instances in which almost none of them can be satisfied simultaneously.

We present a new combinatorial hypothesis regarding Grassmann graphs, and show that it implies that 2-to-2 games are NP-hard *in a certain sense*. As a consequence, the hypothesis implies that it is NP-hard to distinguish between graphs that have an independent set of fractional size (1- 1/sqrt{2}), and graphs with no independent sets of any constant fractional size. This easily implies that it is NP-hard to approximate the Vertex Cover problem within a factor \sqrt{2} - o(1).

The talk is mostly based on a joint work with Subhash Khot and Muli Safra, nevertheless, we will also mention results from a more recent extension, which is a joint work with Irit Dinur, Subhash Khot, Guy Kindler and Muli Safra.

Selfadjoint extensions of a closed symmetric operator A in a Hilbert space with equal deficiency indices were described by in the 30's by J. von Neumann. Another approach, based on the notion of abstract boundary triple originates in the work of J.W. Calkin and was developed by M. I. Visik, G. Grubb, F. S. Rofe-Beketov, M. L. Gorbachuck, A .N. Kochubei and others.

By Calkin's approach, all selfadjoint extensions of the symmetric operator A can be parametrized via "multivalued" selfadjoint operators in an auxiliary Hilbert space. Spectral properties of these extensions can be characterized in terms of the abstract Weyl function, associated to the boundary triple. In the present talk some recent developments in the theory of boundary triples will be presented. Applications to boundary value problems for Laplacian operators in bounded domains with smooth and rough boundaries will be discussed.

We characterize the communication complexity of truthful mechanisms. Our departure point is the well known taxation principle. The taxation principle asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu that consists of a price for each bundle (the prices depend only on the valuations of the other players). Each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player.

Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in "pathological" cases and provide a formal description of these extreme cases.

Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.

In the classical Node-Disjoint Paths (NDP) problem, the input consists of an undirected n-vertex graph G, and a collection M of pairs of its vertices, called source-destination, or demand, pairs. The goal is to route the largest possible number of the demand pairs via node-disjoint paths. The best current approximation for the problem is achieved by a simple greedy algorithm, whose approximation factor is O(\sqrt n), while the best current negative result is a roughly \Omega(log^{1/2}n)-hardness of approximation. Even seemingly simple special cases of the problem are still poorly understood: when the input graph is a grid, the best current algorithm achieves a \tilde{O}(n^{1/4})- approximation, and when it is a general planar graph, the best current approximation ratio of an efficient algorithm is \tilde{O}(n^{9/19}). The best currently known lower bound for both these versions of the problem is APX- hardness.

In this talk we will show that NDP is 2^{\Omega(\log n)}-hard to approximate, unless all problems in NP have algorithms with running time n^{O(\log n)}. Our result holds even when the underlying graph is a planar graph with maximum vertex degree 3, and all source vertices lie on the boundary of a single face. We extend this result to the closely related Edge-Disjoint Paths problem, showing the same hardness of approximation ratio even for sub-cubic planar graphs with all sources lying on the boundary of a single face.

This is joint work with David H.K. Kim and Rachit Nimavat.

One of the most common tasks in cryptography and cryptanalysis is to find some interesting event (a needle) in an exponentially large collection (haystack) of N=2^n possible events, or to demonstrate that no such event is likely to exist. In particular, we are interested in finding needles which are defined as events that happen with an unusually high probability of p>>1/N in a haystack which is an almost uniform distribution on N possible events. When the search algorithm can only sample values from this distribution, the best known time/memory tradeoff for finding such an event requires O(1/Mp^2) time given O(M) memory.

In this talk I will describe much faster needle searching algorithms in the common cryptographic setting in which the distribution is defined by applying some deterministic function f to random inputs. Such a distribution can be modeled by a random directed graph with N vertices in which almost all the vertices have O(1) predecessors while the vertex we are looking for has an unusually large number of O(pN) predecessors. When we are given only a constant amount of memory, we propose a new search methodology which we call **NestedRho**. As p increases, such random graphs undergo several subtle phase transitions, and thus the log-log dependence of the time complexity T on p becomes a piecewise linear curve which bends four times. Our new algorithm is faster than the O(1/p^2) time complexity of the best previous algorithm in the full range of 1/N < p < 1 , and in particular it improves the previous time complexity by a significant factor of \sqrt{N} for any p in the range N^(- 0.75) < p < N^(-0.5). When we are given more memory, we show how to combine the **NestedRho** technique with the parallel collision search technique in order to further reduce its time complexity. Finally, we show how to apply our new search technique to more complicated distributions with multiple peaks when we want to find all the peaks whose probabilities are higher than p.

Joint work with Itai Dinur, Orr Dunkelman and Nathan Keller.

Projected gradient descent (PGD), and its close variants, are often considered the methods of choice for solving a large variety of machine learning optimization problems, including empirical risk minimization, statistical learning, and online convex optimization. This is not surprising, since PGD is often optimal in a very appealing information-theoretic sense. However, for many problems PGD is infeasible both in theory and practice since each step requires to compute an orthogonal projection onto the feasible set. In many important cases, such as when the feasible set is a non-trivial polytope, or a convex surrogate for a low-rank structure, computing the projection is computationally inefficient in high-dimensional settings. An alternative is the conditional gradient method (CG), aka Frank-Wolfe algorithm, that replaces the expensive projection step with a linear optimization step over the feasible set. Indeed in many problems of interest, the linear optimization step admits much more efficient algorithms than the projection step, which is the reason to the substantial regained interest in this method in the past decade. On the downside, the convergence rates of the CG method often fall behind that of PGD and its variants.

In this talk I will survey an ongoing effort to design CG variants that on one hand enjoy the cheap iteration complexity of the original method, and on the other hand converge provably faster, and are applicable to a wider variety of machine learning settings. In particular I will focus on the cases in which the feasible set is either a polytope or a convex surrogate for low-rank matrices. Results will be demonstrated on applications including: LASSO, video co-localization, optical character recognition, matrix completion, and multi-class classification.

Suppose that you have n truly random bits X=(X1,...,Xn) and you wish to use them to generate m>>n pseudorandom bits Y=(Y1,..., Ym) using a local mapping, i.e., each Yi should depend on at most d=O(1) bits of x. In the polynomial regime of m=n^s, s>1, the only known solution, originates from (Goldreich, ECCC 2000), is based on Random Local Functions: Compute Yi by applying some fixed (public) d-ary predicate P to a random (public) tuple of distinct inputs. In this talk, we will try to understand, for any value of s, how the pseudorandomness of the resulting sequence depends on the choice of the underlying predicate.

Based on joint work with Shachar Lovett.

One of the most exciting possibilities opened by deep neural networks is end-to-end learning: the ability to learn tasks without the need for feature engineering or breaking down into sub-tasks. This talk will present three cases illustrating how end-to-end learning can operate in machine perception across the senses (Hearing, Vision) and even for the entire perception-cognition-action cycle.

The talk begins with speech recognition, showing how acoustic models can be learned end-to-end. This approach skips the feature extraction pipeline, carefully designed for speech recognition over decades.

Proceeding to vision, a novel application is described: identification of photographers of wearable video cameras. Such video was previously considered anonymous as it does not show the photographer.

The talk concludes by presenting a new task, encompassing the full perception-cognition-action cycle: visual learning of arithmetic operations using only pictures of numbers. This is done without using or learning the notions of numbers, digits, and operators.

The talk is based on the following papers:

Speech Acoustic Modeling From Raw Multichannel Waveforms, Y. Hoshen, R.J. Weiss, and K.W. Wilson, ICASSP'15

An Egocentric Look at Video Photographer Identity, Y. Hoshen, S. Peleg, CVPR'16

Visual Learning of Arithmetic Operations, Y. Hoshen, S. Peleg, AAAI'16

If $X$ is a finite set, the $p$-biased measure on the power-set of $X$ is defined as follows: choose a subset $S$ of $X$ at random by including each element of $X$ independently with probability $p$. If $\mathcal{F}$ is a family of subsets of $X$, one can consider the {\em $p$-biased measure} of $\mathcal{F}$, denoted by $\mu_p(\mathcal{F})$, as a function of $p$; if $\mathcal{F}$ is closed under taking supersets, then this function is an increasing function of $p$. Seminal results of Friedgut and Friedgut-Kalai give criteria for this function to have a 'sharp threshold'. A careful analysis of the behaviour of this function also yields some rather strong results in extremal combinatorics which do not explicitly mention the $p$-biased measure - in particular, in the field of {\em Erd\H{o}s-Ko-Rado type problems}, which concern the sizes of families of objects in which any two (or three...) of the objects 'agree' or 'intersect' in some way. We will discuss some of these, including a recent proof of an old conjecture of Frankl that a symmetric three-wise intersecting family of subsets of $\{1,2,\ldots,n\}$ has size $o(2^n)$, and some 'stability' results characterizing the structure of 'large' $t$-intersecting families of $k$-element subsets of $\{1,2,\ldots,n\}$. Based on joint work with (subsets of) Nathan Keller, Noam Lifshitz and Bhargav Narayanan.

Interactive proofs have had a dramatic impact on Complexity Theory and Cryptography. The celebrated IP=PSPACE Theorem [LFKN92,Shamir92] allows an all-powerful but untrusted prover to convince a polynomial-time verifier of the validity of extremely complicated statements (as long as they can be evaluated using polynomial space). The interactive proof system designed for this purpose requires a polynomial number of communication rounds. This talk will focus on studying the power of more efficient interactive proof systems.

Our main result is that for every statement that can be evaluated in polynomial time and bounded-polynomial space, there exists an interactive proof that satisfies the following strict efficiency requirements:

(1) The honest prover runs in polynomial time.

(2) The verifier is almost linear time (and under some conditions even sub linear).

(3) The interaction consists of only a constant number of communication rounds.

To obtain this result, we introduce and study several new notions for interactive proofs, which may be of independent interest. One of these notions is that of unambiguous interactive proofs, where the prover has a unique successful strategy. Another notion is that of probabilistically checkable interactive proofs (PCIPs) where the verifier only reads a few bits of the transcript in checking the proof (this could be viewed as an interactive extension of PCPs).

Joint work with Omer Reingold and Ron Rothblum.

Computer science and optics are usually studied separately -- separate people, in separate departments, meet at separate conferences. This is changing. The exciting promise of technologies like virtual reality and self-driving cars demand solutions that draw from the best aspects of computer vision, computer graphics, and optics. Previously, it has proved difficult to bridge these communities. For instance, the laboratory setups in optics are often designed to image millimeter-size scenes in a vibration-free darkroom.

This talk is centered around time of flight imaging, a growing area of research in computational photography. A time of flight camera works by emitting amplitude modulated (AM) light and performing correlations on the reflected light. The frequency of AM is in the radio frequency range (like a Doppler radar system), but the carrier signal is optical, overcoming diffraction limited challenges of full RF systems while providing optical contrast. The obvious use of such cameras is to acquire 3D geometry. By spatially, temporally and spectrally coding light transport we show that it may be possible to go "beyond depth", demonstrating new forms of imaging like photography through scattering media, fast relighting of photographs, real-time tracking of occluded objects in the scene (like an object around a corner), and even the potential to distinguish between biological molecules using fluorescence. We discuss the broader impact of this design paradigm on the future of 3D depth sensors, interferometers, computational photography, medical imaging and many other applications.

Cluster algebras are a class of commutative rings introduced by Fomin and Zelevinsky in 2000. I will give an introductory talk about cluster algebras. The main examples are the cluster algebra of type A2, the coordinate ring of $SL_4/N$, and the homogeneous coordinate ring of the Grassmannian $Gr_{2,n+3}(\mathbb{C})$.

This Thursday we will have two SIGGRAPH rehearsal talks in the Vision Seminar, one by Netalee Efrat and one by Meirav Galun. Abstracts are below. Each talk will be about 15 minutes (with NO interruptions), followed by 10 minutes feedback.

**Talk1 (Netalee Efrat): Cinema 3D: Large scale automultiscopic display **

While 3D movies are gaining popularity, viewers in a 3D cinema still need to wear cumbersome glasses in order to enjoy them. Automultiscopic displays provide a better alternative to the display of 3D content, as they present multiple angular images of the same scene without the need for special eyewear. However, automultiscopic displays cannot be directly implemented in a wide cinema setting due to variants of two main problems: (i) The range of angles at which the screen is observed in a large cinema is usually very wide, and there is an unavoidable tradeoff between the range of angular images supported by the display and its spatial or angular resolutions. (ii) Parallax is usually observed only when a viewer is positioned at a limited range of distances from the screen. This work proposes a new display concept, which supports automultiscopic content in a wide cinema setting. It builds on the typical structure of cinemas, such as the fixed seat positions and the fact that different rows are located on a slope at different heights. Rather than attempting to display many angular images spanning the full range of viewing angles in a wide cinema, our design only displays the narrow angular range observed within the limited width of a single seat. The same narrow range content is then replicated to all rows and seats in the cinema. To achieve this, it uses an optical construction based on two sets of parallax barriers, or lenslets, placed in front of a standard screen. This paper derives the geometry of such a display, analyzes its limitations, and demonstrates a proof-of-concept prototype.

*Joint work with Piotr Didyk, Mike Foshey, Wojciech Matusik, Anat Levin

**Talk 2 (Meirav Galun): Accelerated Quadratic Proxy for Geometric Optimization **

We present the Accelerated Quadratic Proxy (AQP) - a simple first order algorithm for the optimization of geometric energies defined over triangular and tetrahedral meshes. The main pitfall encountered in the optimization of geometric energies is slow convergence. We observe that this slowness is in large part due to a Laplacian-like term existing in these energies. Consequently, we suggest to exploit the underlined structure of the energy and to locally use a quadratic polynomial proxy, whose Hessian is taken to be the Laplacian. This improves stability and convergence, but more importantly allows incorporating acceleration in an almost universal way, that is independent of mesh size and of the specific energy considered. Experiments with AQP show it is rather insensitive to mesh resolution and requires a nearly constant number of iterations to converge; this is in strong contrast to other popular optimization techniques used today such as Accelerated Gradient Descent and Quasi-Newton methods, e.g., L-BFGS. We have tested AQP for mesh deformation in 2D and 3D as well as for surface parameterization, and found it to provide a considerable speedup over common baseline techniques.

*Joint work with Shahar Kovalsky and Yaron Lipman

The Jacquet-Rallis relative trace formula was introduced as a tool towards solving the global conjectures of Gan-Gross-Prasad for unitary groups. I will present some recent progress in developing the full formula.

I will show how to extend the transfer of regular orbital integrals to singular geometric terms using a mix of local and global methods.

(Joint with Pierre-Henri Chaudouard)

Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message:

1. Fast Δ-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ≥55 is Θ(log_Δ log n), whereas its deterministic complexity is Θ(log_Δ n) for any Δ≥3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees.

2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in O(1)+o(log_Δ n) rounds can be transformed to run in O(log*n −log*Δ+1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log_Δ n) time deterministically.

3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √ log n. This shows that a deterministic Ω(log_Δ n) lower bound for any problem implies a randomized Ω(log_Δ log n) lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.

Joint work with Tsvi Kopelowitz and Seth Pettie. http://arxiv.org/abs/1602.08166

There are numerous essentially equivalent characterizations of mixing in $L_1$ of a finite Markov chain. Some of these characterizations involve natural probabilistic concepts such as couplings, stopping times and hitting times. In contrast, while there are several analytic and geometric tools for bounding the $L_2$ mixing time, none of them are tight and they do not have a probabilistic interpretation.

We provide tight probabilistic characterizations in terms of hitting times distributions for mixing in $L_2$ (for normal chains) and (under reversibility) in relative entropy. This is done by assigning appropriate penalty (depending on the size of the set) to the case that the chain did not escape from a certain set.

We also prove a new extremal characterization of the log-sobolev constant in terms of a weighted version of the spectral gap (where the weight depends on the size of the support of the function).

We will discuss representation theory of a symmetric pair (G,H), where G is a complex reductive group, and H is a real form of G. The main objects of study are the G-representations with a non trivial H-invariant functional, called the H-distinguished representations of G.

I will give a necessary condition for a G-representation to be H-distinguished and show that the multiplicity of such representations is less or equal to the number of double cosets B\G/H, where B is a Borel subgroup of G.

Quantum computers are hypothetical devices, based on quantum physics, which would enable us to perform certain computations (among them some that Chaim Leib Pekeris pioneered) hundreds of orders of magnitude faster than digital computers. This feature is coined “quantum supremacy.” We start the lecture with a gentle introduction to computing - classical and quantum, with basic notions of computational complexity, and with the vision of quantum computers.

A main reason for concern regarding the feasibility of quantum computers is that quantum systems are inherently noisy. We will explain what is "noise" and describe an optimistic hypothesis regarding quantum noise that will allow quantum computing and a pessimistic hypothesis that won’t. The remarkable progress witnessed during the past two decades in the field of experimental physics of controlled quantum systems places the decision between the pessimistic and optimistic hypotheses within reach. On the optimistic side, one aspect or another of quantum supremacy might be seen by experiments in the near future: by implementing quantum error-correction or by systems of free bosons or by exotic new phases of matter called anyons or by quantum annealing, or in various other ways.

In the lecture I will explain my pessimistic line of research and here is a brief summary of my view: understanding quantum computers in the presence of noise requires consideration of behavior at different scales. In the small scale, standard models of noise from the mid-90s are suitable, and quantum evolutions and states described by them manifest a very low-level computational power. This small-scale behavior has far-reaching consequences for the behavior of noisy quantum systems at larger scales. On the one hand, it does not allow reaching the starting points for quantum fault tolerance and quantum supremacy, making them both impossible at all scales. On the other hand, it leads to novel implicit ways for modeling noise at larger scales and to various predictions on the behavior of noisy quantum systems.

We will rely on the theory of noise-sensitivity and stability developed with Benjamini and Schramm in the late 90s and on recent work with Guy Kindler related to the mysterious gap between permanents and determinants (or, in other words, between bosons and fermions).

Let F/Q_p be a finite extension, supersingular representations are the irreducible mod p representations of GL_n(F) which do not appear as a subquotient of a principal series representation, and similarly to the complex case, they are the building blocks of the representation theory of GL_n(F). Historically, they were first discovered by L. Barthel and R. Livne some twenty years ago and they are still not understood even for n=2.

For F=Q_p, the supersingular representations of GL_2(F) have been classified by C. Breuil, and a local mod p Langlands correspondence was established between them and certain mod p Galois representations.

When one tries to generalize this connection and move to a non-trivial extension of Q_p, Breuil's method fails; The supersingular representations in that case have complicated structure and instead of two as in the case F=Q_p we get infinitely many such representations, when there are essentially only finitely many on the Galois side.

In this talk we give an exposition of the subject and explore, using what survives from Breuil's methods, the universal modules whose quotients contain all the supersingular representations in the difficult case where F is a non-trivial extension of Q_p.

In distributed systems, communication between the participants in the computation is usually the most expensive part of the computation. Theoretical models of distributed systems usually reflect this by neglecting the cost of local computation, and charging only for messages sent between the participants; in particular, we usually assume that the computation proceeds in rounds, and in each round, each participant can send only a limited number of bits. We are interested in characterizing the number of rounds required to perform various tasks.

In this talk I will describe two sets of results. The first concerns the complexity of distributed subgraph detection: we have n servers, each representing a node in an undirected graph, and each server receives as input its adjacent edges in the graph. The goal of the computation is to determine whether the global input graph contains some fixed subgraph. I will describe upper and lower bounds for several classes of subgraphs, through a connection to Turan numbers. The general case remains open.

In the second part of the talk I will describe recent work on multi- party number-in-hand communication and information complexity, and show a tight upper and lower bound for set disjointness in the shared blackboard model.

Joint work with Mark Braverman, Andrew Drucker and Fabian Kuhn.

Children may learn about the world by pushing, banging, and manipulating things, watching and listening as materials make their distinctive sounds-- dirt makes a thud; ceramic makes a clink. These sounds reveal physical properties of the objects, as well as the force and motion of the physical interaction.

We've explored a toy version of that learning-through-interaction by recording audio and video while we hit many things with a drumstick. We developed an algorithm the predict sounds from silent videos of the drumstick interactions. The algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We demonstrate that the sounds generated by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that the task of predicting sounds allows our system to learn about material properties in the scene.

Joint work with:

Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson

http://arxiv.org/abs/1512.08512

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In a sense, the strongest guarantee available is the L2 guarantee, which requires finding all items that occur at least eps*||f|| times in the stream, where the i-th coordinate of the vector f is the number of occurrences of i in the stream. The first algorithm to achieve the L2 guarantee was the CountSketch (Charikar, Chen, and Farach-Colton ICALP'02), which, for constant eps, requires O(log n) words of memory and O(log n) update time. It is known to be space-optimal if the stream includes deletions.

In this talk I will discuss recent improvements that allow us to find L2 heavy hitters in O(1) memory and O(1) update time in insertion only streams. The improvements rely on a deeper understanding of the AMS sketch (Alon, Matias, and Szegedy STOC'96) and similar sketches and draw on the theory of Gaussian processes. This talk is based on joint work with Vladimir Braverman, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff in arxiv:1511.00661 and arxiv:1603.00759.

There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for essentially similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which fits training data without error but is nevertheless somewhat smooth. We show that AdaBoost has the same property. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.

For some rectangular Hardy classes of analytic functions,an optimal method of interpolation has been previously found, within the framework of Optimal Recovery. It will be shown that this method of interpolation, based on the Abel-Jacobi elliptic functions, is also optimal, according to corresponding criteria of Nonparametric Regression and Optimal Design.

In a non-asymptotic setting, the maximal mean squared error of the optimal interpolant is evaluated explicitly, for all noise levels away from 0. In these results, a pivotal role is played by an interference effect, in which both the stochastic and deterministic parts of the interpolant exhibit an oscillating behavior, with the two oscillating processes mutually subduing each other.

In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, we propose the methodology of sampling correctors. Such algorithms use structure that the distribution is purported to have, in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms may then be used as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks.Warning: This talk contains more questions than answers...

Joint work with Clement Canonne and Themis Gouleakis.

We present a randomized algorithm that computes a Minimum Spanning Tree (MST) in O(log^* n) rounds, with high probability, in the Congested Clique model of distributed computing. In this model, the input is a graph on n nodes, initially each node knows only its incident edges, and per round each two nodes can exchange O(log n) bits.

Our key technical novelty is an O(log^* n) Graph Connectivity algorithm, the heart of which is a (recursive) forest growth method, based on a combination of two ideas: a sparsity-sensitive sketching aimed at sparse graphs and a random edge sampling aimed at dense graphs.

Our result improves significantly over the $O(\log \log \log n)$ algorithm of Hegeman et al. [PODC 2015] and the $O(\log \log n)$ algorithm of Lotker et al. [SPAA 2003; SICOMP 2005].

Join work with Mohsen Ghaffari.

Estimating the amount of distinct elements in a dataset by examining only a fraction of the data is known to be a hard problem, both theoretically and in practice.

Our work explores a breakthrough theoretical result by Valiant and Valiant from 2011 that presents a provably accurate method for doing such estimations.

Our goal is to put this theory into practice for the important task of estimating the deduplication ratio of a very large dataset. However, deploying this technique in a real world setting runs into significant obstacles.

In the talk I will describe new techniques that help bridging the gap and enable the use of this exciting approach. Our work achieves a major improvement over the current state of the art practical solutions.

The talk is for a general audience, no prior knowledge is assumed.

Based on joint work with Dmitry Sotnikov and Ety Khaitzin that appeared at Usenix FAST 2016.

While co-addition and subtraction of astronomical images stand at the heart of observational astronomy, the existing solutions for them lack rigorous argumentation, are not achieving maximal sensitivity and are often slow. Moreover, there is no widespread agreement on how they should be done, and often different methods are used for different scientific applications. I am going to present rigorous solutions to these problems, deriving them from the most basic statistical principles. These solutions are proved optimal, under well defined and practically acceptable assumptions, and in many cases improve substantially the performance of the most basic operations in astronomy.

For coaddition, we present a coadd image that is:

a) sufficient for any further statistical decision or measurement on the underlying constant sky, making the entire data set redundant.

b) improves both survey speed (by 5-20%) and effective spatial resolution of past and future astronomical surveys.

c) improves substantially imaging through turbulence applications.

d) much faster than many of the currently used coaddition solutions.

For subtraction, we present a subtraction image that is:

a) optimal for transient detection under the assumption of spatially uniform noise.

b) sufficient for any further statistical decision on the differences between the images, including the identification of cosmic rays and other image artifacts.

c) Free of subtraction artifacts, allowing (for the first time) robust transient identification in real time, opening new avenues for scientific exploration.

d) orders of magnitude faster than past subtraction methods.

I will present an elementary proof of the following theorem of Alexander Olshanskii:

Let F be a free group and let A,B be finitely generated subgroups of infinite index in F. Then there exists an infinite index subgroup C of F which contains both A and a finite index subgroup of B.

The proof is carried out by introducing a 'profinite' measure on the discrete group F, and is valid also for some groups which are not free.Some applications of this result will be discussed:

1. Group Theory - Construction of locally finite faithful actions of countable groups.

2. Number Theory - Discontinuity of intersections for large algebraic extensions of local fields.

3. Ergodic Theory - Establishing cost 1 for groups boundedly generated by subgroups of infinite index and finite cost.

A common approach to face recognition relies on using deep learning for extracting a signature. All leading work on the subject use stupendous amounts of processing power and data. In this work we present a method for efficient and compact learning of metric embedding. The core idea allows a more accurate estimation of the global gradient and hence fast and robust convergence. In order to avoid the need for huge amounts of data we include an explicit alignment phase into the network, hence greatly reducing the number of parameters. These insights allow us to efficiently train a compact deep learning model for face recognition in only 12 hours on a single GPU, which can then fit a mobile device.

Joint work with: Oren Tadmor, Tal Rosenwein, Shai Shalev-Schwartz, Amnon Shashua

Expander graphs are widely studied, and various methods are known to obtain bounded degree expander graphs. Recently, there is a growing interest in understanding combinational expansion in higher dimensions (higher dimensional simplicial complexes). However, bounded degree combinatorial expanders (random or explicit) were not known till our work.

We present a local to global criterion on a complex that implies combinatorial expansion. We use our criterion to present explicit bounded degree high dimensional expanders. This solves in the affirmative an open question raised by Gromov, who asked whether bounded degree high dimensional expanders could at all exist.

We expect that the emerging theory of high dimensional expansion is likely to have various application in the theory of computation. Thus, one of the goals of this talk in to introduce this concept to the theory community.

Based on joint works with David Kazhdan and Alex Lubotzky, and with Shai Evra.

Dynamic events such as family gatherings, concerts or sports events are often photographed by a group of people. The set of still images obtained this way is rich in dynamic content. We consider the question of whether such a set of still images, rather than traditional video sequences, can be used for analyzing the dynamic content of the scene. This talk will describe several instances of this problem, their solutions and directions for future studies.

In particular, we will present a method to extend epipolar geometry to predict location of a moving feature in CrowdCam images. The method assumes that the temporal order of the set of images, namely photo-sequencing, is given. We will briefly describe our method to compute photo-sequencing using geometric considerations and rank aggregation. We will also present a method for identifying the moving regions in a scene, which is a basic component in dynamic scene analysis. Finally, we will consider a new vision of developing collaborative CrowdCam, and a first step toward this goal.

This talk will be based on joint works with Tali Dekel, Adi Dafni, Mor Dar, Lior Talked, Ilan Shimshoni, and Shai Avidan.

Projection inequalities bound the volume of a body in terms of a product of the volumes of lower-dimensional projections of the body. Two well-known examples are the Loomis-Whitney inequality and the more general Uniform Cover inequality. We will see how to use information theory to prove stability versions of these inequalities, showing that when they are close to being tight, the body in question is close to being a box (which is the unique case of equality). We will also see how to obtain a stability result for the edge-isoperimetric inequality in the infinite d-dimensional lattice. Namely, that a subset of Z^d with small edge-boundary must be close in symmetric difference to a d-dimensional cube.

Based on work with David Ellis, Ehud Friedgut and Guy Kindler.

It is common practice in multivariate and matrix-valued data analysis to reduce dimensionality by performing a Singular Value Decomposition or Principal Component Analysis, and keeping only $r$ singular values or principal components, the rest being presumably associated with noise. However, the literature does not propose a disciplined criterion to determine $r$; most practitioners still look for the ``elbow in the Scree Plot'', a 50-years-old heuristic performed by eye. I'll review a line of work which develops a systematic approach to eigenvalue and singular value thresholding. This approach assumes that the signal is low-rank and that the noise is rotationally invariant. Recent results derive optimal thresholds in the presence of quite general noise distributions.

Joint work with David Donoho, Iain Johnstone and Edgar Dobriban (Stanford).

Let K be a complete discrete valuation field with finite residue field of characteristic p>0. Let G be the absolute Galois group of K and for a natural M, let G(M) be the maximal quotient of G of nilpotent class <p and period p^M. Then G(M) can be identified with a group obtained from a Lie Z/p^M-algebra L via (truncated) Campbell-Hausdorff composition law. Under this identification the ramification subgroups in upper numbering G(M)^(v)correspond to ideals L^(v) of L. It will be explained an explicit construction of L and the ideals L^(v). The case of fields K of characteristic p was obtained by the author in 1990's (recently refined), the case of fields K of mixed characteristic requires the assumption that K contains a primitive p^M-th root of unity (for the case M=1 cf. Number Theory Archive).

We show that a version of the classical Iterated Log Law of Khinchin, and independently of Kolmogorov from the 1920's, holds for various parameters in the binomial random graph model and in a random 0/1 Bernoulli matrix. In particular, for a constant p, we show that such a law holds for the number of copies of a fixed graph H in G(n,p), we show a similar statement for the number of Hamilton cycles in a random k-uniform hypergraph, provided that k\geq 4. In the graph case (that is, k=2), since the number of Hamilton cycles in G(n,p), denoted by X_n, does not converge to a normal distribution but rather tends to a log-normal distribution (as has been first proved by Janson), we show that a version of the Iterated Log Law holds for \log X_n. We also obtain similar result for the permanent of a 0/1 bernouli random matrix.

No prior knowledge is required.

Joint with Daniel Motealegre and Van Vu.

Heuristically, delocalization for a random matrix means that its normalized eigenvectors look like the vectors uniformly distributed over the unit sphere. This can be made precise in a number of different ways. We show that with high probability, any sufficiently large set of coordinates of an eigenvector carries a non-negligible portion of its Euclidean norm. Our results pertain to a large class of random matrices including matrices with independent entries, symmetric, skew-symmetric matrices, as well as more general ensembles.

Joint work with Roman Vershynin.

Raz's celebrated Parallel Repetition Theorem shows that the probability of simultaneously winning n independent instances of a two-player one-round game G is exponentially small in n, when the maximum success probability of G is less than 1. Though the statement is intuitive, the proof is rather nontrivial and has found important application in hardness of approximation, cryptography, and communication complexity.

There are two major open problems regarding the parallel repetition of games: does an analogue of Raz's theorem hold for (a) games with more than two players, and (b) games with quantumly entangled players? Extending Raz’s theorem to these settings is a challenging problem for a number of reasons: techniques for attacking direct sum/direct product problems in multiparty settings are lacking, and our understanding of quantum entanglement as an information theoretic resource is quite limited.

In this work, we show to sidestep these barriers and make progress on the two open problems. We first prove exponential-decay parallel repetition theorems for a class of games we called "anchored games" in the multiplayer and entangled-player settings. Then, we show how to efficiently transform any game into an equivalent anchored game. Together, our results provide a simple hardness-amplification technique for games in both the classical multiplayer and quantum settings.

Joint work with Mohammad Bavarian and Thomas Vidick.

Erdös-Ko-Rado (EKR) type theorems yield upper bounds on the size of set families under various intersection requirements on the elements. Stability versions of such theorems assert that if the size of a family is close to the maximum possible then the family itself must be close (in appropriate sense) to a maximum family. In this talk we present an approach to stability versions of EKR-type theorems through isoperimetric inequalities for subsets of the hypercube. We use this approach to obtain tight stability versions of the EKR theorem itself and of the Ahlswede-Khachatrian theorem on t-intersecting families (for k < n/(t+1)), and to show that, somewhat surprisingly, both theorems hold when the "intersection" requirement is replaced by a much weaker requirement. Furthermore, we obtain stability versions of several recent EKR-type results, including Frankl's proof of the Erdös matching conjecture for n>(2s+1)k-s.

Joint work with David Ellis and Noam Lifshitz.

We study the discrepancy of the number of visits of a Kronicker sequence on a d dimensional torus to nice sets. We are interested in particular in the question how the answer depends on the geometry of the set.

This is a joint work with Bassam Fayad.

(http://arxiv.org/abs/1211.4323 and http://arxiv.org/abs/1206.4853)

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) initiated the formal study of this problem, and gave the first upper bounds on the achievable generalization error for adaptive data analysis.

The results of Dwork et al. are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stable algorithms of the kind guaranteed by differential privacy imply low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

Joint work with Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, and Jonathan Ullman.

For the past 40 years computer scientists generally believed that NP-complete problems are intractable. In particular, Boolean satisfiability (SAT), as a paradigmatic NP-complete problem, has been considered to be intractable. Over the past 20 years, however, there has been a quiet, but dramatic, revolution, and very large SAT instances are now being solved routinely as part of software and hardware design.

In this talk I will review this amazing development and show that we can leverage SAT solving to accomplish other Boolean reasoning tasks. Counting the number of satisfying truth assignments of a given Boolean formula or sampling such assignments uniformly at random are fundamental computational problems in computer science with numerous applications. While the theory of these problems has been thoroughly investigated in the 1980s, approximation algorithms developed by theoreticians do not scale up to industrial-sized instances. Algorithms used by the industry offer better scalability, but give up certain correctness guarantees to achieve scalability. We describe a novel approach, based on universal hashing and Satisfiability Modulo Theory, that scales to formulas with hundreds of thousands of variable without giving up correctness guarantees.

Homological algebra plays a major role in noncommutative ring theory. One important homological construct related to a noncommutative ring A is the dualizing complex, which is a special kind of complex of A-bimodules. When A is a ring containing a central field K, this concept is well-understood now. However, little is known about dualizing complexes when the ring A does not contain a central field (I shall refer to this as the noncommutative arithmetic setting). The main technical issue is finding the correct derived category of A-bimodules.

In this talk I will propose a promising definition of the derived category of A-bimodules in the noncommutative arithmetic setting. Here A is a (possibly) noncommutative ring, central over a commutative base ring K (e.g. K = Z). The idea is to resolve A: we choose a DG (differential graded) ring A', central and flat over K, with a DG ring quasi-isomorphism A' -> A. Such resolutions exist. The enveloping DG ring A'^{en} is the tensor product over K of A' and its opposite. Our candidate for the "derived category of A-bimodules" is the category D(A'^{en}), the derived category of DG A'^{en}-modules. A recent theorem says that the category D(A'^{en}) is independent of the resolution A', up to a canonical equivalence. This justifies our definition.

Working within D(A'^{en}), it is not hard to define dualizing complexes over A, and to prove all their expected properties (like when K is a field). We can also talk about rigid dualizing complexes in the noncommutative arithmetic setting.

What is noticeably missing is a result about existence of rigid dualizing complexes. When the K is a field, Van den Bergh had discovered a powerful existence result for rigid dualizing complexes. We are now trying to extend Van den Bergh's method to the noncommutative arithmetic setting. This is work in progress, joint with Rishi Vyas.

In this talk I will explain, in broad strokes, what are DG rings, DG modules, and the associated derived categories and derived functors. Also, I will try to go into the details of a few results and examples, to give the flavor of this material.

Stochastic analysis of real-world signals consists of 3 main parts: mathematical representation; probabilistic modeling; statistical inference. For it to be effective, we need mathematically-principled and practical computational tools that take into consideration not only each of these components by itself but also their interplay. This is especially true for a large class of computer-vision and machine-learning problems that involve certain mathematical structures; the latter may be a property of the data or encoded in the representation/model to ensure mathematically-desired properties and computational tractability. For concreteness, this talk will center on structures that are geometric, hierarchical, or topological.

Structures present challenges. For example, on nonlinear spaces, most statistical tools are not directly applicable, and, moreover, computations can be expensive. As another example, in mixture models, topological constraints break statistical independence. Once we overcome the difficulties, however, structures offer many benefits. For example, respecting and exploiting the structure of Riemannian manifolds and/or Lie groups yield better probabilistic models that also support consistent synthesis. The latter is crucial for the employment of analysis-by-synthesis inference methods used within, e.g., a generative Bayesian framework. Likewise, imposing a certain structure on velocity fields yields highly-expressive diffeomorphisms that are also simple and computationally tractable; particularly, this facilitates MCMC inference, traditionally viewed as too expensive in this context.

Time permitting, throughout the talk I will also briefly touch upon related applications such as statistical shape models, transfer learning on manifolds, image warping/registration, time warping, superpixels, 3D-scene analysis, nonparametric Bayesian clustering of spherical data, multi-metric learning, and new machine-learning applications of diffeomorphisms. Lastly, we also applied the (largely model-based) ideas above to propose the first learned data augmentation scheme; as it turns out, when compared with the state-of-the-art schemes, this improves the performance of classifiers of the deep-net variety.

I will describe recent work on building and using rich representations aimed at automatic analysis of visual scenes. In particular, I will describe methods for semantic segmentation (labeling regions of an image according to the category it belongs to), and on semantic boundary detection (recovering accurate boundaries of semantically meaningful regions, such as those corresponding to objects). We focus on feed-forward architectures for these tasks, leveraging recent advances in the art of training deep neural networks. Our approach aims to shift the burden of inducing desirable constraints from explicit structure in the model to implicit structure inherent in computing richer, context-aware representations. I will describe experiments on standard benchmark data sets that demonstrate the success of this approach.

Joint work with Mohammadreza Mostajabi, Payman Yadollahpour, and Harry Yang.

Many questions about properties of a finite group such as random walks, spectrum of Cayley graphs, distribution of word maps etc., can be approached by using “generalized Fourier sum” formulas involving characters of the group. Numerical data show that characters of low dimensional representations of the group contribute the largest terms to these sums. However, relatively little seems to be known about these small representations so a systematic knowledge of them could lead to proofs of some of the properties. The talk will demonstrates, through concrete examples, and numerical simulations, a new method to construct and analyze those small representations, and hence hopefully to solve some of the aforementioned questions.

The talk is intended for non-experts.

This is part from a joint project with Roger Howe (Yale).

Many sequence prediction tasks---such as automatic speech recognition and video analysis---benefit from long-range temporal features. One way of utilizing long-range information is through segmental (semi-Markov) models such as segmental conditional random fields. Such models have had some success, but have been constrained by the computational needs of considering all possible segmentations. We have developed new segmental models with rich features based on neural segment embeddings, trained with discriminative large-margin criteria, that are efficient enough for first-pass decoding. In our initial work with these models, we have found that they can outperform frame-based HMM/deep network baselines on two disparate tasks, phonetic recognition and sign language recognition from video. I will present the models and their results on these tasks, as well as (time permitting) related recent work on neural segmental acoustic word embeddings.

This is joint work with Hao Tang, Weiran Wang, Herman Kamper, Taehwan Kim, and Kevin Gimpel

Finite group theorists have established many formulas that express interesting properties of a finite group in terms of sums of characters of the group. An obstacle to applying these formulas is lack of control over the dimensions of representations of the group. In particular, the representations of small dimension tend to contribute the largest terms to these sums, so a systematic knowledge of these small representations could lead to proofs of some of these facts. This talk will discuss a new method for systematically constructing the small representations of finite classical groups. I will explain the method with concrete examples and applications.

This is part from a joint project with Roger Howe (Yale).

Many sequence prediction tasks---such as automatic speech recognition and video analysis---benefit from long-range temporal features. One way of utilizing long-range information is through segmental (semi-Markov) models such as segmental conditional random fields. Such models have had some success, but have been constrained by the computational needs of considering all possible segmentations. We have developed new segmental models with rich features based on neural segment embeddings, trained with discriminative large-margin criteria, that are efficient enough for first-pass decoding. In our initial work with these models, we have found that they can outperform frame-based HMM/deep network baselines on two disparate tasks, phonetic recognition and sign language recognition from video. I will present the models and their results on these tasks, as well as (time permitting) related recent work on neural segmental acoustic word embeddings.

This is joint work with Hao Tang, Weiran Wang, Herman Kamper, Taehwan Kim, and Kevin Gimpel

Recently, the "information percolation" framework was introduced as a way to obtain sharp estimates on mixing for spin systems at high temperatures, and in particular, to establish cutoff for the Ising model in three dimensions up to criticality from a worst starting state. I will describe how this method can be used to understand the effect of different initial states on the mixing time, both random (''warm start'') and deterministic.

Joint work with Allan Sly.

Let *X* be a compact manifold with the boundary ∂ *X* and *R *(λ) be a Dirichlet-to-Neumann operator: *R *(λ): *f* → *u*|_{∂X} where *u* solves ( _{}+ λ^{2}) *u *= 0, *u*|_{∂X} = *f* . We establish asymptotics as λ→ + ∞ of the number of eigenvalues of λ^{-1 }*R* (λ) between s_{1} and s_{2}.

This is a joint work with Andrew Hassell, Australian National University.

In today’s world there are huge amounts of data that need to get reliably stored or transmitted. However, some amount of noise or corruption is inevitable. An error-correcting code is a scheme for robustly representing data in the form of a codeword that allows one to detect and correct errors in transmission. Locally-testable and locally-decodable codes are special families of error-correcting codes that admit highly efficient algorithms that detect and correct errors in sublinear time with high probability, probing only a small number of entries of the corrupted codeword. While locally-testable and locally-decodable codes have been intensely studied in the past 2 decades, in recent years there has been even further incentive for their study due to their relevance for transmission and storage of massive data and the successful implementation of local codes in cloud storage systems.

In this talk, I will show an exponential improvement on the best-known running time of error detection and correction algorithms for locally-testable and locally-decodable codes. Specifically, I will describe new families of locally-testable codes with constant rate that can detect a constant fraction of errors in time (log n)^{O(log log n)} and new families of locally-decodable codes of constant rate that can correct a constant fraction of errors in time exp(\sqrt{log n}). Prior to that, the best known running time for such codes was n^{epsilon} (for a constant epsilon) using several, quite different, constructions.

(Based on joint work with Swastik Kopparty, Or Meir and Shubhangi Saraf)

Locality-Sensitive Hashing (LSH) is a powerful technique for the approximate nearest neighbor search (ANN) in high dimensions. In this talk I will present two recent results:

1) I will show a data structure for ANN for the Euclidean distance that provably outperforms the best possible LSH-based data structure. We proceed via designing a good *data-dependent* hash family.

2) I will show a practical and optimal LSH family for the cosine similarity (a.k.a. Euclidean distance on a sphere). It substantially outperforms the celebrated Hyperplane LSH family. Along the way, I will try to debunk two popular myths about LSH:

* LSH-based data structures consume too much memory and are thus impractical;

* Optimal LSH constructions are too complicated to be made practical.

The talk is based on two papers: arXiv: 1501.01062 (joint with Alexandr Andoni, STOC 2015) and arXiv: 1509.02897 (joint with Alexandr Andoni, Piotr Indyk, Thijs Laarhoven and Ludwig Schmidt, NIPS 2015).

We show techniques for decreasing the error probability of randomized algorithms and for converting randomized algorithms to deterministic (non-uniform) algorithms. Unlike most existing techniques that involve repetition of the randomized algorithm and hence a slowdown, our techniques produce algorithms with a similar run-time to the original randomized algorithms.

The amplification technique is related to a certain stochastic multi-armed bandit problem. The derandomization technique -- which is the main contribution of this work -- points to an intriguing connection between derandomization and sketching/sparsification.

We demonstrate the techniques by showing applications to max-cut on dense graphs, approximate clique, constraint satisfaction problems on dense bipartite graphs, and list decoding to unique decoding for Reed-Muller code.

This is joint work with Ofer Grossman.

By a Theorem of Aaronson, normalized Birkhoff sums of positive integrable functions in infinite, ergodic systems either tend to 0 almost surely or there is a subsequence along which every further subsequence tends to infinity. This is not true for normalized symmetric Birkhoff sums where the summation is along a symmetric time interval as there are examples of infinite, ergodic systems for which the absolutely normalized symmetric Birkhoff sums of positive integrable functions may be almost surely bounded away from zero and infinity. In this talk I will start by explaining a variety of transformations (of different nature) satisfying this phenomena, discuss the case main result that the absolutely normalized, symmetric Birkhoff sums of positive integrable functions in infinite, ergodic systems never converge point-wise and there even exists a universal divergence statement. Time permits I will show some examples of actions of other groups which converge and some recent (yet unwritten) results on actions by commuting skew products which are related to self intersection local times.

The contents of this talk are a combination of 3 papers, one of which is a joint work with Benjamin Weiss and Jon Aaronson and another one is work in progress with Jon Aaronson.

Parameter estimation is performed by fitting data measurements to a model using Bayesian statistics, assuming additional prior information. The estimation requires a numerical solution of large scale optimization problem, whose objective traditionally includes data fidelity and regularization terms. In this talk I will present numerical solution methods for two such estimation problems.

In the first part of the talk I will concentrate on parameter estimation of physical models, obtained by solving optimization problems that are constrained by partial differential equations (PDEs). I will focus on my recent work on 3D Full Waveform Inversion, which arises in seismic exploration of oil and gas reservoirs, earth sub-surface mapping, ultrasound imaging and more. I will demonstrate how to computationally treat this inverse problem, and improve its solution by using travel time tomography in a joint inversion framework. This includes efficient algorithms for the solution of the Helmholtz and eikonal equations (the two associated PDEs), and a parallel software framework for applying these algorithms for the joint inversion using a Gauss Newton algorithm.

In the second part of the talk, I will consider the estimation of large scale sparse inverse covariance matrices of multivariate Gaussian distribution. Such matrices are often used to characterize and analyze data measurements in fields that range from machine learning, signal processing and computational biology. To estimate these matrices, an l1 regularized log-determinant optimization problem needs to be solved. I will present a block-coordinate descent algorithm that can efficiently solve this problem at large scales with low memory footprint, and a multilevel acceleration framework that is suitable for general sparse optimization problems. These algorithms can be used as a tool for enriching inverse problems by "learning" appropriate prior information, adopting an empirical Bayesian framework.

Line defects appear in the microscopic structure of crystalline materials (e.g. metals) as well as liquid crystals, the latter an intermediate phase of matter between liquids and solids. Mathematically, their study is challenging since they correspond to topological singularities that result in blow-up of total energies of finite bodies when utilizing most commonly used classical models of energy density; as a consequence, formulating nnonlinear dynamical models (especially pde) for the representation and motion of such defects is a challenge as well. I will discuss the development and implications of a single pde model intended to describe equilibrium states and dynamics of these defects. The model alleviates the nasty singularities mentioned above and it will also be shown that incorporating a conservation law for the topological charge of line defects allows for the correct prediction of some important features of defect dynamics that would not be possible just with the knowledge of an energy function.

This is joint work with Chiqun Zhang, Dmitry Golovaty, and Noel Walkington.

The class P attempts to capture the efficiently solvable computational tasks. It is full of practically relevant problems, with varied and fascinating combinatorial structure.

In this talk, I will give an overview of a rapidly growing body of work that seeks a better understanding of the structure within P. Inspired by NP-hardness, the main tool in this approach are combinatorial reductions. Combining these reductions with a small set of plausible conjectures, we obtain tight lower bounds on the time complexity of many of the most important problems in P.

I will present the most recent landscape of P and the conjectures on which this project is based on (e.g. the Strong Exponential Time Hypothesis). I will discuss recent attempts on identifying new conjectures: either more reliable ones, or ones that will get us closer to a full classification of the important problems in P.

Finally, I will highlight a surprising new reduction from Circuit-SAT to natural problems in P like Edit-Distance which proves that minor improvements over the quadratic running time of Edit-Distance are enough to prove major complexity separations.

In many practical parameter estimation problems, such as medical experiments and cognitive radio communications, parameter selection is performed prior to estimation. The selection process has a major impact on subsequent estimation by introducing a selection bias and creating coupling between decoupled parameters. As a result, classical estimation theory may be inappropriate and inaccurate and a new methodology is needed. In this study, the problem of estimating a preselected unknown deterministic parameter, chosen from a parameter set based on a predetermined data-based selection rule, \Psi, is considered. In this talk, I present a general non-Bayesian estimation theory for estimation after parameter selection, includes estimation methods, performance analysis, and adaptive sampling strategies. First, I use the post-selection mean-square-error (PSMSE) criterion as a performance measure instead of the commonly used mean-square-error (MSE). The corresponding Cramér-Rao-type bound on the PSMSE of any \Psi-unbiased estimator is derived, where the \Psi -unbiasedness is in the Lehmann-unbiasedness sense. The post-selection maximum-likelihood (PSML) estimator is presented and its \Psi–efficiency properties are demonstrated. Practical implementations of the PSML estimator are proposed as well. Finally, I discuss the concept of adaptive sampling in a two-sampling stages scheme of selection and estimation.

Transmission rates in broadband optical waveguide systems are enhanced by launching many pulse sequences through the same waveguide. Since pulses from different sequences propagate with different group velocities, intersequence pulse collisions are frequent, and can lead to severe transmission degradation. On the other hand, the energy exchange in pulse collisions can be beneficially used for controlling the transmission.

In this work we show that collision-induced amplitude dynamics of soliton sequences of N perturbed coupled nonlinear Schrödinger (NLS) equations can be described by N-dimensional Lotka-Volterra (LV) models, where the model's form depends on the perturbation. To derive the LV models, we first carry out single-collision analysis, which is based on the method of eigenmode expansion with the eigenmodes of the linear operator describing small perturbations about the fundamental NLS soliton. We use stability and bifurcation analysis for the equilibrium points of the LV models to develop methods for achieving robust transmission stabilization and switching that work well for a variety of waveguides. Further enhancement of transmission stability is obtained in waveguides with a narrowband Ginzburg-Landau gain-loss profile. We also discuss the possibility to use the relation between NLS and LV models to realize transition to spatio-temporal chaos with NLS solitons.

The Max-Cut problem seeks to determine the maximal cut size in a given graph. With no polynomial-time efficient approximation for Max-Cut (unless *P=NP*), its asymptotic for a typical large sparse graph is of considerable interest. We prove that for uniformly random d-regular graph of N vertices, and for the uniformly chosen Erdos-Renyi graph of *M=Nd/2* edges, the leading correction to *M/2 *(the typical cut size), is *P∗sqrt(NM/2)*. Here *P∗* is the ground state energy of the Sherrington-Kirkpatrick model, expressed analytically via Parisi's formula.

This talk is based on a joint work with Subhabrata Sen and Andrea Montanari.

The goal of my talk (based on joint work with Dima Grigoriev, Anatol Kirillov, and Gleb Koshevoy) is to generalize the celebrated Robinson-Schensted-Knuth (RSK) bijection between the set of matrices with nonnegative integer entries, and the set of the planar partitions.

Namely, for any pair of injective valuations on an integral domain we construct a canonical bijection K, which we call the generalized RSK, between the images of the valuations, i.e., between certain ordered abelian monoids.

Given a semisimple or Kac-Moody group, for each reduced word ii=(i_1,...,i_m) for a Weyl group element we produce a pair of injective valuations on C[x_1,...,x_m] and argue that the corresponding bijection K=K_ii, which maps the lattice points of the positive octant onto the lattice points of a convex polyhedral cone in R^m, is the most natural generalization of the classical RSK and, moreover, K_ii can be viewed as a bijection between Lusztig and Kashiwara parametrizations of the dual canonical basis in the corresponding quantum Schubert cell.

Generalized RSKs are abundant in "nature", for instance, any pair of polynomial maps phi,psi:C^m-->C^m with dense images determines a pair of injective valuations on C[x_1,...,x_n] and thus defines a generalized RSK bijection K_{phi,psi} between two sub-monoids of Z_+^m.

When phi and psi are birational isomorphisms, we expect that K_{phi,psi} has a geometric "mirror image", i.e., that there is a rational function f on C^m whose poles complement the image of phi and psi so that the tropicalization of the composition psi^{-1}phi along f equals to K_{phi,psi}. We refer to such a geometric data as a (generalized) geometric RSK, and view f as a "super-potential". This fully applies to each ii-RSK situation, and we find a super-potential f=f_ii which helps to compute K_ii.

While each K_ii has a "crystal" flavor, its geometric (and mirror) counterpart f_ii emerges from the cluster twist of the relevant double Bruhat cell studied by Andrei Zelevinsky, David Kazhdan, and myself.

We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to $N$ experts in total $\sqrt{N}$ computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is linear in $N$. These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle---i.e., an efficient empirical risk minimizer---allows to learn a finite hypothesis class of size $N$ in time $\log{N}$.

We also study the implications of our results to learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best-response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is $\sqrt{N}$, yielding again a quadratic improvement upon the oracle-free setting, where linear time in $N$ is known to be tight.

Periodically driven systems are of immense interest in plasma physics both from the point of view of plasma confinement as well as plasma heating.

One of the models to explain plasma heating in capacitive RF discharges is Fermi acceleration, which consists of a particle moving in a dynamical billiard with oscillating boundaries. It is well known that the energy growth rate of an ensemble of particles in a strongly chaotic billiard with moving walls is quadratic-in-time whereas it can be exponential-in-time in billiards with multiple ergodic components. Since a real plasma device allows for an exchange of particles with the surroundings, we have now studied Fermi accelerators with a hole (small enough so as not to disturb the statistics). We find that energy gain is significantly higher in a leaky Fermi accelerators with multiple ergodic components and it can be further increased by shrinking the hole size. In the ergodic case, energy gain is found to be independent of the hole size. Work done jointly with V. Gelfreich, V. Rom-Kedar and D. Turaev [Physical Review E 91, 062920 (2015)].

Paul trap is a device used to confine electrons by using time-periodic spatially non-uniform electric fields and a Nobel Prize as awarded for its discovery in 1989. The time-averaged distribution function of plasma in such devices is usually modelled using the concept of an effective potential (ponderomotive theory). For a specific example of the electric field used in Paul traps, we had shown earlier that the exact solutions of the Vlasov equation (collisionless Boltzmann equation) do not agree with solutions obtained by the effective potential approach. Now we have been able to obtain a perturbative solution of the Vlasov equation for a much more general case and find the same discrepancy with conventional theory. These perturbative solutions represent a non-equilibrium steady state and further work needs to be done to understand their statistical evolution. Work done jointly with B. Srinivasan [arXiv:1510.03974].

We study the two-party communication complexity of the geometric problem of finding an approximate Brouwer fixed-point of a composition of two Lipschitz functions g*f, where Alice knows f and Bob knows g.

We prove an essentially tight communication lower bound on this problem, using a novel adaptation of the Raz-McKenzie simulation theorem into geometric settings.

We show that a slightly stronger version of this communication problem would imply an (essentially) tight communication lower bounds on the problem of finding an approximate Nash equilibrium in 2-player (and n-player) games, where each player initially knows only his own payoff matrix.

Joint work with Tim Roughgarden.

We consider random polynomials of degree n whose coefficients are i.i.d. distributed over a finite set of integers, with probability at most 1/2 to take any particular value. We show that the probability that such a polynomial of degree n has a double root is dominated by the probability that 0,1 or -1 are double roots up to an error of o(n^{−2}). Our result generalizes a similar result of Peled, Sen and Zeitouni for Littlewood polynomials.

Joint work with Ron Peled and Arnab Sen.

"Circular inference" is a pejorative coined for methods in which a hypothesis is selected after looking at the data, but the inferential procedures treat it as if it was determined in advance. Unfortunately, many throughput screening experiments in genomics or neuroimaging seek to do exactly this: identify regions (bumps) of high signal in the data **and** evaluate these found regions using the same data. Simple estimators that ignore the selection will be biased; when the data is non-stationary, this bias can vary dramatically between different regions. Nevertheless, methods for evaluating and comparing selected regions are crucial, because typically only a handful of regions can be further explored in tailored follow up studies.

In this talk I describe a new conditional inference approach for characterizing these found regions by estimating their population parameters. Our method explicitly models the selection procedure, and simulates from the conditional distribution to estimate the underlying parameters. Efficient strategies for providing p-value, estimators and intervals will be discussed, as well as power versus accuracy tradeoffs. I will demonstrate the new method for estimating bumps in a comparison of DNA-methylation patterns across tissue type.

This is joint work with Jonathan Taylor and Rafael Irizarry.

In this talk I will consider the problem of local analytic classification of powers of volume forms on manifolds with boundary, i.e. of ordinary volume forms multiplied by the (complex in general) power of a function f, under the action of the group of diffeomorpshims preserving both the boundary and the hypersurface defined by the zero locus of f. In the case where this function defines an isolated boundary singularity in the sense of Arnol'd, I will show how to obtain local normal forms and moduli theorems, analogous to those obtained by Arnol'd, Varchenko, Lando and others for the ordinary, without boundary case. Moreover I will show how these moduli are related to (in fact obtained by) the topological and analytic (Hodge theoretic) invariants of the boundary singularity, such as the relative Picard-Lefschetz monodromy, the relative Brieskorn lattices with their relative Gauss-Manin connection, the relative spectrum and so on, all objects generalising, in the presence of a boundary, the corresponding well known objects already defined for isolated hypersurface singularities.

Online d-dimensional vector packing models many settings such as minimizing resources in data centers where jobs have multiple resource requirements (CPU, Memory, etc.). However, no online d-dimensional vector packing algorithm can achieve a competitive ratio better than d. Fortunately, in many natural applications, vectors are relatively small, and thus the lower bound does not hold. For sufficiently small vectors, an O(log d)-competitive algorithm was known. We improve this to a constant competitive ratio, arbitrarily close to e (where e is the base of the natural logarithm), given that vectors are sufficiently small.

We give improved results for the two dimensional case. For arbitrarily small vectors, the First Fit algorithm for two dimensional vector packing is no better than 2-competitive. We present a natural family of First Fit variants, and for optimized parameters get a competitive ratio of approximately 1.48 for sufficiently small vectors.

We improve upon the 1.48 competitive ratio - not via a First Fit variant - and give a competitive ratio arbitrarily close to 4/3 for packing small, two dimensional vectors. We show that no algorithm can achieve better than a 4/3 competitive ratio for two dimensional vectors, even if one allows the algorithm to split vectors among arbitrarily many bins.

It has long been conjectured that hypothesis spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical architectures than with shallow ones. Despite the vast empirical evidence, formal arguments to date are limited and do not capture the kind of networks used in practice. Using tensor factorization, we derive a universal hypothesis space implemented by an arithmetic circuit over functions applied to local data structures (e.g. image patches). The resulting networks first pass the input through a representation layer, and then proceed with a sequence of layers comprising sum followed by product-pooling, where sum corresponds to the widely used convolution operator. The hierarchical structure of networks is born from factorizations of tensors based on the linear weights of the arithmetic circuits. We show that a shallow network corresponds to a rank-1 decomposition, whereas a deep network corresponds to a Hierarchical Tucker (HT) decomposition. Log-space computation for numerical stability transforms the networks into SimNets.

In its basic form, our main theoretical result shows that the set of polynomially sized rank-1 decomposable tensors has measure zero in the parameter space of polynomially sized HT decomposable tensors. In deep learning terminology, this amounts to saying that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require an exponential size if one wishes to implement (or approximate) them with a shallow network. Our construction and theory shed new light on various practices and ideas employed by the deep learning community, and in that sense bear a paradigmatic contribution as well.

Joint work with Or Sharir and Amnon Shashua.

The notion of a Lie conformal algebra (LCA) comes from physics, and is related to the operator product expansion. An LCA is a module over a ring of differential operators with constant coefficients, and with a bracket which may be seen as a deformation of a Lie bracket. LCA are related to linearly compact differential Lie algebras via the so-called annihilation functor. Using this observation and the Cartan's classification of linearly compact simple Lie algebras, Bakalov, D'Andrea and Kac classified finite simple LCA in 2000.

I will define the notion of LCA over a ring R of differential operators with not necessarily constant coefficients, extending the known one for R=K[x]. I will explain why it is natural to study such an object and will suggest an approach for the classification of finite simple LCA over arbitrary differential fields.

Much of the theory of mathematical programs for combinatorial optimization can be described in the following way: A polytope of interest has exponentially many (in the dimension) facets, but can be written as the linear projection of a simpler convex body in a higher-dimensional space. Simple might mean a polytope with a much smaller number of facets, or a spectrahedron (the intersection of an affine subspace with the PSD cone) of small dimension. This allows one to optimize linear functionals over the base polytope by instead optimizing a lifted functional over the lifted body.

Unless P=NP, one does not expect certain polytopes--like the convex hull of indicators of traveling salesman tours in a graph--to have a small lift. But it remained open to prove any non-trivial lower bound on the necessary dimension for a spectrahedral lift, i.e. to prove that semi-definite programs do not yield efficient optimization procedures over these polytopes.

We show that the cut, TSP, and stable set polytopes on n-vertex graphs are not the linear image of a spectrahedron of dimension less than exp(n^c) for some constant c > 0. In the process, many interesting phenomena emerge: Factorization of operators through the PSD cone, quantum information theory, discrete Fourier analysis, and real algebraic geometry.

This is based joint work with Prasad Ragahvendra and David Steurer.

It is well-known that Hecke algebras H_q(W) do not have interesting Hopf algebra structures because, first, the only available one would emerge only via an extremely complicated isomorphism with the group algebra of W and, second, this would make H_q(W) into yet another cocommutative Hopf algebra.

The goal of my talk (based on joint work with D. Kazhdan) is to extend each Hecke algebra H_q(W) to a non-cocommutative Hopf algebra (we call it Hecke-Hopf algebra of W) that contains H_q(W) as a coideal.

Our Hecke-Hopf algebras have a number of applications: they generalize Bernstein presentation of Hecke algebras, provide new solutions of quantum Yang-Baxter equation and a large category of endo-functors of H_q(W)-Mod, and suggest further generalizations of Hecke algebras.

In view of the recent huge interest in image classification and object recognition problems and the spectacular success of deep learning and random forests in solving these tasks, it seems astonishing that much more modest efforts are being invested into related, and often more difficult, problems of image and multimodal content-based retrieval, and, more generally, similarity assessment in large-scale databases. These problems, arising as primitives in many computer vision tasks, are becoming increasingly important in the era of exponentially increasing information. Semantic and similarity-preserving hashing methods have recently received considerable attention to address such a need, in part due to their significant memory and computational advantage over other representations.

In this talk, I will overview some of my recent attempts to construct efficient semantic hashing schemes based on deep neural networks and random forests.

Based on joint works with Qiang Qiu, Guillermo Sapiro, Michael Bronstein, and Jonathan Masci.

I will present a 'categorical' way of doing analytic geometry in which analytic geometry is seen as a precise analogue of algebraic geometry. Our approach works for both complex analytic geometry and p-adic analytic geometry in a uniform way. I will focus on the idea of an 'open set' as used in these various areas of math and how it is characterised categorically. In order to do this, we need to study algebras and their modules in the category of Banach spaces. The categorical characterization that we need uses homological algebra in these 'quasi-abelian' categories which is work of Schneiders and Prosmans. In fact, we work with the larger category of Ind-Banach spaces for reasons I will explain. This gives us a way to establish foundations of analytic geometry and to compare with the standard notions such as the theory of affinoid algebras, Grosse-Klonne's theory of dagger algebras (over-convergent functions), the theory of Stein domains and others. I will explain how this extends to a formulation of derived analytic geometry following the relative algebraic geometry approach of Toen, Vaquie and Vezzosi.

This is joint work with Federico Bambozzi (Regensburg) and Kobi Kremnizer (Oxford).

An Oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky (JACM 1996), is a (probabilistic) RAM that hides its access pattern, i.e. for every input the observed locations accessed are similarly distributed. Great progress has been made in recent years in minimizing the overhead of ORAM constructions, with the goal of obtaining the smallest overhead possible.

We revisit the lower bound on the overhead required to obliviously simulate programs, due to Goldreich and Ostrovsky. While the lower bound is fairly general, including the offline case, when the simulator is given the reads and writes ahead of time, it does assume that the simulator behaves in a "balls and bins" fashion. That is, the simulator must act by shuffling data items around, and is not allowed to have sophisticated encoding of the data.

We prove that for the OFFLINE case, showing a lower bound without the above restriction is related to the size of the circuits for sorting. Our proof is constructive, and uses a bit-slicing approach which manipulates the bit representations of data in the simulation. This implies that without obtaining yet unknown superlinear lower bounds on the size of such circuits, we cannot hope to get lower bounds on offline (unrestricted) ORAMs.

What is it that enables learning with multi-layer networks? What causes the network to generalize well? What makes it possible to optimize the error, despite the problem being hard in the worst case? In this talk I will attempt to address these questions and relate between them, highlighting the important role of optimization in deep learning. I will then use the insight to suggest studying novel optimization methods, and will present Path-SGD, a novel optimization approach for multi-layer RELU networks that yields better optimization and better generalization.

Joint work with Behnam Neyshabur, Ryota Tomioka and Russ Salakhutdinov.

We study properties of affine algebras with small Gel'fand-Kirillov dimension, from the points of view of the prime spectrum, gradations and radical theory.

As an application, we are able to prove that Z-graded algebras with quadratic growth, and graded domains with cubic growth have finite (and efficiently bounded) classical Krull dimension; this is motivated by Artin's conjectured geometric classification of non-commutative projective surfaces, and by opposite examples in the non-graded case.

As another application, we prove a graded version of a dichotomy question raised by Braun and Small, between primitive algebras (namely, algebras admitting faithful irreducible representations) and algebras satisfying polynomial identities.

If time permits, we discuss approximations of the well-studied Koethe problem and in particular prove a stability result for certain radicals under suitable growth conditions.

We finally propose further questions and possible directions, which already stimulated new constructions of monomial algebras.

This talk is partially based on a joint work with A. Leroy, A. Smoktunowicz and M. Ziembowski.

The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n is one of the most fundamental algorithmic tasks in string algorithms. Unfortunately, there is evidence that for a given text and pattern, one cannot compute the exact Hamming distance for all locations in the text in time which is polynomially less than o(n\sqrt m). Nevertheless, Karloff showed that if one is willing to suffer a 1+-\epsilon approximation, then it is possible to solve the problem with high probability in O~(n / \epsilon^2) time.

Due to related lower bounds for computing the Hamming distance of two strings in the one-way communication complexity model, it is strongly believed that obtaining an algorithm for solving the approximation version cannot be done much faster as a function of 1 / \epsilon. We will show that this belief is false by introducing a new O~(n / \epsilon) time algorithm that succeeds with high probability.

The main idea behind our algorithm, which is common in sparse recovery problems, is to reduce the variance of a specific randomized experiment by (approximately) separating heavy hitters from non-heavy hitters. However, while known sparse recovery techniques work very well on vectors, they do not seem to apply here, where we are dealing with mismatches between pairs of characters. We introduce two main algorithmic ingredients. The first is a new sparse recovery method that applies for pair inputs (such as in our setting). The second is a new construction of hash/projection functions, which allows to count the number of projections that induce mismatches between two characters exponentially faster than brute force. We expect that these algorithmic techniques will be of independent interest.

We show a 2^{n+o(n)}-time algorithm for the Shortest Vector Problem on n-dimensional lattices (improving on the previous best-known algorithm of Micciancio and Voulgaris, which runs in time 4^{n+o(n)}). The algorithm uses the elementary yet powerful observation that, by properly combining samples from a Gaussian distribution over the lattice, we can produce exact samples from a narrower Gaussian distribution on the lattice. We use such a procedure repeatedly to obtain samples from an arbitrarily narrow Gaussian distribution over the lattice, allowing us to find a shortest vector.

Both the algorithm and the analysis are quite simple in hindsight. (The main technical tool is an identity on Gaussian measures with a simple geometric proof originally due to Riemann.) If time allows and interest merits, we will discuss a more technical variant of this algorithm that solves the Closest Vector Problem (a seemingly harder problem) in the same asymptotic running time.

Based on joint work with Divesh Aggarwal, Daniel Dadush, and Oded Regev. (See http://arxiv.org/abs/1412.7994 and http://arxiv.org/abs/1504.01995.)

The Catalan numbers form a sequence of integers C_t. A collection of sets H_t with |H_t|= C_t for all t is called a Catalan set. Many examples of Catalan sets are known; the triangulations of the (t+2)-gon, the Dyck paths from (0,0) to (0, 2t) and the nilpotent ideals in the Borel subalgebra of sl_t to name but a few. In my talk I will present a new example of a Catalan set, which has a remarkable property: for all t, H_t decomposes into a (non-disjoint) union of C_{t-1} distinct subsets each of cardinality 2^{t-1}. Moreover, one may define certain interesting labelled graphs for H_t and obtain the above decomposition in a natural way. The subgraphs corresponding to the aforementioned subsets are labelled hypercubes with some edges missing. The motivation of this work was the study of the additive structure of the Kashiwara crystal B(infty).

We say that a system possesses a mixed dynamics if

1) it has infinitely many hyperbolic periodic orbits of all possible types (stable, unstable,saddle) and

2) the closures of the sets of orbits of different types have nonempty intersections.

Recall that Newhouse regions are open domains (from the space of smooth dynamical systems) in which systems with homoclinic tangencies are dense. Newhouse regions in which systems with mixed dynamics are generic (compose residual subsets) are called *absolute Newhouse regions* or *Newhouse regions with mixed dynamics*. Their existence was proved in the paper [1] for the case of 2d diffeomorphisms close to a diffeomorphism with a nontransversal heteroclinic cycle containing two fixed (periodic) points with the Jacobians less and greater than 1. Fundamentally, that "mixed dynamics" is the universal property of reversible chaotic systems. Moreover, in this case generic systems from absolute Newhouse regions have infinitely many stable, unstable, saddle and symmetric elliptic periodic orbits [2,3].

As well-known, reversible systems are often met in applications and they can demonstrate a chaotic orbit behavior. However, the phenomenon of mixed dynamics means that this chaos can not be associated with "strange attractor" or "conservative chaos". Attractors and repellers have here a nonempty intersection containing symmetric orbits (elliptic and saddle ones) but do not coincide, since periodic sinks (sources) do not belong to the repeller (attractor). Therefore, " mixed dynamics" should be considered as a new form of dynamical chaos posed between "strange attractor" and "conservative chaos".

These and related questions are discussed in the talk. Moreover, the main attention here is paid to the development of the concept of mixed dynamics for two-dimensional reversible maps. The main elements of this concept are presented in section below.

[1] S.V. Gonchenko, L.P. Shilnikov, D.V. Turaev. On Newhouse regions of two-dimensional diffeomorphisms close to a diffeomorphism with a nontransversal heteroclinic cycle. Proc. Steklov Inst. Math., 216 (1997), 70-118.

[2] Lamb J.S.W. and Stenkin O.V. Newhouse regions for reversible systems with infinitely many stable, unstable and elliptic periodic orbits Nonlinearity, 2004, 17(4), 1217-1244.

[3] Delshams A., Gonchenko S.V., Gonchenko V.S., Lazaro J.T. and Sten'kin O.V. "Abundance of attracting, repelling and elliptic orbits in two-dimensional reversible maps".- Nonlinearity, 2013, v.26(1), 1-35.

In this talk, we study Demazure modules which occur in a level l irreducible integrable representation of an untwisted affine Lie algebra. We also assume that they are stable under the action of the standard maximal parabolic subalgebra of the affine Lie algebra. We prove that such a module is isomorphic to the fusion product of "prime" Demazure modules, where the prime factors are indexed by dominant integral weights which are either a multiple of l or take value less than l on all simple coroots. Our proof depends on a technical result which we prove in all the classical cases and G_2. We do not need any assumption on the underlying simple Lie algebra when the last "prime" factor is too small. This is joint work with Vyjayanthi Chari, Peri Shereen and Jeffrey Wand.

The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings x,y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertion, deletion, and bit flips needed for converting x into y. Whereas, the Hamming distance between x and y is the number of bit flips needed for converting x to y.

In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. This question was studied by Jowhari (ESA 2012) and is mainly motivated by two questions in communication complexity: the document exchange problem and deciding edit distance using a sketching protocol.

We show a randomized embedding with quadratic distortion. Namely, for any $x,y$ satisfying that their edit distance equals $k$, the Hamming distance between the embedding of $x$ and $y$ is $O(k^2)$ with high probability. This improves over the distortion ratio of $O(\log n \log^* n)$ obtained by Jowhari for small values of $k$. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input.

19th century mathematicians (Gauss, Riemann, Markov, to name a few) spent a lot of their time doing tedious numerical computations. Sometimes they were assisted by (human) computers, but they still did a lot themselves. All this became unnecessary with the advent of computers, who made number-crunching million times faster (and more reliable).

20th- and 21st- century mathematicians spent (and still spend) a lot of their time doing tedious symbolic computations. Thanks to the more recent advent of Computer Algebra Systems (e.g. Maple, Mathematica, and the free system SAGE), much of their labor can be delegated to computers, who, of course, can go much faster, much further, and more reliably.

But humans are still needed! First, to teach the computer how to crunch symbols efficiently, but, just as importantly, to inspire them to formulate general conjectures, and methods of proof, for which humans are (still) crucial. I will mention several examples, most notably, a recent proof, by (the human) Guillaume Chapuy, of a conjecture made with the help of my computer Shalosh B. Ekhad (who rigorously proved many special cases), generalizing, to multi-permutations, Amitai Regev's celebrated asymptotic formula for the number of permutations of length n avoiding an increasing subsequence of length d.

Cancer is a highly complex aberrant cellular state where mutations impact a multitude of signalling pathways operating in different cell types. In recent years it has become apparent that in order to understand and fight cancer, it must be viewed as a system, rather than as a set of cellular activities. This mind shift calls for new techniques that will allow us to investigate cancer as a holistic system. In this talk, I will discuss some of the progress made towards achieving such a system-level understanding using computer modelling of biological behaviours, also known as Executable Biology. I will concentrate on our recent attempts to better understand cancer through the following examples: 1) drug target optimization for Chronic Myeloid Leukaemia using an innovative platform called BioModelAnalyzer, which allows to prove stabilization of biological systems; 2) dynamic hybrid modelling of Glioblastoma (brain tumour) development; 3) state-based modelling of cancer signalling pathways and their analysis using model-checking; and 4) synthesis of blood stem cell programs from single-cell gene expression data. Looking forward, inspired by David Harel’s Grand Challenge proposed a decade ago, I will propose a smaller grand challenge for computing and biology that could shed new light on our ability to control cell fates during development and disease and potentially change the way we treat cancer in the future.

The voter model on $\Z^d$ is a particle system that serves as a rough model for changes of opinions among social agents or, alternatively, competition between biological species occupying space. When $d \geq 3$, the set of (extremal) stationary distributions is a family of measures $\mu_\alpha$, for $\alpha$ between 0 and 1. A configuration sampled from $\mu_\alpha$ is a strongly correlated field of 0's and 1's on $\Z^d$ in which the density of 1's is $\alpha$.

We consider such a configuration as a site percolation model on $\Z^d$. We prove that if $d \geq 5$, the probability of existence of an infinite percolation cluster of 1's exhibits a phase transition in $\alpha$. If the voter model is allowed to have sufficiently spread-out interactions, we prove the same result for $d \geq 3$.

These results partially settle a conjecture of Bricmont, Lebowitz and Maes (1987).

Joint work with Daniel Valesin (University of Groningen)

We consider two stationary versions of the Eden model, on the upper half planar lattice, resulting in an infinite forest covering the half plane. Under weak assumptions on the weight distribution and by relying on ergodic theorems, we prove that almost surely all trees are finite. Using the mass transport principle, we generalize the result to Eden model in graphs of the form $G \times Z$, where G is a Cayley graph. This generalizes certain known results on the two-type Richardson model, in particular of Deijfen and Häggström in 2007.

Several years ago I introduced Newton polytopes related to the potential counterexamples to the JC. This approach permitted to obtain some additional information which though interesting, was not sufficient to get a contradiction. It seems that a contradiction can be obtained by comparing Newton polytopes for the left and right side of a (somewhat mysterious) equality G_x=-y_F.

In this talk, I will discuss the backward-in-time behaviors of several nonlinear parabolic and dissipative evolution equations. This study is motivated by the investigation of the Bardos-Tartar conjecture on the 2D Navier-Stokes equations. Besides the rigorous mathematical treatment, we provide physical interpretation of the mechanism of singularity formulation, backward in time, for perturbations of the KdV equation. Finally, I will present the connection between the backward behavior and the energy spectra of the solutions.

This is a joint work with E. S. Titi.

In this talk we will implement the notion of finite number of determining parameters for the long-time dynamics of the Navier-Stokes equations (NSE), such as determining modes, nodes, volume elements, and other determining interpolants, to design finite-dimensional feedback control for stabilizing their solutions. The same approach is found to be applicable for data assimilation of weather prediction. In addition, we will show that the long-time dynamics of the NSE can be imbedded in an infinite-dimensional dynamical system that is induced by an ordinary differential equations, named *determining form*, which is governed by a globally Lipschitz vector field. The NSE are used as an illustrative example, and all the above mentioned results equally hold to other dissipative evolution PDEs.

This is a joint work with A. Azouani, H. Bessaih, A. Farhat, C. Foias, M. Jolly, R. Kravchenko, E. Lunasin and E. Olson.

Let M be a closed Riemannian manifold. There are numerous results that establish the existence of various minimal objects on M, such as periodic geodesics, minimal surfaces, or geodesic nets. We will present some effective versions of these existence theorems.

For example, we will present diameter upper bounds for the lengths of three shortest simple periodic geodesics on a Riemannian 2-sphere, which can be interpreted as an effective version of the existence theorem of Lusternik and Schnirelmann. (Joint with Y. Liokumovich and A. Nabutovsky).

Finding upper bounds for the size of smallest stationary objects is closely related with construction of "optimal" homotopies. We will show that if M is a closed surface of diameter d (with or without boundary), then any simple closed curve on M that can be contracted to a point over free loops of length less than L, can be contracted over based loops of length at most 3L+2d. (Joint with G. Chambers).

Assortment planning is a major operational issue that arises in many industries, such as retailing, airlines and consumer electronics. Given a set of products that are differentiated by price, quality and possibly other attributes, one has to decide on the subset of products and the respective quantities that will be stocked and offered to heterogeneous customers, who exhibit substitution behavior.

The general problem can be shown to be NP-hard to approximate better than a factor linear in the number of products. In this talk we discuss how for a range of practically interesting special cases, one could design conceptually simple policies that admit provably near-optimal solutions. The analysis reveals interesting structural properties, including hidden submodularity and decomposition properties.

The talk is based on several papers which are Joint work with Ali Aouad, Vineet Goyal and Danny Segev

We will talk on the validity of the mean ergodic theorem along left Følner sequences in a countable amenable group G. Although the weak ergodic theorem always holds along any left Følner sequence in G, we will provide examples where the mean ergodic theorem fails in quite dramatic ways. On the other hand, if G does not admit any ICC quotients, e.g. if G is virtually nilpotent, then we will prove that the mean ergodic theorem does indeed hold along any left Følner sequence. Based on the joint work with M. Björklund (Chalmers).

We discuss the collective dynamics of systems driven by the “social engagement” of agents with their local neighbors. Canonical models are based on environmental averaging, with prototype examples in opinion dynamics, flocking, self-organization of biological organisms, and rendezvous in mobile networks. The large time behavior of such systems leads to the formation of clusters, and in particular, the emergence of “consensus of opinions”.

We propose an alternative paradigm, arguing that in many relevant scenarios social interactions involve the tendency of agents “to move ahead”. We introduce a new family of models for collective dynamics with tendency. The large time behavior of these new systems leads to the emergence of “leaders”.

The theory of structure and pseudo-randomness has been very influential in several areas of mathematics, such as number theory, graph theory and harmonic analysis. It is also been influential in theoretical computer science, with applications in complexity theory, cryptography and property testing. At a high level, it allows to analyze arbitrary objects by decomposing them to a "structural" component and a "pseudo-random" component. The pseudo-random component behaves in many ways like random noise, while the structural component has a concise representation which makes it amenable to analysis and algorithmic manipulation.

In this talk, I will describe applications of this paradigm to coding theory. I will describe a new general approach to list decoding, which follows by decomposing an arbitrary received word to a structural received word and pseudo-random noise. This allows for a simplified analysis of the list decoding problem. In particular, I will describe how this approach leads to a resolution of a conjecture by Gopalan, Klivans and Zuckerman [STOC 2008], that the list decoding radius of Reed-Muller codes (in certain regimes) is equal to the minimal distance of the code.

Based on joint work with Abhishek Bhowmick.

Conventional cameras record all light falling onto their sensor regardless of the path that light followed to get there. In this talk I will present an emerging family of video cameras that can be programmed to record just a fraction of the light coming from a controllable source, based on the actual 3D path followed. Live video from these cameras offers a very unconventional view of our everyday world in which refraction and scattering can be selectively blocked or enhanced, visual structures too subtle to notice with the naked eye can become apparent, and object appearance can depend on depth.

I will discuss the unique optical properties and power efficiency of these "transport-aware" cameras, as well as their use for 3D shape acquisition, robust time-of-flight imaging, material analysis, and scene understanding. Last but not least, I will discuss their potential to become our field's "outdoor Kinect" sensor---able to operate robustly even in direct sunlight with very low power.

Kyros Kutulakos is a Professor of Computer Science at the University of Toronto. He received his PhD degree from the University of Wisconsin-Madison in 1994 and his BS degree from the University of Crete in 1988, both in Computer Science. In addition to the University of Toronto, he has held appointments at the University of Rochester (1995-2001) and Microsoft Research Asia (2004-05 and 2011-12). He is the recipient of an Alfred P. Sloan Fellowship, an Ontario Premier's Research Excellence Award, a Marr Prize in 1999, a Marr Prize Honorable Mention in 2005, and three other paper awards (CVPR 1994, ECCV 2006, CVPR 2014). He also served as Program Co-Chair of CVPR 2003, ICCP 2010 and ICCV 2013.

There are at least 2 aspects to learning: predicting the outcome of unseen events, and finding simple explanations of observed systems. We shall discuss 2 formal abstractions of these aspects: PAC learning and sample compression schemes. We shall start with an introduction to these notions, and then discuss the equivalence between them.

Based on a joint project with Shay Moran, Amir Shpilka and Avi Wigderson.

Consider a random coloring of a bounded domain in *Z ^{d}* with the probability of each coloring

*F*proportional to

*exp(−β∗N(F))*, where

*β>0*is a parameter (representing the inverse temperature) and

*N(F)*is the number of nearest neighboring pairs colored by the same color. This is the anti-ferromagnetic 3-state Potts model of statistical physics, used to describe magnetic interactions in a spin system. The Kotecký conjecture is that in such a model, for

*d≥3*and high enough

*β*, a sampled coloring will typically exhibit long-range order, placing the same color at most of either the even or odd vertices of the domain. We give the first rigorous proof of this fact for large

*d*. This extends previous works of Peled and of Galvin, Kahn, Randall and Sorkin, who treated the case

*β=∞*.

The main ingredient in our proof is a new structure theorem for 3-colorings which characterizes the ways in which different "phases" may interact, putting special emphasis on the role of edges connecting vertices of the same color. We also discuss several related conjectures. No background in statistical physics will be assumed and all terms will be explained thoroughly.

Joint work with Ohad Feldheim.

We generalize the technique of smoothed analysis to distributed algorithms in dynamic network models. Whereas standard smoothed analysis studies the impact of small random perturbations of input values on algorithm performance metrics, dynamic graph smoothed analysis studies the impact of random perturbations of the underlying changing network graph topologies. Similar to the original application of smoothed analysis, our goal is to study whether known strong lower bounds in dynamic network models are robust or fragile: do they withstand small (random) perturbations, or do such deviations push the graphs far enough from a precise pathological instance to enable much better performance? Fragile lower bounds are likely not relevant for real-world deployment, while robust lower bounds represent a true difficulty caused by dynamic behavior. We apply this technique to three standard dynamic network problems with known strong worst-case lower bounds: random walks, flooding, and aggregation. We prove that these bounds provide a spectrum of robustness when subjected to smoothing---some are extremely fragile (random walks), some are moderately fragile / robust (flooding), and some are extremely robust (aggregation).

Joint work with Jeremy Fineman (Georgetown), Seth Gilbert (National University of Singapore), and Calvin Newport (Georgetown).

Many scientific and engineering problems are challenged by the fact they involve functions of a very large number of variables. Such problems arise naturally in signal recovery, image processing, learning theory, etc. In addition to the numerical difficulties due to the so-called curse of dimensionality, the resulting optimization problems are often nonsmooth and nonconvex.

We shall survey some of our recent results, illustrating how these difficulties may be handled in the context of well-structured optimization models, highlighting the ways in which problem structures and data information can be beneficially exploited to devise and analyze simple and efficient algorithms.

Consider a one-dimensional semi-infinite system of Brownian particles, starting at Poisson (L) point process on the positive half-line, with the left-most (Atlas) particle endowed a unit drift to the right. We show that for the equilibrium density (L=2), the asymptotic Gaussian space-time particle fluctuations are governed by the stochastic heat equation with Neumann boundary condition at zero. As a by product we resolve a conjecture of Pal and Pitman (2008) about the asympotic (random) fBM trajectory of the Atlas particle.

In a complementary work, we derive and explicitly solve the Stefan (free-boundary) equations for the limiting particle-profile when starting at out of equilibrium density (L other than 2). We thus determine the corresponding (non-random) asymptotic trajectory of the Atlas particle.

This talk is based on joint works with Li-Cheng Tsai, Manuel Cabezas, Andrey Sarantsev and Vladas Sidoravicius.

The Ising model is an archetypical model of the order-disorder phase transition: though simple to formulate, it exhibits a complex behavior, much like the real-world phenomena in solid-state physics, ferromagnetism, chemistry, biology, computer science.

In 1920 the physicist Wilhelm Lenz proposed to model ferromagnetic materials by a lattice of plus-minus spins with interacting neighbors. His student Ernst Ising solved the model in dimension one four years later. The one-dimensional behavior turned out to be trivial, with no phase transition when the interaction strength changes, and for a decade people searched for other possible models. However, a ferromagnetic phase transition was established by Rudolf Peierls in higher dimensions, and in 1944 Lars Onsager famously calculated the free energy for the Ising model in two dimensions.

Since then the Ising model became widely studied, as it exhibits complicated phase transition behavior, yet allows a very detailed analysis in dimension two. We will give a historical introduction to the model, and then describe some recent results.

Let *Q* be a Stein space and *L* a complex Lie group. Then Grauert's Oka Principle states that the canonical map of the isomorphism classes of holomorphic principle *L*-bundles over *Q* to the isomorphism classes of topological principle *L*-bundles over *Q* is an isomorphism. In particular he showed that if *P*, *P'* are holomorphic principle *L*-bundles and a topological isomorphism, then there is a homotopy of topological isomorphisms with and a holomorphic isomorphism.

Let *X* and *Y* be Stein *G*-manifolds where *G* is a reductive complex Lie group. Then there is a quotient Stein space Q_{X}, and a morphism such that . Similarly we have .

Suppose that is a *G*-biholomorphism. Then the induced mapping has the following property: for any , is *G*-isomorphic to (the fibers are actually affine *G*-varieties). We say that is admissible. Now given an admissible , assume that we have a *G*-equivariant homeomorphism lifting . Our goal is to establish an Oka principle, saying that has a deformation with and biholomorphic.

We establish this in two main cases. One case is where is a diffeomorphism that restricts to *G*-isomorphisms on the reduced fibers of and . The other case is where restricts to *G*-isomorphisms on the fibers and *X* satisfies an auxiliary condition, which usually holds. Finally, we give applications to the Holomorphic Linearization Problem. Let *G* act holomorphically on . When is there a change of coordinates such that the action of *G* becomes linear? We prove that this is true, for *X* satisfying the same auxiliary condition as before, if and only if the quotient Q_{X} is admissibly biholomorphic to the quotient of a *G*-module *V*.

The subject of this talk is the problem of estimating service time distribution of the $M/G/\infty$ queue from incomplete data on the queue. The goal is to estimate $G$ from observations of the queue--length process at the points of the regular grid on a fixed time interval. We propose an estimator and analyze its accuracy over a family of target service time distributions. The original $M/G/\infty$ problem is closely related to the problem of estimating derivatives of the covariance function of a stationary Gaussian process. We consider the latter problem and derive lower bounds on the minimax risk. The obtained results strongly suggest that the proposed estimator of the service time distribution is rate optimal.

Let *G* be a connected reductive algebraic group defined over a field *k* of characteristic not 2, an involution of *G* defined over *k*, *H* a *k*-open subgroup of the fixed point group of and G_{k} (resp. H_{k}) the set of *k*-rational points of *G* (resp. *H*). The homogeneous space X_{k}:=G_{k}/H_{k} is a generalization of a real reductive symmetric space to arbitrary fields and is called a generalized symmetric space.

Orbits of parabolic *k*-subgroups on these generalized symmetric spaces occur in various situations, but are especially of importance in the study of representations of G_{k} related to X_{k}. In this talk we present a number of structural results for these parabolic *k*-subgroups that are of importance for the study of these generalized symmetric space and their applications.

Consider a random finite metric space X given by sampling n points in the unit interval uniformly, and a deterministic finite metric space U given by placing n points in the unit interval at uniform distance. With high probability, X will contain some pairs of points at distance roughly 1/n^2, so any bijection from X to U must distort distances by a factor of roughly n. However, with high probability, two of these random spaces, X_1 and X_2, have a bijection which distorts distances by a factor of only about n^2/3. The exponent of 2/3 is optimal.

Given an input $x$, and a search problem $F$, local computation algorithms (LCAs) implement access to specified locations of $y$ in a legal output $y \in F(x)$, using polylogarithmic time and space. Previous work on LCAs restricted its focus to graphs of bounded degree, or degree of bounded expectation that is distributed binomially.

Using a new palette of techniques, we show that it is possible to obtain LCAs for maximal independent set (MIS) and maximal matching (MM) on trees with degree bounded by $\polylog{n}$. Our result immediately extends to all graphs with degree bounded by $\polylog{n}$, as long as they do not contain short cycles (of length $\polylog{n}$).

We define a family of graphs called $d$-light graphs, and show how to convert a large class of online algorithms (including MIS and MM) to LCAs on these graphs. We then show that applying the MIS (or MM) LCA on appropriately selected $d$-light subgraphs, allows us to quickly address all of the vertices of the $\polylog{n}$-degree graph.

In addition to expanding the range of LCAs to graphs of polylogarithmic degree, our new techniques also significantly improve the running times and space requirements, expand the family of graphs, and better define the family of algorithms to which previous results apply. Furthermore our proofs are simpler and more concise than the previous proof methods.

Joint work with Omer Reingold.

We associate to a full flag F in an n-dimensional variety X over a field k, a "symbol map" $\mu_F :K(F_X) \to \Sigma^n K(k)$. Here, F_X is the field of rational functions on X, and K(.) is the K-theory spectrum.

We prove a "reciprocity law" for these symbols: Given a partial flag, the sum of all symbols of full flags refining it is 0. Examining this result on the level of K-groups, we derive the following known reciprocity laws: the degree of a principal divisor is zero, the Weil reciprocity law, the residue theorem, the Contou-Carrère reciprocity law (when X is a smooth complete curve) as well as the Parshin reciprocity law and the higher residue reciprocity law (when

X is higher-dimensional).

This is a joint work with Evgeny Musicantov.

The interplay between **algorithms** and **proofs** has been one of the most fascinating and fruitful themes in theoretical Computer Science. In this talk I will describe a different connection between these two concepts - a general methodology for algorithm design using an automatic transformation of a proof into an algorithm. This methodology yields a systematic way to design and analyze algorithms across many domains. In particular we and others have used it for problems in combinatorial optimization, machine learning, and quantum information theory, and it shows promise for making progress on important open questions such as settling Khot's Unique Games Conjecture. I will demonstrate this methodology by presenting an algorithm for the Sparse Coding / Dictionary Learning problem that handles much more general inputs than was known before, and an algorithm for the Unique Games problem that can solve all the previously known candidate hard instances. Key to our approach is the notion of "pseudo-distributions", which are objects that are emphatically different than actual distributions but behave like them in the eyes of low degree polynomials. We use these pseudo-distributions to "lift" a proof into an algorithm via the Shor-Parrilo-Lasserre "Sum-of-Squares" semidefinite programming hierarchy. The talk will be based on several joint works with (varying subsets of) Fernando Brandao, Aram Harrow, Jonathan Kelner, David Steurer and Yuan Zhou.

Computational anatomy considers spaces of shapes (e.g. medical images) endowed with a Riemannian metric (Sobolev type). The area blends techniques from differential geometry (geometric mechanics), analysis and statistics. The EPDiff equation, which is basically an extension of Euler's equation, without the incompressibility assumption, is often used to match shapes. Time-varying images (4DCA), one of the current research themes in the area. In longitudinal studies (say for Alzheimer's disease) snapshots at given times are interpolated/regressed. The problem arises of comparing two such sequences for classification purposes. For some background, see the recent workshop http://www.mat.univie.ac.at/~shape2015/schedule.html .

In this talk we discuss finite dimensional examples using landmarks. A short process is interpreted as a tangent vector in the space of images. This leads to control problems whose state space is a tangent bundle. Usually, cubic Riemannian splines are taken, ie., minimizing the norm of the acceleration vector, for paths connecting two tangent vectors under a fixed time. We propose as an alternative to cubic splines the time minimal problem under bounded acceleration (morally the norm). We suggest that both splines problems on are completely integrable in the Arnold-Liouville sense. Along the way, we present general technical results about the underlying symplectic structures of control problems whose state space has a bundle structure.

This is joint ongoing work with Paula Balseiro, Alejandro Cabrera, and Teresa Stuchi.

In many settings, people exhibit behavior that is inconsistent across time — we allocate a block of time to get work done and then procrastinate, or put effort into a project and then later fail to complete it. An active line of research in behavioral economics and related fields has developed and analyzed models for this type of time-inconsistent behavior.

Here we propose a graph-theoretic model of tasks and goals, in which dependencies among actions are represented by a directed graph, and a time-inconsistent agent constructs a path through this graph. We first show how instances of this path-finding problem on different input graphs can reconstruct a wide range of qualitative phenomena observed in the literature on time-inconsistency, including procrastination, abandonment of long-range tasks, and the benefits of reduced sets of choices. We then explore a set of analyses that quantify over the set of all graphs; among other results, we find that in any graph, there can be only polynomially many distinct forms of time-inconsistent behavior; and any graph in which a time-inconsistent agent incurs significantly more cost than an optimal agent must contain a large “procrastination” structure as a minor. Finally, we use this graph-theoretic model to explore ways in which tasks can be designed to help motivate agents to reach designated goals.

This is joint work with Jon Kleinberg.

The goal in this work is to produce ‘full interpretation’ for object images, namely to identify and localize all semantic features and parts that are recognized by human observers. We develop a novel approach and tools to study this challenging task, by dividing the interpretation of the complete object to interpretation of so-called 'minimal recognizable configurations’, namely severely reduced but recognizable local regions, that are minimal in the sense that any further reduction would turn them unrecognizable. We show that for the task of full interpretation, such minimal images have unique properties, which make them particularly useful.

For modeling interpretation, we identify primitive components and relations that play a useful role in the interpretation of minimal images by humans, and incorporate them in a structured prediction algorithm. The structure elements can be point, contour, or region primitives, while relations between them range from standard unary and binary potentials based on relative location, to more complex and high dimensional relations. We show experimental results and match them to human performance. We discuss implications of ‘full’ interpretation for difficult visual tasks, such as recognizing human activities or interactions.

For germs of holomorphic functions $f : \mathbf{C}^{m+1} \to \mathbf{C}$, $g : \mathbf{C}^{n+1} \to \mathbf{C}$ having an isolated critical point at 0 with value 0, the classical Thom-Sebastiani theorem describes the vanishing cycles group $\Phi^{m+n+1}(f \oplus g)$ (and its monodromy) as a tensor product $\Phi^m(f) \otimes \Phi^n(g)$, where $(f \oplus g)(x,y) = f(x) + g(y), x = (x_0,...,x_m), y = (y_0,...,y_n)$. I will discuss algebraic variants and generalizations of this result over fields of any characteristic, where the tensor product is replaced by a certain local convolution product, as suggested by Deligne. The main theorem is a Künneth formula for $R\Psi$ in the framework of Deligne's theory of nearby cycles over general bases.

We consider the following problem: given a set of algebraic conditions on an $n$-tuple of functions and their first $l$ derivatives, admitting finitely many solutions (in a differentiably closed field), can one give an upper bound for the number of solutions?

I will present estimates in terms of the degrees of the algebraic conditions, or more generally the volumes of their Newton polytopes (analogous to the Bezout and BKK theorems). The estimates are singly-exponential with respect to $n,l$ and have the natural asymptotic with respect to the degrees or Newton polytopes, sharpening previous doubly-exponential estimates due to Hrushovski and Pillay. I will also discuss some diophantine applications to counting transcendental lattice points on algebraic varieties.

I will describe recent applications of "birthday repetition" that can give (conditional) quasi-polynomial time hardness results for Densest k-Subgraph and for $\epsilon$-Nash in 2-player games. (For the latter result we use a non-standard "PCP for PPAD" assumption which I will also discuss.) Both results are tight by [FS97], [LMM03], [Barman15].

Based on:

DkS- http://eccc.hpi-web.de/report/2015/074/

(joint work with Mark Braverman, Young Kun Ko, and Omri Weinstein)

Nash- http://arxiv.org/abs/1504.02411

(joint work with Yakov Babichenko and Christos Papadimitriou)

In the famous KKL (Kahn-Kalai-Linial) paper of 1988 the authors "imported" to combinatorics and theoretical computer science a hypercontractive inequality known as Beckner's ineqaulity (proven first, independently, by Gross and Bonami). This inequality has since become an extremely useful and influential tool, used in tens of papers, in a wide variety of settings. In many cases there are no proofs known that do not use the inequality.

In this talk I'll try to illuminate the information theoretic nature of both the inequality and its dual, touch upon a proof of the dual version from about a decade ago, (joint with V. Rodl), and a fresh (and unrelated) information theoretic proof of the primal version.

No prior knowledge will be assumed regarding discrete Fourier analysis, Entropy, and hypercontractivity.

Research can be viewed as a search of an accepting-state in an exponential-space. When we look in hind-sight after we find an accepting-state, can we identify a much shorter path than the one that was discovered by trial and error as the search actually proceeded?

In this talk I'll show that this is the case for Distributed-Computability: The discovery that different distributed problems have different levels of difficulty, and identifying the weakest model of distributed-computation that allows to solve a problem. I'll explain the essence of 40 years of research in an hour, by showing that if the right questions were asked at the right time, all the results could have been had in a span of time order-of-magnitude shorter.

Some of the major ideas in the talk were developed in works with Afek (TAU), and Borowsky (Akamai), Lynch (MIT), and Rajsbaum (UNAM).

A common approach to statistical learning on big data is to randomly split it among m machines and calculate the parameter of interest by averaging their m individual estimates.

Focusing on empirical risk minimization, or equivalently M-estimation, we study the statistical error incurred by this strategy.

We consider two asymptotic settings: one where the number of samples per machine n->inf but the number of parameters p is fixed, and a second high-dimensional regime where both p,n-> inf with p/n-> kappa.

Most previous works provided only moment bounds on the error incurred by splitting the data in the fixed p setting. In contrast, we present for both regimes asymptotically exact distributions for this estimation error. In the fixed-p setting, under suitable assumptions, we thus prove that to leading order, averaging is as accurate as the centralized solution. In the high-dimensional setting, we show a qualitatively different behavior: data splitting does incur a first order accuracy loss, which we quantify precisely. In addition, our asymptotic distributions allow the construction of confidence intervals and hypothesis testing on the estimated parameters.

Our main conclusion is that in both regimes, averaging parallelized estimates is an attractive way to speedup computations and save on memory, while incurring a quantifiable and typically moderate excess error.