## Select Seminar Series

All seminars- Home
- ›
- Studying at the Faculty
- ›
- Seminars ›
- Previous Seminars

# Previous Seminars

A longstanding problem in machine learning is to find unsupervised methods that can learn the statistical structure of high dimensional signals. In recent years, GANs have gained much attention as a possible solution to the problem, and in particular have shown the ability to generate remarkably realistic high resolution sampled images. At the same time, many authors have pointed out that GANs may fail to model the full distribution ("mode collapse") and that using the learned models for anything other than generating samples may be very difficult. In this paper, we examine the utility of GANs in learning statistical models of images by comparing them to perhaps the simplest statistical model, the Gaussian Mixture Model. First, we present a simple method to evaluate generative models based on relative proportions of samples that fall into predetermined bins. Unlike previous automatic methods for evaluating models, our method does not rely on an additional neural network nor does it require approximating intractable computations. Second, we compare the performance of GANs to GMMs trained on the same datasets. While GMMs have previously been shown to be successful in modeling small patches of images, we show how to train them on full sized images despite the high dimensionality. Our results show that GMMs can generate realistic samples (although less sharp than those of GANs) but also capture the full distribution, which GANs fail to do. Furthermore, GMMs allow efficient inference and explicit representation of the underlying statistical structure. Finally, we discuss how GMMs can be used to generate sharp images.

We consider an algebraic variety $V$ and its foliation, both defined over a number field. Given a (compact piece of a) leaf $L$ of the foliation, and a subvariety $W$ of complementary codimension, we give an upper bound for the number of intersections between $L$ and $W$. The bound depends polynomially on the degree of $W$, the logarithmic height of $W$, and the logarithmic distance between $L$ and the locus of points where leafs of the foliation intersect $W$ improperly.

Using this theory we prove the Wilkie conjecture for sets defined using leafs of foliations under a certain assumption about the algebraicity locus. For example, we prove the if none of the leafs contain algebraic curves then the number of algebraic points of degree $d$ and log-height $h$ on a (compact piece of a) leaf grows polynomially with $d$ and $h$. This statement and its generalizations have many applications in diophantine geometry following the Pila-Zannier strategy.

I will focus mostly on the proof of the main statement, which uses a combination of differential-algebraic methods related to foliations with some ideas from complex geometry and value distribution theory. If time permits I will briefly discuss the applications to counting algebraic points and diophantine geometry at the end.

20 questions is one of the simplest examples of a combinatorial search game:

lice thinks of an English word, and Bob's job is to figure it out by asking Yes/No questions. The worst-case complexity of the game is clearly log n, so to spice things up,we assume that Alice chooses her input according to some distribution known to Bob, and Bob now aims to minimize the expected number of questions.

An optimal solution to this problem was found already in 1952. However, the optimal strategy could ask arbitrarily complex Yes/No questions. We ask what happens when we constrain Bob to asking only "simple" questions, and what happens if Alice is allowed to lie a few times.

Joint work with Yuval Dagan (MIT), Ariel Gabizon, Daniel Kane (UCSD) and Shay Moran (IAS).

We consider statistical questions concerning colored sequences of partitions, produced by applying a partition process which was first introduced by Kakutani for the 1-dimensional case. This process can be generalized as the application of a fixed multiscale substitution rule, defined on a finite set of colored sets in R^d, on elements of maximal measure in each partition. Colored sets appearing in the sequence are modeled by certain flows on an associated directed weighted graph, and natural statistical questions can be reformulated as questions on the distribution of paths on graphs. Under some incommensurability assumptions, we show that special properties of Laplace transforms of the relevant counting functions imply explicit statistical results

A searchable symmetric encryption (SSE) scheme enables a client to store data on an untrusted server while supporting keyword searches in a secure manner. Recent experiments have indicated that the practical relevance of such schemes heavily relies on the tradeoff between their space overhead, locality (the number of non-contiguous memory locations that the server accesses with each query), and read efficiency (the ratio between the number of bits the server reads with each query and the actual size of the answer). In this talk, we survey recent constructions and matching lower bounds, and discuss their underlying techniques.

Based on joint works with Gilad Asharov, Moni Naor, and Gil Segev.

We consider the geo-localization task - finding the pose (position & orientation) of a camera in a large 3D scene from a single image. We aim toexperimentally explore the role of geometry in geo-localization in a convolutional neural network (CNN) solution. We do so by ignoring the often available texture of the scene. We therefore deliberately avoid using texture or rich geometric details and use images projected from a simple 3D model of a city, which we term lean images. Lean images contain solely information that relates to the geometry of the area viewed (edges, faces, or relative depth). We find that the network is capable of estimating the camera pose from the lean images, and it does so not by memorization but by some measure of geometric learning of the geographical area. The main contributions of this work are: (i) providing insight into the role of geometry in the CNN learning process; and (ii) demonstrating the power of CNNs for recovering camera pose using lean images.

This is a joint work with Moti Kadosh & Ariel Shamir

This will be a very introductory talk about Virasoro Lie algebra and its super-analogues: Ramond and Neveu-Schwarz Lie superalgebras.

In this talk I will describe some recent work on distributed property testing in the networks with bounded bandwidth (the CONGEST model): we have a network of computing nodes communicating over some initially-unknown network graph, where every communication link can carry a bounded number of bits per round. Some simple-looking problems, such as checking if the network contains a 4-cycle, are known to be very hard in this model, and this motivates us to consider property testing instead of exact solutions.

I will describe distributed property testing algorithms for two problems: subgraph-freeness, where we wish to determine whether the network graph contains some fixed constant-sized subgraph H; and uniformity testing, where every node of the network draws samples from an unknown distribution, and our goal is to determine whether the distribution is uniform or far from uniform. I will also discuss lower bounds.

Understanding statistical properties of zeros of Laplacian eigenfunctions is a program which is attracting much attention from mathematicians and physicists. We will discuss this program in the setting of "quantum graphs", self-adjoint differential operators acting on functions living on a metric graph. Numerical studies of quantum graphs motivated a conjecture that the distribution of nodal surplus (a suitably rescaled number of zeros of the n-th eigenfunction) has a universal form: it approaches Gaussian as the number of cycles grows. The first step towards proving this conjecture is a result established for graphs which are composed of cycles separated by bridges. For such graphs we use the nodal-magnetic theorem of the speaker, Colin de Verdiere and Weyand to prove that the distribution of the nodal surplus is binomial with parameters p=1/2 and n equal to the number of cycles. Based on joint work with Lior Alon and Ram Band.

Universal learning is considered from an information theoretic point of view following the universal prediction approach pursued in the 90's by F&Merhav. Interestingly, the extension to learning is not straight-forwar

We prove a new connection between gentle measurement (where one wants to measure n quantum states, in a way that damages the states only by a little) and differential privacy (where one wants to query a database about n users, in a way that reveals only a little about any individual user). The connection is bidirectional, though with loss of parameters in going from DP to gentle measurement. Exploiting this connection, we present a new algorithm for approximating the outcomes of many measurements on a collection of quantum states, a task called "shadow tomography". The new algorithm has the advantages of being gentle and online (the measurements can be chosen adaptively).

Joint work with Scott Aaronson.

No prior knowledge about quantum mechanics or computing will be assumed.

This work studies the relationship between the classification performed by deep neural networks (DNNs) and the decision of various classic classifiers, namely k-nearest neighbors (k-NN), support vector machines (SVM), and logistic regression (LR). This is studied at various layers of the network, providing us with new insights on the ability of DNNs to both memorize the training data and generalize to new data at the same time, where k-NN serves as the ideal estimator that perfectly memorizes the data.

First, we show that DNNs' generalization improves gradually along their layers and that memorization of non-generalizing networks happens only at the last layers.

We also observe that the behavior of DNNs compared to the linear classifiers SVM and LR is quite the same on the training and test data regardless of whether the network generalizes. On the other hand, the similarity to k-NN holds only at the absence of overfitting. This suggests that the k-NN behavior of the network on new data is a good sign of generalization. Moreover, this allows us to use existing k-NN theory for DNNs.

Complex classifiers may exhibit "embarrassing" failures even in "easy" cases where humans can provide a simple justified explanation. Avoiding such failures is obviously of key importance. In this work, we focus on one such setting, where a label is perfectly predictable if the input contains certain features, or rules, and otherwise it is predictable by a linear classifier. We define a hypothesis class that captures this notion and determine its sample complexity. We also give evidence that efficient algorithms cannot achieve this sample complexity. We then derive a simple and efficient algorithm and show that its sample complexity is close to optimal, among efficient algorithms. Experiments on synthetic and sentiment analysis data demonstrate the efficacy of the method, both in terms of accuracy and interpretabilit

A graph is automatically also a metric space, but is there anything interesting to say about such metric spaces? Many fascinating and concrete questions are encapsulated in the more general (and vague) question "to what extent can a finite graph emulate the properties of a infinite regular tree ?". We will see how this leads us to investigate expansion in graphs and questions about the large scale aspects of graph metrics including girth and diameter and the relations between the two. If time permits (probably not) I may also say a little about the local geometry of graphs.

This talk is based on many collaborations which I have had over the years, among my more recent collaborators are Michael Chapman, Yuval Peled and Yonatan Bilu. The talk requires no particular previous background and should be accessible to general mathematical audience.

In 1981 P.Arnoux and J.-C.Yoccoz constructed their famous foliation on a surface of genus three with zero flux. Later it was shown that this example can be generalized, and in particular that there is an interesting fractal set of parameters that give rise to the foliations with similar properties. This fractal was named in honour of Gerard Rauzy.

In my talk I will briefly discuss how such a family of foliations appeared in different branches of mathematics (symbolic dynamics, Teichmuller dynamics, low-dimensional topology, geometric group theory) and even in theoretical physics (conductivity theory in monocrystals) and explain what do we know about this family from ergodic point of view.

The talk is based on joint work with Pascal Hubert and Artur Avila and on a work in progress with Ivan Dynnikov and Pascal Hubert.

I'll revisit decompositions of de Rham complexes in positive characteristic (Deligne-Illusie), by discussing relations between cotangent complexes, liftings mod $p^2$, and de Rham-Witt and derived de Rham complexes. Such relations have been recently observed independently by Bhargav Bhatt.

A linear threshold function (LTF) is a Boolean function f:{-1,1}^n -> {0,1} of the form f(x) = 1_{ \sum a_i x_i > t}, for some fixed coefficients a_i and some threshold t. LTFs play an important role in complexity theory, machine learning, and other fields.

In this talk we present a new approach that allows us obtaining sharp results on Fourier-theoretic properties of biased LTFs. In particular, we determine the exact asymptotic order of the total influence and of the degree-1 Fourier weight of any biased LTF, in terms of its maximal (normalized) coefficient and its expectation. This provides a sharp generalization of theorems proved by Matulef, O'Donnell, Rubinfeld, and Servedio (in the context of property testing), and settles a conjecture posed by Kalai et al.

Our main tools are 'local' forms of the classical Chernoff inequality, like the following one proved by Devroye and Lugosi (2008): Let {x_i} be independent random variables uniformly distributed in {-1, 1}, and let a_i be nonnegative numbers such that \sum a_i^2 =1. If for some t > 0, we have Pr [\sum a_i x_i > t] = b, then Pr[\sum a_i x_i > t + delta] < b/2 holds for delta < c/ \sqrt {log(1/b)}, where c is a universal constant. Such inequalities seem to be little-known and probably can be useful in other contexts as well.

Joint work with Ohad Klein.

We will consider two different types of hypergeometric decompositions of special data associated to five pencils of quartic surfaces. We see that over the complex numbers, the middle cohomology of these five pencils yield hypergeometric Picard-Fuchs equations. Using these parameters, we then consider the same pencils over finite fields, decomposing their rational point counts using the finite-field hypergeometric functions with the same parameters as above. This is joint work with Charles Doran, Adriana Salerno, Steven Sperber, John Voight, and Ursula Whitcher.

MPC-in-the-head is a novel paradigm introduced in the work of Ishai et al. [IshaiKOS09] and, roughly speaking, allows the design of a zero-knowledge proof system for any NP-relation by relying on any multiparty computation (MPC) protocol in a modular way. On a high-level, in this transformation, the prover emulates ``in-its-head'' an execution of an MPC protocol that securely evaluates the NP-relation on the witness and commits to the views of the parties induced by this run. The verifier then tests the veracity of the computation by challenging the prover to open (decommit) the views of a subset of the parties. The key insight in the compilation is that the soundness and zero-knowledge property directly reduces to the robustness and simulation guarantees of the underlying MPC protocol.

In order to prove concentration estimates for (products of) measures with heavier tails than the standard Gaussian measure one can use several variants of the classical log-Sobolev inequality, e.g., Beckner-type inequalities of Latala and Oleszkiewicz or modified log-Sobolev inequalities of Gentil, Guillin, and Miclo. The main result I plan to present asserts that a probability measure on R^d which satisfies the former inequality satisfies also the latter. Based on joint work with Franck Barthe.

When one considers the functor of parabolic induction in various contexts there arises immediately the question of the existence of left or right adjoints. For example in the p-adic setting there is a natural left adjoint, but it was shown by Bernstein that in fact there is also a right adjoint, and they turn out to be isomorphic - this phenomenon is called "second adjointness". We explain how second adjointness is directly related to a natural braiding on a categorified version of the Hall algebra and describe the interplay between the two settings and lay out a strategy of how this connection can help understand both sides better.

Brownian motion is a classical process in probability theory belonging to the class of "Log-correlated random fields". It is well known do to Bramson that the order of the maximum has a different logarithmic correction as the corresponding independent setting.

In this talk we look at a version of branching Brownian motion where we slightly vary the diffusion parameter in a way that, when looking at the order of the maximum, we can smoothly interpolate between the logarithmic correction for independent random variables ($\frac{1}{2\sqrt 2}\ln(t)$) and the logarithmic correction of BBM ($\frac{3}{2\sqrt 2}\ln(t)$) and the logarithmic correction of 2-speed BBM with increasing variances ($\frac{6}{2\sqrt 2}\ln(t)$). We also establish in all cases the asymptotic law of the maximum and characterise the extremal process, which turns out to coincide essentially with that of standard BBM. We will see that the key to the above results is a precise understanding of the entropic repulsion experienced by an extremal particle. (joint work with A. Bovier)

Deep learning has been amongst the most emerging fields in computer science and engineering. In recent years it has been shown that deep networks are vulnerable to attacks by adversarial examples. I will introduce a novel flexible approach named Houdini for generating adversarial examples for complex and structured tasks and demonstrate successful attacks on different applications such as speech recognition, pose estimation, semantic image segmentation, speaker verification, and malware detection. Then I will discuss how this weakness can be turned into two secure applications. The first is a new technique for watermarking deep network models in a black-box way. That is, concealing information within the model that can be used by the owner of the model to claim ownership. The second application is a novel method to Speech Steganography, namely hiding a secret spoken message within an ordinary public spoken message. I will conclude the talk by a brief discussion of our attempts to detect such adversarial attacks, based on multiple semantic label representations

The derandomization of MA, the probabilistic version of NP, is a long standing open question. In this talk, we connect this problem to a variant of another major problem: the quantum PCP conjecture. Our connection goes through the surprising quantum characterization of MA by Bravyi and Terhal. They proved the MA-completeness of the problem of deciding whether the groundenergy of a uniform stoquastic local Hamiltonian is zero or inverse polynomial. We show that the gapped version of this problem, i.e. deciding if a given uniform stoquastic local Hamiltonian is frustration-free or has energy at least some constant, is in NP. Thus, if there exists a gap-amplification procedure for uniform stoquastic Local Hamiltonians (in analogy to the gap amplification procedure for constraint satisfaction problems in the original PCP theorem), then MA = NP (and vice versa). Furthermore, if this gap amplification procedure exhibits some additional (natural) properties, then P = RP. This is a joint work with Dorit Aharonov.

Inverse problems appear in many applications, such as image deblurring, inpainting and super-resolution. The common approach to address them is to design a specific algorithm (or recently - a deep neural network) for each problem. The Plug-and-Play (P&P) framework, which has been recently introduced, allows solving general inverse problems by leveraging the impressive capabilities of existing denoising algorithms. While this fresh strategy has found many applications, a burdensome parameter tuning is often required in order to obtain high-quality results. In this work, we propose an alternative method for solving inverse problems using off-the-shelf denoisers, which requires less parameter tuning (can be also translated into less pre-trained denoising neural networks). First, we transform a typical cost function, composed of fidelity and prior terms, into a closely related, novel optimization problem. Then, we propose an efficient minimization scheme with a plug-and-play property, i.e., the prior term is handled solely by a denoising operation. Finally, we present an automatic tuning mechanism to set the method's parameters. We provide a theoretical analysis of the method, and empirically demonstrate its impressive results for image inpainting, deblurring and super-resolution. For the latter, we also present an image-adaptive learning approach that further improves the results.

We study the Excluded Grid Theorem, a fundamental structural result in graph theory, that was proved by Robertson and Seymour in their seminal work on graph minors. The theorem states that there is a function f, such that for every integer g > 0, every graph of treewidth at least f(g) contains the (gxg)-grid as a minor. For every integer g>0, let f(g) be the smallest value for which the theorem holds. Establishing tight bounds on f(g) is an important graph-theoretic question. Robertson and Seymour showed that f(g) is at least of order g^2 log g). For a long time, the best known upper bounds on f(g) were super-exponential in g. The first polynomial upper bound of f(g) = O(g^98 poly log g) was proved by Chekuri and Chuzhoy. It was later improved to f(g) = O(g^36 poly log g), and then to f(g) = O(g^19 poly log g). In this talk we present our recent work that further improves this bound to f(g) = O(g^9 poly log g) via a simpler proof. Moreover, while there are natural barriers that seem to prevent the previous methods from yielding tight bounds for the theorem, it seems conceivable that the techniques proposed in this thesis can lead to even tighter bounds on f(g).

As it was observed a few years ago, there exists a certain signed count of real lines on real projective hypersurfaces of degree 2n+1 and dimension n that, contrary to the honest "cardinal" count, is independent of the choice of a hypersurface, and by this reason provides, as a consequence, a strong lower bound on the honest count. Originally, in this invariant signed count the input of a line was given by its local contribution to the Euler number of an appropriate auxiliary universal vector bundle.

The aim of the talk is to present other, in a sense more geometric, interpretations of the signs involved in the invariant count. In particular, this provides certain generalizations of Segre indices of real lines on cubic surfaces and Welschinger-Solomon weights of real lines on quintic threefolds.

This is a joint work with S.Finashin.

The theory of locally compact quantum groups grew out of the need to extend Pontryagin's duality for locally compact abelian groups to a wider class of objects, as well as from a modern "quantum" point of view suggesting the replacement of some algebras of functions on a group by non-commutative objects, namely operator algebras. In this talk, which will be split into two parts, we will show how several fundamental notions from probability and geometric group theory fit in this framework.

The first part will be an introduction to locally compact quantum groups. We will present the rationale and the definitions, give examples, and explain how the theory is related to other branches of math. If time permits, we will also touch upon more specific notions related to the second part.

In the second part we will discuss convolution semigroups of states, as well as generating functionals, on locally compact quantum groups. One type of examples comes from probability: the family of distributions of a L\'evy process form a convolution semigroup, which in turn admits a natural generating functional. Another type of examples comes from (locally compact) group theory, involving semigroups of positive-definite functions and conditionally negative-definite functions, which provide

important information about the group's geometry. We will explain how these notions are related and how all this extends to the quantum world; derive geometric characterizations of two approximation properties of locally compact quantum groups; see how generating functionals may be (re)constructed and study their domains; and indicate how our results can be used to study cocycles.

Based on joint work with Adam Skalski.

No background in operator algebras will be assumed.

Modern robotic and vision systems are often equipped with a direct 3D data acquisition device, e.g. a LiDAR or RGBD camera, which provides a rich 3D point cloud representation of the surroundings. Point clouds have been used successfully for localization and mapping tasks, but their use in semantic understanding has not been fully explored. Recent advances in deep learning methods for images along with the growing availability of 3D point cloud data have fostered the development of new 3D deep learning methods that use point clouds for semantic understanding. However, their unstructured and unordered nature make them an unnatural input to deep learning methods. In this work we propose solutions to three semantic understanding and geometric processing tasks: point cloud classification, segmentation, and normal estimation. We first propose a new global representation for point clouds called the 3D Modified Fisher Vector (3DmFV). The representation is structured and independent of order and sample size. As such, it can be used with 3DmFV-Net, a newly designed 3D CNN architecture for classification. The representation introduces a conceptual change for processing point clouds by using a global and structured spatial distribution. We demonstrate the classification performance on the ModelNet40 CAD dataset and the Sydney outdoor dataset obtained by LiDAR. We then extend the architecture to solve a part segmentation task by performing per point classification. The results here are demonstrated on the ShapeNet dataset. We use the proposed representation to solve a fundamental and practical geometric processing problem of normal estimation using a new 3D CNN (Nesti-Net). To that end, we propose a local multi-scale representation called Multi Scale Point Statistics (MuPS) and show that using structured spatial distributions is also as effective for local multi-scale analysis as for global analysis. We further show that multi-scale data integrates well with a Mixture of Experts (MoE) architecture. The MoE enables the use of semi-supervised scale prediction to determine the appropriate scale for a given local geometry. We evaluate our method on the PCPNet dataset. For all methods we achieved state-of-the-art performance without using an end-to-end learning approach.

Generative Adversarial Networks (GANs) is a recent algorithmic framework that has won considerable attention. In a nutshell, GANs receive as input an IID sample and outputs synthetic data that should resemble data from the true underlying distribution. For example, consider an algorithm that receives as input some tunes from a specific music genre (e.g. jazz, rock, pop) and then outputs a new, original, tune from that genre.

From a theoretical perspective, the distinction between algorithms that genuinely generate original new examples vs. algorithms that perform naive manipulations (or even merely memorization) of the input sample is an elusive distinction. This makes the theoretical analysis of GANs algorithms challenging.

In this work we introduce two mathematical frameworks for the task of generating synthetic data. The first model we consider is inspired by GANs, and the learning algorithm has only an indirect access to the target distribution via a discriminator. The second model, called DP-Foolability, exploits the notion of differential privacy as a criterion for "non-memorization".

We characterize learnability in each of these models as well as discuss the interrelations. As an application we prove that privately PAC learnable classes are DP-foolable. As we will discuss, this can be seen as an analogue of the equivalence between uniform convergence and learnability in classical PAC learning.

Joint work with Olivier Bousquet and Shay Moran.

https://arxiv.org/pdf/1902.03468.pdf

Let F be a function field and G a connected split reductive group over F. We define a "strange" operator between different spaces of automorphic functions on G(A)/G(F), and show that this operator is natural from the viewpoint of the geometric Langlands program via the functions-sheaves dictionary. We discuss how to define this operator over a number field by relating it to pseudo-Eisenstein series and inversion of the standard intertwining operator. This operator is also connected to Deligne-Lusztig duality and cohomological duality of representations over a local field.

The prediction of interactions between nonlinear laser beams is a longstanding open problem. A traditional assumption is that these interactions are deterministic. We have shown, however, that in the nonlinear Schrodinger equation (NLS) model of laser propagation, beams lose their initial phase information in the presence of input noise. Thus, the interactions between beams become unpredictable as well. Not all is lost, however. The statistics of many interactions are predictable by a universal model.

Computationally, the universal model is efficiently solved using a novel spline-based stochastic computational method. Our algorithm efficiently estimates probability density functions (PDF) that result from differential equations with random input. This is a new and general problem in numerical uncertainty-quantification (UQ), which leads to surprising results and analysis at the intersection of probability and approximation theory.

A homomorphic secret-sharing scheme is a secret-sharing scheme that allows locally mapping shares of a secret to compact shares of a function of the secret. The talk will survey the current state of the art on homomorphic secret sharing, covering efficient constructions, applications in cryptography and complexity theory, and open questions.

We establish two-sided bounds for expectations of order statistics (k-th maxima) of moduli of coordinates of centred log-concave random vectors with uncorrelated coordinates. Our bounds are exact up to multiplicative universal constants in the unconditional case for all k and in the isotropic case for k < c n^{5/6}. We also derive two-sided estimates for expectations of sums of k largest moduli of coordinates for some classes of random vectors. The talk will be based on the joint work with Rafal Latala.

A classical theorem of Mittag-Leffler asserts that in a given Riemann surface X, for any pattern of multiplicities of poles and any configuration of residues (summing to zero), there is a meromorphic 1-form on X that realize them. The only obstruction is that residues at simple poles should be nonzero.

If we require that the multiplicity of the zeroes is also prescribed, the problem can be reformulated in terms of strata of meromorphic differentials. Using the dictionary between complex analysis and flat geometry, we are able to provide a complete characterization of configurations of residues that are realized for a given pattern of singularities. Two nontrivial obstructions appear concerning the combinatorics of the multiplicity of zeroes and the arithmetics of the residues. This is a joint work with Quentin Gendron.

We will discuss several results relating the behavior of a random walk on a planar graph and the geometric properties of a nice embedding of the graph in the plane (specifically, circle packing of the graph). One such a result is that for a bounded degree graph, the simple random walk is recurrent if and only if the boundary of the nice embedding is a polar set (that is, Brownian motion misses it almost surely). If the degrees are unbounded, this is no longer true, but for the case of circle packing of a triangulation, there are weights which are obtained naturally from the circle packing, such that when the boundary is polar, the weighted random walk is recurrent (we believe the converse also hold). These weights arise also in the context of discrete holomorphic and harmonic functions, a discrete analog of complex holomorphic functions. We show that as the sizes of circles, or more generally, the lengths of edges in the nice embedding of the graph tend to zero, the discrete harmonic functions converge to their continuous counterpart with the same boundary conditions. Equivalently, that the exit measure of the weighted random walk converges to the exit measure of standard Brownian motion. This improves previous results of Skopenkov 2013 and Werness 2015, who proves similar results under additional local and global assumptions on the embedding. In particular, we make no assumptions on the degrees of the graph, making the result applicable to models of random planar maps.

Based of joint works with Daniel Jerison, Asaf Nachmias, Matan Seidel and Juan Souto.

Speaker 1: Fanny Augeri (WIS)

Title : Nonlinear large deviations bounds with applications to sparse Erdos-Renyi graphs.

Abstract: In this talk, I will present the framework of the so-called nonlinear large deviations introduced by Chatterjee and Dembo. In a seminal paper, they provided a sufficient criterion in order that the large deviations of a function on the discrete hypercube to be due by only changing the mean of the background measure. This sufficient condition was formulated in terms of the complexity of the gradient of the function of interest. I will present general nonlinear large deviation estimates similar to Chatterjee-Dembo's original bounds except that we do not require any second order smoothness. The approach relies on convex analysis arguments and is valid for a broad class of distributions. Then, I will detail an application of this nonlinear large deviations bounds to the problem of estimating the upper tail of cycles counts in sparse Erdos-Renyi graphs down to the connectivity parameter $n^{-1/2}$.

Speaker 2: Elliot Paquette (Ohio)

Title: The Gaussian analytic function is either bounded or covers the plane

Abstract: The Gaussian analytic function (GAF) is a power series with independent Gaussian coefficients. In the case that this power series has radius of convergence 1, under mild regularity assumptions on the coefficients, it is a classical theorem that the power series is a.s. bounded on open disk if and only if it extends continuously to a function on the closed unit disk a.s. Nonetheless, there exists a natural range of coefficients in which the GAF has boundary values in L-p, but is a.s. unbounded. How wild are these boundary values? Well, Kahane asked if a GAF either a.s. extends continuously to the closed disk or a.s. has range covering the whole plane. We show partial progress in establishing this in the affirmative.

Joint with Alon Nishry.

TBA

An (\epsilon,\phi)-expander decomposition of a graph G=(V,E) with m edges is a partition of vertices into clusters such that (1) each cluster induces subgraph with conductance at least \phi, and (2) the number of inter-cluster edges is at most \epsilon m. This decomposition has a wide range of applications including approximation algorithms for the unique game, fast algorithms for flow and cut problems, and dynamic graph algorithms.

I will describe a new algorithm for constructing (~O(\phi),\phi)-expander decomposition in ~O(m/\phi) time. This is the first nearly linear time algorithm when \phi is at least 1/polylog(m), which is the case in most practical settings and theoretical applications. Previous results either take \Omega(m^{1+o(1)}) time, or attain nearly linear time but with a weaker expansion guarantee where each output cluster is guaranteed to be contained inside some unknown expander.

Our technique can be easily extended to the dynamic setting where the graph undergoing updates. This talk is based on joint work with Di Wang [Saranurak Wang SODA'19].

Following Ulam, Banach asked on the existence of dynamical system on real line with simple Lebesgue spectrum. I discuss the connection of this problem to the famous Banach problem in ergodic theory due essentially to Rohklin. I will further present my recent contribution to this problem and the connection to so called Erdos flat polynomials problem in Harmonic analysis due to J. Bourgain . My talk is based on my recent work and joint work with Mahendra Nadkarni (CBS Mumbai, India).

Poincare duality provides an isomorphism between the homology and cohomology of a compact manifold, up to a shift. For \pi-finite spaces, i.e. spaces with finitely many non-zero homotopy groups, all of which are finite, there is a similar duality only for Q-coefficients, but no such duality exists with F_p coefficients. However, as shown by Michael Hopkins and Jacob Lurie, there is a duality between the homology and cohomology of \pi-finite spaces with coefficients in some extra-ordinary cohomology theories called Morava K-theories. This property of Morava K-theory is called ambidexterity.I will explain what is ambidexterity, some of its consequences and our contribution to the subject.

This is a joint work with Lior Yanovski and Tomer Schlank.

The purpose of this talk is to understand the behavior of the extremal zeros of random polynomials of the form $ P_N(z) = \sum_{k=0}^{N} a_k R_k(z)$ where the family $(R_k)_{k \leq N}$ is an orthonormal basis for the scalar product $\langle P,Q \rangle = \int P(z) \overline{Q(z)} e^{-2N V^{\nu(z)}} d\nu(z)$ with $\nu$ a radial probability measure on $\CC$ and $V^{\nu}(z)= \int \log |z-w|d\nu(w)$.

Although the zeros of these polynomials spread like the measure $\nu$, the zeros of maximum modulus lie outsite of the support. More precisely, the point process of the roots outside of the support of the equilibrium measure converges towards the Bergman point process of the complement of the support.

We also study similar results on a model of Coulomb gases in dimension $2$ where the confining potential is generated by the presence of a fixed background of positive charges. If $\nu$ is a probability measure, we study the system of particles with joint distribution on $\CC^N$, $\frac{1}{Z_N} \prod_{i \leq j} |x_i-x_j|^2 e^{-2(N+1)\sum_{k=1}^{N}V^{\nu}(x_k)} d\ell_{\CC^N}(x_1,\dots,x_N).$ This model in closely related to the study of the zeros of random polynomials. We show that the extremal particles of this Coulomb gas present a similar behavior to the random polynomial setting.

All the results mentioned above are done in collaboration with David Garcia-Zelada.

Consider a simple random walk on $\mathbb{Z}$ with a random coloring of $\mathbb{Z}$. Look at the sequence of the first $N$ steps taken and colors of the visited locations. From it, you can deduce the coloring of approximately $\sqrt{N}$ integers. Suppose an adversary may change $\delta N$ entries in that sequence. What can be deduced now? We show that for any $\theta<0.5,p>0$, there are $N_{0},\delta_{0}$ such that if $N>N_{0}$ and $\delta<\delta_{0}$ then with probability $>1-p$ we can reconstruct the coloring of $>N^{\theta}$ integers.

In this talk I will present two recent works:

1) **Dynamic Temporal Alignment of Speech to Lips**

Many speech segments in movies are re-recorded in a studio during post-production, to compensate for poor sound quality as recorded on location. Manual alignment of the newly-recorded speech with the original lip movements is a tedious task. We present an audio-to-video alignment method for automating speech to lips alignment, stretching and compressing the audio signal to match the lip movements. This alignment is based on deep audio-visual features, mapping the lips video and the speech signal to a shared representation. Using this shared representation we compute the lip-sync error between every short speech period and every video frame, followed by the determination of the optimal corresponding frame for each short sound period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, as well as qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear, and where a constant shift of the sound can not give perfect alignment. In these cases, state-of-the-art methods will fail.

2) **Neural separation of observed and unobserved distributions**

Separating mixed distributions is a long standing challenge for machine learning and signal processing. Most current methods either rely on making strong assumptions on the source distributions or rely on having training samples of each source in the mixture. In this work, we introduce a new method - Neural Egg Separation - to tackle the scenario of extracting a signal from an unobserved distribution additively mixed with a signal from an observed distribution. Our method iteratively learns to separate the known distribution from progressively finer estimates of the unknown distribution. In some settings, Neural Egg Separation is initialization sensitive, we therefore introduce Latent Mixture Masking which ensures a good initialization. Extensive experiments on audio and image separation tasks show that our method outperforms current methods that use the same level of supervision, and often achieves similar performance to full supervision.

We will discuss the field of definition of a rational function and in what ways it can change under iteration, in particular when the degree over the base field drops. We present two families of rational functions with the property above, and prove that in the special case of polynomials, only one of these families is possible. We also explain how this relates to Ritt's decomposition theorem on polynomials. Joint work with Francesco Veneziano (SNS Pisa).

We explore the power of interactive proofs with a distributed verifier. In this setting, the verifier consists of n nodes and a graph G that defines their communication pattern. The prover is a single entity that communicates with all nodes by short messages. The goal is to verify that the graph G belongs to some language in a small number of rounds, and with small communication bound, i.e., the proof size.

This interactive model was introduced by Kol, Oshman and Saxena (PODC 2018) as a generalization of non-interactive distributed proofs. They demonstrated the power of interaction in this setting by constructing protocols for problems as Graph Symmetry and Graph Non-Isomorphism -- both of which require proofs of (n^2)-bits without interaction.

In this work, we provide a new general framework for distributed interactive proofs that allows one to translate standard interactive protocols (i.e., with a centralized verifier) to ones where the verifier is distributed with a proof size that depends on the computational complexity of the verification algorithm run by the centralized verifier. We show the following:

* Every (centralized) computation performed in time O(n) on a RAM can be translated into three-round distributed interactive protocol with O(log n) proof size. This implies that many graph problems for sparse graphs have succinct proofs (e.g., testing planarity).

* Every (centralized) computation implemented by either a small space or by uniform NC circuit can be translated into a distributed protocol with O(1) rounds and O(log n) bits proof size for the low space case and polylog(n) many rounds and proof size for NC.

* We show that for Graph Non-Isomorphism, one of the striking demonstrations of the power of interaction, there is a 4-round protocol with O(log n) proof size, improving upon the O(n*log n) proof size of Kol et al.

* For many problems, we show how to reduce proof size below the seemingly natural barrier of log n. By employing our RAM compiler, we get a 5-round protocol with proof size O(loglog n) for a family of problems including Fixed Automorphism, Clique and Leader Election (for the latter two problems we actually get O(1) proof size).

* Finally, we discuss how to make these proofs non-interactive {\em arguments} via random oracles.

Our compilers capture many natural problems and demonstrate the difficulty in showing lower bounds in these regimes.

Joint work with Moni Naor and Merav Parter.

A conjunction over a binary alphabet is a boolean function specified by a length n pattern of 0's, 1's and wildcards. On input bit strings of length n, the function outputs 1 if the input matches the pattern at all non wildcard positions. At CRYPTO 2018, Bishop et al. proposed a simple and elegant construction to obfuscate this class of functions by embedding the pattern in the error positions of a noisy Reed-Solomon codeword, and placing the codeword in a group exponent. They prove their scheme achieves a notion of security called "distributional virtual black box" in the generic group model for random conjunctions with at most 0.774n wildcards.

In this talk, I'll show how to abstract the Bishop et al. scheme to obtain a significantly more efficient "dual" scheme. In the generic group model, our scheme admits an intuitive proof of security and does not place any restrictions on the number of wildcards.

Next, I'll describe a simple modification to the construction that avoids encoding in a group exponent and is secure under the Learning Parity with Noise (LPN) assumption. At the heart of our security proof is a reduction from standard LPN to LPN with "structured error."

No prior knowledge of the Bishop et al. paper will be assumed.

We introduce a new method for obtaining quantitative convergence rates for the central limit theorem (CLT) in the high dimensional regime. The method is based on the notion of a martingale embedding, a multivariate analogue of Skorokhod's embedding. Using the method we are able to obtain several new bounds for convergence in transportation distance and in entropy, and in particular: (a) We improve the best known bound, for convergence in quadratic Wasserstein transportation distance for bounded random vectors; (b) We derive a non-asymptotic convergence rate for the entropic CLT in arbitrary dimension, for log-concave random vectors; (c) We give an improved bound for convergence in transportation distance under a log-concavity assumption and improvements for both metrics under the assumption of strong log-concavity.

In this talk, we will review the method, and explain how one might use it in order to prove quantitative statements about rates of convergence.

We introduce a 3D shape generative model based on deep neural networks. A new image-like (i.e., tensor) data representation for genus-zero 3D shapes is devised. It is based on the observation that complicated shapes can be well represented by multiple parameterizations (charts), each focusing on a different part of the shape.

The new tensor data representation is used as input to Generative Adversarial Networks for the task of 3D shape generation. The 3D shape tensor representation is based on a multichart structure that enjoys a shape covering property and scale-translation rigidity. Scale-translation rigidity facilitates high quality 3D shape learning and guarantees unique reconstruction. The multi-chart structure uses as input adataset of 3D shapes (with arbitrary connectivity) and a sparse correspondence between them.

The output of our algorithm is a generative model that learns the shape distribution and is able to generate novel shapes, interpolate shapes, and explore the generated shape space. The effectiveness of the method is demonstrated for the task of anatomic shape generation including human body and bone (teeth) shape generation.

Recently, the Riemannian geometry of the space of symmetric positive-defini

* Joint work with Or Yair, Ori Katz, and Miri Ben-Chen

Computing shortest paths is one of the fundamental problems of graph algorithms. The goal of *dynamic* all pairs shortest paths (APSP) is to maintain shortest path trees from all vertices as the edges of the graph change over time. The algorithm is said to be decremental if it handles only deletions, incremental if it handles only insertions and fully dynamic if it handles both deletions and insertions.

In this talk I will present a near optimal decremental algorithm that maintains approximate all pairs shortest paths.

The sharp threshold phenomenon is a central topic of research in the analysis of Boolean functions. Here, one aims to give sufficient conditions for a monotone Boolean function f to satisfy $\mu p(f)=o(\mu q(f))$, where $q = p + o(p)$, and $\mu p(f)$ is the probability that $f=1$ on an input with independent coordinates, each taking the value $1$ with probability $p$. The dense regime, where $\mu p(f)=\Theta(1)$, is somewhat understood due to seminal works by Bourgain, Friedgut, Hatami, and Kalai. On the other hand, the sparse regime where $\mu p(f)=o(1)$ was out of reach of the available methods. However, the potential power of the sparse regime was envisioned by Kahn and Kalai already in 2006. In this talk we show that if a monotone Boolean function $f$ with $\mu p(f)=o(1)$ satisfies some mild pseudo-randomness conditions then it exhibits a sharp threshold in the interval $[p,q]$, with $q = p+o(p)$. More specifically, our mild pseudo-randomness hypothesis is that the $p$-biased measure of $f$ does not bump up to $\Theta(1)$ whenever we restrict $f$ to a sub-cube of constant co-dimension, and our conclusion is that we can find $q=p+o(p)$, such that $\mu p(f)=o(\mu q(f))$ At its core, this theorem stems from a novel hypercontactive theorem for Boolean functions satisfying pseudorandom conditions, which we call `small generalized influences'. This result takes on the role of the usual hypercontractivity theorem, but is significantly more effective in the regime where $p = o(1)$. We demonstrate the power of our sharp threshold result by reproving the recent breakthrough result of Frankl on the celebrated Erdos matching conjecture, and by proving conjectures of Huang--Loh--Sudakov and Furedi--Jiang for a new wide range of the parameters. Based on a joint work with Keevash, Long, and Minzer.

In recent years, deep neural networks (DNNs) achieved unprecedented performance in many vision tasks. However, state-of-the-art results are typically achieved by very deep networks, which can reach tens of layers with tens of millions of parameters. To make DNNs implementable on platforms with limited resources, it is necessary to weaken the tradeoff between performance and efficiency. In this work, we propose a new activation unit, which is suitable for both high and low level vision problems. In contrast to the widespread per-pixel activation units, like ReLUs and sigmoids, our unit implements a learnable nonlinear function with spatial connections. This enables the net to capture much more complex features, thus requiring a significantly smaller number of layers in order to reach the same performance. We illustrate the effectiveness of our units through experiments with state-of-the-art nets for classification, denoising, and super resolution. With our approach, we are able to reduce the size of these models by nearly 50% without incurring any degradation in performance.

*Spotlight presentation at CVPR'18 (+ submitted extension)

A joint work with Tamar Rott Shaham and Tomer Michaeli

I will talk about my joint work with Dave Benson which constructs new symmetric tensor categories in characteristic 2 arising from modular representation theory of elementary abelian 2-groups, and about its conjectural generalization to characteristic p>2. I will also discuss my work with Gelaki and Coulembier which shows that integral symmetric tensor categories in characteristic p>2 whose simple objects form a fusion category are super-Tannakian (i.e., representation categories of a supergroup scheme).

The Blind Source Separation (BSS) problem consists of retrieving signals of interest, termed the sources, from a dataset consisting of their mixtures. One of the most popular and common paradigms for solving the BSS problem is Independent Component Analysis (ICA), where the sources are assumed to be (only) mutually statistically independent random processes, and the mixtures are assumed to be linear combinations thereof, where the linear mixing operator is unknown. In this talk, we shall start with the Gaussian Maximum Likelihood (GML) approach for the semi-blind problem, in which the sources are assumed to be temporally-dive

The talk will be about two classical problems in the theory of Carleman classes of smooth functions. The first one is to describe the image of a Carleman class under the Borel map (which maps a smooth function to its jet of Taylor coefficients at a given point). The second one concerns possible ways to construct a function in a given Carleman class with prescribed Taylor coefficients. I will present solutions to both problems. If time permits, I will also discuss related problems which originate in the singularity theory of Carleman classes.

The reducibility and structure of parabolic inductions is a basic problem in the representation theory of p-adic groups. Of particular interest are principal series and degenerate principal series representations, that is parabolic induction of 1-dimensional representations of Levi subgroups.

In this talk, I will start by describing the functor of normalized induction and its left adjoint, the Jacquet functor, and by going through several examples in the group SL_4(Q_p) will describe an algorithm which can be used to determine reducibility of such representations.

This algorithm is the core of a joint project with Hezi Halawi, in which we study the structure of degenerate principal series of exceptional groups of type En (see https://arxiv.org/abs/1811.02974).

Recent years have seen the emergence of new kinds of software including deep learning, programmable computer networks, and blockchains. Unfortunately, these systems have been shown to suffer from critical safety and security errors, affecting their wider adoption. The goal of my research is to develop new automated program verification and synthesis techniques which ensure safety and reliability of these systems.

In this talk, I will start by introducing AI2, the first automated verifier for neural networks able to certify large convolutional models. The key idea behind AI2 is to bridge abstract interpretation and neural networks, enabling a sound over-approximation of a network’s behavior in a scalable manner. I will then briefly discuss DL2, a system which enables clean interaction with deep learning models by allowing users to pose queries in a declarative manner. Finally, I will demonstrate how automated program analysis and synthesis can address key security and reliability challenges in domains such as computer networks and blockchains, preventing severe outages and financial losses.

Bio: Dana Drachsler-Cohen is an ETH Postdoctoral Fellow at the department of Computer Science, ETH Zurich. Her research interests span automated reasoning, program synthesis and machine learning. She obtained her PhD from the Computer Science Department at the Technion in 2017.

To compute the spectral norm of a p-tensor one needs to optimize a homogeneous polynomial of degree p over the sphere. When p=2 (the matrix case) it is algorithmically easy, but for p>2 it can be NP-hard. In this talk I will focus on (randomized) optimization in high-dimensions when the objective function is a linear combination of homogeneous polynomials with Gaussian coefficients. Such random functions are called spherical spin glasses in physics and have been extensively studied since the 80s. I will describe certain geometric properties of spherical spin glasses unique to the full-RSB case, and explain how they can be used to design a polynomial time algorithm that finds points within a small multiplicative error from the global maximum.

We tackle the problem of multiple image alignment and 3D reconstruction under extreme noise.

Modern alignment schemes, based on similarity measures, feature matching and optical flow are often pairwise, or assume global alignment. Nevertheless, under extreme noise, the alignment success sharply deteriorates, since each image does not contain enough information. Yet, when multiple images are well aligned, the signal emerges from the stack.

As the problems of alignment and 3D reconstruction are coupled, we constrain the solution by taking into account only alignments that are geometrically feasible and solve for the entire stack simultaneously. The solution is formulated as a variational problem where only a single variable per pixel, the scene's distance, is solved for. Thus, the complexity of the algorithm is independent of the number of images. Our method outperforms state-of-the-art techniques, as indicated by our simulations and by experiments on real-world scenes.

And, finally- I will discuss how this algorithm is related to our current and planned work with underwater robots.

We show that when data are endogenously truncated the widely-used IV fails to render the relationship causal as well as introduces bias into the exogenous covariates. We offer a newly-introduce

A family of sets F is said to satisfy the (p,q)-property if among any p sets in F, some q have a non-empty intersection. Hadwiger and Debrunner (1957) conjectured that for any p > q > d there exists a constant c = c_d(p,q), such that any family of compact convex sets in R^d that satisfies the (p,q)-property, can be pierced by at most c points. The classical Helly's Theorem is equivalent to the fact that c_d(p,p)=1 (p > d).

In a celebrated result from 1992, Alon and Kleitman proved the conjecture. However, obtaining sharp bounds on the minimal such c_d(p,q), called `the Hadwiger-Debrunner numbers', is still a major open problem in combinatorial geometry.

In this talk we present improved upper and lower bounds on the Hadwiger-Debrunner numbers, the latter using the hypergraph container method.

Based on joint works with Shakhar Smorodinsky and Gabor Tardos.

Godofredo Iommi (PUC Chile)

**Title: Upper semi-continuity of the entropy map for Markov shifts**

Abstract: In this talk I will show that for finite entropy countable Markov shifts the entropy map is upper semi-continuous when restricted to the set of ergodic measures. This is joint work with Mike Todd and Anibal Velozo.

Paul Dario (ENS)

**Title: Homogenization on supercritical percolation cluster**

Abstract: The standard theory of stochastic homogenization requires an assumption of uniform ellipticity on the environment. In this talk, we investigate how one can remove this assumption in a specific case: the infinite cluster of the supercritical Bernouilli percolation of Zd. We will present a renormalization argument for the infinite cluster and show how one can use it to adapt the theory developped in the uniformly elliptic setting. We will then review some results which can be obtained through this technique: homogenization theorem, large scale regularity, Liouville theorem and bounds on the corrector.

The common practice in image generation is to train a network and fix it to a specific working point. In contrast, in this work we would like to provide control at inference time in a generic way. We propose to add a second training phase, where we train an additional tuning block. The goal of these tuning blocks is to provide us with control over the output image by modifying the latent space representation. I will first present our latest attempt in the domain of generative adversarial, where we aim to improve the quality of the results using the discriminator information -- we name this adversarial feedback loop. Second, I will present Dynamic-Net, where we train a single network such that it can emulate many networks trained with different objectives. The main assumption of both works is that we can learn how to change the latent space in order to achieve a specific goal. Evaluation on many application shows that the latent space can be manipulated, such that, it allows us to have diversified control at inference time.

* Joint work with Firas Shama*, Alon Shoshan* and Lihi Zelnik Manor.

Deep learning has become pervasive in everyday life, powering language applications like Apple's Siri, Amazon's Alexa, and Google Translate. The inherent limitation of these deep learning systems, however, is that they often function as a "black box," preventing researchers and users from discerning the roles of different components and what they learn during the training process. In this talk, I will describe my research on interpreting deep learning models for language along three lines. First, I will present a methodological framework for investigating how these models capture various language properties. The experimental evaluation will reveal a learned hierarchy of internal representations in deep models for machine translation and speech recognition. Second, I will demonstrate that despite their success, deep models of language fail to deal even with simple kinds of noise, of the type that humans are naturally robust to. I will then propose simple methods for improving their robustness to noise. Finally, I will turn to an intriguing problem in language understanding, where dataset biases enable trivial solutions to complex language tasks. I will show how to design models that are more robust to such biases, and learn less biased latent representations

Random curves in space and how they are knotted give an insight into the behavior of "typical" knots and links. They have been studied by biologists and physicists in the context of the structure of random polymers. Several randomized models have been suggested and investigated, and many results have been obtained via computational experiment. The talk will begin with a review of this area.

In work with Hass, Linial, and Nowik, we study knots based on petal projections, potholder diagrams, and more. We have found explicit formulas for the limit distribution of finite type invariants of random knots and links in the Petaluma model. I will discuss these results and sketch proof ideas as time permits.

Tensor categories are abelian k-linear monoidal categories modelled on the representation categories of affine (super)group schemes over k. Deligne gave very succinct intrinsic criteria for a tensor category to be equivalent to such a representation category, over fields k of characteristic zero. These descriptions are known to fail badly in prime characteristics. In this talk, I will present analogues in prime characteristic of these intrinsic criteria. Time permitting, I will comment on the link with a recent conjecture of V. Ostrik which aims to extend Deligne's work in a different direction.

Inspired by empirical observations on honey bee colonies, we analyze the emergence of task differentiation in a model complex system, characterized by an absence of hierarchical control, yet able to exhibit coordinated behavior and collective function. The analysis considers the steady-state response of a mechanical interaction network to exogenous resonant excitation. It demonstrates how an optimal allocation of excitation sensitivities across the network nodes that results in a maximal response may be constructed either using global information about the network topology or, alternatively, through iterated dynamics of an intuitive learning paradigm that only relies on local information within the network. Importantly, the analysis derives explicit conditions on the topology that guarantee optimal selection using only local dynamics, but also describes circumstances when the system may naturally evolve to a condition of collapse, in which all the excitation sensitivities equal zero, at least over intermediate times. The discussion considers the implications to other network designs, including naturally occurring networks, as well as the effects of noise and nonlinearity.

In the classical Node-Disjoint Paths (NDP) problem, we are given an n-vertex graph G, and a collection of pairs of its vertices, called demand pairs. The goal is to route as many of the demand pairs as possible, where to route a pair we need to select a path connecting it, so that all selected paths are disjoint in their vertices.

The best current algorithm for NDP achieves an $O(\sqrt{n})$-approximation, while, until recently, the best negative result was a roughly $\Omega(\sqrt{\log n})$-hardness of approximation. Recently, an improved $2^{\Omega(\sqrt{\log n})}$-hardness of approximation for NDP was shown, even if the underlying graph is a subgraph of a grid graph, and all source vertices lie on the boundary of the grid. Unfortunately, this result does not extend to grid graphs.

The approximability of NDP in grids has remained a tantalizing open question, with the best upper bound of $\tilde{O}(n^{1/4})$, and the best lower bound of APX-hardness. In this talk we come close to resolving this question, by showing an almost polynomial hardness of approximation for NDP in grid graphs.

Our hardness proof performs a reduction from the 3COL(5) problem to NDP, using a new graph partitioning problem as a proxy. Unlike the more standard approach of employing Karp reductions to prove hardness of approximation, our proof is a Cook-type reduction, where, given an input instance of 3COL(5), we produce a large number of instances of NDP, and apply an approximation algorithm for NDP to each of them. The construction of each new instance of NDP crucially depends on the solutions to the previous instances that were found by the approximation algorithm.

Joint work with David H.K. Kim and Rachit Nimavat.

We define a functor from the category of representations of algebraic supergroups with reductive even part to the category of equivariant sheaves and show several applications of this construction to representation theory, in particular projectivity criterion, classification of blocks and computation of dimensions of irreducible representations.

Every $k$ entries in a permutation can have one of k! different relative orders, called patterns. How many times does each pattern occur in a large random permutation of size $n$? The distribution of this k!-dimensional vector of pattern densities was studied by Janson, Nakamura, and Zeilberger (2015). Their analysis showed that some component of this vector is asymptotically multinormal of order $1/\sqrt(n)$, while the orthogonal component is smaller. Using representations of the symmetric group, and the theory of U-statistics, we refine the analysis of this distribution. We show that it decomposes into $k$ asymptotically uncorrelated components of different orders in $n$, that correspond to representations of $S_k$. Some combinations of pattern densities that arise in this decomposition have interpretations as practical nonparametric statistical tests.

Imbalanced data is problematic for many machine learning applications. Some classifiers, for example, can be strongly biased due to diversity of class populations. In unsupervised learning, imbalanced sampling of ground truth clusters can lead to distortions of the learned clusters. Removing points (undersampling) is a simple solution, but this strategy leads to information loss and can decrease generalization performance. We instead focus on data generation to artificially balance the data.

In this talk, we present a novel data synthesis method, which we call SUGAR (Synthesis Using Geometrically Aligned Random-walks) [1], for generating data in its original feature space while following its intrinsic geometry. This geometry is inferred by a diffusion kernel [2] that captures a data-driven manifold and reveals underlying structure in the full range of the data space -- including undersampled regions that can be augmented by new synthesized data.

We demonstrate the potential advantages of the approach by improving both classification and clustering performance on numerous datasets. Finally, we show that this approach is especially useful in biology, where rare subpopulations and gene-interaction relationships are affected by biased sampling.

[1] Lindenbaum, Ofir, Jay S. Stanley III, Guy Wolf, and Smita Krishnaswamy. "Geometry-Based Data Generation." *NIPS* (2018).

[2] Coifman, Ronald R., and Stéphane Lafon. "Diffusion maps." *Applied and computational harmonic analysis *(2006).

In the last 35 years, geometric flows have proven to be a powerful tool in geometry and topology. The Mean Curvature Flow is, in many ways, the most natural flow for surfaces in Euclidean space.

In this talk, which will assume no prior knowledge, I will illustrate how mean curvature flow could be used to address geometric questions. I will then explain why the formation of singularities of the mean curvature flow poses difficulties for such applications, and how recent new discoveries about the structure of singularities (including a work joint with Kyeongsu Choi and Robert Haslhofer) may help overcome those difficulties.

The problem of computational super-resolution asks to recover fine features of a signal from inaccurate and bandlimited data, using an a-priori model as a regularization. I will describe several situations for which sharp bounds for stable reconstruction are known, depending on signal complexity, noise/uncertainty level, and available data bandwidth. I will also discuss optimal recovery algorithms, and some open questions.

We show that static data structure lower bounds in the group (linear) model imply semi-explicit lower bounds on matrix rigidity. In particular, we prove that an explicit lower bound of t >> log^2(n) on the cell-probe complexity of linear data structures in the group model, even against arbitrarily small linear space (s = (1+eps)n), would already imply a semi-explicit (P^NP) construction of rigid matrices with significantly better parameters than the current state of art (Alon, Panigrahy, and Yekhanin, 2009). Our result further asserts that polynomial (t > n^eps) data structure lower bounds against near-maximal space, would imply super-linear circuit lower bounds for log-depth linear circuits (a four-decade open question). In the succinct space regime (s = n+o(n)), we show that any improvement on current cell-probe lower bounds in the linear model would also imply new rigidity bounds. Our main result relies on a new connection between the "inner" and "outer" dimensions of a matrix (Paturi and Pudlak, 2006), and on a new worst-to-average case reduction for rigidity, which is of independent interest.

Joint work with Zeev Dvir (Princeton) and Alexander Golovnev (Harvard).

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of $(1+\epsilon)$ under a projection onto a random $O(log(k/\epsilon)/\epsilon^2)$-dimensional subspace whp. Further, the cost of every clustering is preserved within $(1+\epsilon)$. Crucially, the dimension does not depend on the total number of points n in the instance. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant $p$.

For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

Joint work with Konstantin Makarychev and Ilya Razenshteyn.

The goal of style transfer algorithms is to render the content of one image using the style of another. We propose Style Transfer by Relaxed Optimal Transport and Self-Similarity (STROTSS), a new optimization-based style transfer algorithm. We extend our method to allow user-specified point-to-point or region-to-region control over visual similarity between the style image and the output. Such guidance can be used to either achieve a particular visual effect or correct errors made by unconstrained style transfer. In order to quantitatively compare our method to prior work, we conduct a large-scale user study designed to assess the style-content tradeoff across settings in style transfer algorithms. Our results indicate that for any desired level of content preservation, our method provides higher quality stylization than prior work.

Joint work with Nick Kolkin and Jason Salavon.

Understanding deep learning calls for addressing three fundamental questions: expressiveness, optimization and generalization. Expressiveness refers to the ability of compactly sized deep neural networks to represent functions capable of solving real-world problems. Optimization concerns the effectiveness of simple gradient-based algorithms in solving non-convex neural network training programs. Generalization treats the phenomenon of deep learning models not overfitting despite having much more parameters than examples to learn from. This talk will describe a series of works aimed at unraveling some of the mysteries behind optimization and expressiveness. I will begin by discussing recent analyses of optimization for deep linear neural networks. By studying the trajectories of gradient descent, we will derive the most general guarantee to date for efficient convergence to global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that, sometimes, gradient descent can train a deep linear network faster than a classic linear model. In other words, depth can accelerate optimization, even without any gain in expressiveness, and despite introducing non-convexity to a formerly convex problem. In the second (shorter) part of the talk, I will present an equivalence between convolutional and recurrent networks --- the most successful deep learning architectures to date --- and hierarchical tensor decompositions. The equivalence brings forth answers to various questions concerning expressiveness, resulting in new theoretically-b

Optimization works covered in this talk were in collaboration with Sanjeev Arora, Elad Hazan, Noah Golowich and Wei Hu. Expressiveness works were with Amnon Shashua, Or Sharir, Yoav Levine, Ronen Tamari and David Yakira.

In this talk, I will discuss a question which originates in complex analysis but is really a problem in non-linear elliptic PDE. A finite Blaschke product is a proper holomorphic self-map of the unit disk, just like a polynomial is a proper holomorphic self-map of the complex plane. A celebrated theorem of Heins says that up to post-composition with a M\"obius transformation, a finite Blaschke product is uniquely determined by the set of its critical points. Konstantin Dyakonov suggested that it may interesting to extend this result to infinite degree. However, one must be a little careful since infinite Blaschke products may have identical critical sets. I will show that an infinite Blaschke product is uniquely determined by its "critical structure" and describe all possible critical structures which can occur. By Liouville's correspondence, this question is equivalent to studying nearly-maximal solutions of the Gauss curvature equation $\Delta u = e^{2u}$. This problem can then be solved using PDE techniques, using the method of sub- and super-solutions.

The hydrogen atom system is one of the most thoroughly studied examples of a quantum mechanical system. It can be fully solved, and the main reason why is its (hidden) symmetry. In this talk I shall explain how the symmetries of the Schrödinger equation for the hydrogen atom, both visible and hidden, give rise to an example in the recently developed theory of algebraic families of Harish-Chandra modules. I will show how the algebraic structure of these symmetries completely determines the spectrum of the Schrödinger operator and sheds new light on the quantum nature of the system. No prior knowledge on quantum mechanics will be assumed.

As software has grown increasingly critical to our society's infrastructure, mechanically-verified software has grown increasingly important, feasible, and prevalent. Proof assistants have seen tremendous growth in recent years because of their success in the mechanical verification of high-value applications in many areas, including cyber security, cyber-physical systems, operating systems, compilers, and microkernels. These proof assistants are built on top of constructive type theory whose computational interpretation is given by the proofs-as-programs paradigm, which establishes a correspondence between formal proofs and computer programs. However, while both proof theory and programming languages have evolved significantly over the past years, the cross-fertilization of the independent new developments in each of these fields has yet to be explored in the context of this paradigm. This naturally gives rise to the following questions: how can modern notions of computation influence and contribute to formal foundations, and how can modern reasoning techniques improve the way we design and reason about programs?

In this talk I first demonstrate how using programming principles that go beyond the standard lambda-calculus, namely state and non-determinism, promotes the specification and verification of modern systems, e.g. distributed systems. I then illustrate the surprising fragility of proof assistants in the presence of such new computational capabilities, and outline my ongoing efforts to develop a more robust foundation. For the converse direction, I show how incorporating modern proof-theoretic techniques offers a more congenial framework for reasoning about hard programming problems and hence facilitates the verification effort.

Representation theory of non-compact real groups, such as SL(2,R), is a fundamental discipline with uses in harmonic analysis, number theory, physics, and more. This theory is analytical in nature, but in the course of the 20th century it was algebraized and geometrized (the key contributions are by Harish-Chandra for the former and by Beilinson-Bernstein for the latter). Roughly and generally speaking, algebraization strips layers from the objects of study until we are left with a bare skeleton, amenable to symbolic manipulation. Geometrization, again very roughly, reveals how algebraic objects have secret lives over spaces - thus more amenable to human intuition. In this talk, I will try to motivate and present one example - the calculation of the Casselman-Jacquet module of a principal series representation (I will explain the terms in the talk).

The Bakry-Emery theorem asserts that uniformly log-concave probability measures satisfy certain functional inequalities, with constants that are better than those associated with the Gaussian measure. In this talk, I will explain how if the constant is almost that of the Gaussian, then the measure almost splits off a Gaussian factor, with explicit quantitative bounds. The proof is based on a combination of Stein's method and simple arguments from calculus of variations. Joint work with Thomas Courtade.

This talk will describe my past and ongoing work on translating images and words between very different datasets without supervision. Humans often do not require supervision to make connections between very different sources of data, but this is still difficult for machines. Recently great progress was made by using adversarial training a powerful yet tricky method. Although adversarial methods have had great success, they have critical failings which significantly limit their breadth of applicability and motivate research into alternative non-adversarial methods. In this talk, I will describe novel non-adversarial methods for unsupervised word translation and for translating images between very different datasets (joint work with Lior Wolf). As image generation models are an important component in our method, I will present a non-adversarial image generation approach, which is often better than current adversarial approaches (joint work with Jitendra Malik).

We all know what is an algebraically closed field and that any field can be embedded into an algebraically closed field. But what happens if multiplication is not commutative? In my talk I'll suggest a definition of an algebraically closed skew field, give an example of such a skew field, and show that not every skew field can be embedded into an algebraically closed one.

It is still unknown whether an algebraically closed skew field exists in the finite characteristic case!

Ruelle's operator theorem states that the Ruelle operator $L$, which is a positive operator acting on Holder functions, is conjugated to $P+R$ where $R$ is a one-dimensional projection and the norm of $R$ is smaller than 1. This decomposition, also known as spectral gap, is of interest as it allows to characterise the underlying dynamical system through, e.g., central limit theorems or continuous response to perturbations. However, the conjugation depends on the existence of a positive eigenfunction of $L$, which might not exist in more general, fibred situations due to purely functorial reasons. A possibility to circumvent this problem is to consider quotients of operators of the form $f \mapsto \frac{L^m(f L^n (1))}{L^{m+n}(1)}.$ In fact, it is possible to provide reasonable conditions such that their dual operators contract the Wasserstein distance exponentially in $m$. The result gives rise, for example, to a law of the iterated logarithm for continued fractions with sequentially restricted entries or a topology on the set of equilibrium states for semigroups of expanding maps. This is joint work with Paulo Varandas and Xuan Zhang.

We all capture the world around us through digital data such as images, videos and sound. However, in many cases, we are interested in certain properties of the data that are either not available or difficult to perceive directly from the input signal. My goal is to "Re-render Reality", i.e., develop algorithms that analyze digital signals and then create a new version of it that allows us to see and hear better. In this talk, I'll present a variety of methodologies aimed at enhancing the way we perceive our world through modified, re-rendered output. These works combine ideas from signal processing, optimization, computer graphics, and machine learning, and address a wide range of applications. More specifically, I'll demonstrate how we can automatically reveal subtle geometric imperfection in images, visualize human motion in 3D, and use visual signals to help us separate and mute interference sound in a video. Finally, I'll discuss some of my future directions and work in progress.

BIO: Tali is a Senior Research Scientist at Google, Cambridge, developing algorithms at the intersection of computer vision and computer graphics. Before Google, she was a Postdoctoral Associate at the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William T. Freeman. Tali completed her Ph.D studies at the school of electrical engineering, Tel-Aviv University, Israel, under the supervision of Prof. Shai Avidan, and Prof. Yael Moses. Her research interests include computational photography, image synthesize, geometry and 3D reconstruction.

In this talk we survey some formulations of fairness requirements for decision making under uncertainty. We then discuss results from 3 recent papers:

- Treating individuals fairly is not in conflict with long-term scientific learning goals if the population is sufficiently diverse.
- When there is a pipeline of decisions, end-to-end fairness is impossible to achieve even in a very simple model.
- Exploiting the knowledge acquired by others can unfairly advantage the free rider.

These papers are joint work with a number of co-authors:

Christopher Jung, Neil Lutz, Jamie Morgenstern, Aaron Roth, Bo Waggoner, Steven Wu, and Juba Ziani

Despite their superb empirical performance, deep learning models for natural language processing (NLP) are often considered black boxes, as relatively little is known as to what accounts for their success. This lack of understanding turns model development into a slow and expensive trial-and-error process, which limits many researchers from developing state-of-the-ar

This is joint work with Hao Peng, Sam Thomson and Noah A. Smith

Zero knowledge protocols are spectacular, allowing to prove NP statements without revealing anything but their validity. An essential element that enables this wonder is interaction. But how much interaction exactly is needed? This question has long challenged cryptographers and is yet to be settled under standard assumptions. In fact, the question appears to be equally challenging also for natural relaxations of the zero knowledge requirement. The difficulty in answering the round complexity question stems from that of a foundational question in cryptography --- what is the power of non-black-box reductions?

In this talk, I will explain this difficulty and present a new non-black-box technique that resolves, under standard assumptions, the round complexity of weak zero knowledge protocols (Dwork-Naor-Reingold-Stockmeyer '98). Specifically, assuming quasipolynomial hardness of the Learning with Errors problem and fully-homomorphic encryption, we construct a two message protocol, a setting where (full-fledged) zero knowledge is impossible.

The talk will assume no prior knowledge in cryptography. It is based on joint work with Dakshita Khurana and Omer Paneth (the paper can be found on

A long-standing open problem, known as Hadwiger's covering problem, asks what is the smallest natural number $N(n)$ such that every convex body in can be covered by a union of the interiors of at most $N(n)$ of its translates. Despite continuous efforts, the best general upper bound known for this number remains as it was more than sixty years ago, of the order of ${2n \choose n} \ln n$.

In this talk, I will discuss some history of this problem and present a new result in which we improve this bound by a sub-exponential factor. Our approach combines ideas from previous work, with tools from Asymptotic Geometric Analysis. As a key step, we use thin-shell estimates for isotropic log-concave measures to prove a new lower bound for the maximum volume of the intersection of a convex body $K$ with a translate of $-K$. We further show that the same bound holds for the volume of $K\cap(-K)$ if the center of mass of $K$ is at the origin.

If time permits we shall discuss some other methods and results concerning this problem and its relatives.

Joint work with H. Huang, B. Vritsiou, and T. Tkocz

Visual repetitions are abundant in our surrounding physical world: small image patches tend to reoccur within a natural image, and across different rescaled versions thereof. Similarly, semantic repetitions appear naturally inside an object class within image datasets, as a result of different views and scales of the same object. In my thesis I studied deviations from these expected repetitions, and demonstrated how this information can be exploited to tackle both low-level and high-level vision tasks. These include “blind” image reconstruction tasks (e.g. dehazing, deblurring), image classification confidence estimation, and more.

For fixed t>1 and L>3 we establish sharp asymptotic formula for the log-probability that the number of cycles of length L in the Erdos - Renyi random graph G(N,p) exceeds its expectation by a factor t, assuming only that p >> log N/sqrt(N). We obtain such sharp upper tail bounds also for the Schatten norms of the corresponding adjacency matrices, and in a narrower range of p=p(N), also for general subgraph counts. In this talk, based on a recent joint work with Nick Cook, I will explain our approach and in particular our quantitative refinement of Szemeredi's regularity lemma for sparse random graphs in the large deviations regime.

Deep convolutional network architectures are often assumed to guarantee generalization for small image translations and deformations. In this paper we show that modern CNNs (VGG16, ResNet50, and InceptionResNetV2) can drastically change their output when an image is translated in the image plane by a few pixels, and that this failure of generalization also happens with other realistic small image transformations. Furthermore, we see these failures to generalize more frequently in more modern networks. We show that these failures are related to the fact that the architecture of modern CNNs ignores the classical sampling theorem so that generalization is not guaranteed. We also show that biases in the statistics of commonly used image datasets makes it unlikely that CNNs will learn to be invariant to these transformations. Taken together our results suggest that the performance of CNNs in object recognition falls far short of the generalization capabilities of humans.

Joint work with Aharon Azulay

This talk concerns the well-studied model of "cake-cutting" for studying questions regarding notions of fair division of resources. We focus on discrete versions rather than using infinite-precision real values, specifically, focusing on the communication complexity. Using general discrete simulations of classical infinite-precision protocols (Robertson-Webb and moving-knife), we roughly partition the various fair-allocation problems into 3 classes: "easy" (constant number of rounds of logarithmic many bits), "medium" (poly-log total communication), and "hard".

Our main technical result concerns two of the "medium" problems (perfect allocation for 2 players and equitable allocation for any number of players) which we prove are not in the "easy" class. Our main open problem is to separate the "hard" from the "medium" classes.

Joint work with Simina Branzei

For a vector space F^n over a field F, an (η, ß)-dimension expander of degree d is a collection of d linear maps Γ_j : F^n \to F^n such that for every subspace U of F^n of dimension at most ηn, the image of U under all the maps, ∑_{j=1}^d Γ_j(U), has dimension at least ßdim(U). Over a finite field, a random collection of d=O(1) maps Γ_j over excellent “lossless” expansion with high probability: ß ≈ d for η ≥ Ω(1/\eta). When it comes to a family of explicit constructions (for growing n), however, achieving even expansion factor β = 1 + ε with constant degree is a non-trivial goal. We present an explicit construction of dimension expanders over finite fields based on linearized polynomials and subspace designs, drawing inspiration from recent progress on list decoding in the rank-metric. Our approach yields the following:

- Lossless expansion over large fields; more precisely ß ≥ (1–ε)d and η ≥ (1–ε)/d with d=O_ε(1), when |F| ≥ Ω(n).
- Optimal up to constant factors expansion over fields of arbitrarily small polynomial size; more precisely ß ≥ Ω(δd) and η ≥ Ω(1/(δd)) with d = O_δ(1), when |F| ≥ n^δ.

Previously, an approach reducing to monotone expanders (a form of vertex expansion that is highly non-trivial to establish) gave (Ω(1), 1+Ω(1))-dimension expanders of constant degree over all fields. An approach based on “rank condensing via subspace designs” led to dimension expanders with ß ≥ Ω(√d) over large finite fields. Ours is the first construction to achieve lossless dimension expansion, or even expansion proportional to the degree. Based on joint work with Venkatesan Guruswami and Chaoping Xing.

As algorithmic prediction systems have become more widespread, so too have concerns that these systems may be discriminatory against groups of people protected by laws and ethics. We present a recent line of work that takes a complexity theoretic perspective towards combating discrimination in prediction systems. We'll focus on fair classification within the versatile framework of Dwork et al. [ITCS'12], which assumes the existence of a metric that measures similarity between pairs of individuals. Unlike earlier work, we do not assume that the entire metric is known to the learning algorithm; instead, the learner can query this metric a bounded number of times. We propose a new notion of fairness called *metric multifairness* and show how to achieve this notion in our setting. Metric multifairness is parameterized by a similarity metric d on pairs of individuals to classify and a rich collection C of (possibly overlapping) "comparison sets" over pairs of individuals. At a high level, metric multifairness guarantees that *similar subpopulations are treated similarly*, as long as these subpopulations are identified within the class C.

In this talk I will present an algorithms for approximating the number of k-cliques in a graph when given query access to the graph. This problem was previously studied for the cases of k=2 (edges) and k=3 (triangles). We give an algorithm that works for any k >= 3, and is actually conceptually simpler than the k=3 algorithm. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let n denote the number of vertices in the graph, m the number of edges, and C_k the number of k-cliques. We design an algorithm that outputs a (1+\epsilon)-approximation (with high probability) for C_k, whose expected query complexity and running time are O (\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}) poly (\log n, 1/\epsilon,k).

Hence, the complexity of the algorithm is sublinear in the size of the graph for C_k = \omega(m^{k/2-1}). Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on \log n, 1/\epsilon and k).

This is joint work with Talya Eden and C. Seshadhri.

In typical imaging systems, an image/video is first acquired, then compressed for transmission or storage, and eventually presented to human observers using different and often imperfect display devices. While the resulting quality of the perceived output image may severely be affected by the acquisition and display processes, these degradations are usually ignored in the compression stage, leading to an overall sub-optimal system performance. In this work we propose a compression methodology to optimize the system's end-to-end reconstruction error with respect to the compression bit-cost. Using the alternating direction method of multipliers (ADMM) technique, we show that the design of the new globally-optimized compression reduces to a standard compression of a "system adjusted" signal. Essentially, we propose a new practical framework for the information-theoretic problem of remote source coding. The main ideas of our method are further explained using rate-distortion theory for Gaussian signals. We experimentally demonstrate our framework for image and video compression using the state-of-the-art HEVC standard, adjusted to several system layouts including acquisition and rendering models. The experiments established our method as the best approach for optimizing the system performance at high bit-rates from the compression standpoint.

In addition, we relate the proposed approach also to signal restoration using complexity regularization, where the likelihood of candidate solutions is evaluated based on their compression bit-costs.

Using our ADMM-based approach, we present new restoration methods relying on repeated applications of standard compression techniques. Thus, we restore signals by leveraging state-of-the-art models designed for compression. The presented experiments show good results for image deblurring and inpainting using the JPEG2000 and HEVC compression standards.

* Joint work with Prof. Alfred Bruckstein and Prof. Michael Elad.

** More details about the speaker and his research work are available at http://ydar.cswp.cs.technion.ac.il/

We consider the goal of social welfare maximization where a single item is to be assigned to one of to n potential agents with interdependent values.

That is, each agent has her own private signal, and the valuation of each agent is a known function of all n private signals. This captures settings such as valuations for artwork, oil drilling rights, broadcast rights, and many more. In the interdependent value setting, all previous work has assumed a so-called single-crossing condition. Single-crossing means that the impact of agent i’s private signal, s_i, on her own valuation is greater than the impact of s_ii on the valuation of any other agent. It is known that without the single-crossing condition an efficient outcome cannot be obtained. We study welfare maximization for interdependent valuations through the lens of approximation.

We show that, in general, without the single-crossing condition, one cannot hope to approximate the optimal social welfare any better than the approximation given by assigning the item to a random bidder. Consequently, we introduce a relaxed version of single-crossing, c-single-crossing, parameterized by c ≥ 1, which means that the impact of s_i on the valuation of agent i is at least 1/c times the impact of s_i on the valuation of any other agent (c = 1 is single-crossing). Using this parameterized notion, we obtain a host of positive results. We also consider interdependent settings when valuations are concave and give improved results.

Joint work with Alon Eden, Michal Feldman, and Kira Goldner.

Given an underlying finite point set P in the plane, we seek a small set Q that would hit any convex set that contains at least an Epsilon-fraction of P. Such a set Q is called a weak Epsilon-net. The study of Epsilon-nets is central to Computational and Combinatorial Geometry, and it bears important connections to Statistical Learning Theory, Extremal Combinatorics, Discrete Optimization, and other areas.

It is an outstanding open problem to determine tight asymptotic bounds on weak Epsilon-nets with respect to convex sets. For any underlying point set in the plane we describe such a net whose cardinality is roughly proportional to Epsilon^{-3/2}. This is the first improvement of the over-25-year-old bound of Alon, Barany, Furedi, and Kleitman.

The tremendous success of the Machine Learning paradigm heavily relies on the development of powerful optimization methods. The canonical algorithm for training learning models is SGD (Stochastic Gradient Descent), yet this method has its limitations. It is often unable to exploit useful statistical/geometric structure, it might degrade upon encountering prevalent non-convex phenomena, and it is hard to parallelize. In this talk I will discuss an ongoing line of research where we develop alternative methods that resolve some of SGD"s limitations. The methods that I describe are as efficient as SGD, and implicitly adapt to the underlying structure of the problem in a data dependent manner.

In the first part of the talk, I will discuss a method that is able to take advantage of hard/easy training samples. In the second part, I will discuss a method that enables an efficient parallelization of SGD. Finally, I will briefly describe a method that implicitly adapts to the smoothness and noise properties of the learning objective.

In the minimum k-edge-connected spanning subgraph (k-ECSS) problem the goal is to find the minimum weight subgraph resistant to up to k-1 edge failures. This is a central problem in network design, and a natural generalization of the minimum spanning tree (MST) problem. In this talk, I will present fast randomized distributed approximation algorithms for k-ECSS in the CONGEST model.

SRB measures are an important object in dynamical systems and mathematical physics. Named after Sinai , Ruelle and Bowen, these measures have important properties of being equilibrium states which describe chaotic behaviour, yet may also describe the asymptotic of ``observable” events in the phase space. An open and important question, is in what generality do systems admit SRB measures?

We present the notion of generalised SRB measures (GSRB in short), mention some of their important properties, and present a new condition to characterise their existence on a general setup.

The first part of the talk will describe some of the motivation leading to define and to study SRB measures; and so we will define GSRB measures and compare their properties with the properties sought for SRB measures. We will also describe a case study of examples motivating to study GSRB measures. Our new result is a characterisation of systems admitting GSRB measures.

In the second part of the talk, as much as time permits, we will present some key steps in the construction of GSRB measures.

The "Duty of Care"is a legal obligation requiring one to adhere to a standard of reasonable care while performing acts that could harm others. We propose a formal, mathematical interpretation of the duty of care in the context of "being cautious while driving". The framework enables tackling moral questions in a formal way and is an example of a new research field which we call regulatory science.

Joint work with Shaked Shammah and Amnon Shashua.

Single-particle cryo-electron microscopy (cryo-EM) is an innovative technology for elucidating structures of biological molecules at atomic-scale resolution. In a cryo-EM experiment, tomographic projections of a molecule, taken at unknown viewing directions, are embedded in highly noisy images at unknown locations. The cryo-EM problem is to estimate the 3-D structure of a molecule from these noisy images.

Inspired by cryo-EM, the talk will focus on two estimation problems: *multi-reference alignment* and *blind deconvolution*. These problems abstract away much of the intricacies of cryo-EM, while retaining some of its essential features. In multi-reference alignment, we aim to estimate a signal from its noisy, rotated observations. While the rotations and the signal are unknown, the goal is only to estimate the signal. In the blind deconvolution problem, the goal is to estimate a signal from its convolution with an unknown, sparse signal in the presence of noise. Focusing on the low SNR regime, I will propose the *method of moments* as a computationally efficient estimation framework for both problems and will introduce its properties. In particular, I will show that the method of moments allows estimating the sought signal accurately in any noise level, provided sufficiently many observations are collected, with only one pass over the data. I will then argue that the same principles carry through to cryo-EM, show examples, and draw potential implications.

The representation theory of GL(n,F), F non-archimedean is a classical subject initiated by Bernstein and Zelevinsky in the 1970s.

I will review some recent results and conjectures which aim to characterize irreducibility of parabolic induction, in terms of geometry. Joint with Alberto Minguez

Ramsey theory assures us that in any graph there is a clique or independent set of a certain size, roughly logarithmic in the graph size. But how difficult is it to find the clique or independent set? If the graph is given explicitly, then it is possible to do so while examining a linear number of edges. If the graph is given by a black-box, where to figure out whether a certain edge exists the box should be queried, then a large number of queries must be issued. But what if one is given a program or circuit for computing the existence of an edge? What if we are assured that the program is small without being given explicitly?

In this talk I will explore recent work on the complexity of search problems with guaranteed solution (the class TFNP) and the tight relationship with cryptographic assumptions and techniques.

Based on joint works with Pavel Hubacek, Ilan Komargodski and Eylon Yogev

In this talk I will survey the way machine learning research is affecting the field of security, and the way security research Is affecting the field of machine learning. After giving several examples of each type, I will discuss some of the latest developments In adversarial machine learning research, which are used by hackers to defeat security mechanisms based on regular machine learning techniques

In the vector balancing problem, we are given N vectors v_1,..., v_N in an n-dimensional normed space, and our goal is to assign signs to them, so that the norm of their signed sum is as small as possible. The balancing constant of the vectors is the smallest number beta, such that any subset of the vectors can be balanced so that their signed sum has norm at most beta.

The vector balancing constant generalizes combinatorial discrepancy, and is related to rounding problems in combinatorial optimization, and to the approximate Caratheodory theorem. We study the question of efficiently approximating the vector balancing constant of any set of vectors, with respect to an arbitrary norm. We show that the vector balancing constant can be approximated in polynomial time to within factors logarithmic in the dimension, and is characterized by (an appropriately optimized version of) a known volumetric lower bound. Our techniques draw on results from geometric functional analysis and the theory of Gaussian processes. Our results also imply an improved approximation algorithm for hereditary discrepancy.

Joint work with Aleksandar Nikolov, Nicole Tomczak-Jaegermann and Kunal Talwar.

In the talk I will tell about a theory of cyclic elements in semisimple Lie algebras. This notion was introduced by Kostant, who associated a cyclic element with the principal nilpotent and proved that it is regular semisimple. In particular, I will tell about the classification of all nilpotents giving rise to semisimple and regular semisimple cyclic elements. The results are from my joint work with V. Kac and E. Vinberg.

There is a movement in design of experiments away from the classic randomization put forward by Fisher, Cochran and others to one based on optimization. In fixed-sample trials comparing two groups, measurements of subjects are known in advance and subjects can be divided optimally into two groups based on a criterion of homogeneity or "imbalance" between the two groups. These designs are far from random. This talk seeks to understand the benefits and the costs over classic randomization in the context of different performance criterions such as Efron's worst-case analysis. In the criterion that we motivate, randomization beats optimization. However, the optimal design is shown to lie between these two extremes. Much-needed further work will provide a procedure to find this optimal designs in different scenarios in practice. Until then, it is best to randomize.

I will define the Enright functor for contragredient Lie superalgebras and discuss its properties. If time permits, we may discuss a proof of Arakawa's Theorem for osp(1|2l).

This work is a joint project with V. Serganova.

The goal of my talk (based on joint work with Vladimir Retakh) is to introduce noncommutative clusters and their mutations, which can be viewed as generalizations of both classical and quantum cluster structures.

Each noncommutative cluster S is built on a torsion-free group G and a certain collection of its automorphisms. We assign to S a noncommutative algebra A(S) related to the group algebra of G, which is an analogue of the cluster algebra, and establish a noncommutative version of Laurent Phenomenon in some algebras A(S).

"Cluster groups" G for which the Noncommutative Laurent Phenomenon holds include triangular groups of marked surfaces (closely related to the fundamental groups of their ramified double covers), free group of rank 2, and principal noncommutative tori which exist for any exchange matrix B.

Various algebraic and topological situations give rise to compatible sequences of representations of different groups, such as the symmetric groups, with stable asymptotic behavior. Representation stability is a recent approach to studying such sequences, which has proved effective for extracting important invariants. Coming from this point of view, I will introduce the associated character theory, which formally explains many of the approach's strengths (in char 0). Central examples are simultaneous characters of all symmetric groups, or of all Gl(n) over some finite field. Their mere existence gives applications to statistics of random matrices over finite fields, and raises many combinatorial questions.

We will show that there exists a metric space that does not admit a coarse embedding into any Alexandrov space of global nonpositive curvature, thus answering a question of Gromov (1993). In contrast, any metric space embeds coarsely into an Alexandorv space of nonnegative curvature. Based on joint works with Andoni and Neiman, and Eskenazis and Mendel.

Let $X$ be an $n$-dimensional smooth projective variety over a non-Archimedean local field $K$, such as the $p$-adic numbers, and let $\omega$ be an global $n$-form on $X$. The set $X(K)$ of $K$-points on $X$ has the structure of a $K$-analytic manifold, and $\omega$ induces a measure $|\omega|$ on $X(K)$. For any finite extension $K'$ of $K$, there is a natural continuous map from $X(K')$ to the Berkovich analytification $X^{\mathrm{an}}$ of $X$. We study the asymptotics of the images of the measures $|\omega\otimes_KK'|$ on $X^{\mathrm{an}}$ as $K'$ runs through towers of finite extensions of $K$. This is joint work with Johannes Nicaise.

Let F be a fixed infinite, vertex-transitive graph. We say a graph G is `r-locally F' if for every vertex v of G, the ball of radius r and centre v in G is isometric to the ball of radius r in F. For each positive integer n, let G_n be a graph chosen uniformly at random from the set of all unlabelled, n-vertex graphs that are r-locally F. We investigate the properties that the random graph G_n has with high probability --- i.e., how these properties depend on the fixed graph F.

We show that if F is a Cayley graph of a torsion-free group of polynomial growth, then there exists a positive integer r_0 such that for every integer r at least r_0, with high probability the random graph G_n = G_n(F,r) defined above has largest component of size between n^{c_1} and n^{c_2}, where 0 < c_1 < c_2 < 1 are constants depending upon F alone, and moreover that G_n has a rather large automorphism group. This contrasts sharply with the random d-regular graph G_n(d) (which corresponds to the case where F is replaced by the infinite d-regular tree).

Our proofs use a mixture of results and techniques from group theory, geometry and combinatorics.

We obtain somewhat more precise results in the case where F is L^d (the standard Cayley graph of Z^d): for example, we obtain quite precise estimates on the number of n-vertex graphs that are r-locally L^d, for r at least linear in d.

Many intriguing open problems remain: concerning groups with torsion, groups with faster than polynomial growth, and what happens for more general structures than graphs.

This is joint work with Itai Benjamini (WIS).

I will describe a new approach to the study of spherical spin glass models via free energy landscapes, defined by associating to interior points of the sphere the free energy computed only over the spherical band around them.

They are closely related to several fundamental objects from spin glass theory: the TAP free energy, pure states decomposition, overlap distribution, and temperature chaos. I will explain some of of those connections.

TBA

TBA

In this talk I explore various limitations of sensors and displays, and suggest new ways to overcome them.

These include:

- The limited 2D displays of today's screens - I will show how we can design new displays that enable us to see in 3D *without* any glasses.
- The limited spatial resolution of images - I will discuss the crucial factors for successful Super-Resolution.
- The poor image quality due to motion blur (due to camera motion or scene motion) - I will present a new approach for Blind Non-Uniform Deblurring.

A classical problem in causal inference is that of matching treatment units to control units in an observational dataset. This problem is distinct from simple estimation of treatment effects as it provides additional practical interpretability of the underlying causal mechanisms that is not available without matching. Some of the main challenges in developing matching methods arise from the tension among (i) inclusion of as many covariates as possible in defining the matched groups, (ii) having matched groups with enough treated and control units for a valid estimate of average treatment effect in each group, (iii) computing the matched pairs efficiently for large datasets, and (iv) dealing with complicating factors such as non-independence among units. We propose the Fast Large-scale Almost Matching Exactly (FLAME) framework to tackle these problems for categorical covariates. At its core this framework proposes an optimization objective for match quality that captures covariates that are integral for making causal statements while encouraging as many matches as possible. We demonstrate that this framework is able to construct good matched groups on relevant covariates and further extend the methodology to incorporate continuous and other complex covariates. related papers: https://arxiv.org/abs/1707.06315, https://arxiv.org/abs/1806.06802

Moosa and Scanlon defined a general notion of "fields with operators'', that generalizes those of difference and differential fields. In the case of "free'' operators in characteristic zero they also analyzed the basic model-theoretic properties of the theory of such fields. In particular, they showed in this case the existence of the model companion, a construction analogous to that of algebraically closed fields for usual fields. In positive characteristic, they provided an example showing that the model companion need not exist.

I will discuss work, joint with Beyarslan, Hoffman and Kowalski, that completes the description of the free case, namely, it provides a full classification of those free operators for which the model companion exists. Though the motivating question is model theoretic, the description and the proof are completely algebraic and geometric. If time permits, I will discuss additional properties, such as quantifier elimination. All notions related to model theory and to fields with operators will be explained (at least heuristically).

SPECIAL NOTE: this will be part of a model theory day. Thus, the talk will be preceded by an introduction to algebraic geometry by the same speaker, 10-10:45 (in Room 1) and followed by a talk by Nick Ramsey " Classification Theory and the Construction of PAC Fields" , 14-16 (in Room 155). See https://mt972.weebly.com/ for more information

Technological advances have changed every aspect of our lives in recent decades, yet, for the most part, the same systems of democratic decision making have been in place for centuries. I will argue that computer scientists can help rethink the practice of democracy, as well as its potential applications. I will focus on three emerging paradigms that go far beyond your run-of-the-mill election: (i) liquid democracy, an approach that allows voters to transitively delegate their votes; (ii) participatory budgeting, whereby residents collectively decide how to spend their local government's budget; and (iii) virtual democracy, which employs instant elections among machine learning models of real voters to address the grand AI challenge of ethical decision making.

We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and auditory signals to solve this task. The visual features are used to "focus" the audio on desired speakers in a scene and to improve the speech separation quality. To train our joint audio-visual model, we introduce AVSpeech, a new dataset comprised of thousands of hours of video segments from the Web. We demonstrate the applicability of our method to classic speech separation tasks, as well as real-world scenarios involving heated interviews, noisy bars, and screaming children, only requiring the user to specify the face of the person in the video whose speech they want to isolate. Our method shows clear advantage over state-of-the-art audio-only speech separation in cases of mixed speech. In addition, our model, which is speaker-independent (trained once, applicable to any speaker), produces better results than recent audio-visual speech separation methods that are speaker-dependent (require training a separate model for each speaker of interest).

Joint work with Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, Bill Freeman and Miki Rubinstein of Google Research.

Cryo-electron microscopy (cryo-EM) is an imaging technology that is revolutionizing structural biology, enabling reconstruction of molecules at near-atomic resolution.

Cryo-EM produces a large number of noisy two-dimensional tomographic projection images of a molecule, taken at unknown viewing directions.

The extreme levels of noise make classical tasks in statistics and signal processing, such as alignment, detection and clustering, very challenging.

I will start the talk by studying the multi-reference alignment problem, which can be interpreted as a simplified model for cryo-EM. In multi-reference alignment, we aim to estimate multiple signals from circularly-translated, unlabeled, noisy copies.

In high noise regimes, the measurements cannot be aligned or clustered. Nonetheless, accurate and efficient estimation can be achieved via group-invariant representations (invariant polynomials). Furthermore, such estimators achieve the optimal estimation rate.

Then, I will show how this framework can be applied to the problem of 2-D classification in cryo-EM. In the last part of the talk, I will introduce the analog invariants of the cryo-EM problem and discuss how they can be used for ab initio modeling.

generation of neuronal network oscillations are not well understood. We view this process as the individual neurons' oscillations being communicated among the nodes in the network, mediated by the impedance profiles of the isolated (uncoupled) individual neurons. In order to test this idea, we developed a mathematical tool that we refer to as the Frequency Response Alternating Map (FRAM). The FRAM describes how the impedances of the individual oscillators interact to generate network responses to oscillatory inputs. It consists of decoupling the non-autonomous part of the coupling term and substituting the reciprocal coupling by a sequence of alternating one-directional forcing effects (cell 1 forces cell 2, which in turn forces cell 1 and so on and so forth). The end result is an expression of the network impedance for each node (in the network) as power series, each term involving the product of the impedances of the autonomous part of the individual oscillators. For linear systems we provide analytical expressions of the FRAM and we show that its convergence properties and limitations. We illustrate numerically that this convergence is relatively fast. We apply the FRAM to the phenomenon of network resonance to the simplest type of oscillatory network: two non-oscillatory nodes receiving oscillatory inputs in one or the two nodes. We discuss extensions of the FRAM to include non-linear systems and other types of network architectures.

The use of latent variables in probabilistic modeling is a standard approach in numerous data analysis applications. In recent years, there has been a surge of interest in spectral methods for latent variable models, where inference is done by analyzing the lower order moments of the observed data. In contrast to iterative approaches such as the EM algorithm, under appropriate conditions spectral methods are guaranteed to converge to the true model parameters given enough data samples.

The focus of the seminar is the development of novel spectral based methods for two problems in statistical machine learning. In the first part, we address unsupervised ensemble learning, where one obtains predictions from different sources or classifiers, yet without knowing the reliability and expertise of each source, and with no labeled data to directly assess it. We develop algorithms to estimate the reliability of the classifiers based on a common assumption that different classifiers make statistically independent errors. In addition, we show how one can detect subsets of classifiers that strongly violate the model of independent errors, in a fully unsupervised manner.

In the second part of the seminar we show how one can use spectral methods to learn the parameters of binary latent variable models. This model has many applications such as overlapping clustering and Gaussian-Bernoulli restricted Boltzmann machines. Our methods are based on computing the eigenvectors of both the second and third moments of the observed variables.

For both problems, we show that spectral based methods can be applied effectively, achieving results that are state of the art in various problems in computational biology and population genetics.

Information transfer between triangle meshes is of great importance in computer graphics and geometry processing. To facilitate this process, a smooth and accurate map is typically required between the two meshes. While such maps can sometimes be computed between nearly-isometric meshes, the more general case of meshes with diverse geometries remains challenging. This talk describes a novel approach for direct map computation between triangle meshes without mapping to an intermediate domain, which optimizes for the harmonicity and reversibility of the forward and backward maps. Our method is general both in the information it can receive as input, e.g. point landmarks, a dense map or a functional map, and in the diversity of the geometries to which it can be applied. We demonstrate that our maps exhibit lower conformal distortion than the state-of-the-art, while succeeding in correctly mapping key features of the input shapes.

(joint with Noam Aigerman, Raz Sluzky and Yaron Lipman)

Computing homeomorphisms between surfaces is an important task in shape analysis fields such as computer graphics, medical imaging and morphology. A fundamental tool for these tasks is solving Dirichlet's problem on an arbitrary Jordan domain with disc topology, where the boundary of the domain is mapped homeomorphically to the boundary of a specific target domain: A convex polygon. By the Rado-Kneser-Choquet Theorem such harmonic mappings are homeomorphisms onto the convex polygon. Standard finite element approximations of harmonic mappings lead to discrete harmonic mappings, which have been proven to be homeomorphisms as well. Computing the discrete harmonic mappings is very efficient and reliable as the mappings are obtained as the solution of a sparse linear system of equations.

In this talk we show that the methodology above, can be used to compute *conformal* homeomorphisms, for domains with either disc or sphere topology:

By solving Dirichlet's problem with correct boundary conditions, we can compute conformal homeomorphisms from arbitrary Jordan domains to a specific canonical domain- a triangle. The discrete conformal mappings we compute are homeomorphisms, and approximate the conformal homeomorphism uniformly and in H^1. Similar methodology can also be used to conformally map a sphere type surface to a planar Jordan domain, whose edges are identified so that the planar domain has the topology of a sphere.

In knot theory and more generally the theory of 3-manifolds various "quantum invariants" like the Witten-Reshetikhin-Turaev or the Kashaev invariant have been much studied in recent years, in particular because of the famous "volume conjecture" related to the asymptotic growth of the Kashaev invariant. Rather surprisingly, it transpired a few years ago that these invariants also have very non-trivial number-theoretical properties, including a kind of weak invariance under the modular group SL(2,Z) ("quantum modular forms") and the experimental discovery of the appearance of certain units in cyclotomic extensions as factors in the asymptotic expansions. The talk will report on this and specifically on recent joint work with Frank Calegari and Stavros Garoufalidis that constructs such units in a purely algebraic way starting from elements in algebraic K-theory or in the more elementary "Bloch group". As an unexpected application, this result led to the proof of a well-known conjecture of Nahm on the modularity of certain q-hypergeometric series.

The classical Gaussian isoperimetric inequality, established in the 70's independently by Sudakov-Tsirelson and Borell, states that the optimal way to decompose $\mathbb{R}^n$ into two sets of prescribed Gaussian measure, so that the (Gaussian) area of their interface is minimal, is by using two complementing half-planes. This is the Gaussian analogue of the classical Euclidean isoperimetric inequality, and is therefore referred to as the "single-bubble" case.

A natural generalization is to decompose $\mathbb{R}^n$ into $q \geq 3$ sets of prescribed Gaussian measure. It is conjectured that when $q \leq n+1$, the configuration whose interface has minimal (Gaussian) area is given by the Voronoi cells of $q$ equidistant points. For example, for $q=3$ (the "double-bubble"conjecture) in the plane ($n=2$), the interface is conjectured to be a "tripod" or "Y" - three rays meeting at a single point in 120 degree angles. For $q=4$ (the "triple-bubble" conjecture) in $\mathbb{R}^3$, the interface is conjectured to be a tetrahedral cone.

We confirm the Gaussian double-bubble and, more generally, multi-bubble conjectures for all $3 \leq q \leq n+1$. The double-bubble case $q=3$ is simpler, and we will explain why.

None of the numerous methods discovered over the years for establishing the classical $q=2$ case seem amenable to the $q \geq 3$ cases, and our method consists of establishing a Partial Differential Inequality satisfied by the isoperimetric profile. To treat $q > 3$, we first prove that locally minimimal configurations must have flat interfaces, and thus convex polyhedral cells. Uniqueness of minimizers up to null-sets is also established.

This is joint work with Joe Neeman (UT Austin).

Typically, when semi-discrete approximations to time-dependent partial differential equations (PDE) or schemes for ordinary differential equation (ODE) are constructed they are derived such that they are stable and have a specified truncation error $\tau$. Under these conditions, the Lax-Richtmyer equivalence theorem assures that the scheme converges and that the error is, at most, of the order of $|| \tau ||$. In most cases, the error is in indeed of the order of $|| \tau ||$. We demonstrate that schemes can be constructed, whose truncation errors are $\tau$, however, the actual errors are much smaller. This error reduction is made by constructing the schemes such that they inhibit the accumulation of the local errors; therefore they are called Error Inhibiting Schemes (EIS).

In the Minimum Hypergraph Bisection problem, the vertex set of a hypergraph has to be partitioned into two parts of equal size so that the number of hyperedges intersecting both parts is minimized.

This problem is a natural generalization of the well-studied Minimum Bisection problem in graphs.

We present a sharp distinction between Minimum Bisection in hypergraphs and graphs.

Whereas it is well-known that all bi-criteria approximation algorithms for Minimum Bisection in graphs can be extended to hypergraphs with the exact same guarantees, we prove that this is not the case when considering true (i.e., non bi-criteria) approximation algorithms.

Specifically, we show that Minimum Hypergraph Bisection admits an $\tilde{\mathcal{O}}(\sqrt{n})$ approximation algorithm.

However, we also show that any $\alpha$-approximation algorithm for Minimum Hypergraph Bisection implies an approximation of $\Omega(\alpha^{-2})$ for Densest $k$-Subgraph.

Thus, assuming the exponential time hypothesis there is no efficient approximation algorithm for Minimum Hypergraph Bisection with an approximation ratio $n^{poly(\log{\log{n}})}$.

In particular, Minimum Hypergraph Bisection is much harder to approximate than Minimum Bisection in graphs, for which a logarithmic approximation algorithm exists.

If time permits, the problem of constructing trees that are cut sparsifiers for hypergraph and vertex cuts will also be discussed.

While similar trees lie at the heart of powerful algorithms for Minimum Bisection in graphs, we prove that this is not the case for hypergraphs.

Joint work with Harald R\"{a}cke and Richard Stotz.

**Ohad Feldheim: Convergence of a quantile admission processes**

**Abstract**: Consider the following stochastic model for a growing set. At time 0 the model consists of the singleton S = {-infty}. At every subsequent time, two i.i.d. samples, distributed according to some distribution D on R, are suggested as candidates for S. If the smaller among the two is closer to at least a fraction of r of the current elements of S (in comparison with the larger one), then it is admitted into S.

How will the distribution of the members of S evolve over time as a function of r and D?

This model was suggested by Alon, Feldman, Mansour, Oren and Tennenholtz as a model for the evolution of an exclusive social group. We'll show that the empirical distribution of the elements of S converges to a (not-necessarily deterministic) limit distribution for any r and D.

This we do by relating the process to a random walk in changing environment. The analysis of this random walk involves various classical exponential concentration inequalities as well as a new general inequality concerning mean and minimum of independent random variables.

Joint work with Naomi Feldheim

**Eviatar Procaccia:** **Stabilization of Diffusion Limited Aggregation in a Wedge**

**Abstract**: We prove a discrete Beurling estimate for the harmonic measure in a wedge in $\mathbf{Z}^2$, and use it to show that Diffusion Limited Aggregation (DLA) in a wedge of angle smaller than $\pi/4$ stabilizes. This allows to consider the infinite DLA and questions about the number of arms, growth and dimension. I will present some conjectures and open problems.

Artificial illumination plays a vital role in human civilization. In computer vision, artificial light is extensively used to recover 3D shape, reflectance, and further scene information. However, in most computer vision applications using artificial light, some additional structure is added to the illumination to facilitate the task at hand. In this work, we show that the ubiquitous alternating current (AC) lights already have a valuable inherent structure that stems from bulb flicker. By passively sensing scene flicker, we reveal new scene information which includes: the type of bulbs in the scene, the phases of the electric grid up to city scale, and the light transport matrix. This information yields unmixing of reflections and semi-reflections, nocturnal high dynamic range, and scene rendering with bulbs not observed during acquisition. Moreover, we provide methods that enable capturing scene flicker using almost any off-the-shelf camera, including smartphones.

In underwater imaging, similar structures are added to illumination sources to overcome the limited imaging range. In this setting, we show that by optimizing camera and light positions while taking the effect of light propagation in scattering media into account we achieve superior imaging of underwater scenes while using simple, unstructured illumination.

The fisherman caught a quantum fish. "Fisherman, please let me go", begged the fish, "and I will grant you three wishes". The fisherman agreed. The fish gave the fisherman a quantum computer, three quantum signing tokens and his classical public key. The fish explained: "to sign your three wishes, use the tokenized signature scheme on this quantum computer, then show your valid signature to the king, who owes me a favor".

The fisherman used one of the signing tokens to sign the document "give me a castle!" and rushed to the palace. The king executed the classical verification algorithm using the fish's public key, and since it was valid, the king complied.

The fisherman's wife wanted to sign ten wishes using their two remaining signing tokens. The fisherman did not want to cheat, and secretly sailed to meet the fish. "Fish, my wife wants to sign ten more wishes". But the fish was not worried: "I have learned quantum cryptography following the previous story (The Fisherman and His Wife by the brothers Grimm). The quantum tokens are consumed during the signing. Your polynomial wife cannot even sign four wishes using the three signing tokens I gave you".

"How does it work?" wondered the fisherman. "Have you heard of quantum money? These are quantum states which can be easily verified but are hard to copy. This tokenized quantum signature scheme extends Aaronson and Christiano's quantum money scheme, and a variant by Zhandry, which is why the signing tokens cannot be copied".

"Does your scheme have additional fancy properties?" the fisherman asked. "Yes, the scheme has other security guarantees: revocability, testability and everlasting security. Furthermore, if you're at sea and your quantum phone has only classical reception, you can use this scheme to transfer the value of the quantum money to shore", said the fish, and swam away.

Joint work with Shalev Ben-David. https://arxiv.org/abs/1609.09047

**Bhaswar Bhattacharya**: Large Deviation Variational Problems in Random Combinatorial Structures

The upper tail problem in the Erdos-Renyi random graph $G\sim\mathcal{G}_{n,p}$, where every edge is included independently with probability $p$, is to estimate the probability that the number of copies of a graph $H$ in $G$ exceeds its expectation by a factor of $1+\delta$. The arithmetic analog of this problem counts the number of $k$-term arithmetic progressions in a random subset of $\{1, 2, \ldots, N\}$, where every element is included independently with probability $p$. The recently developed framework of non-linear large deviations (Chatterjee and Dembo (2016) and Eldan (2017)) shows that the logarithm of these tail probabilities can be reduced to a natural variational problem on the space of weighted graphs/functions. In this talk we will discuss methods for solving these variational problems in the sparse regime ($p \rightarrow 0$), and show how the solutions are often related to extremal problems in combinatorics. (This is based on joint work with Shirshendu Ganguly, Eyal Lubetzky, Xuancheng Shao, and Yufei Zhao.)

**Elliot Paquette**: Random matrix point processes via stochastic processes

In 2007, Virag and Valko introduced the Brownian carousel, a dynamical system that describes the eigenvalues of a canonical class of random matrices. This dynamical system can be reduced to a diffusion, the stochastic sine equation, a beautiful probabilistic object requiring no random matrix theory to understand. Many features of the limiting eigenvalue point process, the Sine--beta process, can then be studied via this stochastic process. We will sketch how this stochastic process is connected to eigenvalues of a random matrix and sketch an approach to two questions about the stochastic sine equation: deviations for the counting Sine--beta counting function and a functional central limit theorem.

The celebrated Gan-Gross-Prasad conjectures aim to describe the branching behavior of representations of classical groups, i.e., the decomposition of irreducible representations when restricted to a lower rank subgroup.

These conjectures, whose global/automorphic version bear significance in number theory, have thus far been formulated and resolved for the generic case.

In this talk, I will present a newly formulated rule in the p-adic setting (again conjectured by G-G-P) for restriction of representations in non-generic Arthur packets of GL_n.

Progress towards the proof of the new rule takes the problem into the rapidly developing subject of quantum affine algebras. These techniques use a version of the Schur-Weyl duality for affine Hecke algebras, combined with new combinatorial information on parabolic induction extracted by Lapid-Minguez.

All known algorithms for solving NP-complete problems require exponential time in the worst case; however, these algorithms nevertheless solve many problems of practical importance astoundingly quickly, and are hence relied upon in a broad range of applications. This talk is built around the observation that "Empirical Hardness Models" - statistical models that predict algorithm runtime on novel instances from a given distribution - work surprisingly well. These models can serve as powerful tools for algorithm design, specifically by facilitating automated methods for algorithm design and for constructing algorithm portfolios. They also offer a statistical alternative to beyond-worst-case analysis and a starting point for theoretical investigations.

bio at http://www.cs.ubc.ca/~kevinlb/bio.html

In analysis, a convolution of two functions usually results in a smoother, better behaved function. Given two morphisms f,g from algebraic varieties X,Y to an algebraic group G, one can define a notion of convolution of these morphisms. Analogously to the analytic situation, this operation yields a morphism (from X x Y to G) with improved smoothness properties.

In this talk, I will define a convolution operation and discuss some of its properties. I will then present a recent result; if G is an algebraic group, X is smooth and absolutely irreducible, and f:X-->G is a dominant map, then after finitely many self convolutions of f, we obtain a morphism with the property of being flat with fibers of rational singularities (a property which we call (FRS)).

Moreover, Aizenbud and Avni showed that the (FRS) property has an equivalent analytic characterization, which leads to various applications such as counting points of schemes over finite rings, representation growth of certain compact p-adic groups and arithmetic groups of higher rank, and random walks on (algebraic families of) finite groups. We will discuss some of these applications, and maybe some of the main ideas of the proof of the above result.

Joint with Yotam Hendel.

Random surfaces in statistical physics are commonly modeled by a real-valued function phi on a lattice, whose probability density penalizes nearest-neighbor fluctuations. Precisely, given an even function V, termed the potential, the energy H(phi) is computed as the sum of V over the nearest-neighbor gradients of phi, and the probability density of phi is set proportional to exp(-H(phi)). The most-studied case is when V is quadratic, resulting in the so-called Gaussian free field. Brascamp, Lieb and Lebowitz initiated in 1975 the study of the global fluctuations of random surfaces for other potential functions and noted that understanding is lacking even in the case of the quartic potential, V(x)=x^4. We will review the state of the art for this problem and present recent work with Alexander Magazinov which finally settles the question of obtaining upper bounds for the fluctuations for the quartic and many other potential functions.

Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications in computer vision and graphics. We propose to leverage the power of convolutional neural networks (CNNs) to produce highly detailed face reconstruction directly from a single image. For this purpose, we introduce an end-to-end CNN framework which constructs the shape in a coarse-to-fine fashion. The proposed architecture is composed of two main blocks, a network which recovers the coarse facial geometry (CoarseNet), followed by a CNN which refines the facial features of that geometry (FineNet). To alleviate the lack of training data for face reconstruction, we train our model using synthetic data as well as unlabeled facial images collected from the internet. The proposed model successfully recovers facial geometries from real images, even for faces with extreme expressions and under varying lighting conditions. In this talk, I will summarize three papers that were published at 3DV 2016, CVPR 2017 (as an oral presentation), and ICCV 2017.

Bio: Matan Sela holds a Ph.D in Computer Science from the Technion - Israel Institute of Technology. He received B.Sc. and M.Sc. (both with honors) in electrical engineering, both from The Technion - Israel Institute of Technology in 2012 and 2015, respectively. During summer 2017, he was a research intern at Google, Mountain View, California. His interests are Machine Learning, Computer Vision, Computer Graphics, Geometry Processing and any combination of thereof.

The Baum-Bott indexes are important local invariants for singular holomorphic foliations by curves with isolated singularities. On a compact complex manifold, we consider foliations with fixed cotangent bundle. Then, the Baum-Bott map associates to a foliation its Baum-Bott indexes on its singularities. We focus on foliations on the projective space and we are interested in the generic rank of that map. The generic rank for foliations on the projective plane is known. For high-dimensional projectives spaces, we give an upper bound and in some cases we determine the generic rank.

This paper makes progress on the problem of explicitly constructing a binary tree code with constant distance and constant alphabet size.

For every constant delta < 1 we give an explicit binary tree code with distance delta and alphabet size poly(log n), where n is the depth of the tree. This is the first improvement over a two-decade-old construction that has an exponentially larger alphabet of size poly(n).

As part of the analysis, we prove a bound on the number of positive integer roots a real polynomial can have in terms of its sparsity with respect to the Newton basis---a result of independent interest.

Joint work with G. Cohen and B. Haeupler

**David Jerison:** Localization of eigenfunctions via an effective potential

We discuss joint work with Douglas Arnold, Guy David, Marcel Filoche and Svitlana Mayboroda. Consider the Neumann boundary value problem for the operator $L u = - \mbox{div} (A \nabla u) + Vu$ on a Lipschitz domain $\Omega$ and, more generally, on a manifold with or without boundary. The eigenfunctions of $L$ are often localized, as a result of disorder of the potential $V$, the matrix of coefficients $A$, irregularities of the boundary, or all of the above. In earlier work, Filoche and Mayboroda introduced the function $u$ solving $Lu = 1$, and showed numerically that it strongly reflects this localization. Here, we deepen the connection between the eigenfunctions and this {\em landscape} function $u$ by proving that its reciprocal $1/u$ acts as an {\em effective potential}. The effective potential governs the exponential decay of the eigenfunctions of the system and delivers information on the distribution of eigenvalues near the bottom of the spectrum.

**Ron Rosenthal:** Eigenvector correlation in the complex Ginibre ensemble

The complex Ginibre ensemble is a non-Hermitian random matrix on C^N with i.i.d. complex Gaussian entries normalized to have mean zero and variance 1=N. Unlike the Gaussian unitary ensemble, for which the eigenvectors are orthogonal, the geometry of the eigenbases of the Ginibre ensemble are not particularly well understood. We will discuss a some results regarding the analytic and algebraic structure of eigenvector correlations in this matrix ensemble. In particular, we uncover an extended algebraic structure which describes the asymptotic behavior (as N goes to infinity) of these correlations. Our work extends previous results of Chalker and Mehlig [CM98], in which the correlation for pairs of eigenvectors was computed. Based on a joint work with Nick Crawford.

Recent work has shown impressive success in automatically creating new images with desired properties such as transferring painterly style, modifying facial expressions or manipulating the center of attention of the image. In this talk I will discuss two of the standing challenges in image synthesis and how we tackle them:

- I will describe our efforts in making the synthesized images more photo-realistic.

- I will further show how we can broaden the scope of data that can be used for training synthesis networks, and with that provide a solution to new applications.

This talk will include an introduction to the topic of V(D)J rearrangements of particular subsets of T cells and B cells of the adaptive human immune system, in particular of IgG heavy chains. There are many statistical problems that arise in trying to understand these cells. They involve estimating aspects of functionals of discrete probabilities on (random) finite sets. Topics include but are not limited to exchangeability, estimating non-centrality parameters, and estimating covariance matrices from what are called "replicates" that have been amplified by the PCR process and (partially) sequenced.

I have received considerable assistance from Lu Tian, and also Yi Liu; as well, I have been helped considerably by Andrew Fire and Scott Boyd, and also Jorg Goronzy

Single cells and cells in a tissue respond to stimuli by deforming, changing their shape, and/or moving. Some of these responses can be understood from the underlying biochemical signaling, a topic that has been of interest to both biologists and modelers. In our recent work, my group has studied the link between mechanical tension on cells and their internal chemical signaling. (Our primary focus has been, and remains, that of Rho proteins.) Here I will describe a simple "toy model" that captures key aspects of what is known biologically. The model is simple enough to understand mathematically, and yet capable fo displaying several regimes of behavior consistent with experimental observations. I describe how we investigated the model in a single cell, and how this was then used to capture multiple cells that interact with one another mechanically. We find waves of expansion and contraction that sweep through the model "tissue" is certain parameter regimes. This work is joint with Cole Zmurchok and Dhanajay Bhaskar.

We study the problem of identifying correlations in multivariate data, under information constraints:

Either on the amount of memory that can be used by the algorithm, or the amount of communi- cation when the data is distributed across several machines. We prove a tight trade-off between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required mem-ory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir (2014), which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication trade-offs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.

Joint work with Ohad Shamir

Non-signaling strategies are collections of distributions with certain non-local correlations that have been studied recently in the context of delegation of computation.

In this talk I will discuss a recent result showing that the Hadamard based PCP of [ALMSS'92] is sound against non-signaling strategies. As part of the proof, we study the classical problem of linearity testing [BLR'93] in the setting of non-signaling strategies, and prove that any no-signaling strategy that passes the linearity test with high probability must be close to a quasi-distribution over linear functions.

Joint work with Alessandro Chiesa and Peter Manohar (UC Berkeley).

Consider a sequence of growing graphs converging locally weakly to an infinite (possibly random) tree. As there are uncountably many Ising and Potts Gibbs measures on the limiting tree in the low-temperature regime it is not apriori clear whether the local weak limit of such measures exists and if so, identifying the limit remains a challenge. In this talk, I will describe these limits. The talk is based on joint works with Amir Dembo and Allan Sly.

Recently, there has been increasing interest to leverage the competence of neural networks to analyze data. In particular, new clustering methods that employ deep embeddings have been presented.

In this talk, we depart from centroid-based models and suggest a new framework, called Clustering-driven deep embedding with PAirwise Constraints (CPAC), for non-parametric clustering using a neural network. We present a clustering-driven embedding based on a Siamese network that encourages pairs of data points to output similar representations in the latent space. Our pair-based model allows augmenting the information with labeled pairs to constitute a semi-supervised framework. Our approach is based on analyzing the losses associated with each pair to refine the set of constraints.

We show that clustering performance increases when using this scheme, even with a limited amount of user queries.

We present state-of-the-art results on different types of datasets and compare our performance to parametric and non-parametric techniques.

Our first theorem is a hierarchy theorem for the query complexity of testing graph properties with one-sided error; more precisely, we show that for every sufficiently fast-growing function f from (0,1) to the natural numbers, there is a graph property whose one-sided-error query complexity is precisely f(\Theta(\epsilon)). No result of this type was previously known for any f which is super-polynomial. Goldreich [ECCC 2005] asked to exhibit a graph property whose query complexity is exponential in 1/\epsilon. Our hierarchy theorem partially resolves this problem by exhibiting a property whose one-sided-error query complexity is exponential in 1/\epsilon. We also use our hierarchy theorem in order to resolve a problem raised by Alon and Shapira [STOC 2005] regarding testing relaxed versions of bipartiteness.

Our second theorem states that for any function f there is a graph property whose one-sided-error query complexity is at least f(\epsilon) while its two-sided-error query complexity is only polynomial in 1/\epsilon. This is the first indication of the surprising power that two-sided-error testing algorithms have over one-sided-error ones, even when restricted to properties that are testable with one-sided error. Again, no result of this type was previously known for any f that is super-polynomial.

The above theorems are derived from a graph theoretic result which we think is of independent interest, and might have further applications. Alon and Shikhelman [JCTB 2016] introduced the following generalized Turan problem: for fixed graphs H and T, and an integer n, what is the maximum number of copies of T, denoted by ex(n,T,H), that can appear in an n-vertex H-free graph? This problem received a lot of attention recently, with an emphasis on T = C_3, H=C_{2m+1}. Our third theorem gives tight bounds for ex(n,C_k,C_m) for all the remaining values of k and m.

Joint work with Asaf Shapira.

Non-Euclidean, or incompatible elasticity is an elastic theory for bodies that do not have a reference (stress-free) configuration. It applies to many systems, in which the elastic body undergoes inhomogeneous growth (e.g. plants, self-assembled molecules). Mathematically, it is a question of finding the "most isometric" immersion of a Riemannian manifold (M,g) into Euclidean space of the same dimension, by minimizing an appropriate energy functional.

Much of the research in non-Euclidean elasticity is concerned with elastic bodies that have one or more slender dimensions (such as leaves), and finding appropriate dimensionally-reduced models for them.

In this talk I will give an introduction to non-Euclidean elasticity, and then focus on thin bodies and present some recent results on the relations between their elastic behavior and their curvature.

Based on a joint work with Asaf Shachar.

We prove a dichotomy theorem for two-party protocols, and show that for every poly-time two-party protocol with single-bit output, at least one of following holds:

- The protocol can be used to construct a key-agreement protocol.
- For every constant ρ > 0 the parties' output is ρ -uncorrelated: let (X; Y; T) denote the parties' outputs and the protocol's transcript respectively. A protocol is &rho -uncorrelated if there exists an efficient "decorralizer" algorithm Decor, that when given a random transcript T, produces two numbers P
_{A}; P_{B}, such that no efficient algorithm can distinguish (U_{PS};U_{PB}; T) (where Up denotes a biassed coin with bias ρ from (X; Y; T), with distinguishing advantage larger than ρ.

Namely, if the protocol cannot be used to construct key-agreement, then its output distribution (X; Y; T) is trivial: it can be simulated non-interactively by the parties given public randomness (used to sample T). (The precise statement also has qualifiers of the form: "on infinitely many choices of the security parameter").

We use the above characterization to prove that (α= 24ε^{2})-correct differentially private symmetric protocol for computing XOR, implies the existence of key-agreement protocol. The above dependency between α and &epsilon is tight since an θ( ε^{2})-correct "-differentially private protocol for computing XOR is known to exists unconditionally. It also improves, in the ( ε,α)dependency aspect, upon Goyal et al. [ICALP '16] who showed that, for some constant c > 0, a c-correct "-differentially private protocol for computing XOR implies oblivious transfer. Our result extends to a weaker notion of di erential privacy in which the privacy only requires to hold against external observer. Interestingly, the reductions used for proving the above results are non black box.

Joint work with: Eran Omri and Kobbi Nissim and Ronen Shaltiel and Jad Silbak

Talk 1: David Ellis (Queen Mary U)

**Title: The edge-isoperimetric problem for antipodally symmetric subsets of the discrete cube.**

Abstract: A major open problem in geometry is to solve the isoperimetric problem for n-dimensional real projective space, i.e. to determine, for each real number V, the minimum possible size of the boundary of a (well-behaved) set of volume V, in n-dimensional real projective space. We study a discrete analogue of this question: namely, among all antipodally symmetric subsets of {0,1}^n of fixed size, which sets have minimal edge-boundary? We obtain a complete answer to the second question. This is joint work with Imre Leader (Cambridge)

Talk 2: Benjamin Fehrman (Max Planck Institute)

**Title: Well-posedness of stochastic porous media equations with nonlinear, conservative noise.**

Abstract: In this talk, which is based on joint work with Benjamin Gess, I will describe a pathwise well-posedness theory for stochastic porous media equations driven by nonlinear, conservative noise. Such equations arise in the theory of mean field games, as an approximation to the Dean-Kawasaki equation in fluctuating hydrodynamics, to describe the fluctuating hydrodynamics of a zero range process, and as a model for the evolution of a thin film in the regime of negligible surface tension. Our methods are loosely based on the theory of stochastic viscosity solutions, where the noise is removed by considering a class of test functions transported along underlying stochastic characteristics. We apply these ideas after passing to the equation's kinetic formulation, for which the noise enters linearly and can be inverted using the theory of rough paths.

The harnessing of modern computational abilities for many-body wave-function representations is naturally placed as a prominent avenue in contemporary condensed matter physics. Specifically, highly expressive computational schemes that are able to efficiently represent the entanglement properties which characterize many-particle quantum systems are of interest. In the seemingly unrelated field of machine learning, deep network architectures have exhibited an unprecedented ability to tractably encompass the convoluted dependencies which characterize hard learning tasks such as image classification or speech recognition. However, theory is still lagging behind these rapid empirical advancements, and key questions regarding deep learning architecture design have no adequate answers. In the presented work, we establish a Tensor Network (TN) based common language between the two disciplines, which allows us to offer bidirectional contributions. By showing that many-body wave-functions are structurally equivalent to mappings of convolutional and recurrent arithmetic circuits, we construct their TN descriptions in the form of Tree and Matrix Product State TNs, and bring forth quantum entanglement measures as natural quantifiers of dependencies modeled by such networks. Accordingly, we propose a novel entanglement based deep learning design scheme that sheds light on the success of popular architectural choices made by deep learning practitioners, and suggests new practical prescriptions. Specifically, our analysis provides prescriptions regarding connectivity (pooling geometry) and parameter allocation (layer widths) in deep convolutional networks, and allows us to establish a first of its kind theoretical assertion for the exponential enhancement in long term memory brought forth by depth in recurrent networks. In the other direction, we identify that an inherent re-use of information in state-of-the-art deep learning architectures is a key trait that distinguishes them from TN based representations. Therefore, we suggest a new TN manifestation of information re-use, which enables TN constructs of powerful architectures such as deep recurrent networks and overlapping convolutional networks. This allows us to theoretically demonstrate that the entanglement scaling supported by state-of-the-art deep learning architectures can surpass that of commonly used expressive TNs in one dimension, and can support volume law entanglement scaling in two dimensions with an amount of parameters that is a square root of that required by Restricted Boltzmann Machines. We thus provide theoretical motivation to shift trending neural-network based wave-function representations closer to state-of-the-art deep learning architectures.

Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. In this talk I will introduce our neural relational inference model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks.

I will describe work with C.E.Wayne in which we study how dissipation leads to a very slow drift in secular parameters for the Non-linear Schroedinger equation. The nice thing about this model is that one can describe a relatively complex phenomenon in almost explicit terms as a perturbation from an integrable Hamiltonian system

The advancement in quantum computing, where Google, IBM, Microsoft, Intel are competing in the (exponentially growing) number of qubits in their (some already) commercial quantum computers that they produce, requires the reexamination of the Internet Security, and the public key infrastructure. The talk will describe the concept of overlay security together with blockchain based directories for establishing symmetric keys. When combined with nested Lamport signature and Merkle trees for digital signatures the result is a complete, easily implementable architecture with information theoretically secure communication, and hash based signatures.

**Mark Rudelson (UMich)**

**Title**: Invertibility of the adjacency matrices of random graphs.

**Abstract**: Consider an adjacency matrix of a bipartite, directed, or undirected Erdos-Renyi random graph. If the average degree of a vertex is large enough, then such matrix is invertible with high probability. As the average degree decreases, the probability of the matrix being singular increases, and for a sufficiently small average degree, it becomes singular with probability close to 1. We will discuss when this transition occurs, and what the main reason for the singularity of the adjacency matrix is.

This is a joint work with Anirban Basak.

**Yinon Spinka (TAU)**

**Title**: Finitary codings of Markov random fields

**Abstract**: Let X be a stationary Z^d-process. We say that X can be coded by an i.i.d. process if there is a(deterministic and translation-invariant) way to construct a realization of X from i.i.d. variables associated to the sites of Z^d. That is, if there is an i.i.d. process Y and a measurable map F from the underlying space of Y to that of X, which commutes with translations of Z^d and satisfies that F(Y)=X in distribution. Such a coding is called finitary if, in order to determine the value of X at a given site, one only needs to look at a finite (but random) region of Y.

It is known that a phase transition (existence of multiple Gibbs states) is an obstruction for the existence of such a finitary coding. On the other hand, we show that when X is a Markov random field satisfying certain spatial mixing conditions, then X can be coded by an i.i.d. process in a finitary manner. Moreover, the coding radius has exponential tails, so that typically the value of X at a given site is determined by a small region of Y.

We give applications to models such as the Potts model, proper colorings and the hard-core model.

Image restoration algorithms are typically evaluated by some distortion measure (e.g. PSNR, SSIM, IFC, VIF) or by human opinion scores that quantify perceived perceptual quality. In this work, we prove mathematically that distortion and perceptual quality are at odds with each other. Specifically, we study the optimal probability for correctly discriminating the outputs of an image restoration algorithm from real images. We show that as the mean distortion decreases, this probability must increase (indicating worse perceptual quality). As opposed to the common belief, this result holds true for any distortion measure, and is not only a problem of the PSNR or SSIM criteria. However, as we show experimentally, for some measures it is less severe (e.g. distance between VGG features). We also show that generative-adversarial-nets (GANs) provide a principled way to approach the perception-distortion bound. This constitutes theoretical support to their observed success in low-level vision tasks. Based on our analysis, we propose a new methodology for evaluating image restoration methods, and use it to perform an extensive comparison between recent super-resolution algorithms. Our study reveals which methods are currently closest to the theoretical perception-distortion bound.

* Joint work with Yochai Blau.

We present novel oblivious routing algorithms for the splittable and the unsplittable multicommodity flow settings. Our algorithms for both models improve upon the state-of-the-art, in terms of running time and performance with respect to graphs that exhibit good expansion guarantees. As an intermediate step towards the unsplittable setting, we present a novel generalization of Valiant's classical load balancing scheme for packet-switched networks to arbitrary graphs, which is of independent interest. Our approach relies on diffusing traffic throughout the network and then regathering it to its destination, via iterative applications of the random walk operator. Consequently, the performance guarantees of our algorithms are derived from the convergence of the random walk operator to the stationary distribution and are expressed in terms of the spectral gap of the graph (which dominates the mixing time).

Genome-wide forces of mutation and selection create local and global sequence patterns that carry information on functional role of genomic DNA. I will describe algorithmic approaches to genome analysis that decode sequence patterns and identify features of structural gene organization and gene expression (e.g. leaderless transcription). The algorithms make unsupervised inference of the structure of underlying graphical model and its parameters.

The subsequently developed software tools were, among other applications, used at NCBI (Bethesda, MD) to annotate and re-annotate 2,500+ fungal genomes and 130,000+ prokaryotic genomes. Yet another algorithm was employed by DOE JGI (Walnut Creek, CA) to annotate the largest collection of metagenomes.

Speaker s short Bio

Mark Borodovsky, PhD, a Regents' Professor at the Joint Georgia Tech and Emory University Department of Biomedical Engineering and the School of Computational Science and Engineering. Since 1990 when he arrived to Atlanta from Moscow, Dr Borodovsky had also joint faculty appointments at the School of Biology and the School of Mathematics. Dr. Borodovsky is a Founder of the Georgia Tech Bioinformatics graduate program.

Suppose a language L can be decided by a bounded-error randomized algorithm that runs in space S and time n * poly(S). We give a randomized algorithm for L that still runs in space O(S) and time n * poly(S) that uses only O(S) random bits; our algorithm has a low failure probability on all but a negligible fraction of inputs of each length. An immediate corollary is a deterministic algorithm for L that runs in space O(S) and succeeds on all but a negligible fraction of inputs of each length. We also discuss additional complexity-theoretic applications of our technique.

Points of infinite multiplicity are particular points which the Brownian motion visits infinitely often. Following a work of Bass, Burdzy and Khoshnevisan, we construct and study a measure carried by these points. Joint work with Yueyun Hu and Zhan Shi.

Scattering effects in images, including those related to haze, fog, and appearance of clouds, are fundamentally dictated by microphysical characteristics of the scatterers. We define and derive recovery of these characteristics, in a three-dimensional heterogeneous medium. Recovery is based on a novel tomography approach. Multiview (multi-angular) and multi-spectral data are linked to the underlying microphysics using 3D radiative transfer, accounting for multiple-scattering. Despite the nonlinearity of the tomography model, inversion is enabled using a few approximations that we describe. As a case study, we focus on passive remote sensing of the atmosphere, where scatterer retrieval can benefit modeling and forecasting of weather, climate, and pollution.

We present a new way to derive the replica symmetric solution for the free energy in mean-field spin glasses. Only the Sherrington-Kirpatrick case has been worked out in details, but the method also works in other cases, for instance for the perceptron (work in progress), and probably also for the Hopfield net. The method is closely related to the TAP equations (for Thouless-Anderson-Palmer). It does not give any new results, presently, but it gives a new viewpoint, and it looks to be quite promising. As the TAP equations are widely discussed in the physics literature, also at low temperature, it is hoped that the method could be extended to this case, too. But this is open, and probably very difficult.

We will introduce the (new) notion of approximability in triangulated categories and show its power.

The brief summary is that the derived category of quasicoherent sheaves on a separated, quasicompact scheme is an approximable triangulated category.

As relatively easy corollaries one can: (1) prove an old conjecture of Bondal and Van den Bergh, about strong generation in D^{perf}(X), (2) generalize an old theorem of of Rouquier about strong generation in D^b_{coh}(X). Rouquier proved the result only in equal characteristic, we can extend to mixed characteristic, and (3) generalize a representability theorem of Bondal and Van den Bergh,from proper schemes of finite type over fields to proper schemes of finite type over any noetherian rings.

After stating these results and explaining what they mean, we will (time permitting) also mention structural theorems. It turns out that approximable triangulated categories have a fair bit of intrinsic, internal structure that comes for free.

Recent success stories of using machine learning for diagnosing skin cancer, diabetic retinopathy, pneumonia, and breast cancer may give the impression that artificial intelligence (AI) is on the cusp of radically changing all aspects of health care. However, many of the most important problems, such as predicting disease progression, personalizing treatment to the individual, drug discovery, and finding optimal treatment policies, all require a fundamentally different way of thinking. Specifically, these problems require a focus on *causality* rather than simply prediction. Motivated by these challenges, my lab has been developing several new approaches for causal inference from observational data. In this talk, I describe our recent work on the deep Markov model (Krishnan, Shalit, Sontag AAAI '17) and TARNet (Shalit, Johansson, Sontag, ICML '17).

Recent success stories of using machine learning for diagnosing skin cancer, diabetic retinopathy, pneumonia, and breast cancer may give the impression that artificial intelligence (AI) is on the cusp of radically changing all aspects of health care. However, many of the most important problems, such as predicting disease progression, personalizing treatment to the individual, drug discovery, and finding optimal treatment policies, all require a fundamentally different way of thinking. Specifically, these problems require a focus on *causality* rather than simply prediction. Motivated by these challenges, my lab has been developing several new approaches for causal inference from observational data. In this talk, I describe our recent work on the deep Markov model (Krishnan, Shalit, Sontag AAAI '17) and TARNet (Shalit, Johansson, Sontag, ICML '17).

We show that any area-preserving C^{r}-diffeomorphism of a two-dimensional surface displaying an elliptic fxed point can be C^{r}-perturbed to one exhibiting a chaotic island whose metric entropy is positive, for every 1 ≤ r≤ 1. This proves a conjecture of Herman stating that the identity map of the disk can be C^{ ∞}-perturbed to a conservative di eomorphism with positive metric entropy. This implies also that the Chirikov standard map for large and small parameter values can be C^{∞}- approximated by a conservative diffeomorphisms displaying a positive metric entropy (a weak version of Sinai's positive metric entropy conjecture). Finally, this sheds light onto a Herman's question on the density of C^{r}-conservative di eomorphisms displaying a positive metric entropy: we show the existence of a dense set formed by conservative diffeomorphisms which either are weakly stable (so, conjecturally, uniformly hyperbolic) or display a chaotic island of positive metric entropy.

This is a joint work with Pierre Berger.

The generalized Jacobian Jac_m(C ') of a smooth hyperelliptic curve C' associated with a module m is an algebraic group that can be described by using lines bundle of the curve C' or by using a symmetric product of the curve C' provided with a law of composition. This second definition of the Jacobian Jac_m(C') is directly related to the fibres of a Mumford system. To be precise it is a subset of the compactified Jac_m(C') which is related to the fibres. This presentation will help us to demystify the relationship of these two mathematical objects.

Multiple wireless sensing tasks, e.g. radar detection for driver safety, involve estimating the "channel" or relationship between signal transmitted and received.

In this talk I will tell about the standard math model for the radar channel. In the case where the channel is sparse, I will demonstrate a channel estimation algorithm that is sub-linear in sampling and arithmetic complexity (and convince you of the need for such).

The main ingredients in the algorithm will be the use of an intrinsic algebraic structure known as the Heisenberg group and recent developments in the theory of the sparse Fast Fourier Transform (sFFT, due to Indyk et al.)

The talk will assume minimal background knowledge.

Understanding the ways in which the Brain gives rise to the Mind (memory, behavior, cognition, intelligence, language) is the most challengingproblem facing science today. As the answer seems likely to be at least partly computational, computer scientists should work on this problem --- except there is no obvious place to start. I shall recount recent work (with W. Maass and S. Vempala) on a model for the formation and association of memories in humans, and reflect on why it may be a clue about language.

Variational problems in geometry, fabrication, learning, and related applications lead to structured numerical optimization problems for which generic algorithms exhibit inefficient or unstable performance. Rather than viewing the particularities of these problems as barriers to scalability, in this talk we embrace them as added structure that can be leveraged to design large-scale and efficient techniques specialized to applications with geometrically structure variables. We explore this theme through the lens of case studies in surface parameterization, optimal transport, and multi-objective design

A long line of research studies the space complexity of estimating a norm l(x) in the data-stream model, i.e., when x is the frequency vector of an input stream consisting of insertions and deletions of items of n types. I will focus on norms l (in R^n) that are *symmetric*, meaning that l is invariant under sign-flips and coordinate-permutations, and show that the streaming space complexity is essentially determined by the measure-concentration characteristics of l. These characteristics are known to govern other phenomena in high-dimensional spaces, such as the critical dimension in Dvoretzky's Theorem.

The family of symmetric norms contains several well-studied norms, such as all l_p norms, and indeed we provide a new explanation for the disparity in space complexity between p<=2 and p>2. We also obtain bounds for other norms that are useful in applications.

Joint work with Jaroslaw Blasiok, Vladimir Braverman, Stephen R. Chestnut, and Lin F. Yang.

Multi-finger caging offers a robust approach to object grasping. To securely grasp an object, the fingers are first placed in caging regions surrounding a desired immobilizing grasp. This prevents the object from escaping the hand, and allows for great position uncertainty of the fingers relative to the object. The hand is then closed until the desired immobilizing grasp is reached.

While efficient computation of two-finger caging grasps for polygonal objects is well developed, the computation of three-finger caging grasps has remained a challenging open problem. We will discuss the caging of polygonal objects using three-finger hands that maintain similar triangle finger formations during the grasping process. While the configuration space of such hands is four dimensional, their contact space which represents all two and three finger contacts along the grasped object's boundary forms a two-dimensional stratified manifold.

We will present a caging graph that can be constructed in the hand's relatively simple contact space. Starting from a desired immobilizing grasp of the object by a specific triangular finger formation, the caging graph is searched for the largest formation scale value that ensures a three-finger cage about the object. This value determines the caging regions, and if the formation scale is kept below this value, any finger placement within the caging regions will guarantee a robust object grasping.

An extremal metric, as defined by Calabi, is a canonical Kahler metric: it minimizes the curvature within a given Kahler class. According to the Yau-Tian-Donaldson conjecture, polarized Kahler manifolds admitting an extremal metric should correspond to stable manifolds in a Geometric Invariant Theory sense.

In this talk, we will explain that a projective extremal Kahler manifold is asymptotically relatively Chow stable. This fact was conjectured by Apostolov and Huang, and its proof relies on quantization techniques. We will explain various implications, such that unicity or splitting results for extremal metrics.

Joint work with Yuji Sano ( Fukuoka University).

A curious property of randomized log-space search algorithms is that their outputs are often longer than their workspace. One consequence is that there is no clear way to reproduce the same output when running the algorithm twice on the same input. It is not feasible to store the random bits (or the output) of the previous run in log-space, and using new random bits in another execution can result in a different output. This leads to the question: how can we reproduce the results of a randomized log space computation of a search problem?

We will give a precise definition of this notion of "reproducibility". Then we will show that every problem in search-RL has a randomized log-space algorithm where the output can be reproduced. Reproducibility can be thought of as an extension of pseudo-determinism. Indeed, for some problems in search-RL we show pseudo-deterministic algorithms whose running time significantly improve on known deterministic algorithms.

Joint work with Yang Liu.

Oren Louidor (Technion)

**Title: Dynamical freezing in a spin-glass with logarithmic correlations**.

Abstract: We consider a continuous time random walk on the 2D torus, governed by the exponential of the discrete Gaussian free field acting as potential. This process can be viewed as Glauber dynamics for a spin-glass system with logarithmic correlations. Taking temperature to be below the freezing point, we then study this process both at pre-equilibrium and in-equilibrium time scales. In the former case, we show that the system exhibits aging and recover the arcsine law as asymptotics for a natural two point temporal correlation function. In the latter case, we show that the dynamics admits a functional scaling limit, with the limit given by a variant of Kolmogorov's K-process, driven by the limiting extremal process of the field, or alternatively, by a super-critical Liouville Brownian motion. Joint work with A. Cortines, J. Gold and A. Svejda.

Alexander Glazman (Tel Aviv)

**Title: Level lines of a random Lipschitz function**

Abstract: We consider the uniform distribution on Lipschitz functions on the triangular lattice, i.e. all integer-valued functions which differ by 0 or 1 on any two adjacent vertices. We show that with a positive probability such a function exhibits macroscopic level lines. Instead of working directly with Lipschitz functions we map this model to the loop $O(2)$ model with parameter $x=1$. The loop $O(n)$ model is a model for a random collection of non-intersecting loops on the hexagonal lattice, which is believed to be in the same universality class as the spin $O(n)$ model.

A main tool in the proof is a positive association (FKG) property that was recently shown to hold when $n \ge 1$ and $0<x\le\frac{1}{\sqrt{n}}$. Though the case $n=2$, $x=1$ is not in the FKG regime, it turns out that when loops are assigned one of two colours independently the marginal on loops of either of the colours does satisfy the FKG property. The colouring of loops allows to view the loop $O(2)$ model with $x=1$ as coupling of two percolation configurations. Studying each of them independently and using the XOR operation we establish existence of macroscopic loops, i.e. level lines in the original setting.

(Based on a joint work with H. Duminil-Copin, and I. Manolescu)

In unsupervised domain mapping, the learner is given two unmatched datasets A and B. The goal is to learn a mapping G_AB that translates a sample in A to the analog sample in B. Recent approaches have shown that when learning simultaneously both G_AB and the inverse mapping G_BA, convincing mappings are obtained. In this work, we present a method of learning G_AB without learning G_BA. This is done by learning a mapping that maintains the distance between a pair of samples. Moreover, good mappings are obtained, even by maintaining the distance between different parts of the same sample before and after mapping. We present experimental results that the new method not only allows for one sided mapping learning, but also leads to preferable numerical results over the existing circularity-based constraint.

In formal verification, one uses mathematical tools in order to prove that a system satisfies a given specification.

In this talk I consider three limitations of traditional formal verification:

The first is the fact that treating systems as finite-state machines is problematic, as systems become more complex, and thus we must consider infinite-state machines.

The second is that Boolean specifications are not rich enough to specify properties required by today's systems. Thus, we need to go beyond the Boolean setting.

The third concerns certifying the results of verification: while a verifier may answer that a system satisfies its specification, a designer often needs some palpable evidence of correctness.

I will present several works addressing the above limitations, and the mathematical tools involved in overcoming them.

No prior knowledge is assumed. Posterior knowledge is guaranteed.

We develop the theory of semisimplifications of tensor categories defined by Barrett and Westbury. By definition, the semisimplification of a tensor category is its quotient by the tensor ideal of negligible morphisms, i.e., morphisms f such that Tr(fg)=0 for any morphism g in the opposite direction. In particular, we compute the semisimplification of the category of representations of a finite group in characteristic p in terms of representations of the normalizer of its Sylow p-subgroup. This allows us to compute the semisimplification of the representation category of the symmetric group S_{n+p} in characteristic p, where n=0,...,p-1, and of the Deligne category Rep^{ab} S_t, t in N. We also compute the semisimplification of the category of representations of the Kac-De Concini quantum group of the Borel subalgebra of sl_2. Finally, we study tensor functors between Verlinde categories of semisimple algebraic groups arising from the semisimplification construction, and objects of finite type in categories of modular representations of finite groups (i.e., objects generating a fusion category in the semisimplification).

This is joint work with Victor Ostrik.

In the area of distributed graph algorithms a number of network's entities with local views solve some computational task by exchanging messages with their neighbors. Quite unfortunately, an inherent property of most existing distributed algorithms is that throughout the course of their execution, the nodes get to learn not only their own output but rather learn quite a lot on the inputs or outputs of many other entities. This leakage of information might be a major obstacle in settings where the output (or input) of network's individual is a private information (e.g. distributed networks of selfish agents, decentralized digital currency such as Bitcoin, voting systems).

While being quite unfamiliar notion in the classical distributed setting, the notion of secure multi-party computation (MPC) is one of the main themes in the Cryptography community. Yet despite all extensive work in the area, no existing algorithm fits the framework of classical distributed models in which there are no assumptions on the graph topologies and only messages of bounded size are sent on the edges in each round.

In this work, we introduce a new framework for \emph{secure distributed graph algorithms} and provide the first \emph{general compiler} that takes any "natural" non-secure distributed algorithm that runs in $r$ rounds, and turns it into a secure algorithm that runs in $\widetilde{O}(r \cdot D \cdot poly(\Delta))$ rounds where $\Delta$ is the maximum degree in the graph and $D$ is its diameter. This round complexity is nearly optimal for bounded degree graphs.

The main technical part of our compiler is based on a new cycle cover theorem: We show that the edges of every bridgeless graph $G$ of diameter $D$ can be covered by a collection of cycles such that each cycle is of length $\widetilde{O}(D)$ and each edge of the graph $G$ appears in $\widetilde{O}(1)$ many cycles. This provides the basis for additional combinatorial constructions required by our compiler and might be of independent combinatorial and algorithmic interest.

Joint work with Merav Parter.

In recent years, robots have played an active role in everyday life: medical robots assist in complex surgeries, low-cost commercial robots clean houses and fleets of robots are used to efficiently manage warehouses. A key challenge in these systems is motion planning, where we are interested in planning a collision-free path for a robot in an environment cluttered with obstacles. While the general problem has been studied for several decades now, these new applications introduce an abundance of new challenges.

In this talk I will describe some of these challenges as well as algorithms developed to address them. I will overview general challenges such as compression and graph-search algorithms in the context of motion planning. I will show why traditional Computer Science tools are ill-suited for these problems and introduce alternative algorithms that leverage the unique characteristics of robot motion planning. In addition, I will describe domains-specific challenges such as those that arise when planning for assistive robots and for humanoid robots and overview algorithms tailored for these specific domains.

Language technology has become pervasive in everyday life, powering applications like Apple's Siri or Google's Assistant. Neural networks are a key component in these systems thanks to their ability to model large amounts of data. Contrary to traditional systems, models based on deep neural networks (a.k.a. deep learning) can be trained in an end-to-end fashion on input-output pairs, such as a sentence in one language and its translation in another language, or a speech utterance and its transcription. The end-to-end training paradigm simplifies the engineering process while giving the model flexibility to optimize for the desired task. This, however, often comes at the expense of model interpretability: understanding the role of different parts of the deep neural network is difficult, and such models are often perceived as "black-box". In this work, we study deep learning models for two core language technology tasks: machine translation and speech recognition. We advocate an approach that attempts to decode the information encoded in such models while they are being trained. We perform a range of experiments comparing different modules, layers, and representations in the end-to-end models. Our analyses illuminate the inner workings of end-to-end machine translation and speech recognition systems, explain how they capture different language properties, and suggest potential directions for improving them. The methodology is also applicable to other tasks in the language domain and beyond.

In this talk, I present an analogue of the Hardy-Littlewood conjecture on the asymptotic distribution of prime constellations in the setting of short intervals in function fields of smooth projective curves over finite fields.

I will discuss the definition of a "short interval" on a curve as an additive translation of the space of global sections of a sufficiently positive divisor E by a suitable rational function f, and show how this definition generalizes the definition of a short interval in the polynomial setting.

I will give a sketch of the proof which includes a computation of a certain Galois group, and a counting argument, namely, Chebotarev density type theorem.

This is a joint work with Tyler Foster.

Computational problems whose input is a program are central in Cryptography, as well as Complexity, Learning, and Optimization. The nature of such problems crucially depends on the way the program is accessed -- as a black box or explicitly by its implementation.

In which settings can we exploit code to gain an advantage over black-box access? In Cryptography, we explore this question from two opposing perspectives:

Protecting Code: Can we obfuscate a program's code so that its functionality is preserved but it is otherwise unintelligible? Intuitively, such obfuscated code reveals nothing more than black-box access to the program. Obfuscation is, therefore, a powerful tool with numerous applications in software protection and Cryptography.

Exploiting Code: Most security proofs in cryptography consist of a reduction that translates any potential attacker into an algorithm solving some underlying hard problem. While most security reductions only require black-box access to the attacker, for many applications black-box reductions are provably insufficient. Can we exploit the attacker's code to prove security where black-box reductions fail?

In this talk, I will describe new techniques for protecting and exploiting code, taking advantage of the inherent tension between these two tasks. I will also demonstrate applications of these techniques in and beyond cryptography.

Sebastien Bubeck (Microsoft Research)

**Title: k-server via multiscale entropic regularization**

Abstract: I will start by describing how mirror descent is a natural strategy for online decision making, specifically in online learning and metrical task systems. To motivate the k-server problem I will also briefly recall what we know and what we don't know for structured state/action spaces in these models. Using the basic mirror descent calculations I will show how to easily obtain a log(k)-competitive algorithm for k-paging. I will then introduce our new parametrization of fractional k-server on a tree, and explain how to analyze the movement cost of entropy-regularized mirror descent on this parametrization. This leads to a depth*log(k)-competitive (fractional) algorithm for general trees, and log^2(k) for HSTs. I will also briefly mention dynamic embeddings to go beyond the standard log(n) loss in the reduction from general metrics to HSTs.

Joint work with Michael B. Cohen, James R. Lee, Yin Tat Lee, and Aleksander Madry.

Percy Deift (Courant Institute )

**Title: Universality in numerical analysis with some examples of cryptographic algorithms.**

Abstract: We show that a wide variety of numerical algorithms with random data exhibit universality. Most of the results are computational, but in some important cases universality is established rigorously. We also discuss universality for some cryptographic algorithms.

Joint work with C. Pfrany, G. Menon, S. Olver, T. Trogdan and S. Miller.

Given a p-adic group G, number theorists are interested in producing admissible representations of G: representations which have a well-defined character functional. One way to produce such representations is by "Jacquet induction" from smaller groups, whose characters can be understood inductively. The complementary space of "new" characters which are not obtained by induction (complementary with respect to a natural metric on the space of characters) is given by what is called "elliptic" characters. Given a representation V of G, the "new" input from its character is captured by the operator Ax(V), with A (the Bernstein-Deligne-Kazhdan A-operator) the projector to the elliptic component (note that this is different from the component of the character lattice valued in elliptic elements). I will talk about my ongoing work with Xuhua He on extending this operator to a trace functional Ax(V) for V a finitely-generated representation (whose Grothendieck group is well understood), which works by first constructing a virtual elliptic admissible representation from any finitely generated representation.

Being the gradient flow of the area functional, the mean curvature flow can be thought of as a greedy algorithm for simplifying embedded shapes. But how successful is this algorithm?

In this talk, I will describe three examples for how mean curvature flow, as well as its variants and weak solutions, can be used to achieve this desired simplification.

The first is a short time smoothing effect of the flow, allowing to smooth out some rough, potentially fractal initial data.

The second is an application of mean curvature flow with surgery to smooth differential topology, allowing to conclude Schoenflies-type theorems about the moduli space of smooth embedded spheres and tori, satisfying some curvature conditions.

The third is an application of (weak,modified) mean curvature flow to differential geometry, allowing to relate bounds on the gaussian entropy functional to the topology of a closed hypersurface.

In this talk, which will assume no prior knowledge in PDE or mean curvature flow, I will try to highlight the relation between the analysis of the flow and in particular, its singularity formation, to both ''time dependent'' and "classical" geometry.

Some of the results described in this talk are joint works with Reto Buzano, Robert Haslhofer and Brian White.

We recall the notion of a hall algebra associated to a category, and explain how this construction can be done in a way that naturally includes a higher algebra structure, motivated by work of Toen and Dyckerhoff-Kapranov. We will then explain how this leads to new insights about the bi-algebra structure and related concepts.

The problem of computational super-resolution asks to recover high-frequency features of an object from the noisy and blurred/band-limited samples of its Fourier transform, based on some a-priori information about the object class. A core theoretical question is to quantify the possible amount of bandwidth extension and the associated stability of recovering the missing frequency components, as a function of the sample perturbation level and the object prior.

In this work we assume that the object has a compact space/time extent in one dimension (but otherwise can be fairly arbitrary), while the low-pass window can have a super-exponentially decaying "soft" shape (such as a Gaussian). In contrast, previously known results considered only the ideal "hard" window (a characteristic function of the band) and objects of finite energy. The super-resolution problem in this case is equivalent to a stable analytic continuation of an entire function of finite exponential type. We show that a weighted least-squares polynomial approximation with equispaced samples and a judiciously chosen number of terms allows one to have a super-resolution factor which scales logarithmically with the noise level, while the pointwise extrapolation error exhibits a Holder-type continuity with an exponent derived from weighted potential theory. The algorithm is asymptotically minimax, in the sense that there is essentially no better algorithm yielding meaningfully lower error over the same smoothness class.

The results can be considered as a first step towards analyzing the much more realistic model of a sparse combination of compactly-supported "approximate spikes", which appears in applications such as synthetic aperture radar, seismic imaging and direction of arrival estimation, and for which only limited special cases are well-understood.

Joint work with L.Demanet and H.Mhaskar.

The use of a computational PIR scheme has been instrumental in reducing interaction from interactive proofs, and in converting multi-prover interactive proofs to (single prover) 2-message computationally sound proofs (also known as arguments).

In this talk we will focus on the secrecy guarantees of this transformation.

We show that if we start with an interactive proof which is only *honest-verifier* zero-knowledge, and we use a quasi-poly secure *symmetric* PIR scheme (or a 2-message OT scheme) to reduce interaction, then the resulting 2-message argument is witness indistinguishable, and in the delayed-input setting it is distributional weak zero-knowledge (which implies strong witness indistinguishable and witness hiding in the delayed input setting). Moreover, under the same assumption (which can be instantiated from quasi-poly DDH/QR/N'th residuosity assumption), we construct a two-message argument with (similar) *statistical* secrecy guarantees. For the latter, we apply the PIR heuristic on a computationally sound proof, which is honest-verifier statistical zero-knowledge.

This is based on joint works with Abhishek Jain, Dakshita Khurana, Ron Rothblum and Amit Sahai.

Graph matching is one of the most well-studied problems in combinatorial optimization, with applications ranging from scheduling and object recognition to numerical analysis and computational chemistry.

Nevertheless, until recently very little was unknown about this problem in real-life **dynamic networks**, which aim to model the constantly changing physical world.

In the first part of the talk we'll discuss our work on dynamic graph matching, and in the second part we'll highlight our work on a few related problems.

Talk 1: Amir Dembo (Stanford)

**Title: Large deviations theory for chemical reaction networks.**

The microscopic dynamics of well-stirred networks of chemical reactions are modeled as jump Markov processes. At large volume, one may expect in this framework to have a straightforward application of large deviation theory. This is not at all true, for the jump rates are typically neither globally Lipschitz, nor bounded away from zero, with both blowup and absorption as quite possible scenarios. In joint work with Andrea Agazzi and Jean-Pierre Eckman, we utilize Lyapunov stability theory to bypass this challenge and to characterize a large class of network topologies that satisfy the full Wentzell-Freidlin theory of asymptotic rates of exit from domains of attraction.

Talk 2: Yuval Peres (Microsoft)

Title: **Trace reconstruction for the deletion channel**

Abstract: In the trace reconstruction problem, an unknown string $x$ of $n$ bits is observed through the deletion channel, which deletes each bit with some constant probability q, yielding a contracted string. How many independent outputs (traces) of the deletion channel are needed to reconstruct $x$ with high probability?

The best lower bound known is linear in $n$. Until 2016, the best upper bound was exponential in the square root of $n$. We improve the square root to a cube root using statistics of individual output bits and some inequalities for Littlewood polynomials on the unit circle. This bound is sharp for reconstruction algorithms that only use this statistical information. (Similar results were obtained independently and concurrently by De O'Donnell and Servedio). If the string $x$ is random, we can show a subpolynomial number of traces suffices by comparison to a random walk. (Joint works with Fedor Nazarov, STOC 2017, with Alex Zhai, FOCS 2017 and with Nina Holden & Robin Pemantle, preprint (2017).)

Image captioning -- automatic production of text describing a visual scene -- has received a lot of attention recently. However, the objective of captioning, evaluation metrics, and the training protocol remain somewhat unsettled. The general goal seems to be for machines to describe visual signal like humans do. We pursue this goal by incorporating a discriminability loss in training caption generators. This loss is explicitly "aware" of the need for the caption to convey information, rather than appear fluent or reflect word distribution in the human captions. Specifically, the loss in our work is tied to discriminative tasks: producing a referring expression (text that allows a recipient to identify a region in the given image) or producing a discriminative caption which allows the recipient to identify an image within a set of images. In both projects, use of the dscriminability loss does not require any additional human annotations, and relies on collaborative training between the caption generator and a comprehension model, which is a proxy for a human recipient. In experiments on standard benchmarks, we show that adding discriminability objectives not only improves the discriminative quality of the generated image captions, but, perhaps surprisingly, also makes the captions better under a variety of traditional metrics.

Joint work with Ruotian Luo (TTIC), Brian Price and Scott Cohen (Adobe).

Let K be a commutative ring. Consider the groups GLn(K). Bernstein and Zelevinsky have studied the representations of the general linear groups in case the ring K is a nite eld. Instead of studying the representations of GLn(K) for each n separately, they have studied all the representations of all the groups GLn(K) si- multaneously. They considered on R := nR(GLn(K)) structures called parabolic (or Harish-Chandra) induction and restriction, and showed that they enrich R with a structure of a so called positive self adjoint Hopf algebra (or PSH algebra). They use this structure to reduce the study of representations of the groups GLn(K) to the following two tasks:

1. Study a special family of representations of GLn(K), called cuspidal representa- tions. These are representations which do not arise as direct summands of parabolic induction of smaller representations.

2. Study representations of the symmetric groups. These representation also has a nice combinatorial description, using partitions.

In this talk I will discuss the study of representations of GLn(K) where K is a nite quotient of a discrete valuation ring (such as Z=pr or k[x]=xr, where k is a nite eld). One reason to study such representation is that all continuous complex representations of the groups GLn(Zp) and GLn(k[[x]]) (where Zp denotes the p-adic integers) arise from these nite quotients. I will explain why the natural generalization of the Harish-Chandra functors do not furnish a PSH algebra in this case, and how is this related to the Bruhat decomposition and Gauss elimination. In order to overcome this issue we have constructed a generalization of the Harish-Chandra functors. I will explain this generalization, describe some of the new functors properties, and explain how can they be applied to studying complex representations.

The talk will be based on a joint work with Tyrone Crisp and Uri Onn.

Today's technological world is increasingly dependent upon the reliability, robustness, quality of service and timeliness of networks including those of power distribution, financial, transportation, communication, biological, and social. For the time-critical functionality in transferring resources and information, a key requirement is the ability to adapt and reconfigure in response to structural and dynamic changes, while avoiding disruption of service and catastrophic failures. We will outline some of the major problems for the development of the necessary theory and tools that will permit the understanding of network dynamics in a multiscale manner.

Many interesting networks consist of a finite but very large number of nodes or agents that interact with each other. The main challenge when dealing with such networks is to understand and regulate the collective behavior. Our goal is to develop mathematical models and optimization tools for treating the ** Big Data** nature of large scale networks while providing the means to understand and regulate the collective behavior and the dynamical interactions (short and long-range) across such networks.

The key mathematical technique will be based upon the use optimal mass transport theory and resulting notions of curvature applied to weighted graphs in order to characterize network robustness. Examples will be given from biology, finance, and transportation.

I shall review the framework of algebraic families of Harish-Chandra modules, introduced recently, by Bernstein, Higson, and the speaker. Then, I shall describe three of their applications.

The first is contraction of representations of Lie groups. Contractions are certain deformations of representations with applications in mathematical physics.

The second is the Mackey bijection, this is a (partially conjectural) bijection between the admissible dual of a real reductive group and the admissible dual of its Cartan motion group.

The third is the hidden symmetry of the hydrogen atom as an algebraic family of Harish-Chandra modules.

Consider a random sequence of n bits that has entropy at least n-k, where k << n. A commonly used observation is that an average coordinate of this random sequence is close to being uniformly distributed, that is, the coordinate "looks random''. In this work, we prove a stronger result that says, roughly, that the average coordinate looks random to an adversary that is allowed to query about n/k other coordinates of the sequence, even if the adversary is non-deterministic.

As an application of this result, we prove a new result on depth-3 circuits, which recovers as a direct corollary the known lower bounds for the parity and majority functions, as well as a lower bound on sensitive functions due to Boppana.

In this talk we discuss the fine scale $L^2$-mass distribution of toral Laplace eigenfunctions with respect to random position. For the 2-dimensional torus, under certain flatness assumptions on the Fourier coefficients of the eigenfunctions and generic restrictions on energy levels, both the asymptotic shape of the variance and the limiting Gaussian law are established, in the optimal Planck-scale regime. We also discuss the 3-dimensional case, where the asymptotic behaviour of the variance is analysed in a more restrictive scenario. This is joint work with Igor Wigman.

Deep Learning has led to a dramatic leap in Super-Resolution (SR) performance in the past few years. However, being supervised, these SR methods are restricted to specific training data, where the acquisition of the low-resolution (LR) images from their high-resolution (HR) counterparts is predetermined (e.g., bicubic downscaling), without any distracting artifacts (e.g., sensor noise, image compression, non-ideal PSF, etc). Real LR images, however, rarely obey these restrictions, resulting in poor SR results by SotA (State of the Art) methods. In this paper we introduce "Zero-Shot" SR, which exploits the power of Deep Learning, but does not rely on prior training. We exploit the internal recurrence of information inside a single image, and * train a small image-specific CNN at test time, on examples extracted solely from the input image itself.* As such, it can adapt itself to different settings per image. This allows to perform SR of real old photos, noisy images, biological data, and other images where the acquisition process is unknown or non-ideal. On such images, our method outperforms SotA CNN-based SR methods, as well as previous unsupervised SR methods. To the best of our knowledge, this is the first unsupervised CNN-based SR method.

Abstract. As is well known and easy to prove the Weyl algebras A_n over a field of characteristic zero are simple. Hence any non-zero homomorphism from A_n to A_m is an embedding and m \geq n. V. Bavula conjectured that the same is true over the fields with finite characteristic. It turned out that exactly one half of his conjecture is correct (m \geq n but there are homomorphisms which are not embeddings).

If we replace the Weyl algebra by its close relative symplectic Poisson algebra (polynomial algebra with F[x_1, ..., x_n; y_1, ..., y_n] variables and Poisson bracket given by {x_i, y_i} =1 and zero on the rest of the pairs), then independently of characteristic all homomorphisms are embeddings.

In this talk I will describe a family of integral representations for the standard twisted L-function of a cuspidal representation of the exceptional group of type G_2. This integral representations. These integral representations are unusual in the sense that they unfold with a non-unique model. A priori this integral is not Eulerian but using remarkable machinery proposed by I. Piatetski-Shapiro and S. Rallis we prove that in fact the integral does factor. In the course of the plocal unramified calculation we use another non-standard method, approximations of generating functions. I will then describe a few applications of these integral representations to the study of the analytic behaviour of the this L-function and to various functorial lifts associated with the group G_2.

We present an explicit pseudorandom generator with polylog(n) seed length for read-once constant-width branching programs that can read their $n$ input bits in any order. This improves upon the work of Impagliazzo, Meka, and Zuckerman (FOCS, 2012), where they required seed length $n^{1/2+o(1)}$.

A central ingredient in our work is a bound on the Fourier spectrum of constant-width branching programs, settling a conjecture posed by Reingold, Steinke, and Vadhan (RANDOM, 2013).

Our analysis crucially uses a notion of local monotonicity on the edge labeling of the branching program. We carry critical parts of our proof under the assumption of local monotonicity and show how to deduce our results for unrestricted branching programs.

(Joint work with Eshan Chattopadhyay, Pooya Hatami, and Omer Reingold)

Machine learning has recently been revolutionized by the introduction of Deep Neural Networks. However, from a theoretical viewpoint these methods are still poorly understood. Indeed the key challenge in Machine Learning today is to derive rigorous results for optimization and generalization in deep learning. In this talk I will present several tractable approaches to training neural networks. At the second part I will discuss a new sequential algorithm for decision making that can take into account the structure in the action space and is more tuned with realistic decision making scenarios.

I will present our work that provides some of the first positive results and yield new, provably efficient, and practical algorithms for training certain types of neural networks. In a second work I will present a new online algorithm that learns by sequentially sampling random networks and asymptotically converges, in performance, to the optimal network. Our approach improves on previous random features based learning in terms of sample/computational complexity, and expressiveness. In a more recent work we take a different perspective on this problem. I will provide sufficient conditions that guarantee tractable learning, using the notion of refutation complexity. I will then discuss how this new idea can lead to new interesting generalization bounds that can potentially explain generalization in settings that are not always captured by classical theory.

In the setting of reinforcement learning I will present a recently developed new algorithm for decision making in a metrical action space. As an application, we consider a dynamic pricing problem in which a seller is faced with a stream of patient buyers. Each buyer buy at the lowest price in a certain time window. We use our algorithm to achieve an optimal regret, improving on previously known regret bound.

Schwartz functions are classically defined as smooth functions such that they, and all their (partial) derivatives, decay at infinity faster than the inverse of any polynomial. This was formulated on $\mathbb{R}^n$ by Laurent Schwartz, and later on Nash manifolds (smooth semi-algebraic varieties) by Fokko du Cloux and by Rami Aizenbud and Dima Gourevitch. In a joint work with Boaz Elazar we have extended the theory of Schwartz functions to the category of (possibly singular) real algebraic varieties. The basic idea is to define Schwartz functions on a (closed) algebraic subset of $\mathbb{R}^n$ as restrictions of Schwartz functions on $\mathbb{R}^n$.

Both in the Nash and the algebraic categories there exists a very useful characterization of Schwartz functions on open subsets, in terms of Schwartz functions on the embedding space: loosely speaking, Schwartz functions on an open subset are exactly restrictions of Schwartz functions on the embedding space, which are zero "to infinite order" on the complement to this open subset. This characterization suggests a very intuitive way to attach a space of Schwartz functions to an arbitrary (not necessarily semi-algebraic) open subset of $\mathbb{R}^n$.

In this talk, I will explain this construction, and more generally the construction of the category of tempered smooth manifolds. This category is in a sense the "largest" category whose objects "look" locally like open subsets of $\mathbb{R}^n$ (for some $n$), and on which Schwartz functions may be defined. In the development of this theory some classical results of Whitney are used, mainly Whitney type partition of unity (this will also be explained in the talk). As time permits, I will show some properties of Schwartz functions, and describe some possible applications. This is a work in progress.

Over the past two decades, machine learning has rapidly evolved and emerged as a highly influential discipline of computer science and engineering. One of the pillars of machine learning is mathematical optimization, and the connection between the two fields has been a primary focus of research. In this talk, I will present two recent works that contribute to this study, focusing on online learning---a central model in machine learning for sequential decision making and learning under uncertainty. In the first part of the talk, I will describe a foundational result concerned with the power of optimization in online learning, and give answer to the question: does there exist a generic and efficient reduction from online learning to black-box optimization? In the second part, I will discuss a recent work that employs online learning techniques to design a new efficient and adaptive preconditioned algorithm for large-scale optimization. Despite employing preconditioning, the algorithm is practical even in modern optimization scenarios such as those arising in training state-of-the-art deep neural networks. I will present the new algorithm along with its theoretical guarantees and demonstrate its performance empirically.

The random-cluster model is a dependent percolation model where the weight of a configuration is proportional to q to the power of the number of connected components. It is highly related to the ferromagnetic q-Potts model, where every vertex is assigned one of q colors, and monochromatic neighbors are encouraged. Through non-rigorous means, Baxter showed that the phase transition is first-order whenever $q > 4$ - i.e. there are multiple Gibbs measures at criticality. We provide a rigorous proof of this claim. Like Baxter, our proof uses the correspondence between the above models and the six-vertex model, which we analyze using the Bethe ansatz and transfer matrix techniques. We also prove Baxter's formula for the correlation length of the models at criticality.

This is joint work with Hugo Duminil-Copin, Maxime Gangebin, Ioan Manolescu, and Vincent Tassion.

We present a general approach to video understanding, inspired by semantic transfer techniques that have been successfully used for 2D image analysis. Our method considers a video to be a 1D sequence of clips, each one associated with its own semantics. The nature of these semantics -- natural language captions or other labels -- depends on the task at hand. A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, following which, reference semantics can be transferred to the test video. We describe two matching methods, both designed to ensure that (a) reference clips appear similar to test clips and (b), taken together, the semantics of the selected reference clips is consistent and maintains temporal coherence. We use our method for video captioning on the LSMDC'16 benchmark, video summarization on the SumMe and TVSum benchmarks, Temporal Action Detection on the Thumos2014 benchmark, and sound prediction on the Greatest Hits benchmark. Our method not only surpasses the state of the art, in four out of five benchmarks, but importantly, it is the only single method we know of that was successfully applied to such a diverse range of tasks.

Digital geometry processing (DGP) is one of the core topics of computer graphics, and has been an active line of research for over two decades. On one hand, the field introduces theoretical studies in topics such as vector-field design, preservative maps and deformation theory. On the other hand, the tools and algorithms developed by this community are applicable in fields ranging from computer-aided design, to multimedia, to computational biology and medical imaging. Throughout my work, I have sought to bridge the gap between the theoretical aspects of DGP and their applications. In this talk, I will demonstrate how DGP concepts can be leveraged to facilitate real-life applications with the right adaptation. More specifically, I will portray how I have employed deformation theory to support problems in animation and augmented reality. I will share my thoughts and first taken steps to enlist DGP to the aid of machine learning, and perhaps most excitingly, I will discussion my own and the graphics community's contributions to computational fabrication field, as well as my vision for its future.

Bio: Dr. Amit H. Bermano is a postdoctoral Researcher at the Princeton Graphics Group, hosted by Professor Szymon Rusinkiewicz and Professor Thomas Funkhouser. Previously, he was a postdoctoral researcher at Disney Research Zurich in the computational materials group, led by Dr. Bernhard Thomaszewski. He conducted his doctoral studies at ETH Zurich under the supervision of Prof. Dr. Markus Gross, in collaboration with Disney Research Zurich. His Masters and Bachelors degrees were obtained at The Technion - Israel Institute of Technology under the supervision of Prof. Craig Gotsman. His research focuses on connecting the geometry processing field with other fields in computer graphics and vision, mainly by using geometric methods to facilitate other applications. His interests in this context include computational fabrication, animation, augmented reality, medical imaging and machine learning.

We present new heavy-hitters algorithms satisfying local-differential-privacy, with optimal or near-optimal worst-case error, running time, and memory. In our algorithms, the server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $O(n^{5/2})$ server time and $O(n^{3/2})$ user time. With a typically large number of participants in local algorithms ($n$ in the millions), this reduction in time complexity is crucial for making locally-private heavy-hitters algorithms usable in practice.

Joint work with Raef Bassily, Kobbi Nissim, and Abhradeep Thakurta.

In this talk, I will try to present some techniques to handle the problem of large deviations of the spectrum of random matrices. I will focus on the case of macroscopic statistics of the spectrum of Hermitian matrices - in particular Wigner matrices - as the empirical distribution of the eigenvalues, the largest eigenvalue or the traces of powers.

In a first part, I will be concerned with the so-called "objective method''. Coined by David Aldous, this method consists in introducing, given a sequence of random objects, like random finite graphs, a new infinite random object from which one can deduce asymptotic properties of the original sequence. In the context of random matrices, this method has been mainly advertised by Balint Virag, and proven effective in showing universality results for the so-called beta-ensembles. Regarding large deviations of random matrices, this "objective method'' amounts to embed our sequence of matrices with growing size into an appropriate space on which one is able to understand the large deviations, and carry out a contraction principle. I will review several large deviations principles obtained by this method, given by interpretations of random matrices as either dense or sparse graphs, and point out the limits of this strategy.

Cryo-EM is an imaging technology that is revolutionizing structural biology; the Nobel Prize in Chemistry 2017 was recently awarded to Jacques Dubochet, Joachim Frank and Richard Henderson "for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution".

Cryo-electron microscopes produce a large number of very noisy two-dimensional projection images of individual frozen molecules. Unlike related methods, such as computed tomography (CT), the viewing direction of each image is unknown. The unknown directions, together with extreme levels of noise and additional technical factors, make the determination of the structure of molecules challenging.

While other methods for structure determination, such as x-ray crystallography and nuclear magnetic resonance (NMR), measure ensembles of molecules together, cryo-EM produces measurements of individual molecules. Therefore, cryo-EM could potentially be used to study mixtures of different conformations of molecules. Indeed, current algorithms have been very successful at analyzing homogeneous samples, and can recover some distinct conformations mixed in solutions, but, the determination of multiple conformations, and in particular, continuums of similar conformations (continuous heterogeneity), remains one of the open problems in cryo-EM.

I will discuss a one-dimensional discrete model problem, Heterogeneous Multireference Alignment, which captures many of the group properties and other mathematical properties of the cryo-EM problem. I will then discuss different components which we are introducing in order to address the problem of continuous heterogeneity in cryo-EM: 1. "hyper-molecules", the first mathematical formulation of truly continuously heterogeneous molecules, 2. The optimal representation of objects that are highly concentrated in both the spatial domain and the frequency domain using high-dimensional prolate spheroidal functions, and 3. Bayesian algorithms for inverse problems with an unsupervised-learning component for recovering such hyper-molecules in cryo-EM.

Weyl group multiple Dirichlet series are Dirichlet series in r complex variables which initially converge on a cone in C^r, possess analytic continuation to a meromorphic function on the whole complex space, and satisfy functional equations whose action on C^r is isomorphic to the Weyl group of a reduced root system. I will review different constructions of such series and discuss the relations between them.

The talk will discuss informational lower bounds of approximate Nash equilibrium in two complexity models: Query Complexity and Communication Complexity.

For both models we prove exponential (in the number of players) lower bound on the complexity of computing ε -Nash equilibrium, for constant value of approximation ε .

Consider a real Gaussian stationary process, either on Z or on R.

What is the probability that it remains positive on [0,N] for large N?

The relation between this probability, known as the persistence probability, and the covariance kernel of the process has been investigated since the 1950s with motivations stemming from probability, engineering and mathematical physics. Nonetheless, until recently, good estimates were known only for particular cases, or when the covariance kernel is either non-negative or summable.

In the first hour of the talk we will discuss new spectral methods which greatly simplify the analysis of persistence. We will then describe its qualitative behavior in a very general setting.

In the second hour, we will describe (very) recent progress. In particular we will show the proof of the "spectral gap conjecture'', which states: if the spectral measure vanishes on an interval containing 0 then the persistence is less then e^{-cN^2}, and this bound is tight if the measure is non-singular and compactly supported.

Time permitting, we will also discuss "tiny persistence'' phenomena (of the order of e^{-e^{cN}}).

Based on joint works with Ohad Feldheim, Benjamin Jaye, Fedor Nazarov and Shahaf Nitzan.

Isolating the voice of a specific person while filtering out other voices or background noises is challenging when video is shot in noisy environments, using a single microphone. For example, video conferences from home or office are disturbed by other voices, TV reporting from city streets is mixed with traffic noise, etc. We propose audio-visual methods to isolate the voice of a single speaker and eliminate unrelated sounds. Face motions captured in the video are used to estimate the speaker's voice, which is applied as a filter on the input audio. This approach avoids using mixtures of sounds in the learning process, as the number of such possible mixtures is huge, and would inevitably bias the trained model.

In the first part of this talk, I will describe a few techniques to predict speech signals by a silent video of a speaking person. In the second part of the talk, I will propose a method to separate overlapping speech of several people speaking simultaneously (known as the cocktail-party problem), based on the speech predictions generated by video-to-speech system.

Assume that $(X,f)$ is a dynamical system and $\phi$ is a real non negative potential such that the corresponding $f$-invariant measure $\mu_\phi$ measure is infinite. Under assumptions of good inducing schemes, we give conditions under which the pressure of $f$ for a perturbed potential $\phi+s\psi$ relates to the pressure of the induced system term.

This extends some of Sarig's results to the setting of infinite "equilibrium states".

In addition, limit properties of the family of measures $\mu_{\phi+s\psi}$ as $s\to 0$ are studied and statistical properties (e.g. correlation coefficients) under the limit measure are derived. I will discuss several examples.

This is based on joint work with H. Bruin and M. Todd.

As a concrete variant of motivic integration, I will discuss uniform p-adic integration and constructive aspects of results involved. Uniformity is in the p-adic fields, and, for large primes p, in the fields F_p((t)) and all their finite field extensions. Using real-valued Haar measures on such fields, one can study integrals, Fourier transforms, etc. We follow a line of research that Jan Denef started in the eighties, with in particular the use of model theory to study various questions related to p-adic integration. A form of uniform p-adic quantifier elimination is used. Using the notion of definable functions, one builds constructively a class of complex-valued functions which one can integrate (w.r.t. any of the variables) without leaving the class. One can also take Fourier transforms in the class. Recent applications in the Langlands program are based on Transfer Principles for uniform p-adic integrals, which allow one to get results for F_p((t)) from results for Q_p, once p is large, and vice versa. These Transfer Principles are obtained via the study of general kinds of loci, some of them being zero loci. More recently, these loci are playing a role in the uniform study of p-adic wave front sets for (uniformly definable) p-adic distributions, a tool often used in real analysis.

This talk contains various joint works with Gordon, Hales, Halupczok, Loeser, and Raibaut, and may mention some work in progress with Aizenbud about WF-holonomicity of these distributions, in relation to a question raized by Aizenbud and Drinfeld.

Covariance matrix estimation is essential in many areas of modern Statistics and Machine Learning including Graphical Models, Classification/Discriminant Analysis, Principal Component Analysis, and many others. Classical statistics suggests using Sample Covariance Matrix (SCM) which is a Maximum Likelihood Estimator (MLE) in the Gaussian populations. Real world data, however, usually exhibits heavy-tailed behavior and/or contains outliers, making the SCM non-efficient or even useless. This problem and many similar ones gave rise to the Robust Statistics field in early 60s, where the main goal was to develop estimators stable under reasonable deviations from the basic Gaussian assumptions. One of the most prominent robust covariance matrix estimators was introduced and thoroughly studied by D. Tyler in the mid-80s. This important representative of the family of M-estimators can be defined as an MLE of a certain population. The problem of robust covariance estimation becomes even more involved in the high-dimensional scenario, where the number of samples n is of the order of the dimension p, or even less. In such cases, prior knowledge, often referred to as structure, is utilized to decrease the number of degrees of freedom and make the estimation possible. Unlike the Gaussian setting, in Tyler's case even imposition of linear structure becomes challenging due to the non-convexity of the negative log-likelihood. Recently, Tyler's target function was shown to become convex under a certain change of metric (geodesic convexity), which stimulated further investigation of the estimator.

In this work, we focus on the so-called group symmetry structure, which essentially means that the true covariance matrix commutes with a group of unitary matrices. In engineering applications such structures appear due to the natural symmetries of the physical processes; examples include circulant, perHermitian, proper quaternion matrices, etc. Group symmetric constraints are linear, and thus convex in the regular Euclidean metric. We show that they are also convex in the geodesic metric. These properties allow us to develop symmetric versions of the SCM and Tyler's estimator and build a general framework for their performance analysis. The classical results claim that at least n = p and n = p+1 samples in general position are necessary to ensure the existence and uniqueness of the SCM and Tyler's estimator, respectively. We significantly improve the sample complexity requirements for both estimators under the symmetry structure and show that in some cases even 1 or 2 samples are enough to guarantee the existence and uniqueness regardless of the ambient dimension.

In this paper we study the adaptive complexity of submodular optimization. Informally, the adaptive complexity of a problem is the minimal number of sequential rounds required to achieve a constant factor approximation when polynomially-many queries can be executed in parallel at each round. Adaptivity is a fundamental concept that is heavily studied in computer science, largely due to the need for parallelizing computation. Somewhat surprisingly, very little is known about adaptivity in submodular optimization. For the canonical problem of maximizing a monotone submodular function under a cardinality constraint, to the best of our knowledge, all that is known to date is that the adaptive complexity is between 1 and Ω(n).Our main result in this paper is a tight characterization showing that the adaptive complexity of maximizing a monotone submodular function under a cardinality constraint is, up to lower order terms, θ(log n):We describe an algorithm which requires O(log n) sequential rounds and achieves an approximation that is arbitrarily close to 1/3; We show that no algorithm can achieve an approximation better than O(1 / log n) with fewer than O(log n / log log n) rounds. Thus, when allowing for parallelization, our algorithm achieves a constant factor approximation exponentially faster than any known existing algorithm for submodular maximization. Importantly, the approximation algorithm is achieved via adaptive sampling and complements a recent line of work on optimization of functions learned from data. In many cases, we do not know the functions we optimize and learn them from labeled samples. Recent results show that no algorithm can obtain a constant factor approximation guarantee using polynomially-many labeled samples as in the PAC and PMAC models, drawn from any distribution. Since learning with non-adaptive samples over any distribution results in a sharp impossibility, we consider learning with adaptive samples where the learner obtains poly(n) samples drawn from a distribution of her choice in every round. Our result implies that in the realizable case, where there is a true underlying function generating the data, θ(log n) batches, up to lower order terms, of adaptive samples are necessary and sufficient to approximately "learn to optimize" a monotone submodular function under a cardinality constraint. This is joint work with Yaron Singer.

Given an unknown D-dimensional quantum state rho, as well as M two-outcome measurements E_1,...,E_M, how many copies of rho do we need, if we want to learn the approximate probability that E_i accepts rho for *every* i? In this talk, I'll prove the surprising result --I didn't believe it myself at first -- that one can achieve this using a number of copies that's polylogarithmic in both M and D. So, e.g., one can learn whether *every* size-n^3 quantum circuit accepts or rejects an n-qubit state, given only poly(n) copies of the state. To prove this will require first surveying previous results on measuring quantum states and succinctly describing them, including my 2004 postselected learning theorem, and my 2006 "Quantum OR Bound" (with an erroneous proof fixed in 2016 by Harrow, Lin, and Montanaro).

As time permits, I'll also discuss new joint work with Xinyi Chen, Elad Hazan, and Ashwin Nayak, which takes my 2006 result on PAC-learnability of quantum states, and extends to the setting of online learning. Here we show that, given a sequence of T two-outcome measurements on an n-qubit state, even if the sequence is chosen adversarially, one can still learn to predict the outcomes of those measurements with total regret O(n) (in the "realizable" case) or O(sqrt(Tn)) (in the "non-realizable" case).

No quantum computing background will be assumed.

Calabi conjectured that the complex Monge-Ampere equation on compact Kaehler manifolds has a unique solution.

This was solved by Yau in 1978. In this talk, we present a non-archimedean version on projective Berkovich spaces.

In joint work with Burgos, Jell, Kunnemann and Martin, we improve a result of Boucksom, Favre and Jonsson in the equicharacteristic 0 case. We give also a result in positive equicharacteristic using test ideals.

We study the non-vanishing gradient-like vector fields $v$ on smooth compact manifolds $X$ with boundary. We call such fields traversing. With the help of a boundary generic field $v$, we divide the boundary $\d X$ of $X$ into two complementary compact manifolds, $\d^+X(v)$ and $\d^-X(v)$. Then we introduce the causality map $C_v: \d^+X(v) \to \d^-X(v)$, a distant relative of the Poincare return map. Let $\mathcal F(v)$ denote the oriented 1-dimensional foliation on $X$, produced by a traversing $v$-flow.

Our main result, the Holography Theorem, claims that, for boundary generic traversing vector fields $v$, the knowledge of the causality map $C_v$ is allows for a reconstruction of the pair $(X, \mathcal F(v))$, up to a homeomorphism $\Phi: X \to X$ which is the identity on the boundary $\d X$. In other words, for a massive class of ODE's, we show that the topology of their solutions, satisfying a given boundary value problem, is rigid. We call these results ``holographic" since the $(n+1)$-dimensional $X$ and the un-parameterized dynamics of the flow on it are captured by a single correspondence $C_v$ between two $n$-dimensional screens, $\d^+X(v)$ and $\d^-X(v)$.

This holography of traversing flows has numerous applications to the dynamics of general flows. Time permitting, we will discuss some applications of the Holography Theorem to the geodesic flows and the inverse scattering problems on Riemannian manifolds with boundary.

Let Rep(GL(m|n)) denote the category of finite-dimensional algebraic representations of the supergroup Gl(m|n). Nowadays the abelian structure (Ext^1 between irreducibles, block description,...) is well understood. Kazhdan-Lusztig theory gives an algorithmic solution for the character problem, and in special cases even explicit character formulas. However we understand the monoidal structure hardly at all (e.g. the decomposition of tensor products into the indecomposable constituents). I will talk about the problem of decomposing tensor products "up to superdimension 0", i.e. about the structure of Rep(GL(m|n))/N where N is the ideal of indecomposable representations of superdimension 0.

When dealing with a highly variable problem such as action recognition, focusing on a small area, such as the hand's region, makes the problem more manageable, and enables us to invest relatively high amount of resources needed for interpretation in a small but highly informative area of the image. In order to detect this region of interest in the image and properly analyze it, I have built a process that includes several steps, starting with a state of the art hand detector, incorporating both detection of the hand by appearance and by estimation of human body pose. The hand detector is built upon a fully convolutional neural network, detecting hands efficiently and accurately. The human body pose estimation starts with a state of the art head detector and continues with a novel approach where each location in the image votes for the position of each body keypoint, utilizing information from the whole image. Using dense, multi-target votes enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting, and accurately estimates the body pose. Once the detection of the hands is complete, an additional step of segmentation of the hand and fingers is made. In this step each hand pixel in the image is labeled using a dense fully convolutional network. Finally, an additional step is made to segment and identify the held object. Understanding the hand-object interaction is an important step toward understanding the action taking place in the image. These steps enable us to perform fine interpretation of hand-object interaction images as an essential step towards understanding the human-object interaction and recognizing human activities.

Recent progress in imaging technologies leads to a continuous growth in biomedical data, which can provide better insight into important clinical and biological questions. Advanced machine learning techniques, such as artificial neural networks are brought to bear on addressing fundamental medical image computing challenges such as segmentation, classification and reconstruction, required for meaningful analysis of the data. Nevertheless, the main bottleneck, which is the lack of annotated examples or 'ground truth' to be used for training, still remains.

In my talk, I will give a brief overview on some biomedical image analysis problems we aim to address, and suggest how prior information about the problem at hand can be utilized to compensate for insufficient or even the absence of ground-truth data. I will then present a framework based on deep neural networks for the denoising of Dynamic contrast-enhanced MRI (DCE-MRI) sequences of the brain. DCE-MRI is an imaging protocol where MRI scans are acquired repetitively throughout the injection of a contrast agent, that is mainly used for quantitative assessment of blood-brain barrier (BBB) permeability. BBB dysfunctionality is associated with numerous brain pathologies including stroke, tumor, traumatic brain injury, epilepsy. Existing techniques for DCE-MRI analysis are error-prone as the dynamic scans are subject to non-white, spatially-dependent and anisotropic noise. To address DCE-MRI denoising challenges we use an ensemble of expert DNNs constructed as deep autoencoders, where each is trained on a specific subset of the input space to accommodate different noise characteristics and dynamic patterns. Since clean DCE-MRI sequences (ground truth) for training are not available, we present a sampling scheme, for generating realistic training sets with nonlinear dynamics that faithfully model clean DCE-MRI data and accounts for spatial similarities. The proposed approach has been successfully applied to full and even temporally down-sampled DCE-MRI sequences, from two different databases, of stroke and brain tumor patients, and is shown to favorably compare to state-of-the-art denoising methods.

We consider the problem of hidden common manifold extraction from multiple data sets, which have observation-specific distortions and artifacts. A new manifold learning method is presented based on alternating products of diffusion operators and local kernels. We provide theoretical analysis showing that our method is able to build a variant of the Laplacian of the hidden common manifold, while suppressing the observation-specific artifacts. The generality of this method is demonstrated in data analysis applications, where different types of devices are used to measure the same activity. In particular, we present applications to problems in biomedicine, neuroscience, and audio analysis.

This is joint work with Roy Lederman and Hau-tieng Wu.

We prove the first super-logarithmic lower bounds on the cell probe complexity of dynamic *boolean* (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new technique and use it to prove a ~ log^{1.5}(n) lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting *over F_2* ([Patrascu07]). Proving a super-logarithmic lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Patrascu's obituary [Tho13]. This result also implies the first super-logarithmic lower bound for the classical 2D range counting problem,one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D "rectangle stabbing", and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient *one-way* communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebychev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].

Joint work with Kasper Green-Larsen and Huacheng Yu.

We study the classic bipartite matching problem in the online setting, first introduced in the seminal work of Karp, Vazirani and Vazirani. Specifically, we consider the problem for the well-studied class of regular graphs. Matching in this class of graphs was studied extensively in the offline setting. In the online setting, an optimal deterministic algorithm, as well as efficient algorithms under stochastic input assumptions were known. In this work, we present a novel randomized algorithm with competitive ratio tending to one on this family of graphs, under adversarial arrival order. Our main contribution is a novel algorithm which achieves competitive ratio 1-O(\sqrt{\log d}/\sqrt{d}) in expectation on d-regular graphs. In contrast, we show that all previously-known online algorithms, such as the generally worst-case optimal ranking algorithm of Karp et al., are restricted to a competitive ratio strictly bounded away from one, even as d grows. Moreover, we show the convergence rate of our algorithm's competitive ratio to one is nearly tight, as no algorithm achieves competitive ratio better than 1-O(1/\sqrt{d}). Finally, we show that our algorithm yields a similar competitive ratio with high probability, as well as guaranteeing each offline vertex a probability of being matched tending to one.

The recent success of convolutional neural networks (CNNs) for image processing tasks is inspiring research efforts attempting to achieve similar success for geometric tasks. One of the main challenges in applying CNNs to surfaces is defining a natural convolution operator on surfaces. In this paper we present a method for applying deep learning to sphere-type shapes using a global seamless parameterization to a planar flat-torus, for which the convolution operator is well defined. As a result, the standard deep learning framework can be readily applied for learning semantic, high-level properties of the shape. An indication of our success in bridging the gap between images and surfaces is the fact that our algorithm succeeds in learning semantic information from an input of raw low-dimensional feature vectors.

We demonstrate the usefulness of our approach by presenting two applications: human body segmentation, and automatic landmark detection on anatomical surfaces. We show that our algorithm compares favorably with competing geometric deep-learning algorithms for segmentation tasks, and is able to produce meaningful correspondences on anatomical surfaces where hand-crafted features are bound to fail.

Joint work with: Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir G. Kim and Yaron Lipman.

The past five years have seen a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for feature learning and classification. However, the mathematical reasons for this success remain elusive. In this talk we will briefly survey some existing theory of deep learning. In particular, we will focus on data structure based theory and discuss two recent developments.

The first work studies the generalization error of deep neural network. We will show how the generalization error of deep networks can be bounded via their classification margin. We will also discuss the implications of our results for the regularization of the networks. For example, the popular weight decay regularization guarantees the margin preservation, but it leads to a loose bound to the classification margin. We show that a better regularization strategy can be obtained by directly controlling the properties of the network's Jacobian matrix.

The second work focuses on solving minimization problems with neural networks. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set leads to faster convergence of certain iterative algorithms with an error related to the accuracy of the set approximation. Our theory ties to recent advances in sparse recovery, compressed sensing and deep learning. In particular, it provides an explanation for the successful approximation of the ISTA (iterative shrinkage and thresholding algorithm) solution by neural networks with layers representing iterations.

Joint work with Guillermo Sapiro, Miguel Rodrigues, Jure Sokolic, Alex Bronstein and Yonina Eldar.

Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people -- not software agents -- make joint decisions. I will illustrate this theme through two recent endeavors: Spliddit.org, a website that offers provably fair solutions to everyday problems; and Robovote.org, which provides optimization-driven voting methods.

Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.

The theory of hypergeometric functions with matrix argument was developed in the 1950s by S. Bochener for Hermitian matrices, and by C. Herz for symmetric matrices. This theory admits a common generalization to the setting of symmetric cones, which is discussed in the book by Faraut-Koranyi. It also has applications to the study of non-central distributions in statistics and to the theory of random matrices.

In the 1980s, I.G. Macdonald introduced a one parameter family of multivariate hypergeometric functions, which, for special values of the parameter, are the *radial* parts of the matrix hypergeometric functions. He also formulated a number of natural conjectures about these functions, which in the matrix case can be proved by appropriate integral formulas. However this technique is unavailable in the general setting and as a result these conjectures have remained open.

In recent work with G. Olafsson we have solved most of these conjectures, using ideas from the theory of Cherednik algebras and Jack polynomials. Among other results we obtain sharp estimates for the exponential kernel that allow us to establish a rigorous theory of the Fourier and Laplace transforms, and we prove an explicit formula for the Laplace transform of a Jack polynomial conjectured by Macdonald. This opens the door for several future developments in the associated harmonic analysis, some of which we also treat. This includes (1) the Paley-Wiener theorem, (2) Laplace transform identities for hypergeometric functions, and (3) the "so-called" Ramanujan master theorem.

**Note the unusual room [De Picciotto ****Building, Room 25]**

We show that there exist binary locally testable codes (for all rates) and locally correctable codes (for low rates) with rate and distance approaching the Gilbert-Varshamov bound (which is the best rate-distance tradeoff known for general binary error-correcting codes). Our constructions use a number of ingredients: Thommesen's random concatenation technique, the Guruswami-Sudan-Indyk strategy for list-decoding concatenated codes, the Alon-Edmonds-Luby distance amplification method, and the local list-decodability and local testability of Reed-Muller codes. Interestingly, this seems to be the first time that local testability is used in the construction of locally correctable codes.

Joint work with Sivakanth Gopi, Rafael Oliveira, Noga Ron-Zewi and Shubhangi Saraf

This is a joint work with Bezrukavnikov and Kazhdan. The goal of my talk is to give an explicit formula for the Bernstein projector to representations of depth $\leq r$. As a consequence, we show that the depth zero Bernstein projector is supported on topologically unipotent elements and it is equal to the restriction of the character of the Steinberg representation. As another application, we deduce that the depth $r$ Bernstein projector is stable. Moreover, for integral depths our proof is purely local.

Image denoising is the most fundamental problem in image enhancement, and it is largely solved: It has reached impressive heights in performance and quality -- almost as good as it can ever get. But interestingly, it turns out that we can solve many other problems using the image denoising "engine". I will describe the Regularization by Denoising (RED) framework: using the denoising engine in defining the regularization of any inverse problem. The idea is to define an explicit image-adaptive regularization functional directly using a high performance denoiser. Surprisingly, the resulting regularizer is guaranteed to be convex, and the overall objective functional is explicit, clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED is capable of incorporating any image denoising algorithm as a regularizer, treat general inverse problems very effectively, and is guaranteed to converge to the globally optimal result.

* Joint work with Peyman Milanfar (Google Research) and Yaniv Romano (EE-Technion).

We propose a procedure (the first of its kind) for computing a fully data-dependent interval that traps the mixing time t_mix of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge.

The interval is constructed around the relaxation time t_relax, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a sqrt{n} rate, where n is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least \Omega(t_relax) times on the average. Future directions of research are identified. Time permitting, we will mention some recent further developments by D. Levin and Y. Peres.

Joint work with Daniel Hsu and Csaba Szepesvari.

Graphs that are prevalent in current applications (the Internet, Facebook etc.) are not only very large and highly dynamic, but also distributed between many servers, none of which sees the graph in its entirety. The distributed monitoring problem deals with the question of imposing conditions on the local graphs, such that as long as they hold, it is guaranteed that some desired property holds for the global graph.

While defining local conditions for linear properties (e.g. average degree) is relatively easy, they are more difficult to derive for non-linear functions over the graph. We propose a solution and a general definition of solution optimality, and demonstrate how to apply it to two important graph properties -- spectral gap and number of triangles. We also define an absolute lower bound on the communication overhead for distributed monitoring, and compare our algorithm to it, with good results. Performance improves as the graph becomes larger and denser -- that is, when distributing it is more important.

I will talk about my recent adventures with ants. Together with biologists we study P. longicornis ants as they collaboratively transport a large food item to their nest. This collective navigation process is guided by pheromones which are laid by individual ants. Using a new methodology to detect scent marks, we identify a new kind of ant trail characterized by very short and dynamic pheromone markings and highly stochastic navigation response to them. We argue that such a trail can be highly beneficial in conditions in which knowledge of individual ants regarding the underlying topological structure is unreliable. This gives rise to a new theoretical search model on graphs under unreliable guiding instructions, which is of independent computational interest. To illustrate the model, imagine driving a car in an unknown country that is in the aftermath of a major hurricane which has randomly flipped a certain small fraction of the road-signs. Under such conditions of unreliability, how can you still reach your destination fast? I will discuss the limits of unreliability that allow for efficient navigation. In trees, for example, there is a phase transition phenomenon that occurs roughly around the inverse of the square root of the maximal degree. That is, if noise is above this threshold then any algorithm cannot avoid finding the target in exponential time (in the original distance), while below the threshold we identify an optimal, almost linear, walking algorithm. Finally, I will discuss algorithms that under such a noisy model aim to minimize the number of queries to find a target (rather than the number of moves).

This talk is based on joint works with biologists from the Weizmann Institute: Ofer Feinerman, Udi Fonio, and others, and with CS researchers: Lucas Bockowski, Adrian Kosowski, and Yoav Rodeh.

By analytical and numerical studies of Deep Neural Networks (using standard TensorFlow) in the "Information Plane" - the Mutual Information the network layers preserve on the input and the output variables - we obtain the following new insights.

- The training epochs, for each layer, are divided into two phases: (1) fitting the training data - increasing the mutual information on the labels; (2) compressing the representation - reducing the mutual information on the inputs. The layers are learnt hierarchically, from the bottom to the top layer, with some overlaps.
- Most (~80%) of the training time - optimization with SGD - is spent on compressing the representation (the second phase) - NOT on fitting the training data labels, even when the training has no regularization or terms that directly aim at such compression.
- The convergence point, FOR EVERY HIDDEN LAYER, lies on or very close to the Information Bottleneck IB) theoretical bound. Thus, the mappings from the input to the hidden layer and from the hidden layer to the output obey the IB self-consistent equations for some value of the compression-prediction tradeoff.
- The main benefit of adding more hidden layers is in the optimization/training time, as the compression phase for each layer amounts to relaxation to a Maximum conditional Entropy state, subject to the proper constraints on the error/information on the labels. As such relaxation takes super-linear time in the compressed entropy, adding more hidden layers dramatically reduces the training time. There is also benefit in sample complexity to adding hidden layers, but this is a smaller effect.

I will explain these new observations and the benefits of exploring Deep Learning in the "Information Plane", and discuss some of the exciting theoretical and practical consequences of our analysis.

Joint work with Ravid Ziv and Noga Zaslavsky.

We present a novel approach to template matching that is efficient, can handle partial occlusions, and is equipped with provable performance guarantees. A key component of the method is a reduction that transforms the problem of searching a nearest neighbor among N high-dimensional vectors, to searching neighbors among two sets of order sqrt(N) vectors, which can be done efficiently using range search techniques. This allows for a quadratic improvement in search complexity, that makes the method scalable when large search spaces are involved.

For handling partial occlusions, we develop a hashing scheme based on consensus set maximization within the range search component. The resulting scheme can be seen as a randomized hypothesize-and-test algorithm, that comes with guarantees regarding the number of iterations required for obtaining an optimal solution with high probability.

The predicted matching rates are validated empirically and the proposed algorithm shows a significant improvement over the state-of-the-art in both speed and robustness to occlusions.

Joint work with Stefano Soatto.

While there has been a lot of progress in designing efficient custom protocols for computing Private Set Intersection (PSI), there has been less research on using generic MPC protocols for this task. However, there are many variants of the set intersection functionality which seem hard to compute with existing custom protocols and are easy to compute with generic MPC based solutions (for example comparing the cardinality of the intersection with a threshold or measuring ad conversion rates). Generic protocols work over circuits which compute the intersection. For sets of size n the best known circuit constructions compute O(n log n) comparisons. In this work we propose new circuit-based protocols for computing variants of the intersection, with circuits computing only O(n) comparisons. Our constructions are based on a new variant of Cuckoo hashing in two dimensions. We employ several optimizations and determine experimentally the required sizes of tables and circuits, and measure the runtime, showing that our protocol is more efficient in concrete terms than existing constructions. The proof technique is new and can be generalized to analyzing simple Cuckoo hashing as well as new variants.

We study the ecological use of analogies in AI. Specifically, we address the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given representation function f, which accepts inputs in either domains, would remain unchanged. Other than f, the training data is unsupervised and consist of a set of samples from each domain, without any mapping between them. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f preserving component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

Joint work with Yaniv Taigman and Adam Polyak

Joint work with Srikanth Srinivasan.

A determinantal point process governed by a locally trace class Hermitian contraction kernel on a measure space $E$ remains determinantal when conditioned on its configuration on an arbitrary measurable subset $B \subset E$. Moreover, the conditional kernel can be chosen canonically in a way that is "local" in a non-commutative sense, i.e. invariant under "restriction" to closed subspaces $L^2(B) \subset P \subset L^2(E)$.

Using the properties of the canonical conditional kernel we establish a conjecture of Lyons and Peres: if $K$ is a projection then almost surely all functions in its image can be recovered by sampling at the points of the process.

I will discuss the asymptotic behaviour (both on and off the diagonal) of the spectral function of a Schroedinger operator with smooth bounded potential when energy becomes large. I formulate the conjecture that the local density of states (i.e. the spectral function on the diagonal) admits the complete asymptotic expansion and discuss the known results, mostly for almost-periodic potentials.

Let X be a regular scheme, projective and flat over Spec Z. We give a conjectural formula in terms of motivic cohomology, singular cohomology and de Rham cohomology for the special value of the zeta-function of X at any rational integer. We will explain how this reduces to the standard formula for the residue of the Dedekind zeta-function at s = 1.

A classical theorem by Anosov states that the slow motion of a slow-fast system where the fast subsystem is ergodic with respect to a smooth invariant measure can be approximated, in a well-defined sense, by the slow subsystem averaged over the fast variables. We address the question of what happens if the fast system is not ergodic. We discuss a theory which is developing in joint works with V. Gelfreich, T. Pereira, V. Rom-Kedar and K. Shah, and suggest that in the non-ergodic case the behavior of the slow variables is approximated by a random process, and not a single, deterministic averaged system. We also discuss the question of the relevance of ergodicity to the foundations of statistical mechanics.

Image processing algorithms often involve a data fidelity penalty, which encourages the solution to comply with the input data. Existing fidelity measures (including perceptual ones) are very sensitive to slight misalignments in the locations and shapes of objects. This is in sharp contrast to the human visual system, which is typically indifferent to such variations. In this work, we propose a new error measure, which is insensitive to small smooth deformations and is very simple to incorporate into existing algorithms. We demonstrate our approach in lossy image compression. As we show, optimal encoding under our criterion boils down to determining how to best deform the input image so as to make it "more compressible". Surprisingly, it turns out that very minor deformations (almost imperceptible in some cases) suffice to make a huge visual difference in methods like JPEG and JPEG2000. Thus, by slightly sacrificing geometric integrity, we gain a significant improvement in preservation of visual information.

We also show how our approach can be used to visualize image priors. This is done by determining how images should be deformed so as to best conform to any given image model. By doing so, we highlight the elementary geometric structures to which the prior resonates. Using this method, we reveal interesting behaviors of popular priors, which were not noticed in the past.

Finally, we illustrate how deforming images to possess desired properties can be used for image "idealization" and for detecting deviations from perfect regularity.

Joint work with Tamar Rott Shaham, Tali Dekel, Michal Irani, and Bill Freeman.

Given a quadratic form Q in d variables over the integers, e.g. Q(x,y,z) = xy - z^2, and a set of positive density E in Z^d, we investigate what kind of structure can be found in the set Q(E-E).

We will see that if d >= 3, and Q is indefinite, then the measure rigidity, due to Bourgain-Furman-Lindenstrauss-Mozes or Benoist-Quint, of the action of the group of the symmetries of Q implies that there exists k >=1 such that k^2*Q(Z^d) is a subset of Q(E-E).

We will give an alternative proof of the theorem for the case Q(x,y,z) = xy - z^2 that uses more classical equidistribution results of Vinogradov, and Weyl, as well as a more recent result by Frantzikinakis-Kra. The latter proof extends the theorem to other polynomials having a much smaller group of symmetries. Based on joint works with M. Bjorklund (Chalmers), and K. Bulinski (Sydney).

The algorithm referred to in the title builds on Luks's powerful group-theoretic divide-and-conquer method (1980) and addresses the bottleneck situation where Luks's method fails to "divide".

Luks will continue to "conquer" if an alternative method "divides"; we develop such a partitioning technique.

In the talk we shall outline the algorithm and explain in some detail its group theoretic core, the "Unaffected Stabilizers Lemma" and the "Local Certificates" routine. The Lemma is used to construct, somewhat implausibly, global automorphisms out of local information -- a key step toward the construction of combinatorial structures to which the partitioning method from the previous day's lecture will be applied, providing the required "divide" step.

Symmetry is defined in terms of structure-preserving transformations (automorphisms); regularity in terms of numerical invariants. Symmetry always implies regularity but there are many highly regular combinatorial objects (such as "strongly regular graphs") with no symmetry. The opposite of irregularity is regularity, not symmetry. Yet we show that in a well-defined sense, *the opposite of hidden irregularity is hidden symmetry*, and in fact hidden symmetry of a particularly robust kind.

The symmetry of a circle is easily destroyed: just "individualize" two non-opposite points -- color one of them red, the other blue -- and all the symmetry is gone. In fact, the resulting structure is completely irregular: every point is uniquely identified by a pair of numerical invariants, namely, its pair of distances to the two individualized points. We shall say that the circle has a high degree of hidden irregularity.

In contrast, *Johnson graphs* are objects with robust symmetry: individualizing a small number of vertices of a Johnson graph hardly makes a dent in its symmetry.

Recent work on the algorithmic problem of Graph Isomorphism has revealed that Johnson graphs are unique in this regard: Every finite relational structure of small arity either has a measurable (say 10%) hidden irregularity (revealed by individualizing a polylogarithmic number of elements) or has a large degree of hidden symmetry, manifested in a canonically embedded Johnson graph on more than 90% of the underlying set.

This dichotomy is the key Divide-and-Conquer tool in recent progress on the worst-case complexity of the Graph Isomorphism problem.

This subject is purely combinatorial and does not require advanced mathematical apparatus. The group theoretic aspects of the new Graph Isomorphism test will be discussed in a follow-up seminar on January 30.

Within the wide field of sparse approximation, convolutional sparse coding (CSC) has gained increasing attention in recent years. This model assumes a structured-dictionary built as a union of banded Circulant matrices. Most attention has been devoted to the practical side of CSC, proposing efficient algorithms for the pursuit problem, and identifying applications that benefit from this model. Interestingly, a systematic theoretical understanding of CSC seems to have been left aside, with the assumption that the existing classical results are sufficient.

In this talk we start by presenting a novel analysis of the CSC model and its associated pursuit. Our study is based on the observation that while being global, this model can be characterized and analyzed locally. We show that uniqueness of the representation, its stability with respect to noise, and successful greedy or convex recovery are all guaranteed assuming that the underlying representation is locally sparse. These new results are much stronger and informative, compared to those obtained by deploying the classical sparse theory.

Armed with these new insights, we proceed by proposing a multi-layer extension of this model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers. This, in turn, is shown to be tightly connected to Convolutional Neural Networks (CNN), so much so that the forward-pass of the CNN is in fact the Thresholding pursuit serving the ML-CSC model. This connection brings a fresh view to CNN, as we are able to attribute to this architecture theoretical claims such as uniqueness of the representations throughout the network, and their stable estimation, all guaranteed under simple local sparsity conditions. Lastly, identifying the weaknesses in the above scheme, we propose an alternative to the forward-pass algorithm, which is both tightly connected to deconvolutional and recurrent neural networks, and has better theoretical guarantees.

Many problems about finite groups (e.g., convergence of random walks, properties of word maps, spectrum of Cayley graphs, etc.) can be approached in terms of sums of group characters. More precisely, what intervenes in such sums are the character ratios:

X_r(g) / dim(r), g in G,

where r is an irreducible representation of G, and X_r is its character. This leads to the quest for good estimates on the character ratios.

In this talk I will introduce a precise notion of "**size**" for representations of finite classical groups and show that it tends to put together those with character ratios of the same order of magnitude.

As an application I will show how one might generalize to classical groups the following result of Diaconis-Shahshahani (for k=2) and Berestycki -Schramm -Zeitouni (for general k): The mixing time for the random walk on the group G=S_n using the cycles of length k is (1/k) n log(n).

The talk should be accessible for beginning graduate students, and is part from our joint project with **Roger Howe (Yale and Texas A&M).**

Detecting genetic adaptation in recent human history is a major challenge of population genetics. The fundamental problem is to infer historical changes in the frequency of genetic variants (alleles), from data of contemporary genomes. With this we can identify unusual changes that are unlikely to have occurred in the absence of selective pressures. However, a generally applicable method to infer recent allele frequency changes is lacking. Instead, present methods can only detect frequency changes under very restrictive assumptions on the model of selection. Moreover, their time resolution is generally limited to prehistoric scales, on the order of the past 25-75 thousand years. To address these gaps we developed a novel statistical method, Singleton Density Score (SDS), that infers the recent changes in allele frequencies from local variation in contemporary genome sequences with specificity to historical timescales. Applied to data of ~3000 genomes from the UK10K project, SDS reveals that human genetic adaptation continued well into historical times. Over the past ~2000-3000 years, ancestors of modern Britons genetically evolved over a range of phenotypes related to diet, immunity, and physical appearance. Notably, we found that polygenic adaptation, whereby selection acting on many small-effect variants across the genome that together determine a single trait, has played a pervasive, previously undetected role in shaping present genotypic and phenotypic variation.

Reference:

Field et al, Science 2016, Detection of human adaptation during the past 2000 years. https://www.ncbi.nlm.nih.gov/pubmed/27738015

I will describe two branches of my work related to algorithms for distributed networks. The main focus will be devoted for Fault-Tolerant (FT) Network Structures.

The undisrupted operation of structures and services is a crucial requirement in modern day communication networks. As the vertices and edges of the network may occasionally fail or malfunction, it is desirable to make those structures robust against failures.

FT Network Structures are low cost highly resilient structures, constructed on top of a given network, that satisfy certain desirable performance requirements concerning, e.g., connectivity, distance or capacity. We will overview some results on fault tolerant graph structures with a special focus on FT Breadth-First-Search.

The second part of the talk will discuss distributed models and algorithms for large-scale networks. Towards the end, we will see some connections between distributed computing and other areas such as EE and Biology.

Let M be a smooth, compact, connected two-dimensional, Riemannian manifold without boundary, and let C_epsilon be the amount of time needed for the Brownian motion to come within (Riemannian) distance epsilon of all points in M. The first order asymptotics of C_epsilon as epsilon goes to 0 are known. We show that for the two dimensional sphere

\sqrt{C_epsilon}-2\sqrt{2}\( \log \epsilon^{-1}- \frac{1}{4}\log\log \epsilon^{-1}\) is tight.

Joint work with David Belius and Ofer Zeitouni.

**First Speaker: Ran Tessler (ETH)**

Time: 11:00

Title: A sharp threshold for Hamiltonian spheres in a random 2-complex.

Abstract: We define the notion of Hamiltonian sphere - a 2-complex homeomorphic to a sphere which uses all vertices. We prove an explicit sharp threshold for the appearance of Hamiltonian spheres in the Linial-Meshulam model for random 2-complexes. The proof combines combinatorial, probabilistic and geometric arguments. Based on a joint work with Zur Luria.

**Second Speaker: Assaf Naor (Princeton)**

Time: 12:00

Title: A new vertical-versus-horizontal isoperimetric inequality on the Heisenberg group, with applications to metric geometry and approximation algorithms

Abstract: In this talk we will show that for every measurable subset of the Heisenberg group of dimension at least 5, an appropriately defined notion of its "vertical perimeter" is at most a constant multiple of its horizontal (Heisenberg) perimeter. We will explain how this new isoperimetric-type inequality solves open questions in analysis (an endpoint estimate for a certain singular integral on W^{1,1}), metric geometry (sharp nonembeddability into L_1) and approximation algorithms (asymptotic evaluation of the performance of the Goemans-Linial algorithm for the Sparsest Cut problem). Joint work with Robert Young.

Let X be a set definable in some o-minimal structure. The Pila-Wilkie theorem (in its basic form) states that the number of rational points in the transcendental part of X grows sub-polynomially with the height of the points. The Wilkie conjecture stipulates that for sets definable in $R_\exp$, one can sharpen this asymptotic to polylogarithmic.

I will describe a complex-analytic approach to the proof of the Pila-Wilkie theorem for subanalytic sets. I will then discuss how this approach leads to a proof of the "restricted Wilkie conjecture", where we replace $R_\exp$ by the structure generated by the restrictions of $\exp$ and $\sin$ to the unit interval (both parts are joint work with Dmitry Novikov). If time permits I will discuss possible generalizations and applications.

Pseudo-deterministic algorithms are randomized search algorithms that on different executions on the same input, output the same solution with high probability.

We will discuss how pseudo-deterministic algorithms bridge the gap between randomized search and decision problems for problems in P and in NC. Next, we will show a pseudo-deterministic NC algorithm for bipartite matching. Finally, we will show how pseudo-determinism can be used to save on random bits used by classical randomized algorithms, and apply the method to obtain an algorithm for RNC depth first search using only O(log^2 n) random bits. This is joint work with Shafi Goldwasser.

Consider the Gaussian Entire Function (GEF) whose Taylor coefficients are independent complex-valued Gaussian variables, and the variance of the k-th coefficient is 1/k!. This random Taylor series is distinguished by the invariance of its zero set with respect to the isometries of the complex plane.

I will show that the law of the zero set, conditioned on the GEF having no zeros in a disk of radius r, and properly normalized, converges to an explicit limiting Radon measure in the plane, as r goes to infinity. A remarkable feature of this limiting measure is the existence of a large 'forbidden region' between a singular part supported on the boundary of the (scaled) hole and the equilibrium measure far from the hole. This answers a question posed by Nazarov and Sodin, and is in stark contrast to the corresponding result known to hold in the random matrix setting, where such a gap does not appear.

The talk is based on a joint work with S. Ghosh.

The Operator Scaling problem asks whether a set of complex matrices can be jointly moved to a certain canonical (isotropic) position. This problem has a remarkable number of myriad incarnations: non-commutative algebra, invariant theory, arithmetic complexity, quantum information theory, analytic inequalities and more. We will describe an efficient algorithm solving all these related problems, and explain how their analysis combines ideas from all these areas.

Through these connections, the algorithm can be shown to solve some non-convex optimization problems, some systems of quadratic equations, and some linear programs with exponentially many inequalities - all these, and concrete examples we will give, suggest that it might be a powerful algorithmic tool via reductions to these problems.

No special background will be assumed!

Joint on two joint works with Ankit Garg, Leonid Gurvits and Rafael Olivera.

This talk is longer than usual and has a two-hour slot.

If f, g are two polynomials in C[x,y] such that J(f,g)=1, but C[f,g] does not coincide with C[x,y], then the mapping given by these polynomials ( (x,y) maps to (f(x,y), g(x,y)) ) has a rather unexpected property which will be discussed in the talk.

The proliferation of data collection in the health, commercial, and economic spheres, brings with it opportunities for extracting new knowledge with concrete policy implications. Examples include individualizing medical practices based on electronic healthcare records, and understanding the implications of job training programs on employment and income.

The scientific challenge lies in the fact that standard prediction models such as supervised machine learning are often not enough for decision making from this so-called "observational data": Supervised learning does not take into account causality, nor does it account for the feedback loops that arise when predictions are turned into actions. On the other hand, existing causal-inference methods are not adapted to dealing with the rich and complex data now available, and often focus on populations, as opposed to individual-level effects.

The problem is most closely related to reinforcement learning and bandit problems in machine learning, but with the important property of having no control over experiments and no direct access to the actor's model.

In my talk I will discuss how we apply recent ideas from machine learning to individual-level causal-inference and action. I will introduce a novel generalization bound for estimating individual-level treatment effect, and further show how we use representation learning and deep temporal generative models to create new algorithms geared towards this problem. Finally, I will show experimental results using data from electronic medical records and data from a job training program.

The sensitivity conjecture is a famous open problem in the theory of boolean functions. Let f be a boolean function defined on the hypercube. The sensitivity of a node x is the number of its neighbours in the hypercube, for which f give the opposite value as that it does on x. The sensitivity conjecture speculates that if all nodes have low sensitivity, then the function f must be simple. Concretely, all its Fourier mass is supported on levels with low hamming weight.

Recently, Gopalan et al [CCC 2016] conjectured a robust analogue of the sensitivity conjecture: if most of the nodes have low sensitivity, then most of the Fourier mass is supported on levels with low hamming weight. They also prove it under the stronger assumption that all nodes have low sensitivity. In this work, we prove this conjecture, with near tight quantitative bounds.

Joint work with Avishay Tal (IAS) and Jiapeng Zhang (UCSD).

Given a dynamical system, a partition of the space induces a mapping to the space of sequences of the partition elements (a point is mapped to the partition elements containing its orbit terms). Such a duality is called Symbolic Dynamics, Markov partitions are an important tool, as the symbolic dynamics they induce enfold many of the important dynamical properties of the original system, and they allow an easier studying of them.

We show that general non uniformly hyperbolic C^{1+epsilon} diffeomorphism on compact manifolds of any dimension admit countable Markov partitions. Previously this was only known in dimension 2.

We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is \emph{never fully revealed} to the learner. We show a large gap between the adversarial and the stochastic cases. In the adversarial case, we prove that even for dense feedback graphs, the learner cannot improve upon a trivial regret bound obtained by ignoring any additional feedback besides her own loss. In contrast, in the stochastic case we give an algorithm that achieves $\widetilde \Theta(\sqrt{\alpha T})$ regret over $T$ rounds, provided that the independence numbers of the hidden feedback graphs are at most $\alpha$. completely unlearnable. We also extend our results to a more general feedback model, in which the learner does not necessarily observe her own loss, and show that, even in simple cases, concealing the feedback graphs might render the problem unlearnable.

Cauchy-Riemann maps (shortly: CR-maps) occur in complex analysis as boundary values of maps holomorphic in a domain in complex space. As a rule, CR-mappings of real-analytic hypersurfaces appear to be analytic as well. However, we recently showed in a joint work with Rasul Shafikov the existence of Stokes Phenomenon in CR-geometry: there exist real-analytic hypersurfaces, which are equivalent formally, but not holomorphically.

Despite of this, it appears that in complex dimension 2, CR-maps necessarily posses appropriate weaker regularity properties. Namely, components of such maps necessarily belong to the well known Gevrey classes. The latter statement has the following remarkable application: if two real-analytic hypersurfaces in complex two-space are equivalent formally, then they are also equivalent smoothly.

The proof of all these facts employs the recent multi-summability theory in Dynamical Systems. It as well employs the recent CR-DS technique that we developed, which connects CR-manifolds and certain Dynamical Systems. In this talk, I will outline the technique, as well as some details of the proof.

Liouville's rigidity theorem (1850) states that a map $f:\Omega\subset R^d \to R^d$ that satisfies $Df \in SO(d)$ is an affine map. Reshetnyak (1967) generalized this result and showed that if a sequence $f_n$ satisfies $Df_n \to SO(d)$ in $L^p$, then $f_n$ converges to an affine map.

In this talk I will discuss generalizations of these theorems to mappings between manifolds, present some open questions, and describe how these rigidity questions arise in the theory of elasticity of pre-stressed materials (non-Euclidean elasticity).

If time permits, I will sketch the main ideas of the proof, using Young measures and harmonic analysis techniques, adapted to Riemannian settings.

Based on a joint work with Asaf Shachar and Raz Kupferman.

We show an efficient method for converting a logic circuit of gates with fan-out 1 into an equivalent circuit that works even if some fraction of its gates are short-circuited, i.e., their output is short-circuited to one of their inputs. Our conversion can be applied to any circuit with fan-in k>= 2, yielding a resilient circuit whose size is polynomial in the size of the (non-resilient) input circuit.

The resilient circuit gives the correct output as long as less than 1/3 of the gates in any of its input-to-output paths are corrupted. Furthermore, we prove that a resilience level of 1/3 is optimal (maximal) for this type of faulty gates. This fully answers an open question by Kalai et al. (FOCS 2012).

Joint work with Mark Braverman and Michael Yitayew.

Everyone wants to program with "high-level concepts", rather than meddle with the fine details of the implementation, such as pointers, network packets, and asynchronous callbacks. This is usually achieved by introducing layers of abstraction - but every layer incurs some overhead, and when they accumulate, this overhand becomes significant and sometimes prohibitive. Optimizing the program often requires to break abstractions, which leads to suboptimal design, higher maintenance costs, and subtle hard-to-trace bugs.

I will present two recent projects that attempt to address this difficulty. STNG is an automated lifting compiler that can synthesize high-level graphics code for computing stencils over matrices, from low-level legacy code written in Fortran. Its output is expressed in Halide, a domain-specific language for image processing that can take advantage of modern GPUs. The result is therefore code that is both easier to understand and more efficient than the original code.

Bellmania is a specialized tool for accelerating dynamic-programming algorithms by generating recursive divide-and-conquer implementations of them. Recursive divide-and-conquer is an algorithmic technique that was developed to obtain better memory locality properties. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and-conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based back-end verifies each step and takes care of low-level reasoning required for parallelism.

The lesson is that synthesis techniques are becoming mature enough to play a role in the design and implementation of realistic software systems, by combining the elegance of abstractions with the performance gained by optimizing and tuning the fine details.

We discuss recent progress on hardness of 2-to-2 games, with applications to the inapproximability of the Vertex-Cover problem.

A 2-to-2 game (which is a variant of Khot's well known unique games), is defined by a graph where there is a variable in each node, and a constraint of a specific structure defined on each edge. While in unique games each edge- constraint must be a one-to-one correspondence -- i.e. for each assignment to one node there is exactly one assignent to the other node that satisfies the constraint -- in 2-to-2 games the correspondence has a "two-to-two" structure.

The goal is to distinguish between instances in which almost all of the edge- constraints can be satisfied, and instances in which almost none of them can be satisfied simultaneously.

We present a new combinatorial hypothesis regarding Grassmann graphs, and show that it implies that 2-to-2 games are NP-hard *in a certain sense*. As a consequence, the hypothesis implies that it is NP-hard to distinguish between graphs that have an independent set of fractional size (1- 1/sqrt{2}), and graphs with no independent sets of any constant fractional size. This easily implies that it is NP-hard to approximate the Vertex Cover problem within a factor \sqrt{2} - o(1).

The talk is mostly based on a joint work with Subhash Khot and Muli Safra, nevertheless, we will also mention results from a more recent extension, which is a joint work with Irit Dinur, Subhash Khot, Guy Kindler and Muli Safra.

Selfadjoint extensions of a closed symmetric operator A in a Hilbert space with equal deficiency indices were described by in the 30's by J. von Neumann. Another approach, based on the notion of abstract boundary triple originates in the work of J.W. Calkin and was developed by M. I. Visik, G. Grubb, F. S. Rofe-Beketov, M. L. Gorbachuck, A .N. Kochubei and others.

By Calkin's approach, all selfadjoint extensions of the symmetric operator A can be parametrized via "multivalued" selfadjoint operators in an auxiliary Hilbert space. Spectral properties of these extensions can be characterized in terms of the abstract Weyl function, associated to the boundary triple. In the present talk some recent developments in the theory of boundary triples will be presented. Applications to boundary value problems for Laplacian operators in bounded domains with smooth and rough boundaries will be discussed.

We characterize the communication complexity of truthful mechanisms. Our departure point is the well known taxation principle. The taxation principle asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu that consists of a price for each bundle (the prices depend only on the valuations of the other players). Each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player.

Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in "pathological" cases and provide a formal description of these extreme cases.

Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.

In the classical Node-Disjoint Paths (NDP) problem, the input consists of an undirected n-vertex graph G, and a collection M of pairs of its vertices, called source-destination, or demand, pairs. The goal is to route the largest possible number of the demand pairs via node-disjoint paths. The best current approximation for the problem is achieved by a simple greedy algorithm, whose approximation factor is O(\sqrt n), while the best current negative result is a roughly \Omega(log^{1/2}n)-hardness of approximation. Even seemingly simple special cases of the problem are still poorly understood: when the input graph is a grid, the best current algorithm achieves a \tilde{O}(n^{1/4})- approximation, and when it is a general planar graph, the best current approximation ratio of an efficient algorithm is \tilde{O}(n^{9/19}). The best currently known lower bound for both these versions of the problem is APX- hardness.

In this talk we will show that NDP is 2^{\Omega(\log n)}-hard to approximate, unless all problems in NP have algorithms with running time n^{O(\log n)}. Our result holds even when the underlying graph is a planar graph with maximum vertex degree 3, and all source vertices lie on the boundary of a single face. We extend this result to the closely related Edge-Disjoint Paths problem, showing the same hardness of approximation ratio even for sub-cubic planar graphs with all sources lying on the boundary of a single face.

This is joint work with David H.K. Kim and Rachit Nimavat.

One of the most common tasks in cryptography and cryptanalysis is to find some interesting event (a needle) in an exponentially large collection (haystack) of N=2^n possible events, or to demonstrate that no such event is likely to exist. In particular, we are interested in finding needles which are defined as events that happen with an unusually high probability of p>>1/N in a haystack which is an almost uniform distribution on N possible events. When the search algorithm can only sample values from this distribution, the best known time/memory tradeoff for finding such an event requires O(1/Mp^2) time given O(M) memory.

In this talk I will describe much faster needle searching algorithms in the common cryptographic setting in which the distribution is defined by applying some deterministic function f to random inputs. Such a distribution can be modeled by a random directed graph with N vertices in which almost all the vertices have O(1) predecessors while the vertex we are looking for has an unusually large number of O(pN) predecessors. When we are given only a constant amount of memory, we propose a new search methodology which we call **NestedRho**. As p increases, such random graphs undergo several subtle phase transitions, and thus the log-log dependence of the time complexity T on p becomes a piecewise linear curve which bends four times. Our new algorithm is faster than the O(1/p^2) time complexity of the best previous algorithm in the full range of 1/N < p < 1 , and in particular it improves the previous time complexity by a significant factor of \sqrt{N} for any p in the range N^(- 0.75) < p < N^(-0.5). When we are given more memory, we show how to combine the **NestedRho** technique with the parallel collision search technique in order to further reduce its time complexity. Finally, we show how to apply our new search technique to more complicated distributions with multiple peaks when we want to find all the peaks whose probabilities are higher than p.

Joint work with Itai Dinur, Orr Dunkelman and Nathan Keller.

Projected gradient descent (PGD), and its close variants, are often considered the methods of choice for solving a large variety of machine learning optimization problems, including empirical risk minimization, statistical learning, and online convex optimization. This is not surprising, since PGD is often optimal in a very appealing information-theoretic sense. However, for many problems PGD is infeasible both in theory and practice since each step requires to compute an orthogonal projection onto the feasible set. In many important cases, such as when the feasible set is a non-trivial polytope, or a convex surrogate for a low-rank structure, computing the projection is computationally inefficient in high-dimensional settings. An alternative is the conditional gradient method (CG), aka Frank-Wolfe algorithm, that replaces the expensive projection step with a linear optimization step over the feasible set. Indeed in many problems of interest, the linear optimization step admits much more efficient algorithms than the projection step, which is the reason to the substantial regained interest in this method in the past decade. On the downside, the convergence rates of the CG method often fall behind that of PGD and its variants.

In this talk I will survey an ongoing effort to design CG variants that on one hand enjoy the cheap iteration complexity of the original method, and on the other hand converge provably faster, and are applicable to a wider variety of machine learning settings. In particular I will focus on the cases in which the feasible set is either a polytope or a convex surrogate for low-rank matrices. Results will be demonstrated on applications including: LASSO, video co-localization, optical character recognition, matrix completion, and multi-class classification.

Suppose that you have n truly random bits X=(X1,...,Xn) and you wish to use them to generate m>>n pseudorandom bits Y=(Y1,..., Ym) using a local mapping, i.e., each Yi should depend on at most d=O(1) bits of x. In the polynomial regime of m=n^s, s>1, the only known solution, originates from (Goldreich, ECCC 2000), is based on Random Local Functions: Compute Yi by applying some fixed (public) d-ary predicate P to a random (public) tuple of distinct inputs. In this talk, we will try to understand, for any value of s, how the pseudorandomness of the resulting sequence depends on the choice of the underlying predicate.

Based on joint work with Shachar Lovett.

One of the most exciting possibilities opened by deep neural networks is end-to-end learning: the ability to learn tasks without the need for feature engineering or breaking down into sub-tasks. This talk will present three cases illustrating how end-to-end learning can operate in machine perception across the senses (Hearing, Vision) and even for the entire perception-cognition-action cycle.

The talk begins with speech recognition, showing how acoustic models can be learned end-to-end. This approach skips the feature extraction pipeline, carefully designed for speech recognition over decades.

Proceeding to vision, a novel application is described: identification of photographers of wearable video cameras. Such video was previously considered anonymous as it does not show the photographer.

The talk concludes by presenting a new task, encompassing the full perception-cognition-action cycle: visual learning of arithmetic operations using only pictures of numbers. This is done without using or learning the notions of numbers, digits, and operators.

The talk is based on the following papers:

Speech Acoustic Modeling From Raw Multichannel Waveforms, Y. Hoshen, R.J. Weiss, and K.W. Wilson, ICASSP'15

An Egocentric Look at Video Photographer Identity, Y. Hoshen, S. Peleg, CVPR'16

Visual Learning of Arithmetic Operations, Y. Hoshen, S. Peleg, AAAI'16

If $X$ is a finite set, the $p$-biased measure on the power-set of $X$ is defined as follows: choose a subset $S$ of $X$ at random by including each element of $X$ independently with probability $p$. If $\mathcal{F}$ is a family of subsets of $X$, one can consider the {\em $p$-biased measure} of $\mathcal{F}$, denoted by $\mu_p(\mathcal{F})$, as a function of $p$; if $\mathcal{F}$ is closed under taking supersets, then this function is an increasing function of $p$. Seminal results of Friedgut and Friedgut-Kalai give criteria for this function to have a 'sharp threshold'. A careful analysis of the behaviour of this function also yields some rather strong results in extremal combinatorics which do not explicitly mention the $p$-biased measure - in particular, in the field of {\em Erd\H{o}s-Ko-Rado type problems}, which concern the sizes of families of objects in which any two (or three...) of the objects 'agree' or 'intersect' in some way. We will discuss some of these, including a recent proof of an old conjecture of Frankl that a symmetric three-wise intersecting family of subsets of $\{1,2,\ldots,n\}$ has size $o(2^n)$, and some 'stability' results characterizing the structure of 'large' $t$-intersecting families of $k$-element subsets of $\{1,2,\ldots,n\}$. Based on joint work with (subsets of) Nathan Keller, Noam Lifshitz and Bhargav Narayanan.

Interactive proofs have had a dramatic impact on Complexity Theory and Cryptography. The celebrated IP=PSPACE Theorem [LFKN92,Shamir92] allows an all-powerful but untrusted prover to convince a polynomial-time verifier of the validity of extremely complicated statements (as long as they can be evaluated using polynomial space). The interactive proof system designed for this purpose requires a polynomial number of communication rounds. This talk will focus on studying the power of more efficient interactive proof systems.

Our main result is that for every statement that can be evaluated in polynomial time and bounded-polynomial space, there exists an interactive proof that satisfies the following strict efficiency requirements:

(1) The honest prover runs in polynomial time.

(2) The verifier is almost linear time (and under some conditions even sub linear).

(3) The interaction consists of only a constant number of communication rounds.

To obtain this result, we introduce and study several new notions for interactive proofs, which may be of independent interest. One of these notions is that of unambiguous interactive proofs, where the prover has a unique successful strategy. Another notion is that of probabilistically checkable interactive proofs (PCIPs) where the verifier only reads a few bits of the transcript in checking the proof (this could be viewed as an interactive extension of PCPs).

Joint work with Omer Reingold and Ron Rothblum.

Computer science and optics are usually studied separately -- separate people, in separate departments, meet at separate conferences. This is changing. The exciting promise of technologies like virtual reality and self-driving cars demand solutions that draw from the best aspects of computer vision, computer graphics, and optics. Previously, it has proved difficult to bridge these communities. For instance, the laboratory setups in optics are often designed to image millimeter-size scenes in a vibration-free darkroom.

This talk is centered around time of flight imaging, a growing area of research in computational photography. A time of flight camera works by emitting amplitude modulated (AM) light and performing correlations on the reflected light. The frequency of AM is in the radio frequency range (like a Doppler radar system), but the carrier signal is optical, overcoming diffraction limited challenges of full RF systems while providing optical contrast. The obvious use of such cameras is to acquire 3D geometry. By spatially, temporally and spectrally coding light transport we show that it may be possible to go "beyond depth", demonstrating new forms of imaging like photography through scattering media, fast relighting of photographs, real-time tracking of occluded objects in the scene (like an object around a corner), and even the potential to distinguish between biological molecules using fluorescence. We discuss the broader impact of this design paradigm on the future of 3D depth sensors, interferometers, computational photography, medical imaging and many other applications.

Cluster algebras are a class of commutative rings introduced by Fomin and Zelevinsky in 2000. I will give an introductory talk about cluster algebras. The main examples are the cluster algebra of type A2, the coordinate ring of $SL_4/N$, and the homogeneous coordinate ring of the Grassmannian $Gr_{2,n+3}(\mathbb{C})$.

This Thursday we will have two SIGGRAPH rehearsal talks in the Vision Seminar, one by Netalee Efrat and one by Meirav Galun. Abstracts are below. Each talk will be about 15 minutes (with NO interruptions), followed by 10 minutes feedback.

**Talk1 (Netalee Efrat): Cinema 3D: Large scale automultiscopic display **

While 3D movies are gaining popularity, viewers in a 3D cinema still need to wear cumbersome glasses in order to enjoy them. Automultiscopic displays provide a better alternative to the display of 3D content, as they present multiple angular images of the same scene without the need for special eyewear. However, automultiscopic displays cannot be directly implemented in a wide cinema setting due to variants of two main problems: (i) The range of angles at which the screen is observed in a large cinema is usually very wide, and there is an unavoidable tradeoff between the range of angular images supported by the display and its spatial or angular resolutions. (ii) Parallax is usually observed only when a viewer is positioned at a limited range of distances from the screen. This work proposes a new display concept, which supports automultiscopic content in a wide cinema setting. It builds on the typical structure of cinemas, such as the fixed seat positions and the fact that different rows are located on a slope at different heights. Rather than attempting to display many angular images spanning the full range of viewing angles in a wide cinema, our design only displays the narrow angular range observed within the limited width of a single seat. The same narrow range content is then replicated to all rows and seats in the cinema. To achieve this, it uses an optical construction based on two sets of parallax barriers, or lenslets, placed in front of a standard screen. This paper derives the geometry of such a display, analyzes its limitations, and demonstrates a proof-of-concept prototype.

*Joint work with Piotr Didyk, Mike Foshey, Wojciech Matusik, Anat Levin

**Talk 2 (Meirav Galun): Accelerated Quadratic Proxy for Geometric Optimization **

We present the Accelerated Quadratic Proxy (AQP) - a simple first order algorithm for the optimization of geometric energies defined over triangular and tetrahedral meshes. The main pitfall encountered in the optimization of geometric energies is slow convergence. We observe that this slowness is in large part due to a Laplacian-like term existing in these energies. Consequently, we suggest to exploit the underlined structure of the energy and to locally use a quadratic polynomial proxy, whose Hessian is taken to be the Laplacian. This improves stability and convergence, but more importantly allows incorporating acceleration in an almost universal way, that is independent of mesh size and of the specific energy considered. Experiments with AQP show it is rather insensitive to mesh resolution and requires a nearly constant number of iterations to converge; this is in strong contrast to other popular optimization techniques used today such as Accelerated Gradient Descent and Quasi-Newton methods, e.g., L-BFGS. We have tested AQP for mesh deformation in 2D and 3D as well as for surface parameterization, and found it to provide a considerable speedup over common baseline techniques.

*Joint work with Shahar Kovalsky and Yaron Lipman

The Jacquet-Rallis relative trace formula was introduced as a tool towards solving the global conjectures of Gan-Gross-Prasad for unitary groups. I will present some recent progress in developing the full formula.

I will show how to extend the transfer of regular orbital integrals to singular geometric terms using a mix of local and global methods.

(Joint with Pierre-Henri Chaudouard)

Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message:

1. Fast Δ-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ≥55 is Θ(log_Δ log n), whereas its deterministic complexity is Θ(log_Δ n) for any Δ≥3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees.

2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in O(1)+o(log_Δ n) rounds can be transformed to run in O(log*n −log*Δ+1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log_Δ n) time deterministically.

3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √ log n. This shows that a deterministic Ω(log_Δ n) lower bound for any problem implies a randomized Ω(log_Δ log n) lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.

Joint work with Tsvi Kopelowitz and Seth Pettie. http://arxiv.org/abs/1602.08166

There are numerous essentially equivalent characterizations of mixing in $L_1$ of a finite Markov chain. Some of these characterizations involve natural probabilistic concepts such as couplings, stopping times and hitting times. In contrast, while there are several analytic and geometric tools for bounding the $L_2$ mixing time, none of them are tight and they do not have a probabilistic interpretation.

We provide tight probabilistic characterizations in terms of hitting times distributions for mixing in $L_2$ (for normal chains) and (under reversibility) in relative entropy. This is done by assigning appropriate penalty (depending on the size of the set) to the case that the chain did not escape from a certain set.

We also prove a new extremal characterization of the log-sobolev constant in terms of a weighted version of the spectral gap (where the weight depends on the size of the support of the function).

We will discuss representation theory of a symmetric pair (G,H), where G is a complex reductive group, and H is a real form of G. The main objects of study are the G-representations with a non trivial H-invariant functional, called the H-distinguished representations of G.

I will give a necessary condition for a G-representation to be H-distinguished and show that the multiplicity of such representations is less or equal to the number of double cosets B\G/H, where B is a Borel subgroup of G.

Quantum computers are hypothetical devices, based on quantum physics, which would enable us to perform certain computations (among them some that Chaim Leib Pekeris pioneered) hundreds of orders of magnitude faster than digital computers. This feature is coined “quantum supremacy.” We start the lecture with a gentle introduction to computing - classical and quantum, with basic notions of computational complexity, and with the vision of quantum computers.

A main reason for concern regarding the feasibility of quantum computers is that quantum systems are inherently noisy. We will explain what is "noise" and describe an optimistic hypothesis regarding quantum noise that will allow quantum computing and a pessimistic hypothesis that won’t. The remarkable progress witnessed during the past two decades in the field of experimental physics of controlled quantum systems places the decision between the pessimistic and optimistic hypotheses within reach. On the optimistic side, one aspect or another of quantum supremacy might be seen by experiments in the near future: by implementing quantum error-correction or by systems of free bosons or by exotic new phases of matter called anyons or by quantum annealing, or in various other ways.

In the lecture I will explain my pessimistic line of research and here is a brief summary of my view: understanding quantum computers in the presence of noise requires consideration of behavior at different scales. In the small scale, standard models of noise from the mid-90s are suitable, and quantum evolutions and states described by them manifest a very low-level computational power. This small-scale behavior has far-reaching consequences for the behavior of noisy quantum systems at larger scales. On the one hand, it does not allow reaching the starting points for quantum fault tolerance and quantum supremacy, making them both impossible at all scales. On the other hand, it leads to novel implicit ways for modeling noise at larger scales and to various predictions on the behavior of noisy quantum systems.

We will rely on the theory of noise-sensitivity and stability developed with Benjamini and Schramm in the late 90s and on recent work with Guy Kindler related to the mysterious gap between permanents and determinants (or, in other words, between bosons and fermions).

Let F/Q_p be a finite extension, supersingular representations are the irreducible mod p representations of GL_n(F) which do not appear as a subquotient of a principal series representation, and similarly to the complex case, they are the building blocks of the representation theory of GL_n(F). Historically, they were first discovered by L. Barthel and R. Livne some twenty years ago and they are still not understood even for n=2.

For F=Q_p, the supersingular representations of GL_2(F) have been classified by C. Breuil, and a local mod p Langlands correspondence was established between them and certain mod p Galois representations.

When one tries to generalize this connection and move to a non-trivial extension of Q_p, Breuil's method fails; The supersingular representations in that case have complicated structure and instead of two as in the case F=Q_p we get infinitely many such representations, when there are essentially only finitely many on the Galois side.

In this talk we give an exposition of the subject and explore, using what survives from Breuil's methods, the universal modules whose quotients contain all the supersingular representations in the difficult case where F is a non-trivial extension of Q_p.

In distributed systems, communication between the participants in the computation is usually the most expensive part of the computation. Theoretical models of distributed systems usually reflect this by neglecting the cost of local computation, and charging only for messages sent between the participants; in particular, we usually assume that the computation proceeds in rounds, and in each round, each participant can send only a limited number of bits. We are interested in characterizing the number of rounds required to perform various tasks.

In this talk I will describe two sets of results. The first concerns the complexity of distributed subgraph detection: we have n servers, each representing a node in an undirected graph, and each server receives as input its adjacent edges in the graph. The goal of the computation is to determine whether the global input graph contains some fixed subgraph. I will describe upper and lower bounds for several classes of subgraphs, through a connection to Turan numbers. The general case remains open.

In the second part of the talk I will describe recent work on multi- party number-in-hand communication and information complexity, and show a tight upper and lower bound for set disjointness in the shared blackboard model.

Joint work with Mark Braverman, Andrew Drucker and Fabian Kuhn.

Children may learn about the world by pushing, banging, and manipulating things, watching and listening as materials make their distinctive sounds-- dirt makes a thud; ceramic makes a clink. These sounds reveal physical properties of the objects, as well as the force and motion of the physical interaction.

We've explored a toy version of that learning-through-interaction by recording audio and video while we hit many things with a drumstick. We developed an algorithm the predict sounds from silent videos of the drumstick interactions. The algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We demonstrate that the sounds generated by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that the task of predicting sounds allows our system to learn about material properties in the scene.

Joint work with:

Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson

http://arxiv.org/abs/1512.08512

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In a sense, the strongest guarantee available is the L2 guarantee, which requires finding all items that occur at least eps*||f|| times in the stream, where the i-th coordinate of the vector f is the number of occurrences of i in the stream. The first algorithm to achieve the L2 guarantee was the CountSketch (Charikar, Chen, and Farach-Colton ICALP'02), which, for constant eps, requires O(log n) words of memory and O(log n) update time. It is known to be space-optimal if the stream includes deletions.

In this talk I will discuss recent improvements that allow us to find L2 heavy hitters in O(1) memory and O(1) update time in insertion only streams. The improvements rely on a deeper understanding of the AMS sketch (Alon, Matias, and Szegedy STOC'96) and similar sketches and draw on the theory of Gaussian processes. This talk is based on joint work with Vladimir Braverman, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff in arxiv:1511.00661 and arxiv:1603.00759.

There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for essentially similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which fits training data without error but is nevertheless somewhat smooth. We show that AdaBoost has the same property. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.

For some rectangular Hardy classes of analytic functions,an optimal method of interpolation has been previously found, within the framework of Optimal Recovery. It will be shown that this method of interpolation, based on the Abel-Jacobi elliptic functions, is also optimal, according to corresponding criteria of Nonparametric Regression and Optimal Design.

In a non-asymptotic setting, the maximal mean squared error of the optimal interpolant is evaluated explicitly, for all noise levels away from 0. In these results, a pivotal role is played by an interference effect, in which both the stochastic and deterministic parts of the interpolant exhibit an oscillating behavior, with the two oscillating processes mutually subduing each other.

In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, we propose the methodology of sampling correctors. Such algorithms use structure that the distribution is purported to have, in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms may then be used as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks.Warning: This talk contains more questions than answers...

Joint work with Clement Canonne and Themis Gouleakis.

We present a randomized algorithm that computes a Minimum Spanning Tree