
# Previous Seminars

We study non-vanishing gradient-like vector fields $v$ on smooth compact manifolds $X$ with boundary. We call such fields traversing. With the help of a boundary generic field $v$, we divide the boundary $\partial X$ of $X$ into two complementary compact manifolds, $\partial^+X(v)$ and $\partial^-X(v)$. Then we introduce the causality map $C_v: \partial^+X(v) \to \partial^-X(v)$, a distant relative of the Poincaré return map. Let $\mathcal F(v)$ denote the oriented 1-dimensional foliation on $X$ produced by a traversing $v$-flow.

Our main result, the Holography Theorem, claims that, for boundary generic traversing vector fields $v$, the knowledge of the causality map $C_v$ allows for a reconstruction of the pair $(X, \mathcal F(v))$, up to a homeomorphism $\Phi: X \to X$ which is the identity on the boundary $\partial X$. In other words, for a massive class of ODEs, we show that the topology of their solutions, satisfying a given boundary value problem, is rigid. We call these results "holographic" since the $(n+1)$-dimensional $X$ and the un-parameterized dynamics of the flow on it are captured by a single correspondence $C_v$ between two $n$-dimensional screens, $\partial^+X(v)$ and $\partial^-X(v)$.

This holography of traversing flows has numerous applications to the dynamics of general flows. Time permitting, we will discuss some applications of the Holography Theorem to the geodesic flows and the inverse scattering problems on Riemannian manifolds with boundary.

Let Rep(GL(m|n)) denote the category of finite-dimensional algebraic representations of the supergroup GL(m|n). Nowadays the abelian structure (Ext^1 between irreducibles, block description, ...) is well understood. Kazhdan-Lusztig theory gives an algorithmic solution to the character problem, and in special cases even explicit character formulas. However, the monoidal structure is hardly understood at all (e.g. the decomposition of tensor products into their indecomposable constituents). I will talk about the problem of decomposing tensor products "up to superdimension 0", i.e. about the structure of Rep(GL(m|n))/N, where N is the ideal of indecomposable representations of superdimension 0.

When dealing with a highly variable problem such as action recognition, focusing on a small area, such as the hand's region, makes the problem more manageable, and enables us to invest a relatively high amount of interpretation resources in a small but highly informative area of the image. In order to detect this region of interest in the image and properly analyze it, I have built a process that includes several steps, starting with a state-of-the-art hand detector that incorporates both detection of the hand by appearance and by estimation of human body pose. The hand detector is built upon a fully convolutional neural network, detecting hands efficiently and accurately. The human body pose estimation starts with a state-of-the-art head detector and continues with a novel approach where each location in the image votes for the position of each body keypoint, utilizing information from the whole image. Using dense, multi-target votes enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting, and to accurately estimate the body pose.

Once the detection of the hands is complete, an additional step segments the hand and fingers, labeling each hand pixel in the image with a dense fully convolutional network. Finally, a further step segments and identifies the held object. Understanding the hand-object interaction is an important step toward understanding the action taking place in the image. Together, these steps enable fine interpretation of hand-object interaction images, an essential step towards understanding human-object interaction and recognizing human activities.

Recent progress in imaging technologies leads to a continuous growth in biomedical data, which can provide better insight into important clinical and biological questions. Advanced machine learning techniques, such as artificial neural networks, are brought to bear on fundamental medical image computing challenges such as segmentation, classification and reconstruction, which are required for meaningful analysis of the data. Nevertheless, the main bottleneck, which is the lack of annotated examples or 'ground truth' to be used for training, still remains.

In my talk, I will give a brief overview of some biomedical image analysis problems we aim to address, and suggest how prior information about the problem at hand can be utilized to compensate for insufficient, or even absent, ground-truth data. I will then present a framework based on deep neural networks for the denoising of dynamic contrast-enhanced MRI (DCE-MRI) sequences of the brain. DCE-MRI is an imaging protocol in which MRI scans are acquired repetitively throughout the injection of a contrast agent; it is mainly used for quantitative assessment of blood-brain barrier (BBB) permeability. BBB dysfunctionality is associated with numerous brain pathologies including stroke, tumors, traumatic brain injury, and epilepsy. Existing techniques for DCE-MRI analysis are error-prone, as the dynamic scans are subject to non-white, spatially-dependent and anisotropic noise. To address DCE-MRI denoising challenges we use an ensemble of expert DNNs constructed as deep autoencoders, where each is trained on a specific subset of the input space to accommodate different noise characteristics and dynamic patterns. Since clean DCE-MRI sequences (ground truth) for training are not available, we present a sampling scheme for generating realistic training sets with nonlinear dynamics that faithfully model clean DCE-MRI data and account for spatial similarities. The proposed approach has been successfully applied to full and even temporally down-sampled DCE-MRI sequences, from two different databases, of stroke and brain tumor patients, and is shown to compare favorably to state-of-the-art denoising methods.

We consider the problem of hidden common manifold extraction from multiple data sets, which have observation-specific distortions and artifacts. A new manifold learning method is presented based on alternating products of diffusion operators and local kernels. We provide theoretical analysis showing that our method is able to build a variant of the Laplacian of the hidden common manifold, while suppressing the observation-specific artifacts. The generality of this method is demonstrated in data analysis applications, where different types of devices are used to measure the same activity. In particular, we present applications to problems in biomedicine, neuroscience, and audio analysis.

This is joint work with Roy Lederman and Hau-tieng Wu.

We prove the first super-logarithmic lower bounds on the cell probe complexity of dynamic *boolean* (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new technique and use it to prove a ~ log^{1.5}(n) lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting *over F_2* ([Patrascu07]). Proving a super-logarithmic lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Patrascu's obituary [Tho13]. This result also implies the first super-logarithmic lower bound for the classical 2D range counting problem, one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D "rectangle stabbing", and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient *one-way* communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebyshev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].

Joint work with Kasper Green-Larsen and Huacheng Yu.

We study the classic bipartite matching problem in the online setting, first introduced in the seminal work of Karp, Vazirani and Vazirani. Specifically, we consider the problem for the well-studied class of regular graphs. Matching in this class of graphs was studied extensively in the offline setting. In the online setting, an optimal deterministic algorithm, as well as efficient algorithms under stochastic input assumptions were known. In this work, we present a novel randomized algorithm with competitive ratio tending to one on this family of graphs, under adversarial arrival order. Our main contribution is a novel algorithm which achieves competitive ratio 1-O(\sqrt{\log d}/\sqrt{d}) in expectation on d-regular graphs. In contrast, we show that all previously-known online algorithms, such as the generally worst-case optimal ranking algorithm of Karp et al., are restricted to a competitive ratio strictly bounded away from one, even as d grows. Moreover, we show the convergence rate of our algorithm's competitive ratio to one is nearly tight, as no algorithm achieves competitive ratio better than 1-O(1/\sqrt{d}). Finally, we show that our algorithm yields a similar competitive ratio with high probability, as well as guaranteeing each offline vertex a probability of being matched tending to one.
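For context, the classic RANKING algorithm of Karp, Vazirani and Vazirani (the baseline the abstract contrasts with, not the new algorithm of the talk) fits in a few lines; the toy instance below is invented:

```python
import random

def ranking_matching(offline, arrivals, neighbors, seed=0):
    # Choose one random permutation (ranking) of the offline vertices up
    # front; match each arriving online vertex to its highest-ranked
    # (lowest rank value) still-unmatched neighbor.
    rng = random.Random(seed)
    order = list(offline)
    rng.shuffle(order)
    rank = {v: r for r, v in enumerate(order)}
    match, used = {}, set()
    for u in arrivals:
        free = [v for v in neighbors[u] if v not in used]
        if free:
            v = min(free, key=rank.__getitem__)
            match[u] = v
            used.add(v)
    return match

# toy 2-regular bipartite instance: both online vertices get matched
m = ranking_matching(['a', 'b'], [0, 1], {0: ['a', 'b'], 1: ['a', 'b']})
```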

The recent success of convolutional neural networks (CNNs) for image processing tasks is inspiring research efforts attempting to achieve similar success for geometric tasks. One of the main challenges in applying CNNs to surfaces is defining a natural convolution operator on surfaces. In this paper we present a method for applying deep learning to sphere-type shapes using a global seamless parameterization to a planar flat-torus, for which the convolution operator is well defined. As a result, the standard deep learning framework can be readily applied for learning semantic, high-level properties of the shape. An indication of our success in bridging the gap between images and surfaces is the fact that our algorithm succeeds in learning semantic information from an input of raw low-dimensional feature vectors.

We demonstrate the usefulness of our approach by presenting two applications: human body segmentation, and automatic landmark detection on anatomical surfaces. We show that our algorithm compares favorably with competing geometric deep-learning algorithms for segmentation tasks, and is able to produce meaningful correspondences on anatomical surfaces where hand-crafted features are bound to fail.

Joint work with: Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir G. Kim and Yaron Lipman.

The past five years have seen a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for feature learning and classification. However, the mathematical reasons for this success remain elusive. In this talk we will briefly survey some existing theory of deep learning. In particular, we will focus on data structure based theory and discuss two recent developments.

The first work studies the generalization error of deep neural networks. We will show how the generalization error of deep networks can be bounded via their classification margin. We will also discuss the implications of our results for the regularization of the networks. For example, the popular weight decay regularization guarantees margin preservation, but it leads to a loose bound on the classification margin. We show that a better regularization strategy can be obtained by directly controlling the properties of the network's Jacobian matrix.
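To make the quantity concrete, here is a toy numerical estimate of the Jacobian's spectral norm at an input (the tiny ReLU network, sizes, and finite-difference scheme are all invented for illustration, not the talk's method):

```python
import numpy as np

def net(x, W1, W2):
    # tiny two-layer ReLU network; stands in for a general deep net
    return W2 @ np.maximum(W1 @ x, 0.0)

def jacobian_spectral_norm(x, W1, W2, eps=1e-5):
    # finite-difference Jacobian at x, then its largest singular value,
    # the kind of quantity whose control is suggested for regularization
    f0 = net(x, W1, W2)
    cols = [(net(x + eps * e, W1, W2) - f0) / eps for e in np.eye(x.size)]
    J = np.stack(cols, axis=1)
    return np.linalg.svd(J, compute_uv=False)[0]

rng = np.random.default_rng(0)
W1 = np.abs(rng.standard_normal((4, 3)))  # positive weights and inputs keep
x = np.abs(rng.standard_normal(3)) + 0.1  # the example in one linear region
W2 = rng.standard_normal((2, 4))
jn = jacobian_spectral_norm(x, W1, W2)
```

Inside one linear region all ReLUs stay active, so the estimate can be sanity-checked against the exact Jacobian W2 @ W1.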

The second work focuses on solving minimization problems with neural networks. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set leads to faster convergence of certain iterative algorithms, with an error related to the accuracy of the set approximation. Our theory ties in with recent advances in sparse recovery, compressed sensing and deep learning. In particular, it provides an explanation for the successful approximation of the ISTA (iterative shrinkage and thresholding algorithm) solution by neural networks with layers representing iterations.
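For reference, ISTA itself is short; a minimal sketch (problem sizes, regularization weight, and the test signal are invented for the example):

```python
import numpy as np

def ista(A, y, lam, n_iter=1000):
    # minimize 0.5*||Ax - y||^2 + lam*||x||_1 by alternating a gradient
    # step on the data term with soft-thresholding (shrinkage); each such
    # iteration is the operation a network layer can approximate
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz const. of gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L          # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return x

# recover a 3-sparse vector from 40 random measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
x_hat = ista(A, A @ x_true, lam=0.01)
```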

Joint work with Guillermo Sapiro, Miguel Rodrigues, Jure Sokolic, Alex Bronstein and Yonina Eldar.

Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people -- not software agents -- make joint decisions. I will illustrate this theme through two recent endeavors: Spliddit.org, a website that offers provably fair solutions to everyday problems; and Robovote.org, which provides optimization-driven voting methods.

Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.

The theory of hypergeometric functions with matrix argument was developed in the 1950s by S. Bochner for Hermitian matrices, and by C. Herz for symmetric matrices. This theory admits a common generalization to the setting of symmetric cones, which is discussed in the book by Faraut-Koranyi. It also has applications to the study of non-central distributions in statistics and to the theory of random matrices.

In the 1980s, I.G. Macdonald introduced a one-parameter family of multivariate hypergeometric functions, which, for special values of the parameter, are the *radial* parts of the matrix hypergeometric functions. He also formulated a number of natural conjectures about these functions, which in the matrix case can be proved by appropriate integral formulas. However, this technique is unavailable in the general setting, and as a result these conjectures have remained open.

In recent work with G. Olafsson we have solved most of these conjectures, using ideas from the theory of Cherednik algebras and Jack polynomials. Among other results we obtain sharp estimates for the exponential kernel that allow us to establish a rigorous theory of the Fourier and Laplace transforms, and we prove an explicit formula for the Laplace transform of a Jack polynomial conjectured by Macdonald. This opens the door for several future developments in the associated harmonic analysis, some of which we also treat. This includes (1) the Paley-Wiener theorem, (2) Laplace transform identities for hypergeometric functions, and (3) the so-called Ramanujan master theorem.

**Note the unusual room: De Picciotto Building, Room 25**

We show that there exist binary locally testable codes (for all rates) and locally correctable codes (for low rates) with rate and distance approaching the Gilbert-Varshamov bound (which is the best rate-distance tradeoff known for general binary error-correcting codes). Our constructions use a number of ingredients: Thommesen's random concatenation technique, the Guruswami-Sudan-Indyk strategy for list-decoding concatenated codes, the Alon-Edmonds-Luby distance amplification method, and the local list-decodability and local testability of Reed-Muller codes. Interestingly, this seems to be the first time that local testability is used in the construction of locally correctable codes.

Joint work with Sivakanth Gopi, Rafael Oliveira, Noga Ron-Zewi and Shubhangi Saraf.

This is a joint work with Bezrukavnikov and Kazhdan. The goal of my talk is to give an explicit formula for the Bernstein projector to representations of depth $\leq r$. As a consequence, we show that the depth zero Bernstein projector is supported on topologically unipotent elements and it is equal to the restriction of the character of the Steinberg representation. As another application, we deduce that the depth $r$ Bernstein projector is stable. Moreover, for integral depths our proof is purely local.

Image denoising is the most fundamental problem in image enhancement, and it is largely solved: It has reached impressive heights in performance and quality -- almost as good as it can ever get. But interestingly, it turns out that we can solve many other problems using the image denoising "engine". I will describe the Regularization by Denoising (RED) framework: using the denoising engine in defining the regularization of any inverse problem. The idea is to define an explicit image-adaptive regularization functional directly using a high performance denoiser. Surprisingly, the resulting regularizer is guaranteed to be convex, and the overall objective functional is explicit, clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED is capable of incorporating any image denoising algorithm as a regularizer, treating general inverse problems very effectively, and it is guaranteed to converge to the globally optimal result.
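A minimal sketch of the RED idea on a 1-D signal (the quadratic data term, the toy moving-average denoiser, and plain gradient descent are illustrative stand-ins, not the paper's tuned setup):

```python
import numpy as np

def denoise(x, w=5):
    # toy denoising engine (moving average); RED can wrap any denoiser
    return np.convolve(x, np.ones(w) / w, mode='same')

def red_restore(y, A, lam=0.5, mu=0.1, n_iter=300):
    # gradient descent on 0.5*||Ax - y||^2 + (lam/2) * x^T (x - f(x));
    # under RED's conditions on f, the regularizer's gradient is simply
    # lam * (x - f(x))
    x = y.copy()
    for _ in range(n_iter):
        x = x - mu * (A.T @ (A @ x - y) + lam * (x - denoise(x)))
    return x

# pure denoising example: the forward operator A is the identity
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
clean = np.sin(t)
y = clean + 0.3 * rng.standard_normal(200)
x = red_restore(y, np.eye(200))
```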

Joint work with Peyman Milanfar (Google Research) and Yaniv Romano (EE-Technion).

We propose a procedure (the first of its kind) for computing a fully data-dependent interval that traps the mixing time t_mix of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge.

The interval is constructed around the relaxation time t_relax, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a sqrt{n} rate, where n is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least \Omega(t_relax) times on the average. Future directions of research are identified. Time permitting, we will mention some recent further developments by D. Levin and Y. Peres.
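To make the objects concrete, a point estimate of t_relax from a single sample path can be sketched as follows (the two-state chain and the symmetrization details are illustrative; the talk's contribution is the confidence interval around such estimates, which this sketch does not reproduce):

```python
import numpy as np

def relaxation_time_estimate(path, n_states):
    # empirical transition matrix from a single sample path
    counts = np.zeros((n_states, n_states))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    P = counts / counts.sum(axis=1, keepdims=True)
    # reversibility lets us symmetrize: D^{1/2} P D^{-1/2} shares P's spectrum
    pi = counts.sum(axis=1) / counts.sum()
    S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
    eig = np.sort(np.linalg.eigvalsh((S + S.T) / 2))
    return 1.0 / (1.0 - eig[-2])   # t_relax = 1 / spectral gap

# lazy two-state chain with flip probability 0.1: true t_relax = 1/(1-0.8) = 5
rng = np.random.default_rng(1)
path = [0]
for u in rng.random(200_000):
    path.append(path[-1] if u > 0.1 else 1 - path[-1])
t_hat = relaxation_time_estimate(path, 2)
```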

Joint work with Daniel Hsu and Csaba Szepesvari.

Graphs that are prevalent in current applications (the Internet, Facebook etc.) are not only very large and highly dynamic, but also distributed between many servers, none of which sees the graph in its entirety. The distributed monitoring problem deals with the question of imposing conditions on the local graphs, such that as long as they hold, it is guaranteed that some desired property holds for the global graph.

While defining local conditions for linear properties (e.g. average degree) is relatively easy, they are more difficult to derive for non-linear functions over the graph. We propose a solution and a general definition of solution optimality, and demonstrate how to apply it to two important graph properties -- spectral gap and number of triangles. We also define an absolute lower bound on the communication overhead for distributed monitoring, and compare our algorithm to it, with good results. Performance improves as the graph becomes larger and denser -- that is, when distributing it is more important.

I will talk about my recent adventures with ants. Together with biologists we study P. longicornis ants as they collaboratively transport a large food item to their nest. This collective navigation process is guided by pheromones which are laid by individual ants. Using a new methodology to detect scent marks, we identify a new kind of ant trail characterized by very short and dynamic pheromone markings and a highly stochastic navigation response to them. We argue that such a trail can be highly beneficial in conditions in which the knowledge of individual ants regarding the underlying topological structure is unreliable.

This gives rise to a new theoretical search model on graphs under unreliable guiding instructions, which is of independent computational interest. To illustrate the model, imagine driving a car in an unknown country that is in the aftermath of a major hurricane which has randomly flipped a certain small fraction of the road-signs. Under such conditions of unreliability, how can you still reach your destination fast? I will discuss the limits of unreliability that allow for efficient navigation. In trees, for example, there is a phase transition phenomenon that occurs roughly around the inverse of the square root of the maximal degree: if the noise is above this threshold then no algorithm can find the target in less than exponential time (in the original distance), while below the threshold we identify an optimal, almost-linear walking algorithm. Finally, I will discuss algorithms that, under such a noisy model, aim to minimize the number of queries to find a target (rather than the number of moves).
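A toy simulation conveys the flavor of the model (the path graph, exploration rate, and walker below are invented for illustration; the talk's results concern trees and query complexity):

```python
import random

def walk_with_noisy_signs(n, q, seed=0, explore=0.1, max_steps=10**6):
    # target at position n of a path; each node's signpost points right
    # (toward the target) but is flipped independently with probability q;
    # the walker follows the sign, with a little random exploration so it
    # can escape locally misleading advice
    rng = random.Random(seed)
    sign = [(-1 if rng.random() < q else +1) for _ in range(n)]
    pos = steps = 0
    while pos < n and steps < max_steps:
        move = rng.choice([-1, 1]) if rng.random() < explore else sign[pos]
        pos = max(0, pos + move)
        steps += 1
    return steps

fast = walk_with_noisy_signs(200, 0.0)    # reliable advice: near-linear time
slow = walk_with_noisy_signs(200, 0.45)   # heavy noise: much slower
```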

This talk is based on joint works with biologists from the Weizmann Institute: Ofer Feinerman, Udi Fonio, and others, and with CS researchers: Lucas Boczkowski, Adrian Kosowski, and Yoav Rodeh.

By analytical and numerical studies of Deep Neural Networks (using standard TensorFlow) in the "Information Plane" - the Mutual Information the network layers preserve on the input and the output variables - we obtain the following new insights.

- The training epochs, for each layer, are divided into two phases: (1) fitting the training data - increasing the mutual information on the labels; (2) compressing the representation - reducing the mutual information on the inputs. The layers are learnt hierarchically, from the bottom to the top layer, with some overlaps.
- Most (~80%) of the training time - optimization with SGD - is spent on compressing the representation (the second phase) - NOT on fitting the training data labels, even when the training has no regularization or terms that directly aim at such compression.
- The convergence point, FOR EVERY HIDDEN LAYER, lies on or very close to the Information Bottleneck (IB) theoretical bound. Thus, the mappings from the input to the hidden layer and from the hidden layer to the output obey the IB self-consistent equations for some value of the compression-prediction tradeoff.
- The main benefit of adding more hidden layers is in the optimization/training time, as the compression phase for each layer amounts to relaxation to a Maximum conditional Entropy state, subject to the proper constraints on the error/information on the labels. As such relaxation takes super-linear time in the compressed entropy, adding more hidden layers dramatically reduces the training time. There is also benefit in sample complexity to adding hidden layers, but this is a smaller effect.
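The information-plane coordinates behind these observations are mutual informations; a minimal plug-in (binning) estimator, sufficient for toy experiments, can be sketched as follows (the bin count and test data are illustrative choices):

```python
import numpy as np

def mutual_information(x, y, bins=30):
    # plug-in estimate of I(X;Y) in bits from a 2-D histogram; binning
    # choices here are illustrative (estimation is subtler in practice)
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.standard_normal(20_000)
dependent = mutual_information(x, x + 0.1 * rng.standard_normal(20_000))
independent = mutual_information(x, rng.standard_normal(20_000))
```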

I will explain these new observations and the benefits of exploring Deep Learning in the "Information Plane", and discuss some of the exciting theoretical and practical consequences of our analysis.

Joint work with Ravid Ziv and Noga Zaslavsky.

We present a novel approach to template matching that is efficient, can handle partial occlusions, and is equipped with provable performance guarantees. A key component of the method is a reduction that transforms the problem of searching a nearest neighbor among N high-dimensional vectors, to searching neighbors among two sets of order sqrt(N) vectors, which can be done efficiently using range search techniques. This allows for a quadratic improvement in search complexity, that makes the method scalable when large search spaces are involved.

For handling partial occlusions, we develop a hashing scheme based on consensus set maximization within the range search component. The resulting scheme can be seen as a randomized hypothesize-and-test algorithm, that comes with guarantees regarding the number of iterations required for obtaining an optimal solution with high probability.

The predicted matching rates are validated empirically and the proposed algorithm shows a significant improvement over the state-of-the-art in both speed and robustness to occlusions.

Joint work with Stefano Soatto.

While there has been a lot of progress in designing efficient custom protocols for computing Private Set Intersection (PSI), there has been less research on using generic MPC protocols for this task. However, there are many variants of the set intersection functionality which seem hard to compute with existing custom protocols and are easy to compute with generic MPC based solutions (for example comparing the cardinality of the intersection with a threshold or measuring ad conversion rates). Generic protocols work over circuits which compute the intersection. For sets of size n the best known circuit constructions compute O(n log n) comparisons. In this work we propose new circuit-based protocols for computing variants of the intersection, with circuits computing only O(n) comparisons. Our constructions are based on a new variant of Cuckoo hashing in two dimensions. We employ several optimizations and determine experimentally the required sizes of tables and circuits, and measure the runtime, showing that our protocol is more efficient in concrete terms than existing constructions. The proof technique is new and can be generalized to analyzing simple Cuckoo hashing as well as new variants.
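For readers unfamiliar with the building block: classic (one-dimensional) Cuckoo hashing resolves collisions by eviction. A minimal sketch follows (the hash functions and table size are invented; the protocol above relies on a new two-dimensional variant that this sketch does not attempt to reproduce):

```python
import random

def cuckoo_insert(table, item, hashes, max_kicks=100):
    # each item has one candidate slot per hash function; on collision,
    # evict an occupant and re-insert it, up to max_kicks relocations
    for _ in range(max_kicks):
        for h in hashes:
            slot = h(item)
            if table[slot] is None:
                table[slot] = item
                return True
        slot = random.choice([h(item) for h in hashes])
        table[slot], item = item, table[slot]   # evict and carry on
    return False  # would trigger a full rehash in practice

size = 16
h1 = lambda x: (x * x + 3) % size       # illustrative hash functions
h2 = lambda x: (7 * x + 1) % size
random.seed(0)
table = [None] * size
ok = all(cuckoo_insert(table, i, [h1, h2]) for i in range(10))
```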

We study the ecological use of analogies in AI. Specifically, we address the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given representation function f, which accepts inputs in either domain, would remain unchanged. Other than f, the training data is unsupervised and consists of a set of samples from each domain, without any mapping between them. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-preserving component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

Joint work with Yaniv Taigman and Adam Polyak.

Joint work with Srikanth Srinivasan.

A determinantal point process governed by a locally trace class Hermitian contraction kernel on a measure space $E$ remains determinantal when conditioned on its configuration on an arbitrary measurable subset $B \subset E$. Moreover, the conditional kernel can be chosen canonically in a way that is "local" in a non-commutative sense, i.e. invariant under "restriction" to closed subspaces $L^2(B) \subset P \subset L^2(E)$.

Using the properties of the canonical conditional kernel we establish a conjecture of Lyons and Peres: if $K$ is a projection then almost surely all functions in its image can be recovered by sampling at the points of the process.

I will discuss the asymptotic behaviour (both on and off the diagonal) of the spectral function of a Schroedinger operator with a smooth bounded potential as the energy becomes large. I formulate the conjecture that the local density of states (i.e. the spectral function on the diagonal) admits a complete asymptotic expansion, and discuss the known results, mostly for almost-periodic potentials.

Let X be a regular scheme, projective and flat over Spec Z. We give a conjectural formula in terms of motivic cohomology, singular cohomology and de Rham cohomology for the special value of the zeta-function of X at any rational integer. We will explain how this reduces to the standard formula for the residue of the Dedekind zeta-function at s = 1.

A classical theorem by Anosov states that the slow motion of a slow-fast system where the fast subsystem is ergodic with respect to a smooth invariant measure can be approximated, in a well-defined sense, by the slow subsystem averaged over the fast variables. We address the question of what happens if the fast system is not ergodic. We discuss a theory which is developing in joint works with V. Gelfreich, T. Pereira, V. Rom-Kedar and K. Shah, and suggest that in the non-ergodic case the behavior of the slow variables is approximated by a random process, and not a single, deterministic averaged system. We also discuss the question of the relevance of ergodicity to the foundations of statistical mechanics.

Image processing algorithms often involve a data fidelity penalty, which encourages the solution to comply with the input data. Existing fidelity measures (including perceptual ones) are very sensitive to slight misalignments in the locations and shapes of objects. This is in sharp contrast to the human visual system, which is typically indifferent to such variations. In this work, we propose a new error measure, which is insensitive to small smooth deformations and is very simple to incorporate into existing algorithms. We demonstrate our approach in lossy image compression. As we show, optimal encoding under our criterion boils down to determining how to best deform the input image so as to make it "more compressible". Surprisingly, it turns out that very minor deformations (almost imperceptible in some cases) suffice to make a huge visual difference in methods like JPEG and JPEG2000. Thus, by slightly sacrificing geometric integrity, we gain a significant improvement in preservation of visual information.

We also show how our approach can be used to visualize image priors. This is done by determining how images should be deformed so as to best conform to any given image model. By doing so, we highlight the elementary geometric structures to which the prior resonates. Using this method, we reveal interesting behaviors of popular priors, which were not noticed in the past.

Finally, we illustrate how deforming images to possess desired properties can be used for image "idealization" and for detecting deviations from perfect regularity.

Joint work with Tamar Rott Shaham, Tali Dekel, Michal Irani, and Bill Freeman.

Given a quadratic form Q in d variables over the integers, e.g. Q(x,y,z) = xy - z^2, and a set of positive density E in Z^d, we investigate what kind of structure can be found in the set Q(E-E).

We will see that if d >= 3, and Q is indefinite, then the measure rigidity, due to Bourgain-Furman-Lindenstrauss-Mozes or Benoist-Quint, of the action of the group of the symmetries of Q implies that there exists k >=1 such that k^2*Q(Z^d) is a subset of Q(E-E).

We will give an alternative proof of the theorem for the case Q(x,y,z) = xy - z^2 that uses more classical equidistribution results of Vinogradov, and Weyl, as well as a more recent result by Frantzikinakis-Kra. The latter proof extends the theorem to other polynomials having a much smaller group of symmetries. Based on joint works with M. Bjorklund (Chalmers), and K. Bulinski (Sydney).

The algorithm referred to in the title builds on Luks's powerful group-theoretic divide-and-conquer method (1980) and addresses the bottleneck situation where Luks's method fails to "divide".

Luks will continue to "conquer" if an alternative method "divides"; we develop such a partitioning technique.

In the talk we shall outline the algorithm and explain in some detail its group theoretic core, the "Unaffected Stabilizers Lemma" and the "Local Certificates" routine. The Lemma is used to construct, somewhat implausibly, global automorphisms out of local information -- a key step toward the construction of combinatorial structures to which the partitioning method from the previous day's lecture will be applied, providing the required "divide" step.

Symmetry is defined in terms of structure-preserving transformations (automorphisms); regularity in terms of numerical invariants. Symmetry always implies regularity but there are many highly regular combinatorial objects (such as "strongly regular graphs") with no symmetry. The opposite of irregularity is regularity, not symmetry. Yet we show that in a well-defined sense, *the opposite of hidden irregularity is hidden symmetry*, and in fact hidden symmetry of a particularly robust kind.

The symmetry of a circle is easily destroyed: just "individualize" two non-opposite points -- color one of them red, the other blue -- and all the symmetry is gone. In fact, the resulting structure is completely irregular: every point is uniquely identified by a pair of numerical invariants, namely, its pair of distances to the two individualized points. We shall say that the circle has a high degree of hidden irregularity.

In contrast, *Johnson graphs* are objects with robust symmetry: individualizing a small number of vertices of a Johnson graph hardly makes a dent in its symmetry.

Recent work on the algorithmic problem of Graph Isomorphism has revealed that Johnson graphs are unique in this regard: Every finite relational structure of small arity either has a measurable (say 10%) hidden irregularity (revealed by individualizing a polylogarithmic number of elements) or has a large degree of hidden symmetry, manifested in a canonically embedded Johnson graph on more than 90% of the underlying set.

This dichotomy is the key Divide-and-Conquer tool in recent progress on the worst-case complexity of the Graph Isomorphism problem.

This subject is purely combinatorial and does not require advanced mathematical apparatus. The group theoretic aspects of the new Graph Isomorphism test will be discussed in a follow-up seminar on January 30.

Within the wide field of sparse approximation, convolutional sparse coding (CSC) has gained increasing attention in recent years. This model assumes a structured dictionary built as a union of banded circulant matrices. Most attention has been devoted to the practical side of CSC, proposing efficient algorithms for the pursuit problem, and identifying applications that benefit from this model. Interestingly, a systematic theoretical understanding of CSC seems to have been left aside, with the assumption that the existing classical results are sufficient.

In this talk we start by presenting a novel analysis of the CSC model and its associated pursuit. Our study is based on the observation that while being global, this model can be characterized and analyzed locally. We show that uniqueness of the representation, its stability with respect to noise, and successful greedy or convex recovery are all guaranteed assuming that the underlying representation is locally sparse. These new results are much stronger and more informative than those obtained by deploying the classical sparse theory.

Armed with these new insights, we proceed by proposing a multi-layer extension of this model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers. This, in turn, is shown to be tightly connected to Convolutional Neural Networks (CNN), so much so that the forward-pass of the CNN is in fact the Thresholding pursuit serving the ML-CSC model. This connection brings a fresh view to CNN, as we are able to attribute to this architecture theoretical claims such as uniqueness of the representations throughout the network, and their stable estimation, all guaranteed under simple local sparsity conditions. Lastly, identifying the weaknesses in the above scheme, we propose an alternative to the forward-pass algorithm, which is both tightly connected to deconvolutional and recurrent neural networks, and has better theoretical guarantees.

Many problems about finite groups (e.g., convergence of random walks, properties of word maps, spectrum of Cayley graphs, etc.) can be approached in terms of sums of group characters. More precisely, what intervenes in such sums are the character ratios:

X_r(g) / dim(r), g in G,

where r is an irreducible representation of G, and X_r is its character. This leads to the quest for good estimates on the character ratios.

In this talk I will introduce a precise notion of "**size**" for representations of finite classical groups and show that it tends to put together those with character ratios of the same order of magnitude.

As an application I will show how one might generalize to classical groups the following result of Diaconis-Shahshahani (for k=2) and Berestycki-Schramm-Zeitouni (for general k): the mixing time for the random walk on the group G=S_n using the cycles of length k is (1/k) n log(n).

The talk should be accessible to beginning graduate students, and is part of a joint project with **Roger Howe (Yale and Texas A&M)**.

Detecting genetic adaptation in recent human history is a major challenge of population genetics. The fundamental problem is to infer historical changes in the frequency of genetic variants (alleles) from data of contemporary genomes. With this we can identify unusual changes that are unlikely to have occurred in the absence of selective pressures. However, a generally applicable method to infer recent allele frequency changes is lacking. Instead, present methods can only detect frequency changes under very restrictive assumptions on the model of selection. Moreover, their time resolution is generally limited to prehistoric scales, on the order of the past 25-75 thousand years. To address these gaps we developed a novel statistical method, Singleton Density Score (SDS), that infers recent changes in allele frequencies from local variation in contemporary genome sequences, with specificity to historical timescales. Applied to data of ~3000 genomes from the UK10K project, SDS reveals that human genetic adaptation continued well into historical times. Over the past ~2000-3000 years, ancestors of modern Britons genetically evolved over a range of phenotypes related to diet, immunity, and physical appearance. Notably, we found that polygenic adaptation, whereby selection acts on many small-effect variants across the genome that together determine a single trait, has played a pervasive, previously undetected role in shaping present genotypic and phenotypic variation.

Reference:

Field et al, Science 2016, Detection of human adaptation during the past 2000 years. https://www.ncbi.nlm.nih.gov/pubmed/27738015

I will describe two branches of my work related to algorithms for distributed networks. The main focus will be on Fault-Tolerant (FT) Network Structures.

The undisrupted operation of structures and services is a crucial requirement in modern day communication networks. As the vertices and edges of the network may occasionally fail or malfunction, it is desirable to make those structures robust against failures.

FT Network Structures are low cost highly resilient structures, constructed on top of a given network, that satisfy certain desirable performance requirements concerning, e.g., connectivity, distance or capacity. We will overview some results on fault tolerant graph structures with a special focus on FT Breadth-First-Search.

The second part of the talk will discuss distributed models and algorithms for large-scale networks. Towards the end, we will see some connections between distributed computing and other areas such as EE and Biology.

Let M be a smooth, compact, connected, two-dimensional Riemannian manifold without boundary, and let C_epsilon be the amount of time needed for Brownian motion to come within (Riemannian) distance epsilon of all points in M. The first-order asymptotics of C_epsilon as epsilon goes to 0 are known. We show that for the two-dimensional sphere

$\sqrt{C_\epsilon} - 2\sqrt{2}\left( \log \epsilon^{-1} - \frac{1}{4}\log\log \epsilon^{-1} \right)$ is tight.

Joint work with David Belius and Ofer Zeitouni.

**First Speaker: Ran Tessler (ETH)**

Time: 11:00

Title: A sharp threshold for Hamiltonian spheres in a random 2-complex.

Abstract: We define the notion of a Hamiltonian sphere: a 2-complex homeomorphic to a sphere which uses all vertices. We prove an explicit sharp threshold for the appearance of Hamiltonian spheres in the Linial-Meshulam model for random 2-complexes. The proof combines combinatorial, probabilistic and geometric arguments. Based on joint work with Zur Luria.

**Second Speaker: Assaf Naor (Princeton)**

Time: 12:00

Title: A new vertical-versus-horizontal isoperimetric inequality on the Heisenberg group, with applications to metric geometry and approximation algorithms

Abstract: In this talk we will show that for every measurable subset of the Heisenberg group of dimension at least 5, an appropriately defined notion of its "vertical perimeter" is at most a constant multiple of its horizontal (Heisenberg) perimeter. We will explain how this new isoperimetric-type inequality solves open questions in analysis (an endpoint estimate for a certain singular integral on W^{1,1}), metric geometry (sharp nonembeddability into L_1) and approximation algorithms (asymptotic evaluation of the performance of the Goemans-Linial algorithm for the Sparsest Cut problem). Joint work with Robert Young.

Let X be a set definable in some o-minimal structure. The Pila-Wilkie theorem (in its basic form) states that the number of rational points in the transcendental part of X grows sub-polynomially with the height of the points. The Wilkie conjecture stipulates that for sets definable in $R_\exp$, one can sharpen this asymptotic to polylogarithmic.

I will describe a complex-analytic approach to the proof of the Pila-Wilkie theorem for subanalytic sets. I will then discuss how this approach leads to a proof of the "restricted Wilkie conjecture", where we replace $R_\exp$ by the structure generated by the restrictions of $\exp$ and $\sin$ to the unit interval (both parts are joint work with Dmitry Novikov). If time permits I will discuss possible generalizations and applications.

Pseudo-deterministic algorithms are randomized search algorithms that on different executions on the same input, output the same solution with high probability.

We will discuss how pseudo-deterministic algorithms bridge the gap between randomized search and decision problems for problems in P and in NC. Next, we will show a pseudo-deterministic NC algorithm for bipartite matching. Finally, we will show how pseudo-determinism can be used to save on random bits used by classical randomized algorithms, and apply the method to obtain an algorithm for RNC depth first search using only O(log^2 n) random bits. This is joint work with Shafi Goldwasser.

Consider the Gaussian Entire Function (GEF) whose Taylor coefficients are independent complex-valued Gaussian variables, and the variance of the k-th coefficient is 1/k!. This random Taylor series is distinguished by the invariance of its zero set with respect to the isometries of the complex plane.

I will show that the law of the zero set, conditioned on the GEF having no zeros in a disk of radius r, and properly normalized, converges to an explicit limiting Radon measure in the plane, as r goes to infinity. A remarkable feature of this limiting measure is the existence of a large 'forbidden region' between a singular part supported on the boundary of the (scaled) hole and the equilibrium measure far from the hole. This answers a question posed by Nazarov and Sodin, and is in stark contrast to the corresponding result known to hold in the random matrix setting, where such a gap does not appear.

The talk is based on a joint work with S. Ghosh.

The Operator Scaling problem asks whether a set of complex matrices can be jointly moved to a certain canonical (isotropic) position. This problem has a remarkable number of incarnations: non-commutative algebra, invariant theory, arithmetic complexity, quantum information theory, analytic inequalities and more. We will describe an efficient algorithm solving all these related problems, and explain how their analysis combines ideas from all these areas.

Through these connections, the algorithm can be shown to solve some non-convex optimization problems, some systems of quadratic equations, and some linear programs with exponentially many inequalities - all these, and concrete examples we will give, suggest that it might be a powerful algorithmic tool via reductions to these problems.

No special background will be assumed!

Based on two joint works with Ankit Garg, Leonid Gurvits and Rafael Oliveira.

This talk is longer than usual and has a two-hour slot.

If f, g are two polynomials in C[x,y] such that J(f,g)=1, but C[f,g] does not coincide with C[x,y], then the mapping given by these polynomials ((x,y) maps to (f(x,y), g(x,y))) has a rather unexpected property which will be discussed in the talk.

The proliferation of data collection in the health, commercial, and economic spheres, brings with it opportunities for extracting new knowledge with concrete policy implications. Examples include individualizing medical practices based on electronic healthcare records, and understanding the implications of job training programs on employment and income.

The scientific challenge lies in the fact that standard prediction models such as supervised machine learning are often not enough for decision making from this so-called "observational data": Supervised learning does not take into account causality, nor does it account for the feedback loops that arise when predictions are turned into actions. On the other hand, existing causal-inference methods are not adapted to dealing with the rich and complex data now available, and often focus on populations, as opposed to individual-level effects.

The problem is most closely related to reinforcement learning and bandit problems in machine learning, but with the important property of having no control over experiments and no direct access to the actor's model.

In my talk I will discuss how we apply recent ideas from machine learning to individual-level causal-inference and action. I will introduce a novel generalization bound for estimating individual-level treatment effect, and further show how we use representation learning and deep temporal generative models to create new algorithms geared towards this problem. Finally, I will show experimental results using data from electronic medical records and data from a job training program.

The sensitivity conjecture is a famous open problem in the theory of Boolean functions. Let f be a Boolean function defined on the hypercube. The sensitivity of a node x is the number of its neighbours in the hypercube on which f gives the opposite value to the one it gives on x. The sensitivity conjecture speculates that if all nodes have low sensitivity, then the function f must be simple; concretely, all its Fourier mass is supported on levels with low Hamming weight.
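As a concrete illustration (mine, not from the talk), here is a minimal Python sketch of the definition just given: the sensitivity of a node counts the Hamming neighbours on which f flips its value, and the sensitivity of f is the maximum over all nodes. The OR example is an illustrative choice.

```python
from itertools import product

def sensitivity(f, x):
    """Number of Hamming neighbours y of x with f(y) != f(x)."""
    n = len(x)
    return sum(f(x[:i] + (1 - x[i],) + x[i + 1:]) != f(x) for i in range(n))

def max_sensitivity(f, n):
    """Sensitivity of f: the maximum node sensitivity over the n-cube."""
    return max(sensitivity(f, x) for x in product((0, 1), repeat=n))

# OR on 3 bits: at the all-zeros node, flipping any bit changes the output
OR = lambda x: int(any(x))
print(max_sensitivity(OR, 3))  # 3
```

At a node with a single 1, only flipping that 1 changes OR's value, so its sensitivity is 1; the maximum 3 is attained at the all-zeros node.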

Recently, Gopalan et al. [CCC 2016] conjectured a robust analogue of the sensitivity conjecture: if most of the nodes have low sensitivity, then most of the Fourier mass is supported on levels with low Hamming weight. They also proved it under the stronger assumption that all nodes have low sensitivity. In this work, we prove this conjecture, with near-tight quantitative bounds.

Joint work with Avishay Tal (IAS) and Jiapeng Zhang (UCSD).

Given a dynamical system, a partition of the space induces a mapping to the space of sequences of partition elements (a point is mapped to the sequence of partition elements containing its orbit terms). Such a duality is called symbolic dynamics. Markov partitions are an important tool, as the symbolic dynamics they induce captures many of the important dynamical properties of the original system and makes them easier to study.

We show that general non-uniformly hyperbolic C^{1+epsilon} diffeomorphisms on compact manifolds of any dimension admit countable Markov partitions. Previously this was known only in dimension 2.

We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is \emph{never fully revealed} to the learner. We show a large gap between the adversarial and the stochastic cases. In the adversarial case, we prove that even for dense feedback graphs, the learner cannot improve upon a trivial regret bound obtained by ignoring any additional feedback besides her own loss. In contrast, in the stochastic case we give an algorithm that achieves $\widetilde \Theta(\sqrt{\alpha T})$ regret over $T$ rounds, provided that the independence numbers of the hidden feedback graphs are at most $\alpha$. We also extend our results to a more general feedback model, in which the learner does not necessarily observe her own loss, and show that, even in simple cases, concealing the feedback graphs might render the problem completely unlearnable.

Cauchy-Riemann maps (CR-maps for short) occur in complex analysis as boundary values of maps holomorphic in a domain in complex space. As a rule, CR-maps of real-analytic hypersurfaces turn out to be analytic as well. However, in recent joint work with Rasul Shafikov we showed the existence of a Stokes phenomenon in CR-geometry: there exist real-analytic hypersurfaces which are equivalent formally, but not holomorphically.

Despite this, it turns out that in complex dimension 2, CR-maps necessarily possess appropriate weaker regularity properties. Namely, the components of such maps belong to the well-known Gevrey classes. The latter statement has the following remarkable application: if two real-analytic hypersurfaces in complex two-space are equivalent formally, then they are also equivalent smoothly.

The proof of all these facts employs the recent multi-summability theory in dynamical systems. It also employs the recent CR-DS technique that we developed, which connects CR-manifolds to certain dynamical systems. In this talk, I will outline the technique, as well as some details of the proof.

Liouville's rigidity theorem (1850) states that a map $f:\Omega\subset R^d \to R^d$ that satisfies $Df \in SO(d)$ is an affine map. Reshetnyak (1967) generalized this result and showed that if a sequence $f_n$ satisfies $Df_n \to SO(d)$ in $L^p$, then $f_n$ converges to an affine map.

In this talk I will discuss generalizations of these theorems to mappings between manifolds, present some open questions, and describe how these rigidity questions arise in the theory of elasticity of pre-stressed materials (non-Euclidean elasticity).

If time permits, I will sketch the main ideas of the proof, using Young measures and harmonic analysis techniques, adapted to Riemannian settings.

Based on a joint work with Asaf Shachar and Raz Kupferman.

We show an efficient method for converting a logic circuit of gates with fan-out 1 into an equivalent circuit that works even if some fraction of its gates are short-circuited, i.e., their output is short-circuited to one of their inputs. Our conversion can be applied to any circuit with fan-in k>= 2, yielding a resilient circuit whose size is polynomial in the size of the (non-resilient) input circuit.

The resilient circuit gives the correct output as long as less than 1/3 of the gates in any of its input-to-output paths are corrupted. Furthermore, we prove that a resilience level of 1/3 is optimal (maximal) for this type of faulty gates. This fully answers an open question by Kalai et al. (FOCS 2012).

Joint work with Mark Braverman and Michael Yitayew.

Everyone wants to program with "high-level concepts", rather than meddle with the fine details of the implementation, such as pointers, network packets, and asynchronous callbacks. This is usually achieved by introducing layers of abstraction - but every layer incurs some overhead, and when they accumulate, this overhead becomes significant and sometimes prohibitive. Optimizing the program often requires breaking abstractions, which leads to suboptimal design, higher maintenance costs, and subtle hard-to-trace bugs.

I will present two recent projects that attempt to address this difficulty. STNG is an automated lifting compiler that can synthesize high-level graphics code for computing stencils over matrices, from low-level legacy code written in Fortran. Its output is expressed in Halide, a domain-specific language for image processing that can take advantage of modern GPUs. The result is therefore code that is both easier to understand and more efficient than the original code.

Bellmania is a specialized tool for accelerating dynamic-programming algorithms by generating recursive divide-and-conquer implementations of them. Recursive divide-and-conquer is an algorithmic technique that was developed to obtain better memory locality properties. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and-conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based back-end verifies each step and takes care of low-level reasoning required for parallelism.

The lesson is that synthesis techniques are becoming mature enough to play a role in the design and implementation of realistic software systems, by combining the elegance of abstractions with the performance gained by optimizing and tuning the fine details.

We discuss recent progress on hardness of 2-to-2 games, with applications to the inapproximability of the Vertex-Cover problem.

A 2-to-2 game (a variant of Khot's well-known unique games) is defined by a graph with a variable at each node and a constraint of a specific structure on each edge. While in unique games each edge-constraint must be a one-to-one correspondence -- i.e. for each assignment to one node there is exactly one assignment to the other node that satisfies the constraint -- in 2-to-2 games the correspondence has a "two-to-two" structure.
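To make the constraint structure concrete, here is a small Python check (an illustration of mine, not part of the talk) that a relation on a 4-letter alphabet is two-to-two: every value on one side is compatible with exactly two values on the other.

```python
def is_two_to_two(relation, A, B):
    """A relation R (a subset of A x B) is 2-to-2 if every a in A is
    compatible with exactly two b's in B, and every b in B with exactly
    two a's in A."""
    return (all(sum((a, b) in relation for b in B) == 2 for a in A) and
            all(sum((a, b) in relation for a in A) == 2 for b in B))

# blocks {0,1} and {2,3} on each side are matched as a whole
R = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)}
print(is_two_to_two(R, range(4), range(4)))  # True
```

A one-to-one (unique-games style) constraint, such as {(a, a) : a in A}, fails this check, since each value has a single compatible partner.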

The goal is to distinguish between instances in which almost all of the edge-constraints can be satisfied, and instances in which almost none of them can be satisfied simultaneously.

We present a new combinatorial hypothesis regarding Grassmann graphs, and show that it implies that 2-to-2 games are NP-hard *in a certain sense*. As a consequence, the hypothesis implies that it is NP-hard to distinguish between graphs that have an independent set of fractional size (1 - 1/\sqrt{2}), and graphs with no independent sets of any constant fractional size. This easily implies that it is NP-hard to approximate the Vertex Cover problem within a factor \sqrt{2} - o(1).

The talk is mostly based on joint work with Subhash Khot and Muli Safra; we will also mention results from a more recent extension, which is joint work with Irit Dinur, Subhash Khot, Guy Kindler and Muli Safra.

Selfadjoint extensions of a closed symmetric operator A in a Hilbert space with equal deficiency indices were described in the 1930s by J. von Neumann. Another approach, based on the notion of an abstract boundary triple, originates in the work of J. W. Calkin and was developed by M. I. Visik, G. Grubb, F. S. Rofe-Beketov, M. L. Gorbachuk, A. N. Kochubei and others.

By Calkin's approach, all selfadjoint extensions of the symmetric operator A can be parametrized via "multivalued" selfadjoint operators in an auxiliary Hilbert space. Spectral properties of these extensions can be characterized in terms of the abstract Weyl function, associated to the boundary triple. In the present talk some recent developments in the theory of boundary triples will be presented. Applications to boundary value problems for Laplacian operators in bounded domains with smooth and rough boundaries will be discussed.

We characterize the communication complexity of truthful mechanisms. Our departure point is the well known taxation principle. The taxation principle asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu that consists of a price for each bundle (the prices depend only on the valuations of the other players). Each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player.
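The taxation principle described above can be sketched in a few lines of Python (a hypothetical toy of mine, with bundles as frozensets and an illustrative menu and valuation):

```python
def allocate(menu, valuation):
    """Taxation principle: the player faces a menu of bundle prices (set
    independently of her own report) and receives a bundle maximizing her
    profit, valuation(bundle) - price."""
    return max(menu, key=lambda bundle: valuation(bundle) - menu[bundle])

# toy menu over two items 'a', 'b', and an illustrative valuation
menu = {frozenset(): 0, frozenset('a'): 2, frozenset('b'): 3, frozenset('ab'): 4}
value = {frozenset(): 0, frozenset('a'): 5, frozenset('b'): 1, frozenset('ab'): 6}
print(allocate(menu, value.get))  # frozenset({'a'}): profit 5 - 2 = 3
```

Since the menu does not depend on the player's own report, reporting her true valuation is optimal, which is exactly why every truthful mechanism admits such a menu interpretation.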

Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in "pathological" cases and provide a formal description of these extreme cases.

Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.

In the classical Node-Disjoint Paths (NDP) problem, the input consists of an undirected n-vertex graph G, and a collection M of pairs of its vertices, called source-destination, or demand, pairs. The goal is to route the largest possible number of the demand pairs via node-disjoint paths. The best current approximation for the problem is achieved by a simple greedy algorithm, whose approximation factor is O(\sqrt n), while the best current negative result is a roughly \Omega(log^{1/2}n)-hardness of approximation. Even seemingly simple special cases of the problem are still poorly understood: when the input graph is a grid, the best current algorithm achieves a \tilde{O}(n^{1/4})-approximation, and when it is a general planar graph, the best current approximation ratio of an efficient algorithm is \tilde{O}(n^{9/19}). The best currently known lower bound for both these versions of the problem is APX-hardness.

In this talk we will show that NDP is 2^{\Omega(\sqrt{\log n})}-hard to approximate, unless all problems in NP have algorithms with running time n^{O(\log n)}. Our result holds even when the underlying graph is a planar graph with maximum vertex degree 3, and all source vertices lie on the boundary of a single face. We extend this result to the closely related Edge-Disjoint Paths problem, showing the same hardness of approximation ratio even for sub-cubic planar graphs with all sources lying on the boundary of a single face.

This is joint work with David H.K. Kim and Rachit Nimavat.

One of the most common tasks in cryptography and cryptanalysis is to find some interesting event (a needle) in an exponentially large collection (haystack) of N=2^n possible events, or to demonstrate that no such event is likely to exist. In particular, we are interested in finding needles which are defined as events that happen with an unusually high probability of p>>1/N in a haystack which is an almost uniform distribution on N possible events. When the search algorithm can only sample values from this distribution, the best known time/memory tradeoff for finding such an event requires O(1/(Mp^2)) time given O(M) memory.

In this talk I will describe much faster needle searching algorithms in the common cryptographic setting in which the distribution is defined by applying some deterministic function f to random inputs. Such a distribution can be modeled by a random directed graph with N vertices in which almost all the vertices have O(1) predecessors while the vertex we are looking for has an unusually large number of O(pN) predecessors. When we are given only a constant amount of memory, we propose a new search methodology which we call **NestedRho**. As p increases, such random graphs undergo several subtle phase transitions, and thus the log-log dependence of the time complexity T on p becomes a piecewise linear curve which bends four times. Our new algorithm is faster than the O(1/p^2) time complexity of the best previous algorithm in the full range of 1/N < p < 1, and in particular it improves the previous time complexity by a significant factor of \sqrt{N} for any p in the range N^{-0.75} < p < N^{-0.5}. When we are given more memory, we show how to combine the **NestedRho** technique with the parallel collision search technique in order to further reduce its time complexity. Finally, we show how to apply our new search technique to more complicated distributions with multiple peaks when we want to find all the peaks whose probabilities are higher than p.

Joint work with Itai Dinur, Orr Dunkelman and Nathan Keller.

Projected gradient descent (PGD), and its close variants, are often considered the methods of choice for solving a large variety of machine learning optimization problems, including empirical risk minimization, statistical learning, and online convex optimization. This is not surprising, since PGD is often optimal in a very appealing information-theoretic sense. However, for many problems PGD is infeasible both in theory and practice since each step requires to compute an orthogonal projection onto the feasible set. In many important cases, such as when the feasible set is a non-trivial polytope, or a convex surrogate for a low-rank structure, computing the projection is computationally inefficient in high-dimensional settings. An alternative is the conditional gradient method (CG), aka Frank-Wolfe algorithm, that replaces the expensive projection step with a linear optimization step over the feasible set. Indeed in many problems of interest, the linear optimization step admits much more efficient algorithms than the projection step, which is the reason for the substantial renewed interest in this method in the past decade. On the downside, the convergence rates of the CG method often fall behind that of PGD and its variants.
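To illustrate the projection-free idea (a minimal sketch of mine, not the talk's algorithms): over the probability simplex, the linear subproblem in each conditional-gradient step is solved by a single vertex, the coordinate with the smallest gradient entry, whereas PGD would need a Euclidean projection onto the simplex at every step. The least-squares objective and all parameters below are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, steps=2000):
    """Minimize a smooth convex function over the probability simplex.
    The linear subproblem min_{s in simplex} <grad, s> is solved by a
    vertex: put all mass on the coordinate with the smallest gradient."""
    x = x0.copy()
    for t in range(steps):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0      # linear optimization oracle (no projection)
        gamma = 2.0 / (t + 2.0)    # standard diminishing step size
        x = (1 - gamma) * x + gamma * s
    return x

# illustrative objective f(x) = 0.5 * ||Ax - b||^2 over the simplex
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 5))
b = A @ np.array([0.6, 0.4, 0.0, 0.0, 0.0])   # optimum lies in the simplex
grad = lambda x: A.T @ (A @ x - b)
x = frank_wolfe_simplex(grad, np.full(5, 0.2))
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained for free; this is the cheap-iteration property the talk contrasts with PGD.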

In this talk I will survey an ongoing effort to design CG variants that on one hand enjoy the cheap iteration complexity of the original method, and on the other hand converge provably faster, and are applicable to a wider variety of machine learning settings. In particular I will focus on the cases in which the feasible set is either a polytope or a convex surrogate for low-rank matrices. Results will be demonstrated on applications including: LASSO, video co-localization, optical character recognition, matrix completion, and multi-class classification.

Suppose that you have n truly random bits X=(X1,...,Xn) and you wish to use them to generate m>>n pseudorandom bits Y=(Y1,..., Ym) using a local mapping, i.e., each Yi should depend on at most d=O(1) bits of X. In the polynomial regime of m=n^s, s>1, the only known solution, originating from (Goldreich, ECCC 2000), is based on Random Local Functions: compute Yi by applying some fixed (public) d-ary predicate P to a random (public) tuple of distinct inputs. In this talk, we will try to understand, for any value of s, how the pseudorandomness of the resulting sequence depends on the choice of the underlying predicate.

Based on joint work with Shachar Lovett.
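The construction described above can be sketched directly (a toy Python illustration of mine; the predicate and all parameters are hypothetical choices, and nothing here is claimed to be cryptographically secure):

```python
import random

def goldreich_prg(x, m, predicate, d, seed=0):
    """Random Local Functions sketch: each output bit applies a fixed
    d-ary predicate P to a random, fixed (public) tuple of d distinct
    input bits. The seed fixes the public choice of tuples."""
    rng = random.Random(seed)
    n = len(x)
    tuples = [rng.sample(range(n), d) for _ in range(m)]   # public index tuples
    return [predicate(tuple(x[i] for i in t)) for t in tuples]

# illustrative 5-ary predicate: z1 xor z2 xor z3 xor (z4 and z5)
P = lambda z: z[0] ^ z[1] ^ z[2] ^ (z[3] & z[4])
x = [random.randint(0, 1) for _ in range(64)]              # n truly random bits
y = goldreich_prg(x, m=256, predicate=P, d=5)              # m >> n output bits
```

Once the public tuples are fixed, the mapping is deterministic and each output bit reads only d = 5 input bits, matching the locality constraint in the abstract.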

One of the most exciting possibilities opened by deep neural networks is end-to-end learning: the ability to learn tasks without the need for feature engineering or breaking down into sub-tasks. This talk will present three cases illustrating how end-to-end learning can operate in machine perception across the senses (Hearing, Vision) and even for the entire perception-cognition-action cycle.

The talk begins with speech recognition, showing how acoustic models can be learned end-to-end. This approach skips the feature extraction pipeline, carefully designed for speech recognition over decades.

Proceeding to vision, a novel application is described: identification of photographers of wearable video cameras. Such video was previously considered anonymous as it does not show the photographer.

The talk concludes by presenting a new task, encompassing the full perception-cognition-action cycle: visual learning of arithmetic operations using only pictures of numbers. This is done without using or learning the notions of numbers, digits, and operators.

The talk is based on the following papers:

Speech Acoustic Modeling From Raw Multichannel Waveforms, Y. Hoshen, R.J. Weiss, and K.W. Wilson, ICASSP'15

An Egocentric Look at Video Photographer Identity, Y. Hoshen, S. Peleg, CVPR'16

Visual Learning of Arithmetic Operations, Y. Hoshen, S. Peleg, AAAI'16

If $X$ is a finite set, the $p$-biased measure on the power-set of $X$ is defined as follows: choose a subset $S$ of $X$ at random by including each element of $X$ independently with probability $p$. If $\mathcal{F}$ is a family of subsets of $X$, one can consider the {\em $p$-biased measure} of $\mathcal{F}$, denoted by $\mu_p(\mathcal{F})$, as a function of $p$; if $\mathcal{F}$ is closed under taking supersets, then this function is an increasing function of $p$. Seminal results of Friedgut and Friedgut-Kalai give criteria for this function to have a 'sharp threshold'. A careful analysis of the behaviour of this function also yields some rather strong results in extremal combinatorics which do not explicitly mention the $p$-biased measure - in particular, in the field of {\em Erd\H{o}s-Ko-Rado type problems}, which concern the sizes of families of objects in which any two (or three...) of the objects 'agree' or 'intersect' in some way. We will discuss some of these, including a recent proof of an old conjecture of Frankl that a symmetric three-wise intersecting family of subsets of $\{1,2,\ldots,n\}$ has size $o(2^n)$, and some 'stability' results characterizing the structure of 'large' $t$-intersecting families of $k$-element subsets of $\{1,2,\ldots,n\}$. Based on joint work with (subsets of) Nathan Keller, Noam Lifshitz and Bhargav Narayanan.
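The p-biased measure is easy to estimate by direct sampling, which gives a quick way to see its monotonicity in p for an upward-closed family. The following sketch is purely illustrative (a toy family, not one of the extremal families in the talk):

```python
import random

def mu_p(family_indicator, n, p, trials=20000, seed=0):
    """Monte Carlo estimate of the p-biased measure of a family of
    subsets of {0,...,n-1}: include each element independently with
    probability p and count how often the sampled set lies in the family."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = frozenset(i for i in range(n) if rng.random() < p)
        if family_indicator(s):
            hits += 1
    return hits / trials

# monotone (upward-closed) toy family: all sets containing element 0.
# Its p-biased measure is exactly p, so the estimate should increase in p.
n = 10
f = lambda s: 0 in s
est_low = mu_p(f, n, 0.3)
est_high = mu_p(f, n, 0.7)
```

For this dictatorship family mu_p grows linearly in p with no sharp threshold; the Friedgut and Friedgut-Kalai criteria mentioned above characterize when the growth is instead concentrated in a narrow window of p.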

Interactive proofs have had a dramatic impact on Complexity Theory and Cryptography. The celebrated IP=PSPACE Theorem [LFKN92,Shamir92] allows an all-powerful but untrusted prover to convince a polynomial-time verifier of the validity of extremely complicated statements (as long as they can be evaluated using polynomial space). The interactive proof system designed for this purpose requires a polynomial number of communication rounds. This talk will focus on studying the power of more efficient interactive proof systems.

Our main result is that for every statement that can be evaluated in polynomial time and bounded-polynomial space, there exists an interactive proof that satisfies the following strict efficiency requirements:

(1) The honest prover runs in polynomial time.

(2) The verifier runs in almost-linear time (and, under some conditions, even sublinear time).

(3) The interaction consists of only a constant number of communication rounds.

To obtain this result, we introduce and study several new notions for interactive proofs, which may be of independent interest. One of these notions is that of unambiguous interactive proofs, where the prover has a unique successful strategy. Another notion is that of probabilistically checkable interactive proofs (PCIPs) where the verifier only reads a few bits of the transcript in checking the proof (this could be viewed as an interactive extension of PCPs).

Joint work with Omer Reingold and Ron Rothblum.

Computer science and optics are usually studied separately -- separate people, in separate departments, meet at separate conferences. This is changing. The exciting promise of technologies like virtual reality and self-driving cars demands solutions that draw from the best aspects of computer vision, computer graphics, and optics. Previously, it has proved difficult to bridge these communities. For instance, the laboratory setups in optics are often designed to image millimeter-size scenes in a vibration-free darkroom.

This talk is centered around time of flight imaging, a growing area of research in computational photography. A time of flight camera works by emitting amplitude modulated (AM) light and performing correlations on the reflected light. The frequency of AM is in the radio frequency range (like a Doppler radar system), but the carrier signal is optical, overcoming diffraction limited challenges of full RF systems while providing optical contrast. The obvious use of such cameras is to acquire 3D geometry. By spatially, temporally and spectrally coding light transport we show that it may be possible to go "beyond depth", demonstrating new forms of imaging like photography through scattering media, fast relighting of photographs, real-time tracking of occluded objects in the scene (like an object around a corner), and even the potential to distinguish between biological molecules using fluorescence. We discuss the broader impact of this design paradigm on the future of 3D depth sensors, interferometers, computational photography, medical imaging and many other applications.
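The basic phase-to-depth computation of an AM time-of-flight camera can be sketched with the textbook "four-bucket" method: four correlation samples against phase-shifted references recover the phase of the returning modulation envelope, and depth follows from the phase delay. This illustrates only the correlation principle, not the coded light-transport techniques of the talk; the modulation frequency and signal model are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def depth_from_correlations(c0, c90, c180, c270, f_mod):
    """Classic four-bucket phase estimation: the offsets cancel in the
    differences, leaving 2A*sin(phi) and 2A*cos(phi), so atan2 recovers
    the phase phi.  Depth follows from the round-trip phase delay at
    modulation frequency f_mod (the factor 4*pi includes the round trip)."""
    phase = math.atan2(c90 - c270, c0 - c180) % (2 * math.pi)
    return C * phase / (4 * math.pi * f_mod)

def correlations_for_depth(depth, f_mod, amplitude=1.0, offset=2.0):
    """Simulate ideal correlation samples for a target at a given depth."""
    phase = 4 * math.pi * f_mod * depth / C
    return [offset + amplitude * math.cos(phase - shift)
            for shift in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]

f_mod = 30e6  # 30 MHz modulation, a typical RF-range choice
c0, c90, c180, c270 = correlations_for_depth(2.0, f_mod)
recovered = depth_from_correlations(c0, c90, c180, c270, f_mod)
```

Note the unambiguous range is C/(2*f_mod), about 5 m at 30 MHz, which is why such cameras trade modulation frequency against range.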

Cluster algebras are a class of commutative rings introduced by Fomin and Zelevinsky in 2000. I will give an introductory talk about cluster algebras. The main examples are the cluster algebra of type A2, the coordinate ring of $SL_4/N$, and the homogeneous coordinate ring of the Grassmannian $Gr_{2,n+3}(\mathbb{C})$.

This Thursday we will have two SIGGRAPH rehearsal talks in the Vision Seminar, one by Netalee Efrat and one by Meirav Galun. Abstracts are below. Each talk will be about 15 minutes (with NO interruptions), followed by 10 minutes feedback.

**Talk 1 (Netalee Efrat): Cinema 3D: Large scale automultiscopic display**

While 3D movies are gaining popularity, viewers in a 3D cinema still need to wear cumbersome glasses in order to enjoy them. Automultiscopic displays provide a better alternative to the display of 3D content, as they present multiple angular images of the same scene without the need for special eyewear. However, automultiscopic displays cannot be directly implemented in a wide cinema setting due to variants of two main problems: (i) The range of angles at which the screen is observed in a large cinema is usually very wide, and there is an unavoidable tradeoff between the range of angular images supported by the display and its spatial or angular resolutions. (ii) Parallax is usually observed only when a viewer is positioned at a limited range of distances from the screen. This work proposes a new display concept, which supports automultiscopic content in a wide cinema setting. It builds on the typical structure of cinemas, such as the fixed seat positions and the fact that different rows are located on a slope at different heights. Rather than attempting to display many angular images spanning the full range of viewing angles in a wide cinema, our design only displays the narrow angular range observed within the limited width of a single seat. The same narrow range content is then replicated to all rows and seats in the cinema. To achieve this, it uses an optical construction based on two sets of parallax barriers, or lenslets, placed in front of a standard screen. This paper derives the geometry of such a display, analyzes its limitations, and demonstrates a proof-of-concept prototype.

*Joint work with Piotr Didyk, Mike Foshey, Wojciech Matusik, and Anat Levin*

**Talk 2 (Meirav Galun): Accelerated Quadratic Proxy for Geometric Optimization**

We present the Accelerated Quadratic Proxy (AQP) - a simple first order algorithm for the optimization of geometric energies defined over triangular and tetrahedral meshes. The main pitfall encountered in the optimization of geometric energies is slow convergence. We observe that this slowness is in large part due to a Laplacian-like term existing in these energies. Consequently, we suggest to exploit the underlying structure of the energy and to locally use a quadratic polynomial proxy, whose Hessian is taken to be the Laplacian. This improves stability and convergence, but more importantly allows incorporating acceleration in an almost universal way that is independent of mesh size and of the specific energy considered. Experiments with AQP show it is rather insensitive to mesh resolution and requires a nearly constant number of iterations to converge; this is in strong contrast to other popular optimization techniques used today such as Accelerated Gradient Descent and Quasi-Newton methods, e.g., L-BFGS. We have tested AQP for mesh deformation in 2D and 3D as well as for surface parameterization, and found it to provide a considerable speedup over common baseline techniques.

*Joint work with Shahar Kovalsky and Yaron Lipman*
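The structure of an AQP iteration (a momentum step followed by a proxy step whose Hessian is a fixed Laplacian-like matrix) can be sketched on a toy two-vertex "mesh". The energy, momentum schedule, and regularization below are illustrative stand-ins, not the geometric energies of the paper; the key point the sketch preserves is that the proxy Hessian is fixed, so its inverse is computed once.

```python
import math

def aqp_minimize(grad_f, H_inv, x0, iters=300):
    """Accelerated-quadratic-proxy-style iteration: a Nesterov momentum
    step, then a step of the quadratic proxy whose fixed Hessian H
    majorizes the true Hessian.  Because H never changes, H_inv is
    computed (or factored) once, which keeps iterations cheap."""
    x, x_prev = list(x0), list(x0)
    for k in range(1, iters + 1):
        theta = (k - 1) / (k + 2)
        y = [xi + theta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad_f(y)
        step = [sum(H_inv[i][j] * g[j] for j in range(len(g)))
                for i in range(len(g))]
        x_prev, x = x, [yi - si for yi, si in zip(y, step)]
    return x

# toy "mesh": two vertices joined by one edge; Laplacian L = [[1,-1],[-1,1]].
# Energy: Dirichlet term plus a smooth data term whose Hessian is bounded
# by the identity, so H = L + I majorizes the true Hessian (the identity
# shift is needed because the Laplacian itself is singular).
t = [1.0, -1.0]
def grad_f(x):
    lx = [x[0] - x[1], x[1] - x[0]]
    return [lx[i] + (x[i] - t[i]) / math.sqrt(1 + (x[i] - t[i]) ** 2)
            for i in range(2)]

H_inv = [[2 / 3, 1 / 3], [1 / 3, 2 / 3]]   # inverse of H = [[2,-1],[-1,2]]
x_opt = aqp_minimize(grad_f, H_inv, [0.0, 0.0])
```

On real meshes the same two-line update is used, with H the cotangent or graph Laplacian prefactored by a sparse Cholesky solve.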

The Jacquet-Rallis relative trace formula was introduced as a tool towards solving the global conjectures of Gan-Gross-Prasad for unitary groups. I will present some recent progress in developing the full formula.

I will show how to extend the transfer of regular orbital integrals to singular geometric terms using a mix of local and global methods.

(Joint with Pierre-Henri Chaudouard)

Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message:

1. Fast Δ-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ≥55 is Θ(log_Δ log n), whereas its deterministic complexity is Θ(log_Δ n) for any Δ≥3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees.

2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in O(1)+o(log_Δ n) rounds can be transformed to run in O(log* n − log* Δ + 1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log_Δ n) time deterministically.

3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √ log n. This shows that a deterministic Ω(log_Δ n) lower bound for any problem implies a randomized Ω(log_Δ log n) lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.

Joint work with Tsvi Kopelowitz and Seth Pettie. http://arxiv.org/abs/1602.08166

There are numerous essentially equivalent characterizations of mixing in $L_1$ of a finite Markov chain. Some of these characterizations involve natural probabilistic concepts such as couplings, stopping times and hitting times. In contrast, while there are several analytic and geometric tools for bounding the $L_2$ mixing time, none of them are tight and they do not have a probabilistic interpretation.

We provide tight probabilistic characterizations in terms of hitting times distributions for mixing in $L_2$ (for normal chains) and (under reversibility) in relative entropy. This is done by assigning appropriate penalty (depending on the size of the set) to the case that the chain did not escape from a certain set.

We also prove a new extremal characterization of the log-Sobolev constant in terms of a weighted version of the spectral gap (where the weight depends on the size of the support of the function).
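As a concrete illustration of L2 mixing (the quantity being characterized, not the talk's hitting-time machinery), one can iterate a lazy random walk on a cycle and track the chi-square distance to stationarity; the chain and threshold below are illustrative choices.

```python
def l2_distance(dist, pi):
    """Chi-square (L2 with respect to pi) distance of dist from stationarity."""
    return sum((dist[i] - pi[i]) ** 2 / pi[i] for i in range(len(pi))) ** 0.5

def step_lazy_cycle(dist):
    """One step of the lazy simple random walk on a cycle: stay with
    probability 1/2, otherwise move to a uniform neighbor."""
    n = len(dist)
    return [0.5 * dist[i] + 0.25 * dist[(i - 1) % n] + 0.25 * dist[(i + 1) % n]
            for i in range(n)]

n = 16
pi = [1.0 / n] * n
dist = [0.0] * n
dist[0] = 1.0                          # worst case: start from a point mass
t = 0
while l2_distance(dist, pi) > 0.25:    # L2 mixing time to threshold 1/4
    dist = step_lazy_cycle(dist)
    t += 1
```

The decay rate here is governed by the spectral gap, which is of order 1/n^2 for the cycle; the talk's point is that such analytic bounds, unlike the TV-mixing toolbox, previously lacked a tight probabilistic counterpart.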

We will discuss representation theory of a symmetric pair (G,H), where G is a complex reductive group, and H is a real form of G. The main objects of study are the G-representations with a nontrivial H-invariant functional, called the H-distinguished representations of G.

I will give a necessary condition for a G-representation to be H-distinguished and show that the multiplicity of such representations is less than or equal to the number of double cosets B\G/H, where B is a Borel subgroup of G.

Quantum computers are hypothetical devices, based on quantum physics, which would enable us to perform certain computations (among them some that Chaim Leib Pekeris pioneered) hundreds of orders of magnitude faster than digital computers. This feature is termed “quantum supremacy.” We start the lecture with a gentle introduction to computing - classical and quantum, with basic notions of computational complexity, and with the vision of quantum computers.

A main reason for concern regarding the feasibility of quantum computers is that quantum systems are inherently noisy. We will explain what is "noise" and describe an optimistic hypothesis regarding quantum noise that will allow quantum computing and a pessimistic hypothesis that won’t. The remarkable progress witnessed during the past two decades in the field of experimental physics of controlled quantum systems places the decision between the pessimistic and optimistic hypotheses within reach. On the optimistic side, one aspect or another of quantum supremacy might be seen by experiments in the near future: by implementing quantum error-correction or by systems of free bosons or by exotic new phases of matter called anyons or by quantum annealing, or in various other ways.

In the lecture I will explain my pessimistic line of research and here is a brief summary of my view: understanding quantum computers in the presence of noise requires consideration of behavior at different scales. In the small scale, standard models of noise from the mid-90s are suitable, and quantum evolutions and states described by them manifest a very low-level computational power. This small-scale behavior has far-reaching consequences for the behavior of noisy quantum systems at larger scales. On the one hand, it does not allow reaching the starting points for quantum fault tolerance and quantum supremacy, making them both impossible at all scales. On the other hand, it leads to novel implicit ways for modeling noise at larger scales and to various predictions on the behavior of noisy quantum systems.

We will rely on the theory of noise-sensitivity and stability developed with Benjamini and Schramm in the late 90s and on recent work with Guy Kindler related to the mysterious gap between permanents and determinants (or, in other words, between bosons and fermions).

Let F/Q_p be a finite extension. Supersingular representations are the irreducible mod p representations of GL_n(F) which do not appear as a subquotient of a principal series representation; as in the complex case, they are the building blocks of the representation theory of GL_n(F). Historically, they were first discovered by L. Barthel and R. Livne some twenty years ago, and they are still not understood even for n=2.

For F=Q_p, the supersingular representations of GL_2(F) have been classified by C. Breuil, and a local mod p Langlands correspondence was established between them and certain mod p Galois representations.

When one tries to generalize this connection and move to a non-trivial extension of Q_p, Breuil's method fails: the supersingular representations in that case have a complicated structure, and instead of the two that occur for F=Q_p we get infinitely many such representations, while there are essentially only finitely many on the Galois side.

In this talk we give an exposition of the subject and explore, using what survives from Breuil's methods, the universal modules whose quotients contain all the supersingular representations in the difficult case where F is a non-trivial extension of Q_p.

In distributed systems, communication between the participants in the computation is usually the most expensive part of the computation. Theoretical models of distributed systems usually reflect this by neglecting the cost of local computation, and charging only for messages sent between the participants; in particular, we usually assume that the computation proceeds in rounds, and in each round, each participant can send only a limited number of bits. We are interested in characterizing the number of rounds required to perform various tasks.

In this talk I will describe two sets of results. The first concerns the complexity of distributed subgraph detection: we have n servers, each representing a node in an undirected graph, and each server receives as input its adjacent edges in the graph. The goal of the computation is to determine whether the global input graph contains some fixed subgraph. I will describe upper and lower bounds for several classes of subgraphs, through a connection to Turan numbers. The general case remains open.

In the second part of the talk I will describe recent work on multi- party number-in-hand communication and information complexity, and show a tight upper and lower bound for set disjointness in the shared blackboard model.

Joint work with Mark Braverman, Andrew Drucker and Fabian Kuhn.

Children may learn about the world by pushing, banging, and manipulating things, watching and listening as materials make their distinctive sounds: dirt makes a thud; ceramic makes a clink. These sounds reveal physical properties of the objects, as well as the force and motion of the physical interaction.

We've explored a toy version of that learning-through-interaction by recording audio and video while we hit many things with a drumstick. We developed an algorithm to predict sounds from silent videos of the drumstick interactions. The algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We demonstrate that the sounds generated by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that the task of predicting sounds allows our system to learn about material properties in the scene.

Joint work with:

Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson

http://arxiv.org/abs/1512.08512

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In a sense, the strongest guarantee available is the L2 guarantee, which requires finding all items that occur at least eps*||f||_2 times in the stream, where the i-th coordinate of the vector f is the number of occurrences of i in the stream. The first algorithm to achieve the L2 guarantee was the CountSketch (Charikar, Chen, and Farach-Colton ICALP'02), which, for constant eps, requires O(log n) words of memory and O(log n) update time. It is known to be space-optimal if the stream includes deletions.

In this talk I will discuss recent improvements that allow us to find L2 heavy hitters in O(1) memory and O(1) update time in insertion only streams. The improvements rely on a deeper understanding of the AMS sketch (Alon, Matias, and Szegedy STOC'96) and similar sketches and draw on the theory of Gaussian processes. This talk is based on joint work with Vladimir Braverman, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff in arxiv:1511.00661 and arxiv:1603.00759.
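A minimal pure-Python CountSketch illustrates the median-of-signed-counters estimator that the improved algorithms build on. It is deliberately simplified: Python's built-in `hash` with random salts stands in for proper pairwise-independent hash families, and the table sizes are illustrative.

```python
import random

class CountSketch:
    """CountSketch of Charikar-Chen-Farach-Colton: d rows of w counters.
    Each item is hashed to one counter per row and added with a random
    sign; the median of the d signed counters estimates the frequency."""

    def __init__(self, d=5, w=256, seed=0):
        rng = random.Random(seed)
        self.d, self.w = d, w
        self.table = [[0] * w for _ in range(d)]
        # salts standing in for pairwise-independent hash functions
        self.salts = [(rng.randrange(1 << 30), rng.randrange(1 << 30))
                      for _ in range(d)]

    def _bucket(self, row, item):
        a, b = self.salts[row]
        return hash((a, item)) % self.w, (1 if hash((b, item)) & 1 else -1)

    def update(self, item, count=1):
        for r in range(self.d):
            h, s = self._bucket(r, item)
            self.table[r][h] += s * count

    def _signed(self, r, item):
        h, s = self._bucket(r, item)
        return s * self.table[r][h]

    def estimate(self, item):
        vals = sorted(self._signed(r, item) for r in range(self.d))
        return vals[self.d // 2]   # median over rows

# one heavy hitter among many light items
cs = CountSketch()
for _ in range(1000):
    cs.update("heavy")
for i in range(500):
    cs.update(i)
est = cs.estimate("heavy")
```

The random signs make each counter an unbiased estimator and the median controls the variance; the O(1)-memory insertion-only results above replace this whole table with a constant-size state.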

There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for essentially similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which fits training data without error but is nevertheless somewhat smooth. We show that AdaBoost has the same property. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.
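A from-scratch AdaBoost with exhaustively searched decision stumps on a toy 1-D sample shows the interpolation behavior described above: the ensemble drives training error to zero with no regularization or early stopping. The data and round count are illustrative.

```python
import math

def train_adaboost(X, y, rounds=30):
    """Minimal AdaBoost with decision stumps on 1-D data.  Each round
    fits the stump minimizing weighted error, then exponentially
    reweights the sample so misclassified points get more mass."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    thresholds = sorted(set(X))
    for _ in range(rounds):
        best = None
        for thr in thresholds:
            for sign in (1, -1):
                preds = [sign if x > thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, thr, sign, preds)
        err, thr, sign, preds = best
        err = max(err, 1e-12)                       # guard against log(1/0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x > thr else -s) for a, thr, s in ensemble)
    return 1 if score >= 0 else -1

# a 1-D pattern no single stump can classify, which the ensemble interpolates
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [-1, -1, 1, 1, -1, -1]
model = train_adaboost(X, y)
train_acc = sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

After the fit is exact, further rounds keep enlarging the margins rather than overfitting, which is the self-averaging behavior the abstract attributes to both AdaBoost and random forests.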

For some rectangular Hardy classes of analytic functions, an optimal method of interpolation has been previously found, within the framework of Optimal Recovery. It will be shown that this method of interpolation, based on the Abel-Jacobi elliptic functions, is also optimal according to the corresponding criteria of Nonparametric Regression and Optimal Design.

In a non-asymptotic setting, the maximal mean squared error of the optimal interpolant is evaluated explicitly, for all noise levels away from 0. In these results, a pivotal role is played by an interference effect, in which both the stochastic and deterministic parts of the interpolant exhibit an oscillating behavior, with the two oscillating processes mutually subduing each other.

In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, we propose the methodology of sampling correctors. Such algorithms use structure that the distribution is purported to have, in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms may then be used as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks.

Warning: This talk contains more questions than answers...

Joint work with Clement Canonne and Themis Gouleakis.

We present a randomized algorithm that computes a Minimum Spanning Tree (MST) in O(log^* n) rounds, with high probability, in the Congested Clique model of distributed computing. In this model, the input is a graph on n nodes, initially each node knows only its incident edges, and per round each two nodes can exchange O(log n) bits.

Our key technical novelty is an O(log^* n) Graph Connectivity algorithm, the heart of which is a (recursive) forest growth method, based on a combination of two ideas: a sparsity-sensitive sketching aimed at sparse graphs and a random edge sampling aimed at dense graphs.

Our result improves significantly over the $O(\log \log \log n)$ algorithm of Hegeman et al. [PODC 2015] and the $O(\log \log n)$ algorithm of Lotker et al. [SPAA 2003; SICOMP 2005].

Joint work with Mohsen Ghaffari.
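The forest-growth skeleton that the Congested Clique algorithm accelerates is classical Boruvka: in each phase every component picks its lightest incident edge and components merge, so O(log n) phases suffice. A sequential sketch (assuming distinct edge weights, which guarantees the picked edges form a forest; the sketching and sampling tricks of the paper are not reflected here):

```python
def boruvka_mst(n, edges):
    """Sequential Boruvka.  edges is a list of (weight, u, v) with
    distinct weights.  Each phase: every component selects its lightest
    outgoing edge, then all selected edges are contracted at once."""
    parent = list(range(n))

    def find(v):                       # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst, components = [], n
    while components > 1:
        cheapest = {}                  # component root -> lightest incident edge
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                for r in (ru, rv):
                    if r not in cheapest or w < cheapest[r][0]:
                        cheapest[r] = (w, u, v)
        progressed = False
        for w, u, v in set(cheapest.values()):
            ru, rv = find(u), find(v)
            if ru != rv:               # re-check: earlier merges may apply
                parent[ru] = rv
                mst.append((w, u, v))
                components -= 1
                progressed = True
        if not progressed:
            break                      # graph is disconnected
    return mst

# 4-cycle with a chord; the MST consists of the three lightest edges
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
tree = boruvka_mst(4, edges)
total = sum(w for w, _, _ in tree)
```

The O(log* n) result shows that in the Congested Clique the number of such phases can be driven far below the sequential log n by growing many components per round.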

Estimating the amount of distinct elements in a dataset by examining only a fraction of the data is known to be a hard problem, both theoretically and in practice.

Our work explores a breakthrough theoretical result by Valiant and Valiant from 2011 that presents a provably accurate method for doing such estimations.

Our goal is to put this theory into practice for the important task of estimating the deduplication ratio of a very large dataset. However, deploying this technique in a real world setting runs into significant obstacles.

In the talk I will describe new techniques that help bridge the gap and enable the use of this exciting approach. Our work achieves a major improvement over the current state-of-the-art practical solutions.

The talk is for a general audience; no prior knowledge is assumed.

Based on joint work with Dmitry Sotnikov and Ety Khaitzin that appeared at Usenix FAST 2016.
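A quick illustration of why the problem is hard: naively scaling up the number of distinct elements seen in a sample can be badly biased. In the sketch below every chunk appears exactly twice, and a 10% sample overestimates the distinct count by nearly a factor of two (the instance is illustrative, not data from the talk):

```python
import random

rng = random.Random(0)
N = 100_000
# every chunk id appears exactly twice: the true distinct count is N // 2
data = [i // 2 for i in range(N)]
rng.shuffle(data)

sample = data[:N // 10]          # inspect only 10% of the chunks
seen = len(set(sample))          # most sampled chunks appear once in the sample
naive_estimate = seen * 10       # naive scale-up of the sample's distinct count
true_distinct = N // 2
```

The sample rarely catches both copies of a chunk, so nearly every sampled chunk looks unique; estimators in the Valiant-Valiant line instead reason about the full frequency histogram to correct exactly this bias.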

While co-addition and subtraction of astronomical images stand at the heart of observational astronomy, the existing solutions for them lack rigorous argumentation, do not achieve maximal sensitivity, and are often slow. Moreover, there is no widespread agreement on how they should be done, and often different methods are used for different scientific applications. I am going to present rigorous solutions to these problems, deriving them from the most basic statistical principles. These solutions are provably optimal, under well defined and practically acceptable assumptions, and in many cases substantially improve the performance of the most basic operations in astronomy.

For coaddition, we present a coadd image that:

a) is sufficient for any further statistical decision or measurement on the underlying constant sky, making the entire data set redundant.

b) improves both the survey speed (by 5-20%) and the effective spatial resolution of past and future astronomical surveys.

c) substantially improves imaging-through-turbulence applications.

d) is much faster than many of the currently used coaddition solutions.

For subtraction, we present a subtraction image that is:

a) optimal for transient detection under the assumption of spatially uniform noise.

b) sufficient for any further statistical decision on the differences between the images, including the identification of cosmic rays and other image artifacts.

c) free of subtraction artifacts, allowing (for the first time) robust transient identification in real time, opening new avenues for scientific exploration.

d) orders of magnitude faster than past subtraction methods.
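For orientation only: the simplest classical baseline that an optimal coadd must beat is a per-pixel inverse-variance weighted mean of registered exposures. The sketch below shows that textbook weighting (it does not include the PSF matching that the talk's coadd adds); images and variances are illustrative.

```python
def inverse_variance_coadd(images, variances):
    """Per-pixel inverse-variance weighted mean of registered images,
    each with a (constant) per-image noise variance.  The result has
    variance 1 / sum(1/var_i), which is never worse than any single
    exposure."""
    weights = [1.0 / v for v in variances]
    wsum = sum(weights)
    return [sum(w * p for w, p in zip(weights, pixel)) / wsum
            for pixel in zip(*images)]

# two 4-pixel exposures of the same (registered) scene, different depths of noise
img_good = [10.0, 12.0, 11.0, 13.0]   # variance 1.0
img_bad  = [14.0,  8.0, 15.0,  9.0]   # variance 4.0
coadd = inverse_variance_coadd([img_good, img_bad], [1.0, 4.0])
```

Here the combined variance is 0.8 versus 1.0 for the better exposure alone; the talk's result is that, once PSFs are also matched, the coadd becomes a sufficient statistic for any later measurement.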

I will present an elementary proof of the following theorem of Alexander Olshanskii:

Let F be a free group and let A,B be finitely generated subgroups of infinite index in F. Then there exists an infinite index subgroup C of F which contains both A and a finite index subgroup of B.

The proof is carried out by introducing a 'profinite' measure on the discrete group F, and is valid also for some groups which are not free.

Some applications of this result will be discussed:

1. Group Theory - Construction of locally finite faithful actions of countable groups.

2. Number Theory - Discontinuity of intersections for large algebraic extensions of local fields.

3. Ergodic Theory - Establishing cost 1 for groups boundedly generated by subgroups of infinite index and finite cost.

A common approach to face recognition relies on using deep learning for extracting a signature. All leading work on the subject uses stupendous amounts of processing power and data. In this work we present a method for efficient and compact learning of a metric embedding. The core idea allows a more accurate estimation of the global gradient and hence fast and robust convergence. In order to avoid the need for huge amounts of data we include an explicit alignment phase in the network, greatly reducing the number of parameters. These insights allow us to efficiently train a compact deep learning model for face recognition in only 12 hours on a single GPU, which can then fit on a mobile device.

Joint work with: Oren Tadmor, Tal Rosenwein, Shai Shalev-Schwartz, Amnon Shashua

Expander graphs are widely studied, and various methods are known to obtain bounded degree expander graphs. Recently, there is growing interest in understanding combinatorial expansion in higher dimensions (higher dimensional simplicial complexes). However, bounded degree combinatorial expanders (random or explicit) were not known until our work.

We present a local to global criterion on a complex that implies combinatorial expansion. We use our criterion to present explicit bounded degree high dimensional expanders. This solves in the affirmative an open question raised by Gromov, who asked whether bounded degree high dimensional expanders could at all exist.

We expect that the emerging theory of high dimensional expansion is likely to have various applications in the theory of computation. Thus, one of the goals of this talk is to introduce this concept to the theory community.

Based on joint works with David Kazhdan and Alex Lubotzky, and with Shai Evra.

Dynamic events such as family gatherings, concerts or sports events are often photographed by a group of people. The set of still images obtained this way is rich in dynamic content. We consider the question of whether such a set of still images, rather than traditional video sequences, can be used for analyzing the dynamic content of the scene. This talk will describe several instances of this problem, their solutions and directions for future studies.

In particular, we will present a method to extend epipolar geometry to predict the location of a moving feature in CrowdCam images. The method assumes that the temporal order of the set of images, namely photo-sequencing, is given. We will briefly describe our method to compute photo-sequencing using geometric considerations and rank aggregation. We will also present a method for identifying the moving regions in a scene, which is a basic component in dynamic scene analysis. Finally, we will consider a new vision of developing collaborative CrowdCam, and a first step toward this goal.

This talk will be based on joint works with Tali Dekel, Adi Dafni, Mor Dar, Lior Talked, Ilan Shimshoni, and Shai Avidan.

Projection inequalities bound the volume of a body in terms of a product of the volumes of lower-dimensional projections of the body. Two well-known examples are the Loomis-Whitney inequality and the more general Uniform Cover inequality. We will see how to use information theory to prove stability versions of these inequalities, showing that when they are close to being tight, the body in question is close to being a box (which is the unique case of equality). We will also see how to obtain a stability result for the edge-isoperimetric inequality in the infinite d-dimensional lattice. Namely, that a subset of Z^d with small edge-boundary must be close in symmetric difference to a d-dimensional cube.

Based on work with David Ellis, Ehud Friedgut and Guy Kindler.

It is common practice in multivariate and matrix-valued data analysis to reduce dimensionality by performing a Singular Value Decomposition or Principal Component Analysis, and keeping only $r$ singular values or principal components, the rest being presumably associated with noise. However, the literature does not propose a disciplined criterion to determine $r$; most practitioners still look for the 'elbow' in the scree plot, a 50-year-old heuristic performed by eye. I'll review a line of work which develops a systematic approach to eigenvalue and singular value thresholding. This approach assumes that the signal is low-rank and that the noise is rotationally invariant. Recent results derive optimal thresholds in the presence of quite general noise distributions.

Joint work with David Donoho, Iain Johnstone and Edgar Dobriban (Stanford).

Let K be a complete discrete valuation field with finite residue field of characteristic p>0. Let G be the absolute Galois group of K and, for a natural number M, let G(M) be the maximal quotient of G of nilpotent class <p and period p^M. Then G(M) can be identified with a group obtained from a Lie Z/p^M-algebra L via the (truncated) Campbell-Hausdorff composition law. Under this identification the ramification subgroups in upper numbering G(M)^(v) correspond to ideals L^(v) of L. An explicit construction of L and the ideals L^(v) will be explained. The case of fields K of characteristic p was obtained by the author in the 1990's (and recently refined); the case of fields K of mixed characteristic requires the assumption that K contains a primitive p^M-th root of unity (for the case M=1 cf. Number Theory Archive).

We show that a version of the classical Law of the Iterated Logarithm, due to Khinchin and independently to Kolmogorov in the 1920's, holds for various parameters in the binomial random graph model and in a random 0/1 Bernoulli matrix. In particular, for a constant p, we show that such a law holds for the number of copies of a fixed graph H in G(n,p), and we show a similar statement for the number of Hamilton cycles in a random k-uniform hypergraph, provided that k\geq 4. In the graph case (that is, k=2), since the number of Hamilton cycles in G(n,p), denoted by X_n, does not converge to a normal distribution but rather tends to a log-normal distribution (as first proved by Janson), we show that a version of the Iterated Log Law holds for \log X_n. We also obtain a similar result for the permanent of a 0/1 Bernoulli random matrix.

No prior knowledge is required.

Joint with Daniel Motealegre and Van Vu.

Heuristically, delocalization for a random matrix means that its normalized eigenvectors look like the vectors uniformly distributed over the unit sphere. This can be made precise in a number of different ways. We show that with high probability, any sufficiently large set of coordinates of an eigenvector carries a non-negligible portion of its Euclidean norm. Our results pertain to a large class of random matrices including matrices with independent entries, symmetric, skew-symmetric matrices, as well as more general ensembles.

Joint work with Roman Vershynin.

Raz's celebrated Parallel Repetition Theorem shows that the probability of simultaneously winning n independent instances of a two-player one-round game G is exponentially small in n, when the maximum success probability of G is less than 1. Though the statement is intuitive, the proof is rather nontrivial and has found important applications in hardness of approximation, cryptography, and communication complexity.

There are two major open problems regarding the parallel repetition of games: does an analogue of Raz's theorem hold for (a) games with more than two players, and (b) games with quantumly entangled players? Extending Raz’s theorem to these settings is a challenging problem for a number of reasons: techniques for attacking direct sum/direct product problems in multiparty settings are lacking, and our understanding of quantum entanglement as an information theoretic resource is quite limited.

In this work, we show how to sidestep these barriers and make progress on the two open problems. We first prove exponential-decay parallel repetition theorems for a class of games we call "anchored games" in the multiplayer and entangled-player settings. Then, we show how to efficiently transform any game into an equivalent anchored game. Together, our results provide a simple hardness-amplification technique for games in both the classical multiplayer and quantum settings.

Joint work with Mohammad Bavarian and Thomas Vidick.

Erdös-Ko-Rado (EKR) type theorems yield upper bounds on the size of set families under various intersection requirements on the elements. Stability versions of such theorems assert that if the size of a family is close to the maximum possible then the family itself must be close (in an appropriate sense) to a maximum family. In this talk we present an approach to stability versions of EKR-type theorems through isoperimetric inequalities for subsets of the hypercube. We use this approach to obtain tight stability versions of the EKR theorem itself and of the Ahlswede-Khachatrian theorem on t-intersecting families (for k < n/(t+1)), and to show that, somewhat surprisingly, both theorems hold when the "intersection" requirement is replaced by a much weaker requirement. Furthermore, we obtain stability versions of several recent EKR-type results, including Frankl's proof of the Erdös matching conjecture for n>(2s+1)k-s.

Joint work with David Ellis and Noam Lifshitz.

We study the discrepancy of the number of visits of a Kronecker sequence on a d-dimensional torus to nice sets. We are interested, in particular, in how the answer depends on the geometry of the set.

This is a joint work with Bassam Fayad.

(http://arxiv.org/abs/1211.4323 and http://arxiv.org/abs/1206.4853)
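For readers unfamiliar with the setup: the discrepancy measures how far the visit frequency of the orbit ({nα₁},...,{nα_d}) to a set deviates from the set's volume. A small brute-force sketch for boxes (names and choice of frequencies are mine):

```python
import math

def box_discrepancy(alphas, box, N):
    """|#{0 <= n < N : ({n*a_1},...,{n*a_d}) in box}/N - vol(box)| for the
    Kronecker sequence with frequencies alphas; box is a list of (lo, hi)."""
    hits = 0
    for n in range(N):
        point = [(n * a) % 1.0 for a in alphas]
        if all(lo <= x < hi for x, (lo, hi) in zip(point, box)):
            hits += 1
    vol = math.prod(hi - lo for lo, hi in box)
    return abs(hits / N - vol)
```

For rationally independent frequencies such as (√2, √3) the discrepancy tends to zero by Weyl equidistribution; the talk's question is the finer rate at which it does so, depending on the target set.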

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) initiated the formal study of this problem, and gave the first upper bounds on the achievable generalization error for adaptive data analysis.

The results of Dwork et al. are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stable algorithms of the kind guaranteed by differential privacy imply low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

Joint work with Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, and Jonathan Ullman.

For the past 40 years, computer scientists have generally believed that NP-complete problems are intractable. In particular, Boolean satisfiability (SAT), as a paradigmatic NP-complete problem, has been considered to be intractable. Over the past 20 years, however, there has been a quiet, but dramatic, revolution, and very large SAT instances are now being solved routinely as part of software and hardware design.

In this talk I will review this amazing development and show that we can leverage SAT solving to accomplish other Boolean reasoning tasks. Counting the number of satisfying truth assignments of a given Boolean formula or sampling such assignments uniformly at random are fundamental computational problems in computer science with numerous applications. While the theory of these problems was thoroughly investigated in the 1980s, approximation algorithms developed by theoreticians do not scale up to industrial-sized instances. Algorithms used by the industry offer better scalability, but give up certain correctness guarantees to achieve it. We describe a novel approach, based on universal hashing and Satisfiability Modulo Theories, that scales to formulas with hundreds of thousands of variables without giving up correctness guarantees.
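To illustrate the hashing idea on a toy scale (this brute-force sketch is mine; the actual algorithms call a SAT/SMT solver instead of enumerating assignments): each random XOR constraint cuts the solution set roughly in half, so the number of surviving solutions, scaled by 2^m, estimates the total count.

```python
import itertools, random

def exact_count(n, pred):
    """Brute-force count of satisfying assignments of pred over n bits."""
    return sum(pred(x) for x in itertools.product((0, 1), repeat=n))

def xor_hash_estimate(n, pred, m, rng):
    """Estimate the count via m random parity constraints a.x = b (mod 2)."""
    cons = [([rng.randrange(2) for _ in range(n)], rng.randrange(2)) for _ in range(m)]
    survivors = sum(
        pred(x) and all(sum(ai * xi for ai, xi in zip(a, x)) % 2 == b for a, b in cons)
        for x in itertools.product((0, 1), repeat=n)
    )
    return survivors * 2 ** m

# Example predicate: x_0 AND (x_1 OR x_2) over 8 variables (96 solutions).
pred = lambda x: x[0] == 1 and (x[1] == 1 or x[2] == 1)
```

With m = 0 the estimate is exact; with m > 0 one trades accuracy for the much smaller surviving set, which is where the solver queries become cheap.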

Homological algebra plays a major role in noncommutative ring theory. One important homological construct related to a noncommutative ring A is the dualizing complex, which is a special kind of complex of A-bimodules. When A is a ring containing a central field K, this concept is well-understood now. However, little is known about dualizing complexes when the ring A does not contain a central field (I shall refer to this as the noncommutative arithmetic setting). The main technical issue is finding the correct derived category of A-bimodules.

In this talk I will propose a promising definition of the derived category of A-bimodules in the noncommutative arithmetic setting. Here A is a (possibly) noncommutative ring, central over a commutative base ring K (e.g. K = Z). The idea is to resolve A: we choose a DG (differential graded) ring A', central and flat over K, with a DG ring quasi-isomorphism A' -> A. Such resolutions exist. The enveloping DG ring A'^{en} is the tensor product over K of A' and its opposite. Our candidate for the "derived category of A-bimodules" is the category D(A'^{en}), the derived category of DG A'^{en}-modules. A recent theorem says that the category D(A'^{en}) is independent of the resolution A', up to a canonical equivalence. This justifies our definition.

Working within D(A'^{en}), it is not hard to define dualizing complexes over A, and to prove all their expected properties (like when K is a field). We can also talk about rigid dualizing complexes in the noncommutative arithmetic setting.

What is noticeably missing is a result about existence of rigid dualizing complexes. When K is a field, Van den Bergh had discovered a powerful existence result for rigid dualizing complexes. We are now trying to extend Van den Bergh's method to the noncommutative arithmetic setting. This is work in progress, joint with Rishi Vyas.

In this talk I will explain, in broad strokes, what are DG rings, DG modules, and the associated derived categories and derived functors. Also, I will try to go into the details of a few results and examples, to give the flavor of this material.

Stochastic analysis of real-world signals consists of 3 main parts: mathematical representation; probabilistic modeling; statistical inference. For it to be effective, we need mathematically-principled and practical computational tools that take into consideration not only each of these components by itself but also their interplay. This is especially true for a large class of computer-vision and machine-learning problems that involve certain mathematical structures; the latter may be a property of the data or encoded in the representation/model to ensure mathematically-desired properties and computational tractability. For concreteness, this talk will center on structures that are geometric, hierarchical, or topological.

Structures present challenges. For example, on nonlinear spaces, most statistical tools are not directly applicable, and, moreover, computations can be expensive. As another example, in mixture models, topological constraints break statistical independence. Once we overcome the difficulties, however, structures offer many benefits. For example, respecting and exploiting the structure of Riemannian manifolds and/or Lie groups yield better probabilistic models that also support consistent synthesis. The latter is crucial for the employment of analysis-by-synthesis inference methods used within, e.g., a generative Bayesian framework. Likewise, imposing a certain structure on velocity fields yields highly-expressive diffeomorphisms that are also simple and computationally tractable; particularly, this facilitates MCMC inference, traditionally viewed as too expensive in this context.

Time permitting, throughout the talk I will also briefly touch upon related applications such as statistical shape models, transfer learning on manifolds, image warping/registration, time warping, superpixels, 3D-scene analysis, nonparametric Bayesian clustering of spherical data, multi-metric learning, and new machine-learning applications of diffeomorphisms. Lastly, we also applied the (largely model-based) ideas above to propose the first learned data augmentation scheme; as it turns out, when compared with the state-of-the-art schemes, this improves the performance of classifiers of the deep-net variety.

I will describe recent work on building and using rich representations aimed at automatic analysis of visual scenes. In particular, I will describe methods for semantic segmentation (labeling regions of an image according to the categories they belong to) and for semantic boundary detection (recovering accurate boundaries of semantically meaningful regions, such as those corresponding to objects). We focus on feed-forward architectures for these tasks, leveraging recent advances in the art of training deep neural networks. Our approach aims to shift the burden of inducing desirable constraints from explicit structure in the model to implicit structure inherent in computing richer, context-aware representations. I will describe experiments on standard benchmark data sets that demonstrate the success of this approach.

Joint work with Mohammadreza Mostajabi, Payman Yadollahpour, and Harry Yang.

Many questions about properties of a finite group, such as random walks, the spectrum of Cayley graphs, the distribution of word maps, etc., can be approached by using "generalized Fourier sum" formulas involving characters of the group. Numerical data show that characters of low-dimensional representations of the group contribute the largest terms to these sums. However, relatively little seems to be known about these small representations, so a systematic knowledge of them could lead to proofs of some of the properties. The talk will demonstrate, through concrete examples and numerical simulations, a new method to construct and analyze those small representations, and hence hopefully to solve some of the aforementioned questions.

The talk is intended for non-experts.

This is part of a joint project with Roger Howe (Yale).

Many sequence prediction tasks---such as automatic speech recognition and video analysis---benefit from long-range temporal features. One way of utilizing long-range information is through segmental (semi-Markov) models such as segmental conditional random fields. Such models have had some success, but have been constrained by the computational needs of considering all possible segmentations. We have developed new segmental models with rich features based on neural segment embeddings, trained with discriminative large-margin criteria, that are efficient enough for first-pass decoding. In our initial work with these models, we have found that they can outperform frame-based HMM/deep network baselines on two disparate tasks, phonetic recognition and sign language recognition from video. I will present the models and their results on these tasks, as well as (time permitting) related recent work on neural segmental acoustic word embeddings.

This is joint work with Hao Tang, Weiran Wang, Herman Kamper, Taehwan Kim, and Kevin Gimpel

Finite group theorists have established many formulas that express interesting properties of a finite group in terms of sums of characters of the group. An obstacle to applying these formulas is lack of control over the dimensions of representations of the group. In particular, the representations of small dimension tend to contribute the largest terms to these sums, so a systematic knowledge of these small representations could lead to proofs of some of these facts. This talk will discuss a new method for systematically constructing the small representations of finite classical groups. I will explain the method with concrete examples and applications.

This is part of a joint project with Roger Howe (Yale).

Recently, the "information percolation" framework was introduced as a way to obtain sharp estimates on mixing for spin systems at high temperatures, and in particular, to establish cutoff for the Ising model in three dimensions up to criticality from a worst starting state. I will describe how this method can be used to understand the effect of different initial states on the mixing time, both random ("warm start") and deterministic.

Joint work with Allan Sly.

Let X be a compact manifold with boundary ∂X, and let R(λ) be the Dirichlet-to-Neumann operator R(λ): f → u|_{∂X}, where u solves (Δ + λ^2)u = 0 with u|_{∂X} = f. We establish asymptotics, as λ → +∞, of the number of eigenvalues of λ^{-1}R(λ) lying between s_1 and s_2.

This is a joint work with Andrew Hassell, Australian National University.

In today’s world there are huge amounts of data that need to get reliably stored or transmitted. However, some amount of noise or corruption is inevitable. An error-correcting code is a scheme for robustly representing data in the form of a codeword that allows one to detect and correct errors in transmission. Locally-testable and locally-decodable codes are special families of error-correcting codes that admit highly efficient algorithms that detect and correct errors in sublinear time with high probability, probing only a small number of entries of the corrupted codeword. While locally-testable and locally-decodable codes have been intensely studied in the past 2 decades, in recent years there has been even further incentive for their study due to their relevance for transmission and storage of massive data and the successful implementation of local codes in cloud storage systems.

In this talk, I will show an exponential improvement on the best-known running time of error detection and correction algorithms for locally-testable and locally-decodable codes. Specifically, I will describe new families of locally-testable codes with constant rate that can detect a constant fraction of errors in time (log n)^{O(log log n)} and new families of locally-decodable codes of constant rate that can correct a constant fraction of errors in time exp(\sqrt{log n}). Prior to that, the best known running time for such codes was n^{epsilon} (for a constant epsilon) using several, quite different, constructions.

(Based on joint work with Swastik Kopparty, Or Meir and Shubhangi Saraf)
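As a concrete, if classical, example of local decoding — not one of the codes from this talk, which are far more efficient — the Hadamard code lets any message bit be recovered from just two codeword queries: x_i = C(r) XOR C(r + e_i) for a random index r, and a majority vote over a few random r tolerates a small fraction of corrupted positions.

```python
import itertools, random

def hadamard_encode(x):
    """Codeword entry at index r in {0,1}^n is the inner product <x, r> mod 2."""
    n = len(x)
    return {r: sum(a * b for a, b in zip(x, r)) % 2
            for r in itertools.product((0, 1), repeat=n)}

def local_decode_bit(code, n, i, rng, queries=25):
    """Recover x_i with two codeword probes per query, by majority vote."""
    e_i = tuple(1 if j == i else 0 for j in range(n))
    votes = 0
    for _ in range(queries):
        r = tuple(rng.randrange(2) for _ in range(n))
        shifted = tuple((a + b) % 2 for a, b in zip(r, e_i))
        votes += code[r] ^ code[shifted]
    return int(2 * votes > queries)
```

The catch, and the reason for the constructions in the talk, is that the Hadamard code has exponentially small rate; the codes described above achieve constant rate while keeping the decoding local.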

Locality-Sensitive Hashing (LSH) is a powerful technique for approximate nearest neighbor search (ANN) in high dimensions. In this talk I will present two recent results:

1) I will show a data structure for ANN for the Euclidean distance that provably outperforms the best possible LSH-based data structure. We proceed via designing a good *data-dependent* hash family.

2) I will show a practical and optimal LSH family for the cosine similarity (a.k.a. Euclidean distance on a sphere). It substantially outperforms the celebrated Hyperplane LSH family. Along the way, I will try to debunk two popular myths about LSH:

* LSH-based data structures consume too much memory and are thus impractical;

* Optimal LSH constructions are too complicated to be made practical.

The talk is based on two papers: arXiv: 1501.01062 (joint with Alexandr Andoni, STOC 2015) and arXiv: 1509.02897 (joint with Alexandr Andoni, Piotr Indyk, Thijs Laarhoven and Ludwig Schmidt, NIPS 2015).
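For reference, the Hyperplane LSH family mentioned above fits in a few lines: hash a vector by the sign pattern of random Gaussian projections; two unit vectors at angle θ then agree on each bit with probability 1 - θ/π. (A sketch of the baseline family only, not of the data-dependent or optimal constructions from the talk.)

```python
import random

def make_hyperplane_lsh(dim, bits, rng):
    """Return a hash function v -> sign pattern of `bits` random projections."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
    def h(v):
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0)
                     for plane in planes)
    return h
```

Note the hash depends only on the direction of v, which is why this family targets cosine similarity rather than general Euclidean distance.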

We show techniques for decreasing the error probability of randomized algorithms and for converting randomized algorithms to deterministic (non-uniform) algorithms. Unlike most existing techniques that involve repetition of the randomized algorithm and hence a slowdown, our techniques produce algorithms with a similar run-time to the original randomized algorithms.

The amplification technique is related to a certain stochastic multi-armed bandit problem. The derandomization technique -- which is the main contribution of this work -- points to an intriguing connection between derandomization and sketching/sparsification.

We demonstrate the techniques by showing applications to max-cut on dense graphs, approximate clique, constraint satisfaction problems on dense bipartite graphs, and list decoding to unique decoding for Reed-Muller code.

This is joint work with Ofer Grossman.

By a theorem of Aaronson, normalized Birkhoff sums of positive integrable functions in infinite, ergodic systems either tend to 0 almost surely, or there is a subsequence along which every further subsequence tends to infinity. This is not true for normalized symmetric Birkhoff sums, where the summation is along a symmetric time interval, as there are examples of infinite, ergodic systems for which the absolutely normalized symmetric Birkhoff sums of positive integrable functions may be almost surely bounded away from zero and infinity. In this talk I will start by explaining a variety of transformations (of different nature) exhibiting this phenomenon, then discuss the main result: the absolutely normalized, symmetric Birkhoff sums of positive integrable functions in infinite, ergodic systems never converge pointwise, and there is even a universal divergence statement. Time permitting, I will show some examples of actions of other groups which do converge, and some recent (as yet unwritten) results on actions by commuting skew products which are related to self-intersection local times.

The contents of this talk are a combination of 3 papers, one of which is a joint work with Benjamin Weiss and Jon Aaronson and another one is work in progress with Jon Aaronson.

Parameter estimation is performed by fitting data measurements to a model using Bayesian statistics, assuming additional prior information. The estimation requires the numerical solution of a large-scale optimization problem, whose objective traditionally includes data fidelity and regularization terms. In this talk I will present numerical solution methods for two such estimation problems.

In the first part of the talk I will concentrate on parameter estimation of physical models, obtained by solving optimization problems that are constrained by partial differential equations (PDEs). I will focus on my recent work on 3D Full Waveform Inversion, which arises in seismic exploration of oil and gas reservoirs, earth sub-surface mapping, ultrasound imaging and more. I will demonstrate how to computationally treat this inverse problem, and improve its solution by using travel time tomography in a joint inversion framework. This includes efficient algorithms for the solution of the Helmholtz and eikonal equations (the two associated PDEs), and a parallel software framework for applying these algorithms for the joint inversion using a Gauss Newton algorithm.

In the second part of the talk, I will consider the estimation of large-scale sparse inverse covariance matrices of a multivariate Gaussian distribution. Such matrices are often used to characterize and analyze data measurements in fields ranging from machine learning and signal processing to computational biology. To estimate these matrices, an l1-regularized log-determinant optimization problem needs to be solved. I will present a block-coordinate descent algorithm that can efficiently solve this problem at large scales with a low memory footprint, and a multilevel acceleration framework that is suitable for general sparse optimization problems. These algorithms can be used as a tool for enriching inverse problems by "learning" appropriate prior information, adopting an empirical Bayesian framework.
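To fix notation, the estimation problem in the second part minimizes f(X) = -log det X + tr(SX) + λ‖X‖₁ over positive definite X, where S is the sample covariance. A small numpy sketch of the objective and the gradient of its smooth part (a sanity check of the formulation, not the block-coordinate solver from the talk):

```python
import numpy as np

def objective(X, S, lam):
    """l1-regularized negative log-likelihood: -logdet(X) + tr(SX) + lam*||X||_1."""
    sign, logdet = np.linalg.slogdet(X)
    assert sign > 0, "X must be positive definite"
    return -logdet + np.trace(S @ X) + lam * np.abs(X).sum()

def smooth_grad(X, S):
    """Gradient of the smooth part -logdet(X) + tr(SX), namely S - X^{-1}."""
    return S - np.linalg.inv(X)
```

The S - X^{-1} gradient is the quantity every coordinate-descent or proximal scheme for this problem repeatedly evaluates, and avoiding the explicit inverse at scale is exactly what motivates the block-coordinate and multilevel machinery.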

Line defects appear in the microscopic structure of crystalline materials (e.g. metals) as well as liquid crystals, the latter an intermediate phase of matter between liquids and solids. Mathematically, their study is challenging since they correspond to topological singularities that result in blow-up of the total energy of finite bodies in most commonly used classical models of energy density; as a consequence, formulating nonlinear dynamical models (especially PDE) for the representation and motion of such defects is a challenge as well. I will discuss the development and implications of a single PDE model intended to describe equilibrium states and dynamics of these defects. The model alleviates the singularities mentioned above, and it will also be shown that incorporating a conservation law for the topological charge of line defects allows for the correct prediction of some important features of defect dynamics that would not be possible with the knowledge of an energy function alone.

This is joint work with Chiqun Zhang, Dmitry Golovaty, and Noel Walkington.

The class P attempts to capture the efficiently solvable computational tasks. It is full of practically relevant problems, with varied and fascinating combinatorial structure.

In this talk, I will give an overview of a rapidly growing body of work that seeks a better understanding of the structure within P. Inspired by NP-hardness, the main tools in this approach are combinatorial reductions. Combining these reductions with a small set of plausible conjectures, we obtain tight lower bounds on the time complexity of many of the most important problems in P.

I will present the most recent landscape of P and the conjectures on which this project is based (e.g. the Strong Exponential Time Hypothesis). I will discuss recent attempts at identifying new conjectures: either more reliable ones, or ones that will get us closer to a full classification of the important problems in P.

Finally, I will highlight a surprising new reduction from Circuit-SAT to natural problems in P like Edit-Distance which proves that minor improvements over the quadratic running time of Edit-Distance are enough to prove major complexity separations.
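For context, the quadratic algorithm in question is the textbook dynamic program for Edit-Distance; the reduction says that beating its O(n²) running time by a polynomial factor would have major complexity-theoretic consequences.

```python
def edit_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic program over insert/delete/substitute."""
    n = len(b)
    dp = list(range(n + 1))             # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i          # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # delete a[i-1]
                        dp[j - 1] + 1,                  # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))  # substitute or match
            prev = cur
    return dp[n]
```

The rolling single-row formulation uses O(n) space, but the time remains quadratic, and it is precisely this barrier that the Circuit-SAT reduction conditionally explains.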

In many practical parameter estimation problems, such as medical experiments and cognitive radio communications, parameter selection is performed prior to estimation. The selection process has a major impact on subsequent estimation by introducing a selection bias and creating coupling between decoupled parameters. As a result, classical estimation theory may be inappropriate and inaccurate, and a new methodology is needed. In this study, the problem of estimating a preselected unknown deterministic parameter, chosen from a parameter set based on a predetermined data-based selection rule, \Psi, is considered. In this talk, I present a general non-Bayesian estimation theory for estimation after parameter selection, which includes estimation methods, performance analysis, and adaptive sampling strategies. First, I use the post-selection mean-square-error (PSMSE) criterion as a performance measure instead of the commonly used mean-square-error (MSE). The corresponding Cramér-Rao-type bound on the PSMSE of any \Psi-unbiased estimator is derived, where \Psi-unbiasedness is in the Lehmann-unbiasedness sense. The post-selection maximum-likelihood (PSML) estimator is presented and its \Psi-efficiency properties are demonstrated. Practical implementations of the PSML estimator are proposed as well. Finally, I discuss the concept of adaptive sampling in a two-stage scheme of selection and estimation.

Transmission rates in broadband optical waveguide systems are enhanced by launching many pulse sequences through the same waveguide. Since pulses from different sequences propagate with different group velocities, intersequence pulse collisions are frequent, and can lead to severe transmission degradation. On the other hand, the energy exchange in pulse collisions can be beneficially used for controlling the transmission.

In this work we show that collision-induced amplitude dynamics of soliton sequences of N perturbed coupled nonlinear Schrödinger (NLS) equations can be described by N-dimensional Lotka-Volterra (LV) models, where the model's form depends on the perturbation. To derive the LV models, we first carry out single-collision analysis, which is based on the method of eigenmode expansion with the eigenmodes of the linear operator describing small perturbations about the fundamental NLS soliton. We use stability and bifurcation analysis for the equilibrium points of the LV models to develop methods for achieving robust transmission stabilization and switching that work well for a variety of waveguides. Further enhancement of transmission stability is obtained in waveguides with a narrowband Ginzburg-Landau gain-loss profile. We also discuss the possibility to use the relation between NLS and LV models to realize transition to spatio-temporal chaos with NLS solitons.

The Max-Cut problem seeks to determine the maximal cut size in a given graph. With no polynomial-time efficient approximation for Max-Cut (unless P=NP), its asymptotics for a typical large sparse graph are of considerable interest. We prove that for a uniformly random d-regular graph on N vertices, and for the uniformly chosen Erdos-Renyi graph of M=Nd/2 edges, the leading correction to M/2 (the typical cut size) is P_* sqrt(NM/2). Here P_* is the ground state energy of the Sherrington-Kirkpatrick model, expressed analytically via Parisi's formula.

This talk is based on a joint work with Subhabrata Sen and Andrea Montanari.
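To fix the definition, for small graphs the quantity in question can be computed exactly by brute force over all 2^N bipartitions (the theorem, of course, concerns the N → ∞ asymptotics, far beyond this range):

```python
def max_cut(n, edges):
    """Maximum number of edges crossing a bipartition of {0,...,n-1}."""
    best = 0
    for mask in range(1 << n):          # each bit of mask assigns a side
        cut = sum(((mask >> u) & 1) != ((mask >> v) & 1) for u, v in edges)
        best = max(best, cut)
    return best
```

A triangle has maximum cut 2 (one edge must stay inside a part), while any bipartite graph, such as a 4-cycle, achieves a cut containing all its edges.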

The goal of my talk (based on joint work with Dima Grigoriev, Anatol Kirillov, and Gleb Koshevoy) is to generalize the celebrated Robinson-Schensted-Knuth (RSK) bijection between the set of matrices with nonnegative integer entries and the set of planar partitions.

Namely, for any pair of injective valuations on an integral domain we construct a canonical bijection K, which we call the generalized RSK, between the images of the valuations, i.e., between certain ordered abelian monoids.

Given a semisimple or Kac-Moody group, for each reduced word ii=(i_1,...,i_m) for a Weyl group element we produce a pair of injective valuations on C[x_1,...,x_m] and argue that the corresponding bijection K=K_ii, which maps the lattice points of the positive octant onto the lattice points of a convex polyhedral cone in R^m, is the most natural generalization of the classical RSK and, moreover, K_ii can be viewed as a bijection between Lusztig and Kashiwara parametrizations of the dual canonical basis in the corresponding quantum Schubert cell.

Generalized RSKs are abundant in "nature"; for instance, any pair of polynomial maps phi,psi: C^m --> C^m with dense images determines a pair of injective valuations on C[x_1,...,x_m] and thus defines a generalized RSK bijection K_{phi,psi} between two sub-monoids of Z_+^m.

When phi and psi are birational isomorphisms, we expect that K_{phi,psi} has a geometric "mirror image", i.e., that there is a rational function f on C^m whose poles complement the images of phi and psi, so that the tropicalization of the composition psi^{-1}phi along f equals K_{phi,psi}. We refer to such geometric data as a (generalized) geometric RSK, and view f as a "super-potential". This fully applies to each ii-RSK situation, and we find a super-potential f=f_ii which helps to compute K_ii.

While each K_ii has a "crystal" flavor, its geometric (and mirror) counterpart f_ii emerges from the cluster twist of the relevant double Bruhat cell studied by Andrei Zelevinsky, David Kazhdan, and myself.
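For readers who want the classical baseline being generalized: the original RSK correspondence is built from Schensted row insertion, a dozen lines of code. (The construction in the talk replaces this combinatorial rule by a pair of valuations.)

```python
from bisect import bisect_right

def rsk(word):
    """Classical RSK via row insertion: returns the pair (P, Q) of tableaux."""
    P, Q = [], []
    for step, x in enumerate(word, start=1):
        row = 0
        while True:
            if row == len(P):                  # start a new row at the bottom
                P.append([x]); Q.append([step]); break
            k = bisect_right(P[row], x)        # leftmost entry strictly > x
            if k == len(P[row]):               # x fits at the end of the row
                P[row].append(x); Q[row].append(step); break
            P[row][k], x = x, P[row][k]        # bump the entry to the next row
            row += 1
    return P, Q
```

P records the inserted values and Q records the order in which cells of the common shape were created, which is what makes the map a bijection.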

We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to $N$ experts in total $\sqrt{N}$ computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is linear in $N$. These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle---i.e., an efficient empirical risk minimizer---allows to learn a finite hypothesis class of size $N$ in time $\log{N}$.

We also study the implications of our results to learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best-response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is $\sqrt{N}$, yielding again a quadratic improvement upon the oracle-free setting, where linear time in $N$ is known to be tight.
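The oracle-free baseline implicit in the comparison is a standard no-regret algorithm such as multiplicative weights (Hedge), which already spends Θ(N) time per round just updating the weights. A minimal sketch of that baseline:

```python
import math

def hedge_total_loss(loss_rounds, eta):
    """Run Hedge on a T x N sequence of losses in [0,1]; return the algorithm's
    expected total loss when playing expert i with probability w_i / sum(w)."""
    n = len(loss_rounds[0])
    w = [1.0] * n                      # uniform initial weights
    total = 0.0
    for losses in loss_rounds:
        W = sum(w)
        total += sum(wi * li for wi, li in zip(w, losses)) / W
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    return total
```

On a sequence where one expert is always best, the algorithm's total loss stays within an additive O(log N / eta) of that expert's loss; the point of the talk is achieving vanishing regret in only sqrt(N) total time given the optimization oracle.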

Periodically driven systems are of immense interest in plasma physics both from the point of view of plasma confinement as well as plasma heating.

One of the models used to explain plasma heating in capacitive RF discharges is Fermi acceleration, in which a particle moves in a dynamical billiard with oscillating boundaries. It is well known that the energy growth of an ensemble of particles in a strongly chaotic billiard with moving walls is quadratic in time, whereas it can be exponential in time in billiards with multiple ergodic components. Since a real plasma device allows for an exchange of particles with the surroundings, we have now studied Fermi accelerators with a hole (small enough so as not to disturb the statistics). We find that the energy gain is significantly higher in leaky Fermi accelerators with multiple ergodic components, and it can be further increased by shrinking the hole size. In the ergodic case, the energy gain is found to be independent of the hole size. Work done jointly with V. Gelfreich, V. Rom-Kedar and D. Turaev [Physical Review E 91, 062920 (2015)].

The Paul trap is a device used to confine electrons by means of time-periodic, spatially non-uniform electric fields; a Nobel Prize was awarded for its discovery in 1989. The time-averaged distribution function of the plasma in such devices is usually modelled using the concept of an effective potential (ponderomotive theory). For a specific example of the electric field used in Paul traps, we had shown earlier that exact solutions of the Vlasov equation (the collisionless Boltzmann equation) do not agree with solutions obtained by the effective-potential approach. We have now been able to obtain a perturbative solution of the Vlasov equation in a much more general setting, and we find the same discrepancy with the conventional theory. These perturbative solutions represent a non-equilibrium steady state, and further work is needed to understand their statistical evolution. Work done jointly with B. Srinivasan [arXiv:1510.03974].

We study the two-party communication complexity of the geometric problem of finding an approximate Brouwer fixed-point of a composition of two Lipschitz functions g*f, where Alice knows f and Bob knows g.

We prove an essentially tight communication lower bound on this problem, using a novel adaptation of the Raz-McKenzie simulation theorem into geometric settings.

We show that a slightly stronger version of this communication problem would imply (essentially) tight communication lower bounds for the problem of finding an approximate Nash equilibrium in 2-player (and n-player) games, where each player initially knows only his own payoff matrix.

Joint work with Tim Roughgarden.

We consider random polynomials of degree n whose coefficients are i.i.d. distributed over a finite set of integers, with probability at most 1/2 of taking any particular value. We show that the probability that such a polynomial has a double root is dominated by the probability that 0, 1, or -1 is a double root, up to an error of o(n^{-2}). Our result generalizes a similar result of Peled, Sen and Zeitouni for Littlewood polynomials.

Joint work with Ron Peled and Arnab Sen.
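To make the notion concrete: $x$ is an (at least) double root of $p$ iff $p(x) = p'(x) = 0$. A toy enumeration over Littlewood polynomials of degree 7 (coefficients $\pm 1$; note 0 is never a root since the constant coefficient is $\pm 1$) — this is only an illustration, not part of the authors' argument:

```python
from itertools import product

def has_double_root_at(coeffs, x):
    """True iff x is an at-least-double root: p(x) = 0 and p'(x) = 0.
    coeffs[k] is the coefficient of X^k."""
    p = sum(c * x ** k for k, c in enumerate(coeffs))
    dp = sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)
    return p == 0 and dp == 0

# Enumerate all 2^8 Littlewood polynomials of degree 7 and count those
# having a double root at 1 or -1.
count = 0
for coeffs in product((-1, 1), repeat=8):
    if has_double_root_at(coeffs, 1) or has_double_root_at(coeffs, -1):
        count += 1
```

For instance, 1 + x - x^2 - x^3 - x^4 - x^5 + x^6 + x^7 has a double root at 1, so the count above is positive.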

"Circular inference" is a pejorative coined for methods in which a hypothesis is selected after looking at the data, but the inferential procedures treat it as if it were determined in advance. Unfortunately, many high-throughput screening experiments in genomics or neuroimaging seek to do exactly this: identify regions (bumps) of high signal in the data **and** evaluate these found regions using the same data. Simple estimators that ignore the selection will be biased; when the data is non-stationary, this bias can vary dramatically between regions. Nevertheless, methods for evaluating and comparing selected regions are crucial, because typically only a handful of regions can be further explored in tailored follow-up studies.

In this talk I describe a new conditional inference approach for characterizing these found regions by estimating their population parameters. Our method explicitly models the selection procedure and simulates from the conditional distribution to estimate the underlying parameters. Efficient strategies for providing p-values, estimators and intervals will be discussed, as well as power versus accuracy tradeoffs. I will demonstrate the new method for estimating bumps in a comparison of DNA-methylation patterns across tissue types.

This is joint work with Jonathan Taylor and Rafael Irizarry.

In this talk I will consider the problem of local analytic classification of powers of volume forms on manifolds with boundary, i.e. of ordinary volume forms multiplied by a (in general complex) power of a function f, under the action of the group of diffeomorphisms preserving both the boundary and the hypersurface defined by the zero locus of f. In the case where this function defines an isolated boundary singularity in the sense of Arnol'd, I will show how to obtain local normal forms and moduli theorems analogous to those obtained by Arnol'd, Varchenko, Lando and others in the ordinary, boundaryless case. Moreover, I will show how these moduli are related to (in fact obtained from) the topological and analytic (Hodge-theoretic) invariants of the boundary singularity, such as the relative Picard-Lefschetz monodromy, the relative Brieskorn lattices with their relative Gauss-Manin connection, the relative spectrum, and so on; all these objects generalise, in the presence of a boundary, the corresponding well-known objects already defined for isolated hypersurface singularities.

Online d-dimensional vector packing models many settings such as minimizing resources in data centers where jobs have multiple resource requirements (CPU, Memory, etc.). However, no online d-dimensional vector packing algorithm can achieve a competitive ratio better than d. Fortunately, in many natural applications, vectors are relatively small, and thus the lower bound does not hold. For sufficiently small vectors, an O(log d)-competitive algorithm was known. We improve this to a constant competitive ratio, arbitrarily close to e (where e is the base of the natural logarithm), given that vectors are sufficiently small.

We give improved results for the two-dimensional case. For arbitrarily small vectors, the First Fit algorithm for two-dimensional vector packing is no better than 2-competitive. We present a natural family of First Fit variants, and with optimized parameters obtain a competitive ratio of approximately 1.48 for sufficiently small vectors.

We then improve upon the 1.48 competitive ratio (not via a First Fit variant) and give a competitive ratio arbitrarily close to 4/3 for packing small two-dimensional vectors. We also show that no algorithm can achieve a competitive ratio better than 4/3 for two-dimensional vectors, even if the algorithm is allowed to split vectors among arbitrarily many bins.
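The baseline First Fit algorithm for d-dimensional vector packing (the starting point of the variants discussed above) can be sketched as follows; the instance at the end is hypothetical, chosen with exact binary fractions so the arithmetic is exact:

```python
def first_fit_vector_packing(vectors, dim=2):
    """Online First Fit: place each d-dimensional vector into the first
    open bin in which every coordinate still fits (each bin has capacity 1
    in every coordinate)."""
    bins = []        # per-bin coordinate-wise load vectors
    assignment = []  # bin index chosen for each arriving vector
    for v in vectors:
        for j, load in enumerate(bins):
            if all(load[i] + v[i] <= 1.0 for i in range(dim)):
                bins[j] = [load[i] + v[i] for i in range(dim)]
                assignment.append(j)
                break
        else:
            bins.append(list(v))
            assignment.append(len(bins) - 1)
    return bins, assignment

# hypothetical online instance
bins, assignment = first_fit_vector_packing(
    [(0.25, 0.5), (0.5, 0.25), (0.5, 0.5), (0.25, 0.125)])
```

Here the third vector opens a second bin because it does not fit the first bin in either coordinate, while the fourth goes back into bin 0.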

It has long been conjectured that hypothesis spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical architectures than with shallow ones. Despite the vast empirical evidence, formal arguments to date are limited and do not capture the kind of networks used in practice. Using tensor factorization, we derive a universal hypothesis space implemented by an arithmetic circuit over functions applied to local data structures (e.g. image patches). The resulting networks first pass the input through a representation layer, and then proceed with a sequence of layers comprising sum followed by product-pooling, where sum corresponds to the widely used convolution operator. The hierarchical structure of the networks arises from factorizations of tensors based on the linear weights of the arithmetic circuits. We show that a shallow network corresponds to a rank-1 decomposition, whereas a deep network corresponds to a Hierarchical Tucker (HT) decomposition. Log-space computation for numerical stability transforms the networks into SimNets.

In its basic form, our main theoretical result shows that the set of polynomially sized rank-1 decomposable tensors has measure zero in the parameter space of polynomially sized HT decomposable tensors. In deep learning terminology, this amounts to saying that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require an exponential size if one wishes to implement (or approximate) them with a shallow network. Our construction and theory shed new light on various practices and ideas employed by the deep learning community, and in that sense bear a paradigmatic contribution as well.

Joint work with Or Sharir and Amnon Shashua.

The notion of a Lie conformal algebra (LCA) comes from physics, and is related to the operator product expansion. An LCA is a module over a ring of differential operators with constant coefficients, with a bracket which may be seen as a deformation of a Lie bracket. LCA are related to linearly compact differential Lie algebras via the so-called annihilation functor. Using this observation and Cartan's classification of linearly compact simple Lie algebras, Bakalov, D'Andrea and Kac classified finite simple LCA in 2000.

I will define the notion of LCA over a ring R of differential operators with not necessarily constant coefficients, extending the known one for R=K[x]. I will explain why it is natural to study such an object and will suggest an approach for the classification of finite simple LCA over arbitrary differential fields.

Much of the theory of mathematical programs for combinatorial optimization can be described in the following way: A polytope of interest has exponentially many (in the dimension) facets, but can be written as the linear projection of a simpler convex body in a higher-dimensional space. Simple might mean a polytope with a much smaller number of facets, or a spectrahedron (the intersection of an affine subspace with the PSD cone) of small dimension. This allows one to optimize linear functionals over the base polytope by instead optimizing a lifted functional over the lifted body.

Unless P=NP, one does not expect certain polytopes--like the convex hull of indicators of traveling salesman tours in a graph--to have a small lift. But it remained open to prove any non-trivial lower bound on the necessary dimension for a spectrahedral lift, i.e. to prove that semi-definite programs do not yield efficient optimization procedures over these polytopes.

We show that the cut, TSP, and stable set polytopes on n-vertex graphs are not the linear image of a spectrahedron of dimension less than exp(n^c) for some constant c > 0. In the process, many interesting phenomena emerge: Factorization of operators through the PSD cone, quantum information theory, discrete Fourier analysis, and real algebraic geometry.

This is based on joint work with Prasad Raghavendra and David Steurer.

It is well-known that Hecke algebras H_q(W) do not have interesting Hopf algebra structures because, first, the only available one would emerge only via an extremely complicated isomorphism with the group algebra of W and, second, this would make H_q(W) into yet another cocommutative Hopf algebra.

The goal of my talk (based on joint work with D. Kazhdan) is to extend each Hecke algebra H_q(W) to a non-cocommutative Hopf algebra (we call it Hecke-Hopf algebra of W) that contains H_q(W) as a coideal.

Our Hecke-Hopf algebras have a number of applications: they generalize Bernstein presentation of Hecke algebras, provide new solutions of quantum Yang-Baxter equation and a large category of endo-functors of H_q(W)-Mod, and suggest further generalizations of Hecke algebras.

In view of the recent huge interest in image classification and object recognition problems and the spectacular success of deep learning and random forests in solving these tasks, it seems astonishing that much more modest efforts are being invested into related, and often more difficult, problems of image and multimodal content-based retrieval, and, more generally, similarity assessment in large-scale databases. These problems, arising as primitives in many computer vision tasks, are becoming increasingly important in the era of exponentially increasing information. Semantic and similarity-preserving hashing methods have recently received considerable attention to address such a need, in part due to their significant memory and computational advantage over other representations.

In this talk, I will overview some of my recent attempts to construct efficient semantic hashing schemes based on deep neural networks and random forests.

Based on joint works with Qiang Qiu, Guillermo Sapiro, Michael Bronstein, and Jonathan Masci.

I will present a 'categorical' way of doing analytic geometry in which analytic geometry is seen as a precise analogue of algebraic geometry. Our approach works for both complex analytic geometry and p-adic analytic geometry in a uniform way. I will focus on the idea of an 'open set' as used in these various areas of math and how it is characterised categorically. In order to do this, we need to study algebras and their modules in the category of Banach spaces. The categorical characterization that we need uses homological algebra in these 'quasi-abelian' categories which is work of Schneiders and Prosmans. In fact, we work with the larger category of Ind-Banach spaces for reasons I will explain. This gives us a way to establish foundations of analytic geometry and to compare with the standard notions such as the theory of affinoid algebras, Grosse-Klonne's theory of dagger algebras (over-convergent functions), the theory of Stein domains and others. I will explain how this extends to a formulation of derived analytic geometry following the relative algebraic geometry approach of Toen, Vaquie and Vezzosi.

This is joint work with Federico Bambozzi (Regensburg) and Kobi Kremnizer (Oxford).

An Oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky (JACM 1996), is a (probabilistic) RAM that hides its access pattern, i.e. for every input the observed locations accessed are similarly distributed. Great progress has been made in recent years in minimizing the overhead of ORAM constructions, with the goal of obtaining the smallest overhead possible.

We revisit the lower bound on the overhead required to obliviously simulate programs, due to Goldreich and Ostrovsky. While the lower bound is fairly general, including the offline case, when the simulator is given the reads and writes ahead of time, it does assume that the simulator behaves in a "balls and bins" fashion. That is, the simulator must act by shuffling data items around, and is not allowed to have sophisticated encoding of the data.

We prove that for the OFFLINE case, showing a lower bound without the above restriction is related to the size of the circuits for sorting. Our proof is constructive, and uses a bit-slicing approach which manipulates the bit representations of data in the simulation. This implies that without obtaining yet unknown superlinear lower bounds on the size of such circuits, we cannot hope to get lower bounds on offline (unrestricted) ORAMs.

What is it that enables learning with multi-layer networks? What causes the network to generalize well? What makes it possible to optimize the error, despite the problem being hard in the worst case? In this talk I will attempt to address these questions and relate between them, highlighting the important role of optimization in deep learning. I will then use the insight to suggest studying novel optimization methods, and will present Path-SGD, a novel optimization approach for multi-layer RELU networks that yields better optimization and better generalization.

Joint work with Behnam Neyshabur, Ryota Tomioka and Russ Salakhutdinov.

We study properties of affine algebras with small Gel'fand-Kirillov dimension, from the points of view of the prime spectrum, gradations and radical theory.

As an application, we are able to prove that Z-graded algebras with quadratic growth, and graded domains with cubic growth have finite (and efficiently bounded) classical Krull dimension; this is motivated by Artin's conjectured geometric classification of non-commutative projective surfaces, and by opposite examples in the non-graded case.

As another application, we prove a graded version of a dichotomy question raised by Braun and Small, between primitive algebras (namely, algebras admitting faithful irreducible representations) and algebras satisfying polynomial identities.

If time permits, we discuss approximations of the well-studied Koethe problem and in particular prove a stability result for certain radicals under suitable growth conditions.

We finally propose further questions and possible directions, which already stimulated new constructions of monomial algebras.

This talk is partially based on a joint work with A. Leroy, A. Smoktunowicz and M. Ziembowski.

The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n is one of the most fundamental tasks in string algorithms. Unfortunately, there is evidence that, for a given text and pattern, one cannot compute the exact Hamming distance for all locations in the text in time polynomially smaller than $n\sqrt m$. Nevertheless, Karloff showed that if one is willing to settle for a $1\pm\epsilon$ approximation, then it is possible to solve the problem with high probability in O~(n / \epsilon^2) time.

Due to related lower bounds for computing the Hamming distance of two strings in the one-way communication complexity model, it is strongly believed that obtaining an algorithm for solving the approximation version cannot be done much faster as a function of 1 / \epsilon. We will show that this belief is false by introducing a new O~(n / \epsilon) time algorithm that succeeds with high probability.

The main idea behind our algorithm, which is common in sparse recovery problems, is to reduce the variance of a specific randomized experiment by (approximately) separating heavy hitters from non-heavy hitters. However, while known sparse recovery techniques work very well on vectors, they do not seem to apply here, where we are dealing with mismatches between pairs of characters. We introduce two main algorithmic ingredients. The first is a new sparse recovery method that applies to pair inputs (such as in our setting). The second is a new construction of hash/projection functions, which makes it possible to count the number of projections that induce mismatches between two characters exponentially faster than by brute force. We expect these algorithmic techniques to be of independent interest.
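For reference, the quantity being approximated is given exactly by the following naive O(nm) baseline (the algorithm described above approximates each entry to within a $1\pm\epsilon$ factor in O~(n/\epsilon) total time):

```python
def sliding_hamming(text, pattern):
    """Exact Hamming distance between pattern and every length-m window of
    text: the naive O(n*m) baseline for the problem discussed above."""
    n, m = len(text), len(pattern)
    return [sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
            for i in range(n - m + 1)]

dists = sliding_hamming("abcabc", "abc")  # one distance per alignment
```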

We show a 2^{n+o(n)}-time algorithm for the Shortest Vector Problem on n-dimensional lattices (improving on the previous best-known algorithm of Micciancio and Voulgaris, which runs in time 4^{n+o(n)}). The algorithm uses the elementary yet powerful observation that, by properly combining samples from a Gaussian distribution over the lattice, we can produce exact samples from a narrower Gaussian distribution on the lattice. We use such a procedure repeatedly to obtain samples from an arbitrarily narrow Gaussian distribution over the lattice, allowing us to find a shortest vector.

Both the algorithm and the analysis are quite simple in hindsight. (The main technical tool is an identity on Gaussian measures with a simple geometric proof originally due to Riemann.) If time allows and interest merits, we will discuss a more technical variant of this algorithm that solves the Closest Vector Problem (a seemingly harder problem) in the same asymptotic running time.

Based on joint work with Divesh Aggarwal, Daniel Dadush, and Oded Regev. (See http://arxiv.org/abs/1412.7994 and http://arxiv.org/abs/1504.01995.)
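The combining step can be illustrated in the simplest possible setting, the one-dimensional lattice Z with cosets mod 2Z: averaging two samples from the same coset is again a lattice point, distributed (approximately) as a Gaussian narrower by a factor sqrt(2). A toy sketch, not the paper's algorithm; the width s and sample counts are arbitrary choices:

```python
import math
import random

def sample_discrete_gaussian_Z(s, rng, cut=40):
    """Sample x in Z with probability proportional to exp(-pi * x^2 / s^2)
    (truncated to [-cut, cut], harmless for s = 8)."""
    support = list(range(-cut, cut + 1))
    weights = [math.exp(-math.pi * x * x / (s * s)) for x in support]
    return rng.choices(support, weights=weights)[0]

rng = random.Random(0)
s = 8.0
samples = [sample_discrete_gaussian_Z(s, rng) for _ in range(20000)]

# Combining step: pair samples lying in the same coset mod 2Z and average.
# (x + y) // 2 is again an integer, with (approximately) half the variance.
averaged = []
for parity in (0, 1):
    cls = [x for x in samples if x % 2 == parity]
    averaged += [(x + y) // 2 for x, y in zip(cls[::2], cls[1::2])]

var_orig = sum(x * x for x in samples) / len(samples)
var_avg = sum(x * x for x in averaged) / len(averaged)
```

Repeating this pairing drives the width down geometrically, which is the engine behind the sampler in the actual (high-dimensional) algorithm.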

The Catalan numbers form a sequence of integers C_t. A collection of sets H_t with |H_t| = C_t for all t is called a Catalan set. Many examples of Catalan sets are known: the triangulations of the (t+2)-gon, the Dyck paths from (0,0) to (2t,0), and the nilpotent ideals in the Borel subalgebra of sl_t, to name but a few. In my talk I will present a new example of a Catalan set which has a remarkable property: for all t, H_t decomposes into a (non-disjoint) union of C_{t-1} distinct subsets, each of cardinality 2^{t-1}. Moreover, one may define certain interesting labelled graphs for H_t and obtain the above decomposition in a natural way. The subgraphs corresponding to the aforementioned subsets are labelled hypercubes with some edges missing. The motivation for this work was the study of the additive structure of the Kashiwara crystal B(infty).
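The counting behind the decomposition can be sanity-checked numerically: since the C_{t-1} subsets of size 2^{t-1} cover H_t non-disjointly, the inequality C_{t-1} * 2^{t-1} >= C_t must hold. A quick check (illustrative only):

```python
def catalan(t):
    """C_t via the recurrence C_{k+1} = C_k * 2(2k+1) / (k+2); C_0 = 1."""
    c = 1
    for k in range(t):
        c = c * 2 * (2 * k + 1) // (k + 2)
    return c

# |H_t| = C_t is covered (non-disjointly) by C_{t-1} subsets of size
# 2^{t-1}, so C_{t-1} * 2^{t-1} >= C_t is necessary.
checks = [catalan(t - 1) * 2 ** (t - 1) >= catalan(t) for t in range(1, 20)]
```

The inequality is comfortable: the ratio C_t / C_{t-1} = 2(2t-1)/(t+1) is below 4, while 2^{t-1} grows exponentially.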

We say that a system possesses a mixed dynamics if

1) it has infinitely many hyperbolic periodic orbits of all possible types (stable, unstable, saddle), and

2) the closures of the sets of orbits of different types have nonempty intersections.

Recall that Newhouse regions are open domains (in the space of smooth dynamical systems) in which systems with homoclinic tangencies are dense. Newhouse regions in which systems with mixed dynamics are generic (form residual subsets) are called *absolute Newhouse regions* or *Newhouse regions with mixed dynamics*. Their existence was proved in [1] for 2d diffeomorphisms close to a diffeomorphism with a nontransversal heteroclinic cycle containing two fixed (periodic) points with Jacobians less than and greater than 1. Fundamentally, "mixed dynamics" is a universal property of reversible chaotic systems. Moreover, in this case generic systems from absolute Newhouse regions have infinitely many stable, unstable, saddle and symmetric elliptic periodic orbits [2,3].

As is well known, reversible systems often arise in applications and can demonstrate chaotic orbit behavior. However, the phenomenon of mixed dynamics means that this chaos cannot be associated with a "strange attractor" or with "conservative chaos". Attractors and repellers here have a nonempty intersection containing symmetric orbits (elliptic and saddle ones) but do not coincide, since periodic sinks (sources) do not belong to the repeller (attractor). Therefore, "mixed dynamics" should be considered a new form of dynamical chaos, positioned between "strange attractor" and "conservative chaos".

These and related questions are discussed in the talk. The main attention is paid to the development of the concept of mixed dynamics for two-dimensional reversible maps; the main elements of this concept are presented below.

[1] S.V. Gonchenko, L.P. Shilnikov, D.V. Turaev. On Newhouse regions of two-dimensional diffeomorphisms close to a diffeomorphism with a nontransversal heteroclinic cycle. Proc. Steklov Inst. Math., 216 (1997), 70-118.

[2] J.S.W. Lamb, O.V. Stenkin. Newhouse regions for reversible systems with infinitely many stable, unstable and elliptic periodic orbits. Nonlinearity, 17(4) (2004), 1217-1244.

[3] A. Delshams, S.V. Gonchenko, V.S. Gonchenko, J.T. Lazaro, O.V. Sten'kin. Abundance of attracting, repelling and elliptic orbits in two-dimensional reversible maps. Nonlinearity, 26(1) (2013), 1-35.

In this talk, we study Demazure modules which occur in a level l irreducible integrable representation of an untwisted affine Lie algebra. We also assume that they are stable under the action of the standard maximal parabolic subalgebra of the affine Lie algebra. We prove that such a module is isomorphic to the fusion product of "prime" Demazure modules, where the prime factors are indexed by dominant integral weights which are either a multiple of l or take value less than l on all simple coroots. Our proof depends on a technical result which we prove in all the classical cases and G_2. We do not need any assumption on the underlying simple Lie algebra when the last "prime" factor is too small. This is joint work with Vyjayanthi Chari, Peri Shereen and Jeffrey Wand.

The Hamming and the edit metrics are two common notions of distance between pairs of strings x, y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertions, deletions, and bit flips needed to convert x into y, whereas the Hamming distance between x and y is the number of bit flips needed to convert x into y.

In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. This question was studied by Jowhari (ESA 2012) and is mainly motivated by two questions in communication complexity: the document exchange problem and deciding edit distance using a sketching protocol.

We show a randomized embedding with quadratic distortion. Namely, for any $x,y$ satisfying that their edit distance equals $k$, the Hamming distance between the embedding of $x$ and $y$ is $O(k^2)$ with high probability. This improves over the distortion ratio of $O(\log n \log^* n)$ obtained by Jowhari for small values of $k$. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input.
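The two metrics defined in the first paragraph, together with an example of why any such embedding must pay distortion: shifting a string by one position has edit distance 2 but near-maximal Hamming distance. A straightforward sketch (standard definitions, not the embedding of the talk):

```python
def hamming(x, y):
    """Number of positions where equal-length strings x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def edit_distance(x, y):
    """Levenshtein distance: insertions, deletions and substitutions."""
    m, n = len(x), len(y)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # delete x[i-1]
                         cur[j - 1] + 1,                        # insert y[j-1]
                         prev[j - 1] + (x[i - 1] != y[j - 1]))  # substitute
        prev = cur
    return prev[n]

# A one-position shift: tiny edit distance, near-maximal Hamming distance.
x = "0" + "01" * 8
y = "01" * 8 + "0"
```

Here y equals x with the leading "0" deleted and a "0" appended, so the edit distance is 2, while all but one of the 17 aligned positions disagree.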

19th century mathematicians (Gauss, Riemann, Markov, to name a few) spent a lot of their time doing tedious numerical computations. Sometimes they were assisted by (human) computers, but they still did a lot themselves. All this became unnecessary with the advent of computers, who made number-crunching a million times faster (and more reliable).

20th- and 21st-century mathematicians spent (and still spend) a lot of their time doing tedious symbolic computations. Thanks to the more recent advent of Computer Algebra Systems (e.g. Maple, Mathematica, and the free system SAGE), much of their labor can be delegated to computers, who, of course, can go much faster, much further, and more reliably.

But humans are still needed! First, to teach the computer how to crunch symbols efficiently, but, just as importantly, to inspire them to formulate general conjectures, and methods of proof, for which humans are (still) crucial. I will mention several examples, most notably, a recent proof, by (the human) Guillaume Chapuy, of a conjecture made with the help of my computer Shalosh B. Ekhad (who rigorously proved many special cases), generalizing, to multi-permutations, Amitai Regev's celebrated asymptotic formula for the number of permutations of length n avoiding an increasing subsequence of length d.

Cancer is a highly complex aberrant cellular state where mutations impact a multitude of signalling pathways operating in different cell types. In recent years it has become apparent that in order to understand and fight cancer, it must be viewed as a system, rather than as a set of cellular activities. This mind shift calls for new techniques that will allow us to investigate cancer as a holistic system. In this talk, I will discuss some of the progress made towards achieving such a system-level understanding using computer modelling of biological behaviours, also known as Executable Biology. I will concentrate on our recent attempts to better understand cancer through the following examples: 1) drug target optimization for Chronic Myeloid Leukaemia using an innovative platform called BioModelAnalyzer, which makes it possible to prove stabilization of biological systems; 2) dynamic hybrid modelling of Glioblastoma (brain tumour) development; 3) state-based modelling of cancer signalling pathways and their analysis using model-checking; and 4) synthesis of blood stem cell programs from single-cell gene expression data. Looking forward, inspired by David Harel’s Grand Challenge proposed a decade ago, I will propose a smaller grand challenge for computing and biology that could shed new light on our ability to control cell fates during development and disease and potentially change the way we treat cancer in the future.

The voter model on $\Z^d$ is a particle system that serves as a rough model for changes of opinions among social agents or, alternatively, competition between biological species occupying space. When $d \geq 3$, the set of (extremal) stationary distributions is a family of measures $\mu_\alpha$, for $\alpha$ between 0 and 1. A configuration sampled from $\mu_\alpha$ is a strongly correlated field of 0's and 1's on $\Z^d$ in which the density of 1's is $\alpha$.

We consider such a configuration as a site percolation model on $\Z^d$. We prove that if $d \geq 5$, the probability of existence of an infinite percolation cluster of 1's exhibits a phase transition in $\alpha$. If the voter model is allowed to have sufficiently spread-out interactions, we prove the same result for $d \geq 3$.

These results partially settle a conjecture of Bricmont, Lebowitz and Maes (1987).

Joint work with Daniel Valesin (University of Groningen).

We consider two stationary versions of the Eden model, on the upper half planar lattice, resulting in an infinite forest covering the half plane. Under weak assumptions on the weight distribution, and by relying on ergodic theorems, we prove that almost surely all trees are finite. Using the mass transport principle, we generalize the result to the Eden model on graphs of the form $G \times Z$, where G is a Cayley graph. This generalizes certain known results on the two-type Richardson model, in particular those of Deijfen and Häggström from 2007.

Several years ago I introduced Newton polytopes related to the potential counterexamples to the JC. This approach permitted us to obtain some additional information which, though interesting, was not sufficient to reach a contradiction. It seems that a contradiction can be obtained by comparing the Newton polytopes of the left and right sides of a (somewhat mysterious) equality G_x=-y_F.

In this talk, I will discuss the backward-in-time behavior of several nonlinear parabolic and dissipative evolution equations. This study is motivated by an investigation of the Bardos-Tartar conjecture on the 2D Navier-Stokes equations. Besides the rigorous mathematical treatment, we provide a physical interpretation of the mechanism of singularity formation, backward in time, for perturbations of the KdV equation. Finally, I will present the connection between the backward behavior and the energy spectra of the solutions.

This is a joint work with E. S. Titi.

In this talk we will implement the notion of a finite number of determining parameters for the long-time dynamics of the Navier-Stokes equations (NSE), such as determining modes, nodes, volume elements, and other determining interpolants, to design finite-dimensional feedback controls for stabilizing their solutions. The same approach is found to be applicable to data assimilation for weather prediction. In addition, we will show that the long-time dynamics of the NSE can be embedded in an infinite-dimensional dynamical system induced by an ordinary differential equation, named a *determining form*, which is governed by a globally Lipschitz vector field. The NSE are used as an illustrative example; all of the above-mentioned results hold equally for other dissipative evolution PDEs.

This is a joint work with A. Azouani, H. Bessaih, A. Farhat, C. Foias, M. Jolly, R. Kravchenko, E. Lunasin and E. Olson.

Let M be a closed Riemannian manifold. There are numerous results that establish the existence of various minimal objects on M, such as periodic geodesics, minimal surfaces, or geodesic nets. We will present some effective versions of these existence theorems.

For example, we will present upper bounds, in terms of the diameter, on the lengths of the three shortest simple periodic geodesics on a Riemannian 2-sphere, which can be interpreted as an effective version of the existence theorem of Lusternik and Schnirelmann. (Joint with Y. Liokumovich and A. Nabutovsky.)

Finding upper bounds for the size of the smallest stationary objects is closely related to the construction of "optimal" homotopies. We will show that if M is a closed surface of diameter d (with or without boundary), then any simple closed curve on M that can be contracted to a point over free loops of length less than L can also be contracted over based loops of length at most 3L+2d. (Joint with G. Chambers.)

Assortment planning is a major operational issue that arises in many industries, such as retailing, airlines and consumer electronics. Given a set of products that are differentiated by price, quality and possibly other attributes, one has to decide on the subset of products and the respective quantities that will be stocked and offered to heterogeneous customers, who exhibit substitution behavior.

The general problem can be shown to be NP-hard to approximate better than a factor linear in the number of products. In this talk we discuss how for a range of practically interesting special cases, one could design conceptually simple policies that admit provably near-optimal solutions. The analysis reveals interesting structural properties, including hidden submodularity and decomposition properties.

The talk is based on several papers, which are joint work with Ali Aouad, Vineet Goyal and Danny Segev.

We will discuss the validity of the mean ergodic theorem along left Følner sequences in a countable amenable group G. Although the weak ergodic theorem always holds along any left Følner sequence in G, we will provide examples where the mean ergodic theorem fails in quite dramatic ways. On the other hand, if G does not admit any ICC quotients, e.g. if G is virtually nilpotent, then we will prove that the mean ergodic theorem does indeed hold along any left Følner sequence. Based on joint work with M. Björklund (Chalmers).

We discuss the collective dynamics of systems driven by the “social engagement” of agents with their local neighbors. Canonical models are based on environmental averaging, with prototype examples in opinion dynamics, flocking, self-organization of biological organisms, and rendezvous in mobile networks. The large time behavior of such systems leads to the formation of clusters, and in particular, the emergence of “consensus of opinions”.

We propose an alternative paradigm, arguing that in many relevant scenarios social interactions involve the tendency of agents “to move ahead”. We introduce a new family of models for collective dynamics with tendency. The large time behavior of these new systems leads to the emergence of “leaders”.

The theory of structure and pseudo-randomness has been very influential in several areas of mathematics, such as number theory, graph theory and harmonic analysis. It has also been influential in theoretical computer science, with applications in complexity theory, cryptography and property testing. At a high level, it allows one to analyze arbitrary objects by decomposing them into a "structural" component and a "pseudo-random" component. The pseudo-random component behaves in many ways like random noise, while the structural component has a concise representation which makes it amenable to analysis and algorithmic manipulation.

In this talk, I will describe applications of this paradigm to coding theory. I will describe a new general approach to list decoding, which proceeds by decomposing an arbitrary received word into a structured word plus pseudo-random noise. This allows for a simplified analysis of the list decoding problem. In particular, I will describe how this approach leads to a resolution of a conjecture by Gopalan, Klivans and Zuckerman [STOC 2008], that the list decoding radius of Reed-Muller codes (in certain regimes) is equal to the minimal distance of the code.

Based on joint work with Abhishek Bhowmick.

Conventional cameras record all light falling onto their sensor regardless of the path that light followed to get there. In this talk I will present an emerging family of video cameras that can be programmed to record just a fraction of the light coming from a controllable source, based on the actual 3D path followed. Live video from these cameras offers a very unconventional view of our everyday world in which refraction and scattering can be selectively blocked or enhanced, visual structures too subtle to notice with the naked eye can become apparent, and object appearance can depend on depth.

I will discuss the unique optical properties and power efficiency of these "transport-aware" cameras, as well as their use for 3D shape acquisition, robust time-of-flight imaging, material analysis, and scene understanding. Last but not least, I will discuss their potential to become our field's "outdoor Kinect" sensor---able to operate robustly even in direct sunlight with very low power.

Kyros Kutulakos is a Professor of Computer Science at the University of Toronto. He received his PhD degree from the University of Wisconsin-Madison in 1994 and his BS degree from the University of Crete in 1988, both in Computer Science. In addition to the University of Toronto, he has held appointments at the University of Rochester (1995-2001) and Microsoft Research Asia (2004-05 and 2011-12). He is the recipient of an Alfred P. Sloan Fellowship, an Ontario Premier's Research Excellence Award, a Marr Prize in 1999, a Marr Prize Honorable Mention in 2005, and three other paper awards (CVPR 1994, ECCV 2006, CVPR 2014). He also served as Program Co-Chair of CVPR 2003, ICCP 2010 and ICCV 2013.

There are at least two aspects to learning: predicting the outcome of unseen events, and finding simple explanations of observed systems. We shall discuss two formal abstractions of these aspects: PAC learning and sample compression schemes. We shall start with an introduction to these notions, and then discuss the equivalence between them.

Based on a joint project with Shay Moran, Amir Shpilka and Avi Wigderson.

Consider a random coloring of a bounded domain in *Z^d*, with the probability of each coloring *F* proportional to *exp(−βN(F))*, where *β>0* is a parameter (representing the inverse temperature) and *N(F)* is the number of nearest-neighbor pairs colored by the same color. This is the anti-ferromagnetic 3-state Potts model of statistical physics, used to describe magnetic interactions in a spin system. The Kotecký conjecture is that in such a model, for *d≥3* and high enough *β*, a sampled coloring will typically exhibit long-range order, placing the same color at most of either the even or the odd vertices of the domain. We give the first rigorous proof of this fact for large *d*. This extends previous works of Peled and of Galvin, Kahn, Randall and Sorkin, who treated the case *β=∞*.

The main ingredient in our proof is a new structure theorem for 3-colorings which characterizes the ways in which different "phases" may interact, putting special emphasis on the role of edges connecting vertices of the same color. We also discuss several related conjectures. No background in statistical physics will be assumed and all terms will be explained thoroughly.
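The Gibbs distribution above can be sampled (inefficiently, and with none of the guarantees discussed in the talk) by a standard Metropolis chain. The sketch below runs on a small 2D grid for simplicity, although the conjecture concerns d ≥ 3; the grid size, inverse temperature and step count are illustrative choices, not values from the work itself.

```python
import math
import random

def energy_delta(sigma, nbrs, v, new_color):
    """Change in N(F) (number of same-colored neighbor pairs) if v is recolored."""
    old = sigma[v]
    return sum((sigma[u] == new_color) - (sigma[u] == old) for u in nbrs[v])

def metropolis_potts(L=8, beta=2.0, steps=20000, q=3, seed=0):
    """Metropolis sampler for the q-state antiferromagnetic Potts model on an
    L x L grid: P(F) is proportional to exp(-beta * N(F))."""
    rng = random.Random(seed)
    sites = [(i, j) for i in range(L) for j in range(L)]
    nbrs = {(i, j): [(i + di, j + dj)
                     for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= i + di < L and 0 <= j + dj < L]
            for (i, j) in sites}
    sigma = {v: rng.randrange(q) for v in sites}
    for _ in range(steps):
        v, c = rng.choice(sites), rng.randrange(q)
        d = energy_delta(sigma, nbrs, v, c)
        # accept with probability min(1, exp(-beta * d))
        if d <= 0 or rng.random() < math.exp(-beta * d):
            sigma[v] = c
    return sigma
```

At large beta the chain favors colorings with few same-colored neighboring pairs, which is the regime the conjecture addresses.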

Joint work with Ohad Feldheim.

We generalize the technique of smoothed analysis to distributed algorithms in dynamic network models. Whereas standard smoothed analysis studies the impact of small random perturbations of input values on algorithm performance metrics, dynamic graph smoothed analysis studies the impact of random perturbations of the underlying changing network graph topologies. Similar to the original application of smoothed analysis, our goal is to study whether known strong lower bounds in dynamic network models are robust or fragile: do they withstand small (random) perturbations, or do such deviations push the graphs far enough from a precise pathological instance to enable much better performance? Fragile lower bounds are likely not relevant for real-world deployment, while robust lower bounds represent a true difficulty caused by dynamic behavior. We apply this technique to three standard dynamic network problems with known strong worst-case lower bounds: random walks, flooding, and aggregation. We prove that these bounds provide a spectrum of robustness when subjected to smoothing---some are extremely fragile (random walks), some are moderately fragile / robust (flooding), and some are extremely robust (aggregation).
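As a toy illustration of what a smoothing operation on a dynamic graph might look like, the sketch below perturbs one round's graph by adding a few uniformly random missing edges. This formalization is an assumption for illustration only; the smoothing model analyzed in the work is richer (for example, it may also allow deletions and enforce connectivity constraints).

```python
import random

def smooth_graph(edges, n, k, rng):
    """k-smoothing sketch: perturb one round's graph on vertices 0..n-1 by
    adding up to k uniformly random missing edges.  (Illustrative only; the
    actual smoothing model of the paper differs in its details.)"""
    E = {frozenset(e) for e in edges}
    missing = [frozenset((u, v)) for u in range(n) for v in range(u + 1, n)
               if frozenset((u, v)) not in E]
    E.update(rng.sample(missing, min(k, len(missing))))
    return sorted(tuple(sorted(e)) for e in E)
```

Re-running a lower-bound construction on such perturbed rounds is what distinguishes a fragile bound (performance improves after smoothing) from a robust one.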

Joint work with Jeremy Fineman (Georgetown), Seth Gilbert (National University of Singapore), and Calvin Newport (Georgetown).

Many scientific and engineering problems are challenged by the fact that they involve functions of a very large number of variables. Such problems arise naturally in signal recovery, image processing, learning theory, etc. In addition to the numerical difficulties due to the so-called curse of dimensionality, the resulting optimization problems are often nonsmooth and nonconvex.

We shall survey some of our recent results, illustrating how these difficulties may be handled in the context of well-structured optimization models, highlighting the ways in which problem structures and data information can be beneficially exploited to devise and analyze simple and efficient algorithms.

Consider a one-dimensional semi-infinite system of Brownian particles, started at a Poisson(L) point process on the positive half-line, with the left-most (Atlas) particle endowed with a unit drift to the right. We show that for the equilibrium density (L=2), the asymptotic Gaussian space-time particle fluctuations are governed by the stochastic heat equation with Neumann boundary condition at zero. As a by-product we resolve a conjecture of Pal and Pitman (2008) about the asymptotic (random) fBM trajectory of the Atlas particle.

In a complementary work, we derive and explicitly solve the Stefan (free-boundary) equations for the limiting particle profile when starting at an out-of-equilibrium density (L other than 2). We thus determine the corresponding (non-random) asymptotic trajectory of the Atlas particle.

This talk is based on joint works with Li-Cheng Tsai, Manuel Cabezas, Andrey Sarantsev and Vladas Sidoravicius.

The Ising model is an archetypical model of the order-disorder phase transition: though simple to formulate, it exhibits complex behavior, much like real-world phenomena in solid-state physics, ferromagnetism, chemistry, biology and computer science.

In 1920 the physicist Wilhelm Lenz proposed to model ferromagnetic materials by a lattice of plus-minus spins with interacting neighbors. His student Ernst Ising solved the model in dimension one four years later. The one-dimensional behavior turned out to be trivial, with no phase transition as the interaction strength changes, and for a decade people searched for other possible models. However, a ferromagnetic phase transition was then established in higher dimensions by Rudolf Peierls, and in 1944 Lars Onsager famously calculated the free energy of the Ising model in two dimensions.

Since then the Ising model has become widely studied, as it exhibits complicated phase transition behavior, yet allows a very detailed analysis in dimension two. We will give a historical introduction to the model, and then describe some recent results.

Let *Q* be a Stein space and *L* a complex Lie group. Then Grauert's Oka Principle states that the canonical map from the isomorphism classes of holomorphic principal *L*-bundles over *Q* to the isomorphism classes of topological principal *L*-bundles over *Q* is a bijection. In particular he showed that if *P*, *P'* are holomorphic principal *L*-bundles and φ is a topological isomorphism between them, then there is a homotopy of topological isomorphisms starting at φ and ending at a holomorphic isomorphism.

Let *X* and *Y* be Stein *G*-manifolds, where *G* is a reductive complex Lie group. Then there is a quotient Stein space Q_{X} and a quotient morphism p_{X}: X → Q_{X}; similarly we have p_{Y}: Y → Q_{Y}.

Suppose that Φ: X → Y is a *G*-biholomorphism. Then the induced mapping φ: Q_{X} → Q_{Y} has the following property: for any q in Q_{X}, the fiber over q is *G*-isomorphic to the fiber over φ(q) (the fibers are actually affine *G*-varieties). We say that φ is admissible. Now given an admissible φ, assume that we have a *G*-equivariant homeomorphism Φ: X → Y lifting φ. Our goal is to establish an Oka principle, saying that Φ has a deformation Φ_t with Φ_0 = Φ and Φ_1 biholomorphic.

We establish this in two main cases. One case is where Φ is a diffeomorphism that restricts to *G*-isomorphisms on the reduced fibers of p_{X} and p_{Y}. The other case is where Φ restricts to *G*-isomorphisms on the fibers and *X* satisfies an auxiliary condition, which usually holds. Finally, we give applications to the Holomorphic Linearization Problem. Let *G* act holomorphically on *X*. When is there a change of coordinates such that the action of *G* becomes linear? We prove that this is true, for *X* satisfying the same auxiliary condition as before, if and only if the quotient Q_{X} is admissibly biholomorphic to the quotient of a *G*-module *V*.

The subject of this talk is the problem of estimating the service time distribution of the $M/G/\infty$ queue from incomplete data on the queue. The goal is to estimate $G$ from observations of the queue-length process at the points of a regular grid on a fixed time interval. We propose an estimator and analyze its accuracy over a family of target service time distributions. The original $M/G/\infty$ problem is closely related to the problem of estimating derivatives of the covariance function of a stationary Gaussian process. We consider the latter problem and derive lower bounds on the minimax risk. The obtained results strongly suggest that the proposed estimator of the service time distribution is rate optimal.
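The observation scheme can be simulated in a few lines, assuming nothing beyond the standard M/G/∞ dynamics (the estimator of the talk is not reproduced here; the function name is ours):

```python
import random

def mg_infinity_queue(lam, sample_service, T, grid, rng):
    """Simulate an M/G/infinity queue on [0, T]: Poisson(lam) arrivals, i.i.d.
    service times drawn by sample_service(rng), infinitely many servers.
    Returns the queue length observed at the given grid points -- the kind of
    incomplete data from which G is to be estimated."""
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(lam)          # exponential inter-arrival times
        if t > T:
            break
        arrivals.append(t)
    departures = [a + sample_service(rng) for a in arrivals]
    return [sum(1 for a, d in zip(arrivals, departures) if a <= s < d)
            for s in grid]
```

The returned counts are exactly the grid observations described above: the number of customers in service at each sampling time.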

Let *G* be a connected reductive algebraic group defined over a field *k* of characteristic not 2, θ an involution of *G* defined over *k*, *H* a *k*-open subgroup of the fixed point group of θ, and G_{k} (resp. H_{k}) the set of *k*-rational points of *G* (resp. *H*). The homogeneous space X_{k}:=G_{k}/H_{k} is a generalization of a real reductive symmetric space to arbitrary fields and is called a generalized symmetric space.

Orbits of parabolic *k*-subgroups on these generalized symmetric spaces occur in various situations, but are especially of importance in the study of representations of G_{k} related to X_{k}. In this talk we present a number of structural results for these parabolic *k*-subgroups that are of importance for the study of these generalized symmetric spaces and their applications.

Consider a random finite metric space X given by sampling n points in the unit interval uniformly, and a deterministic finite metric space U given by placing n points in the unit interval at uniform spacing. With high probability, X will contain some pairs of points at distance roughly 1/n^2, so any bijection from X to U must distort distances by a factor of roughly n. However, with high probability, two of these random spaces, X_1 and X_2, admit a bijection which distorts distances by a factor of only about n^2/3. The exponent 2/3 is optimal.
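The quantities involved are easy to explore empirically. The sketch below computes the distortion of the order-preserving bijection between two point sets on the line; this need not be the optimal bijection, so it only upper-bounds the best achievable distortion. All names here are ours.

```python
import random

def distortion(A, B):
    """Bi-Lipschitz distortion of the order-preserving bijection between two
    equal-size point sets on the line: max expansion times max contraction."""
    A, B = sorted(A), sorted(B)
    pairs = [(i, j) for i in range(len(A)) for j in range(i + 1, len(A))]
    expand = max((B[j] - B[i]) / (A[j] - A[i]) for i, j in pairs)
    contract = max((A[j] - A[i]) / (B[j] - B[i]) for i, j in pairs)
    return expand * contract

rng = random.Random(0)
n = 50
X = [rng.random() for _ in range(n)]        # a random space as in the abstract
U = [(i + 1) / n for i in range(n)]         # the uniformly spaced space
```

Comparing `distortion(X, U)` for growing n illustrates the roughly linear-in-n distortion forced by the 1/n^2-close pairs in X.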

Given an input $x$ and a search problem $F$, local computation algorithms (LCAs) implement query access to specified locations of a legal output $y \in F(x)$, using polylogarithmic time and space. Previous work on LCAs restricted its focus to graphs of bounded degree, or to graphs whose degree is binomially distributed with bounded expectation.

Using a new palette of techniques, we show that it is possible to obtain LCAs for maximal independent set (MIS) and maximal matching (MM) on trees with degree bounded by $\mathrm{polylog}(n)$. Our result immediately extends to all graphs with degree bounded by $\mathrm{polylog}(n)$, as long as they do not contain short cycles (of length $\mathrm{polylog}(n)$).

We define a family of graphs called $d$-light graphs, and show how to convert a large class of online algorithms (including those for MIS and MM) to LCAs on these graphs. We then show that applying the MIS (or MM) LCA on appropriately selected $d$-light subgraphs allows us to quickly address all of the vertices of the $\mathrm{polylog}(n)$-degree graph.

In addition to expanding the range of LCAs to graphs of polylogarithmic degree, our new techniques significantly improve the running times and space requirements, expand the family of graphs, and better delineate the family of algorithms to which previous results apply. Furthermore, our proofs are simpler and more concise than previous proof methods.

Joint work with Omer Reingold.

We associate to a full flag F in an n-dimensional variety X over a field k a "symbol map" $\mu_F : K(F_X) \to \Sigma^n K(k)$. Here, F_X is the field of rational functions on X, and K(.) is the K-theory spectrum.

We prove a "reciprocity law" for these symbols: given a partial flag, the sum of all symbols of full flags refining it is 0. Examining this result on the level of K-groups, we derive the following known reciprocity laws: the degree of a principal divisor is zero, the Weil reciprocity law, the residue theorem, the Contou-Carrère reciprocity law (when X is a smooth complete curve), as well as the Parshin reciprocity law and the higher residue reciprocity law (when X is higher-dimensional).

This is a joint work with Evgeny Musicantov.

The interplay between **algorithms** and **proofs** has been one of the most fascinating and fruitful themes in theoretical computer science. In this talk I will describe a different connection between these two concepts: a general methodology for algorithm design based on an automatic transformation of a proof into an algorithm. This methodology yields a systematic way to design and analyze algorithms across many domains. In particular, we and others have used it for problems in combinatorial optimization, machine learning, and quantum information theory, and it shows promise for making progress on important open questions such as settling Khot's Unique Games Conjecture. I will demonstrate this methodology by presenting an algorithm for the Sparse Coding / Dictionary Learning problem that handles much more general inputs than was previously known, and an algorithm for the Unique Games problem that can solve all the previously known candidate hard instances. Key to our approach is the notion of "pseudo-distributions", which are objects that are emphatically different from actual distributions but behave like them in the eyes of low-degree polynomials. We use these pseudo-distributions to "lift" a proof into an algorithm via the Shor-Parrilo-Lasserre "Sum-of-Squares" semidefinite programming hierarchy. The talk is based on several joint works with (varying subsets of) Fernando Brandao, Aram Harrow, Jonathan Kelner, David Steurer and Yuan Zhou.

Computational anatomy considers spaces of shapes (e.g. medical images) endowed with a Riemannian metric (of Sobolev type). The area blends techniques from differential geometry (geometric mechanics), analysis and statistics. The EPDiff equation, which is basically an extension of Euler's equation without the incompressibility assumption, is often used to match shapes. Time-varying images (4DCA) are one of the current research themes in the area. In longitudinal studies (say, for Alzheimer's disease), snapshots at given times are interpolated/regressed, and the problem arises of comparing two such sequences for classification purposes. For some background, see the recent workshop http://www.mat.univie.ac.at/~shape2015/schedule.html .

In this talk we discuss finite-dimensional examples using landmarks. A short process is interpreted as a tangent vector in the space of images. This leads to control problems whose state space is a tangent bundle. Usually, cubic Riemannian splines are taken, i.e., one minimizes the norm of the acceleration vector over paths connecting two tangent vectors in a fixed time. We propose, as an alternative to cubic splines, the time-minimal problem under bounded acceleration (morally, bounded norm of the acceleration). We suggest that both spline problems are completely integrable in the Arnold-Liouville sense. Along the way, we present general technical results about the underlying symplectic structures of control problems whose state space has a bundle structure.

This is joint ongoing work with Paula Balseiro, Alejandro Cabrera, and Teresa Stuchi.

In many settings, people exhibit behavior that is inconsistent across time — we allocate a block of time to get work done and then procrastinate, or put effort into a project and then later fail to complete it. An active line of research in behavioral economics and related fields has developed and analyzed models for this type of time-inconsistent behavior.

Here we propose a graph-theoretic model of tasks and goals, in which dependencies among actions are represented by a directed graph, and a time-inconsistent agent constructs a path through this graph. We first show how instances of this path-finding problem on different input graphs can reconstruct a wide range of qualitative phenomena observed in the literature on time-inconsistency, including procrastination, abandonment of long-range tasks, and the benefits of reduced sets of choices. We then explore a set of analyses that quantify over the set of all graphs; among other results, we find that in any graph, there can be only polynomially many distinct forms of time-inconsistent behavior; and any graph in which a time-inconsistent agent incurs significantly more cost than an optimal agent must contain a large “procrastination” structure as a minor. Finally, we use this graph-theoretic model to explore ways in which tasks can be designed to help motivate agents to reach designated goals.
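One standard way to formalize such an agent can be sketched as follows (the exact cost model of the talk may differ; here, as an assumption for illustration, the agent inflates the immediate edge cost by a bias factor b ≥ 1, values the remaining shortest path at face value, and re-plans at every node):

```python
from math import inf

def shortest_dist(graph, t):
    """Memoized shortest-path distance to target t in a DAG given as
    {v: [(successor, edge_cost), ...]}."""
    memo = {t: 0.0}
    def d(v):
        if v not in memo:
            memo[v] = min((c + d(u) for u, c in graph.get(v, [])), default=inf)
        return memo[v]
    return d

def biased_path(graph, s, t, b):
    """Route taken by a present-biased agent: at each node it perceives the
    immediate edge cost inflated by b, values the rest of the route by true
    shortest-path distance, moves one step, and re-plans."""
    d = shortest_dist(graph, t)
    path, v = [s], s
    while v != t:
        v = min(graph[v], key=lambda e: b * e[1] + d(e[0]))[0]
        path.append(v)
    return path
```

On a graph where a direct edge costs 10 but a detour costs 1 now and 12 later, an unbiased agent goes directly while a biased agent (b = 2) prefers the cheap first step and ends up paying 13 in total, a toy instance of the cost blow-up the talk analyzes.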

This is joint work with Jon Kleinberg.

The goal of this work is to produce a 'full interpretation' of object images, namely to identify and localize all semantic features and parts that are recognized by human observers. We develop a novel approach and tools to study this challenging task, by dividing the interpretation of the complete object into the interpretation of so-called 'minimal recognizable configurations': severely reduced but recognizable local regions that are minimal in the sense that any further reduction would render them unrecognizable. We show that for the task of full interpretation, such minimal images have unique properties, which make them particularly useful.

For modeling interpretation, we identify primitive components and relations that play a useful role in the interpretation of minimal images by humans, and incorporate them in a structured prediction algorithm. The structure elements can be point, contour, or region primitives, while relations between them range from standard unary and binary potentials based on relative location to more complex and high-dimensional relations. We show experimental results and compare them to human performance. We discuss the implications of 'full' interpretation for difficult visual tasks, such as recognizing human activities or interactions.

For germs of holomorphic functions $f : \mathbf{C}^{m+1} \to \mathbf{C}$, $g : \mathbf{C}^{n+1} \to \mathbf{C}$ having an isolated critical point at 0 with value 0, the classical Thom-Sebastiani theorem describes the vanishing cycles group $\Phi^{m+n+1}(f \oplus g)$ (and its monodromy) as a tensor product $\Phi^m(f) \otimes \Phi^n(g)$, where $(f \oplus g)(x,y) = f(x) + g(y), x = (x_0,...,x_m), y = (y_0,...,y_n)$. I will discuss algebraic variants and generalizations of this result over fields of any characteristic, where the tensor product is replaced by a certain local convolution product, as suggested by Deligne. The main theorem is a Künneth formula for $R\Psi$ in the framework of Deligne's theory of nearby cycles over general bases.

We consider the following problem: given a set of algebraic conditions on an $n$-tuple of functions and their first $l$ derivatives, admitting finitely many solutions (in a differentiably closed field), can one give an upper bound for the number of solutions?

I will present estimates in terms of the degrees of the algebraic conditions, or more generally the volumes of their Newton polytopes (analogous to the Bezout and BKK theorems). The estimates are singly-exponential with respect to $n,l$ and have the natural asymptotic with respect to the degrees or Newton polytopes, sharpening previous doubly-exponential estimates due to Hrushovski and Pillay. I will also discuss some diophantine applications to counting transcendental lattice points on algebraic varieties.

I will describe recent applications of "birthday repetition" that can give (conditional) quasi-polynomial time hardness results for Densest k-Subgraph and for $\epsilon$-Nash in 2-player games. (For the latter result we use a non-standard "PCP for PPAD" assumption which I will also discuss.) Both results are tight by [FS97], [LMM03], [Barman15].

Based on:

DkS: http://eccc.hpi-web.de/report/2015/074/

(joint work with Mark Braverman, Young Kun Ko, and Omri Weinstein)

Nash: http://arxiv.org/abs/1504.02411

(joint work with Yakov Babichenko and Christos Papadimitriou)

In the famous KKL (Kahn-Kalai-Linial) paper of 1988, the authors "imported" to combinatorics and theoretical computer science a hypercontractive inequality known as Beckner's inequality (proven first, independently, by Gross and Bonami). This inequality has since become an extremely useful and influential tool, used in dozens of papers in a wide variety of settings. In many cases no proofs are known that do not use the inequality.
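For reference, the inequality in question can be stated as follows, in a standard formulation for the noise operator on the Boolean cube (included here for the reader's convenience):

$$\|T_\rho f\|_q \le \|f\|_p \qquad \text{for all } f:\{-1,1\}^n \to \mathbb{R}, \text{ whenever } 1 \le p \le q \text{ and } 0 \le \rho \le \sqrt{\tfrac{p-1}{q-1}},$$

where $(T_\rho f)(x) = \mathbb{E}[f(y)]$ and each coordinate $y_i$ independently equals $x_i$ with probability $\tfrac{1+\rho}{2}$ and $-x_i$ otherwise.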

In this talk I'll try to illuminate the information-theoretic nature of both the inequality and its dual, touch upon a proof of the dual version from about a decade ago (joint with V. Rodl), and present a fresh (and unrelated) information-theoretic proof of the primal version.

No prior knowledge will be assumed regarding discrete Fourier analysis, entropy, or hypercontractivity.

Research can be viewed as a search for an accepting state in an exponential search space. When we look in hindsight, after we find an accepting state, can we identify a much shorter path than the one discovered by trial and error as the search actually proceeded?

In this talk I'll show that this is the case for distributed computability: the discovery that different distributed problems have different levels of difficulty, and the identification of the weakest model of distributed computation that allows one to solve a problem. I'll explain the essence of 40 years of research in an hour, by showing that if the right questions had been asked at the right time, all the results could have been obtained in a span of time an order of magnitude shorter.

Some of the major ideas in the talk were developed in works with Afek (TAU), Borowsky (Akamai), Lynch (MIT), and Rajsbaum (UNAM).

A common approach to statistical learning on big data is to randomly split it among m machines and calculate the parameter of interest by averaging their m individual estimates.

Focusing on empirical risk minimization, or equivalently M-estimation, we study the statistical error incurred by this strategy.

We consider two asymptotic settings: one where the number of samples per machine n → ∞ while the number of parameters p is fixed, and a second high-dimensional regime where both p, n → ∞ with p/n → κ.

Most previous works provided only moment bounds on the error incurred by splitting the data in the fixed p setting. In contrast, we present for both regimes asymptotically exact distributions for this estimation error. In the fixed-p setting, under suitable assumptions, we thus prove that to leading order, averaging is as accurate as the centralized solution. In the high-dimensional setting, we show a qualitatively different behavior: data splitting does incur a first order accuracy loss, which we quantify precisely. In addition, our asymptotic distributions allow the construction of confidence intervals and hypothesis testing on the estimated parameters.

Our main conclusion is that in both regimes, averaging parallelized estimates is an attractive way to speedup computations and save on memory, while incurring a quantifiable and typically moderate excess error.
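The split-and-average strategy itself is a few lines of code. With the sample mean (the least-squares location estimate) and equal shard sizes, the averaged estimate coincides exactly with the centralized one; for nonlinear M-estimators the two differ, and quantifying that gap is precisely the subject of the talk. The helper name below is ours.

```python
import statistics

def split_and_average(data, m, estimator):
    """Split the sample across m machines (round-robin shards), compute the
    estimate on each shard, and average the m individual estimates."""
    shards = [data[i::m] for i in range(m)]
    return statistics.fmean(estimator(shard) for shard in shards)
```

Swapping in a robust estimator (e.g. a Huber M-estimate) for `estimator` gives a case where averaging no longer matches the centralized solution exactly.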

The spectral flow is a well-known invariant of a one-parameter family of self-adjoint Fredholm operators. It is defined as the net number of eigenvalues of the operators passing through 0 as the parameter changes.

Let S be a compact surface with non-empty boundary. Consider the space Ell(S) of first-order self-adjoint elliptic differential operators on S with local boundary conditions. The first part of the talk is devoted to computing the spectral flow along loops in Ell(S), and also along paths with conjugate ends.

After that we consider a more general situation: a family of elements of Ell(S) parameterized by points of a compact space X. We define the topological index of such a family and show that it coincides with the analytical index of the family. Both indices take values in K^1(X). When X is a circle, this result specializes to the formula for the spectral flow from the first part of the talk.

The PCP theorem (AS,ALMSS 1991) guarantees that every NP language has a Probabilistically Checkable Proof (PCP) system allowing a verifier to check a witness very efficiently using randomness, and allowing for small error.

Most of the talk will not assume prior knowledge, but I will also devote some time to some recent work joint with Harsha and Kindler.

In this work we make (some) progress towards proving the so-called "sliding-scale conjecture". This is a conjecture of BGLR from 1993 about the tradeoff between the number of bits read from the PCP proof and the error of the verifier.

Our work revisits older constructions and analyzes them using the more modern "modular-composition" approach.

Based on joint work with Prahladh Harsha and Guy Kindler.

Branching processes have been the subject of intense and fascinating study for a long time. In my talk I will present two problems in order to highlight their rich structure and the various technical approaches at the intersection of probability and analysis.

Firstly, I will present results concerning a branching random walk with a time-inhomogeneous branching law. We consider a system of particles which at the end of each time unit produce offspring randomly and independently. The branching law, determining the number and locations of the offspring, is the same for all particles in a given generation. Until recently, a common assumption was that the branching law does not change over time. In a pioneering work, Fang and Zeitouni (2010) considered a process with two macroscopic time intervals with different branching laws. In my talk I will present results for branching laws that vary at mesoscopic and microscopic scales. In arguably the most interesting case, when the branching law is sampled randomly at every step, I will present a quenched result with detailed asymptotics of the maximal particle. Interestingly, the disorder has a slowing-down effect manifesting itself at the log level.

Secondly, I will turn to the classical branching Brownian motion. Let us assume that particles move according to a Brownian motion with drift μ and split with intensity 1. It is well-known that for μ ≥ √2 the system escapes to infinity, thus the overall minimum is well-defined. In order to understand it better, we modify the process so that particles are absorbed at position 0. I will present results concerning the law of the number N of absorbed particles. In particular I will concentrate on P(N=0) and the maximal exponential moment of N. This reveals new deep connections with the FKPP equation. Finally, I will also consider −√2 < μ < √2 and N_{x,t}, the number of particles absorbed until time t when the system starts from x. In this case I will show convergence to the travelling wave solution of the FKPP equation for an appropriate choice of x, t → ∞.
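The absorbed process can be explored numerically with a crude Euler scheme (time step, horizon, seed and population cap below are arbitrary illustrative choices, not part of the talk's analysis):

```python
import random

def absorbed_count(x0, mu, T=10.0, dt=0.01, seed=0, max_pop=10000):
    """Euler scheme for branching Brownian motion with drift mu, binary
    branching at rate 1 and absorption at 0.  Returns the number of particles
    absorbed by time T, starting from one particle at x0 > 0."""
    rng = random.Random(seed)
    particles, absorbed, t = [x0], 0, 0.0
    while t < T and particles and len(particles) < max_pop:
        nxt = []
        for x in particles:
            x += mu * dt + rng.gauss(0.0, dt ** 0.5)   # Brownian increment
            if x <= 0:
                absorbed += 1                          # particle is absorbed
                continue
            nxt.append(x)
            if rng.random() < dt:                      # branch at rate ~1
                nxt.append(x)
        particles, t = nxt, t + dt
    return absorbed
```

Repeating such runs for drifts above and below √2 gives a rough picture of the regimes discussed above, though the scheme says nothing about the fine FKPP asymptotics.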

These results were obtained jointly with B. Mallein, and with J. Berestycki, E. Brunet and S. Harris, respectively.

In this talk I will present a function field analogue of a conjecture in number theory. This conjecture is a combination of several famous conjectures, including the Hardy-Littlewood prime tuple conjecture, conjectures on the number of primes in arithmetic progressions and in short intervals, and the Goldbach conjecture. I prove an asymptotic formula for the number of simultaneous prime values of *n* linear functions, in the limit of a large finite field.

A key role is played by the computation of some Galois groups.

This talk will describe how Lagrangian particle methods are being used to compute the dynamics of fluid vortices. In these methods the flow map is represented by moving particles that carry vorticity, the velocity is recovered by the Biot-Savart integral, and a tree code is used to reduce the computation time from O(N^2) to O(N log N), where N is the number of particles. I'll present vortex sheet computations in 2D with reference to Kelvin-Helmholtz instability, the Moore singularity, spiral roll-up, and chaotic dynamics. Other examples include vortex rings in 3D, and vortex dynamics on a rotating sphere.
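The O(N^2) baseline that a tree code accelerates is the direct summation of the Biot-Savart kernel over all particle pairs. A minimal 2D sketch (the smoothing parameter `delta` is a common vortex-blob regularization, chosen here for illustration):

```python
import numpy as np

def biot_savart_2d(pos, gamma, delta=0.1):
    """Velocity induced at each 2D point vortex by direct summation of the
    (regularized) Biot-Savart kernel: u_i = -sum_j G_j (y_i-y_j)/(2 pi r^2),
    v_i = sum_j G_j (x_i-x_j)/(2 pi r^2).  This is the O(N^2) computation
    that a tree code reduces to O(N log N)."""
    dx = pos[:, 0:1] - pos[:, 0:1].T
    dy = pos[:, 1:2] - pos[:, 1:2].T
    r2 = dx**2 + dy**2 + delta**2          # blob smoothing avoids r = 0
    u = -np.sum(gamma[None, :] * dy / r2, axis=1) / (2 * np.pi)
    v = np.sum(gamma[None, :] * dx / r2, axis=1) / (2 * np.pi)
    return np.stack([u, v], axis=1)

# two equal-strength vortices rotate about their midpoint
pos = np.array([[-0.5, 0.0], [0.5, 0.0]])
gamma = np.array([1.0, 1.0])
vel = biot_savart_2d(pos, gamma)
```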

Motivated by the goal of securely searching and updating distributed data, we introduce the notion of function secret sharing (FSS), a form of "additive secret sharing" for *functions* f: {0,1}^n → G, where G is an abelian group.

An m-party FSS scheme for function class F allows one to split any function f from F into m succinctly described functions f_i, such that: (1) for every input x, f(x) is equal to the sum of evaluations \sum_i f_i(x), and (2) any strict subset of "share functions" f_i hides f. FSS provides a natural generalization of distributed point functions, as introduced by (Gilboa-Ishai Eurocrypt 2014), which coincide with the special case of two parties and the class F of point functions (which evaluate to 0 at all but one point).
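The correctness property (1) can be illustrated by the trivial, table-based additive sharing of a point function over Z_{2^32}; unlike a real FSS scheme the shares below are not succinct, and the group and names are illustrative only.

```python
import secrets

GROUP_ORDER = 2**32  # work in the abelian group Z_{2^32}

def share_point_function(alpha, beta, domain_size):
    """Split the point function f (f(alpha)=beta, f(x)=0 elsewhere) into
    two additive shares with f(x) = f1(x) + f2(x) mod GROUP_ORDER.
    Each share alone is a uniformly random table, so it hides f; a real
    FSS/DPF scheme achieves the same with succinct share descriptions."""
    f1 = [secrets.randbelow(GROUP_ORDER) for _ in range(domain_size)]
    f2 = [(-v) % GROUP_ORDER for v in f1]        # cancel f1 everywhere...
    f2[alpha] = (beta - f1[alpha]) % GROUP_ORDER  # ...except at alpha
    return f1, f2

f1, f2 = share_point_function(alpha=5, beta=7, domain_size=16)
recovered = [(a + b) % GROUP_ORDER for a, b in zip(f1, f2)]
```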

We present two types of results:

- We obtain efficiency improvements and extensions of the original distributed point function construction.

- We then initiate a systematic study of general FSS, providing constructions for richer function classes, and establishing relations with other cryptographic primitives.

Joint work with Niv Gilboa and Yuval Ishai.

The field of Machine Learning has been making huge strides recently. Problems such as visual recognition and classification, which were believed to be open only a few years ago, now seem solvable. The best performers use Artificial Neural Networks, in their reincarnation as "Deep Learning", where huge networks are trained over lots of data. One bottleneck in current schemes is the huge amount of computation required during both training and testing. This limits the usability of these methods when power is an issue, such as with wearable devices.

As a step towards deeper understanding of deep learning mechanisms, I will show how correct conditioning of the back-propagation training iterations results in much improved convergence. This reduces training time while providing better results. It also allows us to train smaller models, which are harder to optimize.

In this talk I will also discuss the challenges - and describe some of the solutions - in applying Machine Learning on a mobile device that can fit your pocket. The OrCam is a wearable camera that speaks to you. It reads anything, learns and recognizes faces, and much more. It is ready to help through the day, all with a simple pointing gesture. It is already improving the lives of many blind and visually impaired people.

The convex feasibility problem (CFP) is at the core of the modeling of many problems in various areas of science. Subgradient projection methods are important tools for solving the CFP because they enable the use of subgradient calculations instead of orthogonal projections onto the individual sets of the problem. Working in a real Hilbert space, we show that the sequential subgradient projection method is perturbation resilient. By this we mean that under appropriate conditions the sequence generated by the method converges weakly, and sometimes also strongly, to a point in the intersection of the given subsets of the feasibility problem, despite certain perturbations which are allowed in each iterative step. Unlike previous works on solving the convex feasibility problem, the involved functions, which induce the feasibility problem's subsets, need not be convex. Instead, we allow them to belong to a wider and richer class of functions satisfying a weaker condition, which we call "zero-convexity". This class, which is introduced and discussed here, holds promise for solving optimization problems in various areas, especially in non-smooth and non-convex optimization. The relevance of this study to approximate minimization and to the recent superiorization methodology for constrained optimization is explained.
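The basic (unperturbed, finite-dimensional, convex) sequential subgradient projection iteration can be sketched as follows; the talk's setting is far more general (Hilbert space, perturbations, zero-convex functions), so this is only the textbook special case.

```python
import numpy as np

def subgradient_projection(x0, funcs, grads, iters=500):
    """Sequential subgradient projection for the feasibility problem:
    find x with f_i(x) <= 0 for all i.  Each sweep replaces the exact
    projection onto {f_i <= 0} by one subgradient step
        x <- x - (f_i(x)_+ / ||g_i||^2) g_i,
    which for affine f_i is the exact projection onto the half-space."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        for f, g in zip(funcs, grads):
            v = f(x)
            if v > 0:                      # constraint violated
                s = g(x)
                x = x - (v / np.dot(s, s)) * s
    return x

# feasibility of two half-planes: x + y <= 1 and x - y <= 0
funcs = [lambda x: x[0] + x[1] - 1, lambda x: x[0] - x[1]]
grads = [lambda x: np.array([1.0, 1.0]), lambda x: np.array([1.0, -1.0])]
sol = subgradient_projection(np.array([3.0, -2.0]), funcs, grads)
```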

This is a joint work with Yair Censor.

We introduce and construct a pseudo-random object which we call a local correlation breaker (LCB). This is an algorithm that gets as input a sequence of (possibly correlated) random variables and an independent weak source of randomness. The output of the LCB is a sequence of random variables with the following property. If the i'th input random variable is uniform then the i'th output variable is uniform even if a bounded number of any other output variables are given. That is, an LCB uses the weak source to "break" local correlations between random variables.

Based on LCBs we obtain improved constructions of mergers with weak-seeds and multi-source extractors. In particular, we construct a 3-source extractor for entropies delta*n, O(log n) and O(loglog n), for any constant delta. We further construct a 7-source extractor for poly-logarithmic entropy.

Joint work with Guy Rothblum.

No prior knowledge is assumed.

Imagine that every vertex of a graph contains a sleeping frog. At time 0, the frog at some designated vertex wakes up and begins a simple random walk. When it lands on a vertex, the sleeping frog there wakes up and begins its own simple random walk, which in turn wakes up any sleeping frogs it lands on, and so on. This process is called the frog model.
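The process just described is easy to simulate on a finite graph; the sketch below (graph, step count, and seed are arbitrary choices) tracks which frogs are awake and how often each vertex is visited.

```python
import random

def frog_model(adj, start, steps, seed=0):
    """Simulate the frog model on a finite graph given as adjacency lists.
    One sleeping frog per vertex; awake frogs perform independent simple
    random walks and wake any sleeping frog they land on.  Returns the
    visit count of each vertex.  (A finite toy; the talk concerns
    recurrence on the infinite d-ary tree.)"""
    random.seed(seed)
    awake = [start]
    asleep = set(adj) - {start}
    visits = {v: 0 for v in adj}
    visits[start] = 1
    for _ in range(steps):
        moved = []
        for pos in awake:
            pos = random.choice(adj[pos])   # one step of simple random walk
            visits[pos] += 1
            if pos in asleep:
                asleep.discard(pos)
                moved.append(pos)           # the woken frog starts walking too
            moved.append(pos)
        awake = moved
    return visits

adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # 6-cycle
visits = frog_model(adj, start=0, steps=50)
```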

I'll talk about a question posed by Serguei Popov in 2003: On an infinite d-ary tree, is the frog model recurrent or transient? That is, is each vertex visited infinitely or finitely often by frogs? The answer is that it depends on d: there's a phase transition between recurrence and transience as d grows. Furthermore, if the system starts with Poi(m) sleeping frogs on each vertex independently, there's a phase transition as m grows. This is joint work with Christopher Hoffman and Matthew Junge.

Bloom filters and Counting Bloom Filters (CBFs) are widely used in networking device algorithms. They implement fast set representations to support membership queries with limited error. Unlike Bloom filters, CBFs also support element deletions. In the first part of the talk, I will introduce a new general method based on variable increments to improve the efficiency of CBFs and their variants. I will demonstrate that this method can always achieve a lower false positive rate and a lower overflow probability bound than CBFs in practical systems.
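For reference, a plain CBF supports add/remove/query with per-cell counters; the talk's variable-increment method refines exactly this baseline. A minimal sketch (hash construction and parameters are illustrative choices):

```python
import hashlib

class CountingBloomFilter:
    """A plain Counting Bloom Filter: k counter cells per element, so that
    deletions are supported by decrementing.  The Variable-Increment CBF
    discussed in the talk adds hashed variable increments instead of
    always adding 1; this baseline only illustrates the data structure."""
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _indexes(self, item):
        # k indexes derived from salted SHA-256 (an illustrative choice)
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item):
        for idx in self._indexes(item):
            self.counters[idx] -= 1

    def __contains__(self, item):
        # may return a false positive, never a false negative
        return all(self.counters[idx] > 0 for idx in self._indexes(item))

cbf = CountingBloomFilter()
cbf.add("10.0.0.1")
hit = "10.0.0.1" in cbf
cbf.remove("10.0.0.1")
miss = "10.0.0.1" in cbf
```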

Next, I will present ongoing research on data center networking. In particular, I will introduce a new approach to providing network isolation, so customers can feel alone in shared clouds, without any network contention from other customers. I will also demonstrate theoretical conditions for the isolation algorithm.

An instance of the E2-Lin(2) problem is a system of equations of the form "x_i + x_j = b (mod 2)". Given such a system in which it is possible to satisfy all but an epsilon fraction of the equations, we would like to find an assignment that violates as few equations as possible. In this paper, we show that it is NP-hard to satisfy all but a C*epsilon fraction of the equations, for any C < 11/8 and 0 < epsilon <= 1/8. Our result holds also for the special case of Max-Cut. The previous best NP-hardness result, standing for over 15 years, had 5/4 in place of 11/8.
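Concretely, an E2-Lin(2) instance and its objective look as follows; Max-Cut is the special case where every right-hand side is 1 (the endpoints of each edge must get different values).

```python
def violated(equations, assignment):
    """Count the equations x_i + x_j = b (mod 2) violated by a 0/1
    assignment.  Each equation is a triple (i, j, b)."""
    return sum((assignment[i] + assignment[j]) % 2 != b
               for i, j, b in equations)

# a 4-cycle as a Max-Cut instance: all equations are x_i + x_j = 1 (mod 2)
eqs = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
best = violated(eqs, [0, 1, 0, 1])   # a proper 2-coloring satisfies all
worst = violated(eqs, [0, 0, 0, 0])  # a constant assignment violates all
```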

Our proof is by a modified gadget reduction from a predicate that supports a pairwise independent distribution. We also show an inherent limitation to this type of gadget reduction. In particular, we show that no such reduction can establish a hardness factor C greater than ~2.54.

Joint work with Johan Hastad, Rajsekar Manokaran, Ryan O'Donnell, John Wright.

Consider a random walk in a stationary, ergodic random environment which is bistochastic, i.e., the sum of the probabilities to enter any fixed vertex is 1. Consider the drift as a function on the probability space of environments, and assume it belongs to the domain of definition of |D|^{-1/2}, where D is the symmetrized generator of the walk (this is the famous H_{-1} condition). We show that under these conditions the walk satisfies a central limit theorem. The proof uses the "relaxed sector condition", which shows an unexpected connection to the spectral theory of unbounded operators.

All terms will be explained in the talk. This is joint work with Balint Toth.

Many types of multi-dimensional data have a natural division into two "views", such as audio and video or images and text. Multi-view learning includes a variety of techniques that use multiple views of data to learn improved models for each of the views. The views can be multiple measurement modalities (like the examples above) but can also be different types of information extracted from the same source (words + context, document text + links) or any division of the data dimensions into subsets satisfying certain learning assumptions. Theoretical and empirical results show that multi-view techniques can improve over single-view ones in certain settings. In many cases multiple views help by reducing noise in some sense (what is noise in one view is not in the other).

In this talk, I will focus on multi-view learning of representations (features), especially using canonical correlation analysis (CCA) and related techniques. I will give a tutorial overview of CCA and its relationship with other techniques such as partial least squares (PLS) and linear discriminant analysis (LDA). I will also present extensions developed by ourselves and others, such as kernel, deep, and generalized ("many-view") CCA. Finally, I will give recent results on speech and language tasks, and demonstrate our publicly available code.
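Linear CCA itself fits in a few lines via whitening and an SVD; the sketch below follows this standard derivation (the ridge term `reg` is added only for numerical stability, and the synthetic data are illustrative).

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-8):
    """Linear CCA: projection directions for two views maximizing the
    correlation of the projected data.  Whiten each view's covariance,
    then take the SVD of the whitened cross-covariance; the singular
    values are the canonical correlations."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))                  # shared latent signal
X = np.hstack([z, rng.normal(size=(200, 2))])  # view 1: signal + noise dims
Y = np.hstack([-z, rng.normal(size=(200, 2))]) # view 2: same signal, flipped
A, B, corr = cca(X, Y)
```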

Based on joint work with Raman Arora, Weiran Wang, Jeff Bilmes, Galen Andrew, and others.

Small image patches tend to recur at multiple scales within high-quality natural images. This fractal-like behavior has been used in the past for various tasks including image compression, super-resolution and denoising. In this talk, I will show that this phenomenon can also be harnessed for "blind deblurring" and for "blind super-resolution", that is, for removing blur or increasing resolution without a priori knowledge of the associated blur kernel. It turns out that the cross-scale patch recurrence property is strong only in images taken under ideal imaging conditions, but significantly diminishes when the imaging conditions deviate from ideal ones. Therefore, the deviations from ideal patch recurrence actually provide information on the unknown camera blur kernel.

More specifically, we show that the correct blur kernel is the one which maximizes the similarity between patches across scales of the image. Extensive experiments indicate that our approach leads to state of the art results, both in deblurring and in super-resolution.

Joint work with Michal Irani.

Cryo-electron microscopy (cryo-EM) is a microscopy technique used to discover the 3D structure of molecules from very noisy images. We discuss how algebra can describe two aspects of cryo-EM datasets. First, we'll describe common lines datasets. Common lines are lines of intersection between cryo-EM images in 3D. They are a crucial ingredient in some 2D-to-3D reconstruction algorithms, and they can be characterized by polynomial equalities and inequalities. Second, we'll discuss how 3D symmetries of a molecule can be detected from only 2D cryo-EM images, without performing full 3D reconstruction.

We present a practical method for establishing dense correspondences between two images with similar content, but possibly different 3D scenes. One of the challenges in designing such a system is the local scale differences of objects appearing in the two images. Previous methods often considered only small subsets of image pixels, matching only pixels for which stable scales may be reliably estimated. More recently, others have considered dense correspondences, but with substantial costs associated with generating, storing and matching scale invariant descriptors.

Our work here is motivated by the observation that pixels in the image have contexts -- the pixels around them -- which may be exploited in order to estimate local scales reliably and repeatably. In practice, we demonstrate that scales estimated in sparse interest points may be propagated to neighboring pixels where this information cannot be reliably determined. Doing so allows scale invariant descriptors to be extracted anywhere in the image, not just in detected interest points. As a consequence, accurate dense correspondences are obtained even between very different images, with little computational costs beyond those required by existing methods.
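A crude stand-in for this idea is to assign every pixel the scale of its nearest interest point; the actual method propagates scales using image context rather than plain nearest-neighbor, so the sketch below only illustrates the dense-scale-map output (names and data are hypothetical).

```python
import numpy as np

def propagate_scales(keypoints, scales, h, w):
    """Assign each pixel of an h-by-w image the scale of its nearest
    detected interest point -- a minimal nearest-neighbor stand-in for
    context-based scale propagation.  `keypoints` is a list of (row, col)
    positions and `scales` the scale estimated at each of them."""
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([ys.ravel(), xs.ravel()], axis=1)
    kp = np.asarray(keypoints)
    d2 = ((pix[:, None, :] - kp[None, :, :]) ** 2).sum(-1)
    return np.asarray(scales)[d2.argmin(1)].reshape(h, w)

# two sparse interest points with estimated scales, propagated densely
dense = propagate_scales([(2, 2), (7, 7)], [1.5, 3.0], h=10, w=10)
```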

This is joint work with Moria Tau from the Open University of Israel.

On my last visit in 2012, I posed a number of open questions, including how to achieve joint segmentation and tracking, and how to obtain uncertainty estimates for a segmentation.

Some of these questions we have been able to solve [Schiegg ICCV 2013, Schiegg Bioinformatics 2014, Fiaschi CVPR 2014] and I would like to report on this progress.

Given that I will be at Weizmann for another four months, I will also pose new open questions on image processing problems that require a combination of combinatorial optimization and (structured) learning, as an invitation to work together.

The astronomical community's largest technical challenge is coping with the Earth's atmosphere. In this talk, I will present the popular methods for performing scientific measurements from the ground while coping with the time-dependent distortions the atmosphere generates. We will talk about the following topics:

1) Scientific motivation for eliminating the effect of the atmosphere.

2) The statistics of turbulence - the basis for all methods lies in a deep understanding of atmospheric turbulence.

3) Wave-front sensing + adaptive optics - a way to correct for the turbulence in hardware.

4) Lucky imaging + speckle interferometry - ways to computationally extract scientifically valuable data despite the turbulent atmosphere.

Quenched invariance principles and heat kernel bounds for random walks on infinite percolation clusters and among i.i.d. random conductances in Z^{d} were proved during the last two decades. The proofs of these results rely strongly on the i.i.d. structure of the models and on stochastic domination with respect to supercritical Bernoulli percolation.

Many important models in probability theory and in statistical mechanics, in particular, models which come from real world phenomena, exhibit long range correlations.

In this talk I will present a new quenched invariance principle, for simple random walk on the unique infinite percolation cluster for a general class of percolation models on Z^{d}, d>=2, with long-range correlations. This gives new results for random interlacements in dimension d>=3 at every level, as well as for the vacant set of random interlacements and the level sets of the Gaussian free field in the regime of the so-called local uniqueness (which is believed to coincide with the whole supercritical regime). An essential ingredient of the proof is a new isoperimetric inequality for correlated percolation models.