
# Previous Seminars

Ruelle's operator theorem states that the Ruelle operator $L$, which is a positive operator acting on Hölder functions, is conjugate to $P+R$, where $P$ is a one-dimensional projection and the norm of $R$ is smaller than 1. This decomposition, also known as a spectral gap, is of interest as it allows one to characterise the underlying dynamical system through, e.g., central limit theorems or continuous response to perturbations. However, the conjugation depends on the existence of a positive eigenfunction of $L$, which might not exist in more general, fibred situations due to purely functorial reasons. A possibility to circumvent this problem is to consider quotients of operators of the form $f \mapsto \frac{L^m(f L^n (1))}{L^{m+n}(1)}.$ In fact, it is possible to provide reasonable conditions such that their dual operators contract the Wasserstein distance exponentially in $m$. The result gives rise, for example, to a law of the iterated logarithm for continued fractions with sequentially restricted entries or a topology on the set of equilibrium states for semigroups of expanding maps. This is joint work with Paulo Varandas and Xuan Zhang.
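As a rough numerical illustration of these quotient operators (my own sketch, not taken from the talk; the doubling map and the potential below are hypothetical choices), one can discretize the transfer operator on a grid and apply the normalization directly:

```python
import numpy as np

# Transfer (Ruelle) operator of the doubling map T(x) = 2x mod 1 with a
# Hölder potential phi, discretized on a grid via linear interpolation.
grid = np.linspace(0.0, 1.0, 513)
phi = lambda x: -np.log(2.0) + 0.1 * np.cos(2 * np.pi * x)   # hypothetical potential

def L(f_vals):
    """(Lf)(x) = sum over preimages y of x (here y = x/2 and (x+1)/2) of e^{phi(y)} f(y)."""
    f = lambda t: np.interp(t, grid, f_vals)
    y1, y2 = grid / 2.0, (grid + 1.0) / 2.0
    return np.exp(phi(y1)) * f(y1) + np.exp(phi(y2)) * f(y2)

def L_pow(f_vals, k):
    for _ in range(k):
        f_vals = L(f_vals)
    return f_vals

def quotient(f_vals, m, n):
    """The normalized operator  f -> L^m(f * L^n(1)) / L^{m+n}(1)  from the abstract."""
    one = np.ones_like(grid)
    return L_pow(f_vals * L_pow(one, n), m) / L_pow(one, m + n)

f = 2.0 + np.sin(2 * np.pi * grid)        # a positive test function
print(quotient(f, m=8, n=4)[:5])          # values of the normalized image on the grid
```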

We all capture the world around us through digital data such as images, videos and sound. However, in many cases, we are interested in certain properties of the data that are either not available or difficult to perceive directly from the input signal. My goal is to "Re-render Reality", i.e., develop algorithms that analyze digital signals and then create new versions of them that allow us to see and hear better. In this talk, I'll present a variety of methodologies aimed at enhancing the way we perceive our world through modified, re-rendered output. These works combine ideas from signal processing, optimization, computer graphics, and machine learning, and address a wide range of applications. More specifically, I'll demonstrate how we can automatically reveal subtle geometric imperfections in images, visualize human motion in 3D, and use visual signals to help us separate and mute interfering sounds in a video. Finally, I'll discuss some of my future directions and work in progress.

BIO: Tali is a Senior Research Scientist at Google, Cambridge, developing algorithms at the intersection of computer vision and computer graphics. Before Google, she was a Postdoctoral Associate at the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William T. Freeman. Tali completed her Ph.D. studies at the School of Electrical Engineering, Tel Aviv University, Israel, under the supervision of Prof. Shai Avidan and Prof. Yael Moses. Her research interests include computational photography, image synthesis, geometry and 3D reconstruction.

In this talk we survey some formulations of fairness requirements for decision making under uncertainty. We then discuss results from three recent papers:

- Treating individuals fairly is not in conflict with long-term scientific learning goals if the population is sufficiently diverse.
- When there is a pipeline of decisions, end-to-end fairness is impossible to achieve even in a very simple model.
- Exploiting the knowledge acquired by others can unfairly advantage the free rider.

These papers are joint work with a number of co-authors:

Christopher Jung, Neil Lutz, Jamie Morgenstern, Aaron Roth, Bo Waggoner, Steven Wu, and Juba Ziani

Despite their superb empirical performance, deep learning models for natural language processing (NLP) are often considered black boxes, as relatively little is known as to what accounts for their success. This lack of understanding turns model development into a slow and expensive trial-and-error process, which prevents many researchers from developing state-of-the-art models.

This is joint work with Hao Peng, Sam Thomson and Noah A. Smith

Zero-knowledge protocols are spectacular, allowing one to prove NP statements without revealing anything but their validity. An essential element that enables this wonder is interaction. But how much interaction exactly is needed? This question has long challenged cryptographers and is yet to be settled under standard assumptions. In fact, the question appears to be equally challenging for natural relaxations of the zero-knowledge requirement. The difficulty in answering the round-complexity question stems from that of a foundational question in cryptography: what is the power of non-black-box reductions?

In this talk, I will explain this difficulty and present a new non-black-box technique that resolves, under standard assumptions, the round complexity of weak zero-knowledge protocols (Dwork-Naor-Reingold-Stockmeyer '98). Specifically, assuming quasipolynomial hardness of the Learning with Errors problem and fully homomorphic encryption, we construct a two-message protocol, a setting where (full-fledged) zero knowledge is impossible.

The talk will assume no prior knowledge in cryptography. It is based on joint work with Dakshita Khurana and Omer Paneth (the paper can be found on

A long-standing open problem, known as Hadwiger's covering problem, asks what is the smallest natural number $N(n)$ such that every convex body in $\mathbb{R}^n$ can be covered by a union of the interiors of at most $N(n)$ of its translates. Despite continuous efforts, the best general upper bound known for this number remains as it was more than sixty years ago, of the order of ${2n \choose n} \ln n$.

In this talk, I will discuss some history of this problem and present a new result in which we improve this bound by a sub-exponential factor. Our approach combines ideas from previous work, with tools from Asymptotic Geometric Analysis. As a key step, we use thin-shell estimates for isotropic log-concave measures to prove a new lower bound for the maximum volume of the intersection of a convex body $K$ with a translate of $-K$. We further show that the same bound holds for the volume of $K\cap(-K)$ if the center of mass of $K$ is at the origin.

If time permits we shall discuss some other methods and results concerning this problem and its relatives.

Joint work with H. Huang, B. Vritsiou, and T. Tkocz

Visual repetitions are abundant in our surrounding physical world: small image patches tend to reoccur within a natural image, and across different rescaled versions thereof. Similarly, semantic repetitions appear naturally inside an object class within image datasets, as a result of different views and scales of the same object. In my thesis I studied deviations from these expected repetitions, and demonstrated how this information can be exploited to tackle both low-level and high-level vision tasks. These include “blind” image reconstruction tasks (e.g. dehazing, deblurring), image classification confidence estimation, and more.

For fixed $t>1$ and $L>3$ we establish a sharp asymptotic formula for the log-probability that the number of cycles of length $L$ in the Erdős-Rényi random graph $G(N,p)$ exceeds its expectation by a factor $t$, assuming only that $p \gg \log N/\sqrt{N}$. We obtain such sharp upper tail bounds also for the Schatten norms of the corresponding adjacency matrices, and in a narrower range of $p=p(N)$, also for general subgraph counts. In this talk, based on a recent joint work with Nick Cook, I will explain our approach and in particular our quantitative refinement of Szemerédi's regularity lemma for sparse random graphs in the large deviations regime.

Deep convolutional network architectures are often assumed to guarantee generalization for small image translations and deformations. In this paper we show that modern CNNs (VGG16, ResNet50, and InceptionResNetV2) can drastically change their output when an image is translated in the image plane by a few pixels, and that this failure of generalization also happens with other realistic small image transformations. Furthermore, we see these failures to generalize more frequently in more modern networks. We show that these failures are related to the fact that the architecture of modern CNNs ignores the classical sampling theorem, so that generalization is not guaranteed. We also show that biases in the statistics of commonly used image datasets make it unlikely that CNNs will learn to be invariant to these transformations. Taken together, our results suggest that the performance of CNNs in object recognition falls far short of the generalization capabilities of humans.

Joint work with Aharon Azulay

This talk concerns the well-studied model of "cake-cutting" for studying questions regarding notions of fair division of resources. We focus on discrete versions rather than infinite-precision real values, and specifically on communication complexity. Using general discrete simulations of classical infinite-precision protocols (Robertson-Webb and moving-knife), we roughly partition the various fair-allocation problems into three classes: "easy" (a constant number of rounds of logarithmically many bits), "medium" (poly-log total communication), and "hard".

Our main technical result concerns two of the "medium" problems (perfect allocation for 2 players and equitable allocation for any number of players) which we prove are not in the "easy" class. Our main open problem is to separate the "hard" from the "medium" classes.

Joint work with Simina Branzei

For a vector space F^n over a field F, an (η, β)-dimension expander of degree d is a collection of d linear maps Γ_j : F^n → F^n such that for every subspace U of F^n of dimension at most ηn, the image of U under all the maps, ∑_{j=1}^d Γ_j(U), has dimension at least β·dim(U). Over a finite field, a random collection of d = O(1) maps Γ_j offers excellent “lossless” expansion with high probability: β ≈ d for η ≥ Ω(1/d). When it comes to a family of explicit constructions (for growing n), however, achieving even expansion factor β = 1 + ε with constant degree is a non-trivial goal. We present an explicit construction of dimension expanders over finite fields based on linearized polynomials and subspace designs, drawing inspiration from recent progress on list decoding in the rank metric. Our approach yields the following:

- Lossless expansion over large fields; more precisely, β ≥ (1-ε)d and η ≥ (1-ε)/d with d = O_ε(1), when |F| ≥ Ω(n).
- Expansion optimal up to constant factors over fields of arbitrarily small polynomial size; more precisely, β ≥ Ω(δd) and η ≥ Ω(1/(δd)) with d = O_δ(1), when |F| ≥ n^δ.

Previously, an approach reducing to monotone expanders (a form of vertex expansion that is highly non-trivial to establish) gave (Ω(1), 1+Ω(1))-dimension expanders of constant degree over all fields. An approach based on “rank condensing via subspace designs” led to dimension expanders with β ≥ Ω(√d) over large finite fields. Ours is the first construction to achieve lossless dimension expansion, or even expansion proportional to the degree. Based on joint work with Venkatesan Guruswami and Chaoping Xing.
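To make the definition above concrete, here is a small spot check (my own illustration, not the construction from the talk): draw d random linear maps over F_2 and measure the dimension of the sum of the images of a random low-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

def gf2_rank(M):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivot = next((i for i in range(rank, M.shape[0]) if M[i, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]           # swap rows
        for i in range(M.shape[0]):
            if i != rank and M[i, col]:
                M[i] = (M[i] + M[rank]) % 2           # eliminate this column elsewhere
        rank += 1
    return rank

n, d, k = 24, 4, 3                                    # dim(U) = k, should be at most eta*n
Gammas = [rng.integers(0, 2, size=(n, n)) for _ in range(d)]
B = rng.integers(0, 2, size=(k, n))                   # rows span a random subspace U
images = np.vstack([(B @ G.T) % 2 for G in Gammas])   # stacked bases of the Gamma_j(U)
print("dim U =", gf2_rank(B), "  dim of sum of images =", gf2_rank(images))
```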

As algorithmic prediction systems have become more widespread, so too have concerns that these systems may be discriminatory against groups of people protected by laws and ethics. We present a recent line of work that takes a complexity theoretic perspective towards combating discrimination in prediction systems. We'll focus on fair classification within the versatile framework of Dwork et al. [ITCS'12], which assumes the existence of a metric that measures similarity between pairs of individuals. Unlike earlier work, we do not assume that the entire metric is known to the learning algorithm; instead, the learner can query this metric a bounded number of times. We propose a new notion of fairness called *metric multifairness* and show how to achieve this notion in our setting. Metric multifairness is parameterized by a similarity metric d on pairs of individuals to classify and a rich collection C of (possibly overlapping) "comparison sets" over pairs of individuals. At a high level, metric multifairness guarantees that *similar subpopulations are treated similarly*, as long as these subpopulations are identified within the class C.

In this talk I will present an algorithm for approximating the number of k-cliques in a graph when given query access to the graph. This problem was previously studied for the cases of k=2 (edges) and k=3 (triangles). We give an algorithm that works for any k >= 3, and is actually conceptually simpler than the k=3 algorithm. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let n denote the number of vertices in the graph, m the number of edges, and C_k the number of k-cliques. We design an algorithm that outputs a $(1+\epsilon)$-approximation (with high probability) for C_k, whose expected query complexity and running time are $O\left(\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}\right) \cdot \mathrm{poly}(\log n, 1/\epsilon, k)$.

Hence, the complexity of the algorithm is sublinear in the size of the graph for $C_k = \omega(m^{k/2-1})$. Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on $\log n$, $1/\epsilon$ and $k$).

This is joint work with Talya Eden and C. Seshadhri.

In typical imaging systems, an image/video is first acquired, then compressed for transmission or storage, and eventually presented to human observers using different and often imperfect display devices. While the resulting quality of the perceived output image may be severely affected by the acquisition and display processes, these degradations are usually ignored in the compression stage, leading to an overall sub-optimal system performance. In this work we propose a compression methodology to optimize the system's end-to-end reconstruction error with respect to the compression bit-cost. Using the alternating direction method of multipliers (ADMM) technique, we show that the design of the new globally-optimized compression reduces to a standard compression of a "system adjusted" signal. Essentially, we propose a new practical framework for the information-theoretic problem of remote source coding. The main ideas of our method are further explained using rate-distortion theory for Gaussian signals. We experimentally demonstrate our framework for image and video compression using the state-of-the-art HEVC standard, adjusted to several system layouts including acquisition and rendering models. The experiments established our method as the best approach for optimizing the system performance at high bit-rates from the compression standpoint.

In addition, we relate the proposed approach to signal restoration using complexity regularization, where the likelihood of candidate solutions is evaluated based on their compression bit-costs.

Using our ADMM-based approach, we present new restoration methods relying on repeated applications of standard compression techniques. Thus, we restore signals by leveraging state-of-the-art models designed for compression. The presented experiments show good results for image deblurring and inpainting using the JPEG2000 and HEVC compression standards.

* Joint work with Prof. Alfred Bruckstein and Prof. Michael Elad.

** More details about the speaker and his research work are available at http://ydar.cswp.cs.technion.ac.il/

We consider the goal of social welfare maximization where a single item is to be assigned to one of n potential agents with interdependent values.

That is, each agent has her own private signal, and the valuation of each agent is a known function of all n private signals. This captures settings such as valuations for artwork, oil drilling rights, broadcast rights, and many more. In the interdependent value setting, all previous work has assumed a so-called single-crossing condition. Single-crossing means that the impact of agent i's private signal, s_i, on her own valuation is greater than the impact of s_i on the valuation of any other agent. It is known that without the single-crossing condition an efficient outcome cannot be obtained. We study welfare maximization for interdependent valuations through the lens of approximation.

We show that, in general, without the single-crossing condition, one cannot hope to approximate the optimal social welfare any better than the approximation given by assigning the item to a random bidder. Consequently, we introduce a relaxed version of single-crossing, c-single-crossing, parameterized by c ≥ 1, which means that the impact of s_i on the valuation of agent i is at least 1/c times the impact of s_i on the valuation of any other agent (c = 1 is single-crossing). Using this parameterized notion, we obtain a host of positive results. We also consider interdependent settings when valuations are concave and give improved results.
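As a rough formalization of the c-single-crossing condition described above (my notation; the paper's exact formulation may be stated in terms of discrete signal changes rather than derivatives), one can write:

$$c \cdot \frac{\partial v_i}{\partial s_i}(\mathbf{s}) \;\geq\; \frac{\partial v_j}{\partial s_i}(\mathbf{s}) \qquad \text{for all } j \neq i \text{ and all signal profiles } \mathbf{s},$$

with $c = 1$ recovering the usual single-crossing condition.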

Joint work with Alon Eden, Michal Feldman, and Kira Goldner.

Given an underlying finite point set P in the plane, we seek a small set Q that would hit any convex set that contains at least an Epsilon-fraction of P. Such a set Q is called a weak Epsilon-net. The study of Epsilon-nets is central to Computational and Combinatorial Geometry, and it bears important connections to Statistical Learning Theory, Extremal Combinatorics, Discrete Optimization, and other areas.

It is an outstanding open problem to determine tight asymptotic bounds on weak Epsilon-nets with respect to convex sets. For any underlying point set in the plane we describe such a net whose cardinality is roughly proportional to Epsilon^{-3/2}. This is the first improvement of the over-25-year-old bound of Alon, Barany, Furedi, and Kleitman.

The tremendous success of the Machine Learning paradigm heavily relies on the development of powerful optimization methods. The canonical algorithm for training learning models is SGD (Stochastic Gradient Descent), yet this method has its limitations. It is often unable to exploit useful statistical/geometric structure, it might degrade upon encountering prevalent non-convex phenomena, and it is hard to parallelize. In this talk I will discuss an ongoing line of research where we develop alternative methods that resolve some of SGD's limitations. The methods that I describe are as efficient as SGD, and implicitly adapt to the underlying structure of the problem in a data dependent manner.

In the first part of the talk, I will discuss a method that is able to take advantage of hard/easy training samples. In the second part, I will discuss a method that enables an efficient parallelization of SGD. Finally, I will briefly describe a method that implicitly adapts to the smoothness and noise properties of the learning objective.

In the minimum k-edge-connected spanning subgraph (k-ECSS) problem the goal is to find the minimum weight subgraph resistant to up to k-1 edge failures. This is a central problem in network design, and a natural generalization of the minimum spanning tree (MST) problem. In this talk, I will present fast randomized distributed approximation algorithms for k-ECSS in the CONGEST model.

SRB measures are an important object in dynamical systems and mathematical physics. Named after Sinai, Ruelle and Bowen, these measures have the important property of being equilibrium states which describe chaotic behaviour, yet may also describe the asymptotics of "observable" events in the phase space. An open and important question is: in what generality do systems admit SRB measures?

We present the notion of generalised SRB measures (GSRB in short), mention some of their important properties, and present a new condition characterising their existence in a general setup.

The first part of the talk will describe some of the motivation leading to the definition and study of SRB measures; we will then define GSRB measures and compare their properties with the properties sought for SRB measures. We will also describe a case study of examples motivating the study of GSRB measures. Our new result is a characterisation of systems admitting GSRB measures.

In the second part of the talk, as much as time permits, we will present some key steps in the construction of GSRB measures.

The "Duty of Care"is a legal obligation requiring one to adhere to a standard of reasonable care while performing acts that could harm others. We propose a formal, mathematical interpretation of the duty of care in the context of "being cautious while driving". The framework enables tackling moral questions in a formal way and is an example of a new research field which we call regulatory science.

Joint work with Shaked Shammah and Amnon Shashua.

Single-particle cryo-electron microscopy (cryo-EM) is an innovative technology for elucidating structures of biological molecules at atomic-scale resolution. In a cryo-EM experiment, tomographic projections of a molecule, taken at unknown viewing directions, are embedded in highly noisy images at unknown locations. The cryo-EM problem is to estimate the 3-D structure of a molecule from these noisy images.

Inspired by cryo-EM, the talk will focus on two estimation problems: *multi-reference alignment* and *blind deconvolution*. These problems abstract away much of the intricacies of cryo-EM, while retaining some of its essential features. In multi-reference alignment, we aim to estimate a signal from its noisy, rotated observations. While the rotations and the signal are unknown, the goal is only to estimate the signal. In the blind deconvolution problem, the goal is to estimate a signal from its convolution with an unknown, sparse signal in the presence of noise. Focusing on the low SNR regime, I will propose the *method of moments* as a computationally efficient estimation framework for both problems and will introduce its properties. In particular, I will show that the method of moments allows estimating the sought signal accurately in any noise level, provided sufficiently many observations are collected, with only one pass over the data. I will then argue that the same principles carry through to cryo-EM, show examples, and draw potential implications.

The representation theory of GL(n,F), for F non-archimedean, is a classical subject initiated by Bernstein and Zelevinsky in the 1970s.

I will review some recent results and conjectures which aim to characterize irreducibility of parabolic induction, in terms of geometry. Joint with Alberto Minguez

Ramsey theory assures us that in any graph there is a clique or independent set of a certain size, roughly logarithmic in the graph size. But how difficult is it to find the clique or independent set? If the graph is given explicitly, then it is possible to do so while examining a linear number of edges. If the graph is given by a black box, which must be queried to determine whether a given edge exists, then a large number of queries must be issued. But what if one is given a program or circuit for computing the existence of an edge? What if we are assured that the program is small without being given explicitly?

In this talk I will explore recent work on the complexity of search problems with guaranteed solution (the class TFNP) and the tight relationship with cryptographic assumptions and techniques.

Based on joint works with Pavel Hubacek, Ilan Komargodski and Eylon Yogev

In this talk I will survey the way machine learning research is affecting the field of security, and the way security research is affecting the field of machine learning. After giving several examples of each type, I will discuss some of the latest developments in adversarial machine learning research, which are used by hackers to defeat security mechanisms based on regular machine learning techniques.

In the vector balancing problem, we are given N vectors v_1,..., v_N in an n-dimensional normed space, and our goal is to assign signs to them, so that the norm of their signed sum is as small as possible. The balancing constant of the vectors is the smallest number beta, such that any subset of the vectors can be balanced so that their signed sum has norm at most beta.

The vector balancing constant generalizes combinatorial discrepancy, and is related to rounding problems in combinatorial optimization, and to the approximate Caratheodory theorem. We study the question of efficiently approximating the vector balancing constant of any set of vectors, with respect to an arbitrary norm. We show that the vector balancing constant can be approximated in polynomial time to within factors logarithmic in the dimension, and is characterized by (an appropriately optimized version of) a known volumetric lower bound. Our techniques draw on results from geometric functional analysis and the theory of Gaussian processes. Our results also imply an improved approximation algorithm for hereditary discrepancy.
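As a toy illustration of the quantities involved (a brute-force check for a handful of vectors, not the polynomial-time approximation from the talk), the balancing constant can be computed directly from its definition:

```python
import itertools
import numpy as np

def min_signed_norm(vectors, norm_ord=np.inf):
    """Smallest norm of a signed sum over all 2^N sign assignments (brute force)."""
    vectors = np.asarray(vectors, dtype=float)
    return min(np.linalg.norm(np.array(s) @ vectors, ord=norm_ord)
               for s in itertools.product([-1.0, 1.0], repeat=len(vectors)))

def balancing_constant(vectors, norm_ord=np.inf):
    """beta = max over nonempty subsets of the best achievable signed-sum norm."""
    vectors = np.asarray(vectors, dtype=float)
    N = len(vectors)
    return max(min_signed_norm(vectors[[i for i in range(N) if mask >> i & 1]], norm_ord)
               for mask in range(1, 2 ** N))

# Three vectors in the sup-norm: every subset can be signed to norm at most 1,
# so the balancing constant is 1.
print(balancing_constant([[1, 0], [0, 1], [1, 1]]))
```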

Joint work with Aleksandar Nikolov, Nicole Tomczak-Jaegermann and Kunal Talwar.

In this talk I will describe a theory of cyclic elements in semisimple Lie algebras. This notion was introduced by Kostant, who associated a cyclic element with the principal nilpotent and proved that it is regular semisimple. In particular, I will describe the classification of all nilpotents giving rise to semisimple and regular semisimple cyclic elements. The results are from my joint work with V. Kac and E. Vinberg.

There is a movement in the design of experiments away from the classic randomization put forward by Fisher, Cochran and others to one based on optimization. In fixed-sample trials comparing two groups, measurements of subjects are known in advance and subjects can be divided optimally into two groups based on a criterion of homogeneity or "imbalance" between the two groups. These designs are far from random. This talk seeks to understand the benefits and the costs over classic randomization in the context of different performance criteria such as Efron's worst-case analysis. In the criterion that we motivate, randomization beats optimization. However, the optimal design is shown to lie between these two extremes. Much-needed further work will provide a procedure to find this optimal design in different scenarios in practice. Until then, it is best to randomize.

I will define the Enright functor for contragredient Lie superalgebras and discuss its properties. If time permits, we may discuss a proof of Arakawa's Theorem for osp(1|2l).

This work is a joint project with V. Serganova.

The goal of my talk (based on joint work with Vladimir Retakh) is to introduce noncommutative clusters and their mutations, which can be viewed as generalizations of both classical and quantum cluster structures.

Each noncommutative cluster S is built on a torsion-free group G and a certain collection of its automorphisms. We assign to S a noncommutative algebra A(S) related to the group algebra of G, which is an analogue of the cluster algebra, and establish a noncommutative version of the Laurent Phenomenon in some algebras A(S).

"Cluster groups" G for which the Noncommutative Laurent Phenomenon holds include triangular groups of marked surfaces (closely related to the fundamental groups of their ramified double covers), free group of rank 2, and principal noncommutative tori which exist for any exchange matrix B.

Various algebraic and topological situations give rise to compatible sequences of representations of different groups, such as the symmetric groups, with stable asymptotic behavior. Representation stability is a recent approach to studying such sequences, which has proved effective for extracting important invariants. Coming from this point of view, I will introduce the associated character theory, which formally explains many of the approach's strengths (in char 0). Central examples are simultaneous characters of all symmetric groups, or of all Gl(n) over some finite field. Their mere existence gives applications to statistics of random matrices over finite fields, and raises many combinatorial questions.

We will show that there exists a metric space that does not admit a coarse embedding into any Alexandrov space of global nonpositive curvature, thus answering a question of Gromov (1993). In contrast, any metric space embeds coarsely into an Alexandrov space of nonnegative curvature. Based on joint works with Andoni and Neiman, and Eskenazis and Mendel.

Let $X$ be an $n$-dimensional smooth projective variety over a non-Archimedean local field $K$, such as the $p$-adic numbers, and let $\omega$ be a global $n$-form on $X$. The set $X(K)$ of $K$-points on $X$ has the structure of a $K$-analytic manifold, and $\omega$ induces a measure $|\omega|$ on $X(K)$. For any finite extension $K'$ of $K$, there is a natural continuous map from $X(K')$ to the Berkovich analytification $X^{\mathrm{an}}$ of $X$. We study the asymptotics of the images of the measures $|\omega\otimes_KK'|$ on $X^{\mathrm{an}}$ as $K'$ runs through towers of finite extensions of $K$. This is joint work with Johannes Nicaise.

Let F be a fixed infinite, vertex-transitive graph. We say a graph G is `r-locally F' if for every vertex v of G, the ball of radius r and centre v in G is isometric to the ball of radius r in F. For each positive integer n, let G_n be a graph chosen uniformly at random from the set of all unlabelled, n-vertex graphs that are r-locally F. We investigate the properties that the random graph G_n has with high probability --- i.e., how these properties depend on the fixed graph F.

We show that if F is a Cayley graph of a torsion-free group of polynomial growth, then there exists a positive integer r_0 such that for every integer r at least r_0, with high probability the random graph G_n = G_n(F,r) defined above has largest component of size between n^{c_1} and n^{c_2}, where 0 < c_1 < c_2 < 1 are constants depending upon F alone, and moreover that G_n has a rather large automorphism group. This contrasts sharply with the random d-regular graph G_n(d) (which corresponds to the case where F is replaced by the infinite d-regular tree).

Our proofs use a mixture of results and techniques from group theory, geometry and combinatorics.

We obtain somewhat more precise results in the case where F is L^d (the standard Cayley graph of Z^d): for example, we obtain quite precise estimates on the number of n-vertex graphs that are r-locally L^d, for r at least linear in d.

Many intriguing open problems remain: concerning groups with torsion, groups with faster than polynomial growth, and what happens for more general structures than graphs.

This is joint work with Itai Benjamini (WIS).

I will describe a new approach to the study of spherical spin glass models via free energy landscapes, defined by associating to interior points of the sphere the free energy computed only over the spherical band around them.

They are closely related to several fundamental objects from spin glass theory: the TAP free energy, pure states decomposition, overlap distribution, and temperature chaos. I will explain some of those connections.

TBA

TBA

In this talk I explore various limitations of sensors and displays, and suggest new ways to overcome them.

These include:

- The limited 2D displays of today's screens - I will show how we can design new displays that enable us to see in 3D *without* any glasses.
- The limited spatial resolution of images - I will discuss the crucial factors for successful Super-Resolution.
- The poor image quality due to motion blur (due to camera motion or scene motion) - I will present a new approach for Blind Non-Uniform Deblurring.

A classical problem in causal inference is that of matching treatment units to control units in an observational dataset. This problem is distinct from simple estimation of treatment effects as it provides additional practical interpretability of the underlying causal mechanisms that is not available without matching. Some of the main challenges in developing matching methods arise from the tension among (i) inclusion of as many covariates as possible in defining the matched groups, (ii) having matched groups with enough treated and control units for a valid estimate of average treatment effect in each group, (iii) computing the matched pairs efficiently for large datasets, and (iv) dealing with complicating factors such as non-independence among units. We propose the Fast Large-scale Almost Matching Exactly (FLAME) framework to tackle these problems for categorical covariates. At its core this framework proposes an optimization objective for match quality that captures covariates that are integral for making causal statements while encouraging as many matches as possible. We demonstrate that this framework is able to construct good matched groups on relevant covariates and further extend the methodology to incorporate continuous and other complex covariates. related papers: https://arxiv.org/abs/1707.06315, https://arxiv.org/abs/1806.06802
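As a minimal sketch of the basic building block (exact matching on categorical covariates with a within-group effect estimate; this is not the FLAME optimization itself, and the column names are hypothetical):

```python
import pandas as pd

def exact_match_ate(df, covariates, treatment="treated", outcome="y"):
    """Group units by their categorical covariate values and average the
    treated-minus-control outcome difference over groups that contain both."""
    effects, sizes = [], []
    for _, g in df.groupby(covariates):
        t, c = g[g[treatment] == 1], g[g[treatment] == 0]
        if len(t) and len(c):                      # a valid matched group
            effects.append(t[outcome].mean() - c[outcome].mean())
            sizes.append(len(g))
    if not effects:
        return None
    # Size-weighted average of within-group effects.
    return sum(e * s for e, s in zip(effects, sizes)) / sum(sizes)
```

FLAME itself additionally searches over which covariates to keep, trading off the quality of the matches against the number of units that can be matched.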

Moosa and Scanlon defined a general notion of "fields with operators", which generalizes those of difference and differential fields. In the case of "free" operators in characteristic zero they also analyzed the basic model-theoretic properties of the theory of such fields. In particular, they showed in this case the existence of the model companion, a construction analogous to that of algebraically closed fields for usual fields. In positive characteristic, they provided an example showing that the model companion need not exist.

I will discuss work, joint with Beyarslan, Hoffman and Kowalski, that completes the description of the free case, namely, it provides a full classification of those free operators for which the model companion exists. Though the motivating question is model theoretic, the description and the proof are completely algebraic and geometric. If time permits, I will discuss additional properties, such as quantifier elimination. All notions related to model theory and to fields with operators will be explained (at least heuristically).

SPECIAL NOTE: this will be part of a model theory day. Thus, the talk will be preceded by an introduction to algebraic geometry by the same speaker, 10-10:45 (in Room 1), and followed by a talk by Nick Ramsey, "Classification Theory and the Construction of PAC Fields", 14-16 (in Room 155). See https://mt972.weebly.com/ for more information.

Technological advances have changed every aspect of our lives in recent decades, yet, for the most part, the same systems of democratic decision making have been in place for centuries. I will argue that computer scientists can help rethink the practice of democracy, as well as its potential applications. I will focus on three emerging paradigms that go far beyond your run-of-the-mill election: (i) liquid democracy, an approach that allows voters to transitively delegate their votes; (ii) participatory budgeting, whereby residents collectively decide how to spend their local government's budget; and (iii) virtual democracy, which employs instant elections among machine learning models of real voters to address the grand AI challenge of ethical decision making.

We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and auditory signals to solve this task. The visual features are used to "focus" the audio on desired speakers in a scene and to improve the speech separation quality. To train our joint audio-visual model, we introduce AVSpeech, a new dataset comprised of thousands of hours of video segments from the Web. We demonstrate the applicability of our method to classic speech separation tasks, as well as real-world scenarios involving heated interviews, noisy bars, and screaming children, only requiring the user to specify the face of the person in the video whose speech they want to isolate. Our method shows clear advantage over state-of-the-art audio-only speech separation in cases of mixed speech. In addition, our model, which is speaker-independent (trained once, applicable to any speaker), produces better results than recent audio-visual speech separation methods that are speaker-dependent (require training a separate model for each speaker of interest).

Joint work with Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, Bill Freeman and Miki Rubinstein of Google Research.

Cryo-electron microscopy (cryo-EM) is an imaging technology that is revolutionizing structural biology, enabling reconstruction of molecules at near-atomic resolution.

Cryo-EM produces a large number of noisy two-dimensional tomographic projection images of a molecule, taken at unknown viewing directions.

The extreme levels of noise make classical tasks in statistics and signal processing, such as alignment, detection and clustering, very challenging.

I will start the talk by studying the multi-reference alignment problem, which can be interpreted as a simplified model for cryo-EM. In multi-reference alignment, we aim to estimate multiple signals from circularly-translated, unlabeled, noisy copies.

In high noise regimes, the measurements cannot be aligned or clustered. Nonetheless, accurate and efficient estimation can be achieved via group-invariant representations (invariant polynomials). Furthermore, such estimators achieve the optimal estimation rate.
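A small numerical sketch of the invariant-features idea (my own illustration with hypothetical parameters): the mean and the power spectrum of a signal are unchanged by circular shifts, so they can be estimated by averaging over the unaligned noisy copies. Recovering the full signal additionally requires a third-order invariant (the bispectrum), which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth signal of length L, observed through random circular shifts
# plus Gaussian noise.
L, n_obs, sigma = 32, 5000, 1.0
x = rng.standard_normal(L)
obs = np.empty((n_obs, L))
for i in range(n_obs):
    obs[i] = np.roll(x, rng.integers(L)) + sigma * rng.standard_normal(L)

# Shift-invariant features: averaging them over the observations requires no alignment.
mean_est = obs.mean()                                        # estimates mean(x)
power_est = np.mean(np.abs(np.fft.fft(obs, axis=1)) ** 2, axis=0)
power_est -= L * sigma ** 2                                  # remove the additive noise floor

true_power = np.abs(np.fft.fft(x)) ** 2
print("relative error of power-spectrum estimate:",
      np.linalg.norm(power_est - true_power) / np.linalg.norm(true_power))
```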

Then, I will show how this framework can be applied to the problem of 2-D classification in cryo-EM. In the last part of the talk, I will introduce the analog invariants of the cryo-EM problem and discuss how they can be used for ab initio modeling.

The mechanisms underlying the generation of neuronal network oscillations are not well understood. We view this process as the individual neurons' oscillations being communicated among the nodes in the network, mediated by the impedance profiles of the isolated (uncoupled) individual neurons. In order to test this idea, we developed a mathematical tool that we refer to as the Frequency Response Alternating Map (FRAM). The FRAM describes how the impedances of the individual oscillators interact to generate network responses to oscillatory inputs. It consists of decoupling the non-autonomous part of the coupling term and substituting the reciprocal coupling by a sequence of alternating one-directional forcing effects (cell 1 forces cell 2, which in turn forces cell 1, and so on). The end result is an expression of the network impedance for each node in the network as a power series, each term involving the product of the impedances of the autonomous parts of the individual oscillators. For linear systems we provide analytical expressions for the FRAM and we characterize its convergence properties and limitations. We illustrate numerically that this convergence is relatively fast. We apply the FRAM to the phenomenon of network resonance in the simplest type of oscillatory network: two non-oscillatory nodes receiving oscillatory inputs in one or both nodes. We discuss extensions of the FRAM to include non-linear systems and other types of network architectures.

The use of latent variables in probabilistic modeling is a standard approach in numerous data analysis applications. In recent years, there has been a surge of interest in spectral methods for latent variable models, where inference is done by analyzing the lower order moments of the observed data. In contrast to iterative approaches such as the EM algorithm, under appropriate conditions spectral methods are guaranteed to converge to the true model parameters given enough data samples.

The focus of the seminar is the development of novel spectral based methods for two problems in statistical machine learning. In the first part, we address unsupervised ensemble learning, where one obtains predictions from different sources or classifiers, yet without knowing the reliability and expertise of each source, and with no labeled data to directly assess it. We develop algorithms to estimate the reliability of the classifiers based on a common assumption that different classifiers make statistically independent errors. In addition, we show how one can detect subsets of classifiers that strongly violate the model of independent errors, in a fully unsupervised manner.
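A minimal synthetic sketch (my own illustration with hypothetical parameters, not the exact algorithm from the talk) of the rank-one structure induced by the independent-errors assumption: with balanced ±1 labels, the off-diagonal covariance of the classifier outputs factors as v_i v_j with v_i = 2p_i - 1, so the classifier reliabilities can be read off a leading eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic setup: m binary classifiers with unknown accuracies p_i predict
# labels y in {-1, +1} for n items, making independent errors given the label.
m, n = 10, 20000
p = rng.uniform(0.55, 0.9, size=m)           # accuracies we will try to recover
y = rng.choice([-1, 1], size=n)
correct = rng.random((m, n)) < p[:, None]
f = np.where(correct, y, -y)                 # classifier outputs, shape (m, n)

# Under the model, Cov(f_i, f_j) = v_i * v_j for i != j with v_i = 2 p_i - 1,
# i.e. the covariance matrix is rank one off the diagonal.
C = np.cov(f)
v = np.full(m, 0.5)                          # initial guess for the unknown diagonal
for _ in range(20):                          # simple diagonal-completion iteration
    R = C.copy()
    np.fill_diagonal(R, v ** 2)
    w, U = np.linalg.eigh(R)
    v = np.sqrt(max(w[-1], 0.0)) * U[:, -1]
v *= np.sign(v.sum())                        # resolve the global sign ambiguity

print("true  2p-1:", np.round(2 * p - 1, 2))
print("estimated :", np.round(v, 2))
```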

In the second part of the seminar we show how one can use spectral methods to learn the parameters of binary latent variable models. This model has many applications such as overlapping clustering and Gaussian-Bernoulli restricted Boltzmann machines. Our methods are based on computing the eigenvectors of both the second and third moments of the observed variables.

For both problems, we show that spectral based methods can be applied effectively, achieving results that are state of the art in various problems in computational biology and population genetics.

Information transfer between triangle meshes is of great importance in computer graphics and geometry processing. To facilitate this process, a smooth and accurate map is typically required between the two meshes. While such maps can sometimes be computed between nearly-isometric meshes, the more general case of meshes with diverse geometries remains challenging. This talk describes a novel approach for direct map computation between triangle meshes without mapping to an intermediate domain, which optimizes for the harmonicity and reversibility of the forward and backward maps. Our method is general both in the information it can receive as input, e.g. point landmarks, a dense map or a functional map, and in the diversity of the geometries to which it can be applied. We demonstrate that our maps exhibit lower conformal distortion than the state-of-the-art, while succeeding in correctly mapping key features of the input shapes.

(joint with Noam Aigerman, Raz Sluzky and Yaron Lipman)

Computing homeomorphisms between surfaces is an important task in shape analysis fields such as computer graphics, medical imaging and morphology. A fundamental tool for these tasks is solving Dirichlet's problem on an arbitrary Jordan domain with disc topology, where the boundary of the domain is mapped homeomorphically to the boundary of a specific target domain: a convex polygon. By the Rado-Kneser-Choquet Theorem such harmonic mappings are homeomorphisms onto the convex polygon. Standard finite element approximations of harmonic mappings lead to discrete harmonic mappings, which have been proven to be homeomorphisms as well. Computing the discrete harmonic mappings is very efficient and reliable as the mappings are obtained as the solution of a sparse linear system of equations.
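A minimal sketch of a discrete harmonic map in this spirit (uniform "Tutte" weights on a synthetic triangulated grid, rather than the cotangent weights typically used in finite element discretizations): pin the boundary vertices, in cyclic order, to a convex polygon and solve one sparse linear system per coordinate.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

k = 20                                               # grid resolution (hypothetical)
n = k * k
idx = lambda i, j: i * k + j
adj = [set() for _ in range(n)]
for i in range(k):
    for j in range(k):
        for di, dj in [(1, 0), (0, 1), (1, 1)]:      # triangulate each grid square
            if i + di < k and j + dj < k:
                adj[idx(i, j)].add(idx(i + di, j + dj))
                adj[idx(i + di, j + dj)].add(idx(i, j))

# Order the boundary cyclically (by angle around the grid centre) and pin it to
# equally spaced points of a convex polygon inscribed in the unit circle.
boundary = [v for v in range(n) if v // k in (0, k - 1) or v % k in (0, k - 1)]
boundary.sort(key=lambda v: np.arctan2(v // k - (k - 1) / 2, v % k - (k - 1) / 2))
target = {v: (np.cos(2 * np.pi * t / len(boundary)),
              np.sin(2 * np.pi * t / len(boundary)))
          for t, v in enumerate(boundary)}

A = sp.lil_matrix((n, n))
b = np.zeros((n, 2))
for v in range(n):
    if v in target:                                  # Dirichlet boundary condition
        A[v, v] = 1.0
        b[v] = target[v]
    else:                                            # discrete harmonic: average of neighbours
        A[v, v] = len(adj[v])
        for u in adj[v]:
            A[v, u] = -1.0
A = A.tocsr()
uv = np.column_stack([spla.spsolve(A, b[:, 0]), spla.spsolve(A, b[:, 1])])
print(uv.shape)                                      # (n, 2) planar coordinates of the map
```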

In this talk we show that the methodology above, can be used to compute *conformal* homeomorphisms, for domains with either disc or sphere topology:

By solving Dirichlet's problem with correct boundary conditions, we can compute conformal homeomorphisms from arbitrary Jordan domains to a specific canonical domain: a triangle. The discrete conformal mappings we compute are homeomorphisms, and approximate the conformal homeomorphism uniformly and in H^1. Similar methodology can also be used to conformally map a sphere-type surface to a planar Jordan domain, whose edges are identified so that the planar domain has the topology of a sphere.

In knot theory and more generally the theory of 3-manifolds various "quantum invariants" like the Witten-Reshetikhin-Turaev or the Kashaev invariant have been much studied in recent years, in particular because of the famous "volume conjecture" related to the asymptotic growth of the Kashaev invariant. Rather surprisingly, it transpired a few years ago that these invariants also have very non-trivial number-theoretical properties, including a kind of weak invariance under the modular group SL(2,Z) ("quantum modular forms") and the experimental discovery of the appearance of certain units in cyclotomic extensions as factors in the asymptotic expansions. The talk will report on this and specifically on recent joint work with Frank Calegari and Stavros Garoufalidis that constructs such units in a purely algebraic way starting from elements in algebraic K-theory or in the more elementary "Bloch group". As an unexpected application, this result led to the proof of a well-known conjecture of Nahm on the modularity of certain q-hypergeometric series.

The classical Gaussian isoperimetric inequality, established in the 70's independently by Sudakov-Tsirelson and Borell, states that the optimal way to decompose $\mathbb{R}^n$ into two sets of prescribed Gaussian measure, so that the (Gaussian) area of their interface is minimal, is by using two complementary half-spaces. This is the Gaussian analogue of the classical Euclidean isoperimetric inequality, and is therefore referred to as the "single-bubble" case.

A natural generalization is to decompose $\mathbb{R}^n$ into $q \geq 3$ sets of prescribed Gaussian measure. It is conjectured that when $q \leq n+1$, the configuration whose interface has minimal (Gaussian) area is given by the Voronoi cells of $q$ equidistant points. For example, for $q=3$ (the "double-bubble" conjecture) in the plane ($n=2$), the interface is conjectured to be a "tripod" or "Y" - three rays meeting at a single point in 120 degree angles. For $q=4$ (the "triple-bubble" conjecture) in $\mathbb{R}^3$, the interface is conjectured to be a tetrahedral cone.

We confirm the Gaussian double-bubble and, more generally, multi-bubble conjectures for all $3 \leq q \leq n+1$. The double-bubble case $q=3$ is simpler, and we will explain why.

None of the numerous methods discovered over the years for establishing the classical $q=2$ case seem amenable to the $q \geq 3$ cases, and our method consists of establishing a Partial Differential Inequality satisfied by the isoperimetric profile. To treat $q > 3$, we first prove that locally minimal configurations must have flat interfaces, and thus convex polyhedral cells. Uniqueness of minimizers up to null-sets is also established.

This is joint work with Joe Neeman (UT Austin).

Typically, when semi-discrete approximations to time-dependent partial differential equations (PDEs) or schemes for ordinary differential equations (ODEs) are constructed, they are derived such that they are stable and have a specified truncation error $\tau$. Under these conditions, the Lax-Richtmyer equivalence theorem assures that the scheme converges and that the error is, at most, of the order of $\| \tau \|$. In most cases, the error is indeed of the order of $\| \tau \|$. We demonstrate that schemes can be constructed whose truncation errors are $\tau$, yet whose actual errors are much smaller. This error reduction is achieved by constructing the schemes such that they inhibit the accumulation of the local errors; therefore they are called Error Inhibiting Schemes (EIS).

In the Minimum Hypergraph Bisection problem, the vertex set of a hypergraph has to be partitioned into two parts of equal size so that the number of hyperedges intersecting both parts is minimized.

This problem is a natural generalization of the well-studied Minimum Bisection problem in graphs.

We present a sharp distinction between Minimum Bisection in hypergraphs and graphs.

Whereas it is well-known that all bi-criteria approximation algorithms for Minimum Bisection in graphs can be extended to hypergraphs with the exact same guarantees, we prove that this is not the case when considering true (i.e., non bi-criteria) approximation algorithms.

Specifically, we show that Minimum Hypergraph Bisection admits an $\tilde{\mathcal{O}}(\sqrt{n})$ approximation algorithm.

However, we also show that any $\alpha$-approximation algorithm for Minimum Hypergraph Bisection implies an approximation of $\Omega(\alpha^{-2})$ for Densest $k$-Subgraph.

Thus, assuming the exponential time hypothesis there is no efficient approximation algorithm for Minimum Hypergraph Bisection with an approximation ratio $n^{poly(\log{\log{n}})}$.

In particular, Minimum Hypergraph Bisection is much harder to approximate than Minimum Bisection in graphs, for which a logarithmic approximation algorithm exists.

If time permits, the problem of constructing trees that are cut sparsifiers for hypergraph and vertex cuts will also be discussed.

While similar trees lie at the heart of powerful algorithms for Minimum Bisection in graphs, we prove that this is not the case for hypergraphs.

Joint work with Harald R\"{a}cke and Richard Stotz.

**Ohad Feldheim: Convergence of a quantile admission process**

**Abstract**: Consider the following stochastic model for a growing set. At time 0 the model consists of the singleton S = {-infty}. At every subsequent time, two i.i.d. samples, distributed according to some distribution D on R, are suggested as candidates for S. If the smaller among the two is closer to at least a fraction r of the current elements of S (in comparison with the larger one), then it is admitted into S.

How will the distribution of the members of S evolve over time as a function of r and D?

This model was suggested by Alon, Feldman, Mansour, Oren and Tennenholtz as a model for the evolution of an exclusive social group. We'll show that the empirical distribution of the elements of S converges to a (not-necessarily deterministic) limit distribution for any r and D.

This we do by relating the process to a random walk in changing environment. The analysis of this random walk involves various classical exponential concentration inequalities as well as a new general inequality concerning mean and minimum of independent random variables.
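A small simulation sketch of the model (my own illustration with hypothetical parameters, under one reading of the admission rule; in particular I assume the larger candidate is admitted when the smaller one fails to reach an r-fraction of the "votes"):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(r, sample_d, steps=20000):
    """Simulate the quantile admission process starting from S = {-inf}.

    Each step, two i.i.d. candidates arrive; every current member "prefers" the
    candidate closer to it (with -inf preferring the smaller one). The smaller
    candidate is admitted if it is preferred by at least an r-fraction of the
    members; otherwise (one reading of the model) the larger one is admitted.
    """
    S = [-np.inf]
    for _ in range(steps):
        a, b = sorted(sample_d(2))
        votes = sum(1 for m in S if m == -np.inf or abs(a - m) < abs(b - m))
        S.append(a if votes >= r * len(S) else b)
    return np.array(S[1:])           # drop the initial -inf

members = simulate(r=0.6, sample_d=lambda k: rng.standard_normal(k))
print("empirical quantiles of S:", np.quantile(members, [0.1, 0.5, 0.9]))
```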

Joint work with Naomi Feldheim

**Eviatar Procaccia:** **Stabilization of Diffusion Limited Aggregation in a Wedge**

**Abstract**: We prove a discrete Beurling estimate for the harmonic measure in a wedge in $\mathbf{Z}^2$, and use it to show that Diffusion Limited Aggregation (DLA) in a wedge of angle smaller than $\pi/4$ stabilizes. This allows one to consider the infinite DLA and questions about the number of arms, growth and dimension. I will present some conjectures and open problems.

Artificial illumination plays a vital role in human civilization. In computer vision, artificial light is extensively used to recover 3D shape, reflectance, and further scene information. However, in most computer vision applications using artificial light, some additional structure is added to the illumination to facilitate the task at hand. In this work, we show that the ubiquitous alternating current (AC) lights already have a valuable inherent structure that stems from bulb flicker. By passively sensing scene flicker, we reveal new scene information which includes: the type of bulbs in the scene, the phases of the electric grid up to city scale, and the light transport matrix. This information yields unmixing of reflections and semi-reflections, nocturnal high dynamic range, and scene rendering with bulbs not observed during acquisition. Moreover, we provide methods that enable capturing scene flicker using almost any off-the-shelf camera, including smartphones.

In underwater imaging, similar structures are added to illumination sources to overcome the limited imaging range. In this setting, we show that by optimizing camera and light positions while taking the effect of light propagation in scattering media into account we achieve superior imaging of underwater scenes while using simple, unstructured illumination.

The fisherman caught a quantum fish. "Fisherman, please let me go", begged the fish, "and I will grant you three wishes". The fisherman agreed. The fish gave the fisherman a quantum computer, three quantum signing tokens and his classical public key. The fish explained: "to sign your three wishes, use the tokenized signature scheme on this quantum computer, then show your valid signature to the king, who owes me a favor".

The fisherman used one of the signing tokens to sign the document "give me a castle!" and rushed to the palace. The king executed the classical verification algorithm using the fish's public key, and since it was valid, the king complied.

The fisherman's wife wanted to sign ten wishes using their two remaining signing tokens. The fisherman did not want to cheat, and secretly sailed to meet the fish. "Fish, my wife wants to sign ten more wishes". But the fish was not worried: "I have learned quantum cryptography following the previous story (The Fisherman and His Wife by the brothers Grimm). The quantum tokens are consumed during the signing. Your polynomial wife cannot even sign four wishes using the three signing tokens I gave you".

"How does it work?" wondered the fisherman. "Have you heard of quantum money? These are quantum states which can be easily verified but are hard to copy. This tokenized quantum signature scheme extends Aaronson and Christiano's quantum money scheme, and a variant by Zhandry, which is why the signing tokens cannot be copied".

"Does your scheme have additional fancy properties?" the fisherman asked. "Yes, the scheme has other security guarantees: revocability, testability and everlasting security. Furthermore, if you're at sea and your quantum phone has only classical reception, you can use this scheme to transfer the value of the quantum money to shore", said the fish, and swam away.

Joint work with Shalev Ben-David. https://arxiv.org/abs/1609.09047

**Bhaswar Bhattacharya**: Large Deviation Variational Problems in Random Combinatorial Structures

The upper tail problem in the Erdos-Renyi random graph $G\sim\mathcal{G}_{n,p}$, where every edge is included independently with probability $p$, is to estimate the probability that the number of copies of a graph $H$ in $G$ exceeds its expectation by a factor of $1+\delta$. The arithmetic analog of this problem counts the number of $k$-term arithmetic progressions in a random subset of $\{1, 2, \ldots, N\}$, where every element is included independently with probability $p$. The recently developed framework of non-linear large deviations (Chatterjee and Dembo (2016) and Eldan (2017)) shows that the logarithm of these tail probabilities can be reduced to a natural variational problem on the space of weighted graphs/functions. In this talk we will discuss methods for solving these variational problems in the sparse regime ($p \rightarrow 0$), and show how the solutions are often related to extremal problems in combinatorics. (This is based on joint work with Shirshendu Ganguly, Eyal Lubetzky, Xuancheng Shao, and Yufei Zhao.)

**Elliot Paquette**: Random matrix point processes via stochastic processes

In 2007, Virag and Valko introduced the Brownian carousel, a dynamical system that describes the eigenvalues of a canonical class of random matrices. This dynamical system can be reduced to a diffusion, the stochastic sine equation, a beautiful probabilistic object requiring no random matrix theory to understand. Many features of the limiting eigenvalue point process, the Sine-beta process, can then be studied via this stochastic process. We will sketch how this stochastic process is connected to eigenvalues of a random matrix and sketch an approach to two questions about the stochastic sine equation: deviations for the Sine-beta counting function and a functional central limit theorem.

The celebrated Gan-Gross-Prasad conjectures aim to describe the branching behavior of representations of classical groups, i.e., the decomposition of irreducible representations when restricted to a lower rank subgroup.

These conjectures, whose global/automorphic version bears significance in number theory, have thus far been formulated and resolved for the generic case.

In this talk, I will present a newly formulated rule in the p-adic setting (again conjectured by G-G-P) for restriction of representations in non-generic Arthur packets of GL_n.

Progress towards the proof of the new rule takes the problem into the rapidly developing subject of quantum affine algebras. These techniques use a version of the Schur-Weyl duality for affine Hecke algebras, combined with new combinatorial information on parabolic induction extracted by Lapid-Minguez.

All known algorithms for solving NP-complete problems require exponential time in the worst case; however, these algorithms nevertheless solve many problems of practical importance astoundingly quickly, and are hence relied upon in a broad range of applications. This talk is built around the observation that "Empirical Hardness Models" - statistical models that predict algorithm runtime on novel instances from a given distribution - work surprisingly well. These models can serve as powerful tools for algorithm design, specifically by facilitating automated methods for algorithm design and for constructing algorithm portfolios. They also offer a statistical alternative to beyond-worst-case analysis and a starting point for theoretical investigations.
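A minimal sketch of the idea behind an empirical hardness model (entirely synthetic features and runtimes, for illustration only): fit a regression model from instance features to log runtime and evaluate it on held-out instances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Hypothetical instance features (e.g., clause/variable ratio, graph statistics)
# and runtimes; in practice these come from running a solver on a benchmark
# distribution. Here the runtime depends nonlinearly on the features.
X = rng.random((2000, 5))
log_runtime = (3 * X[:, 0] + np.sin(6 * X[:, 1]) + X[:, 2] * X[:, 3]
               + 0.1 * rng.standard_normal(2000))

X_tr, X_te, y_tr, y_te = train_test_split(X, log_runtime, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out instances:", model.score(X_te, y_te))

# An algorithm portfolio would train one such model per algorithm and, for a
# new instance, run whichever algorithm is predicted to be fastest.
```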

bio at http://www.cs.ubc.ca/~kevinlb/bio.html

In analysis, a convolution of two functions usually results in a smoother, better behaved function. Given two morphisms f,g from algebraic varieties X,Y to an algebraic group G, one can define a notion of convolution of these morphisms. Analogously to the analytic situation, this operation yields a morphism (from X x Y to G) with improved smoothness properties.
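For concreteness, the convolution referred to here can be written pointwise via the group multiplication (a restatement in notation introduced only for this summary; the precise definition given in the talk may differ in details):
$$f*g\colon X\times Y\longrightarrow G,\qquad (f*g)(x,y):=f(x)\cdot g(y).$$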

In this talk, I will define a convolution operation and discuss some of its properties. I will then present a recent result: if G is an algebraic group, X is smooth and absolutely irreducible, and f:X-->G is a dominant map, then after finitely many self-convolutions of f, we obtain a morphism which is flat with fibers of rational singularities (a property we call (FRS)).

Moreover, Aizenbud and Avni showed that the (FRS) property has an equivalent analytic characterization, which leads to various applications such as counting points of schemes over finite rings, representation growth of certain compact p-adic groups and arithmetic groups of higher rank, and random walks on (algebraic families of) finite groups. We will discuss some of these applications, and maybe some of the main ideas of the proof of the above result.

Joint with Yotam Hendel.

Random surfaces in statistical physics are commonly modeled by a real-valued function phi on a lattice, whose probability density penalizes nearest-neighbor fluctuations. Precisely, given an even function V, termed the potential, the energy H(phi) is computed as the sum of V over the nearest-neighbor gradients of phi, and the probability density of phi is set proportional to exp(-H(phi)). The most-studied case is when V is quadratic, resulting in the so-called Gaussian free field. Brascamp, Lieb and Lebowitz initiated in 1975 the study of the global fluctuations of random surfaces for other potential functions and noted that understanding is lacking even in the case of the quartic potential, V(x)=x^4. We will review the state of the art for this problem and present recent work with Alexander Magazinov which finally settles the question of obtaining upper bounds for the fluctuations for the quartic and many other potential functions.
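In formulas (restating the setup above), the surface $\varphi$ is sampled with density
$$d\mu(\varphi)\;\propto\;\exp\big(-H(\varphi)\big)\prod_x d\varphi(x),\qquad H(\varphi)=\sum_{x\sim y}V\big(\varphi(x)-\varphi(y)\big),$$
where the sum runs over nearest-neighbor pairs; $V(x)=x^2$ gives the Gaussian free field and $V(x)=x^4$ the quartic case discussed in the talk.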

Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications in computer vision and graphics. We propose to leverage the power of convolutional neural networks (CNNs) to produce highly detailed face reconstruction directly from a single image. For this purpose, we introduce an end-to-end CNN framework which constructs the shape in a coarse-to-fine fashion. The proposed architecture is composed of two main blocks, a network which recovers the coarse facial geometry (CoarseNet), followed by a CNN which refines the facial features of that geometry (FineNet). To alleviate the lack of training data for face reconstruction, we train our model using synthetic data as well as unlabeled facial images collected from the internet. The proposed model successfully recovers facial geometries from real images, even for faces with extreme expressions and under varying lighting conditions. In this talk, I will summarize three papers that were published at 3DV 2016, CVPR 2017 (as an oral presentation), and ICCV 2017.
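A schematic of the coarse-to-fine composition described above (a minimal PyTorch sketch; the layer choices, channel counts, and the depth-map output are placeholders and not the architecture of the papers):

```python
import torch
import torch.nn as nn

class CoarseToFine(nn.Module):
    """Two-block composition: a coarse geometry network followed by a refiner."""
    def __init__(self):
        super().__init__()
        self.coarse = nn.Sequential(            # stand-in for CoarseNet
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),     # e.g. a coarse depth/geometry map
        )
        self.fine = nn.Sequential(              # stand-in for FineNet
            nn.Conv2d(1 + 3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),     # refined geometry map
        )

    def forward(self, img):
        coarse = self.coarse(img)
        fine = self.fine(torch.cat([coarse, img], dim=1))
        return coarse, fine
```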

Bio: Matan Sela holds a Ph.D. in Computer Science from the Technion - Israel Institute of Technology. He received his B.Sc. and M.Sc. degrees (both with honors) in electrical engineering from the Technion - Israel Institute of Technology in 2012 and 2015, respectively. During summer 2017, he was a research intern at Google, Mountain View, California. His interests are Machine Learning, Computer Vision, Computer Graphics, Geometry Processing, and any combination thereof.

The Baum-Bott indexes are important local invariants for singular holomorphic foliations by curves with isolated singularities. On a compact complex manifold, we consider foliations with fixed cotangent bundle. Then, the Baum-Bott map associates to a foliation its Baum-Bott indexes at its singularities. We focus on foliations on the projective space and we are interested in the generic rank of that map. The generic rank for foliations on the projective plane is known. For higher-dimensional projective spaces, we give an upper bound, and in some cases we determine the generic rank.

This paper makes progress on the problem of explicitly constructing a binary tree code with constant distance and constant alphabet size.

For every constant delta < 1 we give an explicit binary tree code with distance delta and alphabet size poly(log n), where n is the depth of the tree. This is the first improvement over a two-decade-old construction that has an exponentially larger alphabet of size poly(n).

As part of the analysis, we prove a bound on the number of positive integer roots a real polynomial can have in terms of its sparsity with respect to the Newton basis---a result of independent interest.

Joint work with G. Cohen and B. Haeupler

**David Jerison:** Localization of eigenfunctions via an effective potential

We discuss joint work with Douglas Arnold, Guy David, Marcel Filoche and Svitlana Mayboroda. Consider the Neumann boundary value problem for the operator $L u = - \mbox{div} (A \nabla u) + Vu$ on a Lipschitz domain $\Omega$ and, more generally, on a manifold with or without boundary. The eigenfunctions of $L$ are often localized, as a result of disorder of the potential $V$, the matrix of coefficients $A$, irregularities of the boundary, or all of the above. In earlier work, Filoche and Mayboroda introduced the function $u$ solving $Lu = 1$, and showed numerically that it strongly reflects this localization. Here, we deepen the connection between the eigenfunctions and this {\em landscape} function $u$ by proving that its reciprocal $1/u$ acts as an {\em effective potential}. The effective potential governs the exponential decay of the eigenfunctions of the system and delivers information on the distribution of eigenvalues near the bottom of the spectrum.
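A minimal numerical illustration of the landscape idea (a one-dimensional toy example of my own with a random potential and a Dirichlet-type discrete Laplacian, not the authors' setting, which is the Neumann problem on a Lipschitz domain):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
V = 4.0 * rng.random(n)                              # disordered potential
lap = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L = lap + np.diag(V)                                 # discrete L = -Delta + V

u = np.linalg.solve(L, np.ones(n))                   # landscape function: L u = 1
W = 1.0 / u                                          # candidate effective potential

vals, vecs = np.linalg.eigh(L)
ground = vecs[:, 0]                                  # lowest eigenfunction

# The lowest eigenfunctions tend to localize near the wells (minima) of W = 1/u.
print("argmax |ground state| :", int(np.argmax(np.abs(ground))))
print("argmin of 1/u         :", int(np.argmin(W)))
```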

**Ron Rosenthal:** Eigenvector correlation in the complex Ginibre ensemble

The complex Ginibre ensemble is a non-Hermitian random matrix on C^N with i.i.d. complex Gaussian entries normalized to have mean zero and variance 1/N. Unlike the Gaussian unitary ensemble, for which the eigenvectors are orthogonal, the geometry of the eigenbases of the Ginibre ensemble is not particularly well understood. We will discuss some results regarding the analytic and algebraic structure of eigenvector correlations in this matrix ensemble. In particular, we uncover an extended algebraic structure which describes the asymptotic behavior (as N goes to infinity) of these correlations. Our work extends previous results of Chalker and Mehlig [CM98], in which the correlation for pairs of eigenvectors was computed. Based on a joint work with Nick Crawford.
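As a small numerical companion (a sketch of my own, not material from the talk), one can sample a Ginibre matrix and compute the diagonal eigenvector overlaps studied by Chalker and Mehlig:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 300
# Complex Ginibre matrix with entries of mean zero and variance 1/N.
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)

vals, R = np.linalg.eig(G)      # columns of R: right eigenvectors
L = np.linalg.inv(R)            # rows of L: left eigenvectors, biorthogonal (L @ R = I)

# Diagonal overlaps O_nn = ||L_n||^2 ||R_n||^2; Chalker-Mehlig predict that for
# bulk eigenvalues E[O_nn] is approximately N (1 - |lambda_n|^2).
O_diag = (np.abs(L) ** 2).sum(axis=1) * (np.abs(R) ** 2).sum(axis=0)
print(O_diag[:5], np.abs(vals[:5]))
```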

Recent work has shown impressive success in automatically creating new images with desired properties such as transferring painterly style, modifying facial expressions or manipulating the center of attention of the image. In this talk I will discuss two of the standing challenges in image synthesis and how we tackle them:

- I will describe our efforts in making the synthesized images more photo-realistic.

- I will further show how we can broaden the scope of data that can be used for training synthesis networks, and with that provide a solution to new applications.

This talk will include an introduction to the topic of V(D)J rearrangements of particular subsets of T cells and B cells of the adaptive human immune system, in particular of IgG heavy chains. There are many statistical problems that arise in trying to understand these cells. They involve estimating aspects of functionals of discrete probabilities on (random) finite sets. Topics include but are not limited to exchangeability, estimating non-centrality parameters, and estimating covariance matrices from what are called "replicates" that have been amplified by the PCR process and (partially) sequenced.

I have received considerable assistance from Lu Tian and Yi Liu, and have also been helped considerably by Andrew Fire, Scott Boyd, and Jorg Goronzy.

Single cells and cells in a tissue respond to stimuli by deforming, changing their shape, and/or moving. Some of these responses can be understood from the underlying biochemical signaling, a topic that has been of interest to both biologists and modelers. In our recent work, my group has studied the link between mechanical tension on cells and their internal chemical signaling. (Our primary focus has been, and remains, that of Rho proteins.) Here I will describe a simple "toy model" that captures key aspects of what is known biologically. The model is simple enough to understand mathematically, and yet capable of displaying several regimes of behavior consistent with experimental observations. I describe how we investigated the model in a single cell, and how this was then used to capture multiple cells that interact with one another mechanically. We find waves of expansion and contraction that sweep through the model "tissue" in certain parameter regimes. This work is joint with Cole Zmurchok and Dhanajay Bhaskar.

We study the problem of identifying correlations in multivariate data, under information constraints:

Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines. We prove a tight trade-off between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required memory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir (2014), which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication trade-offs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.

Joint work with Ohad Shamir

Non-signaling strategies are collections of distributions with certain non-local correlations that have been studied recently in the context of delegation of computation.

In this talk I will discuss a recent result showing that the Hadamard based PCP of [ALMSS'92] is sound against non-signaling strategies. As part of the proof, we study the classical problem of linearity testing [BLR'93] in the setting of non-signaling strategies, and prove that any non-signaling strategy that passes the linearity test with high probability must be close to a quasi-distribution over linear functions.

Joint work with Alessandro Chiesa and Peter Manohar (UC Berkeley).

Consider a sequence of growing graphs converging locally weakly to an infinite (possibly random) tree. As there are uncountably many Ising and Potts Gibbs measures on the limiting tree in the low-temperature regime, it is not a priori clear whether the local weak limit of such measures exists and, if so, identifying the limit remains a challenge. In this talk, I will describe these limits. The talk is based on joint works with Amir Dembo and Allan Sly.

Recently, there has been increasing interest to leverage the competence of neural networks to analyze data. In particular, new clustering methods that employ deep embeddings have been presented.

In this talk, we depart from centroid-based models and suggest a new framework, called Clustering-driven deep embedding with PAirwise Constraints (CPAC), for non-parametric clustering using a neural network. We present a clustering-driven embedding based on a Siamese network that encourages pairs of data points to output similar representations in the latent space. Our pair-based model allows augmenting the information with labeled pairs to constitute a semi-supervised framework. Our approach is based on analyzing the losses associated with each pair to refine the set of constraints.

We show that clustering performance increases when using this scheme, even with a limited amount of user queries.

We present state-of-the-art results on different types of datasets and compare our performance to parametric and non-parametric techniques.
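As a rough illustration of the pair-based objective described above (a generic Siamese/contrastive loss sketch in PyTorch; the exact CPAC loss and its constraint-refinement step are not reproduced here):

```python
import torch
import torch.nn.functional as F

def pairwise_loss(emb_a, emb_b, same, margin=1.0):
    """emb_a, emb_b: embeddings of a batch of sample pairs produced by the
    shared (Siamese) encoder; same = 1 for pairs believed/labeled to share a
    cluster, 0 otherwise."""
    d = F.pairwise_distance(emb_a, emb_b)
    pull = same * d.pow(2)                          # pull similar pairs together
    push = (1 - same) * F.relu(margin - d).pow(2)   # push dissimilar pairs apart
    return (pull + push).mean()
```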

Our first theorem is a hierarchy theorem for the query complexity of testing graph properties with one-sided error; more precisely, we show that for every sufficiently fast-growing function f from (0,1) to the natural numbers, there is a graph property whose one-sided-error query complexity is precisely f(\Theta(\epsilon)). No result of this type was previously known for any f which is super-polynomial. Goldreich [ECCC 2005] asked to exhibit a graph property whose query complexity is exponential in 1/\epsilon. Our hierarchy theorem partially resolves this problem by exhibiting a property whose one-sided-error query complexity is exponential in 1/\epsilon. We also use our hierarchy theorem in order to resolve a problem raised by Alon and Shapira [STOC 2005] regarding testing relaxed versions of bipartiteness.

Our second theorem states that for any function f there is a graph property whose one-sided-error query complexity is at least f(\epsilon) while its two-sided-error query complexity is only polynomial in 1/\epsilon. This is the first indication of the surprising power that two-sided-error testing algorithms have over one-sided-error ones, even when restricted to properties that are testable with one-sided error. Again, no result of this type was previously known for any f that is super-polynomial.

The above theorems are derived from a graph theoretic result which we think is of independent interest, and might have further applications. Alon and Shikhelman [JCTB 2016] introduced the following generalized Turan problem: for fixed graphs H and T, and an integer n, what is the maximum number of copies of T, denoted by ex(n,T,H), that can appear in an n-vertex H-free graph? This problem received a lot of attention recently, with an emphasis on T = C_3, H=C_{2m+1}. Our third theorem gives tight bounds for ex(n,C_k,C_m) for all the remaining values of k and m.

Joint work with Asaf Shapira.

Non-Euclidean, or incompatible elasticity is an elastic theory for bodies that do not have a reference (stress-free) configuration. It applies to many systems, in which the elastic body undergoes inhomogeneous growth (e.g. plants, self-assembled molecules). Mathematically, it is a question of finding the "most isometric" immersion of a Riemannian manifold (M,g) into Euclidean space of the same dimension, by minimizing an appropriate energy functional.
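A commonly used energy functional of this type (stated here for orientation; the talk's precise functional may differ) measures the pointwise deviation of the differential from being an isometry:
$$E(f)=\int_M \operatorname{dist}^2\!\big(df_p,\ \mathrm{SO}(T_pM,\mathbb{R}^d)\big)\,d\mathrm{vol}_g,$$
so that $E(f)=0$ exactly when $f$ is an orientation-preserving isometric immersion, and minimizers of $E$ are the "most isometric" immersions.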

Much of the research in non-Euclidean elasticity is concerned with elastic bodies that have one or more slender dimensions (such as leaves), and finding appropriate dimensionally-reduced models for them.

In this talk I will give an introduction to non-Euclidean elasticity, and then focus on thin bodies and present some recent results on the relations between their elastic behavior and their curvature.

Based on a joint work with Asaf Shachar.

We prove a dichotomy theorem for two-party protocols, and show that for every poly-time two-party protocol with single-bit output, at least one of the following holds:

- The protocol can be used to construct a key-agreement protocol.
- For every constant ρ > 0, the parties' output is ρ-uncorrelated: let (X; Y; T) denote the parties' outputs and the protocol's transcript, respectively. A protocol is ρ-uncorrelated if there exists an efficient "decorrelator" algorithm Decor that, given a random transcript T, produces two numbers P_A and P_B such that no efficient algorithm can distinguish (U_{P_A}; U_{P_B}; T) (where U_p denotes a biased coin with bias p) from (X; Y; T) with distinguishing advantage larger than ρ.

Namely, if the protocol cannot be used to construct key-agreement, then its output distribution (X; Y; T) is trivial: it can be simulated non-interactively by the parties given public randomness (used to sample T). (The precise statement also has qualifiers of the form: "on infinitely many choices of the security parameter").

We use the above characterization to prove that an (α = 24ε^{2})-correct, ε-differentially private symmetric protocol for computing XOR implies the existence of a key-agreement protocol. The above dependency between α and ε is tight, since a Θ(ε^{2})-correct, ε-differentially private protocol for computing XOR is known to exist unconditionally. It also improves, in the (ε, α)-dependency aspect, upon Goyal et al. [ICALP '16], who showed that, for some constant c > 0, a c-correct, ε-differentially private protocol for computing XOR implies oblivious transfer. Our result extends to a weaker notion of differential privacy in which privacy is only required to hold against an external observer. Interestingly, the reductions used for proving the above results are non-black-box.

Joint work with: Eran Omri and Kobbi Nissim and Ronen Shaltiel and Jad Silbak

Talk 1: David Ellis (Queen Mary U)

**Title: The edge-isoperimetric problem for antipodally symmetric subsets of the discrete cube.**

Abstract: A major open problem in geometry is to solve the isoperimetric problem for n-dimensional real projective space, i.e. to determine, for each real number V, the minimum possible size of the boundary of a (well-behaved) set of volume V, in n-dimensional real projective space. We study a discrete analogue of this question: namely, among all antipodally symmetric subsets of {0,1}^n of fixed size, which sets have minimal edge-boundary? We obtain a complete answer to the latter question. This is joint work with Imre Leader (Cambridge).

Talk 2: Benjamin Fehrman (Max Planck Institute)

**Title: Well-posedness of stochastic porous media equations with nonlinear, conservative noise.**

Abstract: In this talk, which is based on joint work with Benjamin Gess, I will describe a pathwise well-posedness theory for stochastic porous media equations driven by nonlinear, conservative noise. Such equations arise in the theory of mean field games, as an approximation to the Dean-Kawasaki equation in fluctuating hydrodynamics, to describe the fluctuating hydrodynamics of a zero range process, and as a model for the evolution of a thin film in the regime of negligible surface tension. Our methods are loosely based on the theory of stochastic viscosity solutions, where the noise is removed by considering a class of test functions transported along underlying stochastic characteristics. We apply these ideas after passing to the equation's kinetic formulation, for which the noise enters linearly and can be inverted using the theory of rough paths.

The harnessing of modern computational abilities for many-body wave-function representations is naturally placed as a prominent avenue in contemporary condensed matter physics. Specifically, highly expressive computational schemes that are able to efficiently represent the entanglement properties which characterize many-particle quantum systems are of interest. In the seemingly unrelated field of machine learning, deep network architectures have exhibited an unprecedented ability to tractably encompass the convoluted dependencies which characterize hard learning tasks such as image classification or speech recognition. However, theory is still lagging behind these rapid empirical advancements, and key questions regarding deep learning architecture design have no adequate answers. In the presented work, we establish a Tensor Network (TN) based common language between the two disciplines, which allows us to offer bidirectional contributions. By showing that many-body wave-functions are structurally equivalent to mappings of convolutional and recurrent arithmetic circuits, we construct their TN descriptions in the form of Tree and Matrix Product State TNs, and bring forth quantum entanglement measures as natural quantifiers of dependencies modeled by such networks. Accordingly, we propose a novel entanglement based deep learning design scheme that sheds light on the success of popular architectural choices made by deep learning practitioners, and suggests new practical prescriptions. Specifically, our analysis provides prescriptions regarding connectivity (pooling geometry) and parameter allocation (layer widths) in deep convolutional networks, and allows us to establish a first of its kind theoretical assertion for the exponential enhancement in long term memory brought forth by depth in recurrent networks. In the other direction, we identify that an inherent re-use of information in state-of-the-art deep learning architectures is a key trait that distinguishes them from TN based representations. Therefore, we suggest a new TN manifestation of information re-use, which enables TN constructs of powerful architectures such as deep recurrent networks and overlapping convolutional networks. This allows us to theoretically demonstrate that the entanglement scaling supported by state-of-the-art deep learning architectures can surpass that of commonly used expressive TNs in one dimension, and can support volume law entanglement scaling in two dimensions with an amount of parameters that is a square root of that required by Restricted Boltzmann Machines. We thus provide theoretical motivation to shift trending neural-network based wave-function representations closer to state-of-the-art deep learning architectures.

Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. In this talk I will introduce our neural relational inference model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks.

I will describe work with C. E. Wayne in which we study how dissipation leads to a very slow drift in secular parameters for the non-linear Schroedinger equation. The nice thing about this model is that one can describe a relatively complex phenomenon in almost explicit terms as a perturbation of an integrable Hamiltonian system.

The advancement of quantum computing, with Google, IBM, Microsoft, and Intel competing on the (exponentially growing) number of qubits in their, in some cases already commercial, quantum computers, requires a reexamination of Internet security and of the public-key infrastructure. The talk will describe the concept of overlay security together with blockchain-based directories for establishing symmetric keys. When combined with nested Lamport signatures and Merkle trees for digital signatures, the result is a complete, easily implementable architecture with information-theoretically secure communication and hash-based signatures.
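As a self-contained illustration of the hash-based-signature ingredient mentioned above (a textbook Lamport one-time signature over a 256-bit message digest; the nesting and the Merkle-tree aggregation from the talk are not shown):

```python
import hashlib, secrets

H = lambda b: hashlib.sha256(b).digest()

def keygen():
    # 256 pairs of random secrets; the public key is their hashes.
    sk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(256)]
    pk = [[H(x) for x in pair] for pair in sk]
    return sk, pk

def bits(msg):
    d = int.from_bytes(H(msg), "big")
    return [(d >> i) & 1 for i in range(256)]

def sign(sk, msg):
    # Reveal one secret per message-digest bit; a key pair must be used only once.
    return [sk[i][b] for i, b in enumerate(bits(msg))]

def verify(pk, msg, sig):
    return all(H(s) == pk[i][b] for i, (s, b) in enumerate(zip(sig, bits(msg))))

sk, pk = keygen()
assert verify(pk, b"hello", sign(sk, b"hello"))
```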

**Mark Rudelson (UMich)**

**Title**: Invertibility of the adjacency matrices of random graphs.

**Abstract**: Consider an adjacency matrix of a bipartite, directed, or undirected Erdos-Renyi random graph. If the average degree of a vertex is large enough, then such a matrix is invertible with high probability. As the average degree decreases, the probability of the matrix being singular increases, and for a sufficiently small average degree, it becomes singular with probability close to 1. We will discuss when this transition occurs, and what the main reason for the singularity of the adjacency matrix is.

This is a joint work with Anirban Basak.

**Yinon Spinka (TAU)**

**Title**: Finitary codings of Markov random fields

**Abstract**: Let X be a stationary Z^d-process. We say that X can be coded by an i.i.d. process if there is a (deterministic and translation-invariant) way to construct a realization of X from i.i.d. variables associated to the sites of Z^d. That is, if there is an i.i.d. process Y and a measurable map F from the underlying space of Y to that of X, which commutes with translations of Z^d and satisfies that F(Y)=X in distribution. Such a coding is called finitary if, in order to determine the value of X at a given site, one only needs to look at a finite (but random) region of Y.

It is known that a phase transition (existence of multiple Gibbs states) is an obstruction for the existence of such a finitary coding. On the other hand, we show that when X is a Markov random field satisfying certain spatial mixing conditions, then X can be coded by an i.i.d. process in a finitary manner. Moreover, the coding radius has exponential tails, so that typically the value of X at a given site is determined by a small region of Y.

We give applications to models such as the Potts model, proper colorings and the hard-core model.

Image restoration algorithms are typically evaluated by some distortion measure (e.g. PSNR, SSIM, IFC, VIF) or by human opinion scores that quantify perceived perceptual quality. In this work, we prove mathematically that distortion and perceptual quality are at odds with each other. Specifically, we study the optimal probability for correctly discriminating the outputs of an image restoration algorithm from real images. We show that as the mean distortion decreases, this probability must increase (indicating worse perceptual quality). As opposed to the common belief, this result holds true for any distortion measure, and is not only a problem of the PSNR or SSIM criteria. However, as we show experimentally, for some measures it is less severe (e.g. distance between VGG features). We also show that generative-adversarial-nets (GANs) provide a principled way to approach the perception-distortion bound. This constitutes theoretical support to their observed success in low-level vision tasks. Based on our analysis, we propose a new methodology for evaluating image restoration methods, and use it to perform an extensive comparison between recent super-resolution algorithms. Our study reveals which methods are currently closest to the theoretical perception-distortion bound.
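The tradeoff can be summarized by the perception-distortion function (a hedged restatement of the quantity studied in this line of work): for a degraded observation $Y$ of $X$,
$$P(D)\;=\;\min_{p_{\hat X\mid Y}} \; d\big(p_X,\,p_{\hat X}\big)\quad\text{subject to}\quad \mathbb{E}\big[\Delta(X,\hat X)\big]\le D,$$
where $\Delta$ is the chosen distortion measure and $d$ is a divergence between distributions; the result above says that $P$ is non-increasing in $D$, so pushing the mean distortion down forces the distributional mismatch, and hence the perceptual quality penalty, up.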

* Joint work with Yochai Blau.

We present novel oblivious routing algorithms for the splittable and the unsplittable multicommodity flow settings. Our algorithms for both models improve upon the state-of-the-art, in terms of running time and performance with respect to graphs that exhibit good expansion guarantees. As an intermediate step towards the unsplittable setting, we present a novel generalization of Valiant's classical load balancing scheme for packet-switched networks to arbitrary graphs, which is of independent interest. Our approach relies on diffusing traffic throughout the network and then regathering it to its destination, via iterative applications of the random walk operator. Consequently, the performance guarantees of our algorithms are derived from the convergence of the random walk operator to the stationary distribution and are expressed in terms of the spectral gap of the graph (which dominates the mixing time).

Genome-wide forces of mutation and selection create local and global sequence patterns that carry information on the functional role of genomic DNA. I will describe algorithmic approaches to genome analysis that decode sequence patterns and identify features of structural gene organization and gene expression (e.g. leaderless transcription). The algorithms make unsupervised inference of the structure of the underlying graphical model and its parameters.

The subsequently developed software tools were, among other applications, used at NCBI (Bethesda, MD) to annotate and re-annotate 2,500+ fungal genomes and 130,000+ prokaryotic genomes. Yet another algorithm was employed by DOE JGI (Walnut Creek, CA) to annotate the largest collection of metagenomes.

Speaker's short bio:

Mark Borodovsky, PhD, is a Regents' Professor at the Joint Georgia Tech and Emory University Department of Biomedical Engineering and the School of Computational Science and Engineering. Since arriving in Atlanta from Moscow in 1990, Dr. Borodovsky has also held joint faculty appointments at the School of Biology and the School of Mathematics. Dr. Borodovsky is a Founder of the Georgia Tech Bioinformatics graduate program.

Suppose a language L can be decided by a bounded-error randomized algorithm that runs in space S and time n * poly(S). We give a randomized algorithm for L that still runs in space O(S) and time n * poly(S) that uses only O(S) random bits; our algorithm has a low failure probability on all but a negligible fraction of inputs of each length. An immediate corollary is a deterministic algorithm for L that runs in space O(S) and succeeds on all but a negligible fraction of inputs of each length. We also discuss additional complexity-theoretic applications of our technique.

Points of infinite multiplicity are particular points which the Brownian motion visits infinitely often. Following a work of Bass, Burdzy and Khoshnevisan, we construct and study a measure carried by these points. Joint work with Yueyun Hu and Zhan Shi.

Scattering effects in images, including those related to haze, fog, and appearance of clouds, are fundamentally dictated by microphysical characteristics of the scatterers. We define and derive recovery of these characteristics, in a three-dimensional heterogeneous medium. Recovery is based on a novel tomography approach. Multiview (multi-angular) and multi-spectral data are linked to the underlying microphysics using 3D radiative transfer, accounting for multiple-scattering. Despite the nonlinearity of the tomography model, inversion is enabled using a few approximations that we describe. As a case study, we focus on passive remote sensing of the atmosphere, where scatterer retrieval can benefit modeling and forecasting of weather, climate, and pollution.

We present a new way to derive the replica symmetric solution for the free energy in mean-field spin glasses. Only the Sherrington-Kirkpatrick case has been worked out in detail, but the method also works in other cases, for instance for the perceptron (work in progress), and probably also for the Hopfield net. The method is closely related to the TAP equations (for Thouless-Anderson-Palmer). It does not give any new results at present, but it gives a new viewpoint, and it looks quite promising. As the TAP equations are widely discussed in the physics literature, also at low temperature, it is hoped that the method could be extended to this case, too. But this is open, and probably very difficult.

We will introduce the (new) notion of approximability in triangulated categories and show its power.

The brief summary is that the derived category of quasicoherent sheaves on a separated, quasicompact scheme is an approximable triangulated category.

As relatively easy corollaries one can: (1) prove an old conjecture of Bondal and Van den Bergh about strong generation in D^{perf}(X); (2) generalize an old theorem of Rouquier about strong generation in D^b_{coh}(X); Rouquier proved the result only in equal characteristic, and we can extend it to mixed characteristic; and (3) generalize a representability theorem of Bondal and Van den Bergh, from proper schemes of finite type over fields to proper schemes of finite type over arbitrary noetherian rings.

After stating these results and explaining what they mean, we will (time permitting) also mention structural theorems. It turns out that approximable triangulated categories have a fair bit of intrinsic, internal structure that comes for free.

Recent success stories of using machine learning for diagnosing skin cancer, diabetic retinopathy, pneumonia, and breast cancer may give the impression that artificial intelligence (AI) is on the cusp of radically changing all aspects of health care. However, many of the most important problems, such as predicting disease progression, personalizing treatment to the individual, drug discovery, and finding optimal treatment policies, all require a fundamentally different way of thinking. Specifically, these problems require a focus on *causality* rather than simply prediction. Motivated by these challenges, my lab has been developing several new approaches for causal inference from observational data. In this talk, I describe our recent work on the deep Markov model (Krishnan, Shalit, Sontag AAAI '17) and TARNet (Shalit, Johansson, Sontag, ICML '17).

The generalized Jacobian Jac_m(C') of a smooth hyperelliptic curve C' associated with a module m is an algebraic group that can be described using line bundles on the curve C', or using a symmetric product of the curve C' equipped with a law of composition. This second definition of the Jacobian Jac_m(C') is directly related to the fibres of a Mumford system. To be precise, it is a subset of the compactified Jac_m(C') which is related to the fibres. This presentation will help us demystify the relationship between these two mathematical objects.

We show that any area-preserving C^{r}-diffeomorphism of a two-dimensional surface displaying an elliptic fixed point can be C^{r}-perturbed to one exhibiting a chaotic island whose metric entropy is positive, for every 1 ≤ r ≤ ∞. This proves a conjecture of Herman stating that the identity map of the disk can be C^{∞}-perturbed to a conservative diffeomorphism with positive metric entropy. This also implies that the Chirikov standard map for large and small parameter values can be C^{∞}-approximated by conservative diffeomorphisms displaying positive metric entropy (a weak version of Sinai's positive metric entropy conjecture). Finally, this sheds light on a question of Herman on the density of C^{r}-conservative diffeomorphisms displaying positive metric entropy: we show the existence of a dense set formed by conservative diffeomorphisms which either are weakly stable (so, conjecturally, uniformly hyperbolic) or display a chaotic island of positive metric entropy.

This is a joint work with Pierre Berger.

Multiple wireless sensing tasks, e.g. radar detection for driver safety, involve estimating the "channel", i.e., the relationship between the transmitted and the received signal.

In this talk I will describe the standard mathematical model for the radar channel. In the case where the channel is sparse, I will demonstrate a channel estimation algorithm that is sub-linear in sampling and arithmetic complexity (and convince you of the need for such an algorithm).
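A typical sparse delay-Doppler channel model of this kind (given here as a hedged example of what such a model usually looks like; the talk's exact formulation may differ) expresses the received signal as
$$R(t)=\sum_{j=1}^{k}\alpha_j\,e^{2\pi i\,\omega_j t}\,S(t-\tau_j)+\mathcal{W}(t),$$
where $S$ is the transmitted signal, each of the $k$ paths contributes an attenuation $\alpha_j$, a time delay $\tau_j$ and a Doppler shift $\omega_j$, $\mathcal{W}$ is noise, and sparsity means $k$ is small.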

The main ingredients in the algorithm will be the use of an intrinsic algebraic structure known as the Heisenberg group and recent developments in the theory of the sparse Fast Fourier Transform (sFFT, due to Indyk et al.)

The talk will assume minimal background knowledge.

Understanding the ways in which the Brain gives rise to the Mind (memory, behavior, cognition, intelligence, language) is the most challenging problem facing science today. As the answer seems likely to be at least partly computational, computer scientists should work on this problem --- except there is no obvious place to start. I shall recount recent work (with W. Maass and S. Vempala) on a model for the formation and association of memories in humans, and reflect on why it may be a clue about language.

Variational problems in geometry, fabrication, learning, and related applications lead to structured numerical optimization problems for which generic algorithms exhibit inefficient or unstable performance. Rather than viewing the particularities of these problems as barriers to scalability, in this talk we embrace them as added structure that can be leveraged to design large-scale and efficient techniques specialized to applications with geometrically structured variables. We explore this theme through the lens of case studies in surface parameterization, optimal transport, and multi-objective design.

A long line of research studies the space complexity of estimating a norm l(x) in the data-stream model, i.e., when x is the frequency vector of an input stream consisting of insertions and deletions of items of n types. I will focus on norms l (in R^n) that are *symmetric*, meaning that l is invariant under sign-flips and coordinate-permutations, and show that the streaming space complexity is essentially determined by the measure-concentration characteristics of l. These characteristics are known to govern other phenomena in high-dimensional spaces, such as the critical dimension in Dvoretzky's Theorem.

The family of symmetric norms contains several well-studied norms, such as all l_p norms, and indeed we provide a new explanation for the disparity in space complexity between p<=2 and p>2. We also obtain bounds for other norms that are useful in applications.

Joint work with Jaroslaw Blasiok, Vladimir Braverman, Stephen R. Chestnut, and Lin F. Yang.

Multi-finger caging offers a robust approach to object grasping. To securely grasp an object, the fingers are first placed in caging regions surrounding a desired immobilizing grasp. This prevents the object from escaping the hand, and allows for great position uncertainty of the fingers relative to the object. The hand is then closed until the desired immobilizing grasp is reached.

While efficient computation of two-finger caging grasps for polygonal objects is well developed, the computation of three-finger caging grasps has remained a challenging open problem. We will discuss the caging of polygonal objects using three-finger hands that maintain similar triangle finger formations during the grasping process. While the configuration space of such hands is four dimensional, their contact space which represents all two and three finger contacts along the grasped object's boundary forms a two-dimensional stratified manifold.

We will present a caging graph that can be constructed in the hand's relatively simple contact space. Starting from a desired immobilizing grasp of the object by a specific triangular finger formation, the caging graph is searched for the largest formation scale value that ensures a three-finger cage about the object. This value determines the caging regions, and if the formation scale is kept below this value, any finger placement within the caging regions will guarantee a robust object grasping.

An extremal metric, as defined by Calabi, is a canonical Kahler metric: it minimizes the L^2-norm of the scalar curvature within a given Kahler class. According to the Yau-Tian-Donaldson conjecture, polarized Kahler manifolds admitting an extremal metric should correspond to stable manifolds in a Geometric Invariant Theory sense.

In this talk, we will explain that a projective extremal Kahler manifold is asymptotically relatively Chow stable. This fact was conjectured by Apostolov and Huang, and its proof relies on quantization techniques. We will explain various implications, such as uniqueness or splitting results for extremal metrics.

Joint work with Yuji Sano (Fukuoka University).

A curious property of randomized log-space search algorithms is that their outputs are often longer than their workspace. One consequence is that there is no clear way to reproduce the same output when running the algorithm twice on the same input. It is not feasible to store the random bits (or the output) of the previous run in log-space, and using new random bits in another execution can result in a different output. This leads to the question: how can we reproduce the results of a randomized log space computation of a search problem?

We will give a precise definition of this notion of "reproducibility". Then we will show that every problem in search-RL has a randomized log-space algorithm where the output can be reproduced. Reproducibility can be thought of as an extension of pseudo-determinism. Indeed, for some problems in search-RL we show pseudo-deterministic algorithms whose running times significantly improve on those of known deterministic algorithms.

Joint work with Yang Liu.

Oren Louidor (Technion)

**Title: Dynamical freezing in a spin-glass with logarithmic correlations**.

Abstract: We consider a continuous time random walk on the 2D torus, governed by the exponential of the discrete Gaussian free field acting as potential. This process can be viewed as Glauber dynamics for a spin-glass system with logarithmic correlations. Taking temperature to be below the freezing point, we then study this process both at pre-equilibrium and in-equilibrium time scales. In the former case, we show that the system exhibits aging and recover the arcsine law as asymptotics for a natural two point temporal correlation function. In the latter case, we show that the dynamics admits a functional scaling limit, with the limit given by a variant of Kolmogorov's K-process, driven by the limiting extremal process of the field, or alternatively, by a super-critical Liouville Brownian motion. Joint work with A. Cortines, J. Gold and A. Svejda.

Alexander Glazman (Tel Aviv)

**Title: Level lines of a random Lipschitz function**

Abstract: We consider the uniform distribution on Lipschitz functions on the triangular lattice, i.e. all integer-valued functions which differ by 0 or 1 on any two adjacent vertices. We show that with a positive probability such a function exhibits macroscopic level lines. Instead of working directly with Lipschitz functions we map this model to the loop $O(2)$ model with parameter $x=1$. The loop $O(n)$ model is a model for a random collection of non-intersecting loops on the hexagonal lattice, which is believed to be in the same universality class as the spin $O(n)$ model.

A main tool in the proof is a positive association (FKG) property that was recently shown to hold when $n \ge 1$ and $0<x\le\frac{1}{\sqrt{n}}$. Though the case $n=2$, $x=1$ is not in the FKG regime, it turns out that when loops are assigned one of two colours independently, the marginal on loops of either of the colours does satisfy the FKG property. The colouring of loops allows one to view the loop $O(2)$ model with $x=1$ as a coupling of two percolation configurations. Studying each of them independently and using the XOR operation, we establish the existence of macroscopic loops, i.e. level lines in the original setting.

(Based on a joint work with H. Duminil-Copin, and I. Manolescu)

In unsupervised domain mapping, the learner is given two unmatched datasets A and B. The goal is to learn a mapping G_AB that translates a sample in A to the analog sample in B. Recent approaches have shown that when learning simultaneously both G_AB and the inverse mapping G_BA, convincing mappings are obtained. In this work, we present a method of learning G_AB without learning G_BA. This is done by learning a mapping that maintains the distance between a pair of samples. Moreover, good mappings are obtained even by maintaining the distance between different parts of the same sample before and after mapping. We present experimental results showing that the new method not only allows for one-sided mapping learning, but also leads to better numerical results than the existing circularity-based constraint.
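A rough sketch of the distance-maintenance idea (a simplified PyTorch illustration; the method described above also exploits distances between different parts of the same sample and may normalize distances per domain, both of which this sketch omits):

```python
import torch

def distance_preservation_loss(x, gx):
    """x: a batch of samples from domain A; gx = G_AB(x), their mapped versions.
    Encourages pairwise distances to be preserved by the mapping."""
    n = x.size(0)
    dx = torch.cdist(x.view(n, -1), x.view(n, -1))    # distances before mapping
    dg = torch.cdist(gx.view(n, -1), gx.view(n, -1))  # distances after mapping
    return torch.abs(dx - dg).mean()
```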

In formal verification, one uses mathematical tools in order to prove that a system satisfies a given specification.

In this talk I consider three limitations of traditional formal verification:

The first is the fact that treating systems as finite-state machines is problematic, as systems become more complex, and thus we must consider infinite-state machines.

The second is that Boolean specifications are not rich enough to specify properties required by today's systems. Thus, we need to go beyond the Boolean setting.

The third concerns certifying the results of verification: while a verifier may answer that a system satisfies its specification, a designer often needs some palpable evidence of correctness.

I will present several works addressing the above limitations, and the mathematical tools involved in overcoming them.

No prior knowledge is assumed. Posterior knowledge is guaranteed.

We develop the theory of semisimplifications of tensor categories defined by Barrett and Westbury. By definition, the semisimplification of a tensor category is its quotient by the tensor ideal of negligible morphisms, i.e., morphisms f such that Tr(fg)=0 for any morphism g in the opposite direction. In particular, we compute the semisimplification of the category of representations of a finite group in characteristic p in terms of representations of the normalizer of its Sylow p-subgroup. This allows us to compute the semisimplification of the representation category of the symmetric group S_{n+p} in characteristic p, where n=0,...,p-1, and of the Deligne category Rep^{ab} S_t, t in N. We also compute the semisimplification of the category of representations of the Kac-De Concini quantum group of the Borel subalgebra of sl_2. Finally, we study tensor functors between Verlinde categories of semisimple algebraic groups arising from the semisimplification construction, and objects of finite type in categories of modular representations of finite groups (i.e., objects generating a fusion category in the semisimplification).

This is joint work with Victor Ostrik.

In the area of distributed graph algorithms, a number of network entities with local views solve some computational task by exchanging messages with their neighbors. Quite unfortunately, an inherent property of most existing distributed algorithms is that throughout the course of their execution, the nodes get to learn not only their own output but also quite a lot about the inputs or outputs of many other entities. This leakage of information might be a major obstacle in settings where the output (or input) of an individual in the network is private information (e.g. distributed networks of selfish agents, decentralized digital currencies such as Bitcoin, voting systems).

While it is a rather unfamiliar notion in the classical distributed setting, secure multi-party computation (MPC) is one of the main themes in the cryptography community. Yet despite all the extensive work in the area, no existing algorithm fits the framework of classical distributed models, in which there are no assumptions on the graph topologies and only messages of bounded size are sent on the edges in each round.

In this work, we introduce a new framework for \emph{secure distributed graph algorithms} and provide the first \emph{general compiler} that takes any "natural" non-secure distributed algorithm that runs in $r$ rounds, and turns it into a secure algorithm that runs in $\widetilde{O}(r \cdot D \cdot poly(\Delta))$ rounds where $\Delta$ is the maximum degree in the graph and $D$ is its diameter. This round complexity is nearly optimal for bounded degree graphs.

The main technical part of our compiler is based on a new cycle cover theorem: We show that the edges of every bridgeless graph $G$ of diameter $D$ can be covered by a collection of cycles such that each cycle is of length $\widetilde{O}(D)$ and each edge of the graph $G$ appears in $\widetilde{O}(1)$ many cycles. This provides the basis for additional combinatorial constructions required by our compiler and might be of independent combinatorial and algorithmic interest.

Joint work with Merav Parter.

In recent years, robots have played an active role in everyday life: medical robots assist in complex surgeries, low-cost commercial robots clean houses and fleets of robots are used to efficiently manage warehouses. A key challenge in these systems is motion planning, where we are interested in planning a collision-free path for a robot in an environment cluttered with obstacles. While the general problem has been studied for several decades now, these new applications introduce an abundance of new challenges.

In this talk I will describe some of these challenges as well as algorithms developed to address them. I will overview general challenges such as compression and graph-search algorithms in the context of motion planning. I will show why traditional Computer Science tools are ill-suited for these problems and introduce alternative algorithms that leverage the unique characteristics of robot motion planning. In addition, I will describe domain-specific challenges such as those that arise when planning for assistive robots and for humanoid robots, and overview algorithms tailored for these specific domains.

Language technology has become pervasive in everyday life, powering applications like Apple's Siri or Google's Assistant. Neural networks are a key component in these systems thanks to their ability to model large amounts of data. Contrary to traditional systems, models based on deep neural networks (a.k.a. deep learning) can be trained in an end-to-end fashion on input-output pairs, such as a sentence in one language and its translation in another language, or a speech utterance and its transcription. The end-to-end training paradigm simplifies the engineering process while giving the model flexibility to optimize for the desired task. This, however, often comes at the expense of model interpretability: understanding the role of different parts of the deep neural network is difficult, and such models are often perceived as "black-box". In this work, we study deep learning models for two core language technology tasks: machine translation and speech recognition. We advocate an approach that attempts to decode the information encoded in such models while they are being trained. We perform a range of experiments comparing different modules, layers, and representations in the end-to-end models. Our analyses illuminate the inner workings of end-to-end machine translation and speech recognition systems, explain how they capture different language properties, and suggest potential directions for improving them. The methodology is also applicable to other tasks in the language domain and beyond.

In this talk, I present an analogue of the Hardy-Littlewood conjecture on the asymptotic distribution of prime constellations in the setting of short intervals in function fields of smooth projective curves over finite fields.

I will discuss the definition of a "short interval" on a curve as an additive translation of the space of global sections of a sufficiently positive divisor E by a suitable rational function f, and show how this definition generalizes the definition of a short interval in the polynomial setting.

I will give a sketch of the proof, which includes a computation of a certain Galois group and a counting argument, namely a Chebotarev-density-type theorem.

This is a joint work with Tyler Foster.

Computational problems whose input is a program are central in Cryptography, as well as Complexity, Learning, and Optimization. The nature of such problems crucially depends on the way the program is accessed -- as a black box or explicitly by its implementation.

In which settings can we exploit code to gain an advantage over black-box access? In Cryptography, we explore this question from two opposing perspectives:

Protecting Code: Can we obfuscate a program's code so that its functionality is preserved but it is otherwise unintelligible? Intuitively, such obfuscated code reveals nothing more than black-box access to the program. Obfuscation is, therefore, a powerful tool with numerous applications in software protection and Cryptography.

Exploiting Code: Most security proofs in cryptography consist of a reduction that translates any potential attacker into an algorithm solving some underlying hard problem. While most security reductions only require black-box access to the attacker, for many applications black-box reductions are provably insufficient. Can we exploit the attacker's code to prove security where black-box reductions fail?

In this talk, I will describe new techniques for protecting and exploiting code, taking advantage of the inherent tension between these two tasks. I will also demonstrate applications of these techniques in and beyond cryptography.

Sebastien Bubeck (Microsoft Research)

**Title: k-server via multiscale entropic regularization**

Abstract: I will start by describing how mirror descent is a natural strategy for online decision making, specifically in online learning and metrical task systems. To motivate the k-server problem I will also briefly recall what we know and what we don't know for structured state/action spaces in these models. Using the basic mirror descent calculations I will show how to easily obtain a log(k)-competitive algorithm for k-paging. I will then introduce our new parametrization of fractional k-server on a tree, and explain how to analyze the movement cost of entropy-regularized mirror descent on this parametrization. This leads to a depth*log(k)-competitive (fractional) algorithm for general trees, and log^2(k) for HSTs. I will also briefly mention dynamic embeddings to go beyond the standard log(n) loss in the reduction from general metrics to HSTs.

Joint work with Michael B. Cohen, James R. Lee, Yin Tat Lee, and Aleksander Madry.

Percy Deift (Courant Institute )

**Title: Universality in numerical analysis with some examples of cryptographic algorithms.**

Abstract: We show that a wide variety of numerical algorithms with random data exhibit universality. Most of the results are computational, but in some important cases universality is established rigorously. We also discuss universality for some cryptographic algorithms.

Joint work with C. Pfrany, G. Menon, S. Olver, T. Trogdan and S. Miller.

Given a p-adic group G, number theorists are interested in producing admissible representations of G: representations which have a well-defined character functional. One way to produce such representations is by "Jacquet induction" from smaller groups, whose characters can be understood inductively. The complementary space of "new" characters which are not obtained by induction (complementary with respect to a natural metric on the space of characters) is given by what is called "elliptic" characters. Given a representation V of G, the "new" input from its character is captured by the operator Ax(V), with A (the Bernstein-Deligne-Kazhdan A-operator) the projector to the elliptic component (note that this is different from the component of the character lattice valued in elliptic elements). I will talk about my ongoing work with Xuhua He on extending this operator to a trace functional Ax(V) for V a finitely-generated representation (whose Grothendieck group is well understood), which works by first constructing a virtual elliptic admissible representation from any finitely generated representation.

Being the gradient flow of the area functional, the mean curvature flow can be thought of as a greedy algorithm for simplifying embedded shapes. But how successful is this algorithm?

In this talk, I will describe three examples for how mean curvature flow, as well as its variants and weak solutions, can be used to achieve this desired simplification.

The first is a short time smoothing effect of the flow, allowing to smooth out some rough, potentially fractal initial data.

The second is an application of mean curvature flow with surgery to smooth differential topology, allowing to conclude Schoenflies-type theorems about the moduli space of smooth embedded spheres and tori, satisfying some curvature conditions.

The third is an application of (weak, modified) mean curvature flow to differential geometry, allowing to relate bounds on the Gaussian entropy functional to the topology of a closed hypersurface.

In this talk, which will assume no prior knowledge of PDE or mean curvature flow, I will try to highlight the relation between the analysis of the flow, and in particular its singularity formation, and both "time-dependent" and "classical" geometry.

Some of the results described in this talk are joint works with Reto Buzano, Robert Haslhofer and Brian White.

We recall the notion of a Hall algebra associated to a category, and explain how this construction can be done in a way that naturally includes a higher algebra structure, motivated by work of Toen and Dyckerhoff-Kapranov. We will then explain how this leads to new insights about the bi-algebra structure and related concepts.

The problem of computational super-resolution asks to recover high-frequency features of an object from the noisy and blurred/band-limited samples of its Fourier transform, based on some a-priori information about the object class. A core theoretical question is to quantify the possible amount of bandwidth extension and the associated stability of recovering the missing frequency components, as a function of the sample perturbation level and the object prior.

In this work we assume that the object has a compact space/time extent in one dimension (but otherwise can be fairly arbitrary), while the low-pass window can have a super-exponentially decaying "soft" shape (such as a Gaussian). In contrast, previously known results considered only the ideal "hard" window (a characteristic function of the band) and objects of finite energy. The super-resolution problem in this case is equivalent to a stable analytic continuation of an entire function of finite exponential type. We show that a weighted least-squares polynomial approximation with equispaced samples and a judiciously chosen number of terms allows one to have a super-resolution factor which scales logarithmically with the noise level, while the pointwise extrapolation error exhibits a Holder-type continuity with an exponent derived from weighted potential theory. The algorithm is asymptotically minimax, in the sense that there is essentially no better algorithm yielding meaningfully lower error over the same smoothness class.
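As a toy illustration of the flavor of such extrapolation (an unweighted least-squares fit with an ad hoc, logarithmic-in-noise number of terms; the talk's weighted, minimax-optimal scheme is more refined), one might do the following:

```python
import numpy as np

# Toy sketch: extrapolate an entire function of finite exponential type
# (here F(w) = sin(w)/w, the Fourier transform of an indicator) from noisy
# equispaced samples on the band [-1, 1] by a least-squares polynomial fit
# whose number of terms grows only logarithmically with 1/noise.
rng = np.random.default_rng(0)

def F(w):
    return np.sinc(w / np.pi)          # sin(w)/w, entire of exponential type 1

noise = 1e-6
w_band = np.linspace(-1.0, 1.0, 201)   # equispaced samples inside the band
data = F(w_band) + noise * rng.standard_normal(w_band.shape)

deg = int(np.log(1.0 / noise))         # illustrative choice of the number of terms
coef = np.polynomial.chebyshev.chebfit(w_band, data, deg)

w_out = np.linspace(1.0, 1.5, 6)       # extrapolate slightly beyond the band
print(np.abs(np.polynomial.chebyshev.chebval(w_out, coef) - F(w_out)))
```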

The results can be considered as a first step towards analyzing the much more realistic model of a sparse combination of compactly-supported "approximate spikes", which appears in applications such as synthetic aperture radar, seismic imaging and direction of arrival estimation, and for which only limited special cases are well-understood.

Joint work with L. Demanet and H. Mhaskar.

The use of a computational PIR scheme has been instrumental in reducing interaction from interactive proofs, and in converting multi-prover interactive proofs to (single prover) 2-message computationally sound proofs (also known as arguments).

In this talk we will focus on the secrecy guarantees of this transformation.

We show that if we start with an interactive proof which is only *honest-verifier* zero-knowledge, and we use a quasi-poly secure *symmetric* PIR scheme (or a 2-message OT scheme) to reduce interaction, then the resulting 2-message argument is witness indistinguishable, and in the delayed-input setting it is distributional weak zero-knowledge (which implies strong witness indistinguishability and witness hiding in the delayed-input setting). Moreover, under the same assumption (which can be instantiated from the quasi-poly DDH/QR/N'th residuosity assumptions), we construct a two-message argument with (similar) *statistical* secrecy guarantees. For the latter, we apply the PIR heuristic to a computationally sound proof which is honest-verifier statistical zero-knowledge.

This is based on joint works with Abhishek Jain, Dakshita Khurana, Ron Rothblum and Amit Sahai.

Graph matching is one of the most well-studied problems in combinatorial optimization, with applications ranging from scheduling and object recognition to numerical analysis and computational chemistry.

Nevertheless, until recently very little was known about this problem in real-life **dynamic networks**, which aim to model the constantly changing physical world.

In the first part of the talk we'll discuss our work on dynamic graph matching, and in the second part we'll highlight our work on a few related problems.

Talk 1: Amir Dembo (Stanford)

**Title: Large deviations theory for chemical reaction networks.**

The microscopic dynamics of well-stirred networks of chemical reactions are modeled as jump Markov processes. At large volume, one may expect in this framework to have a straightforward application of large deviation theory. This is not at all true, for the jump rates are typically neither globally Lipschitz, nor bounded away from zero, with both blowup and absorption as quite possible scenarios. In joint work with Andrea Agazzi and Jean-Pierre Eckmann, we utilize Lyapunov stability theory to bypass this challenge and to characterize a large class of network topologies that satisfy the full Wentzell-Freidlin theory of asymptotic rates of exit from domains of attraction.

Talk 2: Yuval Peres (Microsoft)

Title: **Trace reconstruction for the deletion channel**

Abstract: In the trace reconstruction problem, an unknown string $x$ of $n$ bits is observed through the deletion channel, which deletes each bit with some constant probability q, yielding a contracted string. How many independent outputs (traces) of the deletion channel are needed to reconstruct $x$ with high probability?

The best lower bound known is linear in $n$. Until 2016, the best upper bound was exponential in the square root of $n$. We improve the square root to a cube root using statistics of individual output bits and some inequalities for Littlewood polynomials on the unit circle. This bound is sharp for reconstruction algorithms that only use this statistical information. (Similar results were obtained independently and concurrently by De, O'Donnell and Servedio.) If the string $x$ is random, we can show that a subpolynomial number of traces suffices by comparison to a random walk. (Joint works with Fedor Nazarov, STOC 2017, with Alex Zhai, FOCS 2017, and with Nina Holden & Robin Pemantle, preprint (2017).)
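To make the setting concrete, here is a small simulation of the deletion channel together with the per-position trace statistics that the mean-based estimators work with (the actual reconstruction, via Littlewood-polynomial bounds, is not reproduced here):

```python
import numpy as np

# Simulate the deletion channel and collect single-position statistics of the traces.
rng = np.random.default_rng(1)
n, q, num_traces = 32, 0.5, 20000
x = rng.integers(0, 2, n)                        # the unknown string

def trace(x, q, rng):
    keep = rng.random(x.shape) >= q              # each bit is deleted independently with prob. q
    return x[keep]

traces = [trace(x, q, rng) for _ in range(num_traces)]

max_len = max(len(t) for t in traces)
counts = np.zeros(max_len)
ones = np.zeros(max_len)
for t in traces:
    counts[: len(t)] += 1
    ones[: len(t)] += t
print(ones / np.maximum(counts, 1))              # empirical P(position j of a trace is 1)
```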

Image captioning -- automatic production of text describing a visual scene -- has received a lot of attention recently. However, the objective of captioning, evaluation metrics, and the training protocol remain somewhat unsettled. The general goal seems to be for machines to describe visual signal like humans do. We pursue this goal by incorporating a discriminability loss in training caption generators. This loss is explicitly "aware" of the need for the caption to convey information, rather than appear fluent or reflect word distribution in the human captions. Specifically, the loss in our work is tied to discriminative tasks: producing a referring expression (text that allows a recipient to identify a region in the given image) or producing a discriminative caption which allows the recipient to identify an image within a set of images. In both projects, use of the discriminability loss does not require any additional human annotations, and relies on collaborative training between the caption generator and a comprehension model, which is a proxy for a human recipient. In experiments on standard benchmarks, we show that adding discriminability objectives not only improves the discriminative quality of the generated image captions, but, perhaps surprisingly, also makes the captions better under a variety of traditional metrics.

Joint work with Ruotian Luo (TTIC), Brian Price and Scott Cohen (Adobe).

Let K be a commutative ring. Consider the groups GLn(K). Bernstein and Zelevinsky have studied the representations of the general linear groups in the case where the ring K is a finite field. Instead of studying the representations of GLn(K) for each n separately, they studied all the representations of all the groups GLn(K) simultaneously. They considered on $R := \bigoplus_n R(GL_n(K))$ structures called parabolic (or Harish-Chandra) induction and restriction, and showed that these enrich R with the structure of a so-called positive self-adjoint Hopf algebra (or PSH algebra). They use this structure to reduce the study of representations of the groups GLn(K) to the following two tasks:

1. Study a special family of representations of GLn(K), called cuspidal representations. These are representations which do not arise as direct summands of parabolic induction of smaller representations.

2. Study representations of the symmetric groups. These representations also have a nice combinatorial description, using partitions.

In this talk I will discuss the study of representations of GLn(K) where K is a finite quotient of a discrete valuation ring (such as Z/p^r or k[x]/x^r, where k is a finite field). One reason to study such representations is that all continuous complex representations of the groups GLn(Zp) and GLn(k[[x]]) (where Zp denotes the p-adic integers) arise from these finite quotients. I will explain why the natural generalization of the Harish-Chandra functors does not furnish a PSH algebra in this case, and how this is related to the Bruhat decomposition and Gauss elimination. In order to overcome this issue we have constructed a generalization of the Harish-Chandra functors. I will explain this generalization, describe some of the new functors' properties, and explain how they can be applied to studying complex representations.

The talk will be based on a joint work with Tyrone Crisp and Uri Onn.

Today's technological world is increasingly dependent upon the reliability, robustness, quality of service and timeliness of networks including those of power distribution, financial, transportation, communication, biological, and social. For the time-critical functionality in transferring resources and information, a key requirement is the ability to adapt and reconfigure in response to structural and dynamic changes, while avoiding disruption of service and catastrophic failures. We will outline some of the major problems for the development of the necessary theory and tools that will permit the understanding of network dynamics in a multiscale manner.

Many interesting networks consist of a finite but very large number of nodes or agents that interact with each other. The main challenge when dealing with such networks is to understand and regulate the collective behavior. Our goal is to develop mathematical models and optimization tools for treating the **Big Data** nature of large scale networks while providing the means to understand and regulate the collective behavior and the dynamical interactions (short and long-range) across such networks.

The key mathematical technique will be based upon the use of optimal mass transport theory and the resulting notions of curvature, applied to weighted graphs in order to characterize network robustness. Examples will be given from biology, finance, and transportation.

I shall review the framework of algebraic families of Harish-Chandra modules, introduced recently by Bernstein, Higson, and the speaker. Then, I shall describe three of their applications.

The first is contraction of representations of Lie groups. Contractions are certain deformations of representations with applications in mathematical physics.

The second is the Mackey bijection: a (partially conjectural) bijection between the admissible dual of a real reductive group and the admissible dual of its Cartan motion group.

The third is the hidden symmetry of the hydrogen atom as an algebraic family of Harish-Chandra modules.

Consider a random sequence of n bits that has entropy at least n-k, where k << n. A commonly used observation is that an average coordinate of this random sequence is close to being uniformly distributed, that is, the coordinate "looks random". In this work, we prove a stronger result that says, roughly, that the average coordinate looks random to an adversary that is allowed to query about n/k other coordinates of the sequence, even if the adversary is non-deterministic.

As an application of this result, we prove a new result on depth-3 circuits, which recovers as a direct corollary the known lower bounds for the parity and majority functions, as well as a lower bound on sensitive functions due to Boppana.

In this talk we discuss the fine scale $L^2$-mass distribution of toral Laplace eigenfunctions with respect to random position. For the 2-dimensional torus, under certain flatness assumptions on the Fourier coefficients of the eigenfunctions and generic restrictions on energy levels, both the asymptotic shape of the variance and the limiting Gaussian law are established, in the optimal Planck-scale regime. We also discuss the 3-dimensional case, where the asymptotic behaviour of the variance is analysed in a more restrictive scenario. This is joint work with Igor Wigman.

Deep Learning has led to a dramatic leap in Super-Resolution (SR) performance in the past few years. However, being supervised, these SR methods are restricted to specific training data, where the acquisition of the low-resolution (LR) images from their high-resolution (HR) counterparts is predetermined (e.g., bicubic downscaling), without any distracting artifacts (e.g., sensor noise, image compression, non-ideal PSF, etc). Real LR images, however, rarely obey these restrictions, resulting in poor SR results by SotA (State of the Art) methods. In this paper we introduce "Zero-Shot" SR, which exploits the power of Deep Learning, but does not rely on prior training. We exploit the internal recurrence of information inside a single image, and *train a small image-specific CNN at test time, on examples extracted solely from the input image itself*. As such, it can adapt itself to different settings per image. This allows us to perform SR of real old photos, noisy images, biological data, and other images where the acquisition process is unknown or non-ideal. On such images, our method outperforms SotA CNN-based SR methods, as well as previous unsupervised SR methods. To the best of our knowledge, this is the first unsupervised CNN-based SR method.
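A minimal sketch of the idea (a hypothetical small residual CNN and training loop for illustration, not the paper's exact architecture or augmentation scheme):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRNet(nn.Module):
    """A small image-specific CNN that predicts a residual over bicubic upsampling."""
    def __init__(self, channels=3, width=64, depth=4):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)

def zero_shot_sr(lr_img, scale=2, steps=500, lr=1e-3):
    # lr_img: (1, C, H, W) tensor -- the single test image.
    net = TinySRNet(lr_img.shape[1])
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        # Internal training pair: downscale the input further, then learn to recover it.
        son = F.interpolate(lr_img, scale_factor=1 / scale, mode='bicubic', align_corners=False)
        son_up = F.interpolate(son, size=lr_img.shape[-2:], mode='bicubic', align_corners=False)
        loss = F.l1_loss(net(son_up), lr_img)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Apply the image-specific network to the bicubically upscaled input.
    up = F.interpolate(lr_img, scale_factor=float(scale), mode='bicubic', align_corners=False)
    return net(up).detach()
```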

Abstract: As is well known and easy to prove, the Weyl algebras A_n over a field of characteristic zero are simple. Hence any non-zero homomorphism from A_n to A_m is an embedding and m \geq n. V. Bavula conjectured that the same is true over fields of finite characteristic. It turned out that exactly one half of his conjecture is correct (m \geq n, but there are homomorphisms which are not embeddings).

If we replace the Weyl algebra by its close relative, the symplectic Poisson algebra (the polynomial algebra F[x_1, ..., x_n; y_1, ..., y_n] with the Poisson bracket given by {x_i, y_i} = 1 and zero on the rest of the pairs), then, independently of the characteristic, all homomorphisms are embeddings.
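For orientation (one common convention; a sketch, not part of the speaker's statement), the Weyl algebra $A_n$ is generated by $x_1,\dots,x_n,y_1,\dots,y_n$ subject to

$$[y_i, x_j] = \delta_{ij}, \qquad [x_i, x_j] = [y_i, y_j] = 0,$$

of which the Poisson bracket $\{x_i, y_j\} = \delta_{ij}$ on the polynomial algebra is the classical counterpart.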

In this talk I will describe a family of integral representations for the standard twisted L-function of a cuspidal representation of the exceptional group of type G_2. These integral representations are unusual in the sense that they unfold with a non-unique model. A priori such an integral is not Eulerian, but using remarkable machinery proposed by I. Piatetski-Shapiro and S. Rallis we prove that in fact the integral does factor. In the course of the local unramified calculation we use another non-standard method, approximations of generating functions. I will then describe a few applications of these integral representations to the study of the analytic behaviour of this L-function and to various functorial lifts associated with the group G_2.

We present an explicit pseudorandom generator with polylog(n) seed length for read-once constant-width branching programs that can read their $n$ input bits in any order. This improves upon the work of Impagliazzo, Meka, and Zuckerman (FOCS, 2012), where they required seed length $n^{1/2+o(1)}$.

A central ingredient in our work is a bound on the Fourier spectrum of constant-width branching programs, settling a conjecture posed by Reingold, Steinke, and Vadhan (RANDOM, 2013).

Our analysis crucially uses a notion of local monotonicity on the edge labeling of the branching program. We carry out critical parts of our proof under the assumption of local monotonicity and show how to deduce our results for unrestricted branching programs.

(Joint work with Eshan Chattopadhyay, Pooya Hatami, and Omer Reingold)

Machine learning has recently been revolutionized by the introduction of Deep Neural Networks. However, from a theoretical viewpoint these methods are still poorly understood. Indeed, the key challenge in Machine Learning today is to derive rigorous results for optimization and generalization in deep learning. In this talk I will present several tractable approaches to training neural networks. In the second part I will discuss a new sequential algorithm for decision making that can take into account the structure in the action space and is more tuned with realistic decision making scenarios.

I will present our work that provides some of the first positive results and yields new, provably efficient, and practical algorithms for training certain types of neural networks. In a second work I will present a new online algorithm that learns by sequentially sampling random networks and asymptotically converges, in performance, to the optimal network. Our approach improves on previous random-features-based learning in terms of sample/computational complexity, and expressiveness. In a more recent work we take a different perspective on this problem. I will provide sufficient conditions that guarantee tractable learning, using the notion of refutation complexity. I will then discuss how this new idea can lead to new interesting generalization bounds that can potentially explain generalization in settings that are not always captured by classical theory.

In the setting of reinforcement learning I will present a recently developed new algorithm for decision making in a metrical action space. As an application, we consider a dynamic pricing problem in which a seller is faced with a stream of patient buyers. Each buyer buys at the lowest price in a certain time window. We use our algorithm to achieve an optimal regret, improving on the previously known regret bound.

Schwartz functions are classically defined as smooth functions such that they, and all their (partial) derivatives, decay at infinity faster than the inverse of any polynomial. This was formulated on $\mathbb{R}^n$ by Laurent Schwartz, and later on Nash manifolds (smooth semi-algebraic varieties) by Fokko du Cloux and by Rami Aizenbud and Dima Gourevitch. In a joint work with Boaz Elazar we have extended the theory of Schwartz functions to the category of (possibly singular) real algebraic varieties. The basic idea is to define Schwartz functions on a (closed) algebraic subset of $\mathbb{R}^n$ as restrictions of Schwartz functions on $\mathbb{R}^n$.
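Concretely, on $\mathbb{R}^n$ the Schwartz condition is the finiteness of the standard family of seminorms:

$$\|f\|_{\alpha,\beta} \;=\; \sup_{x\in\mathbb{R}^n}\big|x^{\alpha}\,\partial^{\beta}f(x)\big| \;<\; \infty \qquad \text{for all multi-indices } \alpha,\beta.$$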

Both in the Nash and the algebraic categories there exists a very useful characterization of Schwartz functions on open subsets, in terms of Schwartz functions on the embedding space: loosely speaking, Schwartz functions on an open subset are exactly restrictions of Schwartz functions on the embedding space, which are zero "to infinite order" on the complement to this open subset. This characterization suggests a very intuitive way to attach a space of Schwartz functions to an arbitrary (not necessarily semi-algebraic) open subset of $\mathbb{R}^n$.

In this talk, I will explain this construction, and more generally the construction of the category of tempered smooth manifolds. This category is in a sense the "largest" category whose objects "look" locally like open subsets of $\mathbb{R}^n$ (for some $n$), and on which Schwartz functions may be defined. In the development of this theory some classical results of Whitney are used, mainly Whitney type partition of unity (this will also be explained in the talk). As time permits, I will show some properties of Schwartz functions, and describe some possible applications. This is a work in progress.

Over the past two decades, machine learning has rapidly evolved and emerged as a highly influential discipline of computer science and engineering. One of the pillars of machine learning is mathematical optimization, and the connection between the two fields has been a primary focus of research. In this talk, I will present two recent works that contribute to this study, focusing on online learning---a central model in machine learning for sequential decision making and learning under uncertainty. In the first part of the talk, I will describe a foundational result concerned with the power of optimization in online learning, and give an answer to the question: does there exist a generic and efficient reduction from online learning to black-box optimization? In the second part, I will discuss a recent work that employs online learning techniques to design a new efficient and adaptive preconditioned algorithm for large-scale optimization. Despite employing preconditioning, the algorithm is practical even in modern optimization scenarios such as those arising in training state-of-the-art deep neural networks. I will present the new algorithm along with its theoretical guarantees and demonstrate its performance empirically.

The random-cluster model is a dependent percolation model where the weight of a configuration is proportional to q to the power of the number of connected components. It is closely related to the ferromagnetic q-Potts model, where every vertex is assigned one of q colors, and monochromatic neighbors are encouraged. Through non-rigorous means, Baxter showed that the phase transition is first-order whenever $q > 4$, i.e., there are multiple Gibbs measures at criticality. We provide a rigorous proof of this claim. Like Baxter, our proof uses the correspondence between the above models and the six-vertex model, which we analyze using the Bethe ansatz and transfer matrix techniques. We also prove Baxter's formula for the correlation length of the models at criticality.
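For reference, the standard random-cluster weight on a finite graph $G=(V,E)$, with edge parameter $p$ and cluster weight $q$, where $k(\omega)$ denotes the number of connected components of the configuration $\omega$, is

$$\phi_{p,q}(\omega) \;\propto\; \Big(\prod_{e\in E} p^{\omega(e)}(1-p)^{1-\omega(e)}\Big)\, q^{\,k(\omega)}.$$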

This is joint work with Hugo Duminil-Copin, Maxime Gagnebin, Ioan Manolescu, and Vincent Tassion.

We present a general approach to video understanding, inspired by semantic transfer techniques that have been successfully used for 2D image analysis. Our method considers a video to be a 1D sequence of clips, each one associated with its own semantics. The nature of these semantics -- natural language captions or other labels -- depends on the task at hand. A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, following which, reference semantics can be transferred to the test video. We describe two matching methods, both designed to ensure that (a) reference clips appear similar to test clips and (b), taken together, the semantics of the selected reference clips is consistent and maintains temporal coherence. We use our method for video captioning on the LSMDC'16 benchmark, video summarization on the SumMe and TVSum benchmarks, Temporal Action Detection on the Thumos2014 benchmark, and sound prediction on the Greatest Hits benchmark. Our method not only surpasses the state of the art, in four out of five benchmarks, but importantly, it is the only single method we know of that was successfully applied to such a diverse range of tasks.

Digital geometry processing (DGP) is one of the core topics of computer graphics, and has been an active line of research for over two decades. On one hand, the field introduces theoretical studies in topics such as vector-field design, preservative maps and deformation theory. On the other hand, the tools and algorithms developed by this community are applicable in fields ranging from computer-aided design, to multimedia, to computational biology and medical imaging. Throughout my work, I have sought to bridge the gap between the theoretical aspects of DGP and their applications. In this talk, I will demonstrate how DGP concepts can be leveraged to facilitate real-life applications with the right adaptation. More specifically, I will portray how I have employed deformation theory to support problems in animation and augmented reality. I will share my thoughts and the first steps I have taken to enlist DGP in the aid of machine learning, and, perhaps most excitingly, I will discuss my own and the graphics community's contributions to the computational fabrication field, as well as my vision for its future.

Bio: Dr. Amit H. Bermano is a postdoctoral researcher at the Princeton Graphics Group, hosted by Professor Szymon Rusinkiewicz and Professor Thomas Funkhouser. Previously, he was a postdoctoral researcher at Disney Research Zurich in the computational materials group, led by Dr. Bernhard Thomaszewski. He conducted his doctoral studies at ETH Zurich under the supervision of Prof. Dr. Markus Gross, in collaboration with Disney Research Zurich. His Master's and Bachelor's degrees were obtained at The Technion - Israel Institute of Technology under the supervision of Prof. Craig Gotsman. His research focuses on connecting the geometry processing field with other fields in computer graphics and vision, mainly by using geometric methods to facilitate other applications. His interests in this context include computational fabrication, animation, augmented reality, medical imaging and machine learning.

We present new heavy-hitters algorithms satisfying local-differential-privacy, with optimal or near-optimal worst-case error, running time, and memory. In our algorithms, the server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $O(n^{5/2})$ server time and $O(n^{3/2})$ user time. With a typically large number of participants in local algorithms ($n$ in the millions), this reduction in time complexity is crucial for making locally-private heavy-hitters algorithms usable in practice.
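As background only (plain k-ary randomized response over a small domain, not the new heavy-hitters algorithms from the talk), a locally private frequency oracle can be sketched as follows:

```python
import numpy as np

def randomize(item, k, eps, rng):
    """k-ary randomized response: report the true item w.p. e^eps/(e^eps + k - 1),
    otherwise a uniformly random *other* item."""
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_true:
        return item
    other = rng.integers(0, k - 1)
    return other if other < item else other + 1

def estimate_counts(reports, k, eps):
    """Debias the observed report counts to estimate the true item frequencies."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    q = 1.0 / (np.exp(eps) + k - 1)
    n = len(reports)
    raw = np.bincount(reports, minlength=k).astype(float)
    return (raw - n * q) / (p - q)

rng = np.random.default_rng(0)
true_items = rng.integers(0, 10, size=100000)                  # each user holds one item
reports = np.array([randomize(v, 10, 2.0, rng) for v in true_items])
print(np.round(estimate_counts(reports, 10, 2.0)))
```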

Joint work with Raef Bassily, Kobbi Nissim, and Abhradeep Thakurta.

In this talk, I will try to present some techniques to handle the problem of large deviations of the spectrum of random matrices. I will focus on the case of macroscopic statistics of the spectrum of Hermitian matrices - in particular Wigner matrices - such as the empirical distribution of the eigenvalues, the largest eigenvalue, or the traces of powers.

In the first part, I will be concerned with the so-called "objective method". Coined by David Aldous, this method consists in introducing, given a sequence of random objects, like random finite graphs, a new infinite random object from which one can deduce asymptotic properties of the original sequence. In the context of random matrices, this method has been mainly advertised by Balint Virag, and proven effective in showing universality results for the so-called beta-ensembles. Regarding large deviations of random matrices, this "objective method" amounts to embedding our sequence of matrices of growing size into an appropriate space on which one is able to understand the large deviations, and carrying out a contraction principle. I will review several large deviations principles obtained by this method, given by interpretations of random matrices as either dense or sparse graphs, and point out the limits of this strategy.

Cryo-EM is an imaging technology that is revolutionizing structural biology; the Nobel Prize in Chemistry 2017 was recently awarded to Jacques Dubochet, Joachim Frank and Richard Henderson "for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution".

Cryo-electron microscopes produce a large number of very noisy two-dimensional projection images of individual frozen molecules. Unlike related methods, such as computed tomography (CT), the viewing direction of each image is unknown. The unknown directions, together with extreme levels of noise and additional technical factors, make the determination of the structure of molecules challenging.

While other methods for structure determination, such as x-ray crystallography and nuclear magnetic resonance (NMR), measure ensembles of molecules together, cryo-EM produces measurements of individual molecules. Therefore, cryo-EM could potentially be used to study mixtures of different conformations of molecules. Indeed, current algorithms have been very successful at analyzing homogeneous samples, and can recover some distinct conformations mixed in solutions, but the determination of multiple conformations, and in particular, continuums of similar conformations (continuous heterogeneity), remains one of the open problems in cryo-EM.

I will discuss a one-dimensional discrete model problem, Heterogeneous Multireference Alignment, which captures many of the group properties and other mathematical properties of the cryo-EM problem. I will then discuss different components which we are introducing in order to address the problem of continuous heterogeneity in cryo-EM: 1. "hyper-molecules", the first mathematical formulation of truly continuously heterogeneous molecules, 2. The optimal representation of objects that are highly concentrated in both the spatial domain and the frequency domain using high-dimensional prolate spheroidal functions, and 3. Bayesian algorithms for inverse problems with an unsupervised-learning component for recovering such hyper-molecules in cryo-EM.

Weyl group multiple Dirichlet series are Dirichlet series in r complex variables which initially converge on a cone in C^r, possess analytic continuation to a meromorphic function on the whole complex space, and satisfy functional equations whose action on C^r is isomorphic to the Weyl group of a reduced root system. I will review different constructions of such series and discuss the relations between them.

The talk will discuss informational lower bounds of approximate Nash equilibrium in two complexity models: Query Complexity and Communication Complexity.

For both models we prove an exponential (in the number of players) lower bound on the complexity of computing an ε-Nash equilibrium, for a constant value of the approximation parameter ε.

Consider a real Gaussian stationary process, either on Z or on R.

What is the probability that it remains positive on [0,N] for large N?

The relation between this probability, known as the persistence probability, and the covariance kernel of the process has been investigated since the 1950s with motivations stemming from probability, engineering and mathematical physics. Nonetheless, until recently, good estimates were known only for particular cases, or when the covariance kernel is either non-negative or summable.
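As a simple illustration (a toy covariance kernel, not one of the delicate cases from the talk), the persistence probability over a finite window can be estimated by Monte Carlo:

```python
import numpy as np
from scipy.linalg import toeplitz

# Estimate P(X_1 > 0, ..., X_N > 0) for a stationary Gaussian sequence with
# covariance kernel r(k) = 2^{-|k|} (an illustrative, summable, positive kernel).
rng = np.random.default_rng(0)
N, trials = 20, 100000
r = 0.5 ** np.arange(N)                 # r(0), r(1), ..., r(N-1)
cov = toeplitz(r)                       # stationarity => Toeplitz covariance matrix
samples = rng.multivariate_normal(np.zeros(N), cov, size=trials)
print(np.mean(np.all(samples > 0, axis=1)))   # empirical persistence probability
```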

In the first hour of the talk we will discuss new spectral methods which greatly simplify the analysis of persistence. We will then describe its qualitative behavior in a very general setting.

In the second hour, we will describe (very) recent progress. In particular we will show the proof of the "spectral gap conjecture", which states: if the spectral measure vanishes on an interval containing 0, then the persistence is less than e^{-cN^2}, and this bound is tight if the measure is non-singular and compactly supported.

Time permitting, we will also discuss "tiny persistence" phenomena (of the order of e^{-e^{cN}}).

Based on joint works with Ohad Feldheim, Benjamin Jaye, Fedor Nazarov and Shahaf Nitzan.

Isolating the voice of a specific person while filtering out other voices or background noises is challenging when video is shot in noisy environments, using a single microphone. For example, video conferences from home or office are disturbed by other voices, TV reporting from city streets is mixed with traffic noise, etc. We propose audio-visual methods to isolate the voice of a single speaker and eliminate unrelated sounds. Face motions captured in the video are used to estimate the speaker's voice, which is applied as a filter on the input audio. This approach avoids using mixtures of sounds in the learning process, as the number of such possible mixtures is huge, and would inevitably bias the trained model.

In the first part of this talk, I will describe a few techniques to predict speech signals from a silent video of a speaking person. In the second part of the talk, I will propose a method to separate overlapping speech of several people speaking simultaneously (known as the cocktail-party problem), based on the speech predictions generated by the video-to-speech system.

Assume that $(X,f)$ is a dynamical system and $\phi$ is a real non-negative potential such that the corresponding $f$-invariant measure $\mu_\phi$ is infinite. Under assumptions of good inducing schemes, we give conditions under which the pressure of $f$ for a perturbed potential $\phi+s\psi$ relates to the pressure of the corresponding induced system.

This extends some of Sarig's results to the setting of infinite "equilibrium states".

In addition, limit properties of the family of measures $\mu_{\phi+s\psi}$ as $s\to 0$ are studied and statistical properties (e.g. correlation coefficients) under the limit measure are derived. I will discuss several examples.

This is based on joint work with H. Bruin and M. Todd.

As a concrete variant of motivic integration, I will discuss uniform p-adic integration and constructive aspects of results involved. Uniformity is in the p-adic fields, and, for large primes p, in the fields F_p((t)) and all their finite field extensions. Using real-valued Haar measures on such fields, one can study integrals, Fourier transforms, etc. We follow a line of research that Jan Denef started in the eighties, with in particular the use of model theory to study various questions related to p-adic integration. A form of uniform p-adic quantifier elimination is used. Using the notion of definable functions, one builds constructively a class of complex-valued functions which one can integrate (w.r.t. any of the variables) without leaving the class. One can also take Fourier transforms in the class. Recent applications in the Langlands program are based on Transfer Principles for uniform p-adic integrals, which allow one to get results for F_p((t)) from results for Q_p, once p is large, and vice versa. These Transfer Principles are obtained via the study of general kinds of loci, some of them being zero loci. More recently, these loci are playing a role in the uniform study of p-adic wave front sets for (uniformly definable) p-adic distributions, a tool often used in real analysis.

This talk contains various joint works with Gordon, Hales, Halupczok, Loeser, and Raibaut, and may mention some work in progress with Aizenbud about WF-holonomicity of these distributions, in relation to a question raised by Aizenbud and Drinfeld.

Covariance matrix estimation is essential in many areas of modern Statistics and Machine Learning including Graphical Models, Classification/Discriminant Analysis, Principal Component Analysis, and many others. Classical statistics suggests using the Sample Covariance Matrix (SCM), which is the Maximum Likelihood Estimator (MLE) in Gaussian populations. Real world data, however, usually exhibits heavy-tailed behavior and/or contains outliers, making the SCM non-efficient or even useless. This problem and many similar ones gave rise to the Robust Statistics field in the early 60s, where the main goal was to develop estimators stable under reasonable deviations from the basic Gaussian assumptions. One of the most prominent robust covariance matrix estimators was introduced and thoroughly studied by D. Tyler in the mid-80s. This important representative of the family of M-estimators can be defined as an MLE of a certain population. The problem of robust covariance estimation becomes even more involved in the high-dimensional scenario, where the number of samples n is of the order of the dimension p, or even less. In such cases, prior knowledge, often referred to as structure, is utilized to decrease the number of degrees of freedom and make the estimation possible. Unlike the Gaussian setting, in Tyler's case even imposition of linear structure becomes challenging due to the non-convexity of the negative log-likelihood. Recently, Tyler's target function was shown to become convex under a certain change of metric (geodesic convexity), which stimulated further investigation of the estimator.

In this work, we focus on the so-called group symmetry structure, which essentially means that the true covariance matrix commutes with a group of unitary matrices. In engineering applications such structures appear due to the natural symmetries of the physical processes; examples include circulant, perHermitian, proper quaternion matrices, etc. Group symmetric constraints are linear, and thus convex in the regular Euclidean metric. We show that they are also convex in the geodesic metric. These properties allow us to develop symmetric versions of the SCM and Tyler's estimator and build a general framework for their performance analysis. The classical results claim that at least n = p and n = p+1 samples in general position are necessary to ensure the existence and uniqueness of the SCM and Tyler's estimator, respectively. We significantly improve the sample complexity requirements for both estimators under the symmetry structure and show that in some cases even 1 or 2 samples are enough to guarantee the existence and uniqueness regardless of the ambient dimension.
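For concreteness, the standard unconstrained fixed-point iteration for Tyler's estimator looks as follows (a sketch; the group-symmetric variants studied in the talk additionally enforce the symmetry constraint, e.g. by averaging over the group action):

```python
import numpy as np

def tyler_estimator(X, iters=100, tol=1e-8):
    """Tyler's M-estimator of scatter via its fixed-point iteration.

    X: (n, p) data matrix; the classical theory requires n >= p + 1 samples
    in general position for existence and uniqueness (up to scale).
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(iters):
        inv = np.linalg.inv(sigma)
        w = p / np.einsum('ij,jk,ik->i', X, inv, X)   # p / (x_i^T Sigma^{-1} x_i)
        new = (X.T * w) @ X / n
        new *= p / np.trace(new)                      # fix the (arbitrary) scale
        if np.linalg.norm(new - sigma) <= tol * np.linalg.norm(sigma):
            return new
        sigma = new
    return sigma
```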

In this paper we study the adaptive complexity of submodular optimization. Informally, the adaptive complexity of a problem is the minimal number of sequential rounds required to achieve a constant factor approximation when polynomially-many queries can be executed in parallel at each round. Adaptivity is a fundamental concept that is heavily studied in computer science, largely due to the need for parallelizing computation. Somewhat surprisingly, very little is known about adaptivity in submodular optimization. For the canonical problem of maximizing a monotone submodular function under a cardinality constraint, to the best of our knowledge, all that is known to date is that the adaptive complexity is between 1 and Ω(n).

Our main result in this paper is a tight characterization showing that the adaptive complexity of maximizing a monotone submodular function under a cardinality constraint is, up to lower order terms, θ(log n): we describe an algorithm which requires O(log n) sequential rounds and achieves an approximation that is arbitrarily close to 1/3, and we show that no algorithm can achieve an approximation better than O(1 / log n) with fewer than O(log n / log log n) rounds. Thus, when allowing for parallelization, our algorithm achieves a constant factor approximation exponentially faster than any known existing algorithm for submodular maximization. Importantly, the approximation algorithm is achieved via adaptive sampling and complements a recent line of work on optimization of functions learned from data.

In many cases, we do not know the functions we optimize and learn them from labeled samples. Recent results show that no algorithm can obtain a constant factor approximation guarantee using polynomially-many labeled samples as in the PAC and PMAC models, drawn from any distribution. Since learning with non-adaptive samples over any distribution results in a sharp impossibility, we consider learning with adaptive samples where the learner obtains poly(n) samples drawn from a distribution of her choice in every round. Our result implies that in the realizable case, where there is a true underlying function generating the data, θ(log n) batches, up to lower order terms, of adaptive samples are necessary and sufficient to approximately "learn to optimize" a monotone submodular function under a cardinality constraint. This is joint work with Yaron Singer.

Given an unknown D-dimensional quantum state rho, as well as M two-outcome measurements E_1,...,E_M, how many copies of rho do we need, if we want to learn the approximate probability that E_i accepts rho for *every* i? In this talk, I'll prove the surprising result -- I didn't believe it myself at first -- that one can achieve this using a number of copies that's polylogarithmic in both M and D. So, e.g., one can learn whether *every* size-n^3 quantum circuit accepts or rejects an n-qubit state, given only poly(n) copies of the state. To prove this will require first surveying previous results on measuring quantum states and succinctly describing them, including my 2004 postselected learning theorem, and my 2006 "Quantum OR Bound" (with an erroneous proof fixed in 2016 by Harrow, Lin, and Montanaro).

As time permits, I'll also discuss new joint work with Xinyi Chen, Elad Hazan, and Ashwin Nayak, which takes my 2006 result on PAC-learnability of quantum states, and extends to the setting of online learning. Here we show that, given a sequence of T two-outcome measurements on an n-qubit state, even if the sequence is chosen adversarially, one can still learn to predict the outcomes of those measurements with total regret O(n) (in the "realizable" case) or O(sqrt(Tn)) (in the "non-realizable" case).

No quantum computing background will be assumed.

Calabi conjectured that the complex Monge-Ampere equation on compact Kaehler manifolds has a unique solution.

This was solved by Yau in 1978. In this talk, we present a non-archimedean version on projective Berkovich spaces.

In joint work with Burgos, Jell, Kunnemann and Martin, we improve a result of Boucksom, Favre and Jonsson in the equicharacteristic 0 case. We also give a result in positive equicharacteristic using test ideals.

We study non-vanishing gradient-like vector fields $v$ on smooth compact manifolds $X$ with boundary. We call such fields traversing. With the help of a boundary generic field $v$, we divide the boundary $\partial X$ of $X$ into two complementary compact manifolds, $\partial^+X(v)$ and $\partial^-X(v)$. Then we introduce the causality map $C_v: \partial^+X(v) \to \partial^-X(v)$, a distant relative of the Poincare return map. Let $\mathcal F(v)$ denote the oriented 1-dimensional foliation on $X$, produced by a traversing $v$-flow.

Our main result, the Holography Theorem, claims that, for boundary generic traversing vector fields $v$, the knowledge of the causality map $C_v$ allows for a reconstruction of the pair $(X, \mathcal F(v))$, up to a homeomorphism $\Phi: X \to X$ which is the identity on the boundary $\partial X$. In other words, for a massive class of ODEs, we show that the topology of their solutions, satisfying a given boundary value problem, is rigid. We call these results "holographic" since the $(n+1)$-dimensional $X$ and the un-parameterized dynamics of the flow on it are captured by a single correspondence $C_v$ between two $n$-dimensional screens, $\partial^+X(v)$ and $\partial^-X(v)$.

This holography of traversing flows has numerous applications to the dynamics of general flows. Time permitting, we will discuss some applications of the Holography Theorem to the geodesic flows and the inverse scattering problems on Riemannian manifolds with boundary.

Let Rep(GL(m|n)) denote the category of finite-dimensional algebraic representations of the supergroup GL(m|n). Nowadays the abelian structure (Ext^1 between irreducibles, block description,...) is well understood. Kazhdan-Lusztig theory gives an algorithmic solution for the character problem, and in special cases even explicit character formulas. However, we understand the monoidal structure hardly at all (e.g. the decomposition of tensor products into their indecomposable constituents). I will talk about the problem of decomposing tensor products "up to superdimension 0", i.e. about the structure of Rep(GL(m|n))/N where N is the ideal of indecomposable representations of superdimension 0.

When dealing with a highly variable problem such as action recognition, focusing on a small area, such as the hand's region, makes the problem more manageable, and enables us to invest the relatively high amount of resources needed for interpretation in a small but highly informative area of the image. In order to detect this region of interest in the image and properly analyze it, I have built a process that includes several steps, starting with a state of the art hand detector, incorporating both detection of the hand by appearance and by estimation of human body pose. The hand detector is built upon a fully convolutional neural network, detecting hands efficiently and accurately. The human body pose estimation starts with a state of the art head detector and continues with a novel approach where each location in the image votes for the position of each body keypoint, utilizing information from the whole image. Using dense, multi-target votes enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting, and to accurately estimate the body pose. Once the detection of the hands is complete, an additional step of segmentation of the hand and fingers is made. In this step each hand pixel in the image is labeled using a dense fully convolutional network. Finally, an additional step is made to segment and identify the held object. Understanding the hand-object interaction is an important step toward understanding the action taking place in the image. These steps enable us to perform fine interpretation of hand-object interaction images as an essential step towards understanding the human-object interaction and recognizing human activities.

Recent progress in imaging technologies leads to a continuous growth in biomedical data, which can provide better insight into important clinical and biological questions. Advanced machine learning techniques, such as artificial neural networks, are brought to bear on addressing fundamental medical image computing challenges such as segmentation, classification and reconstruction, required for meaningful analysis of the data. Nevertheless, the main bottleneck, which is the lack of annotated examples or 'ground truth' to be used for training, still remains.

In my talk, I will give a brief overview of some biomedical image analysis problems we aim to address, and suggest how prior information about the problem at hand can be utilized to compensate for insufficient ground-truth data, or even its complete absence. I will then present a framework based on deep neural networks for the denoising of Dynamic contrast-enhanced MRI (DCE-MRI) sequences of the brain. DCE-MRI is an imaging protocol where MRI scans are acquired repetitively throughout the injection of a contrast agent, and it is mainly used for quantitative assessment of blood-brain barrier (BBB) permeability. BBB dysfunctionality is associated with numerous brain pathologies including stroke, tumor, traumatic brain injury, and epilepsy. Existing techniques for DCE-MRI analysis are error-prone as the dynamic scans are subject to non-white, spatially-dependent and anisotropic noise. To address DCE-MRI denoising challenges we use an ensemble of expert DNNs constructed as deep autoencoders, where each is trained on a specific subset of the input space to accommodate different noise characteristics and dynamic patterns. Since clean DCE-MRI sequences (ground truth) for training are not available, we present a sampling scheme for generating realistic training sets with nonlinear dynamics that faithfully model clean DCE-MRI data and account for spatial similarities. The proposed approach has been successfully applied to full and even temporally down-sampled DCE-MRI sequences, from two different databases, of stroke and brain tumor patients, and is shown to compare favorably to state-of-the-art denoising methods.

We consider the problem of hidden common manifold extraction from multiple data sets, which have observation-specific distortions and artifacts. A new manifold learning method is presented based on alternating products of diffusion operators and local kernels. We provide theoretical analysis showing that our method is able to build a variant of the Laplacian of the hidden common manifold, while suppressing the observation-specific artifacts. The generality of this method is demonstrated in data analysis applications, where different types of devices are used to measure the same activity. In particular, we present applications to problems in biomedicine, neuroscience, and audio analysis.
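A minimal sketch of an alternating-diffusion-style construction (Gaussian kernels and plain row normalization are illustrative choices here, not necessarily those analyzed in the talk):

```python
import numpy as np

def diffusion_operator(X, eps):
    """Row-stochastic Gaussian diffusion operator built from one observation set X of shape (n, d)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / eps)
    return K / K.sum(axis=1, keepdims=True)

def alternating_diffusion(X1, X2, eps1=1.0, eps2=1.0, dim=3):
    """Embed via the alternating product of the two observations' diffusion operators."""
    A = diffusion_operator(X1, eps1) @ diffusion_operator(X2, eps2)
    vals, vecs = np.linalg.eig(A)                 # A is not symmetric in general
    order = np.argsort(-vals.real)
    # Skip the trivial constant direction; keep the next `dim` coordinates.
    return vecs[:, order[1:dim + 1]].real
```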

This is joint work with Roy Lederman and Hau-tieng Wu.

We prove the first super-logarithmic lower bounds on the cell probe complexity of dynamic *boolean* (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new technique and use it to prove a ~ log^{1.5}(n) lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting *over F_2* ([Patrascu07]). Proving a super-logarithmic lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Patrascu's obituary [Tho13]. This result also implies the first super-logarithmic lower bound for the classical 2D range counting problem, one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D "rectangle stabbing", and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient *one-way* communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebyshev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].

Joint work with Kasper Green-Larsen and Huacheng Yu.

We study the classic bipartite matching problem in the online setting, first introduced in the seminal work of Karp, Vazirani and Vazirani. Specifically, we consider the problem for the well-studied class of regular graphs. Matching in this class of graphs was studied extensively in the offline setting. In the online setting, an optimal deterministic algorithm, as well as efficient algorithms under stochastic input assumptions were known. In this work, we present a novel randomized algorithm with competitive ratio tending to one on this family of graphs, under adversarial arrival order. Our main contribution is a novel algorithm which achieves competitive ratio 1-O(\sqrt{\log d}/\sqrt{d}) in expectation on d-regular graphs. In contrast, we show that all previously-known online algorithms, such as the generally worst-case optimal ranking algorithm of Karp et al., are restricted to a competitive ratio strictly bounded away from one, even as d grows. Moreover, we show the convergence rate of our algorithm's competitive ratio to one is nearly tight, as no algorithm achieves competitive ratio better than 1-O(1/\sqrt{d}). Finally, we show that our algorithm yields a similar competitive ratio with high probability, as well as guaranteeing each offline vertex a probability of being matched tending to one.

The recent success of convolutional neural networks (CNNs) for image processing tasks is inspiring research efforts attempting to achieve similar success for geometric tasks. One of the main challenges in applying CNNs to surfaces is defining a natural convolution operator on surfaces. In this paper we present a method for applying deep learning to sphere-type shapes using a global seamless parameterization to a planar flat-torus, for which the convolution operator is well defined. As a result, the standard deep learning framework can be readily applied for learning semantic, high-level properties of the shape. An indication of our success in bridging the gap between images and surfaces is the fact that our algorithm succeeds in learning semantic information from an input of raw low-dimensional feature vectors.

We demonstrate the usefulness of our approach by presenting two applications: human body segmentation, and automatic landmark detection on anatomical surfaces. We show that our algorithm compares favorably with competing geometric deep-learning algorithms for segmentation tasks, and is able to produce meaningful correspondences on anatomical surfaces where hand-crafted features are bound to fail.

Joint work with: Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir G. Kim and Yaron Lipman.

The past five years have seen a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for feature learning and classification. However, the mathematical reasons for this success remain elusive. In this talk we will briefly survey some existing theory of deep learning. In particular, we will focus on data structure based theory and discuss two recent developments.

The first work studies the generalization error of deep neural networks. We will show how the generalization error of deep networks can be bounded via their classification margin. We will also discuss the implications of our results for the regularization of the networks. For example, the popular weight decay regularization guarantees margin preservation, but it leads to a loose bound on the classification margin. We show that a better regularization strategy can be obtained by directly controlling the properties of the network's Jacobian matrix.

The second work focuses on solving minimization problems with neural networks. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set leads to faster convergence of certain iterative algorithms with an error related to the accuracy of the set approximation. Our theory ties to recent advances in sparse recovery, compressed sensing and deep learning. In particular, it provides an explanation for the successful approximation of the ISTA (iterative shrinkage and thresholding algorithm) solution by neural networks with layers representing iterations.
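For reference, the classical ISTA iteration mentioned above, for $\min_z \tfrac12\|x - Dz\|_2^2 + \lambda\|z\|_1$ (unfolding a fixed number of such steps, with learned matrices, gives the network approximation):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam, iters=100):
    """Iterative shrinkage-thresholding for the Lasso: min_z 0.5*||x - D z||^2 + lam*||z||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the data-fidelity gradient
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        z = soft_threshold(z + D.T @ (x - D @ z) / L, lam / L)
    return z
```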

Joint work with Guillermo Sapiro, Miguel Rodrigues, Jure Sokolic, Alex Bronstein and Yonina Eldar.

Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people -- not software agents -- make joint decisions. I will illustrate this theme through two recent endeavors: Spliddit.org, a website that offers provably fair solutions to everyday problems; and Robovote.org, which provides optimization-driven voting methods.

Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.

The theory of hypergeometric functions with matrix argument was developed in the 1950s by S. Bochner for Hermitian matrices, and by C. Herz for symmetric matrices. This theory admits a common generalization to the setting of symmetric cones, which is discussed in the book by Faraut-Koranyi. It also has applications to the study of non-central distributions in statistics and to the theory of random matrices.

In the 1980s, I.G. Macdonald introduced a one parameter family of multivariate hypergeometric functions, which, for special values of the parameter, are the *radial* parts of the matrix hypergeometric functions. He also formulated a number of natural conjectures about these functions, which in the matrix case can be proved by appropriate integral formulas. However this technique is unavailable in the general setting and as a result these conjectures have remained open.

In recent work with G. Olafsson we have solved most of these conjectures, using ideas from the theory of Cherednik algebras and Jack polynomials. Among other results we obtain sharp estimates for the exponential kernel that allow us to establish a rigorous theory of the Fourier and Laplace transforms, and we prove an explicit formula for the Laplace transform of a Jack polynomial conjectured by Macdonald. This opens the door for several future developments in the associated harmonic analysis, some of which we also treat. This includes (1) the Paley-Wiener theorem, (2) Laplace transform identities for hypergeometric functions, and (3) the so-called Ramanujan master theorem.

**Note the unusual room [De Picciotto Building, Room 25]**

We show that there exist binary locally testable codes (for all rates) and locally correctable codes (for low rates) with rate and distance approaching the Gilbert-Varshamov bound (which is the best rate-distance tradeoff known for general binary error-correcting codes). Our constructions use a number of ingredients: Thommesen's random concatenation technique, the Guruswami-Sudan-Indyk strategy for list-decoding concatenated codes, the Alon-Edmonds-Luby distance amplification method, and the local list-decodability and local testability of Reed-Muller codes. Interestingly, this seems to be the first time that local testability is used in the construction of locally correctable codes.

Joint work with Sivakanth Gopi, Rafael Oliveira, Noga Ron-Zewi and Shubhangi Saraf

This is a joint work with Bezrukavnikov and Kazhdan. The goal of my talk is to give an explicit formula for the Bernstein projector to representations of depth $\leq r$. As a consequence, we show that the depth zero Bernstein projector is supported on topologically unipotent elements and it is equal to the restriction of the character of the Steinberg representation. As another application, we deduce that the depth $r$ Bernstein projector is stable. Moreover, for integral depths our proof is purely local.

Image denoising is the most fundamental problem in image enhancement, and it is largely solved: It has reached impressive heights in performance and quality -- almost as good as it can ever get. But interestingly, it turns out that we can solve many other problems using the image denoising "engine". I will describe the Regularization by Denoising (RED) framework: using the denoising engine in defining the regularization of any inverse problem. The idea is to define an explicit image-adaptive regularization functional directly using a high performance denoiser. Surprisingly, the resulting regularizer is guaranteed to be convex, and the overall objective functional is explicit, clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED can incorporate any image denoising algorithm as a regularizer, treats general inverse problems very effectively, and is guaranteed to converge to the globally optimal result.
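As a rough illustration of how a denoising engine can act as a regularizer, the following sketch runs plain gradient descent on a RED-style objective; the callables `forward`, `adjoint` and `denoiser`, the weights and the initialization are placeholders, and the simple gradient expression assumes the conditions on the denoiser discussed in the talk.

```python
def red_gradient_descent(y, forward, adjoint, denoiser, lam, step, n_iter=200):
    """Gradient descent on a RED-style objective
        0.5 * ||H x - y||^2 + lam * 0.5 * x^T (x - D(x)),
    using the gradient H^T (H x - y) + lam * (x - D(x)).
    `forward`/`adjoint` apply the degradation operator H and its adjoint,
    and `denoiser` is any off-the-shelf denoising engine D(.).
    """
    x = adjoint(y)  # crude initialization
    for _ in range(n_iter):
        grad = adjoint(forward(x) - y) + lam * (x - denoiser(x))
        x = x - step * grad
    return x
```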

* Joint work with Peyman Milanfar (Google Research) and Yaniv Romano (EE-Technion).

We propose a procedure (the first of its kind) for computing a fully data-dependent interval that traps the mixing time t_mix of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge.

The interval is constructed around the relaxation time t_relax, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a sqrt{n} rate, where n is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least \Omega(t_relax) times on the average. Future directions of research are identified. Time permitting, we will mention some recent further developments by D. Levin and Y. Peres.
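For concreteness, a naive plug-in point estimate of the relaxation time from a single path (not the confidence-interval procedure described above) can be computed as follows; states are assumed to be labeled 0..n_states-1 and the chain to be reversible.

```python
import numpy as np

def plugin_relaxation_time(path, n_states):
    """Naive plug-in estimate of t_relax from one sample path: estimate the
    transition matrix by empirical frequencies, symmetrize it with the
    empirical stationary weights (valid for reversible chains), and return
    one over the spectral gap.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    P = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    pi = np.maximum(counts.sum(axis=1) / counts.sum(), 1e-12)
    # D^{1/2} P D^{-1/2} is symmetric for a reversible chain and shares P's spectrum.
    S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
    S = (S + S.T) / 2  # enforce symmetry against sampling noise
    eigenvalues = np.sort(np.linalg.eigvalsh(S))
    gap = 1.0 - eigenvalues[-2]  # some conventions use the absolute spectral gap instead
    return 1.0 / gap
```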

Joint work with Daniel Hsu and Csaba Szepesvari.

Graphs that are prevalent in current applications (the Internet, Facebook etc.) are not only very large and highly dynamic, but also distributed between many servers, none of which sees the graph in its entirety. The distributed monitoring problem deals with the question of imposing conditions on the local graphs, such that as long as they hold, it is guaranteed that some desired property holds for the global graph.

While defining local conditions for linear properties (e.g. average degree) is relatively easy, they are more difficult to derive for non-linear functions over the graph. We propose a solution and a general definition of solution optimality, and demonstrate how to apply it to two important graph properties -- spectral gap and number of triangles. We also define an absolute lower bound on the communication overhead for distributed monitoring, and compare our algorithm to it, with good results. Performance improves as the graph becomes larger and denser -- that is, when distributing it is more important.

I will talk about my recent adventures with ants. Together with biologists we study P. longicornis ants as they collaboratively transport a large food item to their nest. This collective navigation process is guided by pheromones which are laid by individual ants. Using a new methodology to detect scent marks, we identify a new kind of ant trail characterized by very short and dynamic pheromone markings and a highly stochastic navigation response to them. We argue that such a trail can be highly beneficial in conditions in which the knowledge of individual ants regarding the underlying topological structure is unreliable. This gives rise to a new theoretical search model on graphs under unreliable guiding instructions, which is of independent computational interest. To illustrate the model, imagine driving a car in an unknown country that is in the aftermath of a major hurricane which has randomly flipped a certain small fraction of the road-signs. Under such conditions of unreliability, how can you still reach your destination fast? I will discuss the limits of unreliability that allow for efficient navigation. In trees, for example, there is a phase transition phenomenon that occurs roughly around the inverse of the square root of the maximal degree. That is, if the noise is above this threshold then no algorithm can avoid spending time exponential in the original distance to find the target, while below the threshold we identify an optimal, almost linear, walking algorithm. Finally, I will discuss algorithms that, under such a noisy model, aim to minimize the number of queries needed to find a target (rather than the number of moves).

This talk is based on joint works with biologists from the Weizmann Institute: Ofer Feinerman, Udi Fonio, and others, and with CS researchers: Lucas Bockowski, Adrian Kosowski, and Yoav Rodeh.

By analytical and numerical studies of Deep Neural Networks (using standard TensorFlow) in the "Information Plane" - the Mutual Information the network layers preserve on the input and the output variables - we obtain the following new insights.

- The training epochs, for each layer, are divided into two phases: (1) fitting the training data - increasing the mutual information on the labels; (2) compressing the representation - reducing the mutual information on the inputs. The layers are learnt hierarchically, from the bottom to the top layer, with some overlaps.
- Most (~80%) of the training time - optimization with SGD - is spent on compressing the representation (the second phase) - NOT on fitting the training data labels, even when the training has no regularization or terms that directly aim at such compression.
- The convergence point, FOR EVERY HIDDEN LAYER, lies on or very close to the Information Bottleneck (IB) theoretical bound. Thus, the mappings from the input to the hidden layer and from the hidden layer to the output obey the IB self-consistent equations for some value of the compression-prediction tradeoff.
- The main benefit of adding more hidden layers is in the optimization/training time, as the compression phase for each layer amounts to relaxation to a Maximum conditional Entropy state, subject to the proper constraints on the error/information on the labels. As such relaxation takes super-linear time in the compressed entropy, adding more hidden layers dramatically reduces the training time. There is also benefit in sample complexity to adding hidden layers, but this is a smaller effect.

I will explain these new observations and the benefits of exploring Deep Learning in the "Information Plane", and discuss some of the exciting theoretical and practical consequences of our analysis.
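For readers who want to reproduce information-plane pictures, a common (if crude) starting point is a binned plug-in estimate of the mutual information between a layer's activations and the input or the label; the sketch below handles two scalar variables and is purely illustrative.

```python
import numpy as np

def binned_mutual_information(x, y, bins=30):
    """Plug-in estimate of I(X;Y) in bits from paired scalar samples,
    obtained by discretizing both variables into equal-width bins.
    The result is sensitive to the number of bins.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```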

Joint work with Ravid Ziv and Noga Zaslavsky.

We present a novel approach to template matching that is efficient, can handle partial occlusions, and is equipped with provable performance guarantees. A key component of the method is a reduction that transforms the problem of searching a nearest neighbor among N high-dimensional vectors, to searching neighbors among two sets of order sqrt(N) vectors, which can be done efficiently using range search techniques. This allows for a quadratic improvement in search complexity, that makes the method scalable when large search spaces are involved.

For handling partial occlusions, we develop a hashing scheme based on consensus set maximization within the range search component. The resulting scheme can be seen as a randomized hypothesize-and-test algorithm, that comes with guarantees regarding the number of iterations required for obtaining an optimal solution with high probability.

The predicted matching rates are validated empirically and the proposed algorithm shows a significant improvement over the state-of-the-art in both speed and robustness to occlusions.

Joint work with Stefano Soatto.

While there has been a lot of progress in designing efficient custom protocols for computing Private Set Intersection (PSI), there has been less research on using generic MPC protocols for this task. However, there are many variants of the set intersection functionality which seem hard to compute with existing custom protocols and are easy to compute with generic MPC based solutions (for example comparing the cardinality of the intersection with a threshold or measuring ad conversion rates). Generic protocols work over circuits which compute the intersection. For sets of size n the best known circuit constructions compute O(n log n) comparisons. In this work we propose new circuit-based protocols for computing variants of the intersection, with circuits computing only O(n) comparisons. Our constructions are based on a new variant of Cuckoo hashing in two dimensions. We employ several optimizations and determine experimentally the required sizes of tables and circuits, and measure the runtime, showing that our protocol is more efficient in concrete terms than existing constructions. The proof technique is new and can be generalized to analyzing simple Cuckoo hashing as well as new variants.
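As background for readers unfamiliar with Cuckoo hashing (the construction above relies on a new two-dimensional variant, which is not shown here), a standard two-table insertion routine looks roughly as follows; the tables and hash functions are placeholders.

```python
def cuckoo_insert(tables, hashes, key, max_kicks=500):
    """Insertion for plain two-table Cuckoo hashing: each key has one
    candidate slot per table; if both are occupied, evict an occupant
    and reinsert it via its other table, up to max_kicks evictions.
    `tables` is a pair of lists (slots hold None when empty) and
    `hashes` is a pair of functions mapping keys to slot indices.
    """
    for _ in range(max_kicks):
        for t in (0, 1):
            slot = hashes[t](key)
            if tables[t][slot] is None:
                tables[t][slot] = key
                return True
            # Evict the current occupant and try to place it elsewhere.
            tables[t][slot], key = key, tables[t][slot]
    return False  # insertion failed; in practice one would rehash
```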

We study the ecological use of analogies in AI. Specifically, we address the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given representation function f, which accepts inputs in either domain, would remain unchanged. Other than f, the training data is unsupervised and consists of a set of samples from each domain, without any mapping between them. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-preserving component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.
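Written out, the compound loss described above has roughly the following shape, where $\alpha$ and $\beta$ are illustrative weights and $\mathcal{L}_{\mathrm{GAN}}$ denotes the multiclass adversarial term (the exact formulation in the paper may include further terms):

$\mathcal{L}(G) = \mathcal{L}_{\mathrm{GAN}}(G) + \alpha \sum_{s \in S} \| f(s) - f(G(s)) \|^2 + \beta \sum_{t \in T} \| t - G(t) \|^2 .$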

Joint work with Yaniv Taigman and Adam Polyak

Joint work with Srikanth Srinivasan.

A determinantal point process governed by a locally trace class Hermitian contraction kernel on a measure space $E$ remains determinantal when conditioned on its configuration on an arbitrary measurable subset $B \subset E$. Moreover, the conditional kernel can be chosen canonically in a way that is "local" in a non-commutative sense, i.e. invariant under "restriction" to closed subspaces $L^2(B) \subset P \subset L^2(E)$.

Using the properties of the canonical conditional kernel we establish a conjecture of Lyons and Peres: if $K$ is a projection then almost surely all functions in its image can be recovered by sampling at the points of the process.

I will discuss the asymptotic behaviour (both on and off the diagonal) of the spectral function of a Schroedinger operator with smooth bounded potential as the energy becomes large. I formulate the conjecture that the local density of states (i.e. the spectral function on the diagonal) admits a complete asymptotic expansion and discuss the known results, mostly for almost-periodic potentials.

Let X be a regular scheme, projective and flat over Spec Z. We give a conjectural formula in terms of motivic cohomology, singular cohomology and de Rham cohomology for the special value of the zeta-function of X at any rational integer. We will explain how this reduces to the standard formula for the residue of the Dedekind zeta-function at s = 1.

A classical theorem by Anosov states that the slow motion of a slow-fast system where the fast subsystem is ergodic with respect to a smooth invariant measure can be approximated, in a well-defined sense, by the slow subsystem averaged over the fast variables. We address the question of what happens if the fast system is not ergodic. We discuss a theory which is developing in joint works with V. Gelfreich, T. Pereira, V. Rom-Kedar and K. Shah, and suggest that in the non-ergodic case the behavior of the slow variables is approximated by a random process, and not a single, deterministic averaged system. We also discuss the question of the relevance of ergodicity to the foundations of statistical mechanics.

Image processing algorithms often involve a data fidelity penalty, which encourages the solution to comply with the input data. Existing fidelity measures (including perceptual ones) are very sensitive to slight misalignments in the locations and shapes of objects. This is in sharp contrast to the human visual system, which is typically indifferent to such variations. In this work, we propose a new error measure, which is insensitive to small smooth deformations and is very simple to incorporate into existing algorithms. We demonstrate our approach in lossy image compression. As we show, optimal encoding under our criterion boils down to determining how to best deform the input image so as to make it "more compressible". Surprisingly, it turns out that very minor deformations (almost imperceptible in some cases) suffice to make a huge visual difference in methods like JPEG and JPEG2000. Thus, by slightly sacrificing geometric integrity, we gain a significant improvement in preservation of visual information.

We also show how our approach can be used to visualize image priors. This is done by determining how images should be deformed so as to best conform to any given image model. By doing so, we highlight the elementary geometric structures to which the prior resonates. Using this method, we reveal interesting behaviors of popular priors, which were not noticed in the past.

Finally, we illustrate how deforming images to possess desired properties can be used for image "idealization" and for detecting deviations from perfect regularity.

Joint work with Tamar Rott Shaham, Tali Dekel, Michal Irani, and Bill Freeman.

Given a quadratic form Q in d variables over the integers, e.g. Q(x,y,z) = xy - z^2, and a set of positive density E in Z^d, we investigate what kind of structure can be found in the set Q(E-E).

We will see that if d >= 3, and Q is indefinite, then the measure rigidity, due to Bourgain-Furman-Lindenstrauss-Mozes or Benoist-Quint, of the action of the group of the symmetries of Q implies that there exists k >= 1 such that k^2 Q(Z^d) is a subset of Q(E-E).

We will give an alternative proof of the theorem for the case Q(x,y,z) = xy - z^2 that uses more classical equidistribution results of Vinogradov, and Weyl, as well as a more recent result by Frantzikinakis-Kra. The latter proof extends the theorem to other polynomials having a much smaller group of symmetries. Based on joint works with M. Bjorklund (Chalmers), and K. Bulinski (Sydney).

The algorithm referred to in the title builds on Luks's powerful group-theoretic divide-and-conquer method (1980) and addresses the bottleneck situation where Luks's method fails to "divide".

Luks will continue to "conquer" if an alternative method "divides"; we develop such a partitioning technique.

In the talk we shall outline the algorithm and explain in some detail its group theoretic core, the "Unaffected Stabilizers Lemma" and the "Local Certificates" routine. The Lemma is used to construct, somewhat implausibly, global automorphisms out of local information -- a key step toward the construction of combinatorial structures to which the partitioning method from the previous day's lecture will be applied, providing the required "divide" step.

Symmetry is defined in terms of structure-preserving transformations (automorphisms); regularity in terms of numerical invariants. Symmetry always implies regularity but there are many highly regular combinatorial objects (such as "strongly regular graphs") with no symmetry. The opposite of irregularity is regularity, not symmetry. Yet we show that in a well-defined sense, *the opposite of hidden irregularity is hidden symmetry*, and in fact hidden symmetry of a particularly robust kind.

The symmetry of a circle is easily destroyed: just "individualize" two non-opposite points -- color one of them red, the other blue -- and all the symmetry is gone. In fact, the resulting structure is completely irregular: every point is uniquely identified by a pair of numerical invariants, namely, its pair of distances to the two individualized points. We shall say that the circle has a high degree of hidden irregularity.

In contrast, *Johnson graphs* are objects with robust symmetry: individualizing a small number of vertices of a Johnson graph hardly makes a dent in its symmetry.

Recent work on the algorithmic problem of Graph Isomorphism has revealed that Johnson graphs are unique in this regard: Every finite relational structure of small arity either has a measurable (say 10%) hidden irregularity (revealed by individualizing a polylogarithmic number of elements) or has a large degree of hidden symmetry, manifested in a canonically embedded Johnson graph on more than 90% of the underlying set.

This dichotomy is the key Divide-and-Conquer tool in recent progress on the worst-case complexity of the Graph Isomorphism problem.

This subject is purely combinatorial and does not require advanced mathematical apparatus. The group theoretic aspects of the new Graph Isomorphism test will be discussed in a follow-up seminar on January 30.

Within the wide field of sparse approximation, convolutional sparse coding (CSC) has gained increasing attention in recent years. This model assumes a structured dictionary built as a union of banded circulant matrices. Most attention has been devoted to the practical side of CSC, proposing efficient algorithms for the pursuit problem, and identifying applications that benefit from this model. Interestingly, a systematic theoretical understanding of CSC seems to have been left aside, with the assumption that the existing classical results are sufficient.

In this talk we start by presenting a novel analysis of the CSC model and its associated pursuit. Our study is based on the observation that while being global, this model can be characterized and analyzed locally. We show that uniqueness of the representation, its stability with respect to noise, and successful greedy or convex recovery are all guaranteed assuming that the underlying representation is locally sparse. These new results are much stronger and informative, compared to those obtained by deploying the classical sparse theory.

Armed with these new insights, we proceed by proposing a multi-layer extension of this model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers. This, in turn, is shown to be tightly connected to Convolutional Neural Networks (CNN), so much so that the forward-pass of the CNN is in fact the Thresholding pursuit serving the ML-CSC model. This connection brings a fresh view to CNN, as we are able to attribute to this architecture theoretical claims such as uniqueness of the representations throughout the network, and their stable estimation, all guaranteed under simple local sparsity conditions. Lastly, identifying the weaknesses in the above scheme, we propose an alternative to the forward-pass algorithm, which is both tightly connected to deconvolutional and recurrent neural networks, and has better theoretical guarantees.
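To make the CNN connection concrete, here is a minimal sketch of a layered thresholding pursuit; the dictionaries and thresholds are placeholders, and with the threshold playing the role of a bias each update has exactly the form of a ReLU forward pass.

```python
import numpy as np

def layered_thresholding(x, dictionaries, thresholds):
    """Layered non-negative soft-thresholding pursuit for an ML-CSC model:
    at each layer, multiply by the transposed dictionary and apply a
    shrinkage nonlinearity -- structurally the same computation as
    ReLU(W^T x - b) in a CNN forward pass.
    """
    gamma = x
    for D, t in zip(dictionaries, thresholds):
        gamma = np.maximum(D.T @ gamma - t, 0.0)  # shrinkage = ReLU with bias -t
    return gamma
```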

Many problems about finite groups (e.g., convergence of random walks, properties of word maps, spectrum of Cayley graphs, etc.) can be approached in terms of sums of group characters. More precisely, what intervenes in such sums are the character ratios:

X_r(g) / dim(r), g in G,

where r is an irreducible representation of G, and X_r is its character. This leads to the quest for good estimates on the character ratios.

In this talk I will introduce a precise notion of "**size**" for representations of finite classical groups and show that it tends to put together those with character ratios of the same order of magnitude.

As an application I will show how one might generalize to classical groups the following result of Diaconis-Shahshahani (for k=2) and Berestycki-Schramm-Zeitouni (for general k): The mixing time for the random walk on the group G=S_n using the cycles of length k is (1/k) n log(n).

The talk should be accessible to beginning graduate students, and is part of our joint project with **Roger Howe (Yale and Texas A&M).**

Detecting genetic adaptation in recent human history is a major challenge of population genetics. The fundamental problem is to infer historical changes in the frequency of genetic variants (alleles), from data of contemporary genomes. With this we can identify unusual changes that are unlikely to have occurred in the absence of selective pressures. However, a generally applicable method to infer recent allele frequency changes is lacking. Instead, present methods can only detect frequency changes under very restrictive assumptions on the model of selection. Moreover, their time resolution is generally limited to prehistoric scales, on the order of the past 25-75 thousand years. To address these gaps we developed a novel statistical method, the Singleton Density Score (SDS), that infers recent changes in allele frequencies from local variation in contemporary genome sequences with specificity to historical timescales. Applied to data of ~3000 genomes from the UK10K project, SDS reveals that human genetic adaptation continued well into historical times. Over the past ~2000-3000 years, ancestors of modern Britons genetically evolved over a range of phenotypes related to diet, immunity, and physical appearance. Notably, we found that polygenic adaptation, whereby selection acts on many small-effect variants across the genome that together determine a single trait, has played a pervasive, previously undetected role in shaping present genotypic and phenotypic variation.

Reference:

Field et al, Science 2016, Detection of human adaptation during the past 2000 years. https://www.ncbi.nlm.nih.gov/pubmed/27738015

I will describe two branches of my work related to algorithms for distributed networks. The main focus will be devoted to Fault-Tolerant (FT) Network Structures.

The undisrupted operation of structures and services is a crucial requirement in modern day communication networks. As the vertices and edges of the network may occasionally fail or malfunction, it is desirable to make those structures robust against failures.

FT Network Structures are low cost highly resilient structures, constructed on top of a given network, that satisfy certain desirable performance requirements concerning, e.g., connectivity, distance or capacity. We will overview some results on fault tolerant graph structures with a special focus on FT Breadth-First-Search.

The second part of the talk will discuss distributed models and algorithms for large-scale networks. Towards the end, we will see some connections between distributed computing and other areas such as EE and Biology.

Let M be a smooth, compact, connected two-dimensional, Riemannian manifold without boundary, and let C_epsilon be the amount of time needed for the Brownian motion to come within (Riemannian) distance epsilon of all points in M. The first order asymptotics of C_epsilon as epsilon goes to 0 are known. We show that for the two dimensional sphere

$\sqrt{C_\epsilon} - 2\sqrt{2}\left( \log \epsilon^{-1} - \frac{1}{4}\log\log \epsilon^{-1} \right)$ is tight.

Joint work with David Belius and Ofer Zeitouni.

**First Speaker: Ran Tessler (ETH)**

Time: 11:00

Title: A sharp threshold for Hamiltonian spheres in a random 2-complex.

Abstract: We define the notion of Hamiltonian sphere - a 2-complex homeomorphic to a sphere which uses all vertices. We prove an explicit sharp threshold for the appearance of Hamiltonian spheres in the Linial-Meshulam model for random 2-complexes. The proof combines combinatorial, probabilistic and geometric arguments. Based on a joint work with Zur Luria.

**Second Speaker: Assaf Naor (Princeton)**

Time: 12:00

Title: A new vertical-versus-horizontal isoperimetric inequality on the Heisenberg group, with applications to metric geometry and approximation algorithms

Abstract: In this talk we will show that for every measurable subset of the Heisenberg group of dimension at least 5, an appropriately defined notion of its "vertical perimeter" is at most a constant multiple of its horizontal (Heisenberg) perimeter. We will explain how this new isoperimetric-type inequality solves open questions in analysis (an endpoint estimate for a certain singular integral on W^{1,1}), metric geometry (sharp nonembeddability into L_1) and approximation algorithms (asymptotic evaluation of the performance of the Goemans-Linial algorithm for the Sparsest Cut problem). Joint work with Robert Young.

Let X be a set definable in some o-minimal structure. The Pila-Wilkie theorem (in its basic form) states that the number of rational points in the transcendental part of X grows sub-polynomially with the height of the points. The Wilkie conjecture stipulates that for sets definable in $R_\exp$, one can sharpen this asymptotic to polylogarithmic.

I will describe a complex-analytic approach to the proof of the Pila-Wilkie theorem for subanalytic sets. I will then discuss how this approach leads to a proof of the "restricted Wilkie conjecture", where we replace $R_\exp$ by the structure generated by the restrictions of $\exp$ and $\sin$ to the unit interval (both parts are joint work with Dmitry Novikov). If time permits I will discuss possible generalizations and applications.

Pseudo-deterministic algorithms are randomized search algorithms that on different executions on the same input, output the same solution with high probability.

We will discuss how pseudo-deterministic algorithms bridge the gap between randomized search and decision problems for problems in P and in NC. Next, we will show a pseudo-deterministic NC algorithm for bipartite matching. Finally, we will show how pseudo-determinism can be used to save on random bits used by classical randomized algorithms, and apply the method to obtain an algorithm for RNC depth first search using only O(log^2 n) random bits. This is joint work with Shafi Goldwasser.

Consider the Gaussian Entire Function (GEF) whose Taylor coefficients are independent complex-valued Gaussian variables, and the variance of the k-th coefficient is 1/k!. This random Taylor series is distinguished by the invariance of its zero set with respect to the isometries of the complex plane.

I will show that the law of the zero set, conditioned on the GEF having no zeros in a disk of radius r, and properly normalized, converges to an explicit limiting Radon measure in the plane, as r goes to infinity. A remarkable feature of this limiting measure is the existence of a large 'forbidden region' between a singular part supported on the boundary of the (scaled) hole and the equilibrium measure far from the hole. This answers a question posed by Nazarov and Sodin, and is in stark contrast to the corresponding result known to hold in the random matrix setting, where such a gap does not appear.

The talk is based on a joint work with S. Ghosh.

The Operator Scaling problem asks whether a set of complex matrices can be jointly moved to a certain canonical (isotropic) position. This problem has a remarkable number of incarnations: non-commutative algebra, invariant theory, arithmetic complexity, quantum information theory, analytic inequalities and more. We will describe an efficient algorithm solving all these related problems, and explain how its analysis combines ideas from all these areas.

Through these connections, the algorithm can be shown to solve some non-convex optimization problems, some systems of quadratic equations, and some linear programs with exponentially many inequalities - all these, and concrete examples we will give, suggest that it might be a powerful algorithmic tool via reductions to these problems.
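For intuition only, the simplest algorithmic idea in this area is alternating normalization in the spirit of Gurvits' algorithm; the sketch below (with illustrative parameters, and making no claim to match the algorithm or analysis of the talk) repeatedly rescales the tuple so that each of its two "marginals" becomes the identity.

```python
import numpy as np

def operator_scaling(mats, n_iter=100):
    """Alternating normalization for operator scaling: rescale the tuple so
    that sum_i A_i A_i^* = I, then so that sum_i A_i^* A_i = I, and repeat.
    If both marginals converge to the identity, the tuple has been moved
    (approximately) to isotropic position.  Assumes the relevant sums stay
    positive definite.
    """
    mats = [A.astype(complex) for A in mats]
    for _ in range(n_iter):
        left = sum(A @ A.conj().T for A in mats)
        C = np.linalg.cholesky(left)                  # left = C C^*
        mats = [np.linalg.solve(C, A) for A in mats]  # A -> C^{-1} A
        right = sum(A.conj().T @ A for A in mats)
        D = np.linalg.cholesky(right)                 # right = D D^*
        mats = [np.linalg.solve(D, A.conj().T).conj().T for A in mats]  # A -> A D^{-*}
    return mats
```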

No special background will be assumed!

Based on two joint works with Ankit Garg, Leonid Gurvits and Rafael Oliveira.

This talk is longer than usual and has a two-hour slot.

If f, g are two polynomials in C[x,y] such that J(f,g)=1, but C[f,g] does not coincide with C[x,y], then the mapping given by these polynomials ( (x,y) maps to (f(x,y), g(x,y)) ) has a rather unexpected property which will be discussed in the talk.

The proliferation of data collection in the health, commercial, and economic spheres, brings with it opportunities for extracting new knowledge with concrete policy implications. Examples include individualizing medical practices based on electronic healthcare records, and understanding the implications of job training programs on employment and income.

The scientific challenge lies in the fact that standard prediction models such as supervised machine learning are often not enough for decision making from this so-called "observational data": Supervised learning does not take into account causality, nor does it account for the feedback loops that arise when predictions are turned into actions. On the other hand, existing causal-inference methods are not adapted to dealing with the rich and complex data now available, and often focus on populations, as opposed to individual-level effects.

The problem is most closely related to reinforcement learning and bandit problems in machine learning, but with the important property of having no control over experiments and no direct access to the actor's model.

In my talk I will discuss how we apply recent ideas from machine learning to individual-level causal-inference and action. I will introduce a novel generalization bound for estimating individual-level treatment effect, and further show how we use representation learning and deep temporal generative models to create new algorithms geared towards this problem. Finally, I will show experimental results using data from electronic medical records and data from a job training program.

The sensitivity conjecture is a famous open problem in the theory of boolean functions. Let f be a boolean function defined on the hypercube. The sensitivity of a node x is the number of its neighbours in the hypercube on which f gives the opposite value to the one it takes on x. The sensitivity conjecture speculates that if all nodes have low sensitivity, then the function f must be simple. Concretely, all its Fourier mass is supported on levels with low Hamming weight.

Recently, Gopalan et al. [CCC 2016] conjectured a robust analogue of the sensitivity conjecture: if most of the nodes have low sensitivity, then most of the Fourier mass is supported on levels with low Hamming weight. They also proved it under the stronger assumption that all nodes have low sensitivity. In this work, we prove this conjecture, with near-tight quantitative bounds.

Joint work with Avishay Tal (IAS) and Jiapeng Zhang (UCSD).

Given a dynamical system, a partition of the space induces a mapping to the space of sequences of the partition elements (a point is mapped to the partition elements containing its orbit terms). Such a duality is called Symbolic Dynamics. Markov partitions are an important tool, as the symbolic dynamics they induce encode many of the important dynamical properties of the original system and make them easier to study.

We show that general non-uniformly hyperbolic C^{1+epsilon} diffeomorphisms on compact manifolds of any dimension admit countable Markov partitions. Previously, this was only known in dimension 2.

We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is *never fully revealed* to the learner. We show a large gap between the adversarial and the stochastic cases. In the adversarial case, we prove that even for dense feedback graphs, the learner cannot improve upon a trivial regret bound obtained by ignoring any additional feedback besides her own loss. In contrast, in the stochastic case we give an algorithm that achieves $\widetilde \Theta(\sqrt{\alpha T})$ regret over $T$ rounds, provided that the independence numbers of the hidden feedback graphs are at most $\alpha$. We also extend our results to a more general feedback model, in which the learner does not necessarily observe her own loss, and show that, even in simple cases, concealing the feedback graphs might render the problem completely unlearnable.

Cauchy-Riemann maps (CR-maps for short) occur in complex analysis as boundary values of maps holomorphic in a domain in complex space. As a rule, CR-mappings of real-analytic hypersurfaces turn out to be analytic as well. However, we recently showed, in joint work with Rasul Shafikov, the existence of a Stokes phenomenon in CR-geometry: there exist real-analytic hypersurfaces which are equivalent formally, but not holomorphically.

Despite this, it appears that in complex dimension 2, CR-maps necessarily possess appropriate weaker regularity properties. Namely, the components of such maps necessarily belong to the well-known Gevrey classes. The latter statement has the following remarkable application: if two real-analytic hypersurfaces in complex two-space are equivalent formally, then they are also equivalent smoothly.

The proof of all these facts employs the recent multi-summability theory in Dynamical Systems. It also employs the CR-DS technique that we recently developed, which connects CR-manifolds and certain Dynamical Systems. In this talk, I will outline the technique, as well as some details of the proof.

Liouville's rigidity theorem (1850) states that a map $f:\Omega\subset R^d \to R^d$ that satisfies $Df \in SO(d)$ is an affine map. Reshetnyak (1967) generalized this result and showed that if a sequence $f_n$ satisfies $Df_n \to SO(d)$ in $L^p$, then $f_n$ converges to an affine map.

In this talk I will discuss generalizations of these theorems to mappings between manifolds, present some open questions, and describe how these rigidity questions arise in the theory of elasticity of pre-stressed materials (non-Euclidean elasticity).

If time permits, I will sketch the main ideas of the proof, using Young measures and harmonic analysis techniques, adapted to Riemannian settings.

Based on a joint work with Asaf Shachar and Raz Kupferman.

We show an efficient method for converting a logic circuit of gates with fan-out 1 into an equivalent circuit that works even if some fraction of its gates are short-circuited, i.e., their output is short-circuited to one of their inputs. Our conversion can be applied to any circuit with fan-in k>= 2, yielding a resilient circuit whose size is polynomial in the size of the (non-resilient) input circuit.

The resilient circuit gives the correct output as long as less than 1/3 of the gates in any of its input-to-output paths are corrupted. Furthermore, we prove that a resilience level of 1/3 is optimal (maximal) for this type of faulty gates. This fully answers an open question by Kalai et al. (FOCS 2012).

Joint work with Mark Braverman and Michael Yitayew.

Everyone wants to program with "high-level concepts", rather than meddle with the fine details of the implementation, such as pointers, network packets, and asynchronous callbacks. This is usually achieved by introducing layers of abstraction - but every layer incurs some overhead, and when they accumulate, this overhead becomes significant and sometimes prohibitive. Optimizing the program often requires breaking abstractions, which leads to suboptimal design, higher maintenance costs, and subtle hard-to-trace bugs.

I will present two recent projects that attempt to address this difficulty. STNG is an automated lifting compiler that can synthesize high-level graphics code for computing stencils over matrices, from low-level legacy code written in Fortran. Its output is expressed in Halide, a domain-specific language for image processing that can take advantage of modern GPUs. The result is therefore code that is both easier to understand and more efficient than the original code.

Bellmania is a specialized tool for accelerating dynamic-programming algorithms by generating recursive divide-and-conquer implementations of them. Recursive divide-and-conquer is an algorithmic technique that was developed to obtain better memory locality properties. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and-conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based back-end verifies each step and takes care of low-level reasoning required for parallelism.

The lesson is that synthesis techniques are becoming mature enough to play a role in the design and implementation of realistic software systems, by combining the elegance of abstractions with the performance gained by optimizing and tuning the fine details.

We discuss recent progress on hardness of 2-to-2 games, with applications to the inapproximability of the Vertex-Cover problem.

A 2-to-2 game (which is a variant of Khot's well-known unique games) is defined by a graph where there is a variable in each node, and a constraint of a specific structure defined on each edge. While in unique games each edge-constraint must be a one-to-one correspondence -- i.e. for each assignment to one node there is exactly one assignment to the other node that satisfies the constraint -- in 2-to-2 games the correspondence has a "two-to-two" structure.

The goal is to distinguish between instances in which almost all of the edge-constraints can be satisfied, and instances in which almost none of them can be satisfied simultaneously.

We present a new combinatorial hypothesis regarding Grassmann graphs, and show that it implies that 2-to-2 games are NP-hard *in a certain sense*. As a consequence, the hypothesis implies that it is NP-hard to distinguish between graphs that have an independent set of fractional size (1 - 1/\sqrt{2}), and graphs with no independent sets of any constant fractional size. This easily implies that it is NP-hard to approximate the Vertex Cover problem within a factor \sqrt{2} - o(1).

The talk is mostly based on a joint work with Subhash Khot and Muli Safra, nevertheless, we will also mention results from a more recent extension, which is a joint work with Irit Dinur, Subhash Khot, Guy Kindler and Muli Safra.

Selfadjoint extensions of a closed symmetric operator A in a Hilbert space with equal deficiency indices were described in the 1930s by J. von Neumann. Another approach, based on the notion of an abstract boundary triple, originates in the work of J. W. Calkin and was developed by M. I. Visik, G. Grubb, F. S. Rofe-Beketov, M. L. Gorbachuck, A. N. Kochubei and others.

By Calkin's approach, all selfadjoint extensions of the symmetric operator A can be parametrized via "multivalued" selfadjoint operators in an auxiliary Hilbert space. Spectral properties of these extensions can be characterized in terms of the abstract Weyl function, associated to the boundary triple. In the present talk some recent developments in the theory of boundary triples will be presented. Applications to boundary value problems for Laplacian operators in bounded domains with smooth and rough boundaries will be discussed.

We characterize the communication complexity of truthful mechanisms. Our departure point is the well known taxation principle. The taxation principle asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu that consists of a price for each bundle (the prices depend only on the valuations of the other players). Each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player.

Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in "pathological" cases and provide a formal description of these extreme cases.

Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.

In the classical Node-Disjoint Paths (NDP) problem, the input consists of an undirected n-vertex graph G, and a collection M of pairs of its vertices, called source-destination, or demand, pairs. The goal is to route the largest possible number of the demand pairs via node-disjoint paths. The best current approximation for the problem is achieved by a simple greedy algorithm, whose approximation factor is O(\sqrt n), while the best current negative result is a roughly \Omega(log^{1/2}n)-hardness of approximation. Even seemingly simple special cases of the problem are still poorly understood: when the input graph is a grid, the best current algorithm achieves a \tilde{O}(n^{1/4})-approximation, and when it is a general planar graph, the best current approximation ratio of an efficient algorithm is \tilde{O}(n^{9/19}). The best currently known lower bound for both these versions of the problem is APX-hardness.

In this talk we will show that NDP is 2^{\Omega(\log n)}-hard to approximate, unless all problems in NP have algorithms with running time n^{O(\log n)}. Our result holds even when the underlying graph is a planar graph with maximum vertex degree 3, and all source vertices lie on the boundary of a single face. We extend this result to the closely related Edge-Disjoint Paths problem, showing the same hardness of approximation ratio even for sub-cubic planar graphs with all sources lying on the boundary of a single face.

This is joint work with David H.K. Kim and Rachit Nimavat.

One of the most common tasks in cryptography and cryptanalysis is to find some interesting event (a needle) in an exponentially large collection (haystack) of N=2^n possible events, or to demonstrate that no such event is likely to exist. In particular, we are interested in finding needles which are defined as events that happen with an unusually high probability of p>>1/N in a haystack which is an almost uniform distribution on N possible events. When the search algorithm can only sample values from this distribution, the best known time/memory tradeoff for finding such an event requires O(1/Mp^2) time given O(M) memory.

In this talk I will describe much faster needle searching algorithms in the common cryptographic setting in which the distribution is defined by applying some deterministic function f to random inputs. Such a distribution can be modeled by a random directed graph with N vertices in which almost all the vertices have O(1) predecessors while the vertex we are looking for has an unusually large number of O(pN) predecessors. When we are given only a constant amount of memory, we propose a new search methodology which we call **NestedRho**. As p increases, such random graphs undergo several subtle phase transitions, and thus the log-log dependence of the time complexity T on p becomes a piecewise linear curve which bends four times. Our new algorithm is faster than the O(1/p^2) time complexity of the best previous algorithm in the full range of 1/N < p < 1, and in particular it improves the previous time complexity by a significant factor of \sqrt{N} for any p in the range N^(-0.75) < p < N^(-0.5). When we are given more memory, we show how to combine the **NestedRho** technique with the parallel collision search technique in order to further reduce its time complexity. Finally, we show how to apply our new search technique to more complicated distributions with multiple peaks when we want to find all the peaks whose probabilities are higher than p.

Joint work with Itai Dinur, Orr Dunkelman and Nathan Keller.

Projected gradient descent (PGD), and its close variants, are often considered the methods of choice for solving a large variety of machine learning optimization problems, including empirical risk minimization, statistical learning, and online convex optimization. This is not surprising, since PGD is often optimal in a very appealing information-theoretic sense. However, for many problems PGD is infeasible both in theory and practice since each step requires computing an orthogonal projection onto the feasible set. In many important cases, such as when the feasible set is a non-trivial polytope, or a convex surrogate for a low-rank structure, computing the projection is computationally inefficient in high-dimensional settings. An alternative is the conditional gradient method (CG), aka the Frank-Wolfe algorithm, which replaces the expensive projection step with a linear optimization step over the feasible set. Indeed, in many problems of interest, the linear optimization step admits much more efficient algorithms than the projection step, which is the reason for the substantial renewed interest in this method in the past decade. On the downside, the convergence rates of the CG method often fall behind those of PGD and its variants.

In this talk I will survey an ongoing effort to design CG variants that on one hand enjoy the cheap iteration complexity of the original method, and on the other hand converge provably faster, and are applicable to a wider variety of machine learning settings. In particular I will focus on the cases in which the feasible set is either a polytope or a convex surrogate for low-rank matrices. Results will be demonstrated on applications including: LASSO, video co-localization, optical character recognition, matrix completion, and multi-class classification.
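As a concrete reference point for the LASSO-type setting mentioned above, here is a minimal sketch of the basic (non-accelerated) conditional gradient method over an l1 ball; the linear-optimization step has a closed form, so no projection is ever computed, and the problem data are placeholders.

```python
import numpy as np

def frank_wolfe_l1(A, y, radius, n_iter=200):
    """Basic Frank-Wolfe for  min_x 0.5*||Ax - y||^2  s.t.  ||x||_1 <= radius.

    The linear minimization over the l1 ball is attained at a single signed,
    scaled coordinate vector, so each iteration costs one gradient and one
    argmax instead of a projection onto the feasible set.
    """
    x = np.zeros(A.shape[1])
    for t in range(n_iter):
        grad = A.T @ (A @ x - y)
        i = int(np.argmax(np.abs(grad)))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(grad[i])  # vertex of the l1 ball minimizing <grad, s>
        gamma = 2.0 / (t + 2.0)            # standard step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x
```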

Suppose that you have n truly random bits X=(X1,...,Xn) and you wish to use them to generate m>>n pseudorandom bits Y=(Y1,..., Ym) using a local mapping, i.e., each Yi should depend on at most d=O(1) bits of X. In the polynomial regime of m=n^s, s>1, the only known solution, which originates from (Goldreich, ECCC 2000), is based on Random Local Functions: Compute Yi by applying some fixed (public) d-ary predicate P to a random (public) tuple of distinct inputs. In this talk, we will try to understand, for any value of s, how the pseudorandomness of the resulting sequence depends on the choice of the underlying predicate.
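A minimal sketch of the candidate generator just described, with the predicate and the public tuples left as placeholders; nothing here is claimed about which predicates actually yield pseudorandomness, which is precisely the question of the talk.

```python
import random

def random_local_generator(x, m, d, predicate, rng=random):
    """Goldreich-style candidate PRG: fix m public random d-tuples of
    distinct input positions, and let the i-th output bit be the fixed
    d-ary predicate applied to the input bits at the i-th tuple.
    Returns the output bits together with the public tuples.
    """
    n = len(x)
    tuples = [rng.sample(range(n), d) for _ in range(m)]
    y = [predicate(*(x[j] for j in idx)) for idx in tuples]
    return y, tuples

# Example: stretch 10 seed bits to 100 bits with the 5-ary predicate
# P(a,b,c,d,e) = a XOR b XOR c XOR (d AND e).
seed = [random.randint(0, 1) for _ in range(10)]
bits, _ = random_local_generator(seed, 100, 5,
                                 lambda a, b, c, d, e: a ^ b ^ c ^ (d & e))
```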

Based on joint work with Shachar Lovett.

One of the most exciting possibilities opened by deep neural networks is end-to-end learning: the ability to learn tasks without the need for feature engineering or breaking down into sub-tasks. This talk will present three cases illustrating how end-to-end learning can operate in machine perception across the senses (Hearing, Vision) and even for the entire perception-cognition-action cycle.

The talk begins with speech recognition, showing how acoustic models can be learned end-to-end. This approach skips the feature extraction pipeline, carefully designed for speech recognition over decades.

Proceeding to vision, a novel application is described: identification of photographers of wearable video cameras. Such video was previously considered anonymous as it does not show the photographer.

The talk concludes by presenting a new task, encompassing the full perception-cognition-action cycle: visual learning of arithmetic operations using only pictures of numbers. This is done without using or learning the notions of numbers, digits, and operators.

The talk is based on the following papers:

Speech Acoustic Modeling From Raw Multichannel Waveforms, Y. Hoshen, R.J. Weiss, and K.W. Wilson, ICASSP'15

An Egocentric Look at Video Photographer Identity, Y. Hoshen, S. Peleg, CVPR'16

Visual Learning of Arithmetic Operations, Y. Hoshen, S. Peleg, AAAI'16

If $X$ is a finite set, the $p$-biased measure on the power-set of $X$ is defined as follows: choose a subset $S$ of $X$ at random by including each element of $X$ independently with probability $p$. If $\mathcal{F}$ is a family of subsets of $X$, one can consider the *$p$-biased measure* of $\mathcal{F}$, denoted by $\mu_p(\mathcal{F})$, as a function of $p$; if $\mathcal{F}$ is closed under taking supersets, then this function is an increasing function of $p$. Seminal results of Friedgut and Friedgut-Kalai give criteria for this function to have a 'sharp threshold'. A careful analysis of the behaviour of this function also yields some rather strong results in extremal combinatorics which do not explicitly mention the $p$-biased measure - in particular, in the field of *Erdős-Ko-Rado type problems*, which concern the sizes of families of objects in which any two (or three...) of the objects 'agree' or 'intersect' in some way. We will discuss some of these, including a recent proof of an old conjecture of Frankl that a symmetric three-wise intersecting family of subsets of $\{1,2,\ldots,n\}$ has size $o(2^n)$, and some 'stability' results characterizing the structure of 'large' $t$-intersecting families of $k$-element subsets of $\{1,2,\ldots,n\}$. Based on joint work with (subsets of) Nathan Keller, Noam Lifshitz and Bhargav Narayanan.

Interactive proofs have had a dramatic impact on Complexity Theory and Cryptography. The celebrated IP=PSPACE Theorem [LFKN92,Shamir92] allows an all-powerful but untrusted prover to convince a polynomial-time verifier of the validity of extremely complicated statements (as long as they can be evaluated using polynomial space). The interactive proof system designed for this purpose requires a polynomial number of communication rounds. This talk will focus on studying the power of more efficient interactive proof systems.

Our main result is that for every statement that can be evaluated in polynomial time and bounded-polynomial space, there exists an interactive proof that satisfies the following strict efficiency requirements:

(1) The honest prover runs in polynomial time.

(2) The verifier runs in almost linear time (and under some conditions even sub-linear time).

(3) The interaction consists of only a constant number of communication rounds.

To obtain this result, we introduce and study several new notions for interactive proofs, which may be of independent interest. One of these notions is that of unambiguous interactive proofs, where the prover has a unique successful strategy. Another notion is that of probabilistically checkable interactive proofs (PCIPs) where the verifier only reads a few bits of the transcript in checking the proof (this could be viewed as an interactive extension of PCPs).

Joint work with Omer Reingold and Ron Rothblum.

Computer science and optics are usually studied separately -- separate people, in separate departments, meet at separate conferences. This is changing. The exciting promise of technologies like virtual reality and self-driving cars demand solutions that draw from the best aspects of computer vision, computer graphics, and optics. Previously, it has proved difficult to bridge these communities. For instance, the laboratory setups in optics are often designed to image millimeter-size scenes in a vibration-free darkroom.

This talk is centered around time of flight imaging, a growing area of research in computational photography. A time of flight camera works by emitting amplitude modulated (AM) light and performing correlations on the reflected light. The frequency of AM is in the radio frequency range (like a Doppler radar system), but the carrier signal is optical, overcoming diffraction limited challenges of full RF systems while providing optical contrast. The obvious use of such cameras is to acquire 3D geometry. By spatially, temporally and spectrally coding light transport we show that it may be possible to go "beyond depth", demonstrating new forms of imaging like photography through scattering media, fast relighting of photographs, real-time tracking of occluded objects in the scene (like an object around a corner), and even the potential to distinguish between biological molecules using fluorescence. We discuss the broader impact of this design paradigm on the future of 3D depth sensors, interferometers, computational photography, medical imaging and many other applications.

Cluster algebras are a class of commutative rings introduced by Fomin and Zelevinsky in 2000. I will give an introductory talk about cluster algebras. The main examples are the cluster algebra of type A2, the coordinate ring of $SL_4/N$, and the homogeneous coordinate ring of the Grassmannian $Gr_{2,n+3}(\mathbb{C})$.

This Thursday we will have two SIGGRAPH rehearsal talks in the Vision Seminar, one by Netalee Efrat and one by Meirav Galun. Abstracts are below. Each talk will be about 15 minutes (with NO interruptions), followed by 10 minutes feedback.

**Talk1 (Netalee Efrat): Cinema 3D: Large scale automultiscopic display **

While 3D movies are gaining popularity, viewers in a 3D cinema still need to wear cumbersome glasses in order to enjoy them. Automultiscopic displays provide a better alternative to the display of 3D content, as they present multiple angular images of the same scene without the need for special eyewear. However, automultiscopic displays cannot be directly implemented in a wide cinema setting due to variants of two main problems: (i) The range of angles at which the screen is observed in a large cinema is usually very wide, and there is an unavoidable tradeoff between the range of angular images supported by the display and its spatial or angular resolutions. (ii) Parallax is usually observed only when a viewer is positioned at a limited range of distances from the screen. This work proposes a new display concept, which supports automultiscopic content in a wide cinema setting. It builds on the typical structure of cinemas, such as the fixed seat positions and the fact that different rows are located on a slope at different heights. Rather than attempting to display many angular images spanning the full range of viewing angles in a wide cinema, our design only displays the narrow angular range observed within the limited width of a single seat. The same narrow range content is then replicated to all rows and seats in the cinema. To achieve this, it uses an optical construction based on two sets of parallax barriers, or lenslets, placed in front of a standard screen. This paper derives the geometry of such a display, analyzes its limitations, and demonstrates a proof-of-concept prototype.

*Joint work with Piotr Didyk, Mike Foshey, Wojciech Matusik, Anat Levin

**Talk 2 (Meirav Galun): Accelerated Quadratic Proxy for Geometric Optimization **

We present the Accelerated Quadratic Proxy (AQP) - a simple first order algorithm for the optimization of geometric energies defined over triangular and tetrahedral meshes. The main pitfall encountered in the optimization of geometric energies is slow convergence. We observe that this slowness is in large part due to a Laplacian-like term existing in these energies. Consequently, we suggest exploiting the underlying structure of the energy and locally using a quadratic polynomial proxy, whose Hessian is taken to be the Laplacian. This improves stability and convergence, but more importantly allows incorporating acceleration in an almost universal way, that is independent of mesh size and of the specific energy considered. Experiments with AQP show it is rather insensitive to mesh resolution and requires a nearly constant number of iterations to converge; this is in strong contrast to other popular optimization techniques used today such as Accelerated Gradient Descent and Quasi-Newton methods, e.g., L-BFGS. We have tested AQP for mesh deformation in 2D and 3D as well as for surface parameterization, and found it to provide a considerable speedup over common baseline techniques.
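A very schematic rendering of the acceleration-plus-quadratic-proxy idea described above; the energy gradient, the proxy solve and the safeguards (e.g., the line search that keeps the mesh injective) are placeholders here, so this is not the authors' implementation.

```python
import numpy as np

def aqp_sketch(x0, grad_energy, proxy_solve, n_iter=100, theta=0.9):
    """Accelerated quadratic-proxy iteration, schematically: a Nesterov-style
    extrapolation followed by a step preconditioned by the fixed
    Laplacian-like quadratic proxy of the energy.  `proxy_solve` should
    apply the inverse of the proxy Hessian (e.g., a prefactored Laplacian).
    """
    x_prev = np.copy(x0)
    x = np.copy(x0)
    for _ in range(n_iter):
        y = x + theta * (x - x_prev)         # acceleration / extrapolation step
        step = proxy_solve(grad_energy(y))   # quadratic-proxy (Laplacian) step
        x_prev, x = x, y - step
    return x
```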

*Joint work with Shahar Kovalsky and Yaron Lipman*
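A minimal numerical sketch of the accelerated quadratic-proxy iteration described in the abstract above, assuming a fixed sparse mesh Laplacian `L`, an energy gradient callable `grad_E`, and a fixed momentum parameter; this is only an illustration of the quadratic-proxy idea (the paper also includes a line search and constraint handling, omitted here), not the authors' implementation.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def aqp_sketch(x0, grad_E, L, n_iters=200, theta=0.9, reg=1e-8):
    """Illustrative accelerated quadratic-proxy loop (hypothetical parameters).

    x0     : (n, d) array of vertex positions
    grad_E : callable, gradient of the geometric energy, same shape as x0
    L      : (n, n) sparse mesh Laplacian, used as the fixed proxy Hessian
    """
    n = x0.shape[0]
    # Small regularization handles the Laplacian's constant nullspace;
    # the proxy Hessian is constant, so it is factorized once.
    lu = spla.splu(sp.csc_matrix(L + reg * sp.identity(n)))
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(n_iters):
        y = x + theta * (x - x_prev)   # acceleration (momentum) step
        p = lu.solve(-grad_E(y))       # proxy step: solve L p = -grad E(y)
        x_prev, x = x, y + p
    return x
```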

The Jacquet-Rallis relative trace formula was introduced as a tool towards solving the global conjectures of Gan-Gross-Prasad for unitary groups. I will present some recent progress in developing the full formula.

I will show how to extend the transfer of regular orbital integrals to singular geometric terms using a mix of local and global methods.

(Joint with Pierre-Henri Chaudouard)

Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message:

1. Fast Δ-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ≥55 is Θ(log_Δ log n), whereas its deterministic complexity is Θ(log_Δ n) for any Δ≥3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees.

2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in O(1) + o(log_Δ n) rounds can be transformed to run in O(log* n − log* Δ + 1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log_Δ n) time deterministically.

3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √(log n). This shows that a deterministic Ω(log_Δ n) lower bound for any problem implies a randomized Ω(log_Δ log n) lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.

Joint work with Tsvi Kopelowitz and Seth Pettie. http://arxiv.org/abs/1602.08166

There are numerous essentially equivalent characterizations of mixing in $L_1$ of a finite Markov chain. Some of these characterizations involve natural probabilistic concepts such as couplings, stopping times and hitting times. In contrast, while there are several analytic and geometric tools for bounding the $L_2$ mixing time, none of them are tight and they do not have a probabilistic interpretation.

We provide tight probabilistic characterizations in terms of hitting times distributions for mixing in $L_2$ (for normal chains) and (under reversibility) in relative entropy. This is done by assigning appropriate penalty (depending on the size of the set) to the case that the chain did not escape from a certain set.

We also prove a new extremal characterization of the log-Sobolev constant in terms of a weighted version of the spectral gap (where the weight depends on the size of the support of the function).

We will discuss the representation theory of a symmetric pair (G,H), where G is a complex reductive group and H is a real form of G. The main objects of study are the G-representations with a nontrivial H-invariant functional, called the H-distinguished representations of G.

I will give a necessary condition for a G-representation to be H-distinguished and show that the multiplicity of such representations is less than or equal to the number of double cosets B\G/H, where B is a Borel subgroup of G.

Quantum computers are hypothetical devices, based on quantum physics, which would enable us to perform certain computations (among them some that Chaim Leib Pekeris pioneered) hundreds of orders of magnitude faster than digital computers. This feature is coined “quantum supremacy.” We start the lecture with a gentle introduction to computing - classical and quantum, with basic notions of computational complexity, and with the vision of quantum computers.

A main reason for concern regarding the feasibility of quantum computers is that quantum systems are inherently noisy. We will explain what is "noise" and describe an optimistic hypothesis regarding quantum noise that will allow quantum computing and a pessimistic hypothesis that won’t. The remarkable progress witnessed during the past two decades in the field of experimental physics of controlled quantum systems places the decision between the pessimistic and optimistic hypotheses within reach. On the optimistic side, one aspect or another of quantum supremacy might be seen by experiments in the near future: by implementing quantum error-correction or by systems of free bosons or by exotic new phases of matter called anyons or by quantum annealing, or in various other ways.

In the lecture I will explain my pessimistic line of research and here is a brief summary of my view: understanding quantum computers in the presence of noise requires consideration of behavior at different scales. In the small scale, standard models of noise from the mid-90s are suitable, and quantum evolutions and states described by them manifest a very low-level computational power. This small-scale behavior has far-reaching consequences for the behavior of noisy quantum systems at larger scales. On the one hand, it does not allow reaching the starting points for quantum fault tolerance and quantum supremacy, making them both impossible at all scales. On the other hand, it leads to novel implicit ways for modeling noise at larger scales and to various predictions on the behavior of noisy quantum systems.

We will rely on the theory of noise-sensitivity and stability developed with Benjamini and Schramm in the late 90s and on recent work with Guy Kindler related to the mysterious gap between permanents and determinants (or, in other words, between bosons and fermions).

Let F/Q_p be a finite extension. Supersingular representations are the irreducible mod p representations of GL_n(F) which do not appear as a subquotient of a principal series representation; similarly to the complex case, they are the building blocks of the representation theory of GL_n(F). Historically, they were first discovered by L. Barthel and R. Livne some twenty years ago, and they are still not understood even for n=2.

For F=Q_p, the supersingular representations of GL_2(F) have been classified by C. Breuil, and a local mod p Langlands correspondence was established between them and certain mod p Galois representations.

When one tries to generalize this connection and move to a non-trivial extension of Q_p, Breuil's method fails; the supersingular representations in that case have a complicated structure, and instead of two, as in the case F=Q_p, we get infinitely many such representations, while there are essentially only finitely many on the Galois side.

In this talk we give an exposition of the subject and explore, using what survives from Breuil's methods, the universal modules whose quotients contain all the supersingular representations in the difficult case where F is a non-trivial extension of Q_p.

In distributed systems, communication between the participants in the computation is usually the most expensive part of the computation. Theoretical models of distributed systems usually reflect this by neglecting the cost of local computation, and charging only for messages sent between the participants; in particular, we usually assume that the computation proceeds in rounds, and in each round, each participant can send only a limited number of bits. We are interested in characterizing the number of rounds required to perform various tasks.

In this talk I will describe two sets of results. The first concerns the complexity of distributed subgraph detection: we have n servers, each representing a node in an undirected graph, and each server receives as input its adjacent edges in the graph. The goal of the computation is to determine whether the global input graph contains some fixed subgraph. I will describe upper and lower bounds for several classes of subgraphs, through a connection to Turan numbers. The general case remains open.

In the second part of the talk I will describe recent work on multi-party number-in-hand communication and information complexity, and show tight upper and lower bounds for set disjointness in the shared blackboard model.

Joint work with Mark Braverman, Andrew Drucker and Fabian Kuhn.

Children may learn about the world by pushing, banging, and manipulating things, watching and listening as materials make their distinctive sounds: dirt makes a thud; ceramic makes a clink. These sounds reveal physical properties of the objects, as well as the force and motion of the physical interaction.

We've explored a toy version of that learning-through-interaction by recording audio and video while we hit many things with a drumstick. We developed an algorithm to predict sounds from silent videos of the drumstick interactions. The algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We demonstrate that the sounds generated by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that the task of predicting sounds allows our system to learn about material properties in the scene.

Joint work with:

Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson

http://arxiv.org/abs/1512.08512
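A schematic sketch of the kind of pipeline described above: a recurrent network maps per-frame visual features to sound features, and a waveform is then retrieved by example-based synthesis. All dimensions, layer choices, and the nearest-neighbor retrieval are illustrative assumptions, not the authors' architecture; the bank of example (sound-feature, waveform) pairs is assumed to have the same clip length as the input.

```python
import torch
import torch.nn as nn

class SoundFeaturePredictor(nn.Module):
    """Recurrent model mapping per-frame video features to sound features (illustrative)."""
    def __init__(self, video_dim=512, hidden_dim=256, sound_dim=42):
        super().__init__()
        self.rnn = nn.LSTM(video_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, sound_dim)

    def forward(self, video_feats):          # (batch, time, video_dim)
        h, _ = self.rnn(video_feats)
        return self.head(h)                  # (batch, time, sound_dim)

def example_based_synthesis(pred_feats, bank_feats, bank_waveforms):
    """Pick, per clip, the example waveform whose sound features are closest in L2."""
    d = torch.cdist(pred_feats.flatten(1), bank_feats.flatten(1))  # (batch, n_examples)
    return bank_waveforms[d.argmin(dim=1)]
```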

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In a sense, the strongest guarantee available is the L2 guarantee, which requires finding all items that occur at least eps*||f||_2 times in the stream, where the i-th coordinate of the vector f is the number of occurrences of i in the stream. The first algorithm to achieve the L2 guarantee was the CountSketch (Charikar, Chen, and Farach-Colton ICALP'02), which, for constant eps, requires O(log n) words of memory and O(log n) update time. It is known to be space-optimal if the stream includes deletions.

In this talk I will discuss recent improvements that allow us to find L2 heavy hitters in O(1) memory and O(1) update time in insertion only streams. The improvements rely on a deeper understanding of the AMS sketch (Alon, Matias, and Szegedy STOC'96) and similar sketches and draw on the theory of Gaussian processes. This talk is based on joint work with Vladimir Braverman, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff in arxiv:1511.00661 and arxiv:1603.00759.
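For intuition, a minimal version of the classic CountSketch referenced above (not the new O(1)-memory algorithm from the talk): each row hashes items into signed buckets, and the median across rows estimates an item's frequency. Hash functions are implemented here with salted MD5 purely to keep the sketch self-contained; widths and row counts are illustrative.

```python
import hashlib

def _h(seed, item, mod):
    digest = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
    return int(digest, 16) % mod

class CountSketch:
    """Textbook CountSketch: frequency estimates with an L2-type error guarantee."""
    def __init__(self, rows=5, width=256):
        self.rows, self.width = rows, width
        self.table = [[0] * width for _ in range(rows)]

    def update(self, item, delta=1):
        for r in range(self.rows):
            col = _h(("bucket", r), item, self.width)
            sign = 1 if _h(("sign", r), item, 2) == 1 else -1
            self.table[r][col] += sign * delta

    def estimate(self, item):
        vals = []
        for r in range(self.rows):
            col = _h(("bucket", r), item, self.width)
            sign = 1 if _h(("sign", r), item, 2) == 1 else -1
            vals.append(sign * self.table[r][col])
        return sorted(vals)[len(vals) // 2]   # median across rows
```

Heavy hitters are then the items whose estimated count exceeds eps times the L2 norm of the frequency vector (which can itself be tracked with an AMS-style sketch).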

There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for essentially similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which fits training data without error but is nevertheless somewhat smooth. We show that AdaBoost has the same property. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.

For some rectangular Hardy classes of analytic functions, an optimal method of interpolation has been previously found, within the framework of Optimal Recovery. It will be shown that this method of interpolation, based on the Abel-Jacobi elliptic functions, is also optimal according to the corresponding criteria of Nonparametric Regression and Optimal Design.

In a non-asymptotic setting, the maximal mean squared error of the optimal interpolant is evaluated explicitly, for all noise levels away from 0. In these results, a pivotal role is played by an interference effect, in which both the stochastic and deterministic parts of the interpolant exhibit an oscillating behavior, with the two oscillating processes mutually subduing each other.

In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, we propose the methodology of sampling correctors. Such algorithms use structure that the distribution is purported to have, in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms may then be used as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks. Warning: this talk contains more questions than answers...

Joint work with Clement Canonne and Themis Gouleakis.

We present a randomized algorithm that computes a Minimum Spanning Tree (MST) in O(log^* n) rounds, with high probability, in the Congested Clique model of distributed computing. In this model, the input is a graph on n nodes, initially each node knows only its incident edges, and per round each two nodes can exchange O(log n) bits.

Our key technical novelty is an O(log^* n) Graph Connectivity algorithm, the heart of which is a (recursive) forest growth method, based on a combination of two ideas: a sparsity-sensitive sketching aimed at sparse graphs and a random edge sampling aimed at dense graphs.

Our result improves significantly over the $O(\log \log \log n)$ algorithm of Hegeman et al. [PODC 2015] and the $O(\log \log n)$ algorithm of Lotker et al. [SPAA 2003; SICOMP 2005].

Joint work with Mohsen Ghaffari.

Estimating the amount of distinct elements in a dataset by examining only a fraction of the data is known to be a hard problem, both theoretically and in practice.

Our work explores a breakthrough theoretical result by Valiant and Valiant from 2011 that presents a provably accurate method for doing such estimations.

Our goal is to put this theory into practice for the important task of estimating the deduplication ratio of a very large dataset. However, deploying this technique in a real world setting runs into significant obstacles.

In the talk I will describe new techniques that help bridge the gap and enable the use of this exciting approach. Our work achieves a major improvement over the current state-of-the-art practical solutions.

The talk is for a general audience, no prior knowledge is assumed.

Based on joint work with Dmitry Sotnikov and Ety Khaitzin that appeared at Usenix FAST 2016.

While co-addition and subtraction of astronomical images stand at the heart of observational astronomy, the existing solutions for them lack rigorous argumentation, do not achieve maximal sensitivity, and are often slow. Moreover, there is no widespread agreement on how they should be done, and often different methods are used for different scientific applications. I am going to present rigorous solutions to these problems, deriving them from the most basic statistical principles. These solutions are proved optimal, under well defined and practically acceptable assumptions, and in many cases substantially improve the performance of the most basic operations in astronomy.

For coaddition, we present a coadd image that is:

a) sufficient for any further statistical decision or measurement on the underlying constant sky, making the entire data set redundant.

b) improves both survey speed (by 5-20%) and effective spatial resolution of past and future astronomical surveys.

c) substantially improves imaging-through-turbulence applications.

d) much faster than many of the currently used coaddition solutions.

For subtraction, we present a subtraction image that is:

a) optimal for transient detection under the assumption of spatially uniform noise.

b) sufficient for any further statistical decision on the differences between the images, including the identification of cosmic rays and other image artifacts.

c) Free of subtraction artifacts, allowing (for the first time) robust transient identification in real time, opening new avenues for scientific exploration.

d) orders of magnitude faster than past subtraction methods.

I will present an elementary proof of the following theorem of Alexander Olshanskii:

Let F be a free group and let A,B be finitely generated subgroups of infinite index in F. Then there exists an infinite index subgroup C of F which contains both A and a finite index subgroup of B.

The proof is carried out by introducing a 'profinite' measure on the discrete group F, and is valid also for some groups which are not free. Some applications of this result will be discussed:

1. Group Theory - Construction of locally finite faithful actions of countable groups.

2. Number Theory - Discontinuity of intersections for large algebraic extensions of local fields.

3. Ergodic Theory - Establishing cost 1 for groups boundedly generated by subgroups of infinite index and finite cost.

A common approach to face recognition relies on using deep learning for extracting a signature. All leading work on the subject uses stupendous amounts of processing power and data. In this work we present a method for efficient and compact learning of metric embedding. The core idea allows a more accurate estimation of the global gradient and hence fast and robust convergence. In order to avoid the need for huge amounts of data we include an explicit alignment phase into the network, hence greatly reducing the number of parameters. These insights allow us to efficiently train a compact deep learning model for face recognition in only 12 hours on a single GPU, which can then fit on a mobile device.

Joint work with: Oren Tadmor, Tal Rosenwein, Shai Shalev-Schwartz, Amnon Shashua

Expander graphs are widely studied, and various methods are known to obtain bounded degree expander graphs. Recently, there has been growing interest in understanding combinatorial expansion in higher dimensions (higher dimensional simplicial complexes). However, bounded degree combinatorial expanders (random or explicit) were not known until our work.

We present a local to global criterion on a complex that implies combinatorial expansion. We use our criterion to present explicit bounded degree high dimensional expanders. This solves in the affirmative an open question raised by Gromov, who asked whether bounded degree high dimensional expanders could at all exist.

We expect that the emerging theory of high dimensional expansion is likely to have various applications in the theory of computation. Thus, one of the goals of this talk is to introduce this concept to the theory community.

Based on joint works with David Kazhdan and Alex Lubotzky, and with Shai Evra.

Dynamic events such as family gatherings, concerts or sports events are often photographed by a group of people. The set of still images obtained this way is rich in dynamic content. We consider the question of whether such a set of still images, rather than traditional video sequences, can be used for analyzing the dynamic content of the scene. This talk will describe several instances of this problem, their solutions and directions for future studies.

In particular, we will present a method to extend epipolar geometry to predict location of a moving feature in CrowdCam images. The method assumes that the temporal order of the set of images, namely photo-sequencing, is given. We will briefly describe our method to compute photo-sequencing using geometric considerations and rank aggregation. We will also present a method for identifying the moving regions in a scene, which is a basic component in dynamic scene analysis. Finally, we will consider a new vision of developing collaborative CrowdCam, and a first step toward this goal.

This talk will be based on joint works with Tali Dekel, Adi Dafni, Mor Dar, Lior Talker, Ilan Shimshoni, and Shai Avidan.

Projection inequalities bound the volume of a body in terms of a product of the volumes of lower-dimensional projections of the body. Two well-known examples are the Loomis-Whitney inequality and the more general Uniform Cover inequality. We will see how to use information theory to prove stability versions of these inequalities, showing that when they are close to being tight, the body in question is close to being a box (which is the unique case of equality). We will also see how to obtain a stability result for the edge-isoperimetric inequality in the infinite d-dimensional lattice. Namely, that a subset of Z^d with small edge-boundary must be close in symmetric difference to a d-dimensional cube.

Based on work with David Ellis, Ehud Friedgut and Guy Kindler.
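For reference, the Loomis-Whitney inequality mentioned above states that for a finite set $A \subseteq \mathbb{Z}^d$ (or a body in $\mathbb{R}^d$), $|A|^{d-1} \le \prod_{i=1}^{d} |\pi_i(A)|$, where $\pi_i$ is the projection forgetting the $i$-th coordinate; equality holds exactly for boxes, which is the case whose stability is discussed above.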

It is common practice in multivariate and matrix-valued data analysis to reduce dimensionality by performing a Singular Value Decomposition or Principal Component Analysis, and keeping only $r$ singular values or principal components, the rest being presumably associated with noise. However, the literature does not propose a disciplined criterion to determine $r$; most practitioners still look for the "elbow in the Scree Plot", a 50-year-old heuristic performed by eye. I'll review a line of work which develops a systematic approach to eigenvalue and singular value thresholding. This approach assumes that the signal is low-rank and that the noise is rotationally invariant. Recent results derive optimal thresholds in the presence of quite general noise distributions.

Joint work with David Donoho, Iain Johnstone and Edgar Dobriban (Stanford).
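As one concrete instance of the kind of systematic rule referred to above: in the Gavish-Donoho analysis, for an $n \times n$ data matrix with i.i.d. additive noise of known standard deviation $\sigma$, the optimal hard threshold for singular values is $(4/\sqrt{3})\sqrt{n}\,\sigma$. A minimal sketch follows (square case, known noise level; the rectangular and unknown-noise cases use different constants covered in the line of work cited above).

```python
import numpy as np

def svd_hard_threshold(Y, sigma):
    """Keep only singular values above the square-matrix optimal hard threshold.

    Assumes Y is n x n and the additive noise has i.i.d. entries with known
    standard deviation sigma.
    """
    n = Y.shape[0]
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    tau = (4.0 / np.sqrt(3.0)) * np.sqrt(n) * sigma   # 4/sqrt(3) ~ 2.309
    s_kept = np.where(s > tau, s, 0.0)
    r = int(np.count_nonzero(s_kept))                 # the data-driven choice of r
    return (U * s_kept) @ Vt, r
```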

Let K be a complete discrete valuation field with finite residue field of characteristic p>0. Let G be the absolute Galois group of K and, for a natural number M, let G(M) be the maximal quotient of G of nilpotent class <p and period p^M. Then G(M) can be identified with a group obtained from a Lie Z/p^M-algebra L via the (truncated) Campbell-Hausdorff composition law. Under this identification the ramification subgroups in upper numbering G(M)^(v) correspond to ideals L^(v) of L. An explicit construction of L and of the ideals L^(v) will be explained. The case of fields K of characteristic p was obtained by the author in the 1990's (recently refined); the case of fields K of mixed characteristic requires the assumption that K contains a primitive p^M-th root of unity (for the case M=1 cf. Number Theory Archive).

We show that a version of the classical Iterated Log Law of Khinchin, and independently of Kolmogorov, from the 1920's, holds for various parameters in the binomial random graph model and in a random 0/1 Bernoulli matrix. In particular, for a constant p, we show that such a law holds for the number of copies of a fixed graph H in G(n,p), and we show a similar statement for the number of Hamilton cycles in a random k-uniform hypergraph, provided that k\geq 4. In the graph case (that is, k=2), since the number of Hamilton cycles in G(n,p), denoted by X_n, does not converge to a normal distribution but rather tends to a log-normal distribution (as was first proved by Janson), we show that a version of the Iterated Log Law holds for \log X_n. We also obtain a similar result for the permanent of a 0/1 Bernoulli random matrix.

No prior knowledge is required.

Joint with Daniel Motealegre and Van Vu.

Heuristically, delocalization for a random matrix means that its normalized eigenvectors look like the vectors uniformly distributed over the unit sphere. This can be made precise in a number of different ways. We show that with high probability, any sufficiently large set of coordinates of an eigenvector carries a non-negligible portion of its Euclidean norm. Our results pertain to a large class of random matrices including matrices with independent entries, symmetric, skew-symmetric matrices, as well as more general ensembles.

Joint work with Roman Vershynin.

Raz's celebrated Parallel Repetition Theorem shows that the probability of simultaneously winning n independent instances of a two-player one-round game G is exponentially small in n, when the maximum success probability of G is less than 1. Though the statement is intuitive, the proof is rather nontrivial and has found important applications in hardness of approximation, cryptography, and communication complexity.

There are two major open problems regarding the parallel repetition of games: does an analogue of Raz's theorem hold for (a) games with more than two players, and (b) games with quantumly entangled players? Extending Raz’s theorem to these settings is a challenging problem for a number of reasons: techniques for attacking direct sum/direct product problems in multiparty settings are lacking, and our understanding of quantum entanglement as an information theoretic resource is quite limited.

In this work, we show how to sidestep these barriers and make progress on the two open problems. We first prove exponential-decay parallel repetition theorems for a class of games we call "anchored games" in the multiplayer and entangled-player settings. Then, we show how to efficiently transform any game into an equivalent anchored game. Together, our results provide a simple hardness-amplification technique for games in both the classical multiplayer and quantum settings.

Joint work with Mohammad Bavarian and Thomas Vidick.

Erdös-Ko-Rado (EKR) type theorems yield upper bounds on the size of set families under various intersection requirements on the elements. Stability versions of such theorems assert that if the size of a family is close to the maximum possible then the family itself must be close (in appropriate sense) to a maximum family. In this talk we present an approach to stability versions of EKR-type theorems through isoperimetric inequalities for subsets of the hypercube. We use this approach to obtain tight stability versions of the EKR theorem itself and of the Ahlswede-Khachatrian theorem on t-intersecting families (for k < n/(t+1)), and to show that, somewhat surprisingly, both theorems hold when the "intersection" requirement is replaced by a much weaker requirement. Furthermore, we obtain stability versions of several recent EKR-type results, including Frankl's proof of the Erdös matching conjecture for n>(2s+1)k-s.

Joint work with David Ellis and Noam Lifshitz.
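For reference, the EKR theorem itself states that for $n \ge 2k$, any family of pairwise-intersecting $k$-element subsets of $\{1,\dots,n\}$ has size at most $\binom{n-1}{k-1}$, the maximum being attained by a "star" of all $k$-sets containing a fixed element; the stability versions discussed above quantify how close a near-maximal family must be to such a star.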

We study the discrepancy of the number of visits of a Kronecker sequence on a d-dimensional torus to nice sets. We are interested in particular in the question of how the answer depends on the geometry of the set.

This is a joint work with Bassam Fayad.

(http://arxiv.org/abs/1211.4323 and http://arxiv.org/abs/1206.4853)

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) initiated the formal study of this problem, and gave the first upper bounds on the achievable generalization error for adaptive data analysis.

The results of Dwork et al. are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stable algorithms of the kind guaranteed by differential privacy imply low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

Joint work with Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, and Jonathan Ullman.

For the past 40 years computer scientists generally believed that NP-complete problems are intractable. In particular, Boolean satisfiability (SAT), as a paradigmatic NP-complete problem, has been considered to be intractable. Over the past 20 years, however, there has been a quiet, but dramatic, revolution, and very large SAT instances are now being solved routinely as part of software and hardware design.

In this talk I will review this amazing development and show that we can leverage SAT solving to accomplish other Boolean reasoning tasks. Counting the number of satisfying truth assignments of a given Boolean formula or sampling such assignments uniformly at random are fundamental computational problems in computer science with numerous applications. While the theory of these problems was thoroughly investigated in the 1980s, approximation algorithms developed by theoreticians do not scale up to industrial-sized instances. Algorithms used by the industry offer better scalability, but give up certain correctness guarantees to achieve it. We describe a novel approach, based on universal hashing and Satisfiability Modulo Theories, that scales to formulas with hundreds of thousands of variables without giving up correctness guarantees.
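A toy sketch of the universal-hashing idea behind the approach described above: each random XOR (parity) constraint roughly halves the number of satisfying assignments, so the number of constraints that can be added before the formula becomes unsatisfiable estimates log2 of the model count. The SAT oracle is replaced here by brute-force enumeration purely to keep the sketch self-contained and runnable; a real implementation would call a SAT/SMT solver, and the parameters are illustrative.

```python
import itertools
import random

def count_via_xor_hashing(clauses, n_vars, trials=30, seed=0):
    """Estimate log2(#models) of a CNF formula (DIMACS-style clauses) with random XORs."""
    rng = random.Random(seed)

    def satisfies(assign):  # assign[v] is the truth value of variable v (1..n_vars)
        return all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in clauses)

    def has_model(xors):
        # Brute-force stand-in for a SAT oracle on the formula plus XOR constraints.
        for bits in itertools.product((False, True), repeat=n_vars):
            assign = (None,) + bits
            if satisfies(assign) and all(
                    sum(assign[v] for v in vs) % 2 == parity for vs, parity in xors):
                return True
        return False

    estimates = []
    for _ in range(trials):
        xors, m = [], 0
        while has_model(xors) and m <= n_vars:
            m += 1
            vs = [v for v in range(1, n_vars + 1) if rng.random() < 0.5]
            xors.append((vs, rng.randint(0, 1)))
        estimates.append(max(m - 1, 0))     # number of XORs under which a model survived
    estimates.sort()
    return estimates[len(estimates) // 2]   # median estimate of log2(model count)

# e.g. count_via_xor_hashing([[1, 2], [-1, 3]], n_vars=3)
```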

Homological algebra plays a major role in noncommutative ring theory. One important homological construct related to a noncommutative ring A is the dualizing complex, which is a special kind of complex of A-bimodules. When A is a ring containing a central field K, this concept is well-understood now. However, little is known about dualizing complexes when the ring A does not contain a central field (I shall refer to this as the noncommutative arithmetic setting). The main technical issue is finding the correct derived category of A-bimodules.

In this talk I will propose a promising definition of the derived category of A-bimodules in the noncommutative arithmetic setting. Here A is a (possibly) noncommutative ring, central over a commutative base ring K (e.g. K = Z). The idea is to resolve A: we choose a DG (differential graded) ring A', central and flat over K, with a DG ring quasi-isomorphism A' -> A. Such resolutions exist. The enveloping DG ring A'^{en} is the tensor product over K of A' and its opposite. Our candidate for the "derived category of A-bimodules" is the category D(A'^{en}), the derived category of DG A'^{en}-modules. A recent theorem says that the category D(A'^{en}) is independent of the resolution A', up to a canonical equivalence. This justifies our definition.

Working within D(A'^{en}), it is not hard to define dualizing complexes over A, and to prove all their expected properties (like when K is a field). We can also talk about rigid dualizing complexes in the noncommutative arithmetic setting.

What is noticeably missing is a result about the existence of rigid dualizing complexes. When K is a field, Van den Bergh discovered a powerful existence result for rigid dualizing complexes. We are now trying to extend Van den Bergh's method to the noncommutative arithmetic setting. This is work in progress, joint with Rishi Vyas.

In this talk I will explain, in broad strokes, what are DG rings, DG modules, and the associated derived categories and derived functors. Also, I will try to go into the details of a few results and examples, to give the flavor of this material.

Stochastic analysis of real-world signals consists of 3 main parts: mathematical representation; probabilistic modeling; statistical inference. For it to be effective, we need mathematically-principled and practical computational tools that take into consideration not only each of these components by itself but also their interplay. This is especially true for a large class of computer-vision and machine-learning problems that involve certain mathematical structures; the latter may be a property of the data or encoded in the representation/model to ensure mathematically-desired properties and computational tractability. For concreteness, this talk will center on structures that are geometric, hierarchical, or topological.

Structures present challenges. For example, on nonlinear spaces, most statistical tools are not directly applicable, and, moreover, computations can be expensive. As another example, in mixture models, topological constraints break statistical independence. Once we overcome the difficulties, however, structures offer many benefits. For example, respecting and exploiting the structure of Riemannian manifolds and/or Lie groups yield better probabilistic models that also support consistent synthesis. The latter is crucial for the employment of analysis-by-synthesis inference methods used within, e.g., a generative Bayesian framework. Likewise, imposing a certain structure on velocity fields yields highly-expressive diffeomorphisms that are also simple and computationally tractable; particularly, this facilitates MCMC inference, traditionally viewed as too expensive in this context.

Time permitting, throughout the talk I will also briefly touch upon related applications such as statistical shape models, transfer learning on manifolds, image warping/registration, time warping, superpixels, 3D-scene analysis, nonparametric Bayesian clustering of spherical data, multi-metric learning, and new machine-learning applications of diffeomorphisms. Lastly, we also applied the (largely model-based) ideas above to propose the first learned data augmentation scheme; as it turns out, when compared with the state-of-the-art schemes, this improves the performance of classifiers of the deep-net variety.

I will describe recent work on building and using rich representations aimed at automatic analysis of visual scenes. In particular, I will describe methods for semantic segmentation (labeling regions of an image according to the category it belongs to), and on semantic boundary detection (recovering accurate boundaries of semantically meaningful regions, such as those corresponding to objects). We focus on feed-forward architectures for these tasks, leveraging recent advances in the art of training deep neural networks. Our approach aims to shift the burden of inducing desirable constraints from explicit structure in the model to implicit structure inherent in computing richer, context-aware representations. I will describe experiments on standard benchmark data sets that demonstrate the success of this approach.

Joint work with Mohammadreza Mostajabi, Payman Yadollahpour, and Harry Yang.

Many questions about properties of a finite group, such as random walks, the spectrum of Cayley graphs, the distribution of word maps, etc., can be approached by using “generalized Fourier sum” formulas involving characters of the group. Numerical data show that characters of low dimensional representations of the group contribute the largest terms to these sums. However, relatively little seems to be known about these small representations, so a systematic knowledge of them could lead to proofs of some of the properties. The talk will demonstrate, through concrete examples and numerical simulations, a new method to construct and analyze those small representations, and hence hopefully to solve some of the aforementioned questions.

The talk is intended for non-experts.

This is part of a joint project with Roger Howe (Yale).

Many sequence prediction tasks---such as automatic speech recognition and video analysis---benefit from long-range temporal features. One way of utilizing long-range information is through segmental (semi-Markov) models such as segmental conditional random fields. Such models have had some success, but have been constrained by the computational needs of considering all possible segmentations. We have developed new segmental models with rich features based on neural segment embeddings, trained with discriminative large-margin criteria, that are efficient enough for first-pass decoding. In our initial work with these models, we have found that they can outperform frame-based HMM/deep network baselines on two disparate tasks, phonetic recognition and sign language recognition from video. I will present the models and their results on these tasks, as well as (time permitting) related recent work on neural segmental acoustic word embeddings.

This is joint work with Hao Tang, Weiran Wang, Herman Kamper, Taehwan Kim, and Kevin Gimpel

Finite group theorists have established many formulas that express interesting properties of a finite group in terms of sums of characters of the group. An obstacle to applying these formulas is lack of control over the dimensions of representations of the group. In particular, the representations of small dimension tend to contribute the largest terms to these sums, so a systematic knowledge of these small representations could lead to proofs of some of these facts. This talk will discuss a new method for systematically constructing the small representations of finite classical groups. I will explain the method with concrete examples and applications.

This is part of a joint project with Roger Howe (Yale).

Recently, the "information percolation" framework was introduced as a way to obtain sharp estimates on mixing for spin systems at high temperatures, and in particular, to establish cutoff for the Ising model in three dimensions up to criticality from a worst starting state. I will describe how this method can be used to understand the effect of different initial states on the mixing time, both random (''warm start'') and deterministic.

Joint work with Allan Sly.

Let $X$ be a compact manifold with boundary $\partial X$, and let $R(\lambda)$ be the Dirichlet-to-Neumann operator $R(\lambda): f \mapsto u|_{\partial X}$, where $u$ solves $(\Delta + \lambda^2)u = 0$, $u|_{\partial X} = f$. We establish asymptotics, as $\lambda \to +\infty$, of the number of eigenvalues of $\lambda^{-1}R(\lambda)$ between $s_1$ and $s_2$.

This is a joint work with Andrew Hassell, Australian National University.

In today’s world there are huge amounts of data that need to get reliably stored or transmitted. However, some amount of noise or corruption is inevitable. An error-correcting code is a scheme for robustly representing data in the form of a codeword that allows one to detect and correct errors in transmission. Locally-testable and locally-decodable codes are special families of error-correcting codes that admit highly efficient algorithms that detect and correct errors in sublinear time with high probability, probing only a small number of entries of the corrupted codeword. While locally-testable and locally-decodable codes have been intensely studied in the past 2 decades, in recent years there has been even further incentive for their study due to their relevance for transmission and storage of massive data and the successful implementation of local codes in cloud storage systems.

In this talk, I will show an exponential improvement on the best-known running time of error detection and correction algorithms for locally-testable and locally-decodable codes. Specifically, I will describe new families of locally-testable codes with constant rate that can detect a constant fraction of errors in time (log n)^{O(log log n)} and new families of locally-decodable codes of constant rate that can correct a constant fraction of errors in time exp(\sqrt{log n}). Prior to that, the best known running time for such codes was n^{epsilon} (for a constant epsilon) using several, quite different, constructions.

(Based on joint work with Swastik Kopparty, Or Meir and Shubhangi Saraf)

Locality-Sensitive Hashing (LSH) is a powerful technique for the approximate nearest neighbor search (ANN) in high dimensions. In this talk I will present two recent results:

1) I will show a data structure for ANN for the Euclidean distance that provably outperforms the best possible LSH-based data structure. We proceed via designing a good *data-dependent* hash family.

2) I will show a practical and optimal LSH family for the cosine similarity (a.k.a. Euclidean distance on a sphere). It substantially outperforms the celebrated Hyperplane LSH family. Along the way, I will try to debunk two popular myths about LSH:

* LSH-based data structures consume too much memory and are thus impractical;

* Optimal LSH constructions are too complicated to be made practical.

The talk is based on two papers: arXiv: 1501.01062 (joint with Alexandr Andoni, STOC 2015) and arXiv: 1509.02897 (joint with Alexandr Andoni, Piotr Indyk, Thijs Laarhoven and Ludwig Schmidt, NIPS 2015).
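For reference, a minimal version of the classical Hyperplane LSH family mentioned above (the baseline being improved upon, not the new data-dependent or optimal constructions from these papers): each hash bit records the side of a random hyperplane, and vectors with small angle collide on whole hash words with higher probability. Table sizes and bit counts are illustrative.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Classical Hyperplane (SimHash) LSH for cosine similarity (illustrative baseline)."""
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, t, x):
        return tuple((self.planes[t] @ x > 0).astype(int))   # one bit per hyperplane

    def add(self, x, label):
        for t in range(len(self.tables)):
            self.tables[t][self._key(t, x)].append((label, x))

    def query(self, q):
        candidates = {}
        for t in range(len(self.tables)):
            for label, x in self.tables[t].get(self._key(t, q), []):
                candidates[label] = x
        # Rank the (hopefully few) candidates by exact cosine similarity.
        return sorted(candidates,
                      key=lambda lbl: -np.dot(q, candidates[lbl]) /
                                       (np.linalg.norm(q) * np.linalg.norm(candidates[lbl]) + 1e-12))
```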

We show techniques for decreasing the error probability of randomized algorithms and for converting randomized algorithms to deterministic (non-uniform) algorithms. Unlike most existing techniques that involve repetition of the randomized algorithm and hence a slowdown, our techniques produce algorithms with a similar run-time to the original randomized algorithms.

The amplification technique is related to a certain stochastic multi-armed bandit problem. The derandomization technique -- which is the main contribution of this work -- points to an intriguing connection between derandomization and sketching/sparsification.

We demonstrate the techniques by showing applications to max-cut on dense graphs, approximate clique, constraint satisfaction problems on dense bipartite graphs, and list decoding to unique decoding for Reed-Muller code.

This is joint work with Ofer Grossman.

By a theorem of Aaronson, normalized Birkhoff sums of positive integrable functions in infinite, ergodic systems either tend to 0 almost surely or there is a subsequence along which every further subsequence tends to infinity. This is not true for normalized symmetric Birkhoff sums, where the summation is along a symmetric time interval, as there are examples of infinite, ergodic systems for which the absolutely normalized symmetric Birkhoff sums of positive integrable functions may be almost surely bounded away from zero and infinity. In this talk I will start by explaining a variety of transformations (of different nature) exhibiting this phenomenon, and discuss the main result: the absolutely normalized, symmetric Birkhoff sums of positive integrable functions in infinite, ergodic systems never converge pointwise, and there even exists a universal divergence statement. Time permitting, I will show some examples of actions of other groups which converge, and some recent (yet unwritten) results on actions by commuting skew products which are related to self-intersection local times.

The contents of this talk are a combination of 3 papers, one of which is joint work with Benjamin Weiss and Jon Aaronson, and another is work in progress with Jon Aaronson.

Parameter estimation is performed by fitting data measurements to a model using Bayesian statistics, assuming additional prior information. The estimation requires a numerical solution of large scale optimization problem, whose objective traditionally includes data fidelity and regularization terms. In this talk I will present numerical solution methods for two such estimation problems.

In the first part of the talk I will concentrate on parameter estimation of physical models, obtained by solving optimization problems that are constrained by partial differential equations (PDEs). I will focus on my recent work on 3D Full Waveform Inversion, which arises in seismic exploration of oil and gas reservoirs, earth sub-surface mapping, ultrasound imaging and more. I will demonstrate how to computationally treat this inverse problem, and improve its solution by using travel time tomography in a joint inversion framework. This includes efficient algorithms for the solution of the Helmholtz and eikonal equations (the two associated PDEs), and a parallel software framework for applying these algorithms for the joint inversion using a Gauss Newton algorithm.

In the second part of the talk, I will consider the estimation of large scale sparse inverse covariance matrices of a multivariate Gaussian distribution. Such matrices are often used to characterize and analyze data measurements in fields ranging from machine learning and signal processing to computational biology. To estimate these matrices, an l1-regularized log-determinant optimization problem needs to be solved. I will present a block-coordinate descent algorithm that can efficiently solve this problem at large scales with a low memory footprint, and a multilevel acceleration framework that is suitable for general sparse optimization problems. These algorithms can be used as a tool for enriching inverse problems by "learning" appropriate prior information, adopting an empirical Bayesian framework.
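To make the second problem concrete, here is a minimal sketch using an off-the-shelf l1-regularized log-determinant solver (scikit-learn's GraphicalLasso) on synthetic data; it only illustrates what is being estimated, not the block-coordinate or multilevel algorithms from the talk, and the sizes and regularization strength are arbitrary.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Build a sparse ground-truth precision (inverse covariance) matrix.
p = 30
Theta = np.eye(p)
for i in range(p - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = 0.4      # chain-structured dependencies

# Sample data from the corresponding Gaussian and estimate Theta back.
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=500)
model = GraphicalLasso(alpha=0.05).fit(X)        # alpha = l1 regularization strength
est_precision = model.precision_

print("nonzeros in the estimated precision matrix:",
      np.count_nonzero(np.abs(est_precision) > 1e-3))
```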

Line defects appear in the microscopic structure of crystalline materials (e.g. metals) as well as liquid crystals, the latter an intermediate phase of matter between liquids and solids. Mathematically, their study is challenging since they correspond to topological singularities that result in blow-up of the total energies of finite bodies when utilizing most commonly used classical models of energy density; as a consequence, formulating nonlinear dynamical models (especially pde) for the representation and motion of such defects is a challenge as well. I will discuss the development and implications of a single pde model intended to describe equilibrium states and dynamics of these defects. The model alleviates the nasty singularities mentioned above, and it will also be shown that incorporating a conservation law for the topological charge of line defects allows for the correct prediction of some important features of defect dynamics that would not be possible just with the knowledge of an energy function.

This is joint work with Chiqun Zhang, Dmitry Golovaty, and Noel Walkington.

The class P attempts to capture the efficiently solvable computational tasks. It is full of practically relevant problems, with varied and fascinating combinatorial structure.

In this talk, I will give an overview of a rapidly growing body of work that seeks a better understanding of the structure within P. Inspired by NP-hardness, the main tools in this approach are combinatorial reductions. Combining these reductions with a small set of plausible conjectures, we obtain tight lower bounds on the time complexity of many of the most important problems in P.

I will present the most recent landscape of P and the conjectures on which this project is based (e.g., the Strong Exponential Time Hypothesis). I will discuss recent attempts at identifying new conjectures: either more reliable ones, or ones that will get us closer to a full classification of the important problems in P.

Finally, I will highlight a surprising new reduction from Circuit-SAT to natural problems in P like Edit-Distance which proves that minor improvements over the quadratic running time of Edit-Distance are enough to prove major complexity separations.

In many practical parameter estimation problems, such as medical experiments and cognitive radio communications, parameter selection is performed prior to estimation. The selection process has a major impact on subsequent estimation by introducing a selection bias and creating coupling between decoupled parameters. As a result, classical estimation theory may be inappropriate and inaccurate, and a new methodology is needed. In this study, the problem of estimating a preselected unknown deterministic parameter, chosen from a parameter set based on a predetermined data-based selection rule, \Psi, is considered. In this talk, I present a general non-Bayesian estimation theory for estimation after parameter selection, which includes estimation methods, performance analysis, and adaptive sampling strategies. First, I use the post-selection mean-square-error (PSMSE) criterion as a performance measure instead of the commonly used mean-square-error (MSE). The corresponding Cramér-Rao-type bound on the PSMSE of any \Psi-unbiased estimator is derived, where the \Psi-unbiasedness is in the Lehmann-unbiasedness sense. The post-selection maximum-likelihood (PSML) estimator is presented and its \Psi-efficiency properties are demonstrated. Practical implementations of the PSML estimator are proposed as well. Finally, I discuss the concept of adaptive sampling in a two-stage sampling scheme of selection and estimation.
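A small simulation of the phenomenon described above, under hypothetical Gaussian measurements: selecting the parameter with the largest sample mean and then reusing the same data makes the naive (classical ML) estimator biased upward, which is exactly the effect the post-selection framework is designed to handle. The selection rule and all settings below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, sigma, reps = 10, 5, 1.0, 20000
true_means = np.zeros(K)                 # all parameters are actually equal to 0

naive_estimates = []
for _ in range(reps):
    data = rng.normal(true_means, sigma, size=(n, K))
    sample_means = data.mean(axis=0)
    selected = sample_means.argmax()     # data-based selection rule \Psi
    naive_estimates.append(sample_means[selected])   # classical estimator, ignoring selection

print("bias of the naive post-selection estimator:", np.mean(naive_estimates))
# Prints a clearly positive number (roughly 0.7 with these settings)
# even though every true parameter equals 0.
```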

Transmission rates in broadband optical waveguide systems are enhanced by launching many pulse sequences through the same waveguide. Since pulses from different sequences propagate with different group velocities, intersequence pulse collisions are frequent, and can lead to severe transmission degradation. On the other hand, the energy exchange in pulse collisions can be beneficially used for controlling the transmission.

In this work we show that collision-induced amplitude dynamics of soliton sequences of N perturbed coupled nonlinear Schrödinger (NLS) equations can be described by N-dimensional Lotka-Volterra (LV) models, where the model's form depends on the perturbation. To derive the LV models, we first carry out single-collision analysis, which is based on the method of eigenmode expansion with the eigenmodes of the linear operator describing small perturbations about the fundamental NLS soliton. We use stability and bifurcation analysis for the equilibrium points of the LV models to develop methods for achieving robust transmission stabilization and switching that work well for a variety of waveguides. Further enhancement of transmission stability is obtained in waveguides with a narrowband Ginzburg-Landau gain-loss profile. We also discuss the possibility to use the relation between NLS and LV models to realize transition to spatio-temporal chaos with NLS solitons.

The Max-Cut problem seeks to determine the maximal cut size in a given graph. With no polynomial-time efficient approximation for Max-Cut (unless *P=NP*), its asymptotics for a typical large sparse graph are of considerable interest. We prove that for a uniformly random d-regular graph on N vertices, and for the uniformly chosen Erdos-Renyi graph of *M=Nd/2* edges, the leading correction to *M/2* (the typical cut size) is *P∗ sqrt(NM/2)*. Here *P∗* is the ground state energy of the Sherrington-Kirkpatrick model, expressed analytically via Parisi's formula.

This talk is based on a joint work with Subhabrata Sen and Andrea Montanari.

The goal of my talk (based on joint work with Dima Grigoriev, Anatol Kirillov, and Gleb Koshevoy) is to generalize the celebrated Robinson-Schensted-Knuth (RSK) bijection between the set of matrices with nonnegative integer entries and the set of planar partitions.

Namely, for any pair of injective valuations on an integral domain we construct a canonical bijection K, which we call the generalized RSK, between the images of the valuations, i.e., between certain ordered abelian monoids.

Given a semisimple or Kac-Moody group, for each reduced word ii=(i_1,...,i_m) for a Weyl group element we produce a pair of injective valuations on C[x_1,...,x_m] and argue that the corresponding bijection K=K_ii, which maps the lattice points of the positive octant onto the lattice points of a convex polyhedral cone in R^m, is the most natural generalization of the classical RSK and, moreover, K_ii can be viewed as a bijection between Lusztig and Kashiwara parametrizations of the dual canonical basis in the corresponding quantum Schubert cell.

Generalized RSKs are abundant in "nature": for instance, any pair of polynomial maps phi,psi:C^m-->C^m with dense images determines a pair of injective valuations on C[x_1,...,x_m] and thus defines a generalized RSK bijection K_{phi,psi} between two sub-monoids of Z_+^m.

When phi and psi are birational isomorphisms, we expect that K_{phi,psi} has a geometric "mirror image", i.e., that there is a rational function f on C^m whose poles complement the image of phi and psi so that the tropicalization of the composition psi^{-1}phi along f equals to K_{phi,psi}. We refer to such a geometric data as a (generalized) geometric RSK, and view f as a "super-potential". This fully applies to each ii-RSK situation, and we find a super-potential f=f_ii which helps to compute K_ii.

While each K_ii has a "crystal" flavor, its geometric (and mirror) counterpart f_ii emerges from the cluster twist of the relevant double Bruhat cell studied by Andrei Zelevinsky, David Kazhdan, and myself.

We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to $N$ experts in total $\sqrt{N}$ computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is linear in $N$. These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle---i.e., an efficient empirical risk minimizer---allows one to learn a finite hypothesis class of size $N$ in time $\log{N}$.

We also study the implications of our results to learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best-response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is $\sqrt{N}$, yielding again a quadratic improvement upon the oracle-free setting, where linear time in $N$ is known to be tight.
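For contrast, here is the standard oracle-free baseline implicit in the discussion above (multiplicative weights / Hedge), which spends time linear in $N$ on every round; the learning rate and loss model are illustrative, and this is not the oracle-based $\sqrt{N}$ algorithm from the talk.

```python
import numpy as np

def hedge(loss_matrix, eta=0.1):
    """Multiplicative-weights (Hedge) over N experts; O(N) work per round.

    loss_matrix: (T, N) array of per-round losses in [0, 1].
    Returns the sequence of probability distributions played over the experts.
    """
    T, N = loss_matrix.shape
    w = np.ones(N)
    plays = []
    for t in range(T):
        p = w / w.sum()
        plays.append(p)
        w = w * np.exp(-eta * loss_matrix[t])   # touches all N experts each round
    return np.array(plays)
```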

Periodically driven systems are of immense interest in plasma physics, both from the point of view of plasma confinement and from that of plasma heating.

One of the models to explain plasma heating in capacitive RF discharges is Fermi acceleration, which consists of a particle moving in a dynamical billiard with oscillating boundaries. It is well known that the energy growth rate of an ensemble of particles in a strongly chaotic billiard with moving walls is quadratic-in-time whereas it can be exponential-in-time in billiards with multiple ergodic components. Since a real plasma device allows for an exchange of particles with the surroundings, we have now studied Fermi accelerators with a hole (small enough so as not to disturb the statistics). We find that energy gain is significantly higher in leaky Fermi accelerators with multiple ergodic components, and it can be further increased by shrinking the hole size. In the ergodic case, energy gain is found to be independent of the hole size. Work done jointly with V. Gelfreich, V. Rom-Kedar and D. Turaev [Physical Review E 91, 062920 (2015)].

The Paul trap is a device used to confine electrons by using time-periodic, spatially non-uniform electric fields, and a Nobel Prize was awarded for its discovery in 1989. The time-averaged distribution function of plasma in such devices is usually modelled using the concept of an effective potential (ponderomotive theory). For a specific example of the electric field used in Paul traps, we had shown earlier that the exact solutions of the Vlasov equation (collisionless Boltzmann equation) do not agree with solutions obtained by the effective potential approach. Now we have been able to obtain a perturbative solution of the Vlasov equation for a much more general case and find the same discrepancy with conventional theory. These perturbative solutions represent a non-equilibrium steady state and further work needs to be done to understand their statistical evolution. Work done jointly with B. Srinivasan [arXiv:1510.03974].

We study the two-party communication complexity of the geometric problem of finding an approximate Brouwer fixed point of a composition $g \circ f$ of two Lipschitz functions, where Alice knows $f$ and Bob knows $g$.

We prove an essentially tight communication lower bound for this problem, using a novel adaptation of the Raz-McKenzie simulation theorem to geometric settings.

We show that a slightly stronger version of this communication problem would imply (essentially) tight communication lower bounds for the problem of finding an approximate Nash equilibrium in 2-player (and n-player) games, where each player initially knows only his own payoff matrix.

Joint work with Tim Roughgarden.

We consider random polynomials of degree $n$ whose coefficients are i.i.d., distributed over a finite set of integers with probability at most 1/2 of taking any particular value. We show that the probability that such a polynomial has a double root is dominated by the probability that $0$, $1$, or $-1$ is a double root, up to an error of $o(n^{-2})$. Our result generalizes a similar result of Peled, Sen and Zeitouni for Littlewood polynomials.

Joint work with Ron Peled and Arnab Sen.
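As a small aside, the event in question is easy to test for any fixed integer polynomial: $p$ has a double root exactly when $\gcd(p, p')$ is non-constant. The check below is illustrative only and says nothing about the proof technique.

```python
import sympy as sp

def has_double_root(coeffs):
    """coeffs lists the coefficients from the constant term upward;
    the polynomial has a double root iff gcd(p, p') has positive degree."""
    x = sp.symbols('x')
    p = sum(int(c) * x**i for i, c in enumerate(coeffs))
    return sp.degree(sp.gcd(p, sp.diff(p, x)), gen=x) > 0

# has_double_root([1, 2, 1]) is True, since 1 + 2x + x^2 = (x + 1)^2
```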

"Circular inference" is a pejorative coined for methods in which a hypothesis is selected after looking at the data, but the inferential procedures treat it as if it was determined in advance. Unfortunately, many throughput screening experiments in genomics or neuroimaging seek to do exactly this: identify regions (bumps) of high signal in the data **and** evaluate these found regions using the same data. Simple estimators that ignore the selection will be biased; when the data is non-stationary, this bias can vary dramatically between different regions. Nevertheless, methods for evaluating and comparing selected regions are crucial, because typically only a handful of regions can be further explored in tailored follow up studies.

In this talk I describe a new conditional inference approach for characterizing these found regions by estimating their population parameters. Our method explicitly models the selection procedure and simulates from the conditional distribution to estimate the underlying parameters. Efficient strategies for producing p-values, estimators, and intervals will be discussed, as well as power-versus-accuracy tradeoffs. I will demonstrate the new method by estimating bumps in a comparison of DNA-methylation patterns across tissue types.
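A one-dimensional cartoon of the conditional idea, assuming Gaussian noise and a simple "report if above threshold" selection rule; the moment-matching estimator and all names below are illustrative, not the method of the talk.

```python
import numpy as np

def selection_adjusted_estimate(x_obs, threshold, sigma=1.0,
                                n_sim=200_000, grid=None, seed=0):
    """A unit was reported only because its observation exceeded
    `threshold`, so the naive estimate x_obs is biased upward.  Model
    the selection explicitly: choose the mu whose simulated conditional
    mean E[X | X > threshold] is closest to the observed value."""
    rng = np.random.default_rng(seed)
    if grid is None:
        grid = np.linspace(threshold - 5 * sigma, threshold + 5 * sigma, 400)
    z = rng.normal(size=n_sim)
    best_mu, best_gap = None, np.inf
    for mu in grid:
        x = mu + sigma * z
        selected = x[x > threshold]          # simulate from the conditional law
        if selected.size == 0:
            continue
        gap = abs(selected.mean() - x_obs)
        if gap < best_gap:
            best_mu, best_gap = mu, gap
    return best_mu

# e.g. selection_adjusted_estimate(x_obs=2.2, threshold=2.0) is noticeably
# smaller than the naive estimate 2.2
```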

This is joint work with Jonathan Taylor and Rafael Irizarry.

In this talk I will consider the problem of local analytic classification of powers of volume forms on manifolds with boundary, i.e. of ordinary volume forms multiplied by a (in general complex) power of a function f, under the action of the group of diffeomorphisms preserving both the boundary and the hypersurface defined by the zero locus of f. In the case where this function defines an isolated boundary singularity in the sense of Arnol'd, I will show how to obtain local normal forms and moduli theorems analogous to those obtained by Arnol'd, Varchenko, Lando and others for the ordinary case without boundary. Moreover, I will show how these moduli are related to (in fact obtained from) the topological and analytic (Hodge-theoretic) invariants of the boundary singularity, such as the relative Picard-Lefschetz monodromy, the relative Brieskorn lattices with their relative Gauss-Manin connection, the relative spectrum, and so on, all of which generalise, in the presence of a boundary, the corresponding well-known objects already defined for isolated hypersurface singularities.

Online d-dimensional vector packing models many settings such as minimizing resources in data centers where jobs have multiple resource requirements (CPU, Memory, etc.). However, no online d-dimensional vector packing algorithm can achieve a competitive ratio better than d. Fortunately, in many natural applications, vectors are relatively small, and thus the lower bound does not hold. For sufficiently small vectors, an O(log d)-competitive algorithm was known. We improve this to a constant competitive ratio, arbitrarily close to e (where e is the base of the natural logarithm), given that vectors are sufficiently small.

We give improved results for the two dimensional case. For arbitrarily small vectors, the First Fit algorithm for two dimensional vector packing is no better than 2-competitive. We present a natural family of First Fit variants, and for optimized parameters get a competitive ratio of approximately 1.48 for sufficiently small vectors.

We improve upon the 1.48 competitive ratio - not via a First Fit variant - and give a competitive ratio arbitrarily close to 4/3 for packing small, two dimensional vectors. We show that no algorithm can achieve better than a 4/3 competitive ratio for two dimensional vectors, even if one allows the algorithm to split vectors among arbitrarily many bins.
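For reference, the plain First Fit baseline mentioned above can be sketched as follows; this is a generic implementation with names of my choosing, and the variants discussed in the talk differ in how bins are chosen.

```python
import numpy as np

def first_fit_vector_packing(items):
    """Online First Fit for d-dimensional vector packing: each arriving
    vector (entries in [0,1]) goes into the first open bin whose
    coordinate-wise load stays at most 1; otherwise a new bin is opened."""
    bins = []          # current load vector of each open bin
    assignment = []    # bin index chosen for each item, in arrival order
    for item in items:
        item = np.asarray(item, dtype=float)
        for j, load in enumerate(bins):
            if np.all(load + item <= 1.0 + 1e-12):
                bins[j] = load + item
                assignment.append(j)
                break
        else:
            bins.append(item.copy())
            assignment.append(len(bins) - 1)
    return len(bins), assignment
```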

It has long been conjectured that hypothesis spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical architectures than with shallow ones. Despite the vast empirical evidence, formal arguments to date are limited and do not capture the kind of networks used in practice. Using tensor factorization, we derive a universal hypothesis space implemented by an arithmetic circuit over functions applied to local data structures (e.g. image patches). The resulting networks first pass the input through a representation layer, and then proceed with a sequence of layers comprising sums followed by product pooling, where the sums correspond to the widely used convolution operator. The hierarchical structure of the networks arises from factorizations of tensors based on the linear weights of the arithmetic circuits. We show that a shallow network corresponds to a rank-1 decomposition, whereas a deep network corresponds to a Hierarchical Tucker (HT) decomposition. Log-space computation for numerical stability transforms the networks into SimNets.
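For concreteness, here is a toy evaluation of the shallow (rank-1 / CP) scoring function in log space, in the spirit of the SimNet formulation; the Gaussian representation functions, shapes, and all names are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def shallow_cac_score(patches, templates, a):
    """Shallow convolutional arithmetic circuit:
        score = sum_z a_z * prod_i f_z(x_i),
    where x_i are local patches and f_z is a Gaussian response to
    template z.  Evaluated in log space for numerical stability.
    `a` is assumed nonnegative."""
    patches = np.asarray(patches)       # (N, D): N patches of dimension D
    templates = np.asarray(templates)   # (Z, D): Z representation templates
    # log f_z(x_i) = -||x_i - t_z||^2, an (N, Z) matrix
    log_rep = -((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    log_prod = log_rep.sum(axis=0)      # log prod_i f_z(x_i), one entry per z
    m = log_prod.max()
    return m + np.log(np.dot(a, np.exp(log_prod - m)))   # log of the score
```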

In its basic form, our main theoretical result shows that the set of polynomially sized rank-1 decomposable tensors has measure zero in the parameter space of polynomially sized HT-decomposable tensors. In deep learning terminology, this amounts to saying that, apart from a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size if one wishes to implement (or approximate) them with a shallow network. Our construction and theory shed new light on various practices and ideas employed by the deep learning community, and in that sense make a paradigmatic contribution as well.

Joint work with Or Sharir and Amnon Shashua.

The notion of a Lie conformal algebra (LCA) comes from physics and is related to the operator product expansion. An LCA is a module over a ring of differential operators with constant coefficients, equipped with a bracket that may be seen as a deformation of a Lie bracket. LCAs are related to linearly compact differential Lie algebras via the so-called annihilation functor. Using this observation and Cartan's classification of linearly compact simple Lie algebras, Bakalov, D'Andrea and Kac classified finite simple LCAs in 2000.

I will define the notion of an LCA over a ring R of differential operators with not necessarily constant coefficients, extending the known notion for R=K[x]. I will explain why it is natural to study such an object and will suggest an approach to the classification of finite simple LCAs over arbitrary differential fields.

Much of the theory of mathematical programming for combinatorial optimization can be described in the following way: a polytope of interest has exponentially many (in the dimension) facets, but can be written as the linear projection of a simpler convex body in a higher-dimensional space. Here "simple" might mean a polytope with a much smaller number of facets, or a spectrahedron (the intersection of an affine subspace with the PSD cone) of small dimension. This allows one to optimize linear functionals over the base polytope by instead optimizing a lifted functional over the lifted body.
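A standard toy example of such a lift (not from the talk): the cross-polytope $B_1^n = \{x \in \mathbb{R}^n : \sum_i |x_i| \le 1\}$ has $2^n$ facets, yet it is the projection onto the $x$-coordinates of

$$Q = \{(x, y) \in \mathbb{R}^{2n} : -y_i \le x_i \le y_i \ \text{for all } i, \ \textstyle\sum_i y_i \le 1\},$$

which is described by only $2n + 1$ inequalities; optimizing $c^\top x$ over $B_1^n$ is the same as optimizing $(c, 0)^\top (x, y)$ over $Q$.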

Unless P=NP, one does not expect certain polytopes--like the convex hull of indicators of traveling salesman tours in a graph--to have a small lift. But it remained open to prove any non-trivial lower bound on the necessary dimension for a spectrahedral lift, i.e. to prove that semi-definite programs do not yield efficient optimization procedures over these polytopes.

We show that the cut, TSP, and stable set polytopes on n-vertex graphs are not the linear image of a spectrahedron of dimension less than exp(n^c) for some constant c > 0. In the process, many interesting phenomena emerge: Factorization of operators through the PSD cone, quantum information theory, discrete Fourier analysis, and real algebraic geometry.

This is based on joint work with Prasad Raghavendra and David Steurer.

It is well-known that Hecke algebras H_q(W) do not have interesting Hopf algebra structures because, first, the only available one would emerge only via an extremely complicated isomorphism with the group algebra of W and, second, this would make H_q(W) into yet another cocommutative Hopf algebra.

The goal of my talk (based on joint work with D. Kazhdan) is to extend each Hecke algebra H_q(W) to a non-cocommutative Hopf algebra (we call it Hecke-Hopf algebra of W) that contains H_q(W) as a coideal.

Our Hecke-Hopf algebras have a number of applications: they generalize the Bernstein presentation of Hecke algebras, provide new solutions of the quantum Yang-Baxter equation and a large category of endofunctors of H_q(W)-Mod, and suggest further generalizations of Hecke algebras.

In view of the recent huge interest in image classification and object recognition problems and the spectacular success of deep learning and random forests in solving these tasks, it seems astonishing that much more modest efforts are being invested in related, and often more difficult, problems of image and multimodal content-based retrieval, and, more generally, similarity assessment in large-scale databases. These problems, arising as primitives in many computer vision tasks, are becoming increasingly important in the era of exponentially increasing information. Semantic and similarity-preserving hashing methods have recently received considerable attention to address such a need, in part due to their significant memory and computational advantage over other representations.

In this talk, I will overview some of my recent attempts to construct efficient semantic hashing schemes based on deep neural networks and random forests.
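As a generic baseline illustrating why hashing is attractive (random-hyperplane LSH on precomputed embeddings, not the learned schemes of the talk; all names are illustrative):

```python
import numpy as np

def lsh_codes(embeddings, n_bits=64, seed=0):
    """Similarity-preserving binary codes via random hyperplanes: nearby
    embeddings tend to agree on most bits, so retrieval reduces to
    cheap Hamming-distance comparisons."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(embeddings.shape[1], n_bits))
    return embeddings @ planes > 0           # (num_items, n_bits) booleans

def hamming_search(query_code, db_codes, k=5):
    """Indices of the k database items closest to the query in Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]
```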

Based on joint works with Qiang Qiu, Guillermo Sapiro, Michael Bronstein, and Jonathan Masci.

I will present a 'categorical' way of doing analytic geometry in which analytic geometry is seen as a precise analogue of algebraic geometry. Our approach works for both complex analytic geometry and p-adic analytic geometry in a uniform way. I will focus on the idea of an 'open set' as used in these various areas of math and how it is characterised categorically. In order to do this, we need to study algebras and their modules in the category of Banach spaces. The categorical characterization that we need uses homological algebra in these 'quasi-abelian' categories, which is the work of Schneiders and Prosmans. In fact, we work with the larger category of Ind-Banach spaces, for reasons I will explain. This gives us a way to establish foundations of analytic geometry and to compare with the standard notions such as the theory of affinoid algebras, Grosse-Klonne's theory of dagger algebras (over-convergent functions), the theory of Stein domains and others. I will explain how this extends to a formulation of derived analytic geometry following the relative algebraic geometry approach of Toen, Vaquie and Vezzosi.

This is joint work with Federico Bambozzi (Regensburg) and Kobi Kremnizer (Oxford).

An Oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky (JACM 1996), is a (probabilistic) RAM that hides its access pattern: for every input, the observed sequence of accessed memory locations is distributed in (essentially) the same way. Great progress has been made in recent years in minimizing the overhead of ORAM constructions, with the goal of obtaining the smallest overhead possible.

We revisit the lower bound on the overhead required to obliviously simulate programs, due to Goldreich and Ostrovsky. While the lower bound is fairly general, including the offline case, when the simulator is given the reads and writes ahead of time, it does assume that the simulator behaves in a "balls and bins" fashion. That is, the simulator must act by shuffling data items around, and is not allowed to have sophisticated encoding of the data.

We prove that for the OFFLINE case, showing a lower bound without the above restriction is related to the size of the circuits for sorting. Our proof is constructive, and uses a bit-slicing approach which manipulates the bit representations of data in the simulation. This implies that without obtaining yet unknown superlinear lower bounds on the size of such circuits, we cannot hope to get lower bounds on offline (unrestricted) ORAMs.

What is it that enables learning with multi-layer networks? What causes the network to generalize well? What makes it possible to optimize the error, despite the problem being hard in the worst case? In this talk I will attempt to address these questions and relate between them, highlighting the important role of optimization in deep learning. I will then use the insight to suggest studying novel optimization methods, and will present Path-SGD, a novel optimization approach for multi-layer RELU networks that yields better optimization and better generalization.
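The abstract does not give the update rule; the sketch below shows the per-weight scaling for a two-layer ReLU network under my reading of the path-norm construction behind Path-SGD, and should be taken as illustrative rather than the authors' implementation.

```python
import numpy as np

def path_sgd_scales(W1, W2):
    """Per-weight scaling factors for a two-layer ReLU net.
    The squared path-norm is sum_{i,h,o} W1[i,h]^2 * W2[h,o]^2; each
    weight's gradient is divided by the total squared weight of the
    other edges on the paths through it, which makes the update
    invariant to node-wise rescalings that leave the network's
    function unchanged."""
    out_sq = (W2 ** 2).sum(axis=1)      # per hidden unit: sum_o W2[h, o]^2
    in_sq = (W1 ** 2).sum(axis=0)       # per hidden unit: sum_i W1[i, h]^2
    scale_W1 = np.broadcast_to(out_sq, W1.shape)          # paths through W1[i, h]
    scale_W2 = np.broadcast_to(in_sq[:, None], W2.shape)  # paths through W2[h, o]
    return scale_W1, scale_W2

# Illustrative step:  W -= lr * grad_W / np.maximum(scale_W, 1e-12)
```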

Joint work with Behnam Neyshabur, Ryota Tomioka and Russ Salakhutdinov.

We study properties of affine algebras with small Gel'fand-Kirillov dimension, from the points of view of the prime spectrum, gradations and radical theory.

As an application, we are able to prove that Z-graded algebras with quadratic growth and graded domains with cubic growth have finite (and efficiently bounded) classical Krull dimension; this is motivated by Artin's conjectured geometric classification of non-commutative projective surfaces, and by opposite examples in the non-graded case.

As another application, we prove a graded version of a dichotomy question raised by Braun and Small, between primitive algebras (namely, algebras admitting faithful irreducible representations) and algebras satisfying polynomial identities.

If time permits, we discuss approximations of the well-studied Koethe problem and in particular prove a stability result for certain radicals under suitable growth conditions.

We finally propose further questions and possible directions, which already stimulated new constructions of monomial algebras.

This talk is partially based on a joint work with A. Leroy, A. Smoktunowicz and M. Ziembowski.

The task of computing the Hamming distance between a given pattern of length $m$ and every location in a text of length $n$ is one of the most fundamental problems in string algorithms. Unfortunately, there is evidence that one cannot compute the exact Hamming distance for all locations in the text in time polynomially smaller than $n\sqrt{m}$. Nevertheless, Karloff showed that if one is willing to settle for a $1\pm\epsilon$ approximation, then the problem can be solved with high probability in $\tilde{O}(n/\epsilon^2)$ time.
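A toy version of the projection idea can be coded directly; the exact scheme in Karloff's algorithm and in the talk differs, and all names below are of my choosing.

```python
import numpy as np

def approx_hamming(text, pattern, trials=32, seed=0):
    """Estimate the Hamming distance between `pattern` and every length-m
    window of `text` using random binary projections of the alphabet."""
    rng = np.random.default_rng(seed)
    n, m = len(text), len(pattern)
    alphabet = {c: i for i, c in enumerate(sorted(set(text) | set(pattern)))}
    t_idx = np.array([alphabet[c] for c in text])
    p_idx = np.array([alphabet[c] for c in pattern])
    est = np.zeros(n - m + 1)
    for _ in range(trials):
        rho = rng.integers(0, 2, size=len(alphabet))   # random 2-colouring
        bt, bp = rho[t_idx].astype(float), rho[p_idx].astype(float)
        # exact Hamming distance of the binary strings at every alignment,
        # via two correlations (FFT-based correlation gives O(n log n))
        both_one = np.correlate(bt, bp, mode="valid")
        both_zero = np.correlate(1 - bt, 1 - bp, mode="valid")
        est += m - both_one - both_zero                # binary mismatches
    # distinct characters stay distinct under a random 2-colouring with
    # probability 1/2, so rescale and average over the trials
    return 2 * est / trials
```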

Due to related lower bounds for computing the Hamming distance of two strings in the one-way communication complexity model, it was strongly believed that the dependence on $1/\epsilon$ in the approximation version cannot be improved by much. We will show that this belief is false by introducing a new $\tilde{O}(n/\epsilon)$-time algorithm that succeeds with high probability.

The main idea behind our algorithm, which is common in sparse recovery problems, is to reduce the variance of a specific randomized experiment by (approximately) separating heavy hitters from non-heavy hitters. However, while known sparse recovery techniques work very well on vectors, they do not seem to apply here, where we are dealing with mismatches between pairs of characters. We introduce two main algorithmic ingredients. The first is a new sparse recovery method that applies to pair inputs (such as ours). The second is a new construction of hash/projection functions, which allows one to count the number of projections that induce mismatches between two characters exponentially faster than by brute force. We expect these algorithmic techniques to be of independent interest.