## Select Seminar Series

All seminars- Home
- ›
- Studying at the Faculty
- ›
- Seminars ›
- Previous Seminars

# Previous Seminars

Genome-wide forces of mutation and selection create local and global sequence patterns that carry information on functional role of genomic DNA. I will describe algorithmic approaches to genome analysis that decode sequence patterns and identify features of structural gene organization and gene expression (e.g. leaderless transcription). The algorithms make unsupervised inference of the structure of underlying graphical model and its parameters.

The subsequently developed software tools were, among other applications, used at NCBI (Bethesda, MD) to annotate and re-annotate 2,500+ fungal genomes and 130,000+ prokaryotic genomes. Yet another algorithm was employed by DOE JGI (Walnut Creek, CA) to annotate the largest collection of metagenomes.

Speaker s short Bio

Mark Borodovsky, PhD, a Regents' Professor at the Joint Georgia Tech and Emory University Department of Biomedical Engineering and the School of Computational Science and Engineering. Since 1990 when he arrived to Atlanta from Moscow, Dr Borodovsky had also joint faculty appointments at the School of Biology and the School of Mathematics. Dr. Borodovsky is a Founder of the Georgia Tech Bioinformatics graduate program.

Recent success stories of using machine learning for diagnosing skin cancer, diabetic retinopathy, pneumonia, and breast cancer may give the impression that artificial intelligence (AI) is on the cusp of radically changing all aspects of health care. However, many of the most important problems, such as predicting disease progression, personalizing treatment to the individual, drug discovery, and finding optimal treatment policies, all require a fundamentally different way of thinking. Specifically, these problems require a focus on *causality* rather than simply prediction. Motivated by these challenges, my lab has been developing several new approaches for causal inference from observational data. In this talk, I describe our recent work on the deep Markov model (Krishnan, Shalit, Sontag AAAI '17) and TARNet (Shalit, Johansson, Sontag, ICML '17).

Recent success stories of using machine learning for diagnosing skin cancer, diabetic retinopathy, pneumonia, and breast cancer may give the impression that artificial intelligence (AI) is on the cusp of radically changing all aspects of health care. However, many of the most important problems, such as predicting disease progression, personalizing treatment to the individual, drug discovery, and finding optimal treatment policies, all require a fundamentally different way of thinking. Specifically, these problems require a focus on *causality* rather than simply prediction. Motivated by these challenges, my lab has been developing several new approaches for causal inference from observational data. In this talk, I describe our recent work on the deep Markov model (Krishnan, Shalit, Sontag AAAI '17) and TARNet (Shalit, Johansson, Sontag, ICML '17).

In formal verification, one uses mathematical tools in order to prove that a system satisfies a given specification.

In this talk I consider three limitations of traditional formal verification:

The first is the fact that treating systems as finite-state machines is problematic, as systems become more complex, and thus we must consider infinite-state machines.

The second is that Boolean specifications are not rich enough to specify properties required by today's systems. Thus, we need to go beyond the Boolean setting.

The third concerns certifying the results of verification: while a verifier may answer that a system satisfies its specification, a designer often needs some palpable evidence of correctness.

I will present several works addressing the above limitations, and the mathematical tools involved in overcoming them.

No prior knowledge is assumed. Posterior knowledge is guaranteed.

Today's technological world is increasingly dependent upon the reliability, robustness, quality of service and timeliness of networks including those of power distribution, financial, transportation, communication, biological, and social. For the time-critical functionality in transferring resources and information, a key requirement is the ability to adapt and reconfigure in response to structural and dynamic changes, while avoiding disruption of service and catastrophic failures. We will outline some of the major problems for the development of the necessary theory and tools that will permit the understanding of network dynamics in a multiscale manner.

Many interesting networks consist of a finite but very large number of nodes or agents that interact with each other. The main challenge when dealing with such networks is to understand and regulate the collective behavior. Our goal is to develop mathematical models and optimization tools for treating the ** Big Data** nature of large scale networks while providing the means to understand and regulate the collective behavior and the dynamical interactions (short and long-range) across such networks.

The key mathematical technique will be based upon the use optimal mass transport theory and resulting notions of curvature applied to weighted graphs in order to characterize network robustness. Examples will be given from biology, finance, and transportation.

Assume that $(X,f)$ is a dynamical system and $\phi$ is a real non negative potential such that the corresponding $f$-invariant measure $\mu_\phi$ measure is infinite. Under assumptions of good inducing schemes, we give conditions under which the pressure of $f$ for a perturbed potential $\phi+s\psi$ relates to the pressure of the induced system term.

This extends some of Sarig's results to the setting of infinite "equilibrium states".

In addition, limit properties of the family of measures $\mu_{\phi+s\psi}$ as $s\to 0$ are studied and statistical properties (e.g. correlation coefficients) under the limit measure are derived. I will discuss several examples.

This is based on joint work with H. Bruin and M. Todd.

We study the non-vanishing gradient-like vector fields $v$ on smooth compact manifolds $X$ with boundary. We call such fields traversing. With the help of a boundary generic field $v$, we divide the boundary $\d X$ of $X$ into two complementary compact manifolds, $\d^+X(v)$ and $\d^-X(v)$. Then we introduce the causality map $C_v: \d^+X(v) \to \d^-X(v)$, a distant relative of the Poincare return map. Let $\mathcal F(v)$ denote the oriented 1-dimensional foliation on $X$, produced by a traversing $v$-flow.

Our main result, the Holography Theorem, claims that, for boundary generic traversing vector fields $v$, the knowledge of the causality map $C_v$ is allows for a reconstruction of the pair $(X, \mathcal F(v))$, up to a homeomorphism $\Phi: X \to X$ which is the identity on the boundary $\d X$. In other words, for a massive class of ODE's, we show that the topology of their solutions, satisfying a given boundary value problem, is rigid. We call these results ``holographic" since the $(n+1)$-dimensional $X$ and the un-parameterized dynamics of the flow on it are captured by a single correspondence $C_v$ between two $n$-dimensional screens, $\d^+X(v)$ and $\d^-X(v)$.

This holography of traversing flows has numerous applications to the dynamics of general flows. Time permitting, we will discuss some applications of the Holography Theorem to the geodesic flows and the inverse scattering problems on Riemannian manifolds with boundary.

Detecting genetic adaptation in recent human history is a major challenge of population genetics. The fundamental problem is to infer historical changes in the frequency of genetic variants (alleles), from data of contemporary genomes. With this we can identify unusual changes that are unlikely to have occurred in the absence of selective pressures. However, a generally applicable method to infer recent allele frequency changes is lacking. Instead, present methods can only detect frequency changes under very restrictive assumptions on the model of selection. Moreover, their time resolution is generally limited to prehistoric scales, on the order of the past 25-75 thousand years. To address these gaps we developed a novel statistical method, Singleton Density Score (SDS), that infers the recent changes in allele frequencies from local variation in contemporary genome sequences with specificity to historical timescales. Applied to data of ~3000 genomes from the UK10K project, SDS reveals that human genetic adaptation continued well into historical times. Over the past ~2000-3000 years, ancestors of modern Britons genetically evolved over a range of phenotypes related to diet, immunity, and physical appearance. Notably, we found that polygenic adaptation, whereby selection acting on many small-effect variants across the genome that together determine a single trait, has played a pervasive, previously undetected role in shaping present genotypic and phenotypic variation.

Reference:

Field et al, Science 2016, Detection of human adaptation during the past 2000 years. https://www.ncbi.nlm.nih.gov/pubmed/27738015

The proliferation of data collection in the health, commercial, and economic spheres, brings with it opportunities for extracting new knowledge with concrete policy implications. Examples include individualizing medical practices based on electronic healthcare records, and understanding the implications of job training programs on employment and income.

The scientific challenge lies in the fact that standard prediction models such as supervised machine learning are often not enough for decision making from this so-called "observational data": Supervised learning does not take into account causality, nor does it account for the feedback loops that arise when predictions are turned into actions. On the other hand, existing causal-inference methods are not adapted to dealing with the rich and complex data now available, and often focus on populations, as opposed to individual-level effects.

The problem is most closely related to reinforcement learning and bandit problems in machine learning, but with the important property of having no control over experiments and no direct access to the actor's model.

In my talk I will discuss how we apply recent ideas from machine learning to individual-level causal-inference and action. I will introduce a novel generalization bound for estimating individual-level treatment effect, and further show how we use representation learning and deep temporal generative models to create new algorithms geared towards this problem. Finally, I will show experimental results using data from electronic medical records and data from a job training program.

Everyone wants to program with "high-level concepts", rather than meddle with the fine details of the implementation, such as pointers, network packets, and asynchronous callbacks. This is usually achieved by introducing layers of abstraction - but every layer incurs some overhead, and when they accumulate, this overhand becomes significant and sometimes prohibitive. Optimizing the program often requires to break abstractions, which leads to suboptimal design, higher maintenance costs, and subtle hard-to-trace bugs.

I will present two recent projects that attempt to address this difficulty. STNG is an automated lifting compiler that can synthesize high-level graphics code for computing stencils over matrices, from low-level legacy code written in Fortran. Its output is expressed in Halide, a domain-specific language for image processing that can take advantage of modern GPUs. The result is therefore code that is both easier to understand and more efficient than the original code.

Bellmania is a specialized tool for accelerating dynamic-programming algorithms by generating recursive divide-and-conquer implementations of them. Recursive divide-and-conquer is an algorithmic technique that was developed to obtain better memory locality properties. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and-conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based back-end verifies each step and takes care of low-level reasoning required for parallelism.

The lesson is that synthesis techniques are becoming mature enough to play a role in the design and implementation of realistic software systems, by combining the elegance of abstractions with the performance gained by optimizing and tuning the fine details.

Cancer is a highly complex aberrant cellular state where mutations impact a multitude of signalling pathways operating in different cell types. In recent years it has become apparent that in order to understand and fight cancer, it must be viewed as a system, rather than as a set of cellular activities. This mind shift calls for new techniques that will allow us to investigate cancer as a holistic system. In this talk, I will discuss some of the progress made towards achieving such a system-level understanding using computer modelling of biological behaviours, also known as Executable Biology. I will concentrate on our recent attempts to better understand cancer through the following examples: 1) drug target optimization for Chronic Myeloid Leukaemia using an innovative platform called BioModelAnalyzer, which allows to prove stabilization of biological systems; 2) dynamic hybrid modelling of Glioblastoma (brain tumour) development; 3) state-based modelling of cancer signalling pathways and their analysis using model-checking; and 4) synthesis of blood stem cell programs from single-cell gene expression data. Looking forward, inspired by David Harel’s Grand Challenge proposed a decade ago, I will propose a smaller grand challenge for computing and biology that could shed new light on our ability to control cell fates during development and disease and potentially change the way we treat cancer in the future.

Let M be a closed Riemannian manifold. There are numerous results that establish the existence of various minimal objects on M, such as periodic geodesics, minimal surfaces, or geodesic nets. We will present some effective versions of these existence theorems.

For example, we will present diameter upper bounds for the lengths of three shortest simple periodic geodesics on a Riemannian 2-sphere, which can be interpreted as an effective version of the existence theorem of Lusternik and Schnirelmann. (Joint with Y. Liokumovich and A. Nabutovsky).

Finding upper bounds for the size of smallest stationary objects is closely related with construction of "optimal" homotopies. We will show that if M is a closed surface of diameter d (with or without boundary), then any simple closed curve on M that can be contracted to a point over free loops of length less than L, can be contracted over based loops of length at most 3L+2d. (Joint with G. Chambers).