The genetic diversity of the gut microbiota has a central role in host health. Here, we created pangenomes for 728 human gut prokaryotic species, quadrupling the number of genes relative to strain-specific genomes. Each of these species has a core set of a thousand genes, differing even between closely related species, and an accessory set of genes unique to the different strains. Functional analysis shows that high strain variability is associated with sporulation, whereas low variability is linked with antibiotic resistance. We further map the antibiotic resistome across the human gut population and find 237 cases of extreme resistance even to last-resort antibiotics, with a predominance among Enterobacteriaceae. Lastly, the presence of specific genes in the microbiota relates to host age and sex. Our study underscores the genetic complexity of the human gut microbiota, emphasizing its significant implications for host health. The pangenomes and antibiotic resistance map constitute a valuable resource for further research.
Vanegas S. M., Curado S., Gujral A., Valverde G., Parraga S., Aleman J. O., Reid M., Elbel B., Schmidt A. M., Heffron S. P., Segal E., Li H., Abrams C., Sevick M. A., Popp C., Armijos E., Merriwether E. N., Ivezaj V., Ren-Fielding C., Parikh M. & Jay M.
(2024)
BMJ Open.
14,
8,
e081201.
Purpose: We developed a comprehensive sleeve gastrectomy (SG) weight loss study cohort and biorepository to uncover mechanisms, biomarkers and predictive factors of weight loss, weight maintenance and amelioration of obesity-related comorbidities. For this purpose, we collected psychosocial, anthropometric and clinical data and a variety of samples pre-surgery, intraoperatively and 1.5, 3, 12 and 24 months post-surgery. For longer-term assessment, the collection of psychosocial and anthropometric data was extended to 10 years. Here, we present an in-depth characterisation of the cohort and a detailed overview of study procedures as a foundation for future analyses. Participants: We consented 647 participants between June 2017 and March 2020 from two bariatric surgery clinics in New York City: one major urban hospital and one private hospital. Of 355 participants who provided baseline data, 300 underwent SG. Of these, 79% are female, the average age is 38 years, 68% are Hispanic, 20% are non-Hispanic Black and 11% are non-Hispanic White. Findings to date: We collected intraoperative adipose and stomach tissues from 282 patients and biosamples (blood, urine, saliva, stool) from 245 patients at 1.5 months, 238 at 3 months, 218 at 12 months and 180 at 24 months post-surgery. We are currently collecting anthropometric and psychosocial data annually until 10 years post-surgery. Data analysis is currently underway. Future plans: Our future research will explore the variability in weight loss outcomes observed in our cohort, particularly among Black and Hispanic patients in comparison to their White counterparts. We will identify social determinants of health, metabolic factors and other variables that may predict weight loss success, weight maintenance and remission of obesity-related comorbidities. Additionally, we plan to leverage our biorepository for collaborative research studies. We will complete long-term follow-up data collection by December 2031.
We plan to apply for funding to expand biosample collection through year 10 to provide insights into the mechanisms of long-term weight maintenance.
Berube L. T., Popp C. J., Curran M., Hu L., Pompeii M. L., Barua S., Bernstein E., Salcedo V., Li H., St-Jules D. E., Segal E., Bergman M., Williams N. J. & Sevick M. A.
(2024)
Trials.
25,
506.
Background: The Diabetes Telemedicine Mediterranean Diet (DiaTeleMed) Study is a fully remote randomized clinical trial evaluating personalized dietary management in individuals with type 2 diabetes (T2D). The study aims to test the efficacy of a personalized behavioral approach for dietary management of moderately controlled T2D, versus a standardized behavioral intervention that uses one-size-fits-all dietary recommendations, versus a usual care control (UCC). The primary outcome will compare the impact of each intervention on the mean amplitude of glycemic excursions (MAGE). Methods: Eligible participants are 21 to 80 years of age, diagnosed with moderately controlled T2D (HbA1c 6.0 to 8.0%), and managed with lifestyle alone or lifestyle plus metformin. Participants must be willing and able to attend virtual counseling sessions, log meals into a dietary tracking smartphone application (DayTwo), and wear a continuous glucose monitor (CGM) for up to 12 days. Participants are randomized with equal allocation (n = 255, n = 85 per arm) to one of three arms: (1) Personalized, (2) Standardized, or (3) UCC. Measurements occur at 0 (baseline), 3, and 6 months. All participants receive isocaloric energy and macronutrient targets to meet Mediterranean diet guidelines, in addition to 14 intervention contacts over 6 months (4 weekly then 10 biweekly) to cover diabetes self-management education. The first 4 UCC intervention contacts are delivered via synchronous videoconferences, followed by educational video links. Participants in the Standardized arm receive the same educational content as those in the UCC arm, following the same schedule. However, all of their intervention contacts are conducted via synchronous videoconferences, paired with Social Cognitive Theory (SCT)-based behavioral counseling, plus dietary self-monitoring of planned meals using a mobile app that provides real-time feedback on calories and macronutrients.
Participants in the Personalized arm receive all elements of the Standardized intervention, in addition to real-time feedback on predicted post-prandial glycemic response (PPGR) to meals and snacks logged into the mobile app. Discussion: The DiaTeleMed Study aims to address an important gap in the current landscape of precision nutrition by determining the contributions of behavioral counseling and personalized nutrition recommendations to glycemic control in individuals with T2D. The fully remote methodology of the study allows for scalability and innovative delivery of personalized dietary recommendations at a population level. Trial registration: ClinicalTrials.gov NCT05046886. Registered on September 16, 2021.
Sekeresova Kralova J., Donic C., Dassa B., Livyatan I., Jansen P. M., Ben-Dor S., Fidel L., Trzebanski S., Narunsky-Haziza L., Asraf O., Brenner O., Dafni H., Jona G., Boura-Halfon S., Stettner N., Segal E., Brunke S., Pilpel Y., Straussman R., Zeevi D., Bacher P., Hube B., Shlezinger N. & Jung S.
(2024)
Journal of Experimental Medicine.
221,
5,
e20231686.
The mycobiota are a critical part of the gut microbiome, but host-fungal interactions and specific functional contributions of commensal fungi to host fitness remain incompletely understood. Here, we report the identification of a new fungal commensal, Kazachstania heterogenica var. weizmannii, isolated from murine intestines. K. weizmannii exposure prevented Candida albicans colonization and significantly reduced the commensal C. albicans burden in colonized animals. Following immunosuppression of C. albicans-colonized mice, competitive fungal commensalism thereby mitigated fatal candidiasis. Metagenome analysis revealed K. heterogenica or K. weizmannii presence among human commensals. Our results reveal competitive fungal commensalism within the intestinal microbiota, independent of bacteria and immune responses, that could bear potential therapeutic value for the management of C. albicans-mediated diseases.
Shilo S., Keshet A., Rossman H., Godneva A., Talmor-Barkan Y., Aviv Y. & Segal E.
(2024)
Nature Medicine.
30,
5,
p. 1424-1431
Plasma fasting glucose (FG) levels play a pivotal role in the diagnosis of prediabetes and diabetes worldwide. Here we investigated FG values using continuous glucose monitoring (CGM) devices in nondiabetic adults aged 40-70 years. FG was measured during 59,565 morning windows of 8,315 individuals (7.16 ± 3.17 days per participant). Mean FG was 96.2 ± 12.87 mg dl−1, rising by 0.234 mg dl−1 per year with age. Intraperson, day-to-day variability expressed as FG standard deviation was 7.52 ± 4.31 mg dl−1. As there are currently no CGM-based criteria for diabetes diagnosis, we analyzed the potential implications of this variability on the classification of glycemic status based on current plasma FG-based diagnostic guidelines. Among 5,328 individuals who would have been considered to have normal FG based on the first FG measurement, 40% and 3% would have been reclassified as having glucose in the prediabetes and diabetes ranges, respectively, based on sequential measurements throughout the study. Finally, we revealed associations between mean FG and various clinical measures. Our findings suggest that careful consideration is necessary when interpreting FG as substantial intraperson variability exists and highlight the potential impact of using CGM data to refine glycemic status assessment.
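The reclassification finding in this abstract comes from applying single-reading diagnostic cut-offs to a noisy day-to-day series. A minimal sketch of that logic, assuming the ADA fasting plasma glucose ranges (below 100 mg/dL normal, 100-125 mg/dL prediabetes, 126 mg/dL or above diabetes) and a hypothetical "most severe range reached on any morning" rule; it is an illustration only, not the study's analysis pipeline, and the function names are invented for the example.

```python
from statistics import mean, stdev

# ADA fasting plasma glucose ranges (mg/dL): <100 normal,
# 100-125 prediabetes range, >=126 diabetes range.
PREDIABETES_MIN = 100.0
DIABETES_MIN = 126.0
SEVERITY = ["normal", "prediabetes", "diabetes"]


def classify_fg(fg_mg_dl: float) -> str:
    """Map one fasting glucose value to its ADA glycemic range."""
    if fg_mg_dl >= DIABETES_MIN:
        return "diabetes"
    if fg_mg_dl >= PREDIABETES_MIN:
        return "prediabetes"
    return "normal"


def summarize_fg(daily_fg: list[float]) -> dict[str, object]:
    """Summarize one person's sequential morning FG readings (>= 2 needed).

    The "worst_class" rule (most severe range reached on any morning) is a
    hypothetical reclassification criterion used only for illustration.
    """
    classes = [classify_fg(v) for v in daily_fg]
    return {
        "mean": mean(daily_fg),
        "sd": stdev(daily_fg),  # intraperson day-to-day SD
        "first_class": classes[0],
        "worst_class": max(classes, key=SEVERITY.index),
    }
```

For hypothetical readings [94, 101, 97, 92, 108, 99, 95], the mean is 98 mg/dL and the SD is about 5.4 mg/dL; the first reading is in the normal range, but two later mornings fall in the prediabetes range, so this person would be reclassified under the assumed rule.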
Woller A., Tamir Y., Bar A., Mayo A., Rein M., Godneva A., Cohen N. M., Segal E., Toledano Y., Shilo S., Gonze D. & Alon U.
(2024)
bioRxiv.
Prediabetes, a subclinical state of high glucose, carries a risk of transition to diabetes. One cause of prediabetes is insulin resistance, which impairs the ability of insulin to control blood glucose. However, many individuals with high insulin resistance retain normal glucose due to compensation by enhanced insulin secretion by beta cells. Individuals seem to differ in their maximum compensation level, termed beta cell carrying capacity, such that low carrying capacity is associated with a higher risk of prediabetes and diabetes. Carrying capacity has not been quantified using a mathematical model and cannot be estimated directly from measured glucose and insulin levels in patients, unlike insulin resistance and beta cell function, which can be estimated using the HOMA-IR and HOMA-B formulas. Here we present a mathematical model of beta cell compensation and carrying capacity, and develop a new formula called HOMA-C to estimate it from glucose and insulin measurements. HOMA-C estimates the maximal potential beta cell function of an individual, rather than the current beta cell function. We test this approach using longitudinal cohorts of prediabetic people, finding 10-fold variation in carrying capacity. Low carrying capacity is associated with higher risk of transitioning to diabetes. We estimate the timescales of beta cell compensation and insulin resistance using large datasets, showing that, unlike previous mathematical models, the new model can explain the slow rise in glucose over decades. Our mathematical understanding of beta cell carrying capacity may help to assess the risk of prediabetes in each individual. Competing interest statement: The authors have declared no competing interest.
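The HOMA-C formula itself is not given in the abstract, so it is not reproduced here. For context, the sketch below implements the widely used approximations of the two established indices it is contrasted with, HOMA-IR and HOMA-B (Matthews et al., 1985), from fasting glucose in mg/dL and fasting insulin in µU/mL. The constants 405, 360 and 63 are the mg/dL forms of the original 22.5, 20 and 3.5 mmol/L constants; the function names are illustrative.

```python
def homa_ir(glucose_mg_dl: float, insulin_uU_ml: float) -> float:
    """HOMA-IR approximation: glucose (mg/dL) x insulin (uU/mL) / 405."""
    return glucose_mg_dl * insulin_uU_ml / 405.0


def homa_b(glucose_mg_dl: float, insulin_uU_ml: float) -> float:
    """HOMA-B (%) approximation: 360 x insulin (uU/mL) / (glucose (mg/dL) - 63)."""
    if glucose_mg_dl <= 63.0:
        # The denominator is zero or negative, so the formula is not defined here.
        raise ValueError("HOMA-B is undefined for fasting glucose <= 63 mg/dL")
    return 360.0 * insulin_uU_ml / (glucose_mg_dl - 63.0)
```

For example, a fasting glucose of 90 mg/dL with fasting insulin of 9 µU/mL gives HOMA-IR = 2.0 and HOMA-B = 120%. Both indices describe the current state of an individual; per the abstract, HOMA-C is intended to capture the maximal potential beta cell function instead.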
Shilo S. & Segal E.
(2024)
Nature Reviews Endocrinology.
20,
p. 73-74
Over the past decade, technological advances have enabled cost-efficient, high-throughput analysis of different types of omics data in large human cohorts. Here, we explore insights into the pathophysiology of metabolic disorders revealed through multi-omics studies, discuss novel computational analysis techniques and look at the field's future directions.
Routy B., Jackson T., Mählmann L., Baumgartner C. K., Blaser M., Byrd A., Corvaia N., Couts K., Davar D., Derosa L., Hang H. C., Hospers G., Isaksen M., Kroemer G., Malard F., McCoy K. D., Meisel M., Pal S., Ronai Z., Segal E., Sepich-Poore G. D., Shaikh F., Sweis R. F., Trinchieri G., van den Brink M., Weersma R. K., Whiteson K., Zhao L., McQuade J., Zarour H. & Zitvogel L.
(2024)
Cancer Cell.
42,
1,
p. 16-34
Over the last decade, the composition of the gut microbiota has been found to correlate with the outcomes of cancer patients treated with immunotherapy. Accumulating evidence points to the various mechanisms by which intestinal bacteria act on distal tumors and how to harness this complex ecosystem to circumvent primary resistance to immune checkpoint inhibitors. Here, we review the state of the microbiota field in the context of melanoma, the recent breakthroughs in defining microbial modes of action, and how to modulate the microbiota to enhance response to cancer immunotherapy. The host-microbe interaction may be deciphered by the use of "omics" technologies, and will guide patient stratification and the development of microbiota-centered interventions. Efforts needed to advance the field and current gaps of knowledge are also discussed.
Levine Z., Kalka I., Kolobkov D., Rossman H., Godneva A., Shilo S., Keshet A., Weissglas-Volkov D., Shor T., Diament A., Talmor-Barkan Y., Aviv Y., Sharon T., Weinberger A. & Segal E.
(2024)
Med.
5,
1,
p. 90-101.e4
Background: Genome-wide association studies (GWASs) associate phenotypes and genetic variants across a study cohort. GWASs require large-scale cohorts with both phenotype and genetic sequencing data, limiting studied phenotypes. The Human Phenotype Project is a longitudinal study that has measured a wide range of clinical and biomolecular features from a self-assignment cohort over 5 years. The phenotypes collected are quantitative traits, providing higher-resolution insights into the genetics of complex phenotypes. Methods: We present the results of GWASs and polygenic risk score phenome-wide association studies with 729 clinical phenotypes and 4,043 molecular features from the Human Phenotype Project. This includes clinical traits that have not been previously associated with genetics, including measures from continuous sleep monitoring, continuous glucose monitoring, liver ultrasound, hormonal status, and fundus imaging. Findings: In GWAS of 8,706 individuals, we found significant associations between 169 clinical traits and 1,184 single-nucleotide polymorphisms. We found genes associated with both glycemic control and mental disorders, and we quantify the strength of genetic signals in serum metabolites. In polygenic risk score phenome-wide association studies for clinical traits, we found 16,047 significant associations. Conclusions: The entire set of findings, which we disseminate publicly, provides newfound resolution into the genetic architecture of complex human phenotypes. Funding: E.S. is supported by the Minerva foundation with funding from the Federal German Ministry for Education and Research and by the European Research Council and the Israel Science Foundation.
Zahavi L., Lavon A., Reicher L., Shoer S., Godneva A., Leviatan S., Rein M., Weissbrod O., Weinberger A. & Segal E.
(2023)
Nature Medicine.
29,
11,
p. 2785-2792
Genome-wide association studies (GWASs) have provided numerous associations between human single-nucleotide polymorphisms (SNPs) and health traits. Likewise, metagenome-wide association studies (MWASs) between bacterial SNPs and human traits can suggest mechanistic links, but very few such studies have been done thus far. In this study, we devised an MWAS framework to detect SNPs and associate them with host phenotypes systematically. We recruited and obtained gut metagenomic samples from a cohort of 7,190 healthy individuals and discovered 1,358 statistically significant associations between a bacterial SNP and host body mass index (BMI), from which we distilled 40 independent associations. Most of these associations were unexplained by diet, medications or physical exercise, and 17 replicated in a geographically independent cohort. We uncovered BMI-associated SNPs in 27 bacterial species, and 12 of them showed no association by standard relative abundance analysis. We revealed a BMI association of an SNP in a potentially inflammatory pathway of Bilophila wadsworthia as well as of a group of SNPs in a region coding for energy metabolism functions in a Faecalibacterium prausnitzii genome. Our results demonstrate the importance of considering nucleotide-level diversity in microbiome studies and pave the way toward improved understanding of interpersonal microbiome differences and their potential health implications.
Shoer S., Shilo S., Godneva A., Ben-Yacov O., Rein M., Wolf B. C., Lotan-Pompan M., Bar N., Weiss E. I., Houri-Haddad Y., Pilpel Y., Weinberger A. & Segal E.
(2023)
Nature Communications.
14,
5384.
Diabetes and associated comorbidities are a global health threat on the rise. We conducted a six-month dietary intervention in pre-diabetic individuals (NCT03222791), to mitigate the hyperglycemia and enhance metabolic health. The current work explores early diabetes markers in the 200 individuals who completed the trial. We find 166 of 2,803 measured features, including oral and gut microbial species and pathways, serum metabolites and cytokines, show significant change in response to a personalized postprandial glucose-targeting diet or the standard of care Mediterranean diet. These changes include established markers of hyperglycemia as well as novel features that can now be investigated as potential therapeutic targets. Our results indicate the microbiome mediates the effect of diet on glycemic, metabolic and immune measurements, with gut microbiome compositional change explaining 12.25% of serum metabolite variance. Although the gut microbiome displays greater compositional changes compared to the oral microbiome, the oral microbiome demonstrates more changes at the genetic level, with trends dependent on environmental richness and species prevalence in the population. In conclusion, our study shows dietary interventions can affect the microbiome, cardiometabolic profile and immune response of the host, and that these factors are well associated with each other, and can be harnessed for new therapeutic modalities.
Kharmats A. Y., Popp C., Hu L., Berube L., Curran M., Wang C., Pompeii M. L., Li H., Bergman M., St-Jules D. E., Segal E., Schoenthaler A., Williams N., Schmidt A. M., Barua S. & Sevick M. A.
(2023)
American Journal of Clinical Nutrition.
118,
2,
p. 443-451
Background: Recent studies have demonstrated considerable interindividual variability in postprandial glucose response (PPGR) to the same foods, suggesting the need for more precise methods for predicting and controlling PPGR. In the Personal Nutrition Project, the investigators tested a precision nutrition algorithm for predicting an individual's PPGR. Objective: This study aimed to compare changes in glycemic variability (GV) and HbA1c in 2 calorie-restricted weight loss diets in adults with prediabetes or moderately controlled type 2 diabetes (T2D), which were tertiary outcomes of the Personal Diet Study. Methods: The Personal Diet Study was a randomized clinical trial to compare a 1-size-fits-all low-fat diet (hereafter, standardized) with a personalized diet (hereafter, personalized). Both groups received behavioral weight loss counseling and were instructed to self-monitor diets using a smartphone application. The personalized arm received personalized feedback through the application to reduce their PPGR. Continuous glucose monitoring (CGM) data were collected at baseline, 3 mo and 6 mo. Changes in mean amplitude of glycemic excursions (MAGEs) and HbA1c at 6 mo were assessed. We performed an intention-to-treat analysis using linear mixed regressions. Results: We included 156 participants [66.5% women, 55.7% White, 24.1% Black, mean age 59.1 y (standard deviation (SD) = 10.7 y)] in these analyses (standardized = 75, personalized = 81). MAGE decreased by 0.83 mg/dL per month for standardized (95% CI: 0.21, 1.46 mg/dL; P = 0.009) and 0.79 mg/dL per month for personalized (95% CI: 0.19, 1.39 mg/dL; P = 0.010) diet, with no between-group differences (P = 0.92). Trends were similar for HbA1c values. Conclusions: Personalized diet did not result in an increased reduction in GV or HbA1c in patients with prediabetes and moderately controlled T2D, compared with a standardized diet. 
Additional subgroup analyses may help to identify patients who are more likely to benefit from this personalized intervention. This trial was registered at clinicaltrials.gov as NCT03336411.
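Both the DiaTeleMed and Personal Diet Study abstracts use MAGE as a glycemic variability outcome. As a rough illustration only, the sketch below computes a simplified MAGE: it finds the peaks and nadirs of a CGM trace, keeps the swings between consecutive turning points that exceed one standard deviation of the trace, and averages them. This is an assumption-laden simplification: the classic definition counts excursions in a single direction, whereas this version averages rising and falling swings alike and uses the population SD. Validated CGM software may differ, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def turning_points(glucose: list[float]) -> list[float]:
    """Return the local peaks and nadirs of a non-empty trace, plus both endpoints."""
    # Collapse flat runs so that plateaus do not hide a change in direction.
    values = [glucose[0]]
    for g in glucose[1:]:
        if g != values[-1]:
            values.append(g)
    if len(values) < 3:
        return values
    points = [values[0]]
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        # A sign change in the slope marks a peak or a nadir.
        if (cur - prev) * (nxt - cur) < 0:
            points.append(cur)
    points.append(values[-1])
    return points


def mage_simplified(glucose: list[float]) -> float:
    """Mean amplitude of swings between turning points that exceed 1 SD.

    Simplified: counts both rising and falling excursions, unlike the
    original one-direction definition.
    """
    sd = pstdev(glucose)
    pts = turning_points(glucose)
    amplitudes = [abs(b - a) for a, b in zip(pts, pts[1:])]
    qualifying = [amp for amp in amplitudes if amp > sd]
    return mean(qualifying) if qualifying else 0.0
```

On a hypothetical trace [100, 102, 101, 150, 149, 100] (mg/dL), the 1-2 mg/dL wiggles fall below the SD (about 23 mg/dL) and are ignored, while the 49 and 50 mg/dL swings are averaged to 49.5 mg/dL. The SD threshold is what separates meaningful excursions from sensor noise.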
Ben-Yacov O., Godneva A., Rein M., Shilo S., Lotan-Pompan M., Weinberger A. & Segal E.
(2023)
Gut.
72,
8,
p. 1486-1496
Objective: To explore the interplay between dietary modifications, microbiome composition and host metabolic responses in a dietary intervention setting of a personalised postprandial-targeting (PPT) diet versus a Mediterranean (MED) diet in pre-diabetes. Design: In a 6-month dietary intervention, adults with pre-diabetes were randomly assigned to follow an MED or PPT diet (based on a machine-learning algorithm for predicting postprandial glucose responses). Data collected at baseline and 6 months from 200 participants who completed the intervention included: dietary data from self-recorded logging using a smartphone application, gut microbiome data from shotgun metagenomics sequencing of faecal samples, and clinical data from continuous glucose monitoring, blood biomarkers and anthropometrics. Results: PPT diet induced more prominent changes to the gut microbiome composition, compared with MED diet, consistent with overall greater dietary modifications observed. Particularly, microbiome alpha-diversity increased significantly in PPT (p=0.007) but not in MED arm (p=0.18). Post hoc analysis of changes in multiple dietary features, including food-categories, nutrients and PPT-adherence score across the cohort, demonstrated significant associations between specific dietary changes and species-level changes in microbiome composition. Furthermore, using causal mediation analysis we detect nine microbial species that partially mediate the association between specific dietary changes and clinical outcomes, including three species (from Bacteroidales, Lachnospiraceae, Oscillospirales orders) that mediate the association between PPT-adherence score and clinical outcomes of hemoglobin A1c (HbA1c), high-density lipoprotein cholesterol (HDL-C) and triglycerides. 
Finally, using machine-learning models trained on dietary changes and baseline clinical data, we predict personalised metabolic responses to dietary modifications and assess feature importance for clinical improvement in cardiometabolic markers of blood lipids, glycaemic control and body weight. Conclusions: Our findings support the role of gut microbiome in modulating the effects of dietary modifications on cardiometabolic outcomes, and advance the concept of precision nutrition strategies for reducing comorbidities in pre-diabetes. Trial registration number: NCT03222791.
Bourgonje A. R., Andreu-Sánchez S., Vogl T., Hu S., Vich Vila A., Gacesa R., Leviatan S., Kurilshikov A., Klompus S., Kalka I. N., van Dullemen H. M., Weinberger A., Visschedijk M. C., Festen E. A., Faber K. N., Wijmenga C., Dijkstra G., Segal E., Fu J., Zhernakova A. & Weersma R. K.
(2023)
Immunity.
56,
6,
p. 1393-1409.e6
Inflammatory bowel diseases (IBDs), e.g., Crohn's disease (CD) and ulcerative colitis (UC), are chronic immune-mediated inflammatory diseases. A comprehensive overview of an IBD-specific antibody epitope repertoire is, however, lacking. Using high-throughput phage-display immunoprecipitation sequencing (PhIP-Seq), we identified antibodies against 344,000 antimicrobial, immune, and food antigens in 497 individuals with IBD compared with 1,326 controls. IBD was characterized by 373 differentially abundant antibody responses (202 overrepresented and 171 underrepresented), with 17% shared by both IBDs, 55% unique to CD, and 28% unique to UC. Antibody reactivities against bacterial flagellins dominated in CD and were associated with ileal involvement, fibrostenotic disease, and anti-Saccharomyces cerevisiae antibody positivity, but not with fecal microbiome composition. Antibody epitope repertoires accurately discriminated CD from controls (area under the curve [AUC] = 0.89), and similar discrimination was achieved when using only ten antibodies (AUC = 0.87). Individuals with IBD thus show a distinct antibody repertoire against selected peptides, allowing clinical stratification and discovery of immunological targets.
Andreu-Sánchez S., Bourgonje A. R., Vogl T., Kurilshikov A., Leviatan S., Ruiz-Moreno A. J., Hu S., Sinha T., Vich Vila A., Klompus S., Kalka I. N., de Leeuw K., Arends S., Jonkers I., Withoff S., Brouwer E., Weinberger A., Wijmenga C., Segal E. & Weersma R. K.
(2023)
Immunity.
56,
6,
p. 1376-1392
Phage-displayed immunoprecipitation sequencing (PhIP-seq) has enabled high-throughput profiling of human antibody repertoires. However, a comprehensive overview of environmental and genetic determinants shaping human adaptive immunity is lacking. In this study, we investigated the effects of genetic, environmental, and intrinsic factors on the variation in human antibody repertoires. We characterized serological antibody repertoires against 344,000 peptides using PhIP-seq libraries from a wide range of microbial and environmental antigens in 1,443 participants from a population cohort. We detected individual-specificity, temporal consistency, and co-housing similarities in antibody repertoires. Genetic analyses showed the involvement of the HLA, IGHV, and FUT2 gene regions in antibody-bound peptide reactivity. Furthermore, we uncovered associations between phenotypic factors (including age, cell counts, sex, smoking behavior, and allergies, among others) and particular antibody-bound peptides. Our results indicate that human antibody epitope repertoires are shaped by both genetics and environmental exposures and highlight specific signatures of distinct phenotypes and genotypes.
Despite its rising prevalence, diabetes diagnosis still relies on measures from blood tests. Technological advances in continuous glucose monitoring (CGM) devices introduce a potential tool to expand our understanding of glucose control and variability in people with and without diabetes. Yet CGM data have not been characterized in large-scale healthy cohorts, creating a lack of reference for CGM data research. Here we present CGMap, a characterization of CGM data collected from over 7,000 non-diabetic individuals, aged 40-70 years, between 2019 and 2022. We provide reference values of key CGM-derived clinical measures that can serve as a tool for future CGM research. We further explored the relationship between CGM-derived measures and diabetes-related clinical parameters, uncovering several significant relationships, including associations of mean blood glucose with measures from fundus imaging and sleep monitoring. These findings offer novel research directions for understanding the influence of glucose levels on various aspects of human health.
Cardiometabolic diseases are a major public-health concern owing to their increasing prevalence worldwide. These diseases are characterized by a high degree of interindividual variability with regard to symptoms, severity, complications and treatment responsiveness. Recent technological advances, and the growing availability of wearable and digital devices, are now making it feasible to profile individuals in ever-increasing depth. Such technologies are able to profile multiple health-related outcomes, including molecular, clinical and lifestyle changes. Nowadays, wearable devices allowing for continuous and longitudinal health screening outside the clinic can be used to monitor health and metabolic status from healthy individuals to patients at different stages of disease. Here we present an overview of the wearable and digital devices that are most relevant for cardiometabolic-disease-related readouts, and how the information collected from such devices could help deepen our understanding of metabolic diseases, improve their diagnosis, identify early disease markers and contribute to individualization of treatment and prevention plans.
Kennedy K. M., de Goffau M. C., Perez-Muñoz M. E., Arrieta M., Bäckhed F., Bork P., Braun T., Bushman F. D., Dore J., de Vos W. M., Earl A. M., Eisen J. A., Elovitz M. A., Ganal-Vonarburg S. C., Gänzle M. G., Garrett W. S., Hall L. J., Hornef M. W., Huttenhower C., Konnikova L., Lebeer S., Macpherson A. J., Massey R. C., McHardy A. C., Koren O., Lawley T. D., Ley R. E., O'Mahony L., O'Toole P. W., Pamer E. G., Parkhill J., Raes J., Rattei T., Salonen A., Segal E., Segata N., Shanahan F., Sloboda D. M., Smith G. C. S., Sokol H., Spector T. D., Surette M. G., Tannock G. W., Walker A. W., Yassour M. & Walter J.
(2023)
Nature.
613,
7945,
p. 639-649
Whether the human fetus and the prenatal intrauterine environment (amniotic fluid and placenta) are stably colonized by microbial communities in a healthy pregnancy remains a subject of debate. Here we evaluate recent studies that characterized microbial populations in human fetuses from the perspectives of reproductive biology, microbial ecology, bioinformatics, immunology, clinical microbiology and gnotobiology, and assess possible mechanisms by which the fetus might interact with microorganisms. Our analysis indicates that the detected microbial signals are likely the result of contamination during the clinical procedures to obtain fetal samples or during DNA extraction and DNA sequencing. Furthermore, the existence of live and replicating microbial populations in healthy fetal tissues is not compatible with fundamental concepts of immunology, clinical microbiology and the derivation of germ-free mammals. These conclusions are important to our understanding of human immune development and illustrate common pitfalls in the microbial analyses of many other low-biomass environments. The pursuit of a fetal microbiome serves as a cautionary example of the challenges of sequence-based microbiome studies when biomass is low or absent, and emphasizes the need for a trans-disciplinary approach that goes beyond contamination controls by also incorporating biological, ecological and mechanistic concepts.
Talmor-Barkan Y., Yacovzada N., Rossman H., Witberg G., Kalka I., Kornowski R. & Segal E.
(2023)
European Heart Journal. Cardiovascular Pharmacotherapy.
9,
1,
p. 26-37
The advantages of direct oral anticoagulants (DOACs) over warfarin are well established in atrial fibrillation (AF) patients; however, studies that can guide the selection between different DOACs are limited. The aim was to compare the clinical outcomes of treatment with apixaban, rivaroxaban, and dabigatran in patients with AF.
We conducted a retrospective, nationwide, propensity score-matched observational study using data from Clalit Health Services. Data from 141,992 individuals with AF were used to emulate a target trial for head-to-head comparison of DOAC therapies. Three matched cohorts of patients assigned to DOACs from January 2014 through January 2020 were created. One-to-one propensity score matching was performed. Efficacy and safety outcomes were compared using Kaplan-Meier survival estimates and Cox proportional hazards models. The trial included 56,553 patients (apixaban, n = 35,101; rivaroxaban, n = 15,682; dabigatran, n = 5,770). Mortality and ischaemic stroke rates in patients treated with rivaroxaban were lower than in those treated with apixaban (HR 0.88; 95% CI 0.78-0.99; P = 0.037 and HR 0.92; 95% CI 0.86-0.99; P = 0.024, respectively). No significant differences in the rates of myocardial infarction, systemic embolism, and overall bleeding were observed between the DOAC groups. Patients treated with rivaroxaban had a lower rate of intracranial haemorrhage than those treated with apixaban (HR 0.86; 95% CI 0.74-1.0; P = 0.044). The rate of gastrointestinal bleeding was higher in patients treated with rivaroxaban than in those treated with apixaban (HR 1.22; 95% CI 1.01-1.44; P = 0.016).
We demonstrated significant differences in outcomes between the three studied DOACs. The results emphasize the need for randomized controlled trials that will compare rivaroxaban, apixaban, and dabigatran in order to better guide the selection among them.
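The abstract reports one-to-one propensity score matching without specifying the procedure. One common variant is greedy nearest-neighbour matching without replacement within a caliper. The sketch below assumes the propensity scores have already been estimated (for instance, by a logistic regression on baseline covariates) and uses hypothetical patient IDs, so it illustrates the technique rather than the study's code.

```python
def greedy_ps_match(
    treated: dict[str, float],
    control: dict[str, float],
    caliper: float,
) -> list[tuple[str, str]]:
    """1:1 greedy nearest-neighbour propensity score matching.

    `treated` and `control` map (hypothetical) patient IDs to already
    estimated propensity scores. Each control is used at most once, and a
    pair is formed only if the score difference is within `caliper`.
    """
    available = dict(control)  # copy so the caller's dict is left untouched
    pairs: list[tuple[str, str]] = []
    # Process treated patients in ascending score order for determinism.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs
```

Treated patients with no control inside the caliper are simply left unmatched. This is why matched cohorts, like the 56,553 patients reported here, are smaller than the source population. The greedy order makes results depend on processing order. Optimal matching avoids that at a higher computational cost.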
Lee B. Y., Ordovás J. M., Parks E. J., Anderson C. A., Barabási A. L., Clinton S. K., de la Haye K., Duffy V. B., Franks P. W., Ginexi E. M., Hammond K. J., Hanlon E. C., Hittle M., Ho E., Horn A. L., Isaacson R. S., Mabry P. L., Malone S., Martin C. K., Mattei J., Meydani S. N., Nelson L. M., Neuhouser M. L., Parent B., Pronk N. P., Roche H. M., Saria S., Scheer F. A., Segal E., Sevick M. A., Spector T. D., Van Horn L., Varady K. A., Voruganti V. S. & Martinez M. F.
(2022)
The American journal of clinical nutrition.
116,
6,
p. 1877-1900
Precision nutrition is an emerging concept that aims to develop nutrition recommendations tailored to different people's circumstances and biological characteristics. Responses to dietary change and the resulting health outcomes from consuming different diets may vary significantly between people based on interactions between their genetic backgrounds, physiology, microbiome, underlying health status, behaviors, social influences, and environmental exposures. On 11-12 January 2021, the National Institutes of Health convened a workshop entitled "Precision Nutrition: Research Gaps and Opportunities" to bring together experts to discuss the issues involved in better understanding and addressing precision nutrition. The workshop proceeded in 3 parts: part I covered many aspects of genetics and physiology that mediate the links between nutrient intake and health conditions such as cardiovascular disease, Alzheimer disease, and cancer; part II reviewed potential contributors to interindividual variability in dietary exposures and responses such as baseline nutritional status, circadian rhythm/sleep, environmental exposures, sensory properties of food, stress, inflammation, and the social determinants of health; part III presented the need for systems approaches, with new methods and technologies that can facilitate the study and implementation of precision nutrition, and workforce development needed to create a new generation of researchers. The workshop concluded that much research will be needed before more precise nutrition recommendations can be achieved. This includes better understanding and accounting for variables such as age, sex, ethnicity, medical history, genetics, and social and environmental factors. The advent of new methods and technologies and the availability of considerably more data bring tremendous opportunity. 
However, the field must proceed with appropriate levels of caution and make sure the factors listed above are all considered, and systems approaches and methods are incorporated. It will be important to develop and train an expanded workforce with the goal of reducing health disparities and improving precision nutritional advice for all Americans.
Although food-directed immunoglobulin E (IgE) has been studied in the context of allergies, the prevalence and magnitude of IgG responses against dietary antigens in the general population are incompletely characterized. Here, we measured IgG binding against food and environmental antigens obtained from allergen databases and the Immune Epitope Database (IEDB), represented in a phage-displayed library of 58,233 peptides. By profiling blood samples from a large cohort representing the average adult Israeli population (n = 1,003), we showed that many food antigens elicited systemic IgG in up to 50% of individuals. Dietary intake of specific food proteins correlated with antibody binding, suggesting that diet can shape the IgG epitope repertoire. Our work documents abundant systemic IgG responses against food antigens, provides a population-scale reference map of the exact immunogenic epitopes, and lays the foundation for unravelling the role of antibody binding directed at food and environmental antigens in disease contexts.
Rein M. S., Dadiani M., Godneva A., Bakalenik-Gavry M., Morzaev-Sulzbach D., Vachnish Y., Kolobkov D., Lotan-Pompan M., Weinberger A., Segal E. & Gal-Yam E. N.
(2022)
BMJ Open.
12,
11,
e062498.
Introduction: Breast cancer survivors treated with adjuvant endocrine therapy commonly experience weight gain, which has been associated with low adherence to therapy and worse breast cancer prognosis. We aim to assess whether a personalised diet targeting postprandial glucose is more beneficial for weight management than the recommended Mediterranean diet in this patient population. Methods and analysis: The BREAst Cancer Personalised NuTrition study is a phase-2 randomised trial in patients with hormone receptor-positive breast cancer treated with adjuvant endocrine therapy. The study objective is to assess whether a dietary intervention intended to improve the postprandial glycaemic response to meals results in better weight and glycaemic control in this population than the standard recommended Mediterranean diet. Consenting participants will be assigned in a single-blinded fashion to one of two dietary arms (Mediterranean diet or an algorithm-based personalised diet). They will be asked to provide a stool sample for microbiome analysis and will undergo continuous glucose monitoring for 2 weeks at the start and at the end of the intervention period. Microbiome composition data will be used to tailor personal dietary recommendations. After randomisation and provision of dietary recommendations, participants will be asked to log their diet and lifestyle activities continuously on a designated smartphone application during the 6-month intervention period, during which they will be monitored monthly by a certified dietitian. Participants' clinical records will be followed twice yearly for 5 years for treatment adherence, disease-free survival and recurrence.
Leviatan S., Kalka I. N., Vogl T., Klompas S., Weinberger A. & Segal E.
(2022)
PLoS Computational Biology.
18,
11,
e1010663.
BIPS (Build Phage ImmunoPrecipitation Sequencing library) is a software tool that converts a list of proteins into a custom DNA oligonucleotide library for the PhIP-Seq system. The tool creates constant-length oligonucleotides with internal barcodes while maintaining the original length of each peptide. This allows the use of large libraries of hundreds of thousands of oligonucleotides while saving on sequencing costs and maintaining accurate identification of oligonucleotide reads. BIPS is available under the GNU public license at https://github.com/kalkairis/BuildPhIPSeqLibrary.
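BIPS keeps every oligonucleotide the same length while preserving each peptide's own length. One plausible way to achieve this, sketched here under stated assumptions, is to reverse-translate the peptide, end it with a stop codon, and pad the remainder with non-coding filler. The one-codon-per-amino-acid table, the "GC" filler, and the function names are hypothetical; BIPS's internal barcodes and actual codon choices are not reproduced here.

```python
# Hypothetical codon table: one fixed codon per amino acid (not BIPS's actual choice).
CODON = {
    "A": "GCT", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTT",
}
STOP = "TAA"


def encode_peptide(peptide: str, oligo_len: int, filler: str = "GC") -> str:
    """Reverse-translate a peptide, add a stop codon, and pad to a constant oligo length."""
    coding = "".join(CODON[aa] for aa in peptide) + STOP
    if len(coding) > oligo_len:
        raise ValueError("peptide too long for the requested oligo length")
    # Filler after the stop codon is never translated, so the peptide keeps its length.
    padding = (filler * oligo_len)[: oligo_len - len(coding)]
    return coding + padding


def translate(oligo: str) -> str:
    """Translate codon by codon until the first stop codon (inverse of the table above)."""
    back = {codon: aa for aa, codon in CODON.items()}
    peptide = []
    for i in range(0, len(oligo) - 2, 3):
        codon = oligo[i:i + 3]
        if codon == STOP:
            break
        peptide.append(back[codon])
    return "".join(peptide)


print(encode_peptide("MK", 12))  # ATGAAATAAGCG
```

Because translation halts at the stop codon, the filler never reaches the displayed peptide, so a 2-residue and a 9-residue peptide can share one oligo length.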
Mompeo O., Freidin M. B., Gibson R., Hysi P. G., Christofidou P., Segal E., Valdes A. M., Spector T. D., Menni C. & Mangino M.
(2022)
Nutrients.
14,
20,
4431.
Diet is a modifiable risk factor for common chronic diseases and mental health disorders, and its effects are under partial genetic control. To estimate the impact of diet on individual health, most epidemiological and genetic studies have focused on individual aspects of dietary intake. However, analysing individual food groups in isolation does not capture the complexity of the whole dietary pattern. Dietary indices enable a holistic estimation of diet and account for the intercorrelations between foods and nutrients. In this study, we performed the first genome-wide association study (GWA) of the Dietary Approaches to Stop Hypertension (DASH) diet, including 173,701 individuals from the UK Biobank, to identify genetic variants associated with it. The DASH score was calculated from the 24-h recall questionnaire collected by UK Biobank. The GWA was performed using a linear mixed model implemented in BOLT-LMM. We identified seven independent single-nucleotide polymorphisms (SNPs) associated with DASH. Significant genetic correlations were observed between DASH and several educational traits, with significant enrichment for genes involved in AMP-activated protein kinase (AMPK) activation, which controls appetite by regulating signalling in the hypothalamus. The colocalization analysis implicates genes involved in body mass index (BMI)/obesity and neuroticism (ARPP21, RP11-62H7.2, MFHAS1, RHEBL1). The Mendelian randomisation analysis suggested that a higher DASH score, which reflects a healthy dietary pattern, causally lowers glucose and insulin levels. These findings further our knowledge of the pathways underlying the relationship between diet and health outcomes. They may have significant implications for global public health and may inform future dietary recommendations for the prevention of common chronic diseases.
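The abstract does not say which Mendelian randomisation estimator was used. As an assumption for illustration only, the sketch below computes the fixed-effect inverse-variance-weighted (IVW) estimate, a common default that pools per-SNP Wald ratios. The inputs are per-SNP effects on the exposure (here, the DASH score) and on the outcome (e.g., glucose), plus the outcome standard errors; the function name and toy numbers are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate and its standard error.

    estimate = sum(bx * by / se^2) / sum(bx^2 / se^2)
    se       = sqrt(1 / sum(bx^2 / se^2))
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2          # inverse variance of the SNP-outcome effect
        numerator += weight * bx * by
        denominator += weight * bx ** 2
    estimate = numerator / denominator
    standard_error = (1.0 / denominator) ** 0.5
    return estimate, standard_error


# Toy SNPs whose outcome effects are exactly twice their exposure effects.
print(ivw_estimate([0.1, 0.2], [0.2, 0.4], [0.1, 0.1]))
```

A negative estimate for an outcome such as glucose would correspond to the direction reported above (higher DASH score, lower glucose); the toy example only checks that a consistent 2:1 ratio is recovered.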
Keshet A., Rossman H., Shilo S., Barbash-Hazan S., Amit G., Bivas-Benita M., Yanover C., Girshovitz I., Akiva P., Ben-Haroush A., Hadar E., Wiznitzer A. & Segal E.
(2022)
PLoS ONE.
17,
10,
e0268103.
Assessing the impact of cesarean delivery (CD) on long-term childhood outcomes is challenging, as conducting a randomized controlled trial is rarely feasible and inferring the effect from observational data may be confounded. Using data from electronic health records of 737,904 births, we defined and emulated a target trial to estimate the effect of CD on predefined long-term pediatric outcomes. Causal effects were estimated using pooled logistic regression and standardized survival curves, leveraging the breadth of the data to account for potential confounders. Diverse sensitivity analyses were performed, including replication of the results in an external validation set from the UK comprising 625,044 births. Children born by CD had an increased risk of developing asthma (10-year risk difference (95% CI) 0.64% (0.31, 0.98)), an average treatment effect of 0.10 (0.07, 0.12) on body mass index (BMI) z-scores at age 5 years, and an average treatment effect of 0.92 (0.68, 1.14) on the number of respiratory infection events until 5 years of age. A positive 10-year risk difference was also observed for atopy (0.74% (-0.06, 1.52)) and allergy (0.47% (-0.32, 1.28)). Increased risk for these outcomes was also observed in the UK cohort. Our findings add to a growing body of evidence on the long-term effects of CD on pediatric morbidity, may assist in the decision to perform CD when not medically indicated, and pave the way for future research on the mechanisms underlying these effects and on intervention strategies targeting them.
Popp C. J., Hu L., Kharmats A. Y., Curran M., Berube L., Wang C., Pompeii M. L., Illiano P., St-Jules D. E., Mottern M., Li H., Williams N., Schoenthaler A., Segal E., Godneva A., Thomas D., Bergman M., Schmidt A. M. & Sevick M. A.
(2022)
JAMA network open.
5,
9,
Importance: Interindividual variability in postprandial glycemic response (PPGR) to the same foods may explain why low glycemic index or load and low-carbohydrate diet interventions have mixed weight loss outcomes. A precision nutrition approach that estimates personalized PPGR to specific foods may be more efficacious for weight loss. Objective: To compare a standardized low-fat vs a personalized diet regarding percentage of weight loss in adults with abnormal glucose metabolism and obesity. Design, Setting, and Participants: The Personal Diet Study was a single-center, population-based, 6-month randomized clinical trial with measurements at baseline (0 months) and 3 and 6 months conducted from February 12, 2018, to October 28, 2021. A total of 269 adults aged 18 to 80 years with a body mass index (calculated as weight in kilograms divided by height in meters squared) ranging from 27 to 50 and a hemoglobin A1c level ranging from 5.7% to 8.0% were recruited. Individuals were excluded if receiving medications other than metformin or with evidence of kidney disease, assessed as an estimated glomerular filtration rate of less than 60 mL/min/1.73 m² using the Chronic Kidney Disease Epidemiology Collaboration equation, to avoid recruiting patients with advanced type 2 diabetes. Interventions: Participants were randomized to either a low-fat diet (
Vogl T., Kalka I. N., Klompus S., Leviatan S., Weinberger A. & Segal E.
(2022)
Science advances.
8,
38,
eabq2422.
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a debilitating disease with an unclear etiology and pathogenesis. Both immune system involvement and gut microbiota dysbiosis have been implicated in its pathophysiology. However, potential interactions between adaptive immune responses and the microbiota in ME/CFS have been incompletely characterized. Here, we profiled antibody responses of patients with severe ME/CFS and healthy controls against microbiota and viral antigens represented in a phage-displayed library of 244,000 variants. Patients with severe ME/CFS exhibited distinct serum antibody epitope repertoires against flagellins of Lachnospiraceae bacteria. Training machine learning algorithms on these antibody-binding data demonstrated that immune responses against gut microbiota represent a unique layer of information beyond standard blood tests, providing improved molecular diagnostics for ME/CFS. Together, our results point toward an involvement of the microbiota-immune axis in ME/CFS and lay the foundation for comparative studies with inflammatory bowel diseases and illnesses characterized by long-term fatigue symptoms, including post-COVID-19 syndrome.
Suez J., Cohen Y., Valdés-Mas R., Mor U., Dori-Bachash M., Federici S., Zmora N., Leshem A., Heinemann M., Linevsky R., Zur M., Ben-Zeev Brik R., Bukimer A., Eliyahu-Miller S., Metz A., Fischbein R., Sharov O., Malitsky S., Itkin M., Stettner N., Harmelin A., Shapiro H., Stein-Thoeringer C. K., Segal E. & Elinav E.
(2022)
Cell.
185,
18,
p. 3307-3328.e19
Non-nutritive sweeteners (NNS) are commonly integrated into human diet and presumed to be inert; however, animal studies suggest that they may impact the microbiome and downstream glycemic responses. We causally assessed NNS impacts in humans and their microbiomes in a randomized-controlled trial encompassing 120 healthy adults, administered saccharin, sucralose, aspartame, and stevia sachets for 2 weeks in doses lower than the acceptable daily intake, compared with controls receiving sachet-contained vehicle glucose or no supplement. As groups, each administered NNS distinctly altered stool and oral microbiome and plasma metabolome, whereas saccharin and sucralose significantly impaired glycemic responses. Importantly, gnotobiotic mice conventionalized with microbiomes from multiple top and bottom responders of each of the four NNS-supplemented groups featured glycemic responses largely reflecting those noted in respective human donors, which were preempted by distinct microbial signals, as exemplified by sucralose. Collectively, human NNS consumption may induce person-specific, microbiome-dependent glycemic alterations, necessitating future assessment of clinical implications.
Craddock H. A., Godneva A., Rothschild D., Motro Y., Grinstein D., Lotem-Michaeli Y., Narkiss T., Segal E. & Moran-Gilad J.
(2022)
npj Biofilms and Microbiomes.
8,
66.
Dogs have a key role in law enforcement and military work, and research with the goal of improving working dog performance is ongoing. While there have been intriguing studies in laboratory animal models showing a potential connection between the gut microbiome and behavior or mental health, there is a dearth of studies investigating the microbiome-behavior relationship in working dogs. The overall objective of this study was to characterize the microbiota of working dogs and to determine whether the composition of the microbiota is associated with behavioral and performance outcomes. Freshly passed stool samples from each working dog (total n = 134) were collected and subjected to shotgun metagenomic sequencing using Illumina technology. Behavioral, performance, and demographic metadata were collected. Descriptive statistics and prediction models of behavioral/phenotypic outcomes using gradient boosting classification based on XGBoost were used to study associations between the microbiome and outcomes. Regarding the machine learning methodology, only microbiome features were used for training, and predictors were estimated in cross-validation. Microbiome markers were statistically associated with motivation, aggression, cowardice/hesitation, sociability, obedience to one trainer vs many, and body condition score (BCS). When prediction models were developed based on machine learning, moderate predictive power was observed for motivation, sociability, and gastrointestinal issues. Findings from this study suggest potential gut microbiome markers of performance and could help advance the care of working dogs.
Adequate functioning of the intestinal barrier is required in order to repel invading pathogens while tolerating commensal microbiota and self-antigens. Inflammatory bowel diseases (IBDs), encompassing Crohn's disease (CD) and ulcerative colitis (UC), are characterized by disrupted intestinal barrier integrity, resulting in excessive passage of luminal antigens and the activation of aberrant immune responses against otherwise unexposed antigens. A comprehensive overview of the exact antigens associated with IBD is still lacking, but recent innovative antibody profiling technologies have enabled systematic characterization of humoral immunity in health and disease. Here, we review established serological antibodies and novel high-throughput methods, such as protein arrays, phage-display immunoprecipitation sequencing (PhIP-Seq), and B cell receptor sequencing (BCRseq), and provide an outlook on their applications in disease diagnostics, therapeutic interventions, and opportunities for prevention in IBD.
Leviatan S., Shoer S., Rothschild D., Gorodetski M. & Segal E.
(2022)
Nature Communications.
13,
3863.
The gut is the richest microbial ecosystem in the human body and has great influence on our health. Despite many efforts, the set of microbes inhabiting this environment is not fully known, limiting our ability to identify microbial content and to study it. In this work, we combine new metagenome-assembled genomes from 51,052 samples with previously published genomes to produce a curated set of 241,118 genomes. Based on this set, we construct a new and improved human gut microbiome reference set of 3,594 high-quality species genomes, which matches 83.65% of the reads in validation samples. This improved reference set contains 310 novel species, including one present in 19% of validation samples. Overall, this study provides a gut microbial genome reference set that can serve as a valuable resource for further research.
Here, Leviatan et al. curate a set of 241,118 genome assemblies and use it to build a new human gut microbiome reference set of 3,594 species genomes, of which 310 represent previously undescribed species, making the catalog a valuable resource for further research.
Background: Type 2 diabetes (T2D) accounts for ~90% of all cases of diabetes, resulting in an estimated 6.7 million deaths in 2021, according to the International Diabetes Federation (IDF). Early detection of patients at high risk of developing T2D can reduce the incidence of the disease through changes in lifestyle, diet, or medication. Since populations of lower socio-demographic status are more susceptible to T2D and might have limited access to sophisticated computational resources, there is a need for accurate yet accessible prediction models. Methods: In this study, we analyzed data from 44,709 non-diabetic U.K. Biobank participants aged 40-69, predicting the risk of T2D onset within a selected timeframe (mean of 7.3 years with a standard deviation of 2.3 years). We started with 798 features that we identified as potential predictors of T2D onset. We first analyzed the data using gradient boosting decision trees, survival analysis, and logistic regression methods. We devised one non-laboratory model accessible to the general population and one more precise yet simple model that uses laboratory tests. We simplified both models to an accessible scorecard form, tested them on normoglycemic and prediabetes subcohorts, and compared the results with those of the general cohort. We established the non-laboratory model using the following covariates: sex, age, weight, height, waist size, hip circumference, waist-to-hip ratio (WHR), and body mass index (BMI). For the laboratory model, we used age and sex together with four common blood tests: HDL (high-density lipoprotein) cholesterol, gamma-glutamyl transferase, glycated hemoglobin, and triglycerides. As an external validation dataset, we used the electronic medical record database of Clalit Health Services. 
Results: The non-laboratory scorecard model achieved an area under the receiver operating characteristic curve (auROC) of 0.81 (95% confidence interval (CI), 0.77-0.84) and an odds ratio (OR) between the upper and fifth prevalence deciles of 17.2 (95% CI, 5-66). Using this model, we classified participants into three risk groups, with a 1% (0.8-1%), 5% (3-6%), and 9% (7-12%) risk of developing T2D. We further analyzed the contribution of laboratory tests and devised a blood-test scorecard model that included age, sex, glycated hemoglobin (HbA1c%), gamma-glutamyl transferase, triglycerides, and HDL cholesterol. This model achieved an auROC of 0.87 (95% CI, 0.85-0.90) and a decile OR of 48 (95% CI, 12-109), and classified the cohort into four risk groups with the following risks of developing T2D: 0.5% (0.4%-7%); 3% (2-4%); 10% (8-12%); and a high-risk group of 23% (10-37%). When the blood-test model was applied to the external validation cohort (Clalit), it achieved an auROC of 0.75 (95% CI, 0.74-0.75). We analyzed several additional comprehensive models that included genotyping data and other environmental factors, and found that they did not provide cost-efficient benefits over the four-blood-test model. The commonly used German Diabetes Risk Score (GDRS) and Finnish Diabetes Risk Score (FINDRISC) models, trained on our data, achieved auROCs of 0.73 (0.69-0.76) and 0.66 (0.62-0.70), respectively, inferior to those of the four-blood-test model and the anthropometric models. Conclusions: The four-blood-test and anthropometric models outperformed the commonly used non-laboratory models, FINDRISC and GDRS. We suggest that our models be used as tools for decision-makers to identify populations at elevated T2D risk and thus improve medical strategies. 
These models might also provide a personal catalyst for lifestyle, diet, or medication changes to lower the risk of T2D onset.
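The scorecard models are compared by auROC. A minimal sketch of computing auROC directly from risk scores is given below, using its interpretation as the probability that a randomly chosen participant who developed T2D scores higher than one who did not (ties count as one half). The function name and toy scores are hypothetical, and the all-pairs loop favours clarity over speed on cohort-scale data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    scores: risk scores (higher = higher predicted risk)
    labels: 1 for participants who developed the outcome, 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

For the toy scores, three of the four case-control pairs are ranked correctly, giving 0.75; an uninformative score would sit near 0.5.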
Popp C. J., Zhou B., Manigrasso M. B., Li H., Curran M., Hu L., St-Jules D. E., Alemán J. O., Vanegas S. M., Jay M., Bergman M., Segal E., Sevick M. A. & Schmidt A. M.
(2022)
Current Developments in Nutrition.
6,
5,
nzac046.
Background: Accruing evidence indicates that accumulation of advanced glycation end products (AGEs) and activation of the receptor for AGEs (RAGE) play a significant role in obesity and type 2 diabetes. The concentrations of circulating RAGE isoforms, such as soluble RAGE (sRAGE), cleaved RAGE (cRAGE), and endogenous secretory RAGE (esRAGE), collectively sRAGE isoforms, may be implicated in weight loss and energy compensation resulting from caloric restriction. Objectives: We aimed to evaluate whether baseline concentrations of sRAGE isoforms predicted changes (Δ) in body composition [fat mass (FM), fat-free mass (FFM)], resting energy expenditure (REE), and adaptive thermogenesis (AT) during weight loss. Methods: Data were collected during a behavioral weight loss intervention in adults with obesity. At baseline and 3 mo, participants were assessed for body composition (bioelectrical impedance analysis) and REE (indirect calorimetry), and plasma was assayed for concentrations of sRAGE isoforms (sRAGE, esRAGE, cRAGE). AT was calculated using various mathematical models that included measured and predicted REE. A linear regression model that adjusted for age, sex, glycated hemoglobin (HbA1c), and randomization arm was used to test the associations between sRAGE isoforms and metabolic outcomes. Results: Participants (n = 41; 70% female; mean ± SD age: 57 ± 11 y; BMI: 38.7 ± 3.4 kg/m²) experienced modest and variable weight loss over 3 mo. Although baseline sRAGE isoforms did not predict ΔFM or ΔFFM, all baseline sRAGE isoforms were positively associated with ΔREE at 3 mo. Baseline esRAGE was positively associated with AT in some, but not all, AT models. The association between sRAGE isoforms and energy expenditure was independent of HbA1c, suggesting that the relation was unrelated to glycemia. Conclusions: This study demonstrates a novel link between RAGE and energy expenditure in human participants undergoing weight loss. 
This trial was registered at clinicaltrials.gov as NCT03336411.
Eitan C., Siany A., Barkan E., Olender T., Yanowski E., Marmor-Kollet H., Chapnik E., Ainbinder E., Ben-Dor S., Segal E. & Hornstein E.
(2022)
Nature Neuroscience.
25,
4,
p. 433-445
The noncoding genome is substantially larger than the protein-coding genome but has been largely unexplored by genetic association studies. Here, we performed region-based rare variant association analysis of >25,000 variants in untranslated regions of 6,139 amyotrophic lateral sclerosis (ALS) whole genomes and the whole genomes of 70,403 non-ALS controls. We identified interleukin-18 receptor accessory protein (IL18RAP) 3′ untranslated region (3′ UTR) variants as significantly enriched in non-ALS genomes and associated with a fivefold reduced risk of developing ALS, and this was replicated in an independent cohort. These variants in the IL18RAP 3′ UTR reduce mRNA stability and the binding of double-stranded RNA (dsRNA)-binding proteins. Finally, the variants of the IL18RAP 3′ UTR confer a survival advantage for motor neurons because they dampen neurotoxicity of human induced pluripotent stem cell (iPSC)-derived microglia bearing an ALS-associated expansion in C9orf72, and this depends on NF-κB signaling. This study reveals genetic variants that protect against ALS by reducing neuroinflammation and emphasizes the importance of noncoding genetic association studies.
Shilo S., Godneva A., Rachmiel M., Korem T., Kolobkov D., Karady T., Bar N., Wolf B. C., Glantz-Gashai Y., Cohen M., Zuckerman Levin N., Shehadeh N., Gruber N., Levran N., Koren S., Weinberger A., Pinhas-Hamiel O. & Segal E.
(2022)
Diabetes Care.
45,
3,
p. 502-511
OBJECTIVE Despite technological advances, results from various clinical trials have repeatedly shown that many individuals with type 1 diabetes (T1D) do not achieve their glycemic goals. One of the major challenges in disease management is the administration of an accurate amount of insulin for each meal that will match the expected postprandial glycemic response (PPGR). The objective of this study was to develop a prediction model for PPGR in individuals with T1D.
RESEARCH DESIGN AND METHODS We recruited to a prospective cohort individuals with T1D who were simultaneously using continuous glucose monitoring and continuous subcutaneous insulin infusion devices, and profiled them for 2 weeks. Participants were asked to report real-time dietary intake using a designated mobile app. We measured their PPGRs and devised machine learning algorithms for PPGR prediction that integrate glucose measurements, insulin dosages, dietary habits, blood parameters, anthropometrics, exercise, and gut microbiota. PPGR data from 900 healthy individuals in response to 41,371 meals were also integrated into the model. Model performance was evaluated with 10-fold cross-validation.
RESULTS A total of 121 individuals with T1D, 75 adults and 46 children, were included in the study. PPGRs to 6,377 meals were measured. Our PPGR prediction model substantially outperforms a baseline model emulating standard of care (correlation between predicted and observed PPGR of R = 0.59 compared with R = 0.40, respectively; P < 10⁻¹⁰). The model was robust across different subpopulations. Feature attribution analysis revealed that glucose levels at meal initiation, the glucose trend 30 min prior to the meal, meal carbohydrate content, and meal carbohydrate-to-fat ratio were the most influential features for the model.
CONCLUSIONS Our model enables a more accurate prediction of PPGR and therefore may allow a better adjustment of the required insulin dosage for meals. It can be further implemented in closed loop systems and may lead to rationally designed nutritional interventions personally tailored for individuals with T1D on the basis of meals with expected low glycemic response.
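Model performance was evaluated with 10-fold cross-validation and summarized as the correlation between predicted and observed PPGR. The sketch below illustrates only that evaluation scheme: each fold is held out once, predictions for it come from a model fitted on the other nine folds, and Pearson's R is computed on the pooled out-of-fold predictions. A one-feature least-squares line stands in for the study's machine learning model, and all function names are hypothetical.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (stand-in for the real model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cross_validated_r(xs, ys, k=10, seed=0):
    """Pearson R between out-of-fold predictions and observed values."""
    preds = [0.0] * len(xs)
    for test_idx in kfold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:  # predict only for samples the model never saw
            preds[i] = slope * xs[i] + intercept
    return pearson_r(preds, ys)
```

Keeping every prediction out-of-fold is what makes the reported R an estimate of performance on unseen meals rather than a measure of how well the model fits its training data.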
Mashiah J., Karady T., Fliss-Isakov N., Sprecher E., Slodownik D., Artzi O., Samuelov L., Ellenbogen E., Godneva A., Segal E. & Maharshak N.
(2022)
Immunity, inflammation and disease.
10,
3,
e570.
Background: Atopic dermatitis (AD) is a chronic, relapsing-remitting, eczematous, pruritic disease. Several studies suggest that gut microbiota may influence AD through immune system regulation. Methods: We performed the first in-human efficacy and safety assessment of fecal microbiota transplantation (FMT) in adult patients with AD. All patients received 2 placebo transplantations followed by 4 FMTs, each 2 weeks apart. AD severity and fecal microbiome profile were evaluated using the SCORing Atopic Dermatitis (SCORAD) index, the weekly frequency of topical corticosteroid usage, and metagenomic analysis of the gut microbiota at the start of the study, before every FMT, and 18 months after the last FMT. Results: Nine patients completed the study protocol. There was no significant change in the SCORAD score following the two placebo transplants. The average SCORAD score decreased significantly from baseline over Weeks 4-12 (from before the first FMT to 2 weeks after the fourth FMT) (59.2 ± 34.9%, Wilcoxon p = .011); decreases of 50% and 75% were achieved by 7 (77%) and 4 (44%) patients, respectively. At Week 18 (8 weeks after the last FMT), the average SCORAD score had decreased from the Week 4 baseline (85.5 ± 8.4%, Wilcoxon p = .018); decreases of 50% and 75% were achieved by 7 (77%) and 6 (66.7%) patients, respectively. Weekly topical corticosteroid usage also decreased during the study and follow-up period. Two patients had a quick relapse and were switched to a different treatment. Two patients developed exacerbations that were alleviated by an additional, fifth FMT. Metagenomic analysis of the fecal microbiota of patients and donors showed transmission of bacterial strains from donors to patients. No adverse events were recorded during the study and follow-up period. Conclusions: FMT may be a safe and effective therapeutic intervention for patients with AD, associated with the transfer of specific microbial species from donors to patients. Further studies are required to confirm these results.
Rothschild D., Leviatan S., Hanemann A., Cohen Y., Weissbrod O. & Segal E.
(2022)
PLoS ONE.
17,
3 March,
e0265756.
Numerous human conditions are associated with the microbiome, yet studies are inconsistent as to the magnitude of the associations and the bacteria involved, likely reflecting insufficient sample sizes. Here, we collected diverse phenotypes and gut microbiota data from 34,057 individuals from Israel and the U.S. Analyzing these data using a much-expanded set of microbial genomes, we derive an atlas of numerous robust, previously unreported associations between bacteria and physiological human traits, which we show replicate in cohorts from both continents. Using machine learning models trained on microbiome data, we show that human traits can be predicted in cohorts from both continents. Subsampling our cohort to smaller sizes yielded highly variable models and thus sensitivity to the selected cohort, underscoring the utility of large cohorts and possibly explaining the source of discrepancies across studies. Finally, many of our prediction models saturate at these cohort sizes, suggesting that similar analyses on larger cohorts may not further improve these predictions.
Shilo S., Godneva A., Rachmiel M., Korem T., Bussi Y., Kolobkov D., Karady T., Bar N., Wolf B. C., Glantz-Gashai Y., Cohen M., Levin N. Z., Shehadeh N., Gruber N., Levran N., Koren S., Weinberger A., Pinhas-Hamiel O. & Segal E.
(2022)
Diabetes Care.
45,
3,
p. 555-563
Previous studies have demonstrated an association between gut microbiota composition and type 1 diabetes (T1D) pathogenesis. However, little is known about the composition and function of the gut microbiome in adults with longstanding T1D or its association with host glycemic control. We performed a metagenomic analysis of the gut microbiome obtained from fecal samples of 74 adults with T1D, 14.6 ± 9.6 years following diagnosis, and compared their microbial composition and function to 296 age-matched healthy control subjects (1:4 ratio). We further analyzed the association between microbial taxa and indices of glycemic control derived from continuous glucose monitoring measurements and blood tests and constructed a prediction model that solely takes microbiome features as input to evaluate the discriminative power of microbial composition for distinguishing individuals with T1D from control subjects. Adults with T1D had a distinct microbial signature that separated them from control subjects when using prediction algorithms on held-out subjects (area under the receiver operating characteristic curve = 0.89 ± 0.03). Linear discriminant analysis showed several bacterial species with significantly higher scores in T1D, including Prevotella copri and Eubacterium siraeum, and species with higher scores in control subjects, including Firmicutes bacterium and Faecalibacterium prausnitzii (P < 0.05, false discovery rate corrected for all). On the functional level, several metabolic pathways were significantly lower in adults with T1D. Several bacterial taxa and metabolic pathways were associated with the host's glycemic control. We identified a distinct gut microbial signature in adults with longstanding T1D and associations between microbial taxa, metabolic pathways, and glycemic control indices. Additional mechanistic studies are needed to identify the role of these bacteria for potential therapeutic strategies.
Rein M., Ben-Yacov O., Godneva A., Shilo S., Zmora N., Kolobkov D., Cohen-Dolev N., Wolf B., Kosower N., Lotan-Pompan M., Weinberger A., Halpern Z., Zelber-Sagi S., Elinav E. & Segal E.
(2022)
BMC Medicine.
20,
56.
Dietary modifications are crucial for managing newly diagnosed type 2 diabetes mellitus (T2DM) and preventing its health complications, but many patients fail to achieve clinical goals with diet alone. We sought to evaluate the clinical effects of a personalized postprandial-targeting (PPT) diet on glycemic control and metabolic health in individuals with newly diagnosed T2DM as compared to the commonly recommended Mediterranean-style (MED) diet. We enrolled 23 adults with newly diagnosed T2DM (aged 53.5 ± 8.9 years, 48% males) for a randomized crossover trial of two 2-week-long dietary interventions. Participants were blinded to their assignment to one of the two sequence groups: either PPT-MED or MED-PPT diets. The PPT diet relies on a machine learning algorithm that integrates clinical and microbiome features to predict personal postprandial glucose responses (PPGR). We further evaluated the long-term effects of the PPT diet on glycemic control and metabolic health by an additional 6-month PPT intervention (n = 16). Participants were connected to continuous glucose monitoring (CGM) throughout the study and self-recorded dietary intake using a smartphone application. In the crossover intervention, the PPT diet led to significantly lower levels of CGM-based measures as compared to the MED diet, including average PPGR (mean difference between diets, -19.8 ± 16.3 mg/dl × h, p < 0.001), mean glucose (mean difference between diets, -7.8 ± 5.5 mg/dl, p < 0.001), and daily time of glucose levels >140 mg/dl (mean difference between diets, -2.42 ± 1.7 h/day, p < 0.001). Blood fructosamine also decreased significantly more during PPT compared to MED intervention (mean change difference between diets, -16.4 ± 37 μmol/dl, p < 0.0001).
At the end of 6 months, the PPT intervention led to significant improvements in multiple metabolic health parameters, among them HbA1c (mean ± SD, -0.39 ± 0.48%, p < 0.001), fasting glucose (-16.4 ± 24.2 mg/dl, p = 0.02) and triglycerides (-49 ± 46 mg/dl, p < 0.001). Importantly, 61% of the participants exhibited diabetes remission, as measured by HbA1c < 6.5%. Finally, some clinical improvements were significantly associated with per-person gut microbiome changes. In this crossover trial in subjects with newly diagnosed T2DM, a PPT diet improved CGM-based glycemic measures significantly more than a Mediterranean-style (MED) diet. An additional 6-month PPT intervention further improved glycemic control and metabolic health parameters, supporting the clinical efficacy of this approach. ClinicalTrials.gov number, NCT01892956.
Talmor-Barkan Y., Bar N., Shaul A. A., Shahaf N., Godneva A., Bussi Y., Lotan-Pompan M., Weinberger A., Shechter A., Chezar-Azerrad C., Arow Z., Hammer Y., Chechi K., Forslund S. K., Fromentin S., Dumas M., Ehrlich S. D., Pedersen O., Kornowski R. & Segal E.
(2022)
Nature Medicine.
28,
2,
p. 295-302
Complex diseases, such as coronary artery disease (CAD), are often multifactorial, caused by multiple underlying pathological mechanisms. Here, to study the multifactorial nature of CAD, we performed comprehensive clinical and multi-omic profiling, including serum metabolomics and gut microbiome data, for 199 patients with acute coronary syndrome (ACS) recruited from two major Israeli hospitals, and validated these results in a geographically distinct cohort. ACS patients had distinct serum metabolome and gut microbial signatures as compared with control individuals, and were depleted in a previously unknown bacterial species of the Clostridiaceae family. This bacterial species was associated with levels of multiple circulating metabolites in control individuals, several of which have previously been linked to an increased risk of CAD. Metabolic deviations in ACS patients were found to be person specific with respect to their potential genetic or environmental origin, and to correlate with clinical parameters and cardiovascular outcomes. Moreover, metabolic aberrations in ACS patients linked to microbiome and diet were also observed to a lesser extent in control individuals with metabolic impairment, suggesting the involvement of these aberrations in earlier dysmetabolic phases preceding clinically overt CAD. Finally, a metabolomics-based model of body mass index (BMI) trained on the non-ACS cohort predicted higher-than-actual BMI when applied to ACS patients, and the excess BMI predictions independently correlated with both diabetes mellitus (DM) and CAD severity, as defined by the number of vessels involved. These results highlight the utility of the serum metabolome in understanding the basis of risk-factor heterogeneity in CAD.
Fromentin S., Forslund S. K., Chechi K., Bar N. & Segal E.
(2022)
Nature Medicine.
28,
2,
p. 303-314
Previous microbiome and metabolome analyses exploring non-communicable diseases have paid scant attention to major confounders of study outcomes, such as common, pre-morbid and co-morbid conditions, or polypharmacy. Here, in the context of ischemic heart disease (IHD), we used a study design that recapitulates disease initiation, escalation and response to treatment over time, mirroring a longitudinal study that would otherwise be difficult to perform given the protracted nature of IHD pathogenesis. We recruited 1,241 middle-aged Europeans, including healthy individuals, individuals with dysmetabolic morbidities (obesity and type 2 diabetes) but lacking overt IHD diagnosis and individuals with IHD at three distinct clinical stages (acute coronary syndrome, chronic IHD and IHD with heart failure), and characterized their phenome, gut metagenome and serum and urine metabolome. We found that about 75% of microbiome and metabolome features that distinguish individuals with IHD from healthy individuals after adjustment for effects of medication and lifestyle are present in individuals exhibiting dysmetabolism, suggesting that major alterations of the gut microbiome and metabolome might begin long before clinical onset of IHD. We further categorized microbiome and metabolome signatures related to prodromal dysmetabolism, specific to IHD in general or to each of its three subtypes or related to escalation or de-escalation of IHD. Discriminant analysis based on specific IHD microbiome and metabolome features could better differentiate individuals with IHD from healthy individuals or metabolically matched individuals as compared to the conventional risk markers, pointing to a pathophysiological relevance of these features.
Bourgonje A. R., Andreu-Sánchez S., Vogl T., Hu S., Vich Vila A., Leviatan S., Kurilshikov A., Klompus S., Kalka I. N., van Dullemen H. M., Weinberger A., Visschedijk M. C., Festen E. A. M., Faber K. N., Wijmenga C., Dijkstra G., Segal E., Fu J., Zhernakova A. & Weersma R. K.
(2022)
Journal of Crohn's and Colitis.
16,
Supplement_1,
p. i100-i102
Rossman H. & Segal E.
(2022)
Nature Microbiology.
7,
1,
p. 16-17
A statistical framework that integrates data from a fine-scale targeted testing scheme and regular randomized surveillance surveys provides unbiased and fine-grained estimates of key SARS-CoV-2 epidemiological parameters that are critical for real-time policy decision-making.
Elmaleh D. R., Downey M. A., Kundakovic L., Wilkinson J. E., Neeman Z. & Segal E.
(2022)
Handbook of Microbiome and Gut-Brain-Axis in Alzheimer's Disease.
Pasinetti G. M. (ed.).
p. 117-145
Progressive neurodegenerative diseases represent some of the largest growing treatment challenges for public health in modern society. These diseases mainly progress due to aging and are driven by microglial surveillance and activation in response to changes occurring in the aging brain. The lack of efficacious treatment options for Alzheimer's disease (AD), as the focus of this review, and other neurodegenerative disorders has encouraged new approaches to address neuroinflammation for potential treatments. Here we will focus on the increasing evidence that dysbiosis of the gut microbiome is characterized by inflammation that may carry over to the central nervous system and into the brain. Neuroinflammation is the common thread associated with neurodegenerative diseases, but it is yet unknown at what point and how innate immune function turns pathogenic for an individual. This review will address extensive efforts to identify constituents of the gut microbiome and their neuroactive metabolites as a peripheral path to treatment. This approach is still in its infancy in substantive clinical trials and requires thorough human studies to elucidate the metabolic microbiome profile to design appropriate treatment strategies for early stages of neurodegenerative disease. We view that in order to address neurodegenerative mechanisms of the gut, microbiome and metabolite profiles must be determined to pre-screen AD subjects prior to the design of specific, chronic titrations of gut microbiota with low-dose antibiotics. This represents an exciting treatment strategy designed to balance inflammatory microglial involvement in disease progression with an individual's manifestation of AD as influenced by a coercive inflammatory gut.
Kalka I. N., Gavrieli A., Shilo S., Rossman H., Artzi N. S., Yacovzada N. & Segal E.
(2021)
Communications Medicine.
1,
1,
55.
Background: Variability of response to medication is a well-known phenomenon, determined by both environmental and genetic factors. Understanding the heritable component of the response to medication is of great interest but challenging for several reasons, including small study cohorts and computational limitations. Methods: Here, we study the heritability of variation in the glycaemic response to metformin, the first-line therapeutic agent for type 2 diabetes (T2D), by leveraging 18 years of electronic health records (EHR) data from Israel's largest healthcare service provider, consisting of over five million patients of diverse ethnicities and socio-economic backgrounds. Our cohort consists of 80,788 T2D patients treated with metformin, with an accumulated number of 1,611,591 HbA1c measurements and 4,581,097 metformin prescriptions. We estimate the explained variance of glycated hemoglobin (HbA1c%) reduction due to inheritance by constructing a six-generation population-size pedigree from national registries and linking it to medical health records. Results: Using a linear mixed model-based framework, a common-practice method for heritability estimation, we calculate a heritability measure of h² = 12.6% (95% CI, 6.1%-19.1%) for absolute reduction of HbA1c% after metformin treatment in the entire cohort, h² = 21.0% (95% CI, 7.8%-34.4%) for males and h² = 22.9% (95% CI, 10.0%-35.7%) for females. Results remain unchanged after adjusting for pre-treatment HbA1c%, and in proportional reduction of HbA1c%. Conclusions: To the best of our knowledge, our work is the first to estimate heritability of drug response using solely EHR data combined with a pedigree-based kinship matrix.
We demonstrate that while response to metformin treatment has a heritable component, most of the variation is likely due to other factors, further motivating non-genetic analyses aimed at unraveling metformin's mechanism of action.
Benjamin A., Kuperman Y., Eren N., Rotkopf R., Amitai M., Rossman H., Shilo S., Meir T., Keshet A., Nuttman-Shwartz O., Segal E. & Chen A.
(2021)
Molecular Psychiatry.
26,
11,
p. 6149-6158
The COVID-19 pandemic poses multiple psychologically stressful challenges and is associated with an increased risk for mental illness. Previous studies have focused on the psychopathological symptoms associated with the outbreak peak. Here, we examined the behavioural and mental-health impact of the pandemic in Israel using an online survey, during the six weeks encompassing the end of the first outbreak and the beginning of the second. We used clinically validated instruments to assess anxiety- and depression-related emotional distress, symptoms, and coping strategies, as well as questions designed to specifically assess COVID-19-related concerns. Higher emotional burden was associated with being female, younger, unemployed, living in high socioeconomic status localities, having prior medical conditions, encountering more people, and experiencing physiological symptoms. Our findings highlight the environmental context and its importance in understanding individual ability to cope with the long-term stressful challenges of the pandemic.
The particularly interdisciplinary nature of human microbiome research makes the organization and reporting of results spanning epidemiology, biology, bioinformatics, translational medicine and statistics a challenge. Commonly used reporting guidelines for observational or genetic epidemiology studies lack key features specific to microbiome studies. Therefore, a multidisciplinary group of microbiome epidemiology researchers adapted guidelines for observational and genetic studies to culture-independent human microbiome studies, and also developed new reporting elements for laboratory, bioinformatics and statistical analyses tailored to microbiome studies. The resulting tool, called Strengthening The Organization and Reporting of Microbiome Studies (STORMS), is composed of a 17-item checklist organized into six sections that correspond to the typical sections of a scientific publication, presented as an editable table for inclusion in supplementary materials. The STORMS checklist provides guidance for concise and complete reporting of microbiome studies that will facilitate manuscript preparation, peer review, and reader comprehension of publications and comparative analysis of published results.
Shilo S., Bar N., Keshet A., Talmor-Barkan Y., Rossman H., Godneva A., Aviv Y., Edlitz Y., Reicher L., Kolobkov D., Wolf B. C., Lotan-Pompan M., Levi K., Cohen O., Saranga H., Weinberger A. & Segal E.
(2021)
European Journal of Epidemiology.
36,
11,
p. 1187-1194
The 10 K is a large-scale prospective longitudinal cohort and biobank that was established in Israel. The primary aims of the study include development of prediction models for disease onset and progression and identification of novel molecular markers with diagnostic, prognostic and therapeutic value. Recruitment was initiated in 2018 and is expected to be completed in 2021. Between 28/01/2019 and 13/12/2020, 4,629 of the expected 10,000 participants were recruited (46%). Follow-up visits are scheduled every year for a total of 25 years. The cohort includes individuals between the ages of 40 and 70 years. Predefined medical conditions were determined as exclusions. Information collected at baseline includes medical history, lifestyle and nutritional habits, vital signs, anthropometrics, blood test results, electrocardiography, ankle-brachial pressure index (ABI), liver ultrasound (US) and dual-energy X-ray absorptiometry (DXA) tests. Molecular profiling includes transcriptome, proteome, gut and oral microbiome, metabolome and immune system profiling. Continuous measurements include glucose levels using a continuous glucose monitoring device for 2 weeks and sleep monitoring by a home sleep apnea test device for 3 nights. Blood and stool samples are collected and stored at -80 °C in a storage facility for future research. Linkage is being established with national disease registries.
Chen C. K., Cheng R., Demeter J., Chen J., Weingarten-Gabbay S., Jiang L., Snyder M. P., Weissman J. S., Segal E., Jackson P. K. & Chang H. Y.
(2021)
Molecular Cell.
81,
20,
p. 4300-4318.e13
The human genome encodes tens of thousands of circular RNAs (circRNAs) with mostly unknown functions. Circular RNAs require internal ribosome entry sites (IRES) if they are to undergo translation without a 5′ cap. Here, we develop a high-throughput screen to systematically discover RNA sequences that can direct circRNA translation in human cells. We identify more than 17,000 endogenous and synthetic sequences as candidate circRNA IRES. 18S rRNA complementarity and a structured RNA element positioned on the IRES are important for driving circRNA translation. Ribosome profiling and peptidomic analyses show extensive IRES-ribosome association, hundreds of circRNA-encoded proteins with tissue-specific distribution, and antigen presentation. We find that circFGFR1p, a protein encoded by circFGFR1 that is downregulated in cancer, functions as a negative regulator of FGFR1 oncoprotein to suppress cell growth during stress. Systematic identification of circRNA IRES elements may provide important links among circRNA regulation, biological function, and disease.
Bar J., Sarig O., Lotan-Pompan M., Dassa B., Miodovnik M., Weinberger A., Sprecher E., Segal E. & Samuelov L.
(2021)
Clinical and Experimental Dermatology.
46,
7,
p. 1223-1229
Background: The human microbiome project addresses the relationship between bacterial flora and the human host, in both healthy and diseased conditions. The skin is an ecosystem with multiple niches, each featuring unique physiological conditions and thus hosting different bacterial populations. The skin microbiome has been implicated in the pathogenesis of many dermatoses. Given the role of dysbiosis in the pathogenesis of inflammation, which is prominent in dystrophic epidermolysis bullosa (DEB), we undertook a study on the skin microbiome. Aim: To characterize the skin microbiome in a series of patients with DEB. Methods: This was a case-control study of eight patients with DEB and nine control cases enrolled between June 2017 and November 2018. The skin of patients with DEB was sampled at three different sites: untreated wound, perilesional skin and normal-appearing (uninvolved) skin. Normal skin on the forearm was sampled from age-matched healthy controls (HCs). We used a dedicated DNA extraction protocol to isolate microbial DNA, which was then analysed using next-generation microbial 16S rRNA sequencing. Data were analysed using a series of advanced bioinformatics tools. Results: The wounds, perilesional and uninvolved skin of patients with DEB demonstrated reduced bacterial diversity compared with HCs, with the flora in DEB wounds being the least diverse. We found an increased prevalence of staphylococci species in the lesional and perilesional skin of patients with DEB, compared with their uninvolved, intact skin. Similarly, the uninvolved skin of patients with DEB displayed increased staphylococcal content and significantly different microbiome diversities (other than staphylococci) compared with HC skin. Conclusions: These findings suggest the existence of a unique DEB-associated skin microbiome signature, which could be targeted by specific pathogen-directed therapies.
Moreover, altering the skin microbiome with increasing colonization of bacteria associated with nonchronic wounds may potentially facilitate wound healing in patients with DEB.
Sudre C. H., Keshet A., Graham M. S., Joshi A. D., Shilo S., Rossman H., Murray B., Molten E., Klaser K., Canas L. D., Antonelli M., Nguyen L. H., Drew D. A., Modat M., Pujol J. C., Ganesh S., Wolf J., Meir T., Chan A. T., Steves C. J., Spector T. D., Brownstein J. S., Segal E., Ourselin S. & Astley C. M.
(2021)
The Lancet Digital Health.
3,
9,
p. e577-e586
Background: Multiple voluntary surveillance platforms were developed across the world in response to the COVID-19 pandemic, providing a real-time understanding of population-based COVID-19 epidemiology. During this time, testing criteria broadened and health-care policies matured. We aimed to test whether there were consistent associations of symptoms with SARS-CoV-2 test status across three surveillance platforms in three countries (two platforms per country), during periods of testing and policy changes. Methods: For this observational study, we used data of observations from three volunteer COVID-19 digital surveillance platforms (Carnegie Mellon University and University of Maryland Facebook COVID-19 Symptom Survey, ZOE COVID Symptom Study app, and the Corona Israel study) targeting communities in three countries (Israel, the UK, and the USA; two platforms per country). The study population included adult respondents (age 18-100 years at baseline) who were not health-care workers. We did logistic regression of self-reported symptoms on self-reported SARS-CoV-2 test status (positive or negative), adjusted for age and sex, in each of the study cohorts. We compared odds ratios (ORs) across platforms and countries, and we did meta-analyses assuming a random effects model. We also evaluated testing policy changes, COVID-19 incidence, and time scales of duration of symptoms and symptom-to-test time. Findings: Between April 1 and July 31, 2020, 514 459 tests from over 10 million respondents were recorded in the six surveillance platform datasets. Anosmia-ageusia was the strongest, most consistent symptom associated with a positive COVID-19 test (robust aggregated rank one, meta-analysed random effects OR 16·96, 95% CI 13·13-21·92). Fever (rank two, 6·45, 4·25-9·81), shortness of breath (rank three, 4·69, 3·14-7·01), and cough (rank four, 4·29, 3·13-5·88) were also highly associated with test positivity.
The association of symptoms with test status varied by duration of illness, timing of the test, and broader test criteria, as well as over time, by country, and by platform. Interpretation: The strong association of anosmia-ageusia with self-reported positive SARS-CoV-2 test was consistently observed, supporting its validity as a reliable COVID-19 signal, regardless of the participatory surveillance platform, country, phase of illness, or testing policy. These findings show that associations between COVID-19 symptoms and test positivity ranked similarly in a wide range of scenarios. Anosmia, fever, and respiratory symptoms consistently had the strongest effect estimates and were the most appropriate empirical signals for symptom-based public health surveillance in areas with insufficient testing or benchmarking capacity. Collaborative syndromic surveillance could enhance real-time epidemiological investigations and public health utility globally. Funding: National Institutes of Health, National Institute for Health Research, Alzheimer's Society, Wellcome Trust, and Massachusetts Consortium on Pathogen Readiness.
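The abstract states only that the meta-analyses assumed a random effects model. It does not name the estimator. As an illustration, the sketch below pools per-cohort log odds ratios with the DerSimonian-Laird method, which is an assumption here and not necessarily the authors' choice. It also assumes each cohort reports an OR with a 95% CI that is symmetric on the log scale, so the standard error can be recovered as (ln upper - ln lower) / (2 × 1.96). The function names are hypothetical.

```python
import math
from typing import List, Tuple

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of ln(OR) recovered from a 95% CI (assumed log-symmetric)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def dersimonian_laird(log_ors: List[float], ses: List[float]) -> Tuple[float, float, float, float]:
    """Pool log odds ratios with a DerSimonian-Laird random-effects model.

    Returns (pooled OR, lower 95% CI, upper 95% CI, tau^2).
    """
    if not log_ors or len(log_ors) != len(ses):
        raise ValueError("need matching, non-empty lists")
    k = len(log_ors)
    w = [1.0 / s ** 2 for s in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-cohort variance
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    return math.exp(pooled), math.exp(pooled - Z95 * se), math.exp(pooled + Z95 * se), tau2
```

When the cohorts agree, tau² is zero and the result reduces to the fixed-effect inverse-variance estimate. When they disagree, tau² widens the pooled CI. This matches the abstract's observation that symptom associations varied by country and platform.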
Hu L., Illiano P., Pompeii M. L., Popp C. J., Kharmats A. Y., Curran M., Perdomo K., Chen S., Bergman M., Segal E. & Sevick M. A.
(2021)
Contemporary Clinical Trials.
108,
106522.
Objectives: To describe challenges and lessons learned in conducting a remote behavioral weight loss trial. Methods: The Personal Diet Study is an ongoing randomized clinical trial which aims to compare two mobile health (mHealth) weight loss approaches, standardized diet vs. personalized feedback, on glycemic response. Over a six-month period, participants attended dietitian-led group meetings via remote videoconferencing and were encouraged to self-monitor dietary intake using a smartphone app. Descriptive statistics were used to report adherence to counseling sessions and self-monitoring. Challenges were tracked during weekly project meetings. Results: Challenges in connecting to and engaging in the videoconferencing sessions were noted. To address these issues, we provided a step-by-step user manual and video tutorials regarding use of WebEx, encouraged alternative means to join sessions, and sent reminder emails/texts about the WebEx sessions asking participants to join sessions early. Self-monitoring app-related issues included inability to find specific foods in the app database. To overcome this, the study team incorporated commonly consumed foods as "favorites" in the app database, provided a manual and video tutorials regarding use of the app and checked the self-monitoring app dashboard weekly to identify nonadherent participants and intervened as appropriate. Among 135 participants included in the analysis, the median attendance rate for the 14 remote sessions was 85.7% (IQR: 64.3%-92.9%). Conclusions: Experience and lessons shared in this report may provide critical and timely guidance to other behavioral researchers and interventionists seeking to adapt behavioral counseling programs for remote delivery in the age of COVID-19.
Ben-Yacov O., Godneva A., Rein M., Shilo S., Kolobkov D., Koren N., Cohen Dolev N., Travinsky Shmul T., Wolf B. C., Kosower N., Sagiv K., Lotan-Pompan M., Zmora N., Weinberger A., Elinav E. & Segal E.
(2021)
Diabetes Care.
44,
9,
p. 1980-1991
OBJECTIVE To compare the clinical effects of a personalized postprandial-targeting (PPT) diet versus a Mediterranean (MED) diet on glycemic control and metabolic health in prediabetes. RESEARCH DESIGN AND METHODS We randomly assigned adults with prediabetes (n = 225) to follow a MED diet or a PPT diet for a 6-month dietary intervention and additional 6-month follow-up. The PPT diet relies on a machine learning algorithm that integrates clinical and microbiome features to predict personal postprandial glucose responses. During the intervention, all participants were connected to continuous glucose monitoring (CGM) and self-reported dietary intake using a smartphone application. RESULTS Among 225 participants randomized (58.7% women, mean ± SD age 50 ± 7 years, BMI 31.3 ± 5.8 kg/m², HbA1c 5.9 ± 0.2% [41 ± 2.4 mmol/mol], fasting plasma glucose 114 ± 12 mg/dL [6.33 ± 0.67 mmol/L]), 200 (89%) completed the 6-month intervention. A total of 177 participants also contributed 12-month follow-up data. Both interventions reduced the daily time with glucose levels >140 mg/dL (7.8 mmol/L) and HbA1c levels, but reductions were significantly greater in PPT compared with MED. The mean 6-month change in "time above 140" was -0.3 ± 0.8 h/day and -1.3 ± 1.5 h/day for MED and PPT, respectively (95% CI between-group difference -1.29 to -0.66, P < 0.001). The mean 6-month change in HbA1c was -0.08 ± 0.19% (-0.9 ± 2.1 mmol/mol) and -0.16 ± 0.24% (-1.7 ± 2.6 mmol/mol) for MED and PPT, respectively (95% CI between-group difference -0.14 to -0.02, P = 0.007). The significant between-group differences were maintained at 12-month follow-up. No significant differences were noted between the groups in a CGM-measured oral glucose tolerance test. CONCLUSIONS In this clinical trial in prediabetes, a PPT diet improved glycemic control significantly more than a MED diet as measured by daily time of glucose levels >140 mg/dL (7.8 mmol/L) and HbA1c.
These findings may have implications for dietary advice in clinical practice.
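The "time above 140" outcome is the daily duration of CGM glucose readings above 140 mg/dL (7.8 mmol/L). The sketch below shows one way such a metric can be computed. It assumes evenly spaced CGM readings with no gaps; real analyses must handle missing data and device-specific sampling intervals. The function names are hypothetical and are not taken from the trial's analysis code. The mmol/L conversion uses the molar mass of glucose (about 180.16 g/mol), so 1 mmol/L is roughly 18.016 mg/dL.

```python
from typing import Sequence

MGDL_PER_MMOLL = 18.016  # glucose molar mass ~180.16 g/mol -> 1 mmol/L ~ 18.016 mg/dL


def mgdl_to_mmoll(mgdl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return mgdl / MGDL_PER_MMOLL


def time_above_h_per_day(glucose_mgdl: Sequence[float], threshold_mgdl: float = 140.0) -> float:
    """Hours per day spent strictly above the threshold.

    Assumes readings are evenly spaced and uninterrupted, so the fraction of
    readings above the threshold equals the fraction of time above it.
    """
    if not glucose_mgdl:
        raise ValueError("no CGM readings")
    n_above = sum(1 for g in glucose_mgdl if g > threshold_mgdl)
    return 24.0 * n_above / len(glucose_mgdl)


def change_in_time_above(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Follow-up minus baseline time above threshold (negative means improvement)."""
    return time_above_h_per_day(follow_up) - time_above_h_per_day(baseline)
```

At 15-minute sampling, 96 readings cover one day, so 8 readings above 140 mg/dL correspond to 2 h/day. A negative change, as reported for both arms above, means less time in the hyperglycemic range.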
Vogl T., Klompus S., Leviatan S., Kalka I. N., Weinberger A., Wijmenga C., Fu J., Zhernakova A., Weersma R. K. & Segal E.
(2021)
Nature Medicine.
27,
8,
p. 1442-1450
Serum antibodies can recognize both pathogens and commensal gut microbiota. However, our current understanding of antibody repertoires is largely based on DNA sequencing of the corresponding B-cell receptor genes, and actual bacterial antigen targets remain incompletely characterized. Here we have profiled the serum antibody responses of 997 healthy individuals against 244,000 rationally selected peptide antigens derived from gut microbiota and pathogenic and probiotic bacteria. Leveraging phage immunoprecipitation sequencing (PhIP-Seq) based on phage-displayed synthetic oligo libraries, we detect a wide breadth of individual-specific as well as shared antibody responses against microbiota that associate with age and gender. We also demonstrate that these antibody epitope repertoires are more longitudinally stable than gut microbiome species abundances. Serum samples of more than 200 individuals collected five years apart could be accurately matched and could serve as an immunologic fingerprint. Overall, our results suggest that systemic antibody responses provide a non-redundant layer of information about microbiota beyond gut microbial species composition. Phage immunoprecipitation sequencing illustrates the wide breadth of systemic microbiota-specific antibody responses, which are more longitudinally stable than gut microbiome species abundances in a cohort of healthy individuals.
Klompus S., Leviatan S., Vogl T., Mazor R., Kalka I., Stoler-Barak L., Nathan N., Peres A., Moss L., Godneva A., Kagan Ben Tikva S., Shinar E., Dvashi H. C., Gabizon R., London N., Diskin R., Yaari G., Weinberger A., Shulman Z. & Segal E.
(2021)
Science Immunology.
6,
61,
eabe9950.
The spillover of animal coronaviruses (aCoVs) to humans has caused SARS, MERS, and COVID-19. Although antibody responses displaying cross-reactivity between SARS-CoV-2 and seasonal/common cold human coronaviruses (hCoVs) have been reported, potential cross-reactivity with aCoVs and the diagnostic implications are incompletely understood. Here, we probed for antibody binding against all 7 hCoVs and 49 aCoVs represented as 12,924 peptides within a phage-displayed antigen library. Antibody repertoires of 269 recovered patients with COVID-19 showed distinct changes compared with 260 unexposed prepandemic controls, not limited to binding of SARS-CoV-2 antigens but including binding to antigens from hCoVs and aCoVs with shared motifs to SARS-CoV-2. We isolated broadly reactive monoclonal antibodies from recovered patients with COVID-19 that bind a shared motif of SARS-CoV-2, hCoV-OC43, hCoV-HKU1, and several aCoVs, demonstrating that interspecies cross-reactivity can be mediated by a single immunoglobulin. Using antibody binding data against the entire CoV antigen library allowed accurate discrimination of recovered patients with COVID-19 from unexposed individuals by machine learning. Leaving out SARS-CoV-2 antigens and relying solely on antibody binding to other hCoVs and aCoVs achieved equally accurate detection of SARS-CoV-2 infection. The ability to detect SARS-CoV-2 infection without knowledge of its unique antigens solely from cross-reactive antibody responses against other hCoVs and aCoVs suggests a potential diagnostic strategy for the early stage of future pandemics. Creating regularly updated antigen libraries representing the animal coronavirome can provide the basis for a serological assay already poised to identify infected individuals after a future zoonotic transmission event.
Furth N., Shilo S., Cohen N., Erez N., Fedyuk V., Schrager A. M., Weinberger A., Dror A. A., Zigron A., Shehadeh M., Sela E., Srouji S., Amit S., Levy I., Segal E., Dahan R., Jones D., Douek D. C. & Shema E.
(2021)
PLoS ONE.
16,
7,
e0255096.
The COVID-19 pandemic raises the need for diverse diagnostic approaches to rapidly detect different stages of viral infection. The flexible and quantitative nature of single-molecule imaging technology renders it optimal for development of new diagnostic tools. Here we present a proof-of-concept for a single-molecule based, enzyme-free assay for detection of SARS-CoV-2. The unified platform we developed allows direct detection of the viral genetic material from patients' samples, as well as their immune response consisting of IgG and IgM antibodies. Thus, it establishes a platform for diagnostics of COVID-19, which could also be adjusted to diagnose additional pathogens.
Elmaleh D. R., Downey M. A., Kundakovic L., Wilkinson J. E., Neeman Z. & Segal E.
(2021)
Journal of Alzheimer's Disease.
82,
4,
p. 1373-1401
Progressive neurodegenerative diseases represent some of the largest growing treatment challenges for public health in modern society. These diseases mainly progress due to aging and are driven by microglial surveillance and activation in response to changes occurring in the aging brain. The lack of efficacious treatment options for Alzheimer's disease (AD), as the focus of this review, and other neurodegenerative disorders has encouraged new approaches to address neuroinflammation for potential treatments. Here we will focus on the increasing evidence that dysbiosis of the gut microbiome is characterized by inflammation that may carry over to the central nervous system and into the brain. Neuroinflammation is the common thread associated with neurodegenerative diseases, but it is yet unknown at what point and how innate immune function turns pathogenic for an individual. This review will address extensive efforts to identify constituents of the gut microbiome and their neuroactive metabolites as a peripheral path to treatment. This approach is still in its infancy in substantive clinical trials and requires thorough human studies to elucidate the metabolic microbiome profile to design appropriate treatment strategies for early stages of neurodegenerative disease. We view that in order to address neurodegenerative mechanisms of the gut, microbiome and metabolite profiles must be determined to pre-screen AD subjects prior to the design of specific, chronic titrations of gut microbiota with low-dose antibiotics. This represents an exciting treatment strategy designed to balance inflammatory microglial involvement in disease progression with an individual's manifestation of AD as influenced by a coercive inflammatory gut.
Rossman H., Shilo S., Meir T., Gorfine M., Shalit U. & Segal E.
(2021)
Nature Medicine.
27,
p. 1055-1061
Studies on the real-life effect of the BNT162b2 vaccine for Coronavirus Disease 2019 (COVID-19) prevention are urgently needed. In this study, we conducted a retrospective analysis of data from the Israeli Ministry of Health collected between 28 August 2020 and 24 February 2021. We studied the temporal dynamics of the number of new COVID-19 cases and hospitalizations after the vaccination campaign, which was initiated on 20 December 2020. To distinguish the possible effects of the vaccination on cases and hospitalizations from other factors, including a third lockdown implemented on 8 January 2021, we performed several comparisons: (1) individuals aged 60 years and older prioritized to receive the vaccine first versus younger age groups; (2) the January lockdown versus the September lockdown; and (3) early-vaccinated versus late-vaccinated cities. A larger and earlier decrease in COVID-19 cases and hospitalizations was observed in individuals older than 60 years, followed by younger age groups, in the order of vaccination prioritization. This pattern was not observed in the previous lockdown and was more pronounced in early-vaccinated cities. Our analysis demonstrates the real-life effect of a national vaccination campaign on the pandemic dynamics.
Rossman H., Shilo S., Barbash-Hazan S., Artzi N. S., Hadar E., Balicer R. D., Feldman B., Wiznitzer A. & Segal E.
(2021)
Journal of Pediatrics.
233,
p. 132-140.e1
Objective: To evaluate body mass index (BMI) acceleration patterns in children and to develop a prediction model targeted to identify children at high risk for obesity before the critical time window in which the largest increase in BMI percentile occurs. Study design: We analyzed electronic health records of children from Israel's largest healthcare provider from 2002 to 2018. Data included demographics, anthropometric measurements, medications, diagnoses, and laboratory tests of children and their families. Obesity was defined as BMI ≥95th percentile for age and sex. To identify the time window in which the largest annual increases in BMI z score occur during early childhood, we first analyzed childhood BMI acceleration patterns among 417 915 adolescents. Next, we devised a model targeted to identify children at high risk before this time window, predicting obesity at 5-6 years of age based on data from the first 2 years of life of 132 262 children. Results: Retrospective BMI analysis revealed that among adolescents with obesity, the greatest acceleration in BMI z score occurred between 2 and 4 years of age. Our model, validated temporally and geographically, accurately predicted obesity at 5-6 years old (area under the receiver operating characteristic curve of 0.803). Discrimination results on subpopulations demonstrated its robustness across the pediatric population. The model's most influential predictors included anthropometric measurements of the child and family. Other impactful predictors included ancestry and pregnancy glucose. Conclusions: The rapid rise in the prevalence of childhood obesity warrants the development of better prevention strategies. Our model may allow an accurate identification of children at high risk of obesity.
By following up the gut microbiome, 51 human phenotypes and plasma levels of 1,183 metabolites in 338 individuals after 4 years, we characterize microbial stability and variation in relation to host physiology. Using these individual-specific and temporally stable microbial profiles, including bacterial SNPs and structural variations, we develop a microbial fingerprinting method that shows up to 85% accuracy in classifying metagenomic samples taken 4 years apart. Application of our fingerprinting method to the independent HMP cohort results in 95% accuracy for samples taken 1 year apart. We further observe temporal changes in the abundance of multiple bacterial species, metabolic pathways, and structural variation, as well as strain replacement. We report 190 longitudinal microbial associations with host phenotypes and 519 associations with plasma metabolites. These associations are enriched for cardiometabolic traits, vitamin B, and uremic toxins. Finally, mediation analysis suggests that the gut microbiome may influence cardiometabolic health through its metabolites.
Levi I., Gurevich M., Perlman G., Magalashvili D., Menascu S., Bar N., Godneva A., Zahavi L., Chermon D., Kosower N., Wolf B. C., Malka G., Lotan-Pompan M., Weinberger A., Yirmiya E., Rothschild D., Leviatan S., Tsur A., Didkin M., Dreyer S., Eizikovitz H., Titngi Y., Mayost S., Sonis P., Dolev M., Stern Y., Achiron A. & Segal E.
(2021)
Cell Reports Medicine.
2,
4,
100246.
Multiple sclerosis (MS) is an immune-mediated disease whose precise etiology is unknown. Several studies found alterations in the microbiome of individuals with MS, but the mechanism by which it may affect MS is poorly understood. Here we analyze the microbiome of 129 individuals with MS and find that they harbor distinct microbial patterns compared with controls. To study the functional consequences of these differences, we measure levels of 1,251 serum metabolites in a subgroup of subjects and unravel a distinct metabolite signature that separates affected individuals from controls nearly perfectly (AUC = 0.97). Individuals with MS are found to be depleted in butyrate-producing bacteria and in bacteria that produce indolelactate, an intermediate in generation of the potent neuroprotective antioxidant indolepropionate, which we found to be lower in their serum. We identify microbial and metabolite candidates that may contribute to MS and should be explored further for their causal role and therapeutic potential.
Levin D., Raab N., Pinto Y., Rothschild D., Zanir G., Godneva A., Mellul N., Futorian D., Gal D., Leviatan S., Zeevi D., Bachelet I. & Segal E.
(2021)
Science.
372,
6539,
eabb5352.
Animals in the wild are able to subsist on pathogen-infected and poisonous food and show immunity to various diseases. These characteristics may be largely attributable to the animals' microbiota. However, compared with the human microbiota, which has been extensively studied, the microbiota of animals in the wild has received less focus. In this study, we aimed to construct and functionally annotate a comprehensive database of microbiota sampled from wild animals in their natural habitats. Several considerations guided our sample collection and analysis strategy. First, we focused on sampling of animals from the wild, despite the many challenges that such sampling poses, because captivity was shown to alter the microbiome of several animal species. Second, to obtain a broad representation of wild animals, we sampled in four continents and from a diversity of animals with varied traits and feeding patterns. We hand-curated traits for each species, including dietary adaptations, activity hours, and social structures, allowing us to systematically study the relationships between microbiota composition and host phenotype. Finally, we adapted a metagenomic genome assembly pipeline and annotated the assembled genomes taxonomically and functionally, resulting in a broad collection of genomes that represents the microbial landscape of wildlife.
Kalaora S., Nagler A., Nejman D., Alon M., Barbolin C., Barnea E., Ketelaars S. L. C., Cheng K., Vervier K., Shental N., Bussi Y., Rotkopf R., Levy R., Benedek G., Trabish S., Dadosh T., Levin-Zaidman S., Geller L. T., Wang K., Greenberg P., Yagel G., Peri A., Fuks G., Bhardwaj N., Reuben A., Hermida L., Johnson S. B., Galloway-Peña J. R., Shropshire W. C., Bernatchez C., Haymaker C., Arora R., Roitman L., Eilam R., Weinberger A., Lotan-Pompan M., Lotem M., Levin Y., Lawley T. D., Adams D. J., Levesque M. P., Besser M. J., Schachter J., Golani O., Segal E., Ruppin E., Kvistborg P., Peterson S. N., Wargo J. A., Straussman R. & Samuels Y.
(2021)
Nature (London).
592,
7852,
p. 138-143
A variety of species of bacteria are known to colonize human tumours [1–11], proliferate within them and modulate immune function, which ultimately affects the survival of patients with cancer and their responses to treatment [12–14]. However, it is not known whether antigens derived from intracellular bacteria are presented by the human leukocyte antigen class I and II (HLA-I and HLA-II, respectively) molecules of tumour cells, or whether such antigens elicit a tumour-infiltrating T cell immune response. Here we used 16S rRNA gene sequencing and HLA peptidomics to identify a peptide repertoire derived from intracellular bacteria that was presented on HLA-I and HLA-II molecules in melanoma tumours. Our analysis of 17 melanoma metastases (derived from 9 patients) revealed 248 and 35 unique HLA-I and HLA-II peptides, respectively, that were derived from 41 species of bacteria. We identified recurrent bacterial peptides in tumours from different patients, as well as in different tumours from the same patient. Our study reveals that peptides derived from intracellular bacteria can be presented by tumour cells and elicit immune reactivity, and thus provides insight into a mechanism by which bacteria influence activation of the immune system and responses to therapy.
Shilo S., Rossman H. & Segal E.
(2021)
Nature Reviews Immunology.
21,
p. 198-199
A rapid coronavirus disease 2019 (COVID-19) vaccination rollout has led Israel to become the country with the highest rate of vaccinated individuals per capita worldwide. Here, we summarize the first signs for the real-world effectiveness and impact of the vaccination campaign.
Rossman H., Meir T., Somer J., Shilo S., Gutman R., Ben Arie A., Segal E., Shalit U. & Gorfine M.
(2021)
Nature Communications.
12,
1,
1904.
The spread of Coronavirus disease 2019 (COVID-19) has led to many healthcare systems being overwhelmed by the rapid emergence of new cases. Here, we study the ramifications of hospital load due to COVID-19 morbidity on in-hospital mortality of patients with COVID-19 by analyzing records of all 22,636 COVID-19 patients hospitalized in Israel from mid-July 2020 to mid-January 2021. We show that even under moderately heavy patient load (>500 countrywide hospitalized severely-ill patients; the Israeli Ministry of Health defined 800 severely-ill patients as the maximum capacity allowing adequate treatment), in-hospital mortality rate of patients with COVID-19 significantly increased compared to periods of lower patient load (250-500 severely-ill patients): 14-day mortality rates were 22.1% (Standard Error 3.1%) higher (mid-September to mid-October) and 27.2% (Standard Error 3.3%) higher (mid-December to mid-January). We further show that this higher mortality rate cannot be attributed to changes in the patient population during periods of heavier load.
Vogl T., Leviatan S. & Segal E.
(2021)
Cell Reports Medicine.
2,
2,
100191.
Reliable antibody testing against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has the potential to uncover the population-wide spread of coronavirus disease 2019 (COVID-19), which is critical for making informed healthcare and economic decisions. Here we review different types of antibody tests available for SARS-CoV-2 and their application for population-scale testing. Biases because of varying test accuracy, results of ongoing large-scale serological studies, and use of antibody testing for monitoring development of herd immunity are summarized. Although current SARS-CoV-2 antibody testing efforts have generated valuable insights, the accuracy of serological tests and the selection criteria for the tested cohorts need to be evaluated carefully.
Shoer S., Karady T., Keshet A., Shilo S., Rossman H., Gavrieli A., Meir T., Lavon A., Kolobkov D., Kalka I., Godneva A., Cohen O., Kariv A., Hoch O., Zer-Aviv M., Castel N., Sudre C., Zohar A. E., Irony A., Spector T., Geiger B., Hizi D., Shalev V., Balicer R. & Segal E.
(2021)
Med.
2,
2,
p. 196-208.e4
Background: The gold standard for COVID-19 diagnosis is detection of viral RNA through PCR. Due to global limitations in testing capacity, effective prioritization of individuals for testing is essential. Methods: We devised a model estimating the probability that an individual will test positive for COVID-19 based on answers to 9 simple questions that have been associated with SARS-CoV-2 infection. Our model was devised from a subsample of a national symptom survey that was answered over 2 million times in Israel in its first 2 months and a targeted survey distributed to all residents of several cities in Israel. Overall, 43,752 adults were included, of whom 498 self-reported as being COVID-19 positive. Findings: Our model was validated on a held-out set of individuals from Israel where it achieved an auROC of 0.737 (CI: 0.712–0.759) and auPR of 0.144 (CI: 0.119–0.177) and demonstrated its applicability outside of Israel in an independently collected symptom survey dataset from the US, UK, and Sweden. Our analyses revealed interactions between several symptoms and age, suggesting variation in the clinical manifestation of the disease in different age groups. Conclusions: Our tool can be used online and without exposure to suspected patients, thus suggesting worldwide utility in combating COVID-19 by better directing the limited testing resources through prioritization of individuals for testing, thereby increasing the rate at which positive individuals can be identified. Moreover, individuals at high risk for a positive test result can be isolated prior to testing.
To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts). Microbial composition showed high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples. A genome-wide association study of host genetic variation regarding microbial taxa identified 31 loci affecting the microbiome at a genome-wide significant (P
Boddy S. L., Giovannelli I., Sassani M., Cooper-Knock J., Snyder M. P., Segal E., Elinav E., Barker L. A., Shaw P. J. & McDermott C. J.
(2021)
BMC Medicine.
19,
1,
13.
Background: Much progress has been made in mapping genetic abnormalities linked to amyotrophic lateral sclerosis (ALS), but the majority of cases still present with no known underlying cause. Furthermore, even in families with a shared genetic abnormality there is significant phenotypic variability, suggesting that non-genetic elements may modify pathogenesis. Identification of such disease-modifiers is important as they might represent new therapeutic targets. A growing body of research has begun to shed light on the role played by the gut microbiome in health and disease with a number of studies linking abnormalities to ALS. Main body: The microbiome refers to the genes belonging to the myriad different microorganisms that live within and upon us, collectively known as the microbiota. Most of these microbes are found in the intestines, where they play important roles in digestion and the generation of key metabolites including neurotransmitters. The gut microbiota is an important aspect of the environment in which our bodies operate and inter-individual differences may be key to explaining the different disease outcomes seen in ALS. Work has begun to investigate animal models of the disease, and the gut microbiomes of people living with ALS, revealing changes in the microbial communities of these groups. The current body of knowledge will be summarised in this review. Advances in microbiome sequencing methods will be highlighted, as their improved resolution now enables researchers to further explore differences at a functional level. Proposed mechanisms connecting the gut microbiome to neurodegeneration will also be considered, including direct effects via metabolites released into the host circulation and indirect effects on bioavailability of nutrients and even medications. 
Conclusion: Profiling of the gut microbiome has the potential to add an environmental component to rapidly advancing studies of ALS genetics and move research a step further towards personalised medicine for this disease. Moreover, should compelling evidence of upstream neurotoxicity or neuroprotection initiated by gut microbiota emerge, modification of the microbiome will represent a potential new avenue for disease modifying therapies. For an intractable condition with few current therapeutic options, further research into the ALS microbiome is of crucial importance.
Mizrahi B., Shilo S., Rossman H., Kalkstein N., Marcus K., Barer Y., Keshet A., Shamir-Stein N., Shalev V., Zohar A. E., Chodick G. & Segal E.
(2020)
Nature Communications.
11,
1,
6208.
As the COVID-19 pandemic progresses, obtaining information on symptom dynamics is essential. Here, we extracted data from primary-care electronic health records and nationwide distributed surveys to assess the longitudinal dynamics of symptoms prior to and throughout SARS-CoV-2 infection. Information was available for 206,377 individuals, including 2471 positive cases. The two data sources were discordant, with survey data capturing most of the symptoms more sensitively. The most prevalent symptoms included fever, cough and fatigue. Loss of taste and smell 3 weeks prior to testing, either self-reported or recorded by physicians, were the most discriminative symptoms for COVID-19. Additional discriminative symptoms included self-reported headache and fatigue and a documentation of syncope, rhinorrhea and fever. Children had a significantly shorter disease duration. Several symptoms were reported weeks after recovery. By a unique integration of two data sources, our study sheds light on the longitudinal course of symptoms experienced by cases in primary care.
Bar N., Korem T., Weissbrod O., Zeevi D., Rothschild Bup D., Peled-Leviatan S., Kosower N., Lotan-Pompan M., Weinberger A. & Segal E.
(2020)
Nature.
588,
7836,
p. 135-140
The serum metabolome contains a plethora of biomarkers and causative agents of various diseases, some of which are endogenously produced and some that have been taken up from the environment [1]. The origins of specific compounds are known, including metabolites that are highly heritable [2,3], or those that are influenced by the gut microbiome [4], by lifestyle choices such as smoking [5], or by diet [6]. However, the key determinants of most metabolites are still poorly understood. Here we measured the levels of 1,251 metabolites in serum samples from a unique and deeply phenotyped healthy human cohort of 491 individuals. We applied machine-learning algorithms to predict metabolite levels in held-out individuals on the basis of host genetics, gut microbiome, clinical parameters, diet, lifestyle and anthropometric measurements, and obtained statistically significant predictions for more than 76% of the profiled metabolites. Diet and microbiome had the strongest predictive power, and each explained hundreds of metabolites, in some cases explaining more than 50% of the observed variance. We further validated microbiome-related predictions by showing a high replication rate in two geographically independent cohorts [7,8] that were not available to us when we trained the algorithms. We used feature attribution analysis [9] to reveal specific dietary and bacterial interactions. We further demonstrate that some of these interactions might be causal, as some metabolites that we predicted to be positively associated with bread were found to increase after a randomized clinical trial of bread intervention. Overall, our results reveal potential determinants of more than 800 metabolites, paving the way towards a mechanistic understanding of alterations in metabolites under different conditions and to designing interventions for manipulating the levels of circulating metabolites.
When determining whether gut microbes affect human health, it is hard to distinguish between a causal and a correlative relationship. Analysis of microbial links to human traits and habits correlated with disease offers a step forward.
Mars R. A., Yang Y., Ward T., Houtti M., Priya S., Lekatz H. R., Tang X., Sun Z., Kalari K. R., Korem T., Bhattarai Y., Zheng T., Bar N., Frost G., Johnson A. J., van Treuren W., Han S., Ordog T., Grover M., Sonnenburg J., D'Amato M., Camilleri M., Elinav E., Segal E., Blekhman R., Farrugia G., Swann J. R., Knights D. & Kashyap P. C.
(2020)
Cell.
183,
4,
p. 1137-1140
(Cell 182, 1460–1473.e1–e17; September 17, 2020) In preparing the final version of this article, we overlooked some errors and we apologize for these shortcomings. None of these errors involved our data analyses or affected the conclusions presented in the manuscript. These errors have now been corrected online.
1. In Figure 2B the y axis should read “log10 (mg/gram tissue),” not “log10 (mg/gram stool)” as it was labeled originally.
2. In Figure 4, panel D was mistakenly labeled as panel “E,” while panel E was mistakenly labeled as panel “D.”
3. In the section titled “Microbiome and Metabolome Data Integrated with Transcriptomic and Epigenetic Differences Reveal Novel Host-Microbiome Interactions in IBS,” we inadvertently wrote “additional gene-transcript and gene-metabolite associations,” which should have been “gene-microbe and gene-metabolite associations.”
4. In the methods section entitled “Multi-omics data integration,” we incorrectly stated that outputs from Lasso and stability selection models were inspected and filtered at FDR < 0.1, which should have been FDR < 0.25.
5. Finally, the legend for Figure S4F should read “hypoxanthine is consistently lower in IBS-C and IBS-D,” instead of “hypoxanthine is consistently lower in IBS-C and IBS-C.”
Htet T. D., Godneva A., Liu Z., Chalmers E., Kolobkov D., Snaith J. R., Richens R., Toth K., Danta M., Hng T. M., Elinav E., Segal E., Greenfield J. R. & Samocha-Bonet D.
(2020)
BMJ Open.
10,
10,
e037859.
Introduction Metformin and diets aimed at promoting healthy body weight are the first line in treating type 2 diabetes mellitus (T2DM). Clinical practice, backed by clinical trials, suggests that many individuals do not reach glycaemic targets using this approach alone. The primary aim of the Personalised Medicine in Pre-diabetes-Towards Preventing Diabetes in Individuals at Risk (PREDICT) Study is to test the efficacy of personalised diet as adjuvant to metformin in improving glycaemic control in individuals with dysglycaemia. Methods and analysis PREDICT is a two-arm, parallel-group, single-masked randomised controlled trial in adults with pre-diabetes or early-stage T2DM (with glycated haemoglobin (HbA1c) up to 8.0% (64 mmol/mol)), not treated with glucose-lowering medication. PREDICT is conducted at the Clinical Research Facility at the Garvan Institute of Medical Research (Sydney). Enrolment of participants commenced in December 2018 and is expected to be completed in December 2021. Participants are commenced on metformin (Extended Release, titrated to a target dose of 1500 mg/day) and randomised with equal allocation to either (1) the Personalised Nutrition Project algorithm-based diet or (2) low-fat high-dietary fibre diet, designed to provide caloric restriction (75%) in individuals with body mass index >25 kg/m². Treatment duration is 6 months and participants visit the Clinical Research Facility five times over approximately 7 months. The primary outcome measure is HbA1c. The secondary outcomes are (1) time of interstitial glucose
McGuire A. L., Gabriel S., Tishkoff S. A., Wonkam A., Chakravarti A., Furlong E. E. M., Treutlein B., Meissner A., Chang H. Y., Lopez-Bigas N., Segal E. & Kim J.
(2020)
Nature Reviews Genetics.
21,
10,
p. 581-596
In celebration of the 20th anniversary of Nature Reviews Genetics, we asked 12 leading researchers to reflect on the key challenges and opportunities faced by the field of genetics and genomics. Keeping their particular research area in mind, they take stock of the current state of play and emphasize the work that remains to be done over the next few years so that, ultimately, the benefits of genetic and genomic research can be felt by everyone.
Leshem A., Segal E. & Elinav E.
(2020)
mSystems.
5,
5,
e00665-20.
Nutritional content and timing are increasingly appreciated to constitute important human variables collectively impacting all aspects of human physiology and disease. However, person-specific mechanisms driving nutritional impacts on the human host remain incompletely understood, while current dietary recommendations remain empirical and nonpersonalized. Precision nutrition aims to harness individualized bodies of data, including the human gut microbiome, in predicting person-specific physiological responses (such as glycemic responses) to food. These advances notwithstanding, many unknowns remain, including the long-term efficacy of such interventions in delaying or reversing human metabolic disease, mechanisms driving these dietary effects, and the extent of the contribution of the gut microbiome to these processes. We summarize these conceptual advances, while highlighting challenges and means of addressing them in the next decade of study of precision medicine, toward generation of insights that may help to evolve precision nutrition as an effective future tool in a variety of “multifactorial” human disorders.
Mars R. A. T., Ward T., Houtti M., Priya S., Lekatz H. R., Tang X., Sun Z., Kalari K. R., Bhattarai Y., Zheng T., Bar N., Frost G., Johnson A. J., van Treuren W., Han S., Ordog T., Grover M., Sonnenburg J., D'Amato M., Camilleri M., Elinav E., Segal E., Blekhman R., Farrugia G., Swann J. R., Knights D. & Kashyap P. C.
(2020)
Cell.
182,
6,
p. 1460-1473
The gut microbiome has been implicated in multiple human chronic gastrointestinal (GI) disorders. Determining its mechanistic role in disease has been difficult due to apparent disconnects between animal and human studies and lack of an integrated multi-omics view of disease-specific physiological changes. We integrated longitudinal multi-omics data from the gut microbiome, metabolome, host epigenome, and transcriptome in the context of irritable bowel syndrome (IBS) host physiology. We identified IBS subtype-specific and symptom-related variation in microbial composition and function. A subset of identified changes in microbial metabolites correspond to host physiological mechanisms that are relevant to IBS. By integrating multiple data layers, we identified purine metabolism as a novel host-microbial metabolic pathway in IBS with translational potential. Our study highlights the importance of longitudinal sampling and integrating complementary multi-omics data to identify functional mechanisms that can serve as therapeutic targets in a comprehensive treatment strategy for chronic GI diseases.
Background and Aims: Accurate prediction of glucose levels in patients with type 1 diabetes mellitus (T1DM) is critical both for their glycemic control and for the development of closed-loop systems. Methods: In this study, we utilized real-life, retrospective, continuous glucose monitoring data from 141 T1DM patients (9,083 connection days, 1,592,506 glucose measurements) and in silico data generated by the UVA/Padova T1DM simulator to evaluate different computational methods for glucose prediction. We evaluated the performance of the models using both measures of numerical accuracy, measured by the root mean square error, and clinical accuracy, measured by the percentage of time in each of the Clarke error grid (CEG) zones, and compared the predictions made by autoregressive (AR) models, tree-based methods, artificial neural networks, and a novel model that we devised and optimized for this task. Results: Our novel model, constructed on real-life data, achieved clinical accuracy of 99.3% and 95.8% in predicting the glucose level 30 and 60 min ahead, respectively, and reduced the percentage of glucose predictions in zones C-E of the CEG by 60.6% and 38.4% in these prediction horizons, compared with a standard AR model. The model was superior to all other models across all age groups and achieved higher clinical accuracy in subgroups of patients with high glucose variability and greater time spent in hypoglycemia. The model had higher clinical and numerical accuracy when evaluated on in silico data than on real-life data. Conclusions: A model that optimizes for CEG zones may significantly improve clinical accuracy and clinical outcomes of treatment decisions in T1DM patients. Results obtained from simulated data may overestimate the performance of models on real-life data.
Segal E., Zhang F., Lin X., King G., Shalem O., Shilo S., Allen W. E., Alquaddoomi F., Altae-Tran H., Anders S., Balicer R., Bauman T., Bonilla X., Booman G., Chan A. T., Cohen O., Coletti S., Davidson N., Dor Y., Drew D. A., Elemento O., Evans G., Ewels P., Gale J., Gavrieli A., Grad Y. H., Greene C. S., Hajirasouliha I., Jerala R., Kahles A., Kallioniemi O., Keshet A., Kocarev L., Landua G., Meir T., Muller A., Nguyen L. H., Oresic M., Ovchinnikova S., Peterson H., Prodanova J., Rajagopal J., Ratsch G., Rossman H., Rung J., Sboner A., Sigaras A., Spector T., Steinherz R., Stevens I., Vilo J. & Wilmes P.
(2020)
Nature Medicine.
26,
8,
p. 1161-1165
We call upon the research community to standardize efforts to use daily self-reported data about COVID-19 symptoms in the response to the pandemic and to form a collaborative consortium to maximize global gain while protecting participant privacy.
Zuckerman B., Ron M., Mikl M., Segal E. & Ulitsky I.
(2020)
Molecular Cell.
79,
2,
p. 251-267
The core components of the nuclear RNA export pathway are thought to be required for export of virtually all polyadenylated RNAs. Here, we depleted different proteins that act in nuclear export in human cells and quantified the transcriptome-wide consequences on RNA localization. Different genes exhibited substantially variable sensitivities, with depletion of NXF1 and TREX components causing some transcripts to become strongly retained in the nucleus while others were not affected. Specifically, NXF1 is preferentially required for export of single- or few-exon transcripts with long exons or high A/U content, whereas depletion of TREX complex components preferentially affects spliced and G/C-rich transcripts. Using massively parallel reporter assays, we identified short sequence elements that render transcripts dependent on NXF1 for their export and identified synergistic effects of splicing and NXF1. These results revise the current model of how nuclear export shapes the distribution of RNA within human cells.
Mikl M., Pilpel Y. & Segal E.
(2020)
Nature Communications.
11,
1,
3061.
Programmed ribosomal frameshifting (PRF) is the controlled slippage of the translating ribosome to an alternative frame. This process is widely employed by human viruses such as HIV and SARS coronavirus and is critical for their replication. Here, we developed a high-throughput approach to assess the frameshifting potential of a sequence. We designed and tested >12,000 sequences based on 15 viral and human PRF events, allowing us to systematically dissect the rules governing ribosomal frameshifting and discover novel regulatory inputs based on amino acid properties and tRNA availability. We assessed the natural variation in HIV gag-pol frameshifting rates by testing >500 clinical isolates and identified subtype-specific differences and associations between viral load in patients and the optimality of PRF rates. We devised computational models that accurately predict frameshifting potential and frameshifting rates, including subtle differences between HIV isolates. This approach can contribute to the development of antiviral agents targeting PRF.
Nejman D., Livyatan I., Fuks G., Gavert N., Zwang Y., Geller L. T., Rotter-Maskowitz A., Weiser R., Mallel G., Gigi E., Meltser A., Douglas G. M., Kamer I., Gopalakrishnan V., Dadosh T., Levin-Zaidman S., Avnet S., Atlan T., Cooper Z. A., Arora R., Cogdill A. P., Khan M. A. W., Ologun G., Bussi Y., Weinberger A., Lotan-Pompan M., Golani O., Perry G., Rokah M., Bahar-Shany K., Rozeman E. A., Blank C. U., Ronai A., Shaoul R., Amit A., Dorfman T., Kremer R., Cohen Z. R., Harnof S., Siegal T., Yehuda-Shnaidman E., Gal-Yam E. N., Shapira H., Baldini N., Langille M. G. I., Ben-Nun A., Kaufman B., Nissan A., Golan T., Dadiani M., Levanon K., Bar J., Yust-Katz S., Barshack I., Peeper D. S., Raz D. J., Segal E., Wargo J. A., Sandbank J., Shental N. & Straussman R.
(2020)
Science.
368,
6494,
p. 973-980
Bacteria were first detected in human tumors more than 100 years ago, but the characterization of the tumor microbiome has remained challenging because of its low biomass. We undertook a comprehensive analysis of the tumor microbiome, studying 1526 tumors and their adjacent normal tissues across seven cancer types, including breast, lung, ovary, pancreas, melanoma, bone, and brain tumors. We found that each tumor type has a distinct microbiome composition and that breast cancer has a particularly rich and diverse microbiome. The intratumor bacteria are mostly intracellular and are present in both cancer and immune cells. We also noted correlations between intratumor bacteria or their predicted functions with tumor types and subtypes, patients' smoking status, and the response to immunotherapy.
Rossman H., Keshet A., Shilo S., Gavrieli A., Bauman T., Cohen O., Shelly E., Balicer R., Geiger B., Dor Y. & Segal E.
(2020)
Nature Medicine.
26,
5,
p. 634-638
Spokoini-Stern R., Stamov D., Jessel H., Aharoni L., Haschke H., Giron J., Unger R., Segal E., Abu-Horowitz A. & Bachelet I.
(2020)
RNA.
26,
5,
p. 629-636
Long noncoding RNA molecules (lncRNAs) are estimated to account for the majority of eukaryotic genomic transcripts, and have been associated with multiple diseases in humans. However, our understanding of their structure-function relationships is limited, with structural evidence coming mostly from indirect biochemical approaches or computational predictions. Here we describe direct visualization of the lncRNA HOTAIR (HOx Transcript AntIsense RNA) using atomic force microscopy (AFM) in nucleus-like conditions at 37 °C. Our observations reveal that HOTAIR has a discernible, although flexible, shape. Fast AFM scanning enabled the quantification of the motion of HOTAIR, and provided visual evidence of physical interactions with genomic DNA segments. Our report provides a biologically plausible description of the anatomy and intrinsic properties of HOTAIR, and presents a framework for studying the structural biology of lncRNAs.
Health data are increasingly being generated at a massive scale, at various levels of phenotyping and from different types of resources. Concurrent with recent technological advances in both data-generation infrastructure and data-analysis methodologies, there have been many claims that these developments will revolutionize healthcare, but such claims are still a matter of debate. Addressing the potential and challenges of big data in healthcare requires an understanding of the characteristics of the data. Here we characterize various properties of medical data, which we refer to as 'axes' of data, describe the considerations and tradeoffs taken when such data are generated, and the types of analyses that may achieve the tasks at hand. We then broadly describe the potential and challenges of using big data resources in healthcare, aiming to contribute to the ongoing discussion of how such resources may advance the understanding of health and disease.
Artzi N. S., Shilo S., Hadar E., Rossman H., Barbash-Hazan S., Ben-Haroush A., Balicer R. D., Feldman B., Wiznitzer A. & Segal E.
(2020)
Nature Medicine.
26,
1,
p. 71-76
Gestational diabetes mellitus (GDM) poses increased risk of short- and long-term complications for mother and offspring(1-4). GDM is typically diagnosed at 24-28 weeks of gestation, but earlier detection is desirable as this may prevent or considerably reduce the risk of adverse pregnancy outcomes(5,6). Here we used a machine-learning approach to predict GDM on retrospective data of 588,622 pregnancies in Israel for which comprehensive electronic health records were available. Our models predict GDM with high accuracy even at pregnancy initiation (area under the receiver operating characteristic curve (auROC) = 0.85), substantially outperforming a baseline risk score (auROC = 0.68). We validated our results on both a future validation set and a geographical validation set from the most populated city in Israel, Jerusalem, thereby emulating real-world performance. Interrogating our model, we uncovered previously unreported risk factors, including results of previous pregnancy glucose challenge tests. Finally, we devised a simpler model based on just nine questions that a patient could answer, with only a modest reduction in accuracy (auROC = 0.80). Overall, our models may allow early-stage intervention in high-risk women, as well as a cost-effective screening approach that could avoid the need for glucose tolerance tests by identifying low-risk women. Future prospective studies and studies on additional populations are needed to assess the real-world clinical utility of the model.
Mikl M., Hamburg A., Pilpel Y. & Segal E.
(2019)
Nature Communications.
10,
4572.
Most human genes are alternatively spliced, allowing for a large expansion of the proteome. The multitude of regulatory inputs to splicing limits the potential to infer general principles from investigating native sequences. Here, we create a rationally designed library of >32,000 splicing events to dissect the complexity of splicing regulation through systematic sequence alterations. Measuring RNA and protein splice isoforms allows us to investigate both cause and effect of splicing decisions, quantify diverse regulatory inputs and accurately predict (R² = 0.73-0.85) isoform ratios from sequence and secondary structure. By profiling individual cells, we measure the cell-to-cell variability of splicing decisions and show that it can be encoded in the DNA and influenced by regulatory inputs, opening the door for a novel, single-cell perspective on splicing regulation.
Slutskin I. V., Weinberger A. & Segal E.
(2019)
Genome Research.
29,
10,
p. 1635-1647
The cleavage and polyadenylation reaction is a crucial step in transcription termination and pre-mRNA maturation in human cells. Despite extensive research, the encoding of polyadenylation-mediated regulation of gene expression within the DNA sequence is not well understood. Here, we utilized a massively parallel reporter assay to inspect the effect of over 12,000 rationally designed polyadenylation sequences (PASs) on reporter gene expression and cleavage efficiency. We find that the PAS sequence can modulate gene expression by over five orders of magnitude. By using a uniquely designed scanning mutagenesis data set, we gain mechanistic insight into various modes of action by which the cleavage efficiency affects the sensitivity or robustness of the PAS to mutation. Furthermore, we employ motif discovery to identify both known and novel sequence motifs associated with PAS-mediated regulation. By leveraging the large scale of our data, we train a deep learning model for the highly accurate prediction of RNA levels from DNA sequence alone (R = 0.83). Moreover, we devise unique approaches for predicting exact cleavage sites for our reporter constructs and for endogenous transcripts. Taken together, our results expand our understanding of PAS-mediated regulation, and provide an unprecedented resource for analyzing and predicting PAS for regulatory genomics applications.
Blacher E., Bashiardes S., Shapiro H., Rothschild D., Mor U., Dori-Bachash M., Kleimeyer C., Moresi C., Harnik Y., Zur M., Zabari M., Brik R. B., Kviatcovsky D., Zmora N., Cohen Y., Bar N., Levi I., Amar N., Mehlman T., Brandis A., Biton I., Kuperman Y., Tsoory M., Alfahel L., Harmelin A., Schwartz M., Israelson A., Arike L., Johansson M. E. V., Hansson G. C., Gotkine M., Segal E. & Elinav E.
(2019)
Nature.
572,
7770,
p. 474-480
Amyotrophic lateral sclerosis (ALS) is a complex neurodegenerative disorder, in which the clinical manifestations may be influenced by genetic and unknown environmental factors. Here we show that ALS-prone Sod1 transgenic (Sod1-Tg) mice have a pre-symptomatic, vivarium-dependent dysbiosis and altered metabolite configuration, coupled with an exacerbated disease under germ-free conditions or after treatment with broad-spectrum antibiotics. We correlate eleven distinct commensal bacteria at our vivarium with the severity of ALS in mice, and by their individual supplementation into antibiotic-treated Sod1-Tg mice we demonstrate that Akkermansia muciniphila (AM) ameliorates whereas Ruminococcus torques and Parabacteroides distasonis exacerbate the symptoms of ALS. Furthermore, Sod1-Tg mice that are administered AM are found to accumulate AM-associated nicotinamide in the central nervous system, and systemic supplementation of nicotinamide improves motor symptoms and gene expression patterns in the spinal cord of Sod1-Tg mice. In humans, we identify distinct microbiome and metabolite configurations (including reduced levels of nicotinamide systemically and in the cerebrospinal fluid) in a small preliminary study that compares patients with ALS with household controls. We suggest that environmentally driven microbiome-brain interactions may modulate ALS in mice, and we call for similar investigations in the human form of the disease.
Consumption of over-the-counter probiotics for promotion of health and well-being has increased worldwide in recent years. However, although probiotic use has been greatly popularized among the general public, there are conflicting clinical results for many probiotic strains and formulations. Emerging insights from microbiome research enable an assessment of gut colonization by probiotics, strain-level activity, interactions with the indigenous microbiome, safety and impacts on the host, and allow the association of probiotics with physiological effects and potentially useful medical indications. In this Perspective, we highlight key advances, challenges and limitations in striving toward an unbiased interpretation of the large amount of data regarding over-the-counter probiotics, and propose avenues to improve the quality of evidence, transparency, public awareness and regulation of their use.
Zeevi D., Korem T., Godneva A., Bar N., Kurilshikov A., Lotan-Pompan M., Weinberger A., Fu J., Wijmenga C., Zhernakova A. & Segal E.
(2019)
Nature.
568,
7750,
p. 43-48
Differences in the presence of even a few genes between otherwise identical bacterial strains may result in critical phenotypic differences. Here we systematically identify microbial genomic structural variants (SVs) and find them to be prevalent in the human gut microbiome across phyla and to replicate in different cohorts. SVs are enriched for CRISPR-associated and antibiotic-producing functions and depleted from housekeeping genes, suggesting that they have a role in microbial adaptation. We find multiple associations between SVs and host disease risk factors, many of which replicate in an independent cohort. Exploring genes that are clustered in the same SV, we uncover several possible mechanistic links between the microbiome and its host, including a region in Anaerostipes hadrus that encodes a composite inositol catabolism-butyrate biosynthesis pathway, the presence of which is associated with lower host metabolic disease risk. Overall, our results uncover a nascent layer of variability in the microbiome that is associated with microbial adaptation and host health.
Popp C. J., St-Jules D. E., Hu L., Ganguzza L., Illiano P., Curran M., Li H., Schoenthaler A., Bergman M., Schmidt A. M., Segal E., Godneva A. & Sevick M. A.
(2019)
Contemporary Clinical Trials.
79,
p. 80-88
Weight loss reduces the risk of type 2 diabetes mellitus (T2D) in overweight and obese individuals. Although the physiological response to food varies among individuals, standard dietary interventions use a "one-size-fits-all" approach. The Personal Diet Study aims to evaluate two dietary interventions targeting weight loss in people with prediabetes and T2D: (1) a low-fat diet, and (2) a personalized diet using a machine-learning algorithm that predicts glycemic response to meals. Changes in body weight, body composition, and resting energy expenditure will be compared over a 6-month intervention period and a subsequent 6-month observation period intended to assess maintenance effects. The behavioral intervention is delivered via mobile health technology and is grounded in Social Cognitive Theory. Here, we describe the design, interventions, and methods used.
Weingarten-Gabbay S., Nir R., Lubliner S., Sharon E., Kalma Y., Weinberger A. & Segal E.
(2019)
Genome Research.
29,
2,
p. 171-183
Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome. We used this method to investigate thousands of native promoters and preinitiation complex (PIC) binding regions followed by in-depth characterization of the sequence motifs underlying promoter activity, including core promoter elements and TF binding sites. We find that core promoters drive transcription mostly unidirectionally and that sequences originating from promoters exhibit stronger activity than those originating from enhancers. By testing multiple synthetic configurations of core promoter elements, we dissect the motifs that positively and negatively regulate transcription as well as the effect of their combinations and distances, including a 10-bp periodicity in the optimal distance between the TATA and the initiator. By comprehensively screening 133 TF binding sites, we find that in contrast to core promoters, TF binding sites maintain similar activity levels in both orientations, supporting a model by which divergent transcription is driven by two distinct unidirectional core promoters sharing bidirectional TF binding sites. Finally, we find a striking agreement between the effect of binding site multiplicity of individual TFs in our assay and their tendency to appear in homotypic clusters throughout the genome. Overall, our study systematically assays the elements that drive expression in core and proximal promoter regions and sheds light on organization principles of regulatory regions in the human genome.
Leviatan S. & Segal E.
(2019)
mSystems.
4,
1,
e00010-19.
Shotgun sequencing of samples taken from the human microbiome often reveals only partial mapping of the sequenced metagenomic reads to existing reference genomes. Such partial mappability indicates that many genomes are missing in our reference genome set. This is particularly true for non-Western populations and for samples that do not originate from the gut. Pasolli et al. (E. Pasolli, F. Asnicar, S. Manara, M. Zolfo, et al., Cell, 2019, https://doi.org/10.1016/j.cell.2019.01.001) perform a grand effort to expand the reference set, and to better classify its members, revealing a wider pangenome of existing species as well as identifying new species of previously unknown taxonomic branches.
Dvir S., Velten L., Sharon E., Zeevi D., Carey L. B., Weinberger A. & Segal E.
(2019)
Proceedings of the National Academy of Sciences of the United States of America.
116,
27,
p. 13701
The authors note that the following statement should be added to the Acknowledgments: “This work was supported by a grant from the European Research Council (ERC) to E.S.”
Kotler E. & Segal E.
(2018)
Cell.
175,
4,
p. 902-904
Mutation frequencies vary along the genome, but the factors determining this variability are only partially understood. Pich et al. unravel a ~10 bp periodicity in mutation rates at nucleosome-proximal regions that follows minor groove orientation. Opposing differential DNA damage and repair processes could shape genetic divergence irrespective of selection.
Zmora N., Zilberman-Schapira G., Suez J., Mor U., Dori-Bachash M., Bashiardes S., Kotler E., Zur M., Regev-Lehavi D., Brik R. B., Federici S., Cohen Y., Linevsky R., Rothschild D., Moor A. E., Ben-Moshe S., Harmelin A., Itzkovitz S., Maharshak N., Shibolet O., Shapiro H., Pevsner-Fischer M., Sharon I., Halpern Z., Segal E. & Elinav E.
(2018)
Cell.
174,
6,
p. 1388-1405.e21
Empiric probiotics are commonly consumed by healthy individuals as a means of life quality improvement and disease prevention. However, evidence of probiotic gut mucosal colonization efficacy remains sparse and controversial. We metagenomically characterized the murine and human mucosal-associated gastrointestinal microbiome and found it to only partially correlate with stool microbiome. A sequential invasive multi-omics measurement at baseline and during consumption of an 11-strain probiotic combination or placebo demonstrated that probiotics remain viable upon gastrointestinal passage. In colonized, but not germ-free mice, probiotics encountered a marked mucosal colonization resistance. In contrast, humans featured person-, region- and strain-specific mucosal colonization patterns, hallmarked by predictive baseline host and microbiome features, but indistinguishable by probiotics presence in stool. Consequently, probiotics induced a transient, individualized impact on mucosal community structure and gut transcriptome. Collectively, empiric probiotics supplementation may be limited in universally and persistently impacting the gut mucosa, meriting development of new personalized probiotic approaches.
Suez J., Zmora N., Zilberman-Schapira G., Mor U., Dori-Bachash M., Bashiardes S., Zur M., Regev-Lehavi D., Brik R. B., Federici S., Horn M., Cohen Y., Moor A. E., Zeevi D., Korem T., Kotler E., Harmelin A., Itzkovitz S., Maharshak N., Shibolet O., Pevsner-Fischer M., Shapiro H., Sharon I., Halpern Z., Segal E. & Elinav E.
(2018)
Cell.
174,
6,
p. 1406-1423.e16
Probiotics are widely prescribed for prevention of antibiotics-associated dysbiosis and related adverse effects. However, probiotic impact on post-antibiotic reconstitution of the gut mucosal host-microbiome niche remains elusive. We invasively examined the effects of multi-strain probiotics or autologous fecal microbiome transplantation (aFMT) on post-antibiotic reconstitution of the murine and human mucosal microbiome niche. Contrary to homeostasis, antibiotic perturbation enhanced probiotics colonization in the human mucosa but only mildly improved colonization in mice. Compared to spontaneous post-antibiotic recovery, probiotics induced a markedly delayed and persistently incomplete indigenous stool/mucosal microbiome reconstitution and host transcriptome recovery toward homeostatic configuration, while aFMT induced a rapid and near-complete recovery within days of administration. In vitro, Lactobacillus-secreted soluble factors contributed to probiotics-induced microbiome inhibition. Collectively, potential post-antibiotic probiotic benefits may be offset by a compromised gut mucosal recovery, highlighting the need to develop aFMT or personalized probiotic approaches that achieve mucosal protection without compromising microbiome recolonization in the antibiotics-perturbed host. Probiotics perturb rather than aid in microbiota recovery back to baseline after antibiotic treatment in humans.
Kotler E., Segal E. & Oren M.
(2018)
Molecular & Cellular Oncology.
5,
6,
1511207.
Phenotypic characterization of mutations in the tumor protein p53 (TP53) gene has so far focused on a handful of relatively frequent "hotspot" mutations, accounting for only ~30% of cases. We expanded the scope and quantitatively measured the impact of thousands of distinct TP53 mutations in vitro and in vivo, providing insights into the connections between structure, function, evolutionary conservation and clinical impact.
Weissbrod O., Rothschild D., Barkan E. & Segal E.
(2018)
Current Opinion in Microbiology.
44,
p. 9-19
Recent studies indicate that the gut microbiome is partially heritable, motivating the need to investigate microbiome-host genome associations via microbial genome-wide association studies (mGWAS). Existing mGWAS demonstrate that microbiome-host genotype associations are typically weak and are spread across multiple variants, similar to associations often observed in genome-wide association studies (GWAS) of complex traits. Here we reconsider mGWAS by viewing them through the lens of GWAS, and demonstrate that there are striking similarities between the challenges and pitfalls faced by the two study designs. We further urge the mGWAS community to adopt three key lessons learned over the history of GWAS: firstly, adopting uniform data and reporting formats to facilitate replication and meta-analysis efforts; secondly, enforcing stringent statistical criteria to reduce the number of false positive findings; and thirdly, considering the microbiome and the host genome as distinct entities, rather than studying different taxa and single nucleotide polymorphisms (SNPs) separately. Finally, we anticipate that mGWAS sample sizes will have to increase by orders of magnitude to reproducibly associate the host genome with the gut microbiome.
Kotler E., Shani O., Goldfeld G., Lotan-Pompan M., Tarcic O., Gershoni A., Hopf T. A., Marks D. S., Oren M. & Segal E.
(2018)
Molecular Cell.
71,
1,
p. 178-190.e8
The TP53 gene is frequently mutated in human cancer. Research has focused predominantly on six major "hotspot" codons, which account for only ~30% of cancer-associated p53 mutations. To comprehensively characterize the consequences of the p53 mutation spectrum, we created a synthetically designed library and measured the functional impact of ~10,000 DNA-binding domain (DBD) p53 variants in human cells in culture and in vivo. Our results highlight the differential outcome of distinct p53 mutations in human patients and elucidate the selective pressure driving p53 conservation throughout evolution. Furthermore, while loss of anti-proliferative functionality largely correlates with the occurrence of cancer-associated p53 mutations, we observe that selective gain-of-function may further favor particular mutants in vivo. Finally, when combined with additional acquired p53 mutations, seemingly neutral TP53 SNPs may modulate phenotypic outcome and, presumably, tumor progression.
Abelson S., Collord G., Ng S. W. K., Weissbrod O., Mendelson Cohen N., Niemeyer E., Barda N., Zuzarte P. C., Heisler L., Sundaravadanam Y., Luben R., Hayat S., Wang T. T., Zhao Z., Cirlan I., Pugh T. J., Soave D., Ng K., Latimer C., Hardy C., Raine K., Jones D., Hoult D., Britten A., McPherson J. D., Johansson M., Mbabaali F., Eagles J., Miller J. K., Pasternack D., Timms L., Krzyzanowski P., Awadalla P., Costa R., Segal E., Bratman S. V., Beer P., Behjati S., Martincorena I., Wang J. C. Y., Bowles K. M., Quirós J. R., Karakatsani A., La Vecchia C., Trichopoulou A., Salamanca-Fernández E., Huerta J. M., Barricarte A., Travis R. C., Tumino R., Masala G., Boeing H., Panico S., Kaaks R., Krämer A., Sieri S., Riboli E., Vineis P., Foll M., McKay J., Polidoro S., Sala N., Khaw K., Vermeulen R., Campbell P. J., Papaemmanuil E., Minden M. D., Tanay A., Balicer R. D., Wareham N. J., Gerstung M., Dick J. E., Brennan P., Vassiliou G. S. & Shlush L. I.
(2018)
Nature.
559,
7714,
p. 400-404
The incidence of acute myeloid leukaemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 65. Most cases arise without any detectable early symptoms and patients usually present with the acute complications of bone marrow failure(1). The onset of such de novo AML cases is typically preceded by the accumulation of somatic mutations in preleukaemic haematopoietic stem and progenitor cells (HSPCs) that undergo clonal expansion(2,3). However, recurrent AML mutations also accumulate in HSPCs during ageing of healthy individuals who do not develop AML, a phenomenon referred to as age-related clonal haematopoiesis (ARCH)(4-8). Here we use deep sequencing to analyse genes that are recurrently mutated in AML to distinguish between individuals who have a high risk of developing AML and those with benign ARCH. We analysed peripheral blood cells from 95 individuals, obtained on average 6.3 years before AML diagnosis (pre-AML group), together with 414 unselected age- and gender-matched individuals (control group). Pre-AML cases were distinct from controls and had more mutations per sample, higher variant allele frequencies, indicating greater clonal expansion, and showed enrichment of mutations in specific genes. Genetic parameters were used to derive a model that accurately predicted AML-free survival; this model was validated in an independent cohort of 29 pre-AML cases and 262 controls. Because AML is rare, we also developed an AML predictive model using a large electronic health record database that identified individuals at greater risk. Collectively, our findings provide proof-of-concept that it is possible to discriminate ARCH from pre-AML many years before malignant transformation. This could in the future enable earlier detection and monitoring, and may help to inform intervention.
Wang J., Kurilshikov A., Radjabzadeh D., Turpin W., Croitoru K., Bonder M. J., Jackson M. A., Medina-Gomez C., Frost F., Homuth G., Rühlemann M., Hughes D., Kim H. N., Spector T. D., Bell J. T., Steves C. J., Timpson N., Franke A., Wijmenga C., Meyer K., Kacprowski T., Franke L., Paterson A. D., Raes J., Kraaij R., Zhernakova A., Ahluwalia T., Barkan E., Bedrani L., Bisgaard H., Boehnke M., Bønnelykke K., Boomsma D. I., Croitoru K., Davies G. E., de Geus E., Degenhardt F., D'Amato M., Ehli E. A., Espin-Garcia O., Finnicum C. T., Fornage M., Frost F., Fu J., Heinsen F. A., Homuth G., Ijzerman R., Jackson M. A., Jessen L. E., Jonkers D., Kacprowski T., Kim H. L., Kraaij R., Laakso M., Launer L., Lerch M. M., Lüll K., Lusis A. J., Mangino M., Mayerle J., Mbarek H., Medina M. C., Meyer K., Mohlke K. L., Org E., Paterson A., Payami H., Radjabzadeh D., Raes J., Rothschild D., Rühlemann M., Sanna S., Segal E., Shah S., Smith M., Stokholm J., Szopinska J. W., Thorsen J., Timpson N., Turpin W., Uitterlinden A. G., Vasquez A. A., Völzke H., Vosa U., Wallen Z., Wang J., Weiss F. U., Weissbrod O., Wijmenga C., Willemsen G., Xu W. & Yun Y.
(2018)
Microbiome.
6,
1,
101.
Background: In recent years, human microbiota, especially gut microbiota, have emerged as an important yet complex trait influencing human metabolism, immunology, and diseases. Many studies are investigating the forces underlying the observed variation, including the human genetic variants that shape human microbiota. Several preliminary genome-wide association studies (GWAS) have been completed, but more are necessary to achieve a fuller picture. Results: Here, we announce the MiBioGen consortium initiative, which has assembled 18 population-level cohorts and some 19,000 participants. Its aim is to generate new knowledge for the rapidly developing field of microbiota research. Each cohort has surveyed the gut microbiome via 16S rRNA sequencing and genotyped their participants with full-genome SNP arrays. We have standardized the analytical pipelines for both the microbiota phenotypes and genotypes, and all the data have been processed using identical approaches. Our analysis of microbiome composition shows that we can reduce the potential artifacts introduced by technical differences in generating microbiota data. We are now in the process of benchmarking the association tests and performing meta-analyses of genome-wide associations. All pipeline and summary statistics results will be shared using public data repositories. Conclusion: We present the largest consortium to date devoted to microbiota-GWAS. We have adapted our analytical pipelines to suit multi-cohort analyses and expect to gain insight into host-microbiota cross-talk at the genome-wide level. And, as an open consortium, we invite more cohorts to join us (by contacting one of the corresponding authors) and to follow the analytical pipeline we have developed.
Bashiardes S., Godneva A., Elinav E. & Segal E.
(2018)
Current Opinion in Biotechnology.
51,
p. 57-63
Generalized dietary and lifestyle guidelines have been formulated and published for decades by a variety of relevant agencies in an attempt to guide people towards healthy choices. As the pandemic of metabolic diseases continues to grow, it has become clear that the one-size-fits-all diet approach does not work and that there is significant variation in inter-individual responses to diet and lifestyle interventions. Recent technological advances have given unprecedented insight into the sources of this variation, pointing towards our genome and microbiome as previously under-explored potential culprits contributing to individually unique dietary responses. Variations in our genome influence the bioavailability and metabolism of nutrients between individuals, while inter-individual compositional variation of commensal gut microbiota leads to differences in microbial functional potential, metabolite production and metabolism modulation. Quantifying and incorporating these factors into a comprehensive personalized nutrition approach may enable practitioners to rationally incorporate individual nutritional recommendations in combating the metabolic syndrome pandemic.
Rothschild D., Weissbrod O., Barkan E., Kurilshikov A., Korem T., Zeevi D., Costea P. I., Godneva A., Kalka I. N., Bar N., Shilo S., Lador D., Vila A. V., Zmora N., Pevsner-Fischer M., Israeli D., Kosower N., Malka G., Wolf B. C., Avnit-Sagi T., Lotan-Pompan M., Weinberger A., Halpern Z., Carmi S., Fu J., Wijmenga C., Zhernakova A., Elinav E. & Segal E.
(2018)
Nature.
555,
7695,
p. 210-215
Human gut microbiome composition is shaped by multiple factors but the relative contribution of host genetics remains elusive. Here we examine genotype and microbiome data from 1,046 healthy individuals with several distinct ancestral origins who share a relatively common environment, and demonstrate that the gut microbiome is not significantly associated with genetic ancestry, and that host genetics have a minor role in determining microbiome composition. We show that, by contrast, there are significant similarities in the compositions of the microbiomes of genetically unrelated individuals who share a household, and that over 20% of the inter-person microbiome variability is associated with factors related to diet, drugs and anthropometric measurements. We further demonstrate that microbiome data significantly improve the prediction accuracy for many human traits, such as glucose and obesity measures, compared to models that use only host genetic and environmental data. These results suggest that microbiome alterations aimed at improving clinical outcomes may be carried out across diverse genetic backgrounds.
Slutskin I. V., Weingarten-Gabbay S., Nir R., Weinberger A. & Segal E.
(2018)
Nature Communications.
9,
1,
529.
Despite extensive research, the sequence features affecting microRNA-mediated regulation are not well understood, limiting our ability to predict gene expression levels in both native and synthetic sequences. Here we employed a massively parallel reporter assay to investigate the effect of over 14,000 rationally designed 3' UTR sequences on reporter construct repression. We found that multiple factors, including microRNA identity, hybridization energy, target accessibility, and target multiplicity, can be manipulated to achieve a predictable, up to 57-fold, change in protein repression. Moreover, we predict protein repression and RNA levels with high accuracy (R = 0.84 and R = 0.80, respectively) using only 3' UTR sequence, as well as the effect of mutation in native 3' UTRs on protein repression (R = 0.63). Taken together, our results elucidate the effect of different sequence features on miRNA-mediated regulation and demonstrate the predictability of their effect on gene expression with applications in regulatory genomics and synthetic biology.
Sherf-Dagan S., Zelber-Sagi S., Zilberman-Schapira G., Webb M., Buch A., Keidar A., Raziel A., Sakran N., Goitein D., Goldenberg N., Mahdi J. A., Pevsner-Fischer M., Zmora N., Dori-Bachash M., Segal E., Elinav E. & Shibolet O.
(2018)
International Journal of Obesity.
42,
2,
p. 147-155
BACKGROUND: Probiotics are commonly used after bariatric surgery; however, uncertainty remains regarding their efficacy. Our aim was to compare the effect of probiotics vs placebo on hepatic, inflammatory and clinical outcomes following laparoscopic sleeve gastrectomy (LSG). METHODS: This randomized, double-blind, placebo-controlled trial of 6-month treatment with probiotics (Bio-25; Supherb) vs placebo, with 6 months of additional follow-up, was conducted among 100 morbidly obese nonalcoholic fatty liver disease (NAFLD) patients who underwent LSG surgery. The primary outcome was a reduction in liver fat content, measured by abdominal ultrasound, and secondary outcomes were improvement of fibrosis, measured by shear-wave elastography, metabolic and inflammatory parameters, anthropometrics and quality of life (QOL). Fecal samples were collected and analyzed for microbial composition. RESULTS: One hundred patients (60% women, mean age of 41.9 ± 9.8 years and body mass index of 42.3 ± 4.7 kg/m²) were randomized; 80% attended the 6-month visit and 77% completed the 12-month follow-up. Fat content and NAFLD remission rate were similarly reduced in the probiotics and placebo groups at 6 months postsurgery (-0.9 ± 0.5 vs -0.7 ± 0.4 score; P = 0.059 and 52.5 vs 40%; P = 0.262, respectively) and at 12 months postsurgery. Fibrosis, liver enzymes, C-reactive protein (CRP), leptin and cytokeratin-18 levels were significantly reduced and QOL significantly improved within groups (P = 0.173 for all) at 6 and 12 months postsurgery. Within-sample microbiota diversity (alpha-diversity) increased at 6 months postsurgery compared with baseline in both study arms. CONCLUSIONS: Probiotics administration does not improve hepatic, inflammatory and clinical outcomes 6 and 12 months post-LSG.
The genomic revolution promises to transform our approach to treating patients by individualizing treatments, reducing adverse events, and decreasing health care costs. The early advances of this approach have been realized primarily in cancer, by optimizing preventive and therapeutic strategies using human genome sequencing. The ability to characterize the microbiome, which includes all the microbes that reside within and upon us and all their genetic elements, using next-generation sequencing now allows us to incorporate this important contributor to human disease into the development of new preventive and therapeutic strategies. In this review we highlight the importance of the microbiome in all aspects of human disease, including pathogenesis, phenotype, prognosis, and response to treatment, as well as its role as a diagnostic and therapeutic biomarker. We describe the role of next-generation sequencing in both precise microbial identification of infectious diseases and characterization of microbial communities and their function. Taken together, the microbiome is emerging as an integral part of the precision medicine approach, as it not only contributes to interindividual variability in all aspects of a disease but also represents a potentially modifiable factor that is amenable to targeting by therapeutics. © 2017 Mayo Foundation for Medical Education and Research
Gritsenko A. A., Weingarten-Gabbay S., Elias-Kirma S., Nir R., De Ridder R. D. & Segal E.
(2017)
PLoS Computational Biology.
13,
9,
e1005734.
Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral translation initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However, due to the small number of known IRESs, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in humans and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements.
Korem T., Zeevi D., Zmora N., Weissbrod O., Bar N., Lotan-Pompan M., Avnit Sagi S. T., Kosower N., Malka G., Rein M., Suez J., Goldberg B. Z., Weinberger A., Levy A., Elinav E. & Segal E.
(2017)
Cell Metabolism.
25,
6,
p. 1243-1253
Bread is consumed daily by billions of people, yet evidence regarding its clinical effects is contradictory. Here, we performed a randomized crossover trial of two 1-week-long dietary interventions comprising consumption of either traditionally made sourdough-leavened whole-grain bread or industrially made white bread. We found no significant differential effects of bread type on multiple clinical parameters. The gut microbiota composition remained person specific throughout this trial and was generally resilient to the intervention. We demonstrate statistically significant interpersonal variability in the glycemic response to different bread types, suggesting that the lack of phenotypic difference between the bread types stems from a person-specific effect. We further show that the type of bread that induces the lower glycemic response in each person can be predicted based solely on microbiome data prior to the intervention. Together, we present marked personalization in both bread metabolism and the gut microbiome, suggesting that understanding dietary effects requires integration of person-specific factors.
Rowan S., Jiang S., Korem T., Szymanski J., Chang M., Szelog J., Cassalman C., Dasuri K., McGuire C., Nagai R., Du X., Brownlee M., Rabbani N., Thornalley P. J., Baleja J. D., Deik A. A., Pierce K. A., Scott J. M., Clish C. B., Smith D. E., Weinberger A., Avnit Sagi T., Lotan-Pompan M., Segal E. & Taylor A.
(2017)
Proceedings of the National Academy of Sciences of the United States of America.
114,
22,
p. E4472-E4481
Age-related macular degeneration (AMD) is the major cause of blindness in developed nations. AMD is characterized by retinal pigmented epithelial (RPE) cell dysfunction and loss of photoreceptor cells. Epidemiologic studies indicate important contributions of dietary patterns to the risk for AMD, but the mechanisms relating diet to disease remain unclear. Here we investigate the effect on AMD of isocaloric diets that differ only in the type of dietary carbohydrate in a wild-type aged-mouse model. The consumption of a high-glycemia (HG) diet resulted in many AMD features (AMDf), including RPE hypopigmentation and atrophy, lipofuscin accumulation, and photoreceptor degeneration, whereas consumption of the lower-glycemia (LG) diet did not. Critically, switching from the HG to the LG diet late in life arrested or reversed AMDf. LG diets limited the accumulation of advanced glycation end products, long-chain polyunsaturated lipids, and their peroxidation end-products and increased C3-carnitine in retina, plasma, or urine. Untargeted metabolomics revealed microbial cometabolites, particularly serotonin, as protective against AMDf. Gut microbiota were responsive to diet, and we identified microbiota in the Clostridiales order as being associated with AMDf and the HG diet, whereas protection from AMDf was associated with the Bacteroidales order and the LG diet. Network analysis revealed a nexus of metabolites and microbiota that appear to act within a gut-retina axis to protect against diet- and age-induced AMDf. The findings indicate a functional interaction between dietary carbohydrates, the metabolome, including microbial cometabolites, and AMDf. Our studies suggest a simple dietary intervention that may be useful in patients to arrest AMD.
Levo M., Avnit-Sagi T., Lotan-Pompan M., Kalma Y., Weinberger A., Yakhini Z. & Segal E.
(2017)
Molecular Cell.
65,
4,
p. 604-+
Precise gene expression patterns are established by transcription factors (TFs) binding to regulatory sequences. While these events occur in the context of chromatin, our understanding of how TF-nucleosome interplay affects gene expression is highly limited. Here, we present an assay for high-resolution measurements of both DNA occupancy and gene expression on large-scale libraries of systematically designed regulatory sequences. Our assay reveals occupancy patterns at the single-cell level. It provides an accurate quantification of the fraction of the population bound by a nucleosome and captures distinct, even adjacent, TF binding events. By applying this assay to over 1,500 promoter variants in yeast, we reveal pronounced differences in the dependency of TF activity on chromatin and classify TFs by their differential capacity to alter chromatin and promote expression. We further demonstrate how different regulatory sequences give rise to nucleosome-mediated TF collaborations that quantitatively account for the resulting expression.
Van Dijk D. D., Sharon E., Lotan-Pompan M., Weinberger A., Segal E. & Carey L.
(2017)
Genome Research.
27,
1,
p. 87-94
Transcription factors (TFs) are key mediators that propagate extracellular and intracellular signals through to changes in gene expression profiles. However, the rules by which promoters decode the amount of active TF into target gene expression are not well understood. To determine the mapping between promoter DNA sequence, TF concentration, and gene expression output, we have conducted in budding yeast a large-scale measurement of the activity of thousands of designed promoters at six different levels of TF. We observe that maximum promoter activity is determined by TF concentration and not by the number of binding sites. Surprisingly, the addition of an activator site often reduces expression. A thermodynamic model that incorporates competition between neighboring binding sites for a local pool of TF molecules explains this behavior and accurately predicts both absolute expression and the amount by which addition of a site increases or reduces expression. Taken together, our findings support a model in which neighboring binding sites interact competitively when TF is limiting but otherwise act additively.
Thaiss C. A., Itav S., Rothschild D., Meijer M. T., Levy M., Moresi C., Dohnalová L., Braverman S., Rozin S., Malitsky S., Dori-Bachash M., Kuperman Y., Biton I., Gertler A., Harmelin A., Shapiro H., Halpern Z., Aharoni A., Segal E. & Elinav E.
(2016)
Nature.
540,
7634,
p. 544-551
In tackling the obesity pandemic, considerable efforts are devoted to the development of effective weight reduction strategies, yet many dieting individuals fail to maintain a long-term weight reduction, and instead undergo excessive weight regain cycles. The mechanisms driving recurrent post-dieting obesity remain largely elusive. Here we identify an intestinal microbiome signature that persists after successful dieting of obese mice and contributes to faster weight regain and metabolic aberrations upon re-exposure to obesity-promoting conditions. Faecal transfer experiments show that the accelerated weight regain phenotype can be transmitted to germ-free mice. We develop a machine-learning algorithm that enables personalized microbiome-based prediction of the extent of post-dieting weight regain. Additionally, we find that the microbiome contributes to diminished post-dieting flavonoid levels and reduced energy expenditure, and demonstrate that flavonoid-based 'post-biotic' intervention ameliorates excessive secondary weight gain. Together, our data highlight a possible microbiome contribution to accelerated post-dieting weight regain, and suggest that microbiome-targeting approaches may help to diagnose and treat this common disorder.
Thaiss C., Levy M., Korem T., Dohnalova L., Shapiro H., Jaitin D., David E., Winter D., Gury-BenAri M., Tatirovsky E., Tuganbaev T., Federici S., Zmora N., Zeevi D., Dori-Bachash M., Pevsner-Fischer M., Kartvelishvily E., Brandis A., Harmelin A., Shibolet O., Halpern Z., Honda K., Amit I., Segal E. & Elinav E.
(2016)
Cell.
167,
6,
p. 1495-1510.e12
The intestinal microbiota undergoes diurnal compositional and functional oscillations that affect metabolic homeostasis, but the mechanisms by which the rhythmic microbiota influences host circadian activity remain elusive. Using integrated multi-omics and imaging approaches, we demonstrate that the gut microbiota features oscillating biogeographical localization and metabolome patterns that determine the rhythmic exposure of the intestinal epithelium to different bacterial species and their metabolites over the course of a day. This diurnal microbial behavior drives, in turn, the global programming of the host circadian transcriptional, epigenetic, and metabolite oscillations. Surprisingly, disruption of homeostatic microbiome rhythmicity not only abrogates normal chromatin and transcriptional oscillations of the host, but also incites genome-wide de novo oscillations in both intestine and liver, thereby impacting diurnal fluctuations of host physiology and disease susceptibility. As such, the rhythmic biogeography and metabolome of the intestinal microbiota regulates the temporal organization and functional outcome of host transcriptional and epigenetic programs.
Keren L., Hausser J., Lotan-Pompan M., Vainberg Slutskin I., Alisar H., Kaminski S., Weinberger A., Alon U., Milo R. & Segal E.
(2016)
Cell.
166,
5,
p. 1282-1294.e18
Data on gene expression levels across individuals, cell types, and disease states are expanding, yet our understanding of how expression levels impact phenotype is limited. Here, we present a massively parallel system for assaying the effect of gene expression levels on fitness in Saccharomyces cerevisiae by systematically altering the expression level of 100 genes at 100 distinct levels spanning a 500-fold range at high resolution. We show that the relationship between expression levels and growth is gene and environment specific and provides information on the function, stoichiometry, and interactions of genes. Wild-type expression levels in some conditions are not optimal for growth, and genes whose fitness is greatly affected by small changes in expression level tend to exhibit lower cell-to-cell variability in expression. Our study addresses a fundamental gap in understanding the functional significance of gene expression regulation and offers a framework for evaluating the phenotypic effects of expression variation.
Barenholz U., Keren L., Segal E. & Milo R.
(2016)
PLoS ONE.
11,
4,
e0153344.
Most proteins show changes in level across growth conditions. Many of these changes seem to be coordinated with the specific growth rate rather than the growth environment or the protein function. Although cellular growth rates, gene expression levels and gene regulation have been at the center of biological research for decades, there are only a few models giving a baseline prediction of the dependence of the proteome fraction occupied by a gene on the specific growth rate. We present a simple model that predicts a widely coordinated increase in the fraction of many proteins out of the proteome, proportionally with the growth rate. The model reveals how passive redistribution of resources, due to active regulation of only a few proteins, can have proteome-wide effects that are quantitatively predictable. Our model provides a potential explanation for why and how such a coordinated response of a large fraction of the proteome to the specific growth rate arises under different environmental conditions. The simplicity of our model can also be useful by serving as a baseline null hypothesis in the search for active regulation. We exemplify the usage of the model by analyzing the relationship between growth rate and proteome composition for the model microorganism E. coli as reflected in recent proteomics data sets spanning various growth conditions. We find that the fraction out of the proteome of a large number of proteins, and from different cellular processes, increases proportionally with the growth rate. Notably, ribosomal proteins, which have been previously reported to increase in fraction with growth rate, are only a small part of this group of proteins. We suggest that, although the fractions of many proteins change with the growth rate, such changes may be partially driven by a global effect, not necessarily requiring specific cellular control mechanisms.
Zeevi D., Korem T. & Segal E.
(2016)
Genome Biology.
17,
1,
50.
A report on the first EMBO conference entitled "Next Gen Immunology-From Host Genome to the Microbiome: Immunity in the Genomic Era", held at the Weizmann Institute of Science, Israel, 14-16 February, 2016.
Qu K., Garamszegi S., Wu F., Thorvaldsdottir H., Liefeld T., Ocana M., Borges-Rivera D., Pochet N., Robinson J. T., Demchak B., Hull T., Ben-Artzi G., Blankenberg D., Barber G. P., Lee B. T., Kuhn R. M., Nekrutenko A., Segal E., Ideker T., Reich M., Regev A., Chang H. Y. & Mesirov J. P.
(2016)
Nature Methods.
13,
3,
p. 245-247
Complex biomedical analyses require the use of multiple software tools in concert and remain challenging for much of the biomedical research community. We introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource that currently supports the streamlined interaction of 20 bioinformatics tools and data resources. To facilitate integrative analysis by non-programmers, it offers a growing set of 'recipes', short workflows to guide investigators through high-utility analysis tasks.
Weingarten-Gabbay S., Elias-Kirma S., Nir R., Gritsenko A. A., Stern-Ginossar N., Yakhini Z., Weinberger A. & Segal E.
(2016)
Science.
351,
6270,
aad4939.
To investigate gene specificity at the level of translation in both the human genome and viruses, we devised a high-throughput bicistronic assay to quantify cap-independent translation. We uncovered thousands of novel cap-independent translation sequences, and we provide insights on the landscape of translational regulation in both humans and viruses. We find extensive translational elements in the 3' untranslated region of human transcripts and the polyprotein region of uncapped RNA viruses. Through the characterization of regulatory elements underlying cap-independent translation activity, we identify potential mechanisms of secondary structure, short sequence motif, and base pairing with the 18S ribosomal RNA (rRNA). Furthermore, we systematically map the 18S rRNA regions for which reverse complementarity enhances translation. Thus, we make available insights into the mechanisms of translational control in humans and viruses.
Zmora N., Zeevi D., Korem T., Segal E. & Elinav E.
(2016)
Cell Host & Microbe.
19,
1,
p. 12-20
The genomic revolution enabled the clinical inclusion of an immense body of person-specific information to an extent that is revolutionizing medicine and science. The gut microbiome, our "second genome," dynamically integrates signals from the host and its environment, impacting health and risk of disease. Herein, we summarize how individualized characterization of the microbiome composition and function may assist in personalized diagnostic assessment, risk stratification, disease prevention, treatment decision-making, and patients' follow-up. We further discuss the limitations, pitfalls, and challenges that the microbiome field faces in integrating patient-specific microbial data into the clinical realm. Finally, we highlight how recent insights into personalized modulation of the microbiome, by nutritional and pre-, pro-, and post-biotic intervention, may lead to development of individualized approaches that may enable us to harness the microbiome as a central precision medicine target.
Levy M., Thaiss C. A., Zeevi D., Dohnalova L., Zilberman-Schapira G., Mahdi J. A., David E., Savidor A., Korem T., Herzig Y., Pevsner-Fischer M., Shapiro H., Christ A., Harmelin A., Halpern Z., Latz E., Flavell R. A., Amit I., Segal E. & Elinav E.
(2015)
Cell.
163,
6,
p. 1428-1443
Host-microbiome co-evolution drives homeostasis and disease susceptibility, yet regulatory principles governing the integrated intestinal host-commensal microenvironment remain obscure. While inflammasome signaling participates in these interactions, its activators and microbiome-modulating mechanisms are unknown. Here, we demonstrate that the microbiota-associated metabolites taurine, histamine, and spermine shape the host-microbiome interface by co-modulating NLRP6 inflammasome signaling, epithelial IL-18 secretion, and downstream anti-microbial peptide (AMP) profiles. Distortion of this balanced AMP landscape by inflammasome deficiency drives dysbiosis development. Upon fecal transfer, colitis-inducing microbiota hijacks this microenvironment-orchestrating machinery through metabolite-mediated inflammasome suppression, leading to distorted AMP balance favoring its preferential colonization. Restoration of the metabolite-inflammasome-AMP axis reinstates a normal microbiota and ameliorates colitis. Together, we identify microbial modulators of the NLRP6 inflammasome and highlight mechanisms by which microbiome-host interactions cooperatively drive microbial community stability through metabolite-mediated innate immune modulation. Therefore, targeted "postbiotic" metabolomic intervention may restore a normal microenvironment as treatment or prevention of dysbiosis-driven diseases.
Zeevi D., Korem T., Zmora N., Israeli D., Rothschild D., Weinberger A., Ben-Yacov O., Lador D., Avnit Sagi S. T., Lotan-Pompan M., Suez J., Mahdi J. A., Matot E., Malka G., Kosower N., Rein M., Zilberman-Schapira G., Dohnalova L., Pevsner-Fischer M., Bikovsky R., Halpern Z., Elinav E. & Segal E.
(2015)
Cell.
163,
5,
p. 1079-1094
Elevated postprandial blood glucose levels constitute a global epidemic and a major risk factor for prediabetes and type II diabetes, but existing dietary methods for controlling them have limited efficacy. Here, we continuously monitored week-long glucose levels in an 800-person cohort, measured responses to 46,898 meals, and found high variability in the response to identical meals, suggesting that universal dietary recommendations may have limited utility. We devised a machine-learning algorithm that integrates blood parameters, dietary habits, anthropometrics, physical activity, and gut microbiota measured in this cohort and showed that it accurately predicts personalized postprandial glycemic response to real-life meals. We validated these predictions in an independent 100-person cohort. Finally, a blinded randomized controlled dietary intervention based on this algorithm resulted in significantly lower postprandial responses and consistent alterations to gut microbiota configuration. Together, our results suggest that personalized diets may successfully modify elevated postprandial blood glucose and its metabolic consequences.
Keren L., Van Dijk D. D., Weingarten-Gabbay S., Davidi D., Jona G., Weinberger A., Milo R. & Segal E.
(2015)
Genome Research.
25,
p. 1893-1902
Genetically identical cells exposed to the same environment display variability in gene expression (noise), with important consequences for the fidelity of cellular regulation and biological function. Although population average gene expression is tightly coupled to growth rate, the effects of changes in environmental conditions on expression variability are not known. Here, we measure the single-cell expression distributions of approximately 900 Saccharomyces cerevisiae promoters across four environmental conditions using flow cytometry, and find that gene expression noise is tightly coupled to the environment and is generally higher at lower growth rates. Nutrient-poor conditions, which support lower growth rates, display elevated levels of noise for most promoters, regardless of their specific expression values. We present a simple model of noise in expression that results from having an asynchronous population, with cells at different cell-cycle stages, and with different partitioning of the cells between the stages at different growth rates. This model predicts non-monotonic global changes in noise at different growth rates as well as overall higher variability in expression for cell-cycle-regulated genes in all conditions. The consistency between this model and our data, as well as with noise measurements of cells growing in a chemostat at well-defined growth rates, suggests that cell-cycle heterogeneity is a major contributor to gene expression noise. Finally, we identify gene and promoter features that play a role in gene expression noise across conditions. Our results show the existence of growth-related global changes in gene expression noise and suggest their potential phenotypic implications.
Korem T., Zeevi D., Suez J., Weinberger A., Avnit Sagi S. T., Pompan-Lotan M., Matot E., Jona G., Harmelin A., Cohen N., Sirota-Madi A., Thaiss C. A., Pevsner-Fischer M., Sorek R., Xavier R. J., Elinav E. & Segal E.
(2015)
Science.
349,
6252,
p. 1101-1106
Metagenomic sequencing increased our understanding of the role of the microbiome in health and disease, yet it only provides a snapshot of a highly dynamic ecosystem. Here, we show that the pattern of metagenomic sequencing read coverage for different microbial genomes contains a single trough and a single peak, the latter coinciding with the bacterial origin of replication. Furthermore, the ratio of sequencing coverage between the peak and trough provides a quantitative measure of a species' growth rate. We demonstrate this in vitro and in vivo, under different growth conditions, and in complex bacterial communities. For several bacterial species, peak-to-trough coverage ratios, but not relative abundances, correlated with the manifestation of inflammatory bowel disease and type II diabetes.
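The peak-to-trough idea in the abstract above can be illustrated with a minimal sketch. It assumes an idealized coverage vector along a single circular chromosome and simply divides the highest smoothed coverage by the lowest. The helper name `peak_to_trough_ratio` and the moving-average window are hypothetical illustration choices, not the published pipeline, which works from real metagenomic read alignments and involves further processing steps not shown here.

```python
import numpy as np


def peak_to_trough_ratio(coverage, window=101):
    """Ratio of highest to lowest smoothed read coverage along a circular genome.

    Hypothetical helper for illustration only. Smoothing is a moving average
    that wraps around the chromosome ends, because bacterial chromosomes are circular.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    cov = np.asarray(coverage, dtype=float)
    half = window // 2
    # Pad each end with bases from the opposite end so every position has a full window.
    padded = np.pad(cov, half, mode="wrap")
    smoothed = np.convolve(padded, np.ones(window) / window, mode="valid")
    trough = smoothed.min()
    if trough <= 0:
        raise ValueError("coverage must be positive everywhere after smoothing")
    # Peak ~ replication origin, trough ~ terminus; a larger ratio suggests faster replication.
    return smoothed.max() / trough


# Synthetic example: coverage falls linearly from 2x at the origin (position 0)
# to 1x at the terminus halfway around the chromosome.
n = 1000
positions = np.arange(n)
distance_from_origin = np.minimum(positions, n - positions)
coverage = 2.0 - distance_from_origin / (n / 2)

print(round(peak_to_trough_ratio(coverage, window=1), 3))  # 2.0
print(round(peak_to_trough_ratio(np.full(n, 30.0)), 3))    # 1.0
```

With `window=1` no smoothing is applied and the ratio recovers the 2x/1x construction exactly; in this idealized model, uniform coverage (no ongoing replication) gives a ratio of 1. Smoothing is included only because real coverage is noisy, and it slightly flattens the peak and trough.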
Lubliner S., Regev I., Lotan-Pompan M., Edelheit S., Weinberger A. & Segal E.
(2015)
Genome Research.
25,
7,
p. 1008-1017
The core promoter is the regulatory sequence to which RNA polymerase is recruited and where it acts to initiate transcription. Here, we present the first comprehensive study of yeast core promoters, providing massively parallel measurements of core promoter activity and of transcription start site (TSS) locations and relative usage for thousands of native and designed sequences. We found core promoter activity to be highly correlated to the activity of the entire promoter and that sequence variation in different core promoter regions substantially tunes its activity in a predictable way. We also show that location, orientation, and flanking bases critically affect TATA element function, that transcription initiation in highly active core promoters is focused within a narrow region, that poly(dA:dT) orientation has a functional consequence at the 3' end of promoters, and that orthologous core promoters across yeast species have conserved activities. Our results demonstrate the importance of core promoters in the quantitative study of gene regulation.
Manor O. & Segal E.
(2015)
Bioinformatics.
31,
11,
p. 1848-1850
Summary: Understanding the effect of single nucleotide polymorphisms (SNPs) on the expression level of genes is an important goal. We recently published a study in which we devised a multi-SNP predictive model for gene expression in lymphoblastoid cell lines (LCL), and showed that it can robustly predict the expression of a small number of genes in test individuals. Here, we validate the generality of our models by predicting expression profiles for genes in LCL in an independent study, and extend the pool of predictable genes for which we are able to explain more than 25% of their expression variability to 232 genes across 14 different cell types. As the number of people who obtained their SNP profiles through companies such as 23andMe is rising rapidly, we developed GenoExp, a web-based tool in which users can upload their individual SNP data and obtain predicted expression levels for the set of predictable genes across the 14 different cell types. Our tool thus allows users with biological knowledge to study the possible effects that their set of SNPs might have on these genes and predict their cell-specific expression levels relative to the population average. Availability and implementation: GenoExp is freely available at http://genie.weizmann.ac.il/pubs/GenoExp/.
Thaiss C. A., Zeevi D., Levy M., Segal E. & Elinav E.
(2015)
Gut Microbes.
6,
2,
p. 137-142
Life on Earth is dictated by circadian changes in the environment, caused by the planet's rotation around its own axis. All forms of life have evolved clock systems to adapt their physiology to the daily variations in geophysical parameters. The intestinal microbiome serves as a signaling hub in the communication between the host and its environment. We recently discovered that the microbiota undergoes diurnal oscillations in composition and function, and that these oscillations are required for metabolic homeostasis of the host. Here, we highlight these findings from the perspectives of microbial system stability and metaorganismal metabolic health. We also discuss the contribution of nutrition and biotic interventions on diurnal processes of the microbiota and their potential involvement in diseases commonly associated with circadian disruption.
Suez J., Korem T., Zilberman-Schapira G., Segal E. & Elinav E.
(2015)
Gut Microbes.
6,
2,
p. 149-155
Non-caloric artificial sweeteners (NAS) are common food supplements consumed by millions worldwide as means of combating weight gain and diabetes, by retaining sweet taste without increasing caloric intake. While they are considered safe, there is increasing controversy regarding their potential ability to promote metabolic derangements in some humans. We recently demonstrated that NAS consumption could induce glucose intolerance in mice and distinct human subsets, by functionally altering the gut microbiome. In this commentary, we discuss these findings in the context of previous and recent works demonstrating the effects of NAS on host health and the microbiome, and the challenges and open questions that need to be addressed in understanding the effects of NAS consumption on human health.
Shalem O., Sharon E., Lubliner S., Regev I., Lotan-Pompan M., Yakhini Z. & Segal E.
(2015)
PLoS Genetics.
11,
4,
e1005147.
The 3' end genomic region encodes a wide range of regulatory processes, including mRNA stability, 3' end processing and translation. Here, we systematically investigate the sequence determinants of 3' end-mediated expression control by measuring the effect of 13,000 designed 3' end sequence variants on constitutive expression levels in yeast. By including a high-resolution scanning mutagenesis of more than 200 native 3' end sequences in this designed set, we found that most mutations had only a mild effect on expression, and that the vast majority (~90%) of strongly affecting mutations localized to a single positive TA-rich element, similar to a previously described 3' end processing efficiency element, and resulted in up to ten-fold decrease in expression. Measurements of 3' UTR lengths revealed that these mutations result in mRNAs with aberrantly long 3' UTRs, confirming the role for this element in 3' end processing. Interestingly, we found that other sequence elements that were previously described in the literature to be part of the polyadenylation signal had a minor effect on expression. We further characterize the sequence specificities of the TA-rich element using additional synthetic 3' end sequences and show that its activity is sensitive to single base pair mutations and strongly depends on the A/T content of the surrounding sequences. Finally, using a computational model, we show that the strength of this element in native 3' end sequences can explain some of their measured expression variability (R = 0.41). Together, our results emphasize the importance of efficient 3' end processing for endogenous protein levels and contribute to an improved understanding of the sequence elements involved in this process.
Levo M., Zalckvar E., Sharon E., Machado A. C. D., Kalma Y., Lotam-Pompan M., Weinberger A., Yakhini Z., Rohs R. & Segal E.
(2015)
Genome Research.
25,
7,
p. 1018-1029
Binding of transcription factors (TFs) to regulatory sequences is a pivotal step in the control of gene expression. Despite many advances in the characterization of sequence motifs recognized by TFs, our ability to quantitatively predict TF binding to different regulatory sequences is still limited. Here, we present a novel experimental assay termed BunDLE-seq that provides quantitative measurements of TF binding to thousands of fully designed sequences of 200 bp in length within a single experiment. Applying this binding assay to two yeast TFs, we demonstrate that sequences outside the core TF binding site profoundly affect TF binding. We show that TF-specific models based on the sequence or DNA shape of the regions flanking the core binding site are highly predictive of the measured differential TF binding. We further characterize the dependence of TF binding, accounting for measurements of single and co-occurring binding events, on the number and location of binding sites and on the TF concentration. Finally, by coupling our in vitro TF binding measurements, and another application of our method probing nucleosome formation, to in vivo expression measurements carried out with the same template sequences serving as promoters, we offer insights into mechanisms that may determine the different expression outcomes observed. Our assay thus paves the way to a more comprehensive understanding of TF binding to regulatory sequences and allows the characterization of TF binding determinants within and outside of core binding sites.
Suez J., Korem T., Zeevi D., Zilberman-Schapira G., Thaiss C. A., Maza O., Israeli D., Zmora N., Gilad S., Weinberger A., Kuperman Y., Harmelin A., Kolodkin-Gal I., Shapiro H., Halpern Z., Segal E. & Elinav E.
(2015)
Obstetrical & Gynecological Survey.
70,
1,
p. 31-32
Noncaloric artificial sweeteners (NASs) are popular because of their low caloric content and perceived health benefits for weight loss and normalization of blood sugar levels. Artificial sweeteners have been increasingly introduced as an additive into common foods as an alternative to high-caloric sugars. However, increased consumption has coincided with a dramatic worldwide increase in the obesity and diabetes epidemics. Scientific data supporting the safety and benefits of NAS consumption are sparse and controversial. Most NASs are not digested in the gastrointestinal tract and directly encounter the intestinal microbiota. Diet modulates microbiota composition and function in the healthy/lean state as well as in obesity and diabetes mellitus. Intestinal dysbiosis has been associated with propensity to metabolic syndrome. The investigators studied NAS-mediated changes in microbiota composition and function in mice to determine whether chronic NAS consumption exacerbates glucose intolerance. Formulations of saccharin, sucralose, or aspartame were added to the drinking water of mice. The data provide conclusive proof that NAS-mediated intestinal dysbiosis is directly responsible for the development of glucose intolerance in mice. Treating mice with antibiotics eradicated many intestinal bacteria and fully reversed the artificial sweeteners' effects on glucose metabolism. Transfer of fecal microbiota from mice that consumed artificial sweeteners to “germ-free” mice resulted in complete transmission of the glucose intolerance to the recipient mice. Transferring microbiota that had been incubated anaerobically with artificial sweeteners also induced glucose intolerance in germ-free recipient mice. Profound changes in the population of intestinal bacteria have been linked to host susceptibility to obesity, diabetes, and other metabolic diseases in both mice and humans.
Similar NAS-induced dysbiosis and glucose intolerance were demonstrated in healthy human subjects. These findings show that NAS consumption in both mice and humans increases the risk of glucose intolerance through adverse metabolic effects mediated by intestinal dysbiosis. The data suggest that the widespread use of NAS should be reassessed.
Weingarten-Gabbay S. & Segal E.
(2014)
Nature Genetics.
46,
12,
p. 1253-1254
A new study detects unstable nascent RNAs and uncovers thousands of transcription initiation sites in promoters and enhancers. Detailed analysis shows that these initiation sites have a similar architecture and that they are differentiated by post-transcriptional regulation rather than transcription initiation.
Zeevi D., Lubliner S., Lotan-Pompan M., Hodis E., Vesterman R., Weinberger A. & Segal E.
(2014)
Genome Research.
24,
12,
p. 1991-1999
Recent studies have shown a surprising phenomenon, whereby orthologous regulatory regions from different species drive similar expression levels despite being highly diverged in sequence. Here, we investigated this phenomenon by genomically integrating hundreds of ribosomal protein (RP) promoters from nine different yeast species into S. cerevisiae and accurately measuring their activity. We found that orthologous RP promoters have extreme expression conservation even across evolutionarily distinct yeast species. Notably, our measurements reveal two distinct mechanisms that underlie this conservation and which act in different regions of the promoter. In the core promoter region, we found compensatory changes, whereby effects of sequence variations in one part of the core promoter were reversed by variations in another part. In contrast, we observed robustness in Rap1 transcription factor binding sites, whereby significant sequence variations had little effect on promoter activity. Finally, cases in which orthologous promoter activities were not conserved could largely be explained by the sequence variation within the core promoter. Together, our results provide novel insights into the mechanisms by which expression is conserved throughout evolution across diverged promoter sequences.
Thaiss C. A., Zeevi D., Levy M., Zilberman-Schapira G., Suez J., Tengeler A. C., Abramson L., Katz M. N., Korem T., Zmora N., Kuperman Y., Biton I., Gilad S., Harmelin A., Shapiro H., Halpern Z., Segal E. & Elinav E.
(2014)
Cell.
159,
3,
p. 514-529
All domains of life feature diverse molecular clock machineries that synchronize physiological processes to diurnal environmental fluctuations. However, no mechanisms are known to cross-regulate prokaryotic and eukaryotic circadian rhythms in multikingdom ecosystems. Here, we show that the intestinal microbiota, in both mice and humans, exhibits diurnal oscillations that are influenced by feeding rhythms, leading to time-specific compositional and functional profiles over the course of a day. Ablation of host molecular clock components or induction of jet lag leads to aberrant microbiota diurnal fluctuations and dysbiosis, driven by impaired feeding rhythmicity. Consequently, jet-lag-induced dysbiosis in both mice and humans promotes glucose intolerance and obesity that are transferrable to germ-free mice upon fecal transplantation. Together, these findings provide evidence of coordinated metaorganism diurnal rhythmicity and offer a microbiome-dependent mechanism for common metabolic disturbances in humans with aberrant circadian rhythms, such as those documented in shift workers and frequent flyers.
Suez J., Korem T., Zeevi D., Zilberman-Schapira G., Thaiss C. A., Maza O., Israeli D., Zmora N., Gilad S., Weinberger A., Kuperman Y., Harmelin A., Kolodkin-Gal I., Shapiro H., Halpern Z., Segal E. & Elinav E.
(2014)
Nature.
514,
7521,
p. 181-186
Non-caloric artificial sweeteners (NAS) are among the most widely used food additives worldwide, regularly consumed by lean and obese individuals alike. NAS consumption is considered safe and beneficial owing to their low caloric content, yet supporting scientific data remain sparse and controversial. Here we demonstrate that consumption of commonly used NAS formulations drives the development of glucose intolerance through induction of compositional and functional alterations to the intestinal microbiota. These NAS-mediated deleterious metabolic effects are abrogated by antibiotic treatment, and are fully transferrable to germ-free mice upon faecal transplantation of microbiota configurations from NAS-consuming mice, or of microbiota anaerobically incubated in the presence of NAS. We identify NAS-altered microbial metabolic pathways that are linked to host susceptibility to metabolic disease, and demonstrate similar NAS-induced dysbiosis and glucose intolerance in healthy human subjects. Collectively, our results link NAS consumption, dysbiosis and metabolic abnormalities, thereby calling for a reassessment of massive NAS usage.
Pilosof S., Fortuna M. A., Cosson J. F., Galan M., Kittipong C., Ribas A., Segal E., Krasnov B. R., Morand S. & Bascompte J.
(2014)
Nature Communications.
5,
5172.
Genes of the major histocompatibility complex (MHC) encode proteins that recognize foreign antigens and are thus crucial for immune response. In a population of a single host species, parasite-mediated selection drives MHC allelic diversity. However, in a community-wide context, species interactions may modulate selection regimes because the prevalence of a given parasite in a given host may depend on its prevalence in other hosts. By combining network analysis with immunogenetics, we show that host species infected by similar parasites harbour similar alleles with similar frequencies. We further show, using a Bayesian approach, that the probability of mutual occurrence of a functional allele and a parasite in a given host individual is nonrandom and depends on other host-parasite interactions, driving co-evolution within subgroups of parasite species and functional alleles. Therefore, indirect effects among hosts and parasites can shape host MHC diversity, scaling it from the population to the community level.
Sharon E., van Dijk D., Kalma Y., Keren L., Manor O., Yakhini Z. & Segal E.
(2014)
Genome Research.
24,
10,
p. 1698-1706
Genetically identical cells exhibit large variability (noise) in gene expression, with important consequences for cellular function. Although the amount of noise decreases with, and is thus partly determined by, the mean expression level, the extent to which different promoter sequences can deviate from this trend is not fully known. Here, we present a high-throughput method for measuring promoter-driven noise for thousands of designed synthetic promoters in parallel. We use it to investigate how promoters encode different noise levels and find that the noise levels of promoters with similar mean expression levels can vary by more than an order of magnitude, with nucleosome-disfavoring sequences resulting in lower noise and more transcription factor binding sites resulting in higher noise. We propose a kinetic model of gene expression that takes into account nonspecific DNA binding and one-dimensional sliding along the DNA, which occur when transcription factors search for their target sites. We show that this assumption can improve the prediction of the mean-independent component of expression noise for our designed promoter sequences, suggesting that the transcription factor target search may affect gene expression noise. Consistent with our findings in designed promoters, we find that binding-site multiplicity in native promoters is associated with higher expression noise. Overall, our results demonstrate that small changes in promoter DNA sequence can tune noise levels in a manner that is predictable and partly decoupled from effects on the mean expression levels. These insights may assist in designing promoters with desired noise levels.
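The abstract refers to a mean-independent component of noise, separate from the general decline of noise with mean expression. One simple way to obtain such a component, sketched here under our own assumptions, is to regress log CV² on log mean across promoters and keep the residuals. This is not the kinetic model proposed in the paper, and the function names are hypothetical.

```python
import math
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mean_independent_noise(means, cv2s):
    """Hypothetical helper: residual log(CV^2) per promoter after removing
    the log-log trend of noise versus mean expression."""
    xs = [math.log(m) for m in means]
    ys = [math.log(c) for c in cv2s]
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Toy data (invented): CV^2 ~ 1/mean, except the third promoter is 10x noisier.
residuals = mean_independent_noise([100, 200, 400, 800], [0.01, 0.005, 0.025, 0.00125])
```

A positive residual marks a promoter that is noisier than expected for its mean; in the toy input above, the third promoter has the largest residual.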
Levo M. & Segal E.
(2014)
Nature Reviews Genetics.
15,
7,
p. 453-468
Instructions for when, where and to what level each gene should be expressed are encoded within regulatory sequences. The importance of motifs recognized by DNA-binding regulators has long been known, but their extensive characterization afforded by recent technologies only partly accounts for how regulatory instructions are encoded in the genome. Here, we review recent advances in our understanding of regulatory sequences that influence transcription and go beyond the description of motifs. We discuss how understanding different aspects of the sequence-encoded regulation can help to unravel the genotype-phenotype relationship, which would lead to a more accurate and mechanistic interpretation of personal genome sequences.
Weingarten-Gabbay S. & Segal E.
(2014)
Human Genetics.
133,
6,
p. 701-711
Eukaryotes employ combinatorial strategies to generate a variety of expression patterns from a relatively small set of regulatory DNA elements. As in any other language, deciphering the mapping between DNA and expression requires an understanding of the set of rules that govern basic principles in transcriptional regulation, the functional elements involved, and the ways in which they combine to orchestrate a transcriptional output. Here, we review the current understanding of various grammatical rules, including the effect on expression of the number of transcription factor binding sites, their location, orientation, affinity and activity; co-association with different factors; and intrinsic nucleosome organization. We review different methods that are used to study the grammar of transcription regulation, highlight gaps in current understanding, and discuss how recent technological advances may be utilized to bridge them.
Wan Y., Qu K., Zhang Q. C., Flynn R. A., Manor O., Ouyang Z., Zhang J., Spitale R. C., Snyder M. P., Segal E. & Chang H. Y.
(2014)
Nature.
505,
7485,
p. 706-709
In parallel to the genetic code for protein synthesis, a second layer of information is embedded in all RNA transcripts in the form of RNA structure. RNA structure influences practically every step in the gene expression program. However, the nature of most RNA structures or effects of sequence variation on structure are not known. Here we report the initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child). This provides a comprehensive RSS map of human coding and non-coding RNAs. We identify unique RSS signatures that demarcate open reading frames and splicing junctions, and define authentic microRNA-binding sites. Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS information is encoded within RNA sequence. Over 1,900 transcribed single nucleotide variants (approximately 15% of all transcribed single nucleotide variants) alter local RNA structure. We discover simple sequence and spacing rules that determine the ability of point mutations to impact RSSs. Selective depletion of 'riboSNitches' versus structurally synonymous variants at precise locations suggests selection for specific RNA shapes at thousands of sites, including 3' untranslated regions, binding sites of microRNAs and RNA-binding proteins genome-wide. These results highlight the potentially broad contribution of RNA structure and its variation to gene regulation.
Keren L. & Segal E.
(2013)
Genome Biology.
14,
11,
138.
A new study exploits the time-dependence of formaldehyde cross-linking in the commonly used chromatin immunoprecipitation (ChIP) assay to infer the on and off rates for site-specific chromatin interactions.
Meyer P., Siwo G., Zeevi D., Sharon E., Norel R., Segal E., Stolovitzky G., Rider A. K., Tan A., Pinapati R. S., Emrich S., Chawla N., Ferdig M. T., Tung Y. A., Chen Y. S., Chen M. J. M., Chen C. Y., Knight J. M., Sahraeian S. M. E. & Esfahani M. S.
(2013)
Genome Research.
23,
11,
p. 1928-1937
The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.
Keren L., Zackay O., Lotan-Pompan M., Barenholz U., Dekel E., Sasson V., Aidelberg G., Bren A., Zeevi D., Weinberger A., Alon U., Milo R. & Segal E.
(2013)
Molecular Systems Biology.
9,
701.
Most genes change expression levels across conditions, but it is unclear which of these changes represents specific regulation and what determines their quantitative degree. Here, we accurately measured activities of ∼900 S. cerevisiae and ∼1800 E. coli promoters using fluorescent reporters. We show that in both organisms 60-90% of promoters change their expression between conditions by a constant global scaling factor that depends only on the conditions and not on the promoter's identity. Quantifying such global effects allows precise characterization of specific regulation - promoters deviating from the global scale line. These are organized into few functionally related groups that also adhere to scale lines and preserve their relative activities across conditions. Thus, only a few scaling factors suffice to accurately describe genome-wide expression profiles across conditions. We present a parameter-free passive resource allocation model that quantitatively accounts for the global scaling factors. It suggests that many changes in expression across conditions result from global effects and not specific regulation, and provides means for quantitative interpretation of expression profiles.
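A minimal sketch of the global scale line idea, assuming expression is given as promoter-to-level dictionaries for two conditions: the condition-wide factor is estimated as the median log2 ratio across promoters, and promoters that deviate from it by more than a chosen cutoff are flagged as candidates for specific regulation. The median estimator, the 1.0 log2 cutoff and the function names are illustrative assumptions, not the paper's resource allocation model.

```python
import math
from statistics import median


def global_scale(cond1, cond2):
    """Condition-wide scaling factor: 2 ** (median log2 ratio across promoters)."""
    log_ratios = [math.log2(cond2[p] / cond1[p]) for p in cond1]
    return 2 ** median(log_ratios)


def deviating_promoters(cond1, cond2, max_log2_dev=1.0):
    """Promoters whose change departs from the global scale line by more than
    max_log2_dev (log2 units); the cutoff is a hypothetical choice."""
    s = global_scale(cond1, cond2)
    return sorted(
        p for p in cond1
        if abs(math.log2(cond2[p] / (s * cond1[p]))) > max_log2_dev
    )


# Toy data (invented): four promoters halve, one rises four-fold.
c1 = {"A": 100, "B": 50, "C": 20, "D": 80, "E": 10}
c2 = {"A": 50, "B": 25, "C": 10, "D": 40, "E": 40}
print(global_scale(c1, c2), deviating_promoters(c1, c2))
```

A robust estimator such as the median keeps the few specifically regulated promoters from pulling the global factor away from the majority.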
Zalckvar E., Paulus C., Tillo D., Asbach-Nitzsche A., Lubling Y., Winterling C., Strieder N., Mücke K., Goodrum F., Segal E. & Nevels M.
(2013)
Proceedings of the National Academy of Sciences of the United States of America.
110,
32,
p. 13126-13131
Human CMV (hCMV) establishes lifelong infections in most of us, causing developmental defects in human embryos and life-threatening disease in immunocompromised individuals. During productive infection, the viral >230,000-bp dsDNA genome is expressed widely and in a temporal cascade. The hCMV genome does not carry histones when encapsidated but has been proposed to form nucleosomes after release into the host cell nucleus. Here, we present hCMV genome-wide nucleosome occupancy and nascent transcript maps during infection of permissive human primary cells. We show that nucleosomes occupy nuclear viral DNA in a nonrandom and highly predictable fashion. At early times of infection, nucleosomes associate with the hCMV genome largely according to their intrinsic DNA sequence preferences, indicating that initial nucleosome formation is genetically encoded in the virus. However, as infection proceeds to the late phase, nucleosomes redistribute extensively to establish patterns mostly determined by nongenetic factors. We propose that these factors include key regulators of viral gene expression encoded at the hCMV major immediate-early (IE) locus. Indeed, mutant virus genomes deficient for IE1 expression exhibit globally increased nucleosome loads and reduced nucleosome dynamics compared with WT genomes. The temporal nucleosome occupancy differences between IE1-deficient and WT viruses correlate inversely with changes in the pattern of viral nascent and total transcript accumulation. These results provide a framework of spatial and temporal nucleosome organization across the genome of a major human pathogen and suggest that an hCMV major IE protein governs overall viral chromatin structure and function.
Manor O. & Segal E.
(2013)
PLoS Computational Biology.
9,
8,
e1003200.
Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a "black box" in order to promote changes in life-style and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minor allele frequency (MAF), suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
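The core of a bootstrap-based SNP prioritization can be sketched as follows: resample individuals with replacement, score every SNP in each resample, and order SNPs by their average rank. This is a hypothetical illustration, not BootRank itself; the absolute difference in mean allele dosage between cases and controls stands in for the p-value used in the paper, and the mean-rank aggregation rule is our assumption.

```python
import random
from statistics import mean


def association_score(genotypes, labels, j):
    """Hypothetical stand-in for an association test: absolute difference in
    mean allele dosage (0/1/2) of SNP j between cases (1) and controls (0)."""
    cases = [g[j] for g, y in zip(genotypes, labels) if y == 1]
    controls = [g[j] for g, y in zip(genotypes, labels) if y == 0]
    return abs(mean(cases) - mean(controls))


def bootstrap_rank(genotypes, labels, n_boot=100, seed=0):
    """Return SNP indices ordered by mean rank across bootstrap resamples."""
    rng = random.Random(seed)
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    rank_sums = [0.0] * n_snp
    used = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n_ind) for _ in range(n_ind)]
        g = [genotypes[i] for i in idx]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # a resample needs both cases and controls
            continue
        scores = [association_score(g, y, j) for j in range(n_snp)]
        # Rank 1 = strongest association; ties keep the lower SNP index first.
        order = sorted(range(n_snp), key=lambda j: -scores[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
        used += 1
    if used == 0:
        raise ValueError("no usable bootstrap resamples")
    mean_rank = [s / used for s in rank_sums]
    return sorted(range(n_snp), key=lambda j: mean_rank[j])


# Toy data (invented): SNP 0 perfectly separates cases from controls.
labels = [i % 2 for i in range(40)]
genotypes = [[2 * y, i % 3, (i // 2) % 3] for i, y in enumerate(labels)]
print(bootstrap_rank(genotypes, labels, n_boot=50, seed=1))
```

Averaging ranks over resamples rewards SNPs that stay near the top regardless of which individuals are sampled, which is the robustness property the abstract emphasizes.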
Dvir S., Velten L., Sharon E., Zeevi D., Carey L. B., Weinberger A. & Segal E.
(2013)
Proceedings of the National Academy of Sciences of the United States of America.
110,
30,
p. E2792-E2801
The 5'-untranslated region (5'-UTR) of mRNAs contains elements that affect expression, yet the rules by which these regions exert their effect are poorly understood. Here, we studied the impact of 5'-UTR sequences on protein levels in yeast, by constructing a large-scale library of mutants that differ only in the 10 bp preceding the translational start site of a fluorescent reporter. Using a high-throughput sequencing strategy, we obtained highly accurate measurements of protein abundance for over 2,000 unique sequence variants. The resulting pool spanned an approximately sevenfold range of protein levels, demonstrating the powerful consequences of sequence manipulations of even 1-10 nucleotides immediately upstream of the start codon. We devised computational models that predicted over 70% of the measured expression variability in held-out sequence variants. Notably, a combined model of the most prominent features successfully explained protein abundance in an additional, independently constructed library, whose nucleotide composition differed greatly from the library used to parameterize the model. Our analysis reveals the dominant contribution of the start codon context at positions -3 to -1, mRNA secondary structure, and out-of-frame upstream AUGs (uAUGs) to phenotypic diversity, thereby advancing our understanding of how protein levels are modulated by 5'-UTR sequences, and paving the way toward predictably tuning protein expression through manipulations of 5'-UTRs.
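Two of the sequence features named above are easy to extract directly from a 5'-UTR: the start codon context at positions -3 to -1 and the positions of upstream AUGs, split by whether they are out of frame with the main start codon. The sketch below is a hypothetical helper, assuming the input UTR ends immediately before the main AUG; it does not model mRNA secondary structure.

```python
def utr_features(utr):
    """Hypothetical helper: sequence features of a 5'-UTR that ends immediately
    before the main start codon. Accepts DNA (T) or RNA (U) letters."""
    seq = utr.upper().replace("T", "U")
    n = len(seq)  # the main AUG starts at index n
    context = seq[-3:]  # positions -3 to -1 relative to the main AUG
    uaugs = [i for i in range(n - 2) if seq[i:i + 3] == "AUG"]
    # An upstream AUG at index i is in frame with the main AUG
    # exactly when the distance n - i is a multiple of 3.
    out_of_frame = [i for i in uaugs if (n - i) % 3 != 0]
    return {"context": context, "uAUGs": uaugs, "out_of_frame_uAUGs": out_of_frame}


print(utr_features("CCATGCCAAA"))  # one uAUG at index 2, out of frame
print(utr_features("AUGCCCAAA"))   # one uAUG at index 0, in frame
```

Feature dictionaries of this kind could then be fed to any regression model of protein abundance.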
Dadiani M., van Dijk D., Segal B., Field Y., Ben Artzi A. G., Raveh-Sadka T., Levo M., Kaplow I., Weinberger A. & Segal E.
(2013)
Genome Research.
23,
6,
p. 966-976
Individual cells from a genetically identical population exhibit substantial variation in gene expression. A significant part of this variation is due to noise in the process of transcription that is intrinsic to each gene, and is determined by factors such as the rate with which the promoter transitions between transcriptionally active and inactive states, and the number of transcripts produced during the active state. However, we have a limited understanding of how the DNA sequence affects such promoter dynamics. Here, we used single-cell time-lapse microscopy to compare the effect on transcriptional dynamics of two distinct types of sequence changes in the promoter that can each increase the mean expression of a cell population by similar amounts but through different mechanisms. We show that increasing expression by strengthening a transcription factor binding site results in slower promoter dynamics and higher noise as compared with increasing expression by adding nucleosome-disfavoring sequences. Our results suggest that when achieving the same mean expression, the strategy of using stronger binding sites results in a larger number of transcripts produced from the active state, whereas the strategy of adding nucleosome-disfavoring sequences results in a higher frequency of promoter transitions between active and inactive states. In the latter strategy, this increased sampling of the active state likely reduces the expression variability of the cell population. Our study thus demonstrates the effect of cis-regulatory elements on expression variability and points to concrete types of sequence changes that may allow partial decoupling of expression level and noise.
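The contrast between producing more transcripts per active state and switching into the active state more often can be made concrete with a standard result for bursty expression: if bursts arrive with frequency f (per mRNA lifetime) and have mean size b, the steady-state mean is f*b and the variance is f*b*(1 + b), so CV² = 1/mean + 1/f. The sketch below assumes this negative-binomial approximation rather than the model used in the study, and shows that at equal mean a larger burst size gives higher noise.

```python
def bursty_noise(mean_expr, burst_size):
    """CV^2 under an assumed negative-binomial (bursty) expression model.

    mean = burst_frequency * burst_size
    variance = mean * (1 + burst_size)
    so CV^2 = 1/mean + 1/burst_frequency.
    """
    burst_frequency = mean_expr / burst_size
    variance = mean_expr * (1 + burst_size)
    return {"burst_frequency": burst_frequency, "cv2": variance / mean_expr ** 2}


# Same mean (10 molecules), reached through large rare bursts or small frequent ones.
large_bursts = bursty_noise(10, 5)  # e.g. a stronger binding site
small_bursts = bursty_noise(10, 1)  # e.g. more frequent promoter activation
print(large_bursts["cv2"], small_bursts["cv2"])
```

Only the 1/f term differs between the two strategies, which is why raising burst frequency lowers noise at a fixed mean in this model.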
Lubliner S., Keren L. & Segal E.
(2013)
Nucleic Acids Research.
41,
11,
p. 5569-5581
The core promoter is the region in which RNA polymerase II is recruited to the DNA and acts to initiate transcription, but the extent to which the core promoter sequence determines promoter activity levels is largely unknown. Here, we identified several base content and k-mer sequence features of the yeast core promoter sequence that are highly predictive of maximal promoter activity. These features are mainly located in the region 75 bp upstream and 50 bp downstream of the main transcription start site, and their associations hold for both constitutively active promoters and promoters that are induced or repressed in specific conditions. Our results unravel several architectural features of yeast core promoters and suggest that the yeast core promoter sequence downstream of the TATA box (or of similar sequences involved in recruitment of the pre-initiation complex) is a major determinant of maximal promoter activity. We further show that human core promoters also contain features that are indicative of maximal promoter activity; thus, our results emphasize the important role of the core promoter sequence in transcriptional regulation.
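Base-content and k-mer features of the kind described above can be computed over the window from 75 bp upstream to 50 bp downstream of the main transcription start site. The sketch below only generates candidate features; it does not select the predictive ones. The choice k = 3, the convention that `tss` is the 0-based index of the +1 position, and the function name are assumptions.

```python
from collections import Counter
from itertools import product


def core_promoter_features(seq, tss, upstream=75, downstream=50, k=3):
    """Hypothetical helper: base fractions and k-mer counts in the window
    seq[tss - upstream : tss + downstream], where tss is the 0-based +1 position."""
    window = seq[max(0, tss - upstream): tss + downstream].upper()
    if not window:
        raise ValueError("empty core promoter window")
    counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    features = {f"{b}_frac": window.count(b) / len(window) for b in "ACGT"}
    for kmer in map("".join, product("ACGT", repeat=k)):
        features[kmer] = counts.get(kmer, 0)  # all 4**k k-mers, zero if absent
    return features


# Toy sequence (invented): A-rich upstream, G at the TSS, C-rich downstream.
feats = core_promoter_features("A" * 100 + "G" + "C" * 99, tss=100)
print(feats["A_frac"], feats["AAA"], feats["CCC"])
```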
Carey L. B., van Dijk D., Sloot P. M., Kaandorp J. A. & Segal E.
(2013)
PLoS Biology.
11,
4,
e1001528.
The ability of cells to accurately control gene expression levels in response to extracellular cues is limited by the inherently stochastic nature of transcriptional regulation. A change in transcription factor (TF) activity results in changes in the expression of its targets, but the way in which cell-to-cell variability in expression (noise) changes as a function of TF activity, and whether targets of the same TF behave similarly, is not known. Here, we measure expression and noise as a function of TF activity for 16 native targets of the transcription factor Zap1 that are regulated by it through diverse mechanisms. For most activated and repressed Zap1 targets, noise decreases as expression increases. Kinetic modeling suggests that this is due to two distinct Zap1-mediated mechanisms that both change the frequency of transcriptional bursts. Notably, we found that another mechanism of repression by Zap1, which is encoded in the promoter DNA, likely decreases the size of transcriptional bursts, producing a unique transcriptional state characterized by low expression and low noise. In addition, we find that further reduction in noise is achieved when a single TF both activates and represses a single target gene. Our results suggest a global principle whereby at low TF concentrations, the dominant source of differences in expression between promoters stems from differences in burst frequency, whereas at high TF concentrations differences in burst size dominate. Taken together, we show that the precise amount by which noise changes with expression is specific to the regulatory mechanism of transcription and translation that acts at each gene.
Struhl K. & Segal E.
(2013)
Nature Structural & Molecular Biology.
20,
3,
p. 267-273
Nucleosome positioning is critical for gene expression and most DNA-related processes. Here we review the dominant patterns of nucleosome positioning that have been observed and summarize the current understanding of their underlying determinants. The genome-wide pattern of nucleosome positioning is determined by the combination of DNA sequence, ATP-dependent nucleosome remodeling enzymes and transcription factors that include activators, components of the preinitiation complex and elongating RNA polymerase II. These determinants influence each other such that the resulting nucleosome positioning patterns are likely to differ among genes and among cells in a population, with consequent effects on gene expression.
Shalem O., Carey L., Zeevi D., Sharon E., Keren L., Weinberger A., Dahan O., Pilpel Y. & Segal E.
(2013)
PLoS Computational Biology.
9,
3,
e1002934.
A full understanding of gene regulation requires an understanding of the contributions that the various regulatory regions have on gene expression. Although it is well established that sequences downstream of the main promoter can affect expression, our understanding of the scale of this effect and how it is encoded in the DNA is limited. Here, to measure the effect of native S. cerevisiae 3' end sequences on expression, we constructed a library of 85 fluorescent reporter strains that differ only in their 3' end region. Notably, despite being driven by the same strong promoter, our library spans a continuous twelve-fold range of expression values. These measurements correlate with endogenous mRNA levels, suggesting that the 3' end contributes to constitutive differences in mRNA levels. We used deep sequencing to map the 3' UTR ends of our strains and show that determination of polyadenylation sites is intrinsic to the local 3' end sequence. Sequence analysis of the mapped polyadenylation sites revealed that increased A/T content upstream of the main polyadenylation site correlates with higher expression, both in the library and genome-wide, suggesting that native genes differ by the encoded efficiency of 3' end processing. Finally, we use single-cell fluorescence measurements at different promoter activation levels to show that 3' end sequences modulate protein expression dynamics differently than promoters, by predominantly affecting the size of protein production bursts as opposed to the frequency at which these bursts occur. Altogether, our results lead to a more complete understanding of gene regulation by demonstrating that 3' end regions have a unique and sequence-dependent effect on gene expression.
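The A/T content upstream of a mapped polyadenylation site, which the abstract correlates with expression, can be measured as the A/T fraction in a fixed window ending at that site. In this sketch the 30 bp window, the 0-based site convention and the function name are assumptions chosen for illustration.

```python
def upstream_at_content(seq, polya_site, window=30):
    """Hypothetical helper: fraction of A or T bases in the `window` bases
    immediately upstream of the 0-based polyadenylation site."""
    region = seq[max(0, polya_site - window): polya_site].upper()
    if not region:
        raise ValueError("no sequence upstream of the polyadenylation site")
    return sum(base in "AT" for base in region) / len(region)


# Toy 3' end (invented): G/C-rich, then an A/T-rich stretch ending at index 70.
seq = "GC" * 20 + "AT" * 15 + "GGGG"
print(upstream_at_content(seq, 70), upstream_at_content(seq, 40))
```

Computing this value per strain and correlating it with measured fluorescence would mirror, in a simplified form, the library-wide comparison described in the abstract.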
Manor O. & Segal E.
(2013)
PLoS Genetics.
9,
3,
e1003396.
Many genetic variants that are significantly correlated to gene expression changes across human individuals have been identified, but the ability of these variants to predict expression of unseen individuals has rarely been evaluated. Here, we devise an algorithm that, given training expression and genotype data for a set of individuals, predicts the expression of genes of unseen test individuals given only their genotype in the local genomic vicinity of the predicted gene. Notably, the resulting predictions are remarkably robust in that they agree well between the training and test sets, even when the training and test sets consist of individuals from distinct populations. Thus, although the overall number of genes that can be predicted is relatively small, as expected from our choice to ignore effects such as environmental factors and trans sequence variation, the robust nature of the predictions means that the identity and quantitative degree to which genes can be predicted is known in advance. We also present an extension that incorporates heterogeneous types of genomic annotations to differentially weigh the importance of the various genetic variants, and we show that assigning higher weights to variants with particular annotations such as proximity to genes and high regional G/C content can further improve the predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted from their cis genetic variation.
Wan Y., Qu K., Ouyang Z., Kertesz M., Li J., Tibshirani R., Makino D. L., Nutter R. C., Segal E. & Chang H. Y.
(2012)
Molecular Cell.
48,
2,
p. 169-181
RNA structural transitions are important in the function and regulation of RNAs. Here, we reveal a layer of transcriptome organization in the form of RNA folding energies. By probing yeast RNA structures at different temperatures, we obtained relative melting temperatures (Tm) for RNA structures in over 4000 transcripts. Specific signatures of RNA Tm demarcated the polarity of mRNA open reading frames and highlighted numerous candidate regulatory RNA motifs in 3' untranslated regions. RNA Tm distinguished noncoding versus coding RNAs and identified mRNAs with distinct cellular functions. We identified thousands of putative RNA thermometers, and their presence is predictive of the pattern of RNA decay in vivo during heat shock. The exosome complex recognizes unpaired bases during heat shock to degrade these RNAs, coupling intrinsic structural stabilities to gene regulation. Thus, genome-wide structural dynamics of RNA can parse functional elements of the transcriptome and reveal diverse biological insights.
Raveh-Sadka T., Levo M., Shabi U., Shany B., Keren L., Lotan-Pompan M., Zeevi D., Sharon E., Weinberger A. & Segal E.
(2012)
Nature Genetics.
44,
7,
p. 743-750
Understanding how precise control of gene expression is specified within regulatory DNA sequences is a key challenge with far-reaching implications. Many studies have focused on the regulatory role of transcription factor-binding sites. Here, we explore the transcriptional effects of a different class of elements, nucleosome-disfavoring sequences, and specifically poly(dA:dT) tracts, which are highly prevalent in eukaryotic promoters. By measuring promoter activity for a large-scale promoter library, designed with systematic manipulations to the properties and spatial arrangement of poly(dA:dT) tracts, we show that these tracts significantly and causally affect transcription. We show that manipulating these elements offers a general genetic mechanism, applicable to promoters regulated by different transcription factors, for tuning expression in a predictable manner, with resolution that can be even finer than that attained by altering transcription factor sites. Overall, our results advance the understanding of the regulatory code and suggest a potential mechanism by which promoters yielding prespecified expression patterns can be designed.
Sharon E., Kalma Y., Sharp A., Raveh-Sadka T., Levo M., Zeevi D., Keren L., Yakhini Z., Weinberger A. & Segal E.
(2012)
Nature Biotechnology.
30,
6,
p. 521-+
Despite extensive research, our understanding of the rules according to which cis-regulatory sequences are converted into gene expression is limited. We devised a method for obtaining parallel, highly accurate gene expression measurements from thousands of designed promoters and applied it to measure the effect of systematic changes in the location, number, orientation, affinity and organization of transcription-factor binding sites and nucleosome-disfavoring sequences. Our analyses reveal a clear relationship between expression and binding-site multiplicity, as well as transcription-factor-specific dependencies of expression on the distance between transcription-factor binding sites and gene starts, including a striking ~10-bp periodic relationship between gene expression and binding-site location. We show how this approach can measure transcription-factor sequence specificities and the sensitivity of transcription-factor sites to the surrounding sequence context, and compare the activity of 75 yeast transcription factors. Our method can be used to study both cis and trans effects of genotype on transcriptional, post-transcriptional and translational control.
Reizel Y., Itzkovitz S., Adar R., Elbaz J., Jinich A., Chapal Ilani I. N., Maruvka Y. E., Nevo N., Marx Z., Horovitz I., Wasserstrom A., Mayo A., Shur I., Benayahu D., Skorecki K., Segal E., Dekel N. & Shapiro E.
(2012)
PLoS Genetics.
8,
2,
e1002477.
Fundamental aspects of embryonic and post-natal development, including maintenance of the mammalian female germline, are largely unknown. Here we employ a retrospective, phylogenetic-based method for reconstructing cell lineage trees utilizing somatic mutations accumulated in microsatellites, to study female germline dynamics in mice. Reconstructed cell lineage trees can be used to estimate lineage relationships between different cell types, as well as cell depth (number of cell divisions since the zygote). We show that, in the reconstructed mouse cell lineage trees, oocytes form clusters that are separate from hematopoietic and mesenchymal stem cells, both in young and old mice, indicating that these populations belong to distinct lineages. Furthermore, while cumulus cells sampled from different ovarian follicles are distinctly clustered on the reconstructed trees, oocytes from the left and right ovaries are not, suggesting a mixing of their progenitor pools. We also observed an increase in oocyte depth with mouse age, which can be explained either by depth-guided selection of oocytes for ovulation or by post-natal renewal. Overall, our study sheds light on substantial novel aspects of female germline preservation and development.
Cayrou C., Coulombe P., Puy A., Rialle S., Kaplan N., Segal E. & Mechali M.
(2012)
Cell Cycle.
11,
4,
p. 658-667
We recently reported the identification and characterization of DNA replication origins (Oris) in metazoan cell lines. Here, we describe additional bioinformatic analyses showing that the previously identified GC-rich sequence elements form origin G-rich repeated elements (OGREs) that are present in 67% to 90% of the DNA replication origins from Drosophila to human cells, respectively. Our analyses also show that initiation of DNA synthesis takes place precisely at 160 bp (Drosophila) and 280 bp (mouse) from the OGRE. We also found that in most CpG islands, an OGRE is positioned in opposite orientation on each of the two DNA strands and detected two sites of initiation of DNA synthesis upstream or downstream of each OGRE. Conversely, Oris not associated with CpG islands have a single initiation site. OGRE density along chromosomes correlated with previously published replication timing data. Ori sequences centered on the OGRE are also predicted to have high intrinsic nucleosome occupancy. Finally, OGREs predict G-quadruplex structures at Oris that might be structural elements controlling the choice or activation of replication origins.
Zeevi D., Sharon E., Lotan-Pompan M., Lubling Y., Shipony Z., Raveh-Sadka T., Keren L., Levo M., Weinberger A. & Segal E.
(2011)
Genome Research.
21,
12,
p. 2114-2128
Coordinate regulation of ribosomal protein (RP) genes is key for controlling cell growth. In yeast, it is unclear how this regulation achieves the required equimolar amounts of the different RP components, given that some RP genes exist in duplicate copies, while others have only one copy. Here, we tested whether the solution to this challenge is partly encoded within the DNA sequence of the RP promoters, by fusing 110 different RP promoters to a fluorescent gene reporter, allowing us to robustly detect differences in their promoter activities that are as small as ∼10%. We found that single-copy RP promoters have significantly higher activities, suggesting that proper RP stoichiometry is indeed partly encoded within the RP promoters. Notably, we also partially uncovered how this regulation is encoded by finding that RP promoters with higher activity have more nucleosome-disfavoring sequences and characteristic spatial organizations of these sequences and of binding sites for key RP regulators. Mutations in these elements result in a significant decrease of RP promoter activity. Thus, our results suggest that intrinsic (DNA-dependent) nucleosome organization may be a key mechanism by which genomes encode biologically meaningful promoter activities. Our approach can readily be applied to uncover how transcriptional programs of other promoters are encoded.
Wan Y., Kertesz M., Spitale R. C., Segal E. & Chang H. Y.
(2011)
Nature Reviews Genetics.
12,
9,
p. 641-655
RNA structure is crucial for gene regulation and function. In the past, transcriptomes have largely been parsed by primary sequences and expression levels, but it is now becoming feasible to annotate and compare transcriptomes based on RNA structure. In addition to computational prediction methods, the recent advent of experimental techniques to probe RNA structure by high-throughput sequencing has enabled genome-wide measurements of RNA structure and has provided the first picture of the structural organization of a eukaryotic transcriptome - the 'RNA structurome'. With additional advances in method refinement and interpretation, structural views of the transcriptome should help to identify and validate regulatory RNA motifs that are involved in diverse cellular processes and thereby increase understanding of RNA function.
Rabani M., Kertesz M. & Segal E.
(2011)
RNA Detection and Visualization: Methods and Protocols.
p. 467-479
mRNA molecules are tightly regulated, mostly through interactions with proteins and other RNAs, but the mechanisms that confer the specificity of such interactions are poorly understood. It is clear, however, that this specificity is determined by both the nucleotide sequence and secondary structure of the mRNA. We developed RNApromo, an efficient computational tool for identifying structural elements within mRNAs that are involved in specifying post-transcriptional regulation. Using RNApromo, we predicted putative motifs in sets of mRNAs with substantial experimental evidence for common post-transcriptional regulation, including mRNAs with similar decay rates, mRNAs that are bound by the same RNA binding protein, and mRNAs with a common cellular localization. Our new RNA motif discovery tool reveals unexplored layers of post-transcriptional regulation in groups of RNAs, and is therefore an important step toward a better understanding of the regulatory information conveyed within RNA molecules.
Ercan S., Lubling Y., Segal E. & Lieb J. D.
(2011)
Genome Research.
21,
2,
p. 237-244
We mapped nucleosome occupancy by paired-end Illumina sequencing in C. elegans embryonic cells, adult somatic cells, and a mix of adult somatic and germ cells. In all three samples, the nucleosome occupancy of gene promoters on the X chromosome differed from that of autosomal promoters. While both X and autosomal promoters exhibit a typical nucleosome-depleted region upstream of transcript start sites and a well-positioned +1 nucleosome, X-linked gene promoters on average exhibit higher nucleosome occupancy relative to autosomal promoters. We show that the difference between X and autosomes does not depend on the somatic dosage compensation machinery. Instead, the chromatin difference at promoters is partly encoded by DNA sequence, because a model trained on nucleosome sequence preferences from S. cerevisiae in vitro data recapitulates nearly completely the experimentally observed difference between X and autosomal promoters. The model predictions also correlate very well with experimentally determined occupancy values genome-wide. The nucleosome occupancy differences observed on X promoters may bear on mechanisms of X chromosome dosage compensation in the soma, and chromosome-wide repression of X in the germline.
Field Y., Sharon E. & Segal E.
(2011)
Handbook of Transcription Factors.
p. 193-204
(Subcellular Biochemistry).
Binding of transcription factors to functional sites is a fundamental step in transcriptional regulation. In this chapter, we discuss how transcription factors are thought to achieve specificity to their functional targets, despite their typically low concentrations and degenerate binding specificities, and the fact that in large genomes their functional binding sites must compete with their widespread alternative binding sites. We highlight the importance of the chromatin structure context of the binding sites in this process, and its dependency on the genomic DNA sequence.
Gerstein M. B., Lu Z. J., Van Nostrand E. L., Cheng C., Arshinoff B. I., Liu T., Yip K. Y., Robilotto R., Rechtsteiner A., Ikegami K., Alves P., Chateigner A., Perry M., Morris M., Auerbach R. K., Feng X., Leng J., Vielle A., Niu W., Rhrissorrakrai K., Agarwal A., Alexander R. P., Barber G., Brdlik C. M., Brennan J., Brouillet J. J., Carr A., Cheung M. S., Clawson H., Contrino S., Dannenberg L. O., Dernburg A. F., Desai A., Dick L., Dosé A. C., Du J., Egelhofer T., Ercan S., Euskirchen G., Ewing B., Feingold E. A., Gassmann R., Good P. J., Green P., Gullier F., Gutwein M., Guyer M. S., Habegger L., Han T., Henikoff J. G., Henz S. R., Hinrichs A., Holster H., Hyman T., Iniguez A. L., Janette J., Jensen M., Kato M., Kent W. J., Kephart E., Khivansara V., Khurana E., Kim J. K., Kolasinska-Zwierz P., Lai E. C., Latorre I., Leahey A., Lewis S., Lloyd P., Lochovsky L., Lowdon R. F., Lubling Y., Lyne R., MacCoss M., Mackowiak S. D., Mangone M., McKay S., Mecenas D., Merrihew G., Miller D. M., Muroyama A., Murray J. I., Ooi S. L., Pham H., Phippen T., Preston E. A., Rajewsky N., Rätsch G., Rosenbaum H., Rozowsky J., Rutherford K., Ruzanov P., Sarov M., Sasidharan R., Sboner A., Scheid P., Segal E., Shin H., Shou C., Slack F. J., Slightam C., Smith R., Spencer W. C., Stinson E. O., Taing S., Takasaki T., Vafeados D., Voronina K., Wang G., Washington N. L., Whittle C. M., Wu B., Yan K. K., Zeller G., Zha Z., Zhong M., Zhou X., Ahringer J., Strome S., Gunsalus K. C., Micklem G., Liu X. S., Reinke V., Kim S. K., Hillier L. W., Henikoff S., Piano F., Snyder M., Stein L., Lieb J. D. & Waterston R. H.
(2010)
Science.
330,
6012,
p. 1775-1787
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Kenigsberg E., Bar A., Segal E. & Tanay A.
(2010)
PLoS Computational Biology.
6,
12,
e1001039.
Evolution maintains organismal fitness by preserving genomic information. This is widely assumed to involve conservation of specific genomic loci among species. Many genomic encodings are now recognized to integrate small contributions from multiple genomic positions into quantitative dispersed codes, but the evolutionary dynamics of such codes are still poorly understood. Here we show that in yeast, sequences that quantitatively affect nucleosome occupancy evolve under compensatory dynamics that maintain heterogeneous levels of A+T content through spatially coupled A/T-losing and A/T-gaining substitutions. Evolutionary modeling combined with data on yeast polymorphisms supports the idea that these substitution dynamics are a consequence of weak selection. This shows that compensatory evolution, so far believed to affect specific groups of epistatically linked loci like paired RNA bases, is a widespread phenomenon in the yeast genome, affecting the majority of its intergenic sequences. The model thus derived suggests that compensation is inevitable when evolution conserves quantitative and dispersed genomic functions.
Kaplan N., Hughes T. R., Lieb J. D., Widom J. & Segal E.
(2010)
Genome Biology.
11,
11,
140.
We propose definitions and procedures for comparing nucleosome maps and discuss current agreement and disagreement on the effect of histone sequence preferences on nucleosome organization in vivo.
Itzkovitz S., Hodis E. & Segal E.
(2010)
Genome Research.
20,
11,
p. 1582-1589
Genomes encode multiple signals, raising the question of how these different codes are organized along the linear genome sequence. Within protein-coding regions, the redundancy of the genetic code can, in principle, allow for the overlapping encoding of signals in addition to the amino acid sequence, but it is not known to what extent genomes exploit this potential and, if so, for what purpose. Here, we systematically explore whether protein-coding regions accommodate overlapping codes, by comparing the number of occurrences of each possible short sequence within the protein-coding regions of over 700 species from viruses to plants, to the same number in randomizations that preserve amino acid sequence and codon bias. We find that coding regions across all phyla encode additional information, with bacteria carrying more information than eukaryotes. The detailed signals consist of both known and potentially novel codes, including position-dependent secondary RNA structure, bacteria-specific depletion of transcription and translation initiation signals, and eukaryote-specific enrichment of microRNA target sites. Our results suggest that genomes may have evolved to encode extensive overlapping information within protein-coding regions.
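The null model in this abstract, randomizations that preserve amino acid sequence and codon bias, can be sketched by permuting synonymous codons among the positions of a gene that encode the same amino acid, which keeps both the protein and the gene's exact codon usage fixed. The permutation scheme, the z-score summary and all function names are assumptions for illustration, not necessarily the procedure used in the paper; the codon table is passed in so the example can use a small hypothetical subset.

```python
import random
from collections import defaultdict


def shuffle_synonymous_codons(codons, aa_of, rng):
    """Permute codons among positions encoding the same amino acid.

    The protein sequence and the gene's codon counts are unchanged; only the
    order of synonymous codons is randomized.
    """
    positions = defaultdict(list)
    for i, codon in enumerate(codons):
        positions[aa_of[codon]].append(i)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return shuffled


def kmer_count(seq, kmer):
    """Count (possibly overlapping) occurrences of `kmer` in `seq`."""
    k = len(kmer)
    return sum(seq[i:i + k] == kmer for i in range(len(seq) - k + 1))


def kmer_zscore(codons, aa_of, kmer, n_rand=200, seed=0):
    """Z-score of the observed k-mer count against codon-shuffled genes."""
    rng = random.Random(seed)
    observed = kmer_count("".join(codons), kmer)
    null = [kmer_count("".join(shuffle_synonymous_codons(codons, aa_of, rng)), kmer)
            for _ in range(n_rand)]
    mean = sum(null) / n_rand
    var = sum((x - mean) ** 2 for x in null) / (n_rand - 1)
    return (observed - mean) / var ** 0.5 if var > 0 else 0.0


# Hypothetical partial codon table and toy gene, for illustration only.
aa_of = {"GCT": "A", "GCC": "A", "GCA": "A", "AAA": "K", "AAG": "K"}
gene = ["GCT", "AAA", "GCC", "AAG", "GCA", "AAA"]
print(kmer_zscore(gene, aa_of, "GCT"))
```

A real analysis would use the full standard genetic code and aggregate counts over all coding regions of a species before comparing observed and randomized totals.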
Nili E. L., Field Y., Lubling Y., Widom J., Oren M. & Segal E.
(2010)
Genome Research.
20,
10,
p. 1361-1368
The human transcription factor TP53 is a pivotal roadblock against cancer. A key unresolved question is how the p53 protein selects its genomic binding sites in vivo out of a large pool of potential consensus sites. We hypothesized that chromatin may play a significant role in this site-selection process. To test this, we used a custom DNA microarray to measure p53 binding at approximately 2000 sites predicted to possess high-sequence specificity, and identified both strongly bound and weakly bound sites. When placed within a plasmid, weakly bound sites become p53 responsive and regain p53 binding when stably integrated into random genomic locations. Notably, strongly bound sites reside preferentially within genomic regions whose DNA sequence is predicted to encode relatively high intrinsic nucleosome occupancy. Using in vivo nucleosome occupancy measurements under conditions where p53 is inactive, we experimentally confirmed this prediction. Furthermore, upon p53 activation, nucleosomes are partially displaced from a relatively broad region surrounding the bound p53 sites, and this displacement is rapidly reversed upon inactivation of p53. Thus, in contrast to the general assumption that transcription-factor binding is preferred in sites that have low nucleosome occupancy prior to factor activation, we find that p53 binding occurs preferentially within a chromatin context of high intrinsic nucleosome occupancy.
Kertesz M., Wan Y., Mazor E., Rinn J. L., Nutter R. C., Chang H. Y. & Segal E.
(2010)
Nature.
467,
7311,
p. 103-107
The structures of RNA molecules are often important for their function and regulation (refs. 1-6), yet there are no experimental techniques for genome-scale measurement of RNA structure. Here we describe a novel strategy termed parallel analysis of RNA structure (PARS), which is based on deep sequencing fragments of RNAs that were treated with structure-specific enzymes, thus providing simultaneous in vitro profiling of the secondary structure of thousands of RNA species at single nucleotide resolution. We apply PARS to profile the secondary structure of the messenger RNAs (mRNAs) of the budding yeast Saccharomyces cerevisiae and obtain structural profiles for over 3,000 distinct transcripts. Analysis of these profiles reveals several RNA structural properties of yeast transcripts, including the existence of more secondary structure over coding regions compared with untranslated regions, a three-nucleotide periodicity of secondary structure across coding regions and an anti-correlation between the efficiency with which an mRNA is translated and the structure over its translation start site. PARS is readily applicable to other organisms and to profiling RNA structure in diverse conditions, thus enabling studies of the dynamics of secondary structure at a genomic scale.
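As a hedged illustration of how PARS-style read counts might be turned into per-nucleotide scores, the sketch below takes the log2 ratio of RNase V1 (double-strand-specific) to nuclease S1 (single-strand-specific) cleavage counts, then averages scores by codon position to expose a three-nucleotide periodicity. The library-size scaling, the pseudocount and the function names are assumptions of this sketch, not the published processing pipeline.

```python
import math


def pars_scores(v1_counts, s1_counts, pseudocount=1.0):
    """Per-nucleotide log2(V1/S1) after scaling V1 to the S1 library size.

    Positive scores suggest paired (double-stranded) bases and negative scores
    suggest unpaired bases. Scaling and pseudocount are illustrative choices.
    """
    if len(v1_counts) != len(s1_counts):
        raise ValueError("V1 and S1 profiles must have the same length")
    scale = (sum(s1_counts) or 1) / (sum(v1_counts) or 1)
    return [math.log2((v * scale + pseudocount) / (s + pseudocount))
            for v, s in zip(v1_counts, s1_counts)]


def mean_by_codon_position(scores, cds_start, cds_end):
    """Average scores at codon positions 1, 2 and 3 within [cds_start, cds_end)."""
    frames = [[], [], []]
    for i in range(cds_start, cds_end):
        frames[(i - cds_start) % 3].append(scores[i])
    return [sum(f) / len(f) if f else float("nan") for f in frames]


# Toy counts (not real data).
scores = pars_scores([4, 0, 2, 2], [0, 4, 2, 2])
print([round(s, 3) for s in scores])
print(mean_by_codon_position([1, 2, 3, 1, 2, 3], 0, 6))  # [1.0, 2.0, 3.0]
```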
Tsai M., Manor O., Wan Y., Mosammaparast N., Wang J. K., Lan F., Shi Y., Segal E. & Chang H. Y.
(2010)
Science.
329,
5992,
p. 689-693
Long intergenic noncoding RNAs (lincRNAs) regulate chromatin states and epigenetic inheritance. Here, we show that the lincRNA HOTAIR serves as a scaffold for at least two distinct histone modification complexes. A 5′ domain of HOTAIR binds polycomb repressive complex 2 (PRC2), whereas a 3′ domain of HOTAIR binds the LSD1/CoREST/REST complex. The ability to tether two distinct complexes enables RNA-mediated assembly of PRC2 and LSD1 and coordinates targeting of PRC2 and LSD1 to chromatin for coupled histone H3 lysine 27 methylation and lysine 4 demethylation. Our results suggest that lincRNAs may serve as scaffolds by providing binding surfaces to assemble select histone modification enzymes, thereby specifying the pattern of histone modifications on target genes.
Kaplan N., Moore I., Fondufe-Mittendorf Y., Gossett A. J., Tillo D., Field Y., Hughes T. R., Lieb J. D., Widom J. & Segal E.
(2010)
Nature Structural & Molecular Biology.
17,
8,
p. 918-920
Tillo D., Kaplan N., Moore I. K., Fondufe-Mittendorf Y., Gossett A. J., Field Y., Lieb J. D., Widom J., Segal E. & Hughes T. R.
(2010)
PLoS ONE.
5,
2,
e9129.
Active eukaryotic regulatory sites are characterized by open chromatin, and yeast promoters and transcription factor binding sites (TFBSs) typically have low intrinsic nucleosome occupancy. Here, we show that in contrast to yeast, DNA at human promoters, enhancers, and TFBSs generally encodes high intrinsic nucleosome occupancy. In most cases we examined, these elements also have high experimentally measured nucleosome occupancy in vivo. These regions typically have high G+C content, which correlates positively with intrinsic nucleosome occupancy, and are depleted for nucleosome-excluding poly-A sequences. We propose that high nucleosome preference is directly encoded at regulatory sequences in the human genome to restrict access to regulatory information that will ultimately be utilized in only a subset of differentiated cells.
Basu A., Rose K. L., Zhang J., Beavis R. C., Ueberheide B., Garcia B. A., Chait B., Zhao Y., Hunt D. F., Segal E., Allis C. D. & Hake S. B.
(2009)
Proceedings of the National Academy of Sciences of the United States of America.
106,
33,
p. 13785-13790
Acetylation is a well-studied posttranslational modification that has been associated with a broad spectrum of biological processes, notably gene regulation. Many studies have contributed to our knowledge of the enzymology underlying acetylation, including efforts to understand the molecular mechanism of substrate recognition by several acetyltransferases, but traditional experiments to determine intrinsic features of substrate site specificity have proven challenging. Here, we combine experimental methods with clustering analysis of protein sequences to predict protein acetylation based on the sequence characteristics of acetylated lysines within histones, using our prediction tool PredMod. We define a local amino acid sequence composition that represents potential acetylation sites by implementing a clustering analysis of histone and nonhistone sequences. We show that this sequence composition has predictive power on two independent experimental datasets of acetylation marks. Finally, we detect acetylation for selected putative substrates using mass spectrometry, and report several nonhistone acetylated substrates in budding yeast. Our approach, combined with more traditional experimental methods, may be useful for identifying acetylated substrates proteome-wide.
Segal E. & Widom J.
(2009)
Trends in Genetics.
25,
8,
p. 335-343
The DNA of eukaryotic genomes is wrapped in nucleosomes, which strongly distort and occlude the DNA from access to most DNA-binding proteins. An understanding of the mechanisms that control nucleosome positioning along the DNA is thus essential to understanding the binding and action of proteins that carry out essential genetic functions. New genome-wide data on in vivo and in vitro nucleosome positioning greatly advance our understanding of several factors that can influence nucleosome positioning, including DNA sequence preferences, DNA methylation, histone variants and post-translational modifications, higher order chromatin structure, and the actions of transcription factors, chromatin remodelers and other DNA-binding proteins. We discuss how these factors function and ways in which they might be integrated into a unified framework that accounts for both the preservation of nucleosome positioning and the dynamic nucleosome repositioning that occur across biological conditions, cell types, developmental processes and disease.
Raveh-Sadka T., Levo M. & Segal E.
(2009)
Genome Research.
19,
8,
p. 1480-1496
Transcriptional control is central to many cellular processes, and, consequently, much effort has been devoted to understanding its underlying mechanisms. The organization of nucleosomes along promoter regions is important for this process, since most transcription factors cannot bind nucleosomal sequences and thus compete with nucleosomes for DNA access. This competition is governed by the relative concentrations of nucleosomes and transcription factors and by their respective sequence binding preferences. However, despite its importance, a mechanistic understanding of the quantitative effects that the competition between nucleosomes and factors has on transcription is still missing. Here we use a thermodynamic framework based on fundamental principles of statistical mechanics to explore theoretically the effect that different nucleosome organizations along promoters have on the activation dynamics of promoters in response to varying concentrations of the regulating factors. We show that even simple landscapes of nucleosome organization reproduce experimental results regarding the effect of nucleosomes as general repressors and as generators of obligate binding cooperativity between factors. Our modeling framework also allows us to characterize the effects that various sequence elements of promoters have on the induction threshold and on the shape of the promoter activation curves. Finally, we show that using only sequence preferences for nucleosomes and transcription factors, our model can also predict expression behavior of real promoter sequences, thereby underscoring the importance of the interplay between nucleosomes and factors in determining expression kinetics.
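A toy version of this thermodynamic reasoning can make the obligate-cooperativity effect concrete. The sketch assumes a hypothetical promoter with two transcription-factor sites that are both covered by a single nucleosome position; each configuration gets a statistical weight, and occupancy is the weighted fraction of configurations. Parameter names and values are illustrative and are not taken from the paper's model.

```python
import math


def occupancy_two_sites(tf_conc, k_tf=1.0, k_nuc=0.0):
    """Probability that at least one of two TF sites is bound.

    Configurations and their statistical weights (x = tf_conc * k_tf):
      empty promoter             -> 1
      nucleosome (covers both)   -> k_nuc
      one TF bound (either site) -> 2 * x
      both TFs bound             -> x ** 2  (no direct TF-TF interaction)
    """
    x = tf_conc * k_tf
    bound = 2 * x + x ** 2
    return bound / (1.0 + k_nuc + bound)


def midpoint_conc(k_tf=1.0, k_nuc=0.0):
    """TF concentration giving 50% occupancy: 2x + x^2 = 1 + k_nuc."""
    return (math.sqrt(2.0 + k_nuc) - 1.0) / k_tf


def local_hill(tf_conc, k_nuc, k_tf=1.0, eps=1e-4):
    """Numerical slope of log(occupancy odds) versus log(concentration)."""
    def log_odds(c):
        p = occupancy_two_sites(c, k_tf=k_tf, k_nuc=k_nuc)
        return math.log(p / (1.0 - p))
    lo, hi = tf_conc * (1 - eps), tf_conc * (1 + eps)
    return (log_odds(hi) - log_odds(lo)) / (math.log(hi) - math.log(lo))


c_free, c_nuc = midpoint_conc(k_nuc=0.0), midpoint_conc(k_nuc=100.0)
print(round(c_free, 3), round(c_nuc, 3))  # threshold shifts upward
print(round(local_hill(c_free, 0.0), 2), round(local_hill(c_nuc, 100.0), 2))  # about 1.17 and 1.82
```

Raising `k_nuc` both shifts the half-occupancy concentration upward (the nucleosome acts as a general repressor) and steepens the response toward a Hill coefficient of 2, because the doubly bound state, with weight x², is what most effectively outcompetes the nucleosome.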
Segal E. & Widom J.
(2009)
Nature Reviews Genetics.
10,
7,
p. 443-456
Complex transcriptional behaviours are encoded in the DNA sequences of gene regulatory regions. Advances in our understanding of these behaviours have been recently gained through quantitative models that describe how molecules such as transcription factors and nucleosomes interact with genomic sequences. An emerging view is that every regulatory sequence is associated with a unique binding affinity landscape for each molecule and, consequently, with a unique set of molecule-binding configurations and transcriptional outputs. We present a quantitative framework based on existing methods that unifies these ideas. This framework explains many experimental observations regarding the binding patterns of factors and nucleosomes and the dynamics of transcriptional activation. It can also be used to model more complex phenomena such as transcriptional noise and the evolution of transcriptional regulation.
Field Y., Fondufe-Mittendorf Y., Moore I. K., Mieczkowski P., Kaplan N., Lubling Y., Lieb J. D., Widom J. & Segal E.
(2009)
Nature Genetics.
41,
4,
p. 438-445
Eukaryotic transcription occurs within a chromatin environment, whose organization has an important regulatory function and is partly encoded in cis by the DNA sequence itself. Here, we examine whether evolutionary changes in gene expression are linked to changes in the DNA-encoded nucleosome organization of promoters. We find that in aerobic yeast species, where cellular respiration genes are active under typical growth conditions, the promoter sequences of these genes encode a relatively open (nucleosome-depleted) chromatin organization. This nucleosome-depleted organization requires only DNA sequence information, is independent of any cofactors and of transcription, and is a general property of growth-related genes. In contrast, in anaerobic yeast species, where cellular respiration genes are relatively inactive under typical growth conditions, respiration gene promoters encode relatively closed (nucleosome-occupied) chromatin organizations. Our results suggest a previously unidentified genetic mechanism underlying phenotypic diversity, consisting of DNA sequence changes that directly alter the DNA-encoded nucleosome organization of promoters.
Kaplan N., Moore I. K., Fondufe-Mittendorf Y., Gossett A. J., Tillo D., Field Y., LeProust E. M., Hughes T. R., Lieb J. D., Widom J. & Segal E.
(2009)
Nature.
458,
7236,
p. 362-366
Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for ∼40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.
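Turning per-position nucleosome sequence preferences into occupancy is commonly done with a thermodynamic model that sums over all non-overlapping nucleosome placements. The sketch below is a generic forward-backward version of that idea, assuming the statistical weight of a nucleosome at each start position has already been derived from some sequence model; it is not the authors' code, and the 147-bp footprint default and function name are assumptions of the sketch. Nucleosomes here interact only through steric exclusion.

```python
def nucleosome_occupancy(weights, nuc_len=147):
    """Per-base nucleosome occupancy under a non-overlapping placement model.

    weights[i] is a (hypothetical) statistical weight for a nucleosome starting
    at base i, so len(weights) == sequence_length - nuc_len + 1. Each set of
    non-overlapping nucleosomes contributes the product of its weights.
    """
    n_starts = len(weights)
    n = n_starts + nuc_len - 1  # sequence length
    # fwd[i]: summed weight of all configurations confined to bases [0, i)
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        fwd[i] = fwd[i - 1]
        if i >= nuc_len:
            fwd[i] += fwd[i - nuc_len] * weights[i - nuc_len]
    # bwd[i]: summed weight of all configurations confined to bases [i, n)
    bwd = [0.0] * (n + 1)
    bwd[n] = 1.0
    for i in range(n - 1, -1, -1):
        bwd[i] = bwd[i + 1]
        if i + nuc_len <= n:
            bwd[i] += weights[i] * bwd[i + nuc_len]
    z = fwd[n]  # partition function
    # Probability that a nucleosome starts at each allowed position
    start_prob = [fwd[i] * weights[i] * bwd[i + nuc_len] / z for i in range(n_starts)]
    # Occupancy of base j = total probability of nucleosomes covering j
    occupancy = [0.0] * n
    for i, p in enumerate(start_prob):
        for j in range(i, i + nuc_len):
            occupancy[j] += p
    return occupancy


# Two overlapping candidate positions with equal weight (toy values).
print([round(o, 3) for o in nucleosome_occupancy([1.0, 1.0], nuc_len=3)])
# [0.333, 0.667, 0.667, 0.333]
```

For genome-length inputs the running sums overflow, so a practical version would work in log space or rescale at each step; the inner per-base loop could also be replaced with a prefix sum.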
Segal E. & Widom J.
(2009)
Current Opinion in Structural Biology.
19,
1,
p. 65-71
Homopolymeric stretches of deoxyadenosine nucleotides (A's) on one strand of double-stranded DNA, referred to as poly(dA:dT) tracts or A-tracts, are overabundant in eukaryotic genomes. They have unusual structural, dynamic, and mechanical properties, and may resist sharp bending. Such unusual material properties, together with their overabundance in eukaryotes, raised the possibility that poly(dA:dT) tracts might function in eukaryotes to influence the organization of nucleosomes at many genomic regions. Recent genome-wide studies strongly confirm these ideas and suggest that these tracts play major roles in chromatin organization and genome function. Here we review what is known about poly(dA:dT) tracts and how they work.
Lubliner S. & Segal E.
(2009)
Bioinformatics.
25,
12,
p. i348-i355
Motivation: Understanding the mechanisms that govern nucleosome positioning over genomes in vivo is essential for unraveling the role of chromatin organization in transcriptional regulation. Until now, models for predicting genome-wide nucleosome occupancy have assumed that the DNA associations of neighboring nucleosomes on the genome are independent. We present a new model that relaxes this independence assumption by modeling interactions between adjacent nucleosomes. Results: We show that modeling interactions between adjacent nucleosomes improves genome-wide nucleosome occupancy predictions in an in vitro system that includes only nucleosomes and purified DNA, where the resulting model has a preference for short spacings (linkers) of less than 20 bp in length between neighboring nucleosomes. Since nucleosome occupancy in vitro depends only on properties intrinsic to nucleosomes, these results suggest that the interactions we find are intrinsic to nucleosomes and do not depend on other factors, such as transcription factors and chromatin remodelers. We also show that modeling these intrinsic interactions significantly improves genome-wide predictions of nucleosome occupancy in vivo.
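The independence assumption this abstract relaxes can be made concrete with a generic baseline: fixed-length nucleosomes whose only interaction is that they cannot overlap. A forward-backward pass over the sequence then gives the probability that each base is covered. This is a sketch under stated assumptions, not the authors' model. The per-position statistical weights are hypothetical inputs, `nucleosome_occupancy` is an illustrative name, and the 147-bp default is the standard core-particle footprint. The model described above would additionally score the spacing between neighbors, which this sketch omits.

```python
def nucleosome_occupancy(weights: list[float], length: int = 147) -> list[float]:
    """Probability that each base is covered by a nucleosome.

    weights[i] is an assumed (hypothetical) statistical weight for a nucleosome
    occupying bases i .. i + length - 1; starts with i + length > n are ignored.
    Nucleosomes interact only by not overlapping: there is no spacing term.
    """
    n = len(weights)
    # forward[i]: summed weight of all non-overlapping configurations of bases [0, i)
    forward = [0.0] * (n + 1)
    forward[0] = 1.0
    for i in range(1, n + 1):
        forward[i] = forward[i - 1]  # base i - 1 left uncovered
        if i >= length:
            # a nucleosome starting at i - length ends exactly at base i - 1
            forward[i] += forward[i - length] * weights[i - length]
    # backward[i]: summed weight of all non-overlapping configurations of bases [i, n)
    backward = [0.0] * (n + 1)
    backward[n] = 1.0
    for i in range(n - 1, -1, -1):
        backward[i] = backward[i + 1]  # base i left uncovered
        if i + length <= n:
            backward[i] += weights[i] * backward[i + length]  # nucleosome starts at i
    z = forward[n]  # partition function (equals backward[0])
    occupancy = [0.0] * n
    for i in range(n - length + 1):
        p_start = forward[i] * weights[i] * backward[i + length] / z
        for j in range(i, i + length):
            occupancy[j] += p_start
    return occupancy

# Toy example with a 2-bp "nucleosome" so the result can be checked by hand.
print(nucleosome_occupancy([1.0, 1.0, 1.0, 0.0], length=2))
```

In the toy example the five allowed configurations are equally weighted, so the edge bases are covered less often than the central ones. A neighbor-interaction model would multiply each configuration by an extra factor depending on linker length.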
Wong D. J., Segal E. & Chang H. Y.
(2008)
Cell Cycle.
7,
23,
p. 3622-3624
The ability of cancers to grow indefinitely has fueled the idea that cancer and stem cells may have common underlying mechanisms. Detailed gene expression maps have now shown the diversity and distinctiveness in gene expression programs associated with stemness in embryonic and adult stem cells. These maps have further revealed a shared transcriptional program in embryonic stem cells (ESC) and cancer stem cells. Surprisingly, forced activation of an ESC-like gene expression program in adult epithelial cells can reprogram them into human cancer stem cells and achieve pathologic self-renewal. The ability to create induced cancer stem cells (iCSC) may provide opportunities to better define the biology of cancer stem cells in order to trace or eliminate them in human patients.
Field Y., Kaplan N., Fondufe-Mittendorf Y., Moore I. K., Sharon E., Lubling Y., Widom J. & Segal E.
(2008)
PLoS Computational Biology.
4,
11,
e1000216.
The detailed positions of nucleosomes profoundly impact gene regulation and are partly encoded by the genomic DNA sequence. However, less is known about the functional consequences of this encoding. Here, we address this question using a genome-wide map of ∼380,000 yeast nucleosomes that we sequenced in their entirety. Utilizing the high resolution of our map, we refine our understanding of how nucleosome organizations are encoded by the DNA sequence and demonstrate that the genomic sequence is highly predictive of the in vivo nucleosome organization, even across new nucleosome-bound sequences that we isolated from fly and human. We find that poly(dA:dT) tracts are an important component of these nucleosome positioning signals and that their nucleosome-disfavoring action results in large nucleosome depletion over them and over their flanking regions and enhances the accessibility of transcription factors to their cognate sites. Our results suggest that the yeast genome may utilize these nucleosome positioning signals to regulate gene expression with different transcriptional noise and activation kinetics and DNA replication with different origin efficiency. These distinct functions may be achieved by encoding both relatively closed (nucleosome-covered) chromatin organizations over some factor binding sites, where factors must compete with nucleosomes for DNA access, and relatively open (nucleosome-depleted) organizations over other factor sites, where factors bind without competition.
Rabani M., Kertesz M. & Segal E.
(2008)
Proceedings of the National Academy of Sciences of the United States of America.
105,
39,
p. 14885-14890
Messenger RNA molecules are tightly regulated, mostly through interactions with proteins and other RNAs, but the mechanisms that confer the specificity of such interactions are poorly understood. It is clear, however, that this specificity is determined by both the nucleotide sequence and secondary structure of the mRNA. Here, we develop RNApromo, an efficient computational tool for identifying structural elements within mRNAs that are involved in specifying posttranscriptional regulations. By analyzing experimental data on mRNA decay rates, we identify common structural elements in fast-decaying and slow-decaying mRNAs and link them with binding preferences of several RNA binding proteins. We also predict structural elements in sets of mRNAs with common subcellular localization in mouse neurons and fly embryos. Finally, by analyzing pre-microRNA stem-loops, we identify structural differences between pre-microRNAs of animals and plants, which provide insights into the mechanism of microRNA biogenesis. Together, our results reveal unexplored layers of posttranscriptional regulations in groups of RNAs and are therefore an important step toward a better understanding of the regulatory information conveyed within RNA molecules. Our new RNA motif discovery tool is available online.
Wang J., Fondufe-Mittendorf Y., Xi L., Tsai G., Segal E. & Widom J.
(2008)
PLoS Computational Biology.
4,
9,
The exact lengths of linker DNAs connecting adjacent nucleosomes specify the intrinsic three-dimensional structures of eukaryotic chromatin fibers. Some studies suggest that linker DNA lengths preferentially occur at certain quantized values, differing one from another by integral multiples of the DNA helical repeat, ∼10 bp; however, studies in the literature are inconsistent. Here, we investigate linker DNA length distributions in the yeast Saccharomyces cerevisiae genome, using two novel methods: a Fourier analysis of genomic dinucleotide periodicities adjacent to experimentally mapped nucleosomes and a duration hidden Markov model applied to experimentally defined dinucleosomes. Both methods reveal that linker DNA lengths in yeast are preferentially periodic at the DNA helical repeat (∼10 bp), obeying the form 10n+5 bp (for integer n). This 10 bp periodicity implies an ordered superhelical intrinsic structure for the average chromatin fiber in yeast.
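To illustrate the kind of Fourier analysis mentioned above, the toy sketch below turns a sequence into a 0/1 signal marking where AA or TT dinucleotides start. It then measures the power of the discrete Fourier component at each candidate period and reports the strongest one. Several choices are assumptions made only for illustration: the AA/TT dinucleotide set, the 6 to 20 bp candidate range, the function names, and the synthetic input. This is not the authors' pipeline, which analyzed positions adjacent to experimentally mapped nucleosomes genome-wide.

```python
import cmath
import math

def dinucleotide_signal(seq: str, dinucs: tuple[str, ...] = ("AA", "TT")) -> list[float]:
    """1.0 where one of the given dinucleotides starts, 0.0 elsewhere (length len(seq) - 1)."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)]

def power_at_period(signal: list[float], period: float) -> float:
    """Squared magnitude of the mean-removed signal's Fourier component at `period` bases."""
    mean = sum(signal) / len(signal)
    omega = 2 * math.pi / period
    coeff = sum((x - mean) * cmath.exp(-1j * omega * n) for n, x in enumerate(signal))
    return abs(coeff) ** 2

def dominant_period(seq: str, periods: range = range(6, 21)) -> int:
    """Candidate period (bp) carrying the most AA/TT power.

    The default range starts at 6 bp because a strictly periodic signal also has
    full power at its harmonics (e.g. 5 bp and 2 bp for a 10-bp repeat).
    """
    signal = dinucleotide_signal(seq)
    return max(periods, key=lambda p: power_at_period(signal, p))

demo = "AAGCGCGCGC" * 30  # synthetic: one AA step every 10 bp
print(dominant_period(demo))  # 10
```

On real nucleosome-bound sequences the signal is noisy rather than a perfect impulse train, so one would compare power at ∼10 bp against a background model rather than simply taking the maximum.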
Sharon E., Lubliner S. & Segal E.
(2008)
PLoS Computational Biology.
4,
8,
e1000154.
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/.
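The limitation of the position-independence assumption is easiest to see on a toy example. A PSSM score is a sum of per-position log-odds, so it cannot separate a site set in which two positions co-vary. Adding a single feature that spans both positions does separate them. The sketch follows the spirit of FMMs, but the feature weights are set by hand rather than learned in a log-linear model. The site set, pseudocount, uniform background, weights, and function names are all illustrative assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pssm(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Per-position natural-log odds against a uniform (0.25) background."""
    pssm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        total = len(sites) + 4 * pseudocount
        pssm.append({b: math.log((counts[b] + pseudocount) / total / 0.25) for b in BASES})
    return pssm

def pssm_score(pssm: list[dict[str, float]], seq: str) -> float:
    # Independence assumption: each position contributes separately.
    return sum(column[base] for column, base in zip(pssm, seq))

def feature_score(pssm: list[dict[str, float]], seq: str,
                  features: dict[tuple[int, int, str], float]) -> float:
    """PSSM score plus weights of two-position features keyed as (pos_i, pos_j, bases)."""
    extra = sum(w for (i, j, pair), w in features.items() if seq[i] + seq[j] == pair)
    return pssm_score(pssm, seq) + extra

# Bound sites where position 0 and position 1 co-vary: A pairs with C, G pairs with T.
sites = ["AC", "GT", "AC", "GT"]
pssm = build_pssm(sites)
features = {(0, 1, "AC"): 2.0, (0, 1, "GT"): 2.0}  # hand-set, hypothetical weights
print(pssm_score(pssm, "AC") == pssm_score(pssm, "AT"))                          # True
print(feature_score(pssm, "AC", features) > feature_score(pssm, "AT", features))  # True
```

The PSSM assigns the unobserved combination "AT" the same score as the observed "AC", because each column on its own is 50/50. Only the two-position feature captures the dependency.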
Itzkovitz S., Baruch L., Shapiro E. & Segal E.
(2008)
Proceedings of the National Academy of Sciences of the United States of America.
105,
27,
p. 9278-9283
The nervous system contains trillions of neurons, each forming thousands of synaptic connections. It has been suggested that this complex connectivity is determined by a synaptic "adhesive code," where connections are dictated by a variable set of cell surface proteins, combinations of which form neuronal addresses. The estimated number of neuronal addresses is orders of magnitude smaller than the number of neurons. Here, we show that the limited number of addresses imposes constraints on the possible neuronal network topologies. We show that to encode arbitrary networks, in which each neuron can potentially connect to any other neuron, the number of neuronal addresses needed scales linearly with network size. In contrast, the number of addresses needed to encode the wiring of geometric networks grows only as the square root of network size. The more efficient encoding in geometric networks is achieved through the reutilization of the same addresses in physically independent portions of the network. We also find that ordered geometric networks, in which the same connectivity patterns are iterated throughout the network, further reduce the required number of addresses. We demonstrate our findings using simulated networks and the C. elegans neuronal network. Geometric neuronal connectivity with recurring connectivity patterns has been suggested to confer an evolutionary advantage by saving biochemical resources on the one hand and by reutilizing functionally efficient neuronal circuits on the other. Our study suggests an additional advantage of these prominent topological features: they make it easier to genetically encode neuronal networks given constraints on the number of addresses.
Baruch L., Itzkovitz S., Golan Mashiach M. M., Shapiro E. & Segal E.
(2008)
PLoS Computational Biology.
4,
7,
e1000120.
Synaptic wiring of neurons in Caenorhabditis elegans is largely invariable between animals. It has been suggested that this feature stems from genetically encoded molecular markers that guide the neurons in the final stage of synaptic formation. Identifying these markers and unraveling the logic by which they direct synapse formation is a key challenge. Here, we address this task by constructing a probabilistic model that attempts to explain the neuronal connectivity diagram of C. elegans as a function of the expression patterns of its neurons. By only considering neuron pairs that are known to be connected by chemical or electrical synapses, we focus on the final stage of synapse formation, in which neurons identify their designated partners. Our results show that for many neurons the neuronal expression map of C. elegans can be used to accurately predict the subset of adjacent neurons that will be chosen as its postsynaptic partners. Notably, these predictions can be achieved using the expression patterns of only a small number of specific genes that interact in a combinatorial fashion.
Wasserstrom A., Frumkin D., Adar R., Itzkovitz S., Stern T., Kaplan S., Shefer G., Shur I., Zangi L., Reizel Y., Harmelin A., Dor Y., Dekel N., Reisner Y., Benayahu D., Tzahor E., Segal E. & Shapiro E.
(2008)
PLoS Computational Biology.
4,
5,
e1000058.
The depth of a cell of a multicellular organism is the number of cell divisions it has undergone since the zygote, and knowing this basic cell property would help address fundamental problems in several areas of biology. At present, the depths of the vast majority of human and mouse cell types are unknown. Here, we present a method for estimating the depth of a cell by analyzing somatic mutations in its microsatellites, and provide, to our knowledge for the first time, reliable depth estimates for several cell types in mice. According to our estimates, the average depth of oocytes is 29, consistent with previous estimates. The average depth of B cells ranges from 34 to 79, linearly related to the mouse age, suggesting a rate of one cell division per day. In contrast, various types of adult stem cells underwent on average fewer cell divisions, supporting the notion that adult stem cells are relatively quiescent. Our method for depth estimation opens a window for revealing tissue turnover rates in animals, including humans, which has important implications for our knowledge of the body under physiological and pathological conditions.
Wasserstrom A., Adar R., Shefer G., Frumkin D., Itzkovitz S., Stern T., Shur I., Zangi L., Kaplan S., Harmelin A., Reisner Y., Benayahu D., Tzahor E., Segal E. & Shapiro E.
(2008)
PLoS ONE.
3,
4,
e1939.
The cell lineage tree of a multicellular organism represents its history of cell divisions from the very first cell, the zygote. A new method for high-resolution reconstruction of parts of such cell lineage trees was recently developed based on phylogenetic analysis of somatic mutations accumulated during normal development of an organism. In this study we apply this method in mice to reconstruct the lineage trees of distinct cell types. We address for the first time basic questions in the developmental biology of higher organisms, namely, what is the correlation between the lineage relation among cells and their (1) function, (2) physical proximity and (3) anatomical proximity. We analyzed B cells, kidney, mesenchymal and hematopoietic stem cells, as well as satellite cells, which are adult skeletal muscle stem cells isolated from their niche on the muscle fibers (myofibers) of various skeletal muscles. Our results demonstrate that all analyzed cell types are intermingled in the lineage tree, indicating that none of these cell types is a single exclusive clone. We also show a significant correlation between the physical proximity of satellite cells within muscles and their lineage. Furthermore, we show that satellite cells obtained from a single myofiber are significantly clustered in the lineage tree, reflecting their common developmental origin. Lineage analysis based on somatic mutations enables high-resolution reconstruction of lineage trees in mice and humans, which can provide fundamental insights into many aspects of their development and tissue maintenance.
Wong D. J., Liu H., Ridky T. W., Cassarino D., Segal E. & Chang H. Y.
(2008)
Cell Stem Cell.
2,
4,
p. 333-344
Self-renewal is a hallmark of stem cells and cancer, but the existence of a shared stemness program remains controversial. Here, we construct a gene module map to systematically relate transcriptional programs in embryonic stem cells (ESCs), adult tissue stem cells, and human cancers. This map reveals two predominant gene modules that distinguish ESCs and adult tissue stem cells. The ESC-like transcriptional program is activated in diverse human epithelial cancers and strongly predicts metastasis and death. c-Myc, but not other oncogenes, is sufficient to reactivate the ESC-like program in normal and cancer cells. In primary human keratinocytes transformed by Ras and IκBα, c-Myc increases the fraction of tumor-initiating cells by 150-fold, enabling tumor formation and serial propagation with as few as 500 cells. c-Myc-enhanced tumor initiation is cell-autonomous and independent of genomic instability. Thus, activation of an ESC-like transcriptional program in differentiated adult cells may induce pathologic self-renewal characteristic of cancer stem cells.
Adler A. S., Kawahara T. L., Segal E. & Chang H. Y.
(2008)
Cell Cycle.
7,
5,
p. 556-559
Genetic studies in model organisms such as yeast, worms, flies and mice leading to lifespan extension suggest that longevity is subject to regulation. In addition, various system-wide interventions in old animals can reverse features of aging. To better understand these processes, much effort has been put into the study of aging on a molecular level. In particular, genome-wide microarray analysis of differently aged individual organisms or tissues has been used to track the global expression changes that occur during normal aging. Although these studies consistently implicate specific pathways in aging processes, there is little conservation between the individual genes that change. To circumvent this problem, we have recently developed a novel computational approach to discover transcription factors that may be responsible for driving global expression changes with age. We identified the transcription factor NF-κB as a candidate activator of aging-related transcriptional changes in multiple human and mouse tissues. Genetic blockade of NF-κB in the skin of chronologically aged mice reversed the global gene expression program and tissue characteristics to those of young mice, demonstrating for the first time that disruption of a single gene is sufficient to reverse features of aging, at least in the short term.
Sinha S., Adler A. S., Field Y., Chang H. Y. & Segal E.
(2008)
Genome Research.
18,
3,
p. 477-488
A large number of cis-regulatory motifs involved in transcriptional control have been identified, but the regulatory context and biological processes in which many of them function are unknown. Here, we computationally identify the sets of human core promoters targeted by motifs, and systematically characterize their function by using a robust gene-set-based approach and diverse sources of biological data. We find that the target sets of most motifs contain both genes with similar function and genes that are coregulated in vivo, thereby suggesting both the biological process regulated by the motifs and the conditions in which this regulation may occur. Our analysis also identifies many motifs whose target sets are predicted to be regulated by a common microRNA, suggesting a connection between transcriptional and post-transcriptional control processes. Finally, we predict novel roles for uncharacterized motifs in the regulation of specific biological processes and certain types of human cancer, and experimentally validate four such predictions, suggesting regulatory roles for four uncharacterized motifs in cell cycle progression. Our analysis thus provides a concrete framework for uncovering the biological function of cis-regulatory motifs genome wide.
Segal E., Raveh-Sadka T., Schroeder M., Unnerstall U. & Gaul U.
(2008)
Nature.
451,
7178,
p. 535-540
The establishment of complex expression patterns at precise times and locations is key to metazoan development, yet a mechanistic understanding of the underlying transcription control networks is still missing. Here we describe a novel thermodynamic model that computes expression patterns as a function of cis-regulatory sequence and of the binding-site preferences and expression of participating transcription factors. We apply this model to the segmentation gene network of Drosophila melanogaster and find that it predicts expression patterns of cis-regulatory modules with remarkable accuracy, demonstrating that positional information is encoded in the regulatory sequence and input factor distribution. Our analysis reveals that both strong and weaker binding sites contribute, leading to high occupancy of the module DNA, and conferring robustness against mutation; short-range homotypic clustering of weaker sites facilitates cooperative binding, which is necessary to sharpen the patterns. Our computational framework is generally applicable to most protein-DNA interaction systems.
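The thermodynamic idea can be illustrated with a generic statistical-mechanics calculation for a single factor. The sketch enumerates every bound/unbound configuration of a few sites and weights each by K_i·[TF] per bound site. It multiplies in a cooperativity factor ω for each pair of adjacent bound sites and takes the expected number of bound sites. This is a textbook sketch under assumed parameters, not the published model. The function name, the adjacency-only cooperativity, and all numbers are illustrative assumptions.

```python
from itertools import product

def mean_occupancy(affinities: list[float], tf_conc: float, omega: float = 1.0) -> float:
    """Expected number of sites bound by one TF in a small module.

    Every configuration s (one 0/1 entry per site, sites in sequence order) gets weight
        prod_i (K_i * [TF]) ** s_i  *  omega ** (number of adjacent bound pairs),
    and the expectation is taken under these Boltzmann-style weights.
    omega > 1 is an assumed pairwise cooperativity between neighboring bound sites.
    """
    z = 0.0               # partition function
    weighted_bound = 0.0  # sum over configurations of weight * number of bound sites
    for config in product((0, 1), repeat=len(affinities)):
        weight = 1.0
        for k, bound in zip(affinities, config):
            if bound:
                weight *= k * tf_conc
        adjacent_bound_pairs = sum(a * b for a, b in zip(config, config[1:]))
        weight *= omega ** adjacent_bound_pairs
        z += weight
        weighted_bound += weight * sum(config)
    return weighted_bound / z

print(mean_occupancy([1.0, 1.0], tf_conc=1.0))              # 1.0 (independent sites)
print(mean_occupancy([1.0, 1.0], tf_conc=1.0, omega=10.0))  # larger (cooperative sites)
```

With ω > 1, weak sites placed next to each other are bound more often than their individual affinities alone would predict. This is one way to picture the abstract's point that short-range homotypic clustering of weaker sites facilitates cooperative binding. Full enumeration grows as 2^n, so larger modules would need a dynamic-programming formulation.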
Wong D. J., Nuyten D. S., Regev A., Lin M., Adler A. S., Segal E., Van De Vijver M. J. & Chang H. Y.
(2008)
Cancer Research.
68,
2,
p. 369-378
A major goal of cancer research is to match specific therapies to molecular targets in cancer. Genome-scale expression profiling has identified new subtypes of cancer based on consistent patterns of variation in gene expression, leading to improved prognostic predictions. However, how these new genetic subtypes of cancers should be treated is unknown. Here, we show that a gene module map can guide the prospective identification of targeted therapies for genetic subtypes of cancer. By visualizing genome-scale gene expression in cancer as combinations of activated and deactivated functional modules, gene module maps can reveal specific functional pathways associated with each subtype that might be susceptible to targeted therapies. We show that in human breast cancers, activation of a poor-prognosis "wound signature" is strongly associated with induction of both a mitochondria gene module and a proteasome gene module. We found that 3-bromopyruvic acid, which inhibits glycolysis, selectively killed breast cells expressing the mitochondria and wound signatures. In addition, inhibition of proteasome activity by bortezomib, a drug approved for human use in multiple myeloma, abrogated wound signature expression and selectively killed breast cells expressing the wound signature. Thus, gene module maps may enable rapid translation of complex genomic signatures in human disease to targeted therapeutic strategies.
Minsky N., Shema E., Field Y., Schuster M., Segal E. & Oren M.
(2008)
Nature Cell Biology.
10,
4,
p. 483-488
Histone modifications have emerged as important regulators of transcription. Histone H2B monoubiquitination has also been implicated in transcription; however, better understanding of the biological significance of this modification in mammalian cells has been hindered by the lack of suitable reagents, particularly antibodies capable of specifically recognizing ubiquitinated H2B (ubH2B). Here, we report the generation of anti-ubH2B monoclonal antibodies using a branched peptide as immunogen. These antibodies provide a powerful tool for exploring the biochemical functions of H2B monoubiquitination at both a genome-wide and gene-specific level. Application of these antibodies in high resolution chromatin immunoprecipitation (ChIP)-chip experiments in human cells, using tiling arrays, revealed preferential association of ubiquitinated H2B with the transcribed regions of highly expressed genes. Unlike dimethylated H3K4, ubH2B was not associated with distal promoter regions. Furthermore, experimental modulation of the transcriptional activity of the tumour suppressor p53 was accompanied by rapid changes in the H2B ubiquitination status of its p21 target gene, attesting to the dynamic nature of this process. It has recently been demonstrated that the apparent extent of gene expression often reflects elongation rather than initiation rates; thus, our findings suggest that H2B ubiquitination is intimately linked with global transcriptional elongation in mammalian cells.
Shalem O., Dahan O., Levo M., Martinez M. R., Furman I., Segal E. & Pilpel Y.
(2008)
Molecular Systems Biology.
4,
4.
The state of the transcriptome reflects a balance between mRNA production and degradation. Yet how these two regulatory arms interact in shaping the kinetics of the transcriptome in response to environmental changes is not known. We subjected yeast to two stresses, one that induces a fast and transient response, and another that triggers a slow enduring response. We then used microarrays following transcriptional arrest to measure genome-wide decay profiles under each condition. We found condition-specific changes in mRNA decay rates and coordination between mRNA production and degradation. In the transient response, most induced genes were surprisingly destabilized, whereas repressed genes were somewhat stabilized, exhibiting counteraction between production and degradation. This strategy can reconcile high steady-state level with short response time among induced genes. In contrast, the stress that induces the slow response displays the more expected behavior, whereby most induced genes are stabilized, and repressed genes are destabilized. Our results show genome-wide interplay between mRNA production and degradation, and that alternative modes of such interplay determine the kinetics of the transcriptome in response to stress.
Adler A. S., Sinha S., Kawahara T. L., Zhang J. Y., Segal E. & Chang H. Y.
(2007)
Genes & Development.
21,
24,
p. 3244-3257
Aging is characterized by specific alterations in gene expression, but their underlying mechanisms and functional consequences are not well understood. Here we develop a systematic approach to identify combinatorial cis-regulatory motifs that drive age-dependent gene expression across different tissues and organisms. Integrated analysis of 365 microarrays spanning nine tissue types predicted fourteen motifs as major regulators of age-dependent gene expression in human and mouse. The motif most strongly associated with aging was that of the transcription factor NF-κB. Inducible genetic blockade of NF-κB for 2 wk in the epidermis of chronologically aged mice reverted the tissue characteristics and global gene expression programs to those of young mice. Age-specific NF-κB blockade and orthogonal cell cycle interventions revealed that NF-κB controls cell cycle exit and gene expression signature of aging in parallel but not sequential pathways. These results identify a conserved network of regulatory pathways underlying mammalian aging and show that NF-κB is continually required to enforce many features of aging in a tissue-specific manner.
Kertesz M., Iovino N., Unnerstall U., Gaul U. & Segal E.
(2007)
Nature Genetics.
39,
10,
p. 1278-1284
MicroRNAs are key regulators of gene expression, but the precise mechanisms underlying their interaction with their mRNA targets are still poorly understood. Here, we systematically investigate the role of target-site accessibility, as determined by base-pairing interactions within the mRNA, in microRNA target recognition. We experimentally show that mutations diminishing target accessibility substantially reduce microRNA-mediated translational repression, with effects comparable to those of mutations that disrupt sequence complementarity. We devise a parameter-free model for microRNA-target interaction that computes the difference between the free energy gained from the formation of the microRNA-target duplex and the energetic cost of unpairing the target to make it accessible to the microRNA. This model explains the variability in our experiments, predicts validated targets more accurately than existing algorithms, and shows that genomes accommodate site accessibility by preferentially positioning targets in highly accessible regions. Our study thus demonstrates that target accessibility is a critical factor in microRNA function.
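The energy bookkeeping described above can be sketched as follows. Each candidate site gets a net free energy equal to the duplex formation energy (negative, favorable) plus the cost of unpairing the target region (non-negative). Lower values mean more favorable binding. This sign convention is an assumption made for the sketch; the published model's exact definitions may differ. Computing the two energies requires an RNA folding tool, which is out of scope here. The class name, site names, and energies below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CandidateSite:
    name: str
    dg_duplex: float  # kcal/mol, microRNA-target duplex formation (negative = favorable)
    dg_open: float    # kcal/mol, cost of unpairing the target region (>= 0)

    @property
    def net_energy(self) -> float:
        # Energy gained from the duplex minus the cost of making the site accessible,
        # written as a free-energy sum: more negative means more favorable binding.
        return self.dg_duplex + self.dg_open

def rank_sites(sites: list[CandidateSite]) -> list[str]:
    """Names of candidate sites ordered from most to least favorable net energy."""
    return [s.name for s in sorted(sites, key=lambda s: s.net_energy)]

# Hypothetical energies: site_b pairs more strongly but sits in a highly structured region.
sites = [
    CandidateSite("site_a", dg_duplex=-20.0, dg_open=4.0),
    CandidateSite("site_b", dg_duplex=-24.0, dg_open=12.0),
]
print(rank_sites(sites))  # ['site_a', 'site_b']
```

The example shows why accessibility matters: a site with weaker complementarity can still rank first when it lies in a region that costs little to unpair.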
Wang H., Segal E., Ben-Hur A., Li Q. R., Vidal M. & Koller D.
(2007)
Genome Biology.
8,
9,
R192.
We propose InSite, a computational method that integrates high-throughput protein and sequence data to infer the specific binding regions of interacting protein pairs. We compared our predictions with binding sites in Protein Data Bank and found significantly more binding events occur at sites we predicted. Several regions containing disease-causing mutations or cancer polymorphisms in human are predicted to be binding for protein pairs related to the disease, which suggests novel mechanistic hypotheses for several diseases.
Achiron A., Gurevich M., Snir Y., Segal E. & Mandel M.
(2007)
Clinical and Experimental Immunology.
149,
2,
p. 235-242
Multiple sclerosis (MS) is a demyelinating disease characterized by an unpredictable clinical course with intermittent relapses that lead over time to significant neurological disability. Clinical and radiological variables are limited in their ability to predict disease course. Peripheral blood genome-scale analyses have been used to characterize MS patients with different disease types, but not to predict outcome. Using complementary-DNA microarrays, we studied peripheral-blood gene expression patterns in 53 relapsing-remitting MS patients. Patients were classified into good, intermediate and poor clinical outcome groups established after a 2-year follow-up. A training set of 26 samples was used to identify a gene expression signature differentiating clinical outcomes. Supervised learning and feature selection algorithms were applied to identify a predictive signature that was validated in an independent group of 27 patients. Key genes within the predictive signature were confirmed by quantitative reverse transcription-polymerase chain reaction in an additional 10 patients. The analysis identified 431 genes differentiating patients with good and poor clinical outcome (change in neurological disability by the expanded disability status scale was -0.33 ± 0.24 and 1.6 ± 0.35, P = 0.0002; total number of relapses was 0 and 1.80 ± 0.35, P = 0.00009, respectively). An optimal set of 29 genes was identified as a gene expression signature predictive of clinical outcome and correctly classified 88.9% of patients. This predictive signature was enriched for genes biologically related to zinc-ion binding and to cytokine activity regulation pathways involved in inflammation and apoptosis. Our findings provide a basis for monitoring patients by predicting disease outcome and can be incorporated into clinical decision-making in relapsing-remitting MS.
Rinn J. L., Kertesz M., Wang J. K., Squazzo S. L., Xu X., Brugmann S. A., Goodnough L. H., Helms J. A., Farnham P. J., Segal E. & Chang H. Y.
(2007)
Cell.
129,
7,
p. 1311-1323
Noncoding RNAs (ncRNA) participate in epigenetic regulation but are poorly understood. Here we characterize the transcriptional landscape of the four human HOX loci at five base pair resolution in 11 anatomic sites and identify 231 HOX ncRNAs that extend known transcribed regions by more than 30 kilobases. HOX ncRNAs are spatially expressed along developmental axes and possess unique sequence motifs, and their expression demarcates broad chromosomal domains of differential histone methylation and RNA polymerase accessibility. We identified a 2.2 kilobase ncRNA residing in the HOXC locus, termed HOTAIR, which represses transcription in trans across 40 kilobases of the HOXD locus. HOTAIR interacts with Polycomb Repressive Complex 2 (PRC2) and is required for PRC2 occupancy and histone H3 lysine-27 trimethylation of HOXD locus. Thus, transcription of ncRNA may demarcate chromosomal domains of gene silencing at a distance; these results have broad implications for gene regulation in development and disease states.
Segal E., Sirlin C. B., Ooi C., Adler A. S., Gollub J., Chen X., Chan B. K., Matcuk G. R., Barry C. T., Chang H. Y. & Kuo M. D.
(2007)
Nature Biotechnology.
25,
6,
p. 675-680
Paralleling the diversity of genetic and protein activities, pathologic human tissues also exhibit diverse radiographic features. Here we show that dynamic imaging traits in non-invasive computed tomography (CT) systematically correlate with the global gene expression programs of primary human liver cancer. Combinations of twenty-eight imaging traits can reconstruct 78% of the global gene expression profiles, revealing cell proliferation, liver synthetic function, and patient prognosis. Thus, genomic activity of human liver cancers can be decoded by noninvasive imaging, thereby enabling noninvasive, serial and frequent molecular profiling for personalized medicine.
Liu H., Adler A. S., Segal E. & Chang H. Y.
(2007)
PLoS Genetics.
3,
6,
p. 996-1008
The balance of quiescence and cell division is critical for tissue homeostasis and organismal health. Serum stimulation (SS) of fibroblasts is well studied as a classic model of entry into the cell division cycle, but the induction of cellular quiescence, such as by serum deprivation (SD), is much less understood. Here we show that SS and SD activate distinct early transcriptional responses genome-wide that converge on a late symmetric transcriptional program. Several serum deprivation early response genes (SDERGs), including the putative tumor suppressor genes SALL2 and MXI1, are required for cessation of DNA synthesis in response to SD and induction of additional SD genes. SDERGs are coordinately repressed in many types of human cancers compared to their normal counterparts, and repression of SDERGs predicts increased risk of cancer progression and death in human breast cancers. These results identify a gene expression program uniquely responsive to loss of growth factor signaling; members of SDERGs may constitute novel growth inhibitors that prevent cancer.
Amit I., Citri A., Shay T., Lu Y., Katz M., Zhang F., Tarcic G., Siwak D., Lahad J., Jacob-Hirsch J., Amariglio N., Vaisman N., Segal E., Rechavi G., Alon U., Mills G. B., Domany E. & Yarden Y.
(2007)
Nature Genetics.
39,
4,
p. 503-512
Signaling pathways invoke interplays between forward signaling and feedback to drive robust cellular response. In this study, we address the dynamics of growth factor signaling through profiling of protein phosphorylation and gene expression, demonstrating the presence of a kinetically defined cluster of delayed early genes that function to attenuate the early events of growth factor signaling. Using epidermal growth factor receptor signaling as the major model system and concentrating on regulation of transcription and mRNA stability, we demonstrate that a number of genes within the delayed early gene cluster function as feedback regulators of immediate early genes. Consistent with their role in negative regulation of cell signaling, genes within this cluster are downregulated in diverse tumor types, in correlation with clinical outcome. More generally, our study proposes a mechanistic description of the cellular response to growth factors by defining architectural motifs that underlie the function of signaling networks.
A feature-based approach to modeling protein-DNA interactions
Sharon E. & Segal E.
(2007)
Research in Computational Molecular Biology, Proceedings.
4453,
p. 77-91
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position-specific scoring matrix (PSSM), which assumes independence between binding positions. In many cases this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on Markov networks. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our models, and devise an algorithm for learning their structural features from binding site data. We evaluate our approach on synthetic data, and then apply it to binding site and ChIP-chip data from yeast. We reveal sequence features that are present in the binding specificities of yeast TFs, and show that FMMs explain the binding data significantly better than PSSMs.
Segal E., Fondufe-Mittendorf Y., Chen L., Thastroem A., Field Y., Moore I. K., Wang J. Z. & Widom J.
(2006)
Nature.
442,
7104,
p. 772-778
Eukaryotic genomes are packaged into nucleosome particles that occlude the DNA from interacting with most DNA binding proteins. Nucleosomes have higher affinity for particular DNA sequences, reflecting the ability of the sequence to bend sharply, as required by the nucleosome structure. However, it is not known whether these sequence preferences have a significant influence on nucleosome position in vivo, and thus regulate the access of other proteins to DNA. Here we isolated nucleosome-bound sequences at high resolution from yeast and used these sequences in a new computational approach to construct and validate experimentally a nucleosome-DNA interaction model, and to predict the genome-wide organization of nucleosomes. Our results demonstrate that genomes encode an intrinsic nucleosome organization and that this intrinsic organization can explain ∼50% of the in vivo nucleosome positions. This nucleosome positioning code may facilitate specific chromosome functions including transcription factor binding, transcription initiation, and even remodelling of the nucleosomes themselves.
Kuttner Y., Kozer N., Segal E., Schreiber G. & Haran G.
(2005)
Journal of the American Chemical Society.
127,
43,
p. 15138-15144
The association of two proteins is preceded by a mutual diffusional search in solution. The role of translational and rotational diffusion in this process has been studied theoretically for many years. However, systematic experimental verification of theoretical results is still lacking. We report here measurements of association rates of the proteins β-lactamase (TEM) and β-lactamase inhibitor protein (BLIP) in solutions of glycerol and poly(ethylene glycol) of increasing viscosity. We also measured translational and rotational diffusion in the same solutions, using fluorescence correlation spectroscopy and fluorescence anisotropy, respectively. It is found that in glycerol both translational and rotational diffusion rates are inversely dependent on viscosity, as predicted by the classical Stokes-Einstein relations, while the association rate depends nonlinearly on viscosity. In contrast, the association rate depends only weakly on the viscosity of the polymer solutions, which results in a similar weak dependence of kon on viscosity. The data are modeled using the theory of diffusion-limited association. Deviations from the theory are explained by a short-range solute-induced repulsion between the proteins in glycerol solution and an attractive depletion interaction generated by the polymers. These results open the way to the creation of a unified framework for all nonspecific effects involved in the protein association process, as well as to better theoretical understanding of these effects. Further, they reflect on the complex factors controlling protein association within the crowded environment of cells and suggest that a high concentration of macromolecules does not significantly impede protein association.
Segal E., Shapira M., Regev A., Pe'er D., Botstein D., Koller D. & Friedman N.
(2003)
Nature Genetics.
34,
2,
p. 166-176
Much of a cell's activity is organized as a network of interacting modules: sets of genes coregulated to respond to different conditions. We present a probabilistic method for identifying regulatory modules from gene expression data. Our procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form 'regulator X regulates module Y under conditions W'. We applied the method to a Saccharomyces cerevisiae expression data set, showing its ability to identify functionally coherent modules and their correct regulators. We present microarray experiments supporting three novel predictions, suggesting regulatory roles for previously uncharacterized proteins.
Lerner U., Segal E. & Koller D.
(2001)
Conference on Uncertainty in Artificial Intelligence: UAI 2001.
p. 319-328
Many real-life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks (BNs). An important subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to be represented. In this paper, we propose the first "exact" inference algorithm for augmented CLG networks: CLG networks augmented by allowing discrete children of continuous parents. Our algorithm is based on Lauritzen's algorithm, and is exact in a similar sense: it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to inaccuracies resulting from the numerical integration used within the algorithm. In the special case of softmax CPDs, we show that integration can often be done efficiently, and that using the first two moments leads to a particularly accurate approximation. We show empirically that our algorithm achieves substantially higher accuracy at lower cost than previous algorithms for this task.