Publications

2026

RNAcentral in 2026: genes and literature integration
Green A., Ribas C. E., Jandalala I. et al. (2026) Nucleic Acids Research. 54, D1, p. D303-D313
RNAcentral was founded in 2014 to serve as a comprehensive database of non-coding RNA sequences. It began by providing a single unified interface to more specialized resources and now contains 45 million sequences. It has grown beyond providing a single interface to many specialized resources and now provides several services and analyses. These include secondary structure prediction with R2DT, sequence search, and analysis with Rfam. Since its last publication in 2021, RNAcentral has developed two major features. First, literature integration with the development of LitScan and LitSumm. LitScan automatically identifies and links relevant publications to RNA entries, while LitSumm uses natural language processing to generate functional summaries from the literature. Together, these tools address the critical challenge of connecting sequence data with scattered functional knowledge across thousands of publications. Second, RNAcentral has created gene-level entries. Gene-level entries represent a large structural change to RNAcentral. While RNAcentral previously organized data exclusively at the sequence level, we now group related transcripts into gene-centric views. This allows researchers to explore all isoforms, splice variants, and related sequences for a gene in a unified interface, better reflecting biological organization and facilitating comparative analyses. RNAcentral is freely available at https://rnacentral.org.

2024

Talin1 dysfunction is genetically linked to systemic capillary leak syndrome
Elefant N., Rouni G., Arapatzi C. et al. (2024) JCI Insight. 9, 24, e173664.
Systemic capillary leak syndrome (SCLS) is a rare life-threatening disorder due to profound vascular leak. The trigger and the cause of the disease are currently unknown and there is no specific treatment. Here, we identified a rare heterozygous splice-site variant in the TLN1 gene in a familial SCLS case, suggestive of autosomal dominant inheritance with incomplete penetrance. Talin1 has a key role in cell adhesion by activating and linking integrins to the actin cytoskeleton. This variant causes in-frame skipping of exon 54 and is predicted to affect talin's C-terminal actin-binding site (ABS3). Modeling the SCLS-TLN1 variant in TLN1-heterozygous endothelial cells (ECs) disturbed the endothelial barrier function. Similarly, mimicking the predicted actin-binding disruption in TLN1-heterozygous ECs resulted in disorganized endothelial adherens junctions. Mechanistically, we established that the SCLS-TLN1 variant, through the disruption of talin's ABS3, sequestrates talin's interacting partner, vinculin, at cell-extracellular matrix adhesions, leading to destabilization of the endothelial barrier. We propose that pathogenic variants in TLN1 underlie SCLS, providing insight into the molecular mechanism of the disease that can be explored for future therapeutic interventions.
PIONEER big data platform for prostate cancer: lessons for advancing future real-world evidence research
Lawlor A., Beyer K., Russell B. et al. (2024) Nature Reviews Urology. 22, 2, p. 116-124 e058267.
Prostate Cancer Diagnosis and Treatment Enhancement through the Power of Big Data in Europe (PIONEER) is a European network of excellence for big data in prostate cancer. PIONEER brings together 34 private and public stakeholders from 9 countries in one multidisciplinary research consortium with the aim of positively transforming the field of prostate cancer clinical care by answering pressing questions related to prostate cancer screening, diagnosis and treatment. PIONEER has developed a unique state-of-the-art big data analytic platform by integrating existing data sources from patients with prostate cancer. PIONEER leveraged this platform to address prioritized research questions, filling knowledge gaps in the characterization, management and core outcomes of prostate cancer across the different disease stages. The network has benefited from sustained patient and stakeholder involvement and engagement, but many challenges remain when using real-world data for big data projects. To continue to advance prostate cancer care, data need to be available, suitable methodologies should be selected and mechanisms for knowledge sharing must be in place. Now acting as the prostate cancer arm of the European Association of Urology's new endeavour, UroEvidenceHub, PIONEER maintains its goal of maximizing the potential of big data to improve prostate cancer care.
Expanding and Enriching the LncRNA Gene-Disease Landscape Using the GeneCaRNA Database
Aggarwal S., Rosenblum C., Gould M. et al. (2024) Biomedicines. 12, 6, 1305.
The GeneCaRNA human gene database is a member of the GeneCards Suite. It presents ~280,000 human non-coding RNA genes, identified algorithmically from ~690,000 RNAcentral transcripts. This expands by ~tenfold the ncRNA gene count relative to other sources. GeneCaRNA thus contains ~120,000 long non-coding RNAs (LncRNAs, >200 bases long), including ~100,000 novel genes. The latter have sparse functional information, a vast terra incognita for future research. LncRNA genes are uniformly represented on all nuclear chromosomes, with 10 genes on mitochondrial DNA. Data obtained from MalaCards, another GeneCards Suite member, finds 1547 genes associated with 1 to 50 diseases. About 15% of the associations portray experimental evidence, with cancers tending to be multigenic. Preliminary text mining within GeneCaRNA discovers interactions of lncRNA transcripts with target gene products, with 25% being ncRNAs and 75% proteins. GeneCaRNA has a biological pathways section, which at present shows 131 pathways for 38 lncRNA genes, a basis for future expansion. Finally, our GeneHancer database provides regulatory elements for ~110,000 lncRNA genes, offering pointers for co-regulated genes and genetic linkages from enhancers to diseases. We anticipate that the broad vista provided by GeneCaRNA will serve as an essential guide for further lncRNA research in disease decipherment.
The GARD Prebiotic Reproduction Model Described in Order and Complexity
Mayer C., Lancet D. & Markovitch O. (2024) Life. 14, 3, 288.
Early steps in the origin of life were necessarily connected to the unlikely formation of self-reproducing structures from chaotic chemistry. Simulations of chemical kinetics based on the graded autocatalysis replication domain (GARD) model demonstrate the ability of a micellar system to become self-reproducing units away from equilibrium. Even though they may be very rare in the initial state of the system, the property of their endogenous mutually catalytic networks being dynamic attractors greatly enhanced reproduction propensity, revealing their potential for selection and Darwinian evolution processes. In parallel, order and complexity have been shown to be crucial parameters in successful evolution. Here, we probe these parameters in the dynamics of GARD-governed entities in an attempt to identify characteristic mechanisms of their development in non-covalent molecular assemblies. Using a virtual random walk perspective, a value for consecutive order is defined based on statistical thermodynamics. The complexity, on the other hand, is determined by the size of a minimal algorithm fully describing the statistical properties of the random walk. By referring to a previously published diagonal line in an order/complexity diagram that represents the progression of evolution, it is shown that the GARD model has the potential to advance in this direction. These results can serve as a solid foundation for identifying general criteria for future analyses of evolving systems.
CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods
Jain S., Bakolitsa C., Brenner S. E. et al. (2024) Genome Biology. 25, 1, 53.
Background: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. Results: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. Conclusions: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
Survivorship Data in Prostate Cancer: Where Are We and Where Do We Need To Be?
Russell B., Beyer K., Lawlor A. et al. (2024) European Urology Open Science. 59, p. 27-29
Cancer survivorship was recently identified as a prostate cancer (PCa) research priority by PIONEER, a European network of excellence for big data in PCa. Despite being a research priority, cancer survivorship lacks a clear and agreed definition, and there is a distinct paucity of patient-reported outcome (PRO) data available on the subject. Data collection on cancer survivorship depends on the availability and implementation of (validated) routinely collected patient-reported outcome measures (PROMs). There have been recent advances in the availability of such PROMs. For instance, the European Organisation for Research and Treatment of Cancer Quality of Life Group (EORTC QLG) is developing survivorship questionnaires. This provides an excellent first step in improving the data available on cancer survivorship. However, we propose that an agreed, standardised definition of (prostate) cancer survivorship must first be established. Only then can real-world data on survivorship be collected to strengthen our knowledge base. With more men than ever surviving PCa, this type of research is imperative to ensure that the quality of life of these men is considered as much as their quantity of life. Patient summary: As there are more prostate cancer survivors than ever before, research into cancer survivorship is crucial. We highlight the importance of such research and provide recommendations on how to carry it out. The first step should be establishing agreement on a standardised definition of survivorship. From this, patient-reported outcome measures can then be used to collect important survivorship data.

2023

Collectively autocatalytic sets
Ashkenasy G., Kauffman S., Lancet D., Otto S., Ruiz-Mirazo K., Semenov S. & Xavier J. (2023) Cell Reports Physical Science. 4, 10, 101594.
The origins of life probably involved autocatalysis. Kauffman's 1986 description of collectively autocatalytic sets (self-replicating reaction networks) and related ideas have influenced efforts to study the properties of reaction networks that may have given rise to life. Here, researchers discuss the impact of collectively autocatalytic sets on the field.
Unanswered questions in prostate cancer - findings of an international multi-stakeholder consensus by the PIONEER consortium
Omar M. I., MacLennan S., Ribal M. J. et al. (2023) Nature Reviews Urology. 20, 8, p. 494-501
In this Consensus Statement, the authors present results from an international multi-stakeholder consensus conducted by the PIONEER consortium to identify the most important questions in the field of prostate cancer that could be addressed using big data. PIONEER is a European network of excellence for big data in prostate cancer consisting of 37 private and public stakeholders from 9 countries across Europe. Much progress has been made in prostate cancer management, but unanswered questions in the field still exist, and big data could help to answer these questions. The PIONEER consortium conducted a two-round modified Delphi survey aiming at building consensus between two stakeholder groups - health-care professionals and patients with prostate cancer - about the most important questions in the field of prostate cancer to be answered using big data. Respondents were asked to consider what would be the effect of answering the proposed questions on improving diagnosis and treatment outcomes for patients with prostate cancer and to score these questions on a scale of 1 (not important) to 9 (critically important). The mean percentage of participants who scored each of the proposed questions as critically important was calculated across the two stakeholder groups and used to rank the questions and identify the highest scoring questions in the critically important category. The identification of questions in prostate cancer that are important to various stakeholders will help the PIONEER consortium to provide answers to these questions to improve the clinical care of patients with prostate cancer.
Attractor dynamics drives self-reproduction in protobiological catalytic networks
Kahana A., Segev L. & Lancet D. (2023) Cell Reports Physical Science. 4, 5, 101384.
The origin of life must have involved an unlikely transition from chaotic chemistry to self-reproducing supramolecular structures. Previous quantitative analyses of self-reproducing mutually catalytic networks made of simple molecules have led to increasing popularity of this pre-RNA scenario for life's origin. Here, we investigate in detail the reproduction characteristic of the graded autocatalysis replication domain (GARD) computer-simulated physicochemically rigorous lipid-based model. This model displays compatibility with heterogeneous environments, addresses the network's spatial demarcation, and portrays trans-generational compositional information transfer. However, we find that compositionally reproducing states are extremely rare, suggesting that random roaming would be a vastly inefficient path toward reproduction. Rewardingly, the present study shows that all self-reproducing states are also dynamic attractors of the catalytic network. This suggests a greatly enhanced propensity for the spontaneous emergence of reproduction and primal evolution, augmenting the likelihood of protolife appearance.
Life's origin may have involved self-reproducing supramolecular autocatalytic entities. Simulated physicochemical model for lipid assemblies shows frequent self-reproduction. Reproduction is observed only within very rare compositional states. Self-reproducers prove to be dynamic attractors, improving the chance for life's origin.
Simulations of the dynamic behavior of spontaneously formed lipid assemblies can offer insight into the origins of life, but few assembly compositions self-reproduce, presumably necessary for life to begin. Kahana et al. show that some self-reproducing compositions are dynamic attractors, making self-reproduction, and hence life's emergence, much more plausible.
How Well do Polygenic Risk Scores Identify Men at High Risk for Prostate Cancer? Systematic Review and Meta-Analysis
Siltari A., Lönnerbro R., Pang K. et al. (2023) Clinical Genitourinary Cancer. 21, 2, p. 316.e1-316.e11
Objectives: Genome-wide association studies have revealed over 200 genetic susceptibility loci for prostate cancer (PCa). By combining them, polygenic risk scores (PRS) can be generated to predict risk of PCa. We summarize the published evidence and conduct meta-analyses of PRS as a predictor of PCa risk in Caucasian men. Patients and methods: Data were extracted from 59 studies, with 16 studies including 17 separate analyses used in the main meta-analysis with a total of 20,786 cases and 69,106 controls identified through a systematic search of ten databases. Random effects meta-analysis was used to obtain pooled estimates of area under the receiver-operating characteristic curve (AUC). Meta-regression was used to assess the impact of number of single-nucleotide polymorphisms (SNPs) incorporated in PRS on AUC. Heterogeneity is expressed as I² scores. Publication bias was evaluated using funnel plots and Egger tests. Results: The ability of PRS to identify men with PCa was modest (pooled AUC 0.63, 95% CI 0.62-0.64) with moderate consistency (I² 64%). Combining PRS with clinical variables increased the pooled AUC to 0.74 (0.68-0.81). Meta-regression showed only negligible increase in AUC for adding incremental SNPs. Despite moderate heterogeneity, publication bias was not evident. Conclusion: Typically, PRS accuracy is comparable to PSA or family history with a pooled AUC value 0.63 indicating mediocre performance for PRS alone.
Life's Emergence by Protocellular Mutually Catalytic Networks
Yaniv R. & Lancet D. (2023) Guidebook for Systems Applications in Astrobiology. p. 239-263
Viewing a Watson-Crick model of DNA is an exhilarating experience. It becomes crystal clear how the base-paired double helix allows information to be stored and copied. Yet, in the post-1953 euphoria, scientists were tempted to believe that having such an explicit chemical entity as a functional core of present-day life implies that life began with polynucleotides. The RNA-world model conjectures that the first chemical entity capable of self-replication was a base-paired polynucleotide. This chapter highlights reasons why this is unlikely to be the case and delineates an alternative scenario that draws upon Aleksandr Oparin's early teachings. The crux is that life cannot be rooted in replicating single molecules but rather necessitates the emergence of supramolecular structures capable of reproduction in their entirety. Also described in detail are the pioneering systems chemistry models that show how catalytic networks can reproduce, as well as the proposed key roles of lipid assemblies (micelles and vesicles) in such a scenario. Last, in the realm of astrobiology, we portray a whole-planet generation of an immense number of different nanoscopic protocells, a basis for preselection for replication and reproduction.
Composomes
Lancet D. (2023) Encyclopedia of Astrobiology, Third Edition. p. 654-655

2022

Micellar Composition Affects Lipid Accretion Kinetics in Molecular Dynamics Simulations: Support for Lipid Network Reproduction
Kahana A., Lancet D. & Palmai Z. (2022) Life (Basel, Switzerland). 12, 7, 955.
Mixed lipid micelles were proposed to facilitate life through their documented growth dynamics and catalytic properties. Our previous research predicted that micellar self-reproduction involves catalyzed accretion of lipid molecules by the residing lipids, leading to compositional homeostasis. Here, we employ atomistic Molecular Dynamics simulations, beginning with 54 lipid monomers, tracking an entire course of micellar accretion. This was done to examine the self-assembly of variegated lipid clusters, allowing us to measure entry and exit rates of monomeric lipids into pre-micelles with different compositions and sizes. We observe considerable rate-modifications that depend on the assembly composition and scrutinize the underlying mechanisms as well as the energy contributions. Lastly, we describe the measured potential for compositional homeostasis in our simulated mixed micelles. This affirms the basis for micellar self-reproduction, with implications for the study of the origin of life.
The GeneCards Suite
Safran M., Rosen N., Twik M., BarShir R., Stein T. I., Dahary D., Fishilevich S. & Lancet D. (2022) Practical Guide to Life Science Databases. p. 27-56
The GeneCards® database of human genes was launched in 1997 and has expanded since then to encompass gene-centric, disease-centric, and pathway-centric entities and relationships within the GeneCards Suite, effectively navigating the universe of human biological data (genes, proteins, cells, regulatory elements, biological pathways, and diseases) and the connections among them. The knowledgebase amalgamates information from >150 selected sources related to genes, proteins, ncRNAs, regulatory elements, chemical compounds, drugs, splice variants, SNPs, signaling molecules, differentiation protocols, biological pathways, stem cells, genetic tests, clinical trials, diseases, publications, and more and empowers the suite's Next Generation Sequencing (NGS), gene set, shared descriptors, and batch query analysis tools.

2021

Self-reproducing catalytic micelles as nanoscopic protocell precursors
Kahana A. & Lancet D. (2021) Nature Reviews Chemistry. 5, 12, p. 870-878
Protocells at life's origin are often conceived as bilayer-enclosed precursors of life, whose self-reproduction rests on the early advent of replicating catalytic biopolymers. This Perspective describes an alternative scenario, wherein reproducing nanoscopic lipid micelles with catalytic capabilities were forerunners of biopolymer-containing protocells. This postulate gains considerable support from experiments describing micellar catalysis and autocatalytic proliferation, and, more recently, from reports on cross-catalysis in mixed micelles that lead to life-like steady-state dynamics. Such results, along with evidence for micellar prebiotic compatibility, synergize with predictions of our chemically stringent computer-simulated model, illustrating how mutually catalytic lipid networks may enable micellar compositional reproduction that could underlie primal selection and evolution. Finally, we highlight studies on how endogenously catalysed lipid modifications could guide further protocellular complexification, including micelle to vesicle transition and monomer to biopolymer progression. These portrayals substantiate the possibility that protocellular evolution could have been seeded by pre-RNA lipid assemblies.

Dynamic lipid aptamers: non-polymeric chemical path to early life
Kahana A., Maslov S. & Lancet D. (2021) Chemical Society Reviews. 50, 21, p. 11741-11746
A widespread dogma asserts that life could not have emerged without biopolymers (RNA and proteins). However, the widely acknowledged implausibility of a spontaneous appearance and proliferation of these complex molecules in primordial messy chemistry casts doubt on this scenario. A proposed alternative is “Lipid-First”, based on the evidence that lipid assemblies may spontaneously emerge in heterogeneous environments, and are shown to undergo growth and fission, and to portray autocatalytic self-copying. What seems undecided is whether lipid assemblies have protein-like capacities for stereospecific interactions, a sine qua non of life processes. This Viewpoint aims to alleviate such doubts, pointing to growing experimental evidence that lipid aggregates possess dynamic surface configurations capable of stereospecific molecular recognition. Such findings help support a possible key role of lipids in seeding life's origin.
The Key Role of Patient Involvement in the Development of Core Outcome Sets in Prostate Cancer
Beyer K., MacLennan S. J., Moris L. et al. (2021) European Urology Focus. 7, 5, p. 943-946
Patients are the stewards of their own care and hence their voice is important when designing and implementing research. Patients should be involved not only as participants in research that impacts their care, as the recipients of that care and any associated harms, but also as research collaborators in prioritising important questions from the patient perspective and designing the research and the ways in which it is most appropriate to involve patients. The PIONEER Consortium, an international multistakeholder collaboration led by the European Association of Urology, has developed a core outcome set (COS) for localised and metastatic prostate cancer relevant to all stakeholders, in particular patients. Throughout the work of PIONEER, patient representatives were involved as collaborators in setting the research agenda, and a wider group of patients was involved as participants in developing COSs, for instance in consensus meetings on choosing important outcomes and appropriate definitions. This publication showcases the process for COS development and highlights the most important recommendations to ultimately inform future research projects co-created between patients and other stakeholders. Patient summary: An important step in involving patients in the selection of outcomes for clinical trials, clinical audits, and real-world evidence is the development of a core outcome set (COS) that is relevant to all stakeholders. This report highlights the patient participation throughout our PIONEER COS development. Take Home Message: An important step in involving patients in the selection of outcomes for clinical trials, clinical audits, and real-world evidence is to develop a core outcome set (COS) that is relevant to all stakeholders. As part of the work of the PIONEER Consortium, we aim to highlight the patient participation throughout our PIONEER COS development.
GeneCaRNA: A Comprehensive Gene-centric Database of Human Non-coding RNAs in the GeneCards Suite
Barshir R., Fishilevich S., Iny-Stein T., Zelig O., Mazor Y., Guan-Golan Y., Safran M. & Lancet D. (2021) Journal of Molecular Biology. 433, 11, 166913.
Non-coding RNA (ncRNA) genes assume increasing biological importance, with growing associations with diseases. Many ncRNA sources are transcript-centric, but for non-coding variant analysis and disease decipherment it is essential to transform this information into a comprehensive set of genome-mapped ncRNA genes. We present GeneCaRNA, a new all-inclusive gene-centric ncRNA database within the GeneCards Suite. GeneCaRNA information is integrated from four community-backed data structures: the major transcript database RNAcentral with its 20 encompassed databases, and the ncRNA entries of three major gene resources HGNC, Ensembl and NCBI Gene. GeneCaRNA presents 219,587 ncRNA gene pages, a 7-fold increase from those available in our three gene mining sources. Each ncRNA gene has wide-ranging annotation, mined from >100 worldwide sources, providing a powerful GeneCards-leveraged search. The latter empowers VarElect, our disease-gene interpretation tool, allowing one to systematically decipher ncRNA variants. The combined power of GeneCaRNA with GeneHancer, our regulatory elements database, facilitates wide-ranging scrutiny of the non-coding terra incognita of gene networks and whole genome analyses.

2020

Rare Variant Burden Analysis within Enhancers Identifies CAV1 as an ALS Risk Gene
Cooper-Knock J., Zhang S., Yacovzada N. S., Eitan C., Hornstein E., Fishilevich S. & Lancet D. (2020) Cell Reports. 33, 9, 108456.
Amyotrophic lateral sclerosis (ALS) is an incurable neurodegenerative disease. CAV1 and CAV2 organize membrane lipid rafts (MLRs) important for cell signaling and neuronal survival, and overexpression of CAV1 ameliorates ALS phenotypes in vivo. Genome-wide association studies localize a large proportion of ALS risk variants within the non-coding genome, but further characterization has been limited by lack of appropriate tools. By designing and applying a pipeline to identify pathogenic genetic variation within enhancer elements responsible for regulating gene expression, we identify disease-associated variation within CAV1/CAV2 enhancers, which replicate in an independent cohort. Discovered enhancer mutations reduce CAV1/CAV2 expression and disrupt MLRs in patient-derived cells, and CRISPR-Cas9 perturbation proximate to a patient mutation is sufficient to reduce CAV1/CAV2 expression in neurons. Additional enrichment of ALS-associated mutations within CAV1 exons positions CAV1 as an ALS risk gene. We propose CAV1/CAV2 overexpression as a personalized medicine target for ALS.
Genome-wide association study identifies 16 genomic regions associated with circulating cytokines at birth
Wang Y., Nudel R., Benros M. E., Skogstrand K., Fishilevich S. & Lancet D. (2020) PLoS Genetics. 16, 11, e1009163.
Circulating inflammatory markers are essential to human health and disease, and they are often dysregulated or malfunctioning in cancers as well as in cardiovascular, metabolic, immunologic and neuropsychiatric disorders. However, the genetic contribution to the physiological variation of levels of circulating inflammatory markers is largely unknown. Here we report the results of a genome-wide genetic study of blood concentration of ten cytokines, including the hitherto unexplored calcium-binding protein (S100B). The study leverages a unique sample of neonatal blood spots from 9,459 Danish subjects from the iPSYCH initiative. We estimate the SNP-heritability of marker levels as ranging from essentially zero for Erythropoietin (EPO) up to 73% for S100B. We identify and replicate 16 associated genomic regions (p < 5 × 10⁻⁹), of which four are novel. We show that the associated variants map to enhancer elements, suggesting a possible transcriptional effect of genomic variants on the cytokine levels. The identification of the genetic architecture underlying the basic levels of cytokines is likely to prompt studies investigating the relationship between cytokines and complex disease. Our results also suggest that the genetic architecture of cytokines is stable from neonatal to adult life.
Author Correction: Introducing PIONEER: a project to harness big data in prostate cancer research (Nature Reviews Urology, (2020), 17, 6, (351-362), 10.1038/s41585-020-0324-x)
Omar M. I., Roobol M. J., Ribal M. J. et al. (2020) Nature Reviews Urology. 17, 8, p. 482
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Introducing PIONEER: a project to harness big data in prostate cancer research
Omar M. I., Roobol M. J., Ribal M. J., Abbott T. & Lancet D. (2020) Nature Reviews Urology. 17, 6, p. 351-361
Prostate Cancer Diagnosis and Treatment Enhancement Through the Power of Big Data in Europe (PIONEER) is a European network of excellence for big data in prostate cancer, consisting of 32 private and public stakeholders from 9 countries across Europe. Launched by the Innovative Medicines Initiative 2 and part of the Big Data for Better Outcomes Programme (BD4BO), the overarching goal of PIONEER is to provide high-quality evidence on prostate cancer management by unlocking the potential of big data. The project has identified critical evidence gaps in prostate cancer care, via a detailed prioritization exercise including all key stakeholders. By standardizing and integrating existing high-quality and multidisciplinary data sources from patients with prostate cancer across different stages of the disease, the resulting big data will be assembled into a single innovative data platform for research. Based on a unique set of methodologies, PIONEER aims to advance the field of prostate cancer care with a particular focus on improving prostate-cancer-related outcomes, health system efficiency by streamlining patient management, and the quality of health and social care delivered to all men with prostate cancer and their families worldwide. Author Correction: The originally published article contained errors in Figure 1 and did not reflect the current organization of the PIONEER consortium. The figure has been corrected in the HTML and PDF versions of the manuscript to reflect the correct organization of PIONEER.
A unified nomenclature for vertebrate olfactory receptors
Olender T., Jones T. E. M., Bruford E. & Lancet D. (2020) BMC Evolutionary Biology. 20, 1, 42.
Background - Olfactory receptors (ORs) are G protein-coupled receptors with a crucial role in odor detection. A typical mammalian genome harbors ~1000 OR genes and pseudogenes; however, different gene duplication/deletion events have occurred in each species, resulting in complex orthology relationships. While the human OR nomenclature is widely accepted and based on phylogenetic classification into 18 families and further into subfamilies, for other mammals different and multiple nomenclature systems are currently in use, thus concealing important evolutionary and functional insights. Results - Here, we describe the Mutual Maximum Similarity (MMS) algorithm, a systematic classifier for assigning a human-centric nomenclature to any OR gene based on inter-species hierarchical pairwise similarities. MMS was applied to the OR repertoires of seven mammals and zebrafish. Altogether, we assigned symbols to 10,249 ORs. This nomenclature is supported by both phylogenetic and synteny analyses. The availability of a unified nomenclature provides a framework for diverse studies, where textual symbol comparison allows immediate identification of potential ortholog groups as well as species-specific expansions/deletions; for example, Or52e5 and Or52e5b represent a rat-specific duplication of OR52E5. Another example is the complete absence of OR subfamily OR6Z among primate OR symbols. In other mammals, OR6Z members are located in one genomic cluster, suggesting a large deletion in the great ape lineage. An additional 14 mammalian OR subfamilies are missing from the primate genomes. While in chimpanzee 87% of the symbols were identical to human symbols, this number decreased to ~50% in dog and cow and to ~30% in rodents, reflecting the adaptive changes of the OR gene superfamily across diverse ecological niches.
Application of the proposed nomenclature to zebrafish revealed similarity to mammalian ORs that could not be detected from the current zebrafish olfactory receptor gene nomenclature. Conclusions - We have consolidated a unified standard nomenclature system for the vertebrate OR superfamily. The new nomenclature system will be applied to cow, horse, dog and chimpanzee by the Vertebrate Gene Nomenclature Committee and its implementation is currently under consideration by other relevant species-specific nomenclature committees.
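The idea of pairing genes by mutual maximum similarity can be illustrated with a minimal sketch. This is not the published MMS algorithm, whose hierarchical steps are not described in the abstract; it only shows the simpler notion of two genes being each other's best match. The function name, the gene labels (H1, R1, ...) and the similarity values are all hypothetical.

```python
from typing import Dict, List, Tuple


def mutual_best_pairs(sim: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return (human_gene, other_gene) pairs that are each other's best match.

    `sim` maps (human_gene, other_gene) to a pairwise similarity score.
    Illustrative only: a plain reciprocal-best-match rule, assumed here
    as a stand-in for the mutual-maximum idea.
    """
    best_for_human: Dict[str, Tuple[str, float]] = {}
    best_for_other: Dict[str, Tuple[str, float]] = {}
    for (human, other), score in sim.items():
        # Track the highest-scoring partner seen so far in each direction.
        if human not in best_for_human or score > best_for_human[human][1]:
            best_for_human[human] = (other, score)
        if other not in best_for_other or score > best_for_other[other][1]:
            best_for_other[other] = (human, score)
    # Keep a pair only when the preference is mutual.
    return sorted(
        (human, other)
        for human, (other, _) in best_for_human.items()
        if best_for_other[other][0] == human
    )


# Hypothetical similarity scores (e.g. percent identity), for illustration only.
toy_similarities = {
    ("H1", "R1"): 88.0,
    ("H1", "R2"): 85.0,
    ("H2", "R2"): 60.0,
    ("H2", "R3"): 72.0,
}
pairs = mutual_best_pairs(toy_similarities)
```

In this toy input, R2 is most similar to H1, but H1 prefers R1, so R2 is left unpaired. In a real nomenclature workflow such a gene might be treated as a species-specific duplicate, but how MMS handles that case is not specified here.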

2019

Genome analysis and knowledge-driven variant interpretation with TGex
Dahary D., Golan Y., Mazor Y. et_al. (2019) BMC Medical Genomics. 12, 1, 200. Abstract
Background: The clinical genetics revolution ushers in great opportunities, accompanied by significant challenges. The fundamental mission in clinical genetics is to analyze genomes, and to identify the most relevant genetic variations underlying a patient's phenotypes and symptoms. The adoption of Whole Genome Sequencing requires novel capacities for interpretation of non-coding variants. Results: We present TGex, the Translational Genomics expert, a novel genome variation analysis and interpretation platform, with remarkable exome analysis capacities and a pioneering approach of non-coding variants interpretation. TGex's main strength is combining state-of-the-art variant filtering with knowledge-driven analysis made possible by VarElect, our highly effective gene-phenotype interpretation tool. VarElect leverages the widely used GeneCards knowledgebase, which integrates information from >150 automatically-mined data sources. Access to such a comprehensive data compendium also facilitates TGex's broad variant annotation, supporting evidence exploration, and decision making. TGex has an interactive, user-friendly, and easy adaptive interface, ACMG compliance, and an automated reporting system. Beyond comprehensive whole exome sequence capabilities, TGex encompasses innovative non-coding variants interpretation, towards the goal of maximal exploitation of whole genome sequence analyses in the clinical genetics practice. This is enabled by GeneCards' recently developed GeneHancer, a novel integrative and fully annotated database of human enhancers and promoters. Examining use-cases from a variety of TGex users world-wide, we demonstrate its high diagnostic yields (42% for single exome and 50% for trios in 1500 rare genetic disease cases) and critical actionable genetic findings.
The platform's support for integration with EHR and LIMS through dedicated APIs facilitates automated retrieval of patient data for TGex's customizable reporting engine, establishing a rapid and cost-effective workflow for an entire range of clinical genetic testing, including rare disorders, cancer predisposition, tumor biopsies and health screening. Conclusions: TGex is an innovative tool for the annotation, analysis and prioritization of coding and non-coding genomic variants. It provides access to an extensive knowledgebase of genomic annotations, with intuitive and flexible configuration options, allows quick adaptation, and addresses various workflow requirements. It thus simplifies and accelerates variant interpretation in clinical genetics workflows, with remarkable diagnostic yield, as exemplified in the described use cases.
Twenty Years of "Lipid World": A Fertile Partnership with David Deamer
Lancet D., Segre D. & Kahana A. (2019) Life. 9, 4, 77. Abstract
"The Lipid World" was published in 2001, stemming from a highly effective collaboration with David Deamer during a sabbatical year 20 years ago at the Weizmann Institute of Science in Israel. The present review paper highlights the benefits of this scientific interaction and assesses the impact of the lipid world paper on the present understanding of the possible roles of amphiphiles and their assemblies in the origin of life. The lipid world is defined as a putative stage in the progression towards life's origin, during which diverse amphiphiles or other spontaneously aggregating small molecules could have concurrently played multiple key roles, including compartment formation, the appearance of mutually catalytic networks, molecular information processing, and the rise of collective self-reproduction and compositional inheritance. This review brings back into a broader perspective some key points originally made in the lipid world paper, stressing the distinction between the widely accepted role of lipids in forming compartments and their expanded capacities as delineated above. In the light of recent advancements, we discussed the topical relevance of the lipid worldview as an alternative to broadly accepted scenarios, and the need for further experimental and computer-based validation of the feasibility and implications of the individual attributes of this point of view. Finally, we point to possible avenues for exploring transition paths from small molecule-based noncovalent structures to more complex biopolymer-containing proto-cellular systems.
Noncoding deletions reveal a gene that is critical for intestinal function
Oz-Levi D., Olender T., Bar-Joseph I. et_al. (2019) Nature. 571, 7763, p. 107-111 Abstract
Large-scale genome sequencing is poised to provide a substantial increase in the rate of discovery of disease-associated mutations, but the functional interpretation of such mutations remains challenging. Here we show that deletions of a sequence on human chromosome 16 that we term the intestine-critical region (ICR) cause intractable congenital diarrhoea in infants(1,2). Reporter assays in transgenic mice show that the ICR contains a regulatory sequence that activates transcription during the development of the gastrointestinal system. Targeted deletion of the ICR in mice caused symptoms that recapitulated the human condition. Transcriptome analysis revealed that an unannotated open reading frame (Percc1) flanks the regulatory sequence, and the expression of this gene was lost in the developing gut of mice that lacked the ICR. Percc1-knockout mice displayed phenotypes similar to those observed upon ICR deletion in mice and patients, whereas an ICR-driven Percc1 transgene was sufficient to rescue the phenotypes found in mice that lacked the ICR. Together, our results identify a gene that is critical for intestinal function and underscore the need for targeted in vivo studies to interpret the growing number of clinical genetic findings that do not affect known protein-coding genes.
Enceladus: First Observed Primordial Soup Could Arbitrate Origin-of-Life Debate
Kahana A., Schmitt-Kopplin P. & Lancet D. (2019) Astrobiology. 19, 10, p. 1263-1278 Abstract
A recent breakthrough publication has reported complex organic molecules in the plumes emanating from the subglacial water ocean of Saturn's moon Enceladus (Postberg et al., 2018, Nature 558:564-568). Based on detailed chemical scrutiny, the authors invoke primordial or endogenously synthesized carbon-rich monomers (
Protobiotic Systems Chemistry Analyzed by Molecular Dynamics
Kahana A. & Lancet D. (2019) Life. 9, 2, 38. Abstract
Systems chemistry has been a key component of origin of life research, invoking models of life's inception based on evolving molecular networks. One such model is the graded autocatalysis replication domain (GARD) formalism embodied in a lipid world scenario, which offers rigorous computer simulation based on defined chemical kinetics equations. GARD suggests that the first pre-RNA life-like entities could have been homeostatically-growing assemblies of amphiphiles, undergoing compositional replication and mutations, as well as rudimentary selection and evolution. Recent progress in molecular dynamics has provided an experimental tool to study complex biological phenomena such as protein folding, ligand-receptor interactions, and micellar formation, growth, and fission. The detailed molecular definition of GARD and its inter-molecular catalytic interactions make it highly compatible with molecular dynamics analyses. We present a roadmap for simulating GARD's kinetic and thermodynamic behavior using various molecular dynamics methodologies. We review different approaches for testing the validity of the GARD model by following micellar accretion and fission events and examining compositional changes over time. Near-future computational advances could provide empirical delineation for further system complexification, from simple compositional non-covalent assemblies towards more life-like protocellular entities with covalent chemistry that underlies metabolism and genetic encoding.

2018

Systems protobiology: origin of life in lipid catalytic networks
Lancet D., Zidovetzki R. & Markovitch O. (2018) Journal of the Royal Society Interface. 15, 144, 20180159. Abstract
Life is that which replicates and evolves, but there is no consensus on how life emerged. We advocate a systems protobiology view, whereby the first replicators were assemblies of spontaneously accreting, heterogeneous and mostly non-canonical amphiphiles. This view is substantiated by rigorous chemical kinetics simulations of the graded autocatalysis replication domain (GARD) model, based on the notion that the replication or reproduction of compositional information predated that of sequence information. GARD reveals the emergence of privileged non-equilibrium assemblies (composomes), which portray catalysis-based homeostatic (concentration-preserving) growth. Such a process, along with occasional assembly fission, embodies cell-like reproduction. GARD pre-RNA evolution is evidenced in the selection of different composomes within a sparse fitness landscape, in response to environmental chemical changes. These observations refute claims that GARD assemblies (or other mutually catalytic networks in the metabolism first scenario) cannot evolve. Composomes represent both a genotype and a selectable phenotype, anteceding present-day biology in which the two are mostly separated. Detailed GARD analyses show attractor-like transitions from random assemblies to self-organized composomes, with negative entropy change, thus establishing composomes as dissipative systems, hallmarks of life. We show a preliminary new version of our model, metabolic GARD (M-GARD), in which lipid covalent modifications are orchestrated by non-enzymatic lipid catalysts, themselves compositionally reproduced. M-GARD fills the gap of the lack of true metabolism in basic GARD, and is rewardingly supported by a published experimental instance of a lipid-based mutually catalytic network. Anticipating near-future far-reaching progress of molecular dynamics, M-GARD is slated to quantitatively depict elaborate protocells, with orchestrated reproduction of both lipid bilayer and lumenal content.
Finally, a GARD analysis in a whole-planet context offers the potential for estimating the probability of life's emergence. The invigorated GARD scrutiny presented in this review enhances the validity of autocatalytic sets as a bona fide early evolution scenario and provides essential infrastructure for a paradigm shift towards a systems protobiology view of life's origin.
Replication of Simulated Prebiotic Amphiphilic Vesicles in a Finite Environment Exhibits Complex Behavior That Includes High Progeny Variability and Competition
Armstrong D. L., Lancet D. & Zidovetzki R. (2018) Astrobiology. 18, 4, p. 419-430 Abstract
We studied the simulated replication and growth of prebiotic vesicles composed of 140 phospholipids and cholesterol using our R-GARD (Real Graded Autocatalysis Replication Domain) formalism that utilizes currently extant lipids that have known rate constants of lipid-vesicle interactions from published experimental data. R-GARD normally modifies kinetic parameters of lipid-vesicle interactions based on vesicle composition and properties. Our original R-GARD model tracked the growth and division of one vesicle at a time in an environment with unlimited lipids at a constant concentration. We explore here a modified model where vesicles compete for a finite supply of lipids. We observed that vesicles exhibit complex behavior including initial fast unrestricted growth, followed by intervesicle competition for diminishing resources, then a second growth burst driven by better-adapted vesicles, and ending with a final steady state. Furthermore, in simulations without kinetic parameter modifications (invariant kinetics), the initial replication was an order of magnitude slower, and vesicles' composition variability at the final steady state was much lower. The complex kinetic behavior was not observed either in the previously published R-GARD simulations or in additional simulations presented here with only one lipid component. This demonstrates that both a finite environment (inducing selection) and multiple components (providing variation for selection to act upon) are crucial for portraying evolution-like behavior. Such properties can improve survival in a changing environment by increasing the ability of early protocellular entities to respond to rapid environmental fluctuations likely present during abiogenesis both on Earth and possibly on other planets. This in silico simulation predicts that a relatively simple in vitro chemical system containing only lipid molecules might exhibit properties that are relevant to prebiotic processes.
Mutations in AIFM1 cause an X-linked childhood cerebellar ataxia partially responsive to riboflavin
Heimer G., Eyal E., Zhu X. et_al. (2018) European Journal of Paediatric Neurology. 22, 1, p. 93-101 Abstract
Background: AIFM1 encodes a mitochondrial flavoprotein with a dual role (NADH oxidoreductase and regulator of apoptosis), which uses riboflavin as a cofactor. Mutations in the X-linked AIFM1 were reported in relation to two main phenotypes: a severe infantile mitochondrial encephalomyopathy and an early-onset axonal sensorimotor neuropathy with hearing loss. In this paper we report two unrelated males harboring AIFM1 mutations (one of which is novel) who display distinct phenotypes including progressive ataxia which partially improved with riboflavin treatment. Methods: For both patients trio whole exome sequencing was performed. Validation and segregation were performed with Sanger sequencing. Following the diagnosis, patients were treated with up to 200 mg riboflavin/day for 12 months. Ataxia was assessed by the ICARS scale at baseline, and 6 and 12 months following treatment. Results: Patient 1 presented at the age of 5 years with auditory neuropathy, followed by progressive ataxia, vermian atrophy and axonal neuropathy. Patient 2 presented at the age of 4.5 years with severe limb and palatal myoclonus, followed by ataxia, cerebellar atrophy, ophthalmoplegia, sensorineural hearing loss, hyporeflexia and cardiomyopathy. Two deleterious missense mutations were found in the AIFM1 gene: the p.Met340Thr mutation located in the FAD-dependent oxidoreductase domain and the novel p.Thr141Ile mutation located in a highly conserved DNA binding motif. The ataxia score decreased by 39% in patient 1 and 20% in patient 2 following 12 months of treatment. Conclusion: AIFM1 mutations cause childhood cerebellar ataxia, which may be partially treatable in some patients with high dose riboflavin.

2017

Next-generation sequencing of patients with congenital anosmia
Alkelai A., Olender T., Dode C. et_al. (2017) European Journal of Human Genetics. 25, 12, p. 1377-1387 Abstract
We performed whole exome or genome sequencing in eight multiply affected families with ostensibly isolated congenital anosmia. Hypothesis-free analyses based on the assumption of fully penetrant recessive/dominant/X-linked models obtained no strong single candidate variant in any of these families. In total, these eight families showed 548 rare segregating variants that were predicted to be damaging, in 510 genes. Three Kallmann syndrome genes (FGFR1, SEMA3A, and CHD7) were identified. We performed permutation-based analysis to test for overall enrichment of these 510 genes carrying these 548 variants with genes mutated in Kallmann syndrome and with a control set of genes mutated in hypogonadotrophic hypogonadism without anosmia. The variants were found to be enriched for Kallmann syndrome genes (3 observed vs. 0.398 expected, p = 0.007), but not for the second set of genes. Among these three variants, two have been already reported in genes related to syndromic anosmia (FGFR1 (p.(R250W)), CHD7 (p.(L2806V))) and one was novel (SEMA3A (p.(T717I))). To replicate these findings, we performed targeted sequencing of 16 genes involved in Kallmann syndrome and hypogonadotrophic hypogonadism in 29 additional families, mostly singletons. This yielded an additional 6 variants in 5 Kallmann syndrome genes (PROKR2, SEMA3A, CHD7, PROK2, ANOS1), two of them already reported to cause Kallmann syndrome. In all, our study suggests involvement of 6 syndromic Kallmann genes in isolated anosmia. Further, we report a yet unreported appearance of di-genic inheritance in a family with congenital isolated anosmia. These results are consistent with a complex molecular basis of congenital anosmia.
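The permutation-based enrichment test mentioned in this abstract (3 observed vs. 0.398 expected) can be sketched in a simplified form. The sketch below assumes uniform random sampling of gene sets from a background list; the study's actual permutation scheme is not described in the abstract and may differ (for example, by accounting for gene length). All names and the toy gene sets are hypothetical.

```python
import random
from typing import Sequence, Set, Tuple


def permutation_enrichment(
    background: Sequence[str],
    candidates: Set[str],
    target: Set[str],
    n_perm: int = 2000,
    seed: int = 0,
) -> Tuple[int, float, float]:
    """Estimate enrichment of `candidates` for genes in `target`.

    Returns (observed overlap, mean overlap under permutation, p-value).
    Assumption: null gene sets are drawn uniformly from `background`.
    """
    rng = random.Random(seed)
    observed = len(candidates & target)
    set_size = len(candidates)
    at_least_as_extreme = 0
    total_overlap = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size as the candidate set.
        draw = rng.sample(background, set_size)
        overlap = sum(1 for gene in draw if gene in target)
        total_overlap += overlap
        if overlap >= observed:
            at_least_as_extreme += 1
    expected = total_overlap / n_perm
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, expected, p_value


# Hypothetical toy data: 1000 background genes, 10 target genes,
# and a 50-gene candidate set containing 3 of the targets.
toy_background = [f"gene{i}" for i in range(1000)]
toy_target = set(toy_background[:10])
toy_candidates = set(toy_background[:3]) | set(toy_background[500:547])
observed, expected, p_value = permutation_enrichment(
    toy_background, toy_candidates, toy_target
)
```

With uniform sampling, the expected overlap is close to (candidate set size × target set size) / background size, which is 0.5 for the toy data above.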
Rational confederation of genes and diseases: NGS interpretation via GeneCards, MalaCards and VarElect
Rappaport N., Fishilevich S., Nudel R. et_al. (2017) BioMedical Engineering Online. 16, 72. Abstract
Background: A key challenge in the realm of human disease research is next generation sequencing (NGS) interpretation, whereby identified filtered variant-harboring genes are associated with a patient's disease phenotypes. This necessitates bioinformatics tools linked to comprehensive knowledgebases. The GeneCards suite databases, which include GeneCards (human genes), MalaCards (human diseases) and PathCards (human pathways) together with additional tools, are presented with the focus on MalaCards utility for NGS interpretation as well as for large scale bioinformatic analyses. Results: VarElect, our NGS interpretation tool, leverages the broad information in the GeneCards suite databases. MalaCards algorithms unify disease-related terms and annotations from 69 sources. Further, MalaCards defines hierarchical relatedness: aliases, disease families, a related diseases network, categories and ontological classifications. GeneCards and MalaCards delineate and share a multi-tiered, scored gene-disease network, with stringency levels, including the definition of elite status (high-quality gene-disease pairs coming from manually curated trustworthy sources), which includes 4500 genes for 8000 diseases. This unique resource is key to NGS interpretation by VarElect. VarElect, a comprehensive search tool that helps infer both direct and indirect links between genes and user-supplied disease/phenotype terms, is robustly strengthened by the information found in MalaCards. The indirect mode benefits from GeneCards' diverse gene-to-gene relationships, including SuperPaths (integrated biological pathways from 12 information sources). We are currently adding an important information layer in the form of "disease SuperPaths", generated from the gene-disease matrix by an algorithm similar to that previously employed for biological pathway unification. This allows the discovery of novel gene-disease and disease-disease relationships.
The advent of whole genome sequencing necessitates capacities to go beyond protein coding genes. GeneCards is highly useful in this respect, as it also addresses 101,976 non-protein-coding RNA genes. In a more recent development, we are currently adding an inclusive map of regulatory elements and their inferred target genes, generated by integration from 4 resources. Conclusions: MalaCards provides a rich big-data scaffold for in silico biomedical discovery within the gene-disease universe. VarElect, which depends significantly on both GeneCards and MalaCards power, is a potent tool for supporting the interpretation of wet-lab experiments, notably NGS analyses of disease. The GeneCards suite has thus transcended its 2-decade role in biomedical research, maturing into a key player in clinical investigation.
GeneHancer: Genome-wide integration of enhancers and target genes in GeneCards
Fishilevich S., Nudel R., Rappaport N. et_al. (2017) Database-The Journal Of Biological Databases And Curation. 2017, bax028. Abstract
A major challenge in understanding gene regulation is the unequivocal identification of enhancer elements and uncovering their connections to genes. We present GeneHancer, a novel database of human enhancers and their inferred target genes, in the framework of GeneCards. First, we integrated a total of 434 000 reported enhancers from four different genome-wide databases: the Encyclopedia of DNA Elements (ENCODE), the Ensembl regulatory build, the functional annotation of the mammalian genome (FANTOM) project and the VISTA Enhancer Browser. Employing an integration algorithm that aims to remove redundancy, GeneHancer portrays 285 000 integrated candidate enhancers (covering 12.4% of the genome), 94 000 of which are derived from more than one source, and each assigned an annotation-derived confidence score. GeneHancer subsequently links enhancers to genes, using: tissue co-expression correlation between genes and enhancer RNAs, as well as enhancer-targeted transcription factor genes; expression quantitative trait loci for variants within enhancers; and capture Hi-C, a promoter-specific genome conformation assay. The individual scores based on each of these four methods, along with gene-enhancer genomic distances, form the basis for GeneHancer's combinatorial likelihood-based scores for enhancer-gene pairing. Finally, we define elite enhancer-gene relations reflecting both a high-likelihood enhancer definition and a strong enhancer-gene association. GeneHancer predictions are fully integrated in the widely used GeneCards Suite, whereby candidate enhancers and their annotations are displayed on every relevant GeneCard. This assists in the mapping of non-coding variants to enhancers, and via the linked genes, forms a basis for variant-phenotype interpretation of whole-genome sequences in health and disease.
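One common way to remove redundancy when integrating genomic elements from several sources is to merge overlapping intervals and record which sources contributed to each merged element. The sketch below shows that generic technique only; it is an assumption for illustration and not GeneHancer's integration algorithm, which the abstract does not specify. The function name and coordinates are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end, source database)
Interval = Tuple[str, int, int, str]
# (chromosome, start, end, sorted tuple of contributing sources)
Merged = Tuple[str, int, int, Tuple[str, ...]]


def merge_enhancers(intervals: List[Interval]) -> List[Merged]:
    """Merge overlapping or touching intervals on the same chromosome.

    Generic interval merging, assumed here as a stand-in for a
    redundancy-removing integration step.
    """
    merged: List[list] = []
    # Sorting tuples orders by chromosome name, then start, then end.
    for chrom, start, end, source in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous element: extend it.
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3].add(source)
        else:
            merged.append([chrom, start, end, {source}])
    return [(c, s, e, tuple(sorted(src))) for c, s, e, src in merged]


# Hypothetical coordinates, for illustration only.
toy_enhancers = [
    ("chr1", 100, 200, "ENCODE"),
    ("chr1", 150, 300, "FANTOM"),
    ("chr1", 400, 500, "VISTA"),
    ("chr2", 100, 200, "Ensembl"),
]
integrated = merge_enhancers(toy_enhancers)
# Elements supported by more than one source.
multi_source = [e for e in integrated if len(e[3]) > 1]
```

Counting elements with more than one contributing source mirrors the kind of summary reported in the abstract (candidate enhancers derived from more than one source).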
MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search
Rappaport N., Twik M., Plaschkes I. et_al. (2017) Nucleic Acids Research. 45, D1, p. D877-D887 Abstract
The MalaCards human disease database (http://www.malacards.org/) is an integrated compendium of annotated diseases mined from 68 data sources. MalaCards has a web card for each of ~20 000 disease entries, in six global categories. It portrays a broad array of annotation topics in 15 sections, including Summaries, Symptoms, Anatomical Context, Drugs, Genetic Tests, Variations and Publications. The Aliases and Classifications section reflects an algorithm for disease name integration across often-conflicting sources, providing effective annotation consolidation. A central feature is a balanced Genes section, with scores reflecting the strength of disease-gene associations. This is accompanied by other gene-related disease information such as pathways, mouse phenotypes and GO-terms, stemming from MalaCards' affiliation with the GeneCards Suite of databases. MalaCards' capacity to inter-link information from complementary sources, along with its elaborate search function, relational database infrastructure and convenient data dumps, allows it to tackle its rich disease annotation landscape, and facilitates systems analyses and genome sequence interpretation. MalaCards adopts a 'flat' disease-card approach, but each card is mapped to popular hierarchical ontologies (e.g. International Classification of Diseases, Human Phenotype Ontology and Unified Medical Language System) and also contains information about multi-level relations among diseases, thereby providing an optimal tool for disease representation and scrutiny.

2016

Integrated identification of disease-gene links and their utility in next-generation sequencing interpretation
Rappaport N., Plaschkes I., Fishilevich S., Twik M., Stein T. I., Safran M., Nudel R., Oz-Levi D. & Lancet D. (2016) ACM-BCB 2016 - 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. p. 463-464 Abstract
The study of human diseases is at the core of present-day biological research. It is an interdisciplinary effort encompassing genomics, bioinformatics, systems biology, and systems medicine. Currently, many efforts are being made to elucidate the genetic underpinnings of human diseases. A consequence thereof is that many different sources use different nomenclatures, definitions, and classifications. Furthermore, the identification of gene-disease links, in addition to being challenging in its own right, is also affected by this lack of convention. We addressed both of these issues when creating MalaCards (www.malacards.org), an integrated and unified database of human diseases and their annotations, which capitalizes on information from the GeneCards database (www.genecards.org) [1-2]. GeneCards has annotations relevant to various characteristics of genes, which can be used as a discovery platform for identifying gene-disease links [3-4]. At the heart of MalaCards is a consolidated gene-disease matrix based on nine sources, some manually curated and others text-mined. A scoring algorithm prioritizes the list of disease-associated genes based on the strength of the evidence from each source. Figure 1 shows the frequencies of gene-disease links across the GeneCards gene categories.
A role for TENM1 mutations in congenital general anosmia
Alkelai A., Olender T., Haffner-Krausz R. et_al. (2016) Clinical Genetics. 90, 3, p. 211-219 Abstract
Congenital general anosmia (CGA) is a neurological disorder entailing a complete innate inability to sense odors. While the mechanisms underlying vertebrate olfaction have been studied in detail, there are still gaps in our understanding of the molecular genetic basis of innate olfactory disorders. Applying whole-exome sequencing to a family multiply affected with CGA, we identified three members with a rare X-linked missense mutation in the TENM1 (teneurin 1) gene (ENST00000422452:c.C4829T). In Drosophila melanogaster, TENM1 functions in synaptic-partner-matching between axons of olfactory sensory neurons and target projection neurons and is involved in synapse organization in the olfactory system. We used the CRISPR-Cas9 system to generate a Tenm1 disrupted mouse model. Tenm1(-/-) and point-mutated Tenm1(A/A) adult mice were shown to have an altered ability to locate a buried food pellet. Tenm1(A/A) mice also displayed an altered ability to sense aversive odors. The results of our study, which describes a new Tenm1 mouse, agree with the hypothesis that TENM1 has a role in olfaction. However, additional studies should be done in larger CGA cohorts, to provide statistical evidence that loss-of-function mutations in TENM1 can solely cause the disease in our and other CGA cases.
VarElect: The phenotype-based variation prioritizer of the GeneCards Suite
Stelzer G., Plaschkes I., Oz Levi L. D. et_al. (2016) BMC Genomics. 17, 444. Abstract
Background: Next generation sequencing (NGS) provides a key technology for deciphering the genetic underpinnings of human diseases. Typical NGS analyses of a patient depict tens of thousands of non-reference coding variants, but only one or very few are expected to be significant for the relevant disorder. In a filtering stage, one employs family segregation, rarity in the population, predicted protein impact and evolutionary conservation as a means for shortening the variation list. However, narrowing down further towards culprit disease genes usually entails laborious seeking of gene-phenotype relationships, consulting numerous separate databases. Thus, a major challenge is to transition from the few hundred shortlisted genes to the most viable disease-causing candidates. Results: We describe a novel tool, VarElect (http://ve.genecards.org), a comprehensive phenotype-dependent variant/gene prioritizer, based on the widely-used GeneCards, which helps rapidly identify causal mutations with extensive evidence. The GeneCards suite offers an effective and speedy alternative, whereby >120 gene-centric automatically-mined data sources are jointly available for the task. VarElect cashes in on this wealth of information, as well as on GeneCards' powerful free-text Boolean search and scoring capabilities, proficiently matching variant-containing genes to submitted disease/symptom keywords. The tool also leverages the rich disease and pathway information of MalaCards, the human disease database, and PathCards, the unified pathway (SuperPaths) database, both within the GeneCards Suite. The VarElect algorithm infers direct as well as indirect links between genes and phenotypes, the latter benefitting from GeneCards' diverse gene-to-gene data links in GenesLikeMe. Finally, our tool offers an extensive gene-phenotype evidence portrayal ("MiniCards") and hyperlinks to the parent databases.
Conclusions: We demonstrate that VarElect compares favorably with several often-used NGS phenotyping tools, thus providing a robust facility for ranking genes, pointing out their likelihood to be related to a patient's disease. VarElect's capacity to automatically process numerous NGS cases, either in stand-alone format or in VCF-analyzer mode (TGex and VarAnnot), is indispensable for emerging clinical projects that involve thousands of whole exome/genome NGS analyses.
Identification of a Functional Risk Variant for Pemphigus Vulgaris in the ST18 Gene
Vodo D., Sarig O., Geller S. et_al. (2016) PLoS Genetics. 12, 5, e1006008. Abstract
Pemphigus vulgaris (PV) is a life-threatening autoimmune mucocutaneous blistering disease caused by disruption of intercellular adhesion due to auto-antibodies directed against epithelial components. Treatment is limited to immunosuppressive agents, which are associated with serious adverse effects. The propensity to develop the disease is in part genetically determined. We therefore reasoned that the delineation of PV genetic basis may point to novel therapeutic strategies. Using a genome-wide association approach, we recently found that genetic variants in the vicinity of the ST18 gene confer a significant risk for the disease. Here, using targeted deep sequencing, we identified a PV-associated variant residing within the ST18 promoter region (p
GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data
Ben-Ari Fuchs S., Lieder I., Stelzer G. et_al. (2016) OMICS A Journal of Integrative Biology. 20, 3, p. 139-151 Abstract
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs LifeMap Sciences' GeneCards suite, including GeneCards®, the human gene database; MalaCards, the human diseases database; and PathCards, the biological pathways database. Expression-based analysis in GeneAnalytics relies on LifeMap Discovery®, the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling an advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items.
Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
The human olfactory transcriptome
Olender T., Keydar I., Pinto J. M. et_al. (2016) BMC Genomics. 17, 1, 619. Abstract
Background: Olfaction is a versatile sensory mechanism for detecting thousands of volatile odorants. Although the molecular basis of odorant signaling is relatively well understood, considerable gaps remain in the complete charting of all relevant gene products. To address this challenge, we applied RNAseq to four well-characterized human olfactory epithelial samples and compared the results to novel and published mouse olfactory epithelium as well as 16 human control tissues. Results: We identified 194 non-olfactory receptor (OR) genes that are overexpressed in human olfactory tissues vs. controls. The highest overexpression is seen for lipocalins and bactericidal/permeability-increasing (BPI)-fold proteins, which in other species include secreted odorant carriers. Mouse-human discordance in orthologous lipocalin expression suggests different mammalian evolutionary paths in this family. Of the overexpressed genes, 36 have documented olfactory function, while for 158 there is little or no previous such functional evidence. The latter group includes GPCRs, neuropeptides, solute carriers, transcription factors and biotransformation enzymes. Many of them may be indirectly implicated in sensory function, and ~70% are also overexpressed in mouse olfactory epithelium, corroborating their olfactory role. Nearly 90% of the intact OR repertoire and ~60% of the OR pseudogenes are expressed in the olfactory epithelium, with the latter showing a 3-fold lower expression. OR transcription levels show a 1000-fold inter-paralog variation, as well as significant inter-individual differences. We assembled 160 transcripts representing 100 intact OR genes. These include 1-4 short 5' non-coding exons with considerable alternative splicing and long last exons that contain the coding region and 3' untranslated region of highly variable length.
Notably, we identified 10 ORs with an intact open reading frame but with seemingly non-functional transcripts, suggesting a yet unreported OR pseudogenization mechanism. Analysis of the OR upstream regions indicated an enrichment of the homeobox family transcription factor binding sites and a consensus localization of a specific transcription factor binding site subfamily (Olf/EBF). Conclusions: We provide an overview of expression levels of ORs and auxiliary genes in human olfactory epithelium. This forms a transcriptomic view of the entire OR repertoire, and reveals a large number of over-expressed uncharacterized human non-receptor genes, providing a platform for future discovery.
TECPR2 mutations cause a new subtype of familial dysautonomia like hereditary sensory autonomic neuropathy with intellectual disability
Heimer G., Oz-Levi D., Eyal E. et_al. (2016) European Journal of Paediatric Neurology. 20, 1, p. 69-79 Abstract
Background: TECPR2 was first described as a disease-causing gene when the c.3416delT frameshift mutation was found in five Jewish Bukharian patients with similar features. It was suggested to constitute a new subtype of complex hereditary spastic paraparesis (SPG49). Results: We report here 3 additional patients from unrelated non-Bukharian families, harboring two novel mutations (c.1319delT, c.C566T) in this gene. Accumulating clinical data clarifies that in addition to intellectual disability and evolving spasticity the main disabling feature of this unique disorder is autonomic-sensory neuropathy accompanied by chronic respiratory disease and paroxysmal autonomic events. Conclusion: We suggest that the disease should therefore be classified as a new subtype of hereditary sensory-autonomic neuropathy. The discovery of additional mutations in non-Bukharian patients implies that this disease might be more common than previously appreciated and should therefore be considered in undiagnosed cases of intellectual disability with autonomic features and respiratory symptoms regardless of demographic origin.
Genic insights from integrated human proteomics in GeneCards
Fishilevich S., Zimmerman S., Kohn A., Iny Stein S. T., Olender T., Kolker E., Safran M. & Lancet D. (2016) Database : the journal of biological databases and curation. 2016, Abstract
GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publicly available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information.
The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome.
ORDB, HORDE, ODORactor and other on-line knowledge resources of olfactory receptor-odorant interactions
Marenco L., Wang R., McDougal R. et_al. (2016) Database-The Journal Of Biological Databases And Curation. 2016, baw132. Abstract
We present here an exploration of the evolution of three well-established, web-based resources dedicated to the dissemination of information related to olfactory receptors (ORs) and their functional ligands, odorants. These resources are: the Olfactory Receptor Database (ORDB), the Human Olfactory Data Explorer (HORDE) and ODORactor. ORDB is a repository of genomic and proteomic information related to ORs and other chemosensory receptors, such as taste and pheromone receptors. Three companion databases closely integrated with ORDB are OdorDB, ORModelDB and OdorMapDB; these resources are part of the SenseLab suite of databases (http://senselab.med.yale.edu). HORDE (http://genome.weizmann.ac.il/horde/) is a semi-automatically populated database of the OR repertoires of human and several mammals. ODORactor (http://mdl.shsmu.edu.cn/ODORactor/) provides information related to OR-odorant interactions from the perspective of the odorant. All three resources are connected to each other via web-links.
The GeneCards suite: From gene data mining to disease genome sequence analyses
Stelzer G., Rosen N., Plaschkes I. et_al. (2016) Current Protocols in Bioinformatics. 2016, p. 1.30.1-1.30.33 Abstract
GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses.

2015

Whole-exome sequencing in undiagnosed genetic diseases: Interpreting 119 trios
Zhu X., Petrovski S., Xie P. et_al. (2015) Genetics in Medicine. 17, 10, p. 774-781 Abstract
Purpose: Despite the recognized clinical value of exome-based diagnostics, methods for comprehensive genomic interpretation remain immature. Diagnoses are based on known or presumed pathogenic variants in genes already associated with a similar phenotype. Here, we extend this paradigm by evaluating novel bioinformatics approaches to aid identification of new gene-disease associations. Methods: We analyzed 119 trios to identify both diagnostic genotypes in known genes and candidate genotypes in novel genes. We considered qualifying genotypes based on their population frequency and in silico predicted effects; we also characterized the patterns of genotypes enriched among this collection of patients. Results: We obtained a genetic diagnosis for 29 (24%) of our patients. We showed that patients carried an excess of damaging de novo mutations in intolerant genes, particularly those shown to be essential in mice (P = 3.4 × 10⁻⁸). This enrichment is only partially explained by mutations found in known disease-causing genes. Conclusion: This work indicates that the application of appropriate bioinformatics analyses to clinical sequence data can also help implicate novel disease genes and suggest expanded phenotypes for known disease genes. These analyses further suggest that some cases resolved by whole-exome sequencing will have direct therapeutic implications.
Exome sequencing as a differential diagnosis tool: Resolving mild trichohepatoenteric syndrome
Oz Levi L. D., Weiss B., Lahad A. et_al. (2015) Clinical Genetics. 87, 6, p. 602-603 Abstract
PathCards: Multi-source consolidation of human biological pathways
Belinky F., Nativ N., Stelzer G., Zimmerman S., Stein T. I., Safran M. & Lancet D. (2015) Database : the journal of biological databases and curation. 2015, bav006. Abstract
The study of biological pathways is key to a large number of systems analyses. However, many relevant tools consider a limited number of pathway sources, missing out on many genes and gene-to-gene connections. Simply pooling several pathway sources would result in redundancy and the lack of systematic pathway interrelations. To address this, we exercised a combination of hierarchical clustering and nearest neighbor graph representation, with judiciously selected cutoff values, thereby consolidating 3215 human pathways from 12 sources into a set of 1073 SuperPaths. Our unification algorithm finds a balance between reducing redundancy and optimizing the level of pathway-related informativeness for individual genes. We show a substantial enhancement of the SuperPaths' capacity to infer gene-to-gene relationships when compared with individual pathway sources, separately or taken together. Further, we demonstrate that the chosen 12 sources entail nearly exhaustive gene coverage. The computed SuperPaths are presented in a new online database, PathCards, showing each SuperPath, its constituent network of pathways, and its contained genes. This provides researchers with a rich, searchable systems analysis resource.
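As a rough illustration of the consolidation idea in the abstract above, the following minimal sketch groups pathways whose gene sets overlap strongly. It is a simplification and not the PathCards algorithm: it swaps in a plain Jaccard similarity with a single cutoff and single-linkage grouping, whereas the abstract describes hierarchical clustering combined with a nearest neighbor graph. The pathway names, gene memberships, cutoff value and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_pathways(pathways, cutoff):
    """Group pathways whose gene-set similarity reaches the cutoff.

    pathways: dict mapping a pathway name to a set of gene symbols.
    Returns a sorted list of groups, each a sorted list of pathway names.
    Single-linkage grouping is an assumption made for this sketch.
    """
    names = sorted(pathways)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(pathways[a], pathways[b]) >= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical toy input: two near-duplicate pathways from different sources.
toy_pathways = {
    "src1:glycolysis": {"HK1", "PFKM", "PKM", "GAPDH"},
    "src2:glycolysis": {"HK1", "PFKM", "PKM", "ALDOA"},
    "src1:apoptosis": {"CASP3", "BAX", "BCL2"},
}
toy_groups = group_pathways(toy_pathways, cutoff=0.5)
```

With this toy input the two glycolysis entries share three of five genes and fall into one group, while the apoptosis entry stays separate. Raising the cutoff reduces merging, which is the redundancy versus informativeness trade-off the abstract refers to.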

2014

Quasispecies in population of compositional assemblies
Gross R., Fouxon I., Lancet D. & Markovitch O. (2014) BMC Evolutionary Biology. 14, 265. Abstract
Background: The quasispecies model refers to information carriers that undergo self-replication with errors. A quasispecies is a steady-state population of biopolymer sequence variants generated by mutations from a master sequence. A quasispecies error threshold is a minimal replication accuracy below which the population structure breaks down. Theory and experimentation of this model often refer to biopolymers, e.g. RNA molecules or viral genomes, while its prebiotic context is often associated with an RNA world scenario. Here, we study the possibility that compositional entities which code for compositional information, intrinsically different from biopolymers coding for sequential information, could show quasispecies dynamics. Results: We employed a chemistry-based model, graded autocatalysis replication domain (GARD), which simulates the network dynamics within compositional molecular assemblies. In GARD, a compotype represents a population of similar assemblies that constitute a quasi-stationary state in compositional space. A compotype's center-of-mass is found to be analogous to a master sequence for a sequential quasispecies. Using single-cycle GARD dynamics, we measured the quasispecies transition matrix (Q) for the probabilities of transition from one center-of-mass Euclidean distance to another. Similarly, the quasispecies' growth rate vector (A) was obtained. This allowed computing a steady state distribution of distances to the center of mass, as derived from the quasispecies equation. In parallel, a steady state distribution was obtained via the GARD equation kinetics. Rewardingly, a significant correlation was observed between the distributions obtained by these two methods. This was only seen for distances to the compotype center-of-mass, and not to randomly selected compositions. A similar correspondence was found when comparing the quasispecies time dependent dynamics towards steady state. 
Further, changing the error rate by modifying the basal assembly joining rate of GARD kinetics was found to display an error catastrophe, similar to the standard quasispecies model. Additional augmentation of compositional mutations leads to the complete disappearance of the master-like composition. Conclusions: Our results show that compositional assemblies, as simulated by the GARD formalism, portray significant attributes of quasispecies dynamics. This expands the applicability of the quasispecies model beyond sequence-based entities, and potentially enhances the validity of GARD as a model for prebiotic evolution.
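The steady-state computation mentioned in the abstract above can be sketched as follows. In the standard quasispecies equation, the steady-state distribution is the dominant eigenvector of the matrix W with entries W[i][j] = Q[i][j] * A[j], where Q[i][j] is the probability of moving from state j to state i and A[j] is the growth rate of state j. The sketch finds it by power iteration. The matrix and vector values are toy numbers, not measurements from the paper, and the function name is hypothetical.

```python
def quasispecies_steady_state(Q, A, iterations=1000):
    """Normalized dominant eigenvector of W[i][j] = Q[i][j] * A[j].

    Q: square matrix (list of rows); column j holds the transition
       probabilities out of state j, so each column sums to 1.
    A: growth rate of each state.
    Uses plain power iteration with renormalization at every step.
    """
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [sum(Q[i][j] * A[j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [value / total for value in y]
    return x


# Toy example with three distance bins (illustrative values only).
Q_toy = [
    [0.90, 0.10, 0.00],
    [0.10, 0.80, 0.20],
    [0.00, 0.10, 0.80],
]
A_toy = [2.0, 1.0, 1.0]
steady = quasispecies_steady_state(Q_toy, A_toy)
```

In this toy case the first bin has the highest growth rate and a high self-transition probability, so it dominates the steady-state distribution, playing the role of the master-like state.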
Multispecies population dynamics of prebiotic compositional assemblies
Markovitch O. & Lancet D. (2014) Journal of Theoretical Biology. 357, p. 26-34 Abstract
Present life portrays a two-tier phenomenology: molecules compose supramolecular structures, such as cells or organisms, which in turn portray population behaviors, including selection, evolution and ecological dynamics. Prebiotic models have often focused on evolution in populations of self-replicating molecules, without explicitly invoking the intermediate molecular-to-supramolecular transition. Here, we explore a prebiotic model that allows one to relate parameters of chemical interaction networks within molecular assemblies to emergent population dynamics. We use the graded autocatalysis replication domain (GARD) model, which simulates the network dynamics within amphiphile-containing molecular assemblies, and exhibits quasi-stationary compositional states termed compotype species. These grow by catalyzed accretion, divide and propagate their compositional information to progeny in a replication-like manner. The model allows us to ask how molecular network parameters influence assembly evolution and population dynamics parameters. In 1000 computer simulations, each embodying a different set of global chemical interaction network parameters, we observed a wide range of behaviors. These were analyzed by a multispecies logistic model often used for analyzing population ecology (r-K or Lotka-Volterra competition model). We found that compotypes with a larger intrinsic molecular repertoire show a higher intrinsic growth (r) and lower carrying capacity (K), as well as lower replication fidelity. This supports a prebiotic scenario initiated by fast-replicating assemblies with a high molecular diversity, evolving into more faithful replicators with narrower molecular repertoires.
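For readers unfamiliar with the r-K (Lotka-Volterra competition) model named in the abstract above, a minimal sketch follows. It integrates dN_i/dt = r_i * N_i * (1 - sum_j alpha[i][j] * N_j / K_i) with a simple forward Euler step. The parameter values, the competition matrix and the function name are assumptions chosen for illustration and are not fitted values from the paper.

```python
def simulate_competition(r, K, alpha, n0, dt=0.01, steps=20000):
    """Forward-Euler integration of the Lotka-Volterra competition model.

    r: intrinsic growth rate per species.
    K: carrying capacity per species.
    alpha: competition coefficients; alpha[i][j] is the effect of
           species j on species i (alpha[i][i] is normally 1).
    n0: initial population sizes.
    Returns the population sizes after `steps` steps of size `dt`.
    """
    n = list(n0)
    count = len(n)
    for _ in range(steps):
        growth = []
        for i in range(count):
            crowding = sum(alpha[i][j] * n[j] for j in range(count))
            growth.append(r[i] * n[i] * (1.0 - crowding / K[i]))
        # Clamp at zero so numerical error cannot produce negative populations.
        n = [max(0.0, n[i] + dt * growth[i]) for i in range(count)]
    return n


# Illustrative case: species 0 grows fast but has a low carrying capacity,
# species 1 grows slowly but has a high carrying capacity.
final = simulate_competition(
    r=[1.0, 0.5],
    K=[50.0, 100.0],
    alpha=[[1.0, 1.0], [1.0, 1.0]],
    n0=[1.0, 1.0],
)
```

In this toy setting with full competition, the fast grower leads early but the species with the higher carrying capacity eventually takes over, which mirrors the r versus K trade-off discussed in the abstract.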
MalaCards: A Comprehensive automatically-mined Database of human diseases
Rappaport N., Twik M., Nativ N., Stelzer G., Bahir I., Stein T. I., Safran M. & Lancet D. (2014) Current Protocols in Bioinformatics. 2014, p. 1.24.1-1.24.19 Abstract
Systems medicine provides insights into mechanisms of human diseases, and expedites the development of better diagnostics and drugs. To facilitate such strategies, we initiated MalaCards, a compendium of human diseases and their annotations, integrating and often remodeling information from 64 data sources. MalaCards employs, among others, the proven automatic data-mining strategies established in the construction of GeneCards, our widely used compendium of human genes. The development of MalaCards poses many algorithmic challenges, such as disease name unification, integrated classification, gene-disease association, and disease-targeted expression analysis. MalaCards displays a Web card for each of >19,000 human diseases, with 17 sections, including textual summaries, related diseases, related genes, genetic variations and tests, and relevant publications. Also included are a powerful search engine and a variety of categorized disease lists. This unit describes two basic protocols to search and browse MalaCards effectively. Curr. Protoc. Bioinform. 47:1.24.1-1.24.19.
MOPED enables discoveries through consistently processed proteomics data
Higdon R., Stewart E., Stanberry L. et_al. (2014) Journal of Proteome Research. 13, 1, p. 107-113 Abstract
The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is an expanding proteomics resource to enable biological and biomedical discoveries. MOPED aggregates simple, standardized and consistently processed summaries of protein expression and metadata from proteomics (mass spectrometry) experiments from human and model organisms (mouse, worm, and yeast). The latest version of MOPED adds new estimates of protein abundance and concentration as well as relative (differential) expression data. MOPED provides a new updated query interface that allows users to explore information by organism, tissue, localization, condition, experiment, or keyword. MOPED supports the Human Proteome Project's efforts to generate chromosome- and disease-specific proteomes by providing links from proteins to chromosome and disease information as well as many complementary resources. MOPED supports a new omics metadata checklist to harmonize data integration, analysis, and use. MOPED's development is driven by the user community, which spans 90 countries and guides future development that will transform MOPED into a multiomics resource. MOPED encourages users to submit data in a simple format. They can use the metadata checklist to generate a data publication for this submission. As a result, MOPED will provide even greater insights into complex biological processes and systems and enable deeper and more comprehensive biological and biomedical discoveries.

2013

Deficiency of Asparagine Synthetase Causes Congenital Microcephaly and a Progressive Form of Encephalopathy
Ruzzo E. K., Oz Levi D., Ben-Asher E., Olender T. & Lancet D. (2013) Neuron. 80, 2, p. 429-441 Abstract
We analyzed four families that presented with a similar condition characterized by congenital microcephaly, intellectual disability, progressive cerebral atrophy, and intractable seizures. We show that recessive mutations in the ASNS gene are responsible for this syndrome. Two of the identified missense mutations dramatically reduce ASNS protein abundance, suggesting that the mutations cause loss of function. Hypomorphic Asns mutant mice have structural brain abnormalities, including enlarged ventricles and reduced cortical thickness, and show deficits in learning and memory mimicking aspects of the patient phenotype. ASNS encodes asparagine synthetase, which catalyzes the synthesis of asparagine from glutamine and aspartate. The neurological impairment resulting from ASNS deficiency may be explained by asparagine depletion in the brain or by accumulation of aspartate/glutamate leading to enhanced excitability and neuronal damage. Our study thus indicates that asparagine synthesis is essential for the development and function of the brain but not for that of other organs.
TECPR2: A new autophagy link for neurodegeneration
Oz-Levi D., Gelman A., Elazar Z. & Lancet D. (2013) Autophagy. 9, 5, p. 801-802 Abstract
Autophagy dysfunction has been implicated in a group of progressive neurodegenerative diseases, and has been reported to play a major role in the pathogenesis of these disorders. We have recently reported a recessive mutation in TECPR2, an autophagy-implicated WD repeat-containing protein, in five individuals with a novel form of monogenic hereditary spastic paraparesis (HSP). We found that diseased skin fibroblasts had a decreased accumulation of the autophagy-initiation protein MAP1LC3B/LC3B, and an attenuated delivery of both LC3B and the cargo-recruiting protein SQSTM1/p62 to the lysosome where they are subject to degradation. The discovered TECPR2 mutation reveals for the first time a role for aberrant autophagy in a major class of Mendelian neurodegenerative diseases, and suggests mechanisms by which impaired autophagy may impinge on a broader scope of neurodegeneration.
An overview of synergistic data tools for biological scrutiny
Olender T., Safran M., Edgar R. et_al. (2013) Israel Journal of Chemistry. 53, 3-4, p. 185-198 Abstract
A network of biological databases is reviewed, supplying a framework for studies of human genes and the association of their genomic variations with human phenotypes. The network is composed of GeneCards, the human gene compendium, which provides comprehensive information on all known and predicted human genes, along with its suite members GeneDecks and GeneLoc. Two databases are shown that address genes and variations focusing on olfactory reception (HORDE) and transduction (GOSdb). In the realm of disease scrutiny, we portray MalaCards, a novel comprehensive database of human diseases and their annotations. Also shown is GeneKid, a tool aimed at generating novel kidney disease biomarkers using systems biology, as well as Xome, a database for whole-exome next-generation DNA sequences for human diseases in the Israeli population. Finally, we show LifeMap Discovery, a database of embryonic development, stem cell research and regenerative medicine, which links to both GeneCards and MalaCards.
HORDE: Comprehensive resource for olfactory receptor genomics
Olender T., Nativ N. & Lancet D. (2013) Olfactory Receptors. Crasto C. J. (ed.). p. 23-38 Abstract
Olfactory receptors (ORs) constitute the largest gene family in the mammalian genome. The existence of these proteins underlies the nature of, and variability in, odorant perception. The Human Olfactory Receptor Data Explorer (HORDE, http://genome.weizmann.ac.il/horde/) is a free online resource, which presents a complete compendium of all OR genes and pseudogenes in the genome of human and four other vertebrates. HORDE includes three parts: (1) an automated pipeline, which mines OR gene and pseudogene sequences out of complete genomes, and generates gene symbols based on sequence similarity; (2) a card generator that obtains and displays annotative information on individual ORs retrieved from external databases and relevant studies; and (3) a search engine that allows user retrieval of OR information. For human ORs, HORDE specifically addresses the universe of interindividual variation, as obtained from several sources, including whole genome sequences made possible by next-generation sequencing. This encompasses single nucleotide polymorphisms (SNP) and copy number variation (CNV), including deleterious mutational events. HORDE also hosts a number of tools designed specifically to assist in the study of OR evolution and function. In this chapter, we describe the status of HORDE (build #43). We also discuss plans for future enhancements and a road map for HORDE to become a better community-based bioinformatics tool. We highlight HORDE's role as a major research tool in the study of an expanding cohort of OR repertoires.
Non-redundant compendium of human ncRNA genes in GeneCards
Belinky F., Bahir I., Stelzer G. et_al. (2013) Bioinformatics. 29, 2, p. 255-261 Abstract
Motivation: Non-coding RNA (ncRNA) genes are increasingly acknowledged for their importance in the human genome. However, there is no comprehensive non-redundant database for all such human genes. Results: We leveraged the effective platform of GeneCards, the human gene compendium, together with the power of fRNAdb and additional primary sources, to judiciously unify all ncRNA gene entries obtainable from 15 different primary sources. Overlapping entries were clustered to unified locations based on an algorithm employing genomic coordinates. This allowed GeneCards' gamut of relevant entries to rise ∼5-fold, resulting in ∼80 000 human non-redundant ncRNAs, belonging to 14 classes. Such 'grand unification' within a regularly updated data structure will assist future ncRNA research.
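The coordinate-based clustering step mentioned in the abstract above can be pictured with the following minimal sketch, which merges entries from different sources whenever their genomic intervals overlap on the same chromosome. The abstract does not give the details of the actual algorithm, so the overlap rule here (any shared base, no strand or class check), the entry identifiers and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def cluster_by_coordinates(entries):
    """Merge overlapping genomic entries into unified clusters.

    entries: iterable of (source_id, chrom, start, end) tuples with
             inclusive coordinates.
    Returns a list of [chrom, start, end, [source_ids]] clusters,
    ordered by chromosome name and then by start position.
    """
    by_chrom = defaultdict(list)
    for source_id, chrom, start, end in entries:
        by_chrom[chrom].append((start, end, source_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current = None
        for start, end, source_id in sorted(by_chrom[chrom]):
            if current is not None and start <= current[2]:
                # Overlaps the open cluster: extend its end and record the entry.
                current[2] = max(current[2], end)
                current[3].append(source_id)
            else:
                current = [chrom, start, end, [source_id]]
                clusters.append(current)
    return clusters


# Hypothetical entries from different primary sources.
toy_entries = [
    ("srcA:1", "chr1", 100, 200),
    ("srcB:7", "chr1", 150, 250),
    ("srcA:2", "chr1", 300, 400),
    ("srcC:3", "chr2", 120, 180),
]
toy_clusters = cluster_by_coordinates(toy_entries)
```

Here the first two chr1 entries overlap and collapse into one cluster spanning 100 to 250, while the remaining entries stay as singletons. A production pipeline would likely also consider strand, RNA class and minimum overlap fractions.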
The MATCHIT automaton: Exploiting compartmentalization for the synthesis of branched polymers
Weyland M. S., Fellermann H., Hadorn M., Sorek D., Lancet D., Rasmussen S. & Fuechslin R. M. (2013) Computational and Mathematical Methods in Medicine. 2013, 467428. Abstract
We propose an automaton, a theoretical framework that demonstrates how to improve the yield of the synthesis of branched chemical polymer reactions. This is achieved by separating substeps of the path of synthesis into compartments. We use chemical containers (chemtainers) to carry the substances through a sequence of fixed successive compartments. We describe the automaton in mathematical terms and show how it can be configured automatically in order to synthesize a given branched polymer target. The algorithm we present finds an optimal path of synthesis in linear time. We discuss how the automaton models compartmentalized structures found in cells, such as the endoplasmic reticulum and the Golgi apparatus, and we show how this compartmentalization can be exploited for the synthesis of branched polymers such as oligosaccharides. Lastly, we show examples of artificial branched polymers and discuss how the automaton can be configured to synthesize them with maximal yield.
General Olfactory Sensitivity Database (GOSdb): Candidate Genes and their Genomic Variations
Keydar I., Ben-Asher E., Feldmesser E. et_al. (2013) Human Mutation. 34, 1, p. 32-41 Abstract
Genetic variations in olfactory receptors likely contribute to the diversity of odorant-specific sensitivity phenotypes. Our working hypothesis is that genetic variations in auxiliary olfactory genes, including those mediating transduction and sensory neuronal development, may constitute the genetic basis for general olfactory sensitivity (GOS) and congenital general anosmia (CGA). We thus performed a systematic exploration for auxiliary olfactory genes and their documented variation. This included a literature survey, seeking relevant functional in vitro studies, mouse gene knockouts and human disorders with olfactory phenotypes, as well as data mining in published transcriptome and proteome data for genes expressed in olfactory tissues. In addition, we performed next-generation transcriptome sequencing (RNA-seq) of human olfactory epithelium and mouse olfactory epithelium and bulb, so as to identify sensory-enriched transcripts. Employing a global score system based on attributes of the 11 data sources utilized, we identified a list of 1,680 candidate auxiliary olfactory genes, of which 450 are shortlisted as having higher probability of a functional role. For the top-scoring 136 genes, we identified genomic variants (probably damaging single nucleotide polymorphisms, indels, and copy number deletions) gleaned from public variation repositories. This database of genes and their variants should assist in rationalizing the great interindividual variation in human overall olfactory sensitivity (http://genome.weizmann.ac.il/GOSdb).
MalaCards: An integrated compendium for diseases and their annotation
Rappaport N., Nativ N., Stelzer G. et_al. (2013) Database-The Journal Of Biological Databases And Curation. 2013, bat018. Abstract
Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research.

2012

Is There an Optimal Level of Open-Endedness in Prebiotic Evolution?
Markovitch O., Sorek D., Lui L. T., Lancet D. & Krasnogor N. (2012) Origins of Life and Evolution of Biospheres. 42, 5, p. 469-474 Abstract
In this paper we explore the question of whether there is an optimal set up for a putative prebiotic system leading to open-ended evolution (OEE) of the events unfolding within this system. We do so by proposing two key innovations. First, we introduce a new index that measures OEE as a function of the likelihood of events unfolding within a universe given its initial conditions. Next, we apply this index to a variant of the graded autocatalysis replication domain (GARD) model, Segre et al. (P Natl Acad Sci USA 97(8):4112-4117, 2000; Markovitch and Lancet Artif Life 18(3), 2012), and use it to study - under a unified and concise prebiotic evolutionary framework - both a variety of initial conditions of the universe and the OEE of species that evolve from them.
Mutation in TECPR2 reveals a role for autophagy in hereditary spastic paraparesis
Oz Levi D., Ben-Zeev B., Ruzzo E. K. et_al. (2012) American Journal of Human Genetics. 91, 6, p. 1065-1072 Abstract
We studied five individuals from three Jewish Bukharian families affected by an apparently autosomal-recessive form of hereditary spastic paraparesis accompanied by severe intellectual disability, fluctuating central hypoventilation, gastroesophageal reflux disease, wake apnea, areflexia, and unique dysmorphic features. Exome sequencing identified one homozygous variant shared among all affected individuals and absent in controls: a 1 bp frameshift TECPR2 deletion leading to a premature stop codon and predicting significant degradation of the protein. TECPR2 has been reported as a positive regulator of autophagy. We thus examined the autophagy-related fate of two key autophagic proteins, SQSTM1 (p62) and MAP1LC3B (LC3), in skin fibroblasts of an affected individual, as compared to a healthy control, and found that both protein levels were decreased and that there was a more pronounced decrease in the lipidated form of LC3 (LC3II). siRNA knockdown of TECPR2 showed similar changes, consistent with aberrant autophagy. Our results are strengthened by the fact that autophagy dysfunction has been implicated in a number of other neurodegenerative diseases. The discovered TECPR2 mutation implicates autophagy, a central intracellular mechanism, in spastic paraparesis.
Evolutionary grass roots for odor recognition
Olender T. & Lancet D. (2012) Chemical Senses. 37, 7, p. 581-584 Abstract
Considerable evidence supports the idea that odorant recognition depends on specific sequence variations in olfactory receptor (OR) proteins. Much of this emerges from in vitro screens in heterologous expression systems. However, the ultimate proof should arise from measurements of odorant thresholds in human individuals harboring different OR genetic variants, a research vein that has so far been only scantly explored. The study of McRae et al., published in this issue of Chemical Senses, shows how the recognition of a grassy odorant depends on specific OR interindividual sequence changes. It provides a clear relevant example for the impact of genetics on olfaction and is an excellent portrayal of the power of human genomics to decipher olfactory perception.
Personal receptor repertoires: olfaction as a model
Olender T., Waszak S. M., Viavant M. et_al. (2012) BMC Genomics. 13, 1, 414. Abstract
Background: Information on nucleotide diversity along completely sequenced human genomes has increased tremendously over the last few years. This makes it possible to reassess the diversity status of distinct receptor proteins in different human individuals. To this end, we focused on the complete inventory of human olfactory receptor coding regions as a model for personal receptor repertoires. Results: By performing data-mining from public and private sources we scored genetic variations in 413 intact OR loci, for which one or more individuals had an intact open reading frame. Using 1000 Genomes Project haplotypes, we identified a total of 4069 full-length polypeptide variants encoded by these OR loci, average of ~10 per locus, constituting a lower limit for the effective human OR repertoire. Each individual is found to harbor as many as 600 OR allelic variants, ~50% higher than the locus count. Because OR neuronal expression is allelically excluded, this has direct effect on smell perception diversity of the species. We further identified 244 OR segregating pseudogenes (SPGs), loci showing both intact and pseudogene forms in the population, twenty-six of which are annotatively "resurrected" from a pseudogene status in the reference genome. Using a custom SNP microarray we validated 150 SPGs in a cohort of 468 individuals, with every individual genome averaging 36 disrupted sequence variations, 15 in homozygote form. Finally, we generated a multi-source compendium of 63 OR loci harboring deletion Copy Number Variations (CNVs). Our combined data suggest that 271 of the 413 intact OR loci (66%) are affected by nonfunctional SNPs/indels and/or CNVs. Conclusions: These results portray a case of unusually high genetic diversity, and suggest that individual humans have a highly personalized inventory of functional olfactory receptors, a conclusion that might apply to other receptor multigene families.
Excess mutual catalysis is required for effective evolvability
Markovitch O. & Lancet D. (2012) Artificial Life. 18, 3, p. 243-266 Abstract
It is widely accepted that autocatalysis constitutes a crucial facet of effective replication and evolution (e.g., in Eigen's hypercycle model). Other models for early evolution (e.g., by Dyson, Gánti, Varela, and Kauffman) invoke catalytic networks, where cross-catalysis is more apparent. A key question is how the balance between auto- (self-) and cross- (mutual) catalysis shapes the behavior of model evolving systems. This is investigated using the graded autocatalysis replication domain (GARD) model, previously shown to capture essential features of reproduction, mutation, and evolution in compositional molecular assemblies. We have performed numerical simulations of an ensemble of GARD networks, each with a different set of lognormally distributed catalytic values. We asked what is the influence of the catalytic content of such networks on beneficial evolution. Importantly, a clear trend was observed, wherein only networks with high mutual catalysis propensity (pmc) allowed for an augmented diversity of composomes, quasi-stationary compositions that exhibit high replication fidelity. We have reexamined a recent analysis that showed meager selection in a single GARD instance and for a few nonstationary target compositions. In contrast, when we focused here on compotypes (clusters of composomes) as targets for selection in populations of compositional assemblies, appreciable selection response was observed for a large portion of the networks simulated. Further, stronger selection response was seen for high pmc values. Our simulations thus demonstrate that GARD can help analyze important facets of evolving systems, and indicate that excess mutual catalysis over self-catalysis is likely to be important for the emergence of molecular systems capable of evolutionlike behavior.
DOCK4 and CEACAM21 as novel schizophrenia candidate genes in the Jewish population
Alkelai A., Lupoli S., Greenbaum L., Kohn Y., Kanyas-Sarner K., Ben-Asher E., Lancet D., Macciardi F. & Lerer B. (2012) International Journal of Neuropsychopharmacology. 15, 4, p. 459-469 Abstract
It is well accepted that schizophrenia has a strong genetic component. Several genome-wide association studies (GWASs) of schizophrenia have been published in recent years; most of them population based with a case-control design. Nevertheless, identifying the specific genetic variants which contribute to susceptibility to the disorder remains a challenging task. A family-based GWAS strategy may be helpful in the identification of schizophrenia susceptibility genes since it is protected against population stratification, enables better accounting for genotyping errors and is more sensitive for identification of rare variants which have a very low frequency in the general population. In this project we implemented a family-based GWAS of schizophrenia in a sample of 107 Jewish-Israeli families. We found one genome-wide significant association in the intron of the DOCK4 gene (rs2074127, p value=1.134×10^-7) and six additional nominally significant association signals with p
Genome sequence of the pattern-forming social bacterium: Paenibacillus dendritiformis C454 Chiral Morphotype
Sirota-Madi A., Olender T., Helman Y. et_al. (2012) Journal of Bacteriology. 194, 8, p. 2127-2128 Abstract
Paenibacillus dendritiformis is a Gram-positive, soil-dwelling, spore-forming social microorganism. An intriguing collective faculty of this strain is manifested by its ability to switch between different morphotypes, such as the branching (T) and the chiral (C) morphotypes. Here we report the 6.3-Mb draft genome sequence of the P. dendritiformis C454 chiral morphotype.
Association of the type 2 diabetes mellitus susceptibility gene, TCF7L2, with schizophrenia in an Arab-Israeli family sample
Alkelai A., Greenbaum L., Lupoli S., Kohn Y., Sarner-Kanyas K., Ben-Asher E., Lancet D., Macciardi F. & Lerer B. (2012) PLoS ONE. 7, 1, e29228. Abstract
Many reports in different populations have demonstrated linkage of the 10q24-q26 region to schizophrenia, thus encouraging further analysis of this locus for detection of specific schizophrenia genes. Our group previously reported linkage of the 10q24-q26 region to schizophrenia in a unique, homogeneous sample of Arab-Israeli families with multiple schizophrenia-affected individuals, under a dominant model of inheritance. To further explore this candidate region and identify specific susceptibility variants within it, we performed re-analysis of the 10q24-26 genotype data, taken from our previous genome-wide association study (GWAS) (Alkelai et al, 2011). We analyzed 2089 SNPs in an extended sample of 57 Arab Israeli families (189 genotyped individuals), under the dominant model of inheritance, which best fits this locus according to previously performed MOD score analysis. We found significant association with schizophrenia of the TCF7L2 gene intronic SNP, rs12573128, (p = 7.01×10^-6) and of the nearby intergenic SNP, rs1033772, (p = 6.59×10^-6) which is positioned between TCF7L2 and HABP2. TCF7L2 is one of the best confirmed susceptibility genes for type 2 diabetes (T2D) among different ethnic groups, has a role in pancreatic beta cell function and may contribute to the comorbidity of schizophrenia and T2D. These preliminary results independently support previous findings regarding a possible role of TCF7L2 in susceptibility to schizophrenia, and strengthen the importance of integrating linkage analysis models of inheritance while performing association analyses in regions of interest. Further validation studies in additional populations are required.
MOPED: Model Organism Protein Expression Database
Kolker E., Higdon R., Haynes W., Welch D., Broomall W., Lancet D., Stanberry L. & Kolker N. (2012) Nucleic Acids Research. 40, D1, p. D1093-D1099 Abstract
Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein level expression data, metaanalysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed based on organism, tissue, localization and condition and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43 000 proteins with at least one spectral match and more than 11 million high certainty spectra.

2011

Replication of simulated prebiotic amphiphile vesicles controlled by experimental lipid physicochemical properties
Armstrong D. L., Markovitch O., Zidovetzki R. & Lancet D. (2011) Physical Biology. 8, 6, 066001. Abstract
We present a new embodiment of the graded autocatalysis replication domain (GARD) for the growth, replication and evolution of lipid vesicles based on a semi-empirical foundation using experimentally measured kinetic values of selected extant lipid species. Extensive simulations using this formalism elucidated the details of the dependence of the replication and properties of the vesicles on the physicochemical properties and concentrations of the lipids, both in the environment and in the vesicle. As expected, the overall concentration and number of amphiphilic components strongly affect average replication time. Furthermore, variations in acyl chain length and unsaturation of vesicles also influence replication rate, as do the relative concentrations of individual lipid types. Understanding of the dependence of replication rates on physicochemical parameters opens a new direction in the study of prebiotic vesicles and lays the groundwork for future studies involving the competition between lipid vesicles for available amphiphilic monomers.
Identification of new schizophrenia susceptibility loci in an ethnically homogeneous, family-based, Arab-Israeli sample
Alkelai A., Lupoli S., Greenbaum L. et_al. (2011) FASEB Journal. 25, 11, p. 4011-4023 Abstract
While the use of population-based samples is a common strategy in genome-wide association studies (GWASs), family-based samples have considerable advantages, such as robustness against population stratification and false-positive associations, better quality control, and the possibility to check for both linkage and association. In a genome-wide linkage study of schizophrenia in Arab-Israeli families with multiple affected individuals, we previously reported significant evidence for a susceptibility locus at chromosome 6q23.2-q24.1 and suggestive evidence at chromosomes 10q22.3-26.3, 2q36.1-37.3 and 7p21.1-22.3. To identify schizophrenia susceptibility genes, we applied a family-based GWAS strategy in an enlarged, ethnically homogeneous, Arab-Israeli family sample. We performed genome-wide single nucleotide polymorphism (SNP) genotyping and single SNP transmission disequilibrium test association analysis and found genome-wide significant association (best value of P=1.22×10^-11) for 8 SNPs within or near highly reasonable functional candidate genes for schizophrenia. Of particular interest are a group of SNPs within and flanking the transcriptional factor LRRFIP1 gene. To determine replicability of the significant associations beyond the Arab-Israeli population, we studied the association of the significant SNPs in a German case-control validation sample and found replication of associations near the UGT1 subfamily and EFHD1 genes. Applying an exploratory homozygosity mapping approach as a complementary strategy to identify schizophrenia susceptibility genes in our Arab Israeli sample, we identified 8 putative disease loci. Overall, this GWAS, which emphasizes the important contribution of family based studies, identifies promising candidate genes for schizophrenia.
In-silico human genomics with GeneCards
Stelzer G., Dalah I., Stein T. I. et_al. (2011) Human Genomics. 5, 6, p. 709-717 Abstract
Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on in-house and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.
Mapping of molecular pathways, biomarkers and drug targets for diabetic nephropathy
Fechete R., Heinzel A., Perco P. et_al. (2011) Proteomics Clinical Applications. 5, 5-6, p. 354-366 Abstract
Purpose: For diseases with complex phenotype such as diabetic nephropathy (DN), integration of multiple Omics sources promises an improved description of the disease pathophysiology, being the basis for novel diagnostics and therapy, but equally important personalization aspects. Experimental design: Molecular features on DN were retrieved from public domain Omics studies and by mining scientific literature, patent text and clinical trial specifications. Molecular feature sets were consolidated on a human protein interaction network and interpreted on the level of molecular pathways in the light of the pathophysiology of the disease and its clinical context defined as associated biomarkers and drug targets. Results: About 1000 gene symbols each could be assigned to the pathophysiological description of DN and to the clinical context. Direct feature comparison showed minor overlap, whereas on the level of molecular pathways, the complement and coagulation cascade, PPAR signaling, and the renin-angiotensin system linked the disease descriptor space with biomarkers and targets. Conclusion and clinical relevance: Only the combined molecular feature landscapes closely reflect the clinical implications of DN in the context of hypertension and diabetes. Omics data integration on the level of interaction networks furthermore provides a platform for identification of pathway-specific biomarkers and therapy options.
Synthetic lethal hubs associated with vincristine resistant neuroblastoma
Fechete R., Barth S., Olender T. et_al. (2011) Molecular BioSystems. 7, 1, p. 200-214 Abstract
Chemotherapy of cancer experiences a number of shortcomings including development of drug resistance. This fact also holds true for neuroblastoma utilizing chemotherapeutics as vincristine. We performed a comparative analysis of molecular and cellular mechanisms associated with vincristine resistance utilizing cell line as well as human tissue data. Differential gene expression analysis revealed molecular features, processes and pathways afflicted with drug resistance mechanisms in general, and specifically with vincristine significantly involving actin associated features. However, specific mode of resistance as well as underlying genotype of parental, vincristine sensitive cells apparently exhibited significant heterogeneity. No consensus profile for vincristine resistance could be derived, but resistance-associated changes on the level of individual neuroblastoma cell lines as well as individual patient profiles became clearly evident. Based on these prerequisites we utilized the concept of synthetic lethality aimed at identifying hub proteins which when inhibited promise to induce cell death due to a synthetic lethal interaction with down-regulated, chemoresistance associated features. Our screening procedure identified synthetic lethal hub proteins afflicted with actin associated processes holding synthetic lethal interactions to down-regulated features individually found in all chemoresistant cell lines tested, therefore promising an improved therapeutic window. Verification of such synthetic lethal hub candidates in human neuroblastoma tissue expression profiles indicated the feasibility of this screening approach for addressing vincristine resistance in neuroblastoma.
Evolutionary attributes of simulated prebiotic metabolic networks
Markovitch O. & Lancet D. (2011) ECAL 2011. Abstract
A metabolism-first scenario for the origin of life entails that as early as replicating entities have emerged prebiotically, they must have constituted relatively complex molecular networks, arising via spontaneous accretion of assemblies of simpler organic molecules. While it is widely accepted that self-catalysis is a prerequisite for life, considerably less attention has been devoted to network-based mutual-catalysis and its effect on evolution. To remedy this, we have used the graded autocatalytic replication domain (GARD) model, previously shown to capture essential features of reproduction, mutation and evolution in compositional molecular assemblies. We simulated a large ensemble of GARD rate-enhancement networks, thus allowing one to better study the crucial network properties of the implicated molecular assemblies. We found, with high statistical power, that high prevalence of mutual-catalysis is required for the emergence of appreciable diversity and evolvability of the assemblies, as well as for them to have significant selection attributes. We suggest that only minimal self-catalysis capabilities are needed to facilitate evolution-like behavior, and that excess self-catalysis may drive a population towards an evolutionary 'dead-end'.
Omics Data Management and Annotation
Harel A., Dalah I., Pietrokovski S., Safran M. & Lancet D. (2011) Bioinformatics For Omics Data. p. 71-96 Abstract
Technological Omics breakthroughs, including next generation sequencing, bring avalanches of data which need to undergo effective data management to ensure integrity, security, and maximal knowledge-gleaning. Data management system requirements include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, hardware and software platform definition, as well as robustness. Relevant solutions elaborated by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols facilitating data sharing and managing. In project planning, special consideration has to be made when choosing relevant Omics annotation sources, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as file-based systems and relational databases, the latter typically used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter.

2010

Genome sequence of the pattern forming Paenibacillus vortex bacterium reveals potential for thriving in complex environments
Sirota-Madi A., Olender T., Helman Y. et_al. (2010) BMC Genomics. 11, 1, 710. Abstract
Background: The pattern-forming bacterium Paenibacillus vortex is notable for its advanced social behavior, which is reflected in development of colonies with highly intricate architectures. Prior to this study, only two other Paenibacillus species (Paenibacillus sp. JDR-2 and Paenibacillus larvae) have been sequenced. However, no genomic data is available on the Paenibacillus species with pattern-forming and complex social motility. Here we report the de novo genome sequence of this Gram-positive, soil-dwelling, sporulating bacterium. Results: The complete P. vortex genome was sequenced by a hybrid approach using 454 Life Sciences and Illumina, achieving a total of 289× coverage, with 99.8% sequence identity between the two methods. The sequencing results were validated using a custom designed Agilent microarray expression chip which represented the coding and the non-coding regions. Analysis of the P. vortex genome revealed 6,437 open reading frames (ORFs) and 73 non-coding RNA genes. Comparative genomic analysis with 500 complete bacterial genomes revealed exceptionally high number of two-component system (TCS) genes, transcription factors (TFs), transport and defense related genes. Additionally, we have identified genes involved in the production of antimicrobial compounds and extracellular degrading enzymes. Conclusions: These findings suggest that P. vortex has advanced faculties to perceive and react to a wide range of signaling molecules and environmental conditions, which could be associated with its ability to reconfigure and replicate complex colony architectures. Additionally, P. vortex is likely to serve as a rich source of genes important for agricultural, medical and industrial applications and it has the potential to advance the study of social microbiology within Gram-positive bacteria.
Systematic inference of copy-number genotypes from personal genome sequencing data reveals extensive olfactory receptor gene content diversity
Waszak S. M., Hasin Y., Zichner T. et_al. (2010) PLoS Computational Biology. 6, 11, e1000988. Abstract
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low-coverage. The assessed copy-number genotypes were highly concordant with our performed qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95-99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ~15% and ~20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
Fine mapping of AHI1 as a schizophrenia susceptibility gene: From association to evolutionary evidence
Torri F., Akelai A., Lupoli S. et_al. (2010) FASEB Journal. 24, 8, p. 3066-3082 Abstract
In previous studies, we identified a locus for schizophrenia on 6q23.3 and proposed the Abelson helper integration site 1 (AHI1) as the candidate gene. AHI1 is expressed in the brain and plays a key role in neurodevelopment, is involved in Joubert syndrome, and has been recently associated with autism. The neurodevelopmental role of AHI1 fits with etiological hypotheses of schizophrenia. To definitively confirm our hypothesis, we searched for associations using a dense map of the region. Our strongest findings lay within the AHI1 gene: single-nucleotide polymorphisms rs11154801 and rs7759971 showed significant associations (P=6.23E-06; P=0.84E-06) and haplotypes gave P values in the 10E-8 to 10E-10 range. The second highest significant region maps close to AHI1 and includes the intergenic region between BC040979 and PDE7B (rs2038549 at P=9.70E-06 and rs1475069 at P=6.97E-06), and PDE7B and MAP7. Using a sample of Palestinian Arab families to confirm these findings, we found isolated signals. While these results did not retain their significance after correction for multiple testing, the joint analysis across the 2 samples supports the role of AHI1, despite the presence of heterogeneity. Given the hypothesis of positive selection of schizophrenia genes, we resequenced a 11 kb region within AHI1 in ethnically defined populations and found evidence for a selective sweep. Network analysis indicates 2 haplotype clades, with schizophrenia-susceptibility haplotypes clustering within the major clade. In conclusion, our data support the role of AHI1 as a susceptibility gene for schizophrenia and confirm it has been subjected to positive selection, also shedding light on new possible candidate genes, MAP7 and PDE7B.
Lymphoblast and brain expression of AHI1 and the novel primate-specific gene, C6orf217, in schizophrenia and bipolar disorder
Slonimsky A., Levy I., Kohn Y., Rigbi A., Ben-Asher E., Lancet D., Agam G. & Lerer B. (2010) Schizophrenia Research. 120, 1-3, p. 159-166 Abstract
Association with schizophrenia of the Abelson Helper Integration Site 1 (AHI1) gene on chromosome 6q23 and the adjacent primate-specific gene, C6orf217, was demonstrated in an inbred, Arab Israeli family sample and replicated in an Icelandic case control sample. Further support was provided by a second replication in a large European sample and a meta-analysis that supported association with schizophrenia of all seven alleles overtransmitted to affected subjects in the original study. We examined constitutive expression of AHI1 and C6orf217 in immortalized lymphoblasts of patients from the Arab Israeli family sample in which the association with schizophrenia was originally discovered and population-matched normal controls, and in post-mortem brain of patients with schizophrenia and bipolar (BP) disorder and control subjects from the Stanley Medical Research Institute Collection. We found a significant effect of diagnostic group in the lymphoblast sample (F=5.72; df=2,39; p=0.006). Patients with early age of onset had higher AHI1 expression than controls and later onset patients (p=0.002; 0.03 respectively). C6orf217 expression in lymphoblasts was too low to measure. We found no difference in brain expression of AHI1 in schizophrenia or BP patients compared to controls. However, there was a genotypic difference in AHI1 expression for SNP rs9321501, which was strongly associated with schizophrenia in the original study. Genotypes that included the undertransmitted C allele (CC/AC) showed lower expression than the homozygous AA genotype (F=4.73, df=2,83; p=0.028). There was no significant difference in brain expression of C6orf217 between patients and controls and no genotypic effect. This study provides further evidence for involvement of AHI1 in susceptibility to schizophrenia.
Spontaneous chiral symmetry breaking in early molecular networks
Kafri R., Markovitch O. & Lancet D. (2010) Biology Direct. 5, 38. Abstract
Background: An important facet of early biological evolution is the selection of chiral enantiomers for molecules such as amino acids and sugars. The origin of this symmetry breaking is a long-standing question in molecular evolution. Previous models addressing this question include particular kinetic properties such as autocatalysis or negative cross catalysis. Results: We propose here a more general kinetic formalism for early enantioselection, based on our previously described Graded Autocatalysis Replication Domain (GARD) model for prebiotic evolution in molecular assemblies. This model is adapted here to the case of chiral molecules by applying symmetry constraints to mutual molecular recognition within the assembly. The ensuing dynamics shows spontaneous chiral symmetry breaking, with transitions towards stationary compositional states (composomes) enriched with one of the two enantiomers for some of the constituent molecule types. Furthermore, one or the other of the two antipodal compositional states of the assembly also shows time-dependent selection. Conclusion: It follows that chiral selection may be an emergent consequence of early catalytic molecular networks rather than a prerequisite for the initiation of primeval life processes. Elaborations of this model could help explain the prevalent chiral homogeneity in present-day living cells. Reviewers: This article was reviewed by Boris Rubinstein (nominated by Arcady Mushegian), Arcady Mushegian, Meir Lahav (nominated by Yitzhak Pilpel) and Sergei Maslov.
GeneCards Version 3: the human gene integrator.
Safran M., Dalah I., Alexander J. et_al. (2010) Database-The Journal Of Biological Databases And Curation. 2010, p. baq020 Abstract
GeneCards (www.genecards.org) is a comprehensive, authoritative compendium of annotative information about human genes, widely used for nearly 15 years. Its gene-centric content is automatically mined and integrated from over 80 digital sources, resulting in a web-based deep-linked card for each of >73,000 human gene entries, encompassing the following categories: protein coding, pseudogene, RNA gene, genetic locus, cluster and uncategorized. We now introduce GeneCards Version 3, featuring a speedy and sophisticated search engine and a revamped, technologically enabling infrastructure, catering to the expanding needs of biomedical researchers. A key focus is on gene-set analyses, which leverage GeneCards' unique wealth of combinatorial annotations. These include the GeneALaCart batch query facility, which tabulates user-selected annotations for multiple genes and GeneDecks, which identifies similar genes with shared annotations, and finds set-shared annotations by descriptor enrichment analysis. Such set-centric features address a host of applications, including microarray data analysis, cross-database annotation mapping and gene-disorder associations for drug targeting. We highlight the new Version 3 database architecture, its multi-faceted search engine, and its semi-automated quality assurance system. Data enhancements include an expanded visualization of gene expression patterns in normal and cancer tissues, an integrated alternative splicing pattern display, and augmented multi-source SNPs and pathways sections. GeneCards now provides direct links to gene-related research reagents such as antibodies, recombinant proteins, DNA clones and inhibitory RNAs and features gene-related drugs and compounds lists. We also portray the GeneCards Inferred Functionality Score annotation landscape tool for scoring a gene's functional information status. Finally, we delineate examples of applications and collaborations that have benefited from the GeneCards suite. 
Database URL: www.genecards.org.
Replication and Darwinian selection define life's origin
Markovitch O., Inger A., Shenhav B. & Lancet D. (2010) Origins of Life and Evolution of Biospheres. 40, 4, p. 484-488 Abstract

2009

GeneDecks: Paralog hunting and gene-set distillation with GeneCards annotation
Stelzer G., Inger A., Olender T., Iny Stein S. T., Dalah I., Harel A., Safran M. & Lancet D. (2009) Omics-A Journal Of Integrative Biology. 13, 6, p. 477-487 Abstract
Sophisticated genomic navigation strongly benefits from a capacity to establish a similarity metric among genes. GeneDecks is a novel analysis tool that provides such a metric by highlighting shared descriptors between pairs of genes, based on the rich annotation within the GeneCards compendium of human genes. The current implementation addresses information about pathways, protein domains, Gene Ontology (GO) terms, mouse phenotypes, mRNA expression patterns, disorders, drug relationships, and sequence-based paralogy. GeneDecks has two modes: (1) Paralog Hunter, which seeks functional paralogs based on combinatorial similarity of attributes; and (2) Set Distiller, which ranks descriptors by their degree of sharing within a given gene set. GeneDecks enables the elucidation of unsuspected putative functional paralogs, and a refined scrutiny of various gene-sets (e.g., from high-throughput experiments) for discovering relevant biological patterns.
Mutations and lethality in simulated prebiotic networks
Inger A., Solomon A., Shenhav B., Olender T. & Lancet D. (2009) Journal of Molecular Evolution. 69, 5, p. 568-578 Abstract
The Graded Autocatalysis Replication Domain (GARD) model describes an origin of life scenario which involves non-covalent compositional assemblies, made of monomeric mutually catalytic molecules. GARD constitutes an alternative to informational biopolymers as a mechanism of primordial inheritance. In the present work, we examined the effect of mutations, one of the most fundamental mechanisms for evolution, in the context of the networks of mutual interaction within GARD prebiotic assemblies. We performed a systematic analysis analogous to single and double gene deletions within GARD. While most deletions have only a small effect on both growth rate and molecular composition of the assemblies, ~10% of the deletions caused lethality, or sometimes showed enhanced fitness. Analysis of 14 different network properties on 2,000 different GARD networks indicated that lethality usually takes place when the deleted node has a high molecular count, or when it is a catalyst for such node. A correlation was also found between lethality and node degree centrality, similar to what is seen in real biological networks. Addressing double knockout mutations, our results demonstrate the occurrence of both synthetic lethality and extragenic suppression within GARD networks, and convey an attempt to correlate synthetic lethality to network node-pair properties. The analyses presented help establish GARD as a workable alternative prebiotic scenario, suggesting that life may have begun with large molecular networks of low fidelity, that later underwent evolutionary compaction and fidelity augmentation.
Evidence for an interaction of schizophrenia susceptibility loci on chromosome 6q23.3 and 10q24.33-q26.13 in Arab Israeli families
Alkelai A., Kohn Y., Olender T., Sarner-Kanyas K., Rigbi A., Hamdan A., Ben-Asher E., Lancet D. & Lerer B. (2009) American Journal Of Medical Genetics Part B-Neuropsychiatric Genetics. 150, 7, p. 914-925 Abstract
A genome scan for schizophrenia related loci in Arab Israeli families by Lerer et al. [Lerer et al. (2003); Mol Psychiatry 8:488-498] detected significant evidence for linkage at chromosome 6q23. Subsequent fine mapping [Levi et al. (2005); Eur J Hum Genet 13:763-771], association [Amann-Zalcenstein et al. (2006); Eur J Hum Genet 14:1111-1119] and replication studies [Ingason et al. (2007); Eur J Hum Genet 15:988-991] identified AHI1 as a putative susceptibility gene. The same genome scan revealed suggestive evidence for a schizophrenia susceptibility locus in the 10q23-26 region. Genes at these two loci may act independently in the pathogenesis of the disease in our homogeneous sample of Arab Israeli families or may interact with each other and with other factors in a common biological pathway. The purpose of our current study was to test the hypothesis of genetic interaction between these two loci and to identify the type of interaction between them. The initial stage of our study focused on the 10q23-q26 region which has not been explored further in our sample. The second stage of the study included a test for possible genetic interaction between the 6q23.3 locus and the refined 10q24.33-q26.13 locus. A final candidate region of 19.9 Mb between markers D10S222 (105.3 Mb) and D10S587 (125.2 Mb) was found on chromosome 10 by non-parametric and parametric linkage analyses. These linkage findings are consistent with previous reports in the same chromosomal region. Two-locus multipoint linkage analysis under three complex disease inheritance models (heterogeneity, multiplicative, and additive models) yielded a best maximum LOD score of 7.45 under the multiplicative model suggesting overlapping function of the 6q23.3 and 10q24.33-q26.13 loci.
GIFtS: Annotation landscape analysis with GeneCards
Harel A., Inger A., Stelzer G., Strichman-Almashanu L., Dalah I., Safran M. & Lancet D. (2009) BMC Bioinformatics. 10, 348. Abstract
Background: Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results: We present the GeneCards Inferred Functionality Score (GIFtS) which allows a quantitative assessment of a gene's annotation status, by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS value, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source, and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes.
The degree of accumulated knowledge for a given gene measured by GIFtS was correlated (for GIFtS>30) with the number of publications for a gene, and with the seniority of this entry in the HGNC database. Conclusion: GIFtS can be a valuable tool for computational procedures which analyze lists of large sets of genes resulting from wet-lab or computational research. GIFtS may also assist the scientific community with identification of groups of uncharacterized genes for diverse applications, such as delineation of novel functions and charting unexplored areas of the human genome.
Human olfaction: from genomic variation to phenotypic diversity
Hasin-Brumshtein Y., Lancet D. & Olender T. (2009) Trends in Genetics. 25, 4, p. 178-184 Abstract
The sense of smell is a complex molecular device, encompassing several hundred olfactory receptor proteins (ORs). These receptors, encoded by the largest human gene superfamily, integrate odorant signals into an accurate 'odor image' in the brain. Widespread phenotypic diversity in human olfaction is, in part, attributable to prevalent genetic variation in OR genes, owing to copy number variation, deletion alleles and deleterious single nucleotide polymorphisms. The development of new genomic tools, including next generation sequencing and CNV assays, provides opportunities to characterize the genetic variations of this system. The advent of large-scale functional screens of expressed ORs, combined with genetic association studies, has the potential to link variations in ORs to human chemosensory phenotypes. This promises to provide a genome-wide view of human olfaction, resulting in a deeper understanding of personalized odor coding, with the potential to decipher flavor and fragrance preferences.
Further evidence for association of the RGS2 gene with antipsychotic-induced parkinsonism: Protective role of a functional polymorphism in the 3′-untranslated region
Greenbaum L., Smith R. C., Rigbi A. et_al. (2009) Pharmacogenomics Journal. 9, 2, p. 103-110 Abstract
RGS2 (regulator of G-protein signaling 2) modulates dopamine receptor signal transduction. Functional variants in the gene may influence susceptibility to extrapyramidal symptoms (EPS) induced by antipsychotic drugs. To further investigate our previous report of association of the RGS2 gene with susceptibility to antipsychotic-induced EPS, we performed a replication study. EPS were rated in 184 US patients with schizophrenia (115 African Americans, 69 Caucasian) treated for at least a month with typical antipsychotic drugs (n = 45), risperidone (n = 46), olanzapine (n = 50) or clozapine (n = 43). Six single nucleotide polymorphisms (SNPs) within or flanking RGS2 were genotyped (rs1933695, rs2179652, rs2746073, rs4606, rs1819741 and rs1152746). Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated by logistic regression. Our results indicate association of SNP rs4606 with antipsychotic-induced parkinsonism (AIP), as measured by the Simpson Angus scale, in the overall sample and in the African-American subsample, the G (minor) allele having a protective effect. ORs for AIP among rs4606 G-allele carriers were 0.23 (95% CI 0.10-0.54, P=0.001) in the overall sample, and 0.20 (0.07-0.57, P = 0.003) in the African-American subsample. In the previously studied Israeli sample the OR was 0.31 (0.11-0.84, P=0.02). We completely sequenced the RGS2 gene in nine patients with AIP and nine patients without, from the Israeli sample. No common coding polymorphisms or additional regulatory variants were revealed, suggesting that association of the rs4606 C/G polymorphism with AIP is biologically meaningful and not a consequence of linkage disequilibrium with another functional variant. Taken together, the findings of the current study support the association of RGS2 with AIP and focus on a possible protective effect of the minor G allele of SNP rs4606. 
This SNP is located in the 3′-regulatory region of the gene, and is known to influence RGS2 mRNA levels and protein expression.
Common peptides shed light on evolution of Olfactory Receptors
Gottlieb A., Olender T., Lancet D. & Horn D. (2009) BMC Evolutionary Biology. 9, 1, 91. Abstract
Background. Olfactory Receptors (ORs) form the largest multigene family in vertebrates. Their evolution and their expansion in the vertebrate genomes was the subject of many studies. In this paper we apply a motif-based approach to this problem in order to uncover evolutionary characteristics. Results. We extract deterministic motifs from ORs belonging to ten species using the MEX (Motif Extraction) algorithm, thus defining Common Peptides (CPs) characteristic to ORs. We identify species-specific CPs and show that their relative abundance is high only in fish and frog, suggesting relevance to water-soluble odorants. We estimate the origins of CPs according to the tree of life and track the gains and losses of CPs through evolution. We identify major CP gain in tetrapods and major losses in reptiles. Although the number of human ORs is less than half of the number of ORs in other mammals, the fraction of lost CPs is only 11%. By examining the positions of CPs along the OR sequence, we find two regions that expanded only in tetrapods. Using CPs we are able to establish remote homology relations between ORs and non-OR GPCRs. Selecting CPs according to their evolutionary age, we bicluster ORs and CPs for each species. Clean biclustering emerges when using relatively novel CPs. Evolutionary age is used to track the history of CP acquisition in the collection of mammalian OR families within HORDE (Human Olfactory Receptor Data Explorer). Conclusion. The CP method provides a novel perspective that reveals interesting traits in the evolution of olfactory receptors. It is consistent with previous knowledge, and provides finer details. Using available phylogenetic trees, evolution can be rephrased in terms of CP origins.

2008

High-resolution copy-number variation map reflects human olfactory receptor diversity and evolution
Hasin Y., Olender T., Khen M. et_al. (2008) PLoS Genetics. 4, 11, e1000249. Abstract
Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (∼55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that ∼50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. 
Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.
Genome analysis of the platypus reveals unique signatures of evolution (Nature (2008) 453, (175-183))
Warren W. C., Hillier L. D. W., Marshall Graves J. A. et_al. (2008) Nature. 455, 7210, p. 256 Abstract
Update on the olfactory receptor (OR) gene superfamily.
Olender T., Lancet D. & Nebert D. W. (2008) Human Genomics. 3, 1, p. 87-97 Abstract
The olfactory receptor gene (OR) superfamily is the largest in the human genome. The superfamily contains 390 putatively functional genes and 465 pseudogenes arranged into 18 gene families and 300 subfamilies. Even members within the same subfamily are often located on different chromosomes. OR genes are located on all autosomes except chromosome 20, plus the X chromosome but not the Y chromosome. The gene:pseudogene ratio is lowest in human, higher in chimpanzee and highest in rat and mouse--most likely reflecting the greater need of olfaction for survival in the rodent than in the human. The OR genes undergo allelic exclusion, each sensory neurone expressing usually only one odourant receptor allele; the mechanism by which this phenomenon is regulated is not yet understood. The nomenclature system (based on evolutionary divergence of genes into families and subfamilies of the OR gene superfamily) has been designed similarly to that originally used for the CYP gene superfamily.
Why do young women smoke? V. Role of direct and interactive effects of nicotinic cholinergic receptor gene variation on neurocognitive function
Rigbi A., Kanyas K., Yakir A., Greenbaum L., Pollak Y., Ben-Asher E., Lancet D., Kertzman S. & Lerer B. (2008) Genes Brain And Behavior. 7, 2, p. 164-172 Abstract
Previous work suggests that young women who smoke cigarettes regularly, or did so in the past, manifest a neurocognitive profile that is characterized by small but significant impairments of response inhibition and attention. The present study sought to determine whether variation in nicotinic cholinergic receptor (nAchR) genes impacts upon cognitive function in these domains by overall or differential effects on the performance of current, former and non-smokers. The study sample consisted of 100 female college students, current or past smokers, and 144 who had never smoked. All performed a computerized neurocognitive test battery and were genotyped for 39 single nucleotide polymorphisms in 11 nAchR genes. The results, derived from linear or logistic regression, show significant direct and interactive relationships between single nucleotide polymorphisms and haplotypes in several nAchR genes and performance on the Matching Familiar Figures Test (MFFT), Stroop test, Continuous Performance Task (CPT) and Tower of London (TOL) test. Response inhibition (MFFT, Stroop, CPT Loading Phase, TOL) was associated with variants in CHRNA2, CHRNA4, CHRNA5, CHRNA7, CHRNA9, CHRNA10, CHRNB2 and CHRNB3. Selective attention (Stroop) was associated with CHRNA4, CHRNA5, CHRNA9 and CHRNB2. Sustained attention (CPT Boring Phase) was associated with CHRNA4, CHRNA5, CHRNA7, CHRNA10 and CHRNB3. Up to 37% of the variance among the smokers and up to 47% of the variance among the non-smokers on the test measures was explained. Differences between smokers and non-smokers in neurocognitive function, putatively implicated in susceptibility to nicotine dependence, may be modulated by variants in nAchR genes, with potential implications for prevention and treatment.
Genome-wide linkage scan, fine mapping, and haplotype analysis in a large, inbred, Arab Israeli pedigree suggest a schizophrenia susceptibility locus on chromosome 20p13
Teltsh O., Kanyas K., Karni O. et_al. (2008) American Journal Of Medical Genetics Part B-Neuropsychiatric Genetics. 147, 2, p. 209-215 Abstract
Linkage and association studies in schizophrenia have repeatedly drawn attention to several chromosomal regions and to genes within them. Conflicting patterns of association and the lack of a clear functional significance of the associated variants limit the interpretation of these results. The use of rare pedigrees, where genes with a major effect cause the disorder, has been proven beneficial in studies of other complex disorders. Our objective was to use this advantage by performing a genome wide linkage analysis for schizophrenia in a large, multiplex Israeli Arab pedigree. We genotyped 346 microsatellite markers in 24 pedigree members affected with schizophrenia spectrum disorders and 32 unaffected relatives. Two-point linkage analysis with SUPERLINK demonstrated a LOD score of 2.47 for D20S116 on chromosome 20p13 under an autosomal dominant mode of inheritance. Further fine mapping yielded a two-point LOD score of 2.56 for the adjacent marker D20S193 and narrowed down the linked region to 2-5 cM. A haplotype containing the markers D20S193, D20S889, and D20S116, 0.7 Mb in length, was found to be shared by most affected pedigree members. Genotyping of 43 SNPs in the interval supported these results with a multipoint LOD score of 2.7 around D20S193. We were also able to better define the boundaries of the shared haplotype which contains strong candidate genes for schizophrenia. Our study exemplifies the power of rare and unique pedigrees in drawing attention to novel regions for genetic studies of schizophrenia.
A multi-scaled approach to artificial life simulation with P Systems and Dissipative Particle Dynamics
Smaldon J., Blakes J., Krasnogor N. & Lancet D. (2008) GECCO'08. p. 249-256 Abstract
Compartmentalisation is thought to have been a crucial step in the origin of life. To help us bridge the gap between self-assembly processes behind the formation of bio-compartments and metabolic and information bearing processes we refer to DPD and P Systems Simulations. In this paper we outline a new software platform linking a high level abstract computational formalism (P Systems) with a molecular scale model (Dissipative Particle Dynamics) by linking the membranes which delimit the cellular regions within P Systems to self-assembled phospholipid based vesicles in DPD. We test the platform by modelling a passive transport process involving vesicles containing membrane inclusions similar to pore complexes such as α-hemolysin. In doing so, we illustrate the usefulness of the modelling approach and derive a more realistic parameter set for the P system through the dissipative particle dynamics simulation.

2007

Novel definition files for human GeneChips based on GeneAnnot
Ferrari F., Bortoluzzi S., Coppe A. et_al. (2007) BMC Bioinformatics. 8, 446. Abstract
Background: Improvements in genome sequence annotation revealed discrepancies in the original probeset/gene assignment in Affymetrix microarray and the existence of differences between annotations and effective alignments of probes and transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes matching transcripts from more than one gene and probes which do not match any transcribed sequence. Results: We developed a novel set of custom Chip Definition Files (CDF) and the corresponding Bioconductor libraries for Affymetrix human GeneChips, based on the information contained in the GeneAnnot database. GeneAnnot-based CDFs are composed of unique custom-probesets, including only probes matching a single gene. Conclusion: GeneAnnot-based custom CDFs solve the problem of a reliable reconstruction of expression levels and eliminate the existence of more than one probeset per gene, which often leads to discordant expression signals for the same transcript when gene differential expression is the focus of the analysis. GeneAnnot CDFs are freely distributed and fully compliant with Affymetrix standards and all available software for gene expression analysis. The CDF libraries are available from http://www.xlab.unimo.it/GA_CDF, along with supplementary information (CDF libraries, installation guidelines and R code, CDF statistics, and analysis results).
Genetic elucidation of human hyperosmia to isovaleric acid
Menashe I., Abaffy T., Hasin Y., Goshen S., Yahalom V., Luetje C. W. & Lancet D. (2007) PLoS Biology. 5, 11, p. 2462-2468 Abstract
The genetic basis of odorant-specific variations in human olfactory thresholds, and in particular of enhanced odorant sensitivity (hyperosmia), remains largely unknown. Olfactory receptor (OR) segregating pseudogenes, displaying both functional and nonfunctional alleles in humans, are excellent candidates to underlie these differences in olfactory sensitivity. To explore this hypothesis, we examined the association between olfactory detection threshold phenotypes of four odorants and segregating pseudogene genotypes of 43 ORs genome-wide. A strong association signal was observed between the single nucleotide polymorphism variants in OR11H7P and sensitivity to the odorant isovaleric acid. This association was largely due to the low frequency of homozygous pseudogenized genotype in individuals with specific hyperosmia to this odorant, implying a possible functional role of OR11H7P in isovaleric acid detection. This predicted receptor-ligand functional relationship was further verified using the Xenopus oocyte expression system, whereby the intact allele of OR11H7P exhibited a response to isovaleric acid. Notably, we also uncovered another mechanism affecting general olfactory acuity that manifested as a significant inter-odorant threshold concordance, resulting in an overrepresentation of individuals who were hyperosmic to several odorants. An involvement of polymorphisms in other downstream transduction genes is one possible explanation for this observation. Thus, human hyperosmia to isovaleric acid is a complex trait, contributed to by both receptor and other mechanisms in the olfactory signaling pathway.
Question 7: The first units of life were not simple cells
Norris V., Hunding A., Kepes F., Lancet D., Minsky A., Raine D., Root-Bernstein R. & Sriram K. (2007) Origins of Life and Evolution of Biospheres. 37, 4-5, p. 429-432 Abstract
Five common assumptions about the first cells are challenged by the pre-biotic ecology model and are replaced by the following propositions: firstly, early cells were more complex, more varied and had a greater diversity of constituents than modern cells; secondly, the complexity of a cell is not related to the number of genes it contains, indeed, modern bacteria are as complex as eukaryotes; thirdly, the unit of early life was an 'ecosystem' rather than a 'cell'; fourthly, the early cell needed no genes at all; fifthly, early life depended on non-covalent associations and on catalysts that were not confined to specific reactions. We present here the outlines of a theory that connects findings about modern bacteria with speculations about their origins.
Coevolution of compositional protocells and their environment
Shenhav B., Oz A. & Lancet D. (2007) Philosophical Transactions Of The Royal Society B-Biological Sciences. 362, 1486, p. 1813-1819 Abstract
The coevolution of environment and living organisms is well known in nature. Here, it is suggested that similar processes can take place before the onset of life, where protocellular entities, rather than full-fledged living systems, coevolve along with their surroundings. Specifically, it is suggested that the chemical composition of the environment may have governed the chemical repertoire generated within molecular assemblies, compositional protocells, while compounds generated within these protocells altered the chemical composition of the environment. We present an extension of the graded autocatalysis replication domain (GARD) model - the environment exchange polymer GARD (EE-GARD) model. In the new model, molecules, which are formed in a protocellular assembly, may be exported to the environment that surrounds the protocell. Computer simulations of the model using an infinite-sized environment showed that EE-GARD assemblies may assume several distinct quasi-stationary compositions (composomes), similar to the observations in previous variants of the GARD model. A statistical analysis suggested that the repertoire of composomes manifested by the assemblies is independent of time. In simulations with a finite environment, this was not the case. Composomes that were frequent in the early stages of the simulation disappeared, while others emerged. The change in the frequencies of composomes was found to be correlated with changes induced on the environment by the assembly. The EE-GARD model is the first GARD model to portray a possible time evolution of the composomes repertoire.
Pharmacogenetics of glatiramer acetate therapy for multiple sclerosis reveals drug-response markers
Grossman I., Avidan N., Singer C. et_al. (2007) Pharmacogenetics and Genomics. 17, 8, p. 657-666 Abstract
Genetic-based optimization of treatment prescription is becoming a central research focus in the management of chronic diseases, such as multiple sclerosis, which incur a prolonged drug-regimen adjustment. This study aimed to identify genetic markers that can predict response to glatiramer acetate (Copaxone) immunotherapy for relapsing multiple sclerosis. For this purpose, we genotyped fractional cohorts of two glatiramer acetate clinical trials for HLA-DRB1*1501 and 61 single nucleotide polymorphisms within a total of 27 candidate genes. Statistical analyses included single nucleotide polymorphism-by-single nucleotide polymorphism and haplotype tests of drug-by-genotype effects in drug-treated versus placebo-treated groups. We report the detection of a statistically significant association between glatiramer acetate response and a single nucleotide polymorphism in a T-cell receptor β (TRB@) variant replicated in the two independent cohorts (odds ratio=6.85). Findings in the Cathepsin S (CTSS) gene (P=0.049 corrected for all single nucleotide polymorphisms and definitions tested, odds ratio=11.59) in one of the cohorts indicate a possible association that needs to be further investigated. Additionally, we recorded nominally significant associations of response with five other genes, MBP, CD86, FAS, IL1R1 and IL12RB2, which are likely to be involved in glatiramer acetate's mode-of-action, both directly and indirectly. Each of these association signals in and of itself is consistent with the no-association null-hypothesis, but the number of detected associations is surprising vis-à-vis chance expectation. Moreover, the restriction of these associations to the glatiramer acetate-treated group, rather than the placebo group, clearly demonstrates drug-specific genetic effects. These findings provide additional progress toward development of pharmacogenetics-based personalized treatment for multiple sclerosis.
Association of the RGS2 gene with extrapyramidal symptoms induced by treatment with antipsychotic medication
Greenbaum L., Strous R. D., Kanyas K. et_al. (2007) Pharmacogenetics and Genomics. 17, 7, p. 519-528 Abstract
OBJECTIVES: To investigate the role of genes encoding regulators of G protein signaling in early therapeutic response to antipsychotic drugs and in susceptibility to drug-induced extrapyramidal symptoms. As regulators of G protein signaling and regulators of G protein signaling-like proteins play a pivotal role in dopamine receptor signaling, genetically based, functional variation could contribute to interindividual variability in therapeutic and adverse effects. METHODS: Consecutively hospitalized, psychotic patients with Diagnostic and Statistical Manual of Mental Disorders-IV schizophrenia (n=121) were included in the study if they received treatment with typical antipsychotic medication (n=72) or typical antipsychotic drugs and risperidone (n=49) for at least 2 weeks. Clinical state and adverse effects were rated at baseline and after 2 weeks. Twenty-four single nucleotide polymorphisms were genotyped in five regulators of G protein signaling genes. RESULTS: None of the single nucleotide polymorphisms were related to clinical response to antipsychotic treatment at 2 weeks. Five out of six single nucleotide polymorphisms within or flanking the RGS2 gene were nominally associated with development or worsening of parkinsonian symptoms (PARK+) as measured by the Simpson Angus Scale, one of them after correction for multiple testing (rs4606, P=0.002). A GCCTG haplotype encompassing tagging single nucleotide polymorphisms within and flanking RGS2 was significantly overrepresented among PARK+ compared with PARK- patients (0.23 vs. 0.08, P=0.003). A second, 'protective', GTGCA haplotype was significantly overrepresented in PARK- patients (0.13 vs. 0.30, P=0.009). Both haplotype associations survive correction for multiple testing. CONCLUSIONS: Subject to replication, these findings suggest that genetic variation in the RGS2 gene is associated with susceptibility to extrapyramidal symptoms induced by antipsychotic drugs.
Association of the dopamine receptor interacting protein gene, NEF3, with early response to antipsychotic medication
Strous R. D., Greenbaum L., Kanyas K. et_al. (2007) International Journal of Neuropsychopharmacology. 10, 3, p. 321-333 Abstract
Genetic variation in antipsychotic drug targets could underlie variability among patients in the time required for antipsychotic effects to be elicited. In a clinical, pharmacogenetic study we focused on the dopamine receptor interacting protein (DRIP) gene family. DRIPs are pivotally involved in regulating dopamine receptor signal transduction. Consecutively hospitalized, acutely psychotic patients with DSM-IV schizophrenia (n=121) were included in the study if they received treatment with typical antipsychotic medication (TYP, n=72) or TYP plus risperidone (TYP-R, n=49) for at least 2 wk. Clinical state and adverse effects were rated at baseline and after 2 wk. Patients improved significantly on both TYP and TYP-R with no significant difference between them. Early responders were defined as patients whose PANSS change scores were greater than the median. Twenty-two single nucleotide polymorphisms (SNPs) were analysed in five DRIP-encoding genes. Two SNPs in NEF3, which encodes the DRIP, neurofilament-medium (NF-M), were associated with early response (rs1457266, p=0.01; rs1379357, p=0.006). A 5 SNP haplotype spanning NEF3 was over-represented in early responders (p=0.015), in the combined patient group and in the TYP group alone. These findings suggest that variation in NEF3, most likely functional variants that are in linkage disequilibrium with the SNPs that we studied, influences rate of response to TYP. Since NEF3 is primarily associated with dopamine D₁ receptor function, the evidence for a complementary role of dopamine D₁ receptors in antipsychotic effects is considered. The findings reported here open an interesting research avenue in the pharmacogenetics of antipsychotic effects but require replication in larger samples treated in a controlled context.
Haplotype structure and selection of the MDM2 oncogene in humans
Atwal G. S., Bond G. L., Metsuyanim S. et_al. (2007) Proceedings of the National Academy of Sciences of the United States of America. 104, 11, p. 4524-4529 Abstract
The MDM2 protein is a ubiquitin ligase that plays a critical role in regulating the levels and activity of the p53 protein, which is a central tumor suppressor. A SNP in the human MDM2 gene (SNP309 T/G) occurs at frequencies dependent on demographic history and has been shown to have important differential effects on the activity of the MDM2 and p53 proteins and to associate with altered risk for the development of several cancers. In this report, the haplotype structure of the MDM2 gene is determined by using 14 different SNPs across the gene from three different population samples: Caucasians, African Americans, and the Ashkenazi Jewish ethnic group. The results presented in this report indicate that there is a substantially reduced variability of the deleterious SNP309 G allele haplotype in all three populations studied, whereas multiple common T allele haplotypes were found in all three populations. This observation, coupled with the relatively high frequency of the G allele haplotype in both Caucasian and Ashkenazi Jewish population data sets, suggests that this haplotype could have undergone a recent positive selection sweep. An entropy-based selection test is presented that explicitly takes into account the correlations between different SNPs, and the analysis of MDM2 reveals a significant departure from the standard assumptions of selective neutrality.
Molecular Recognition in Biology: Models for Analysis of Protein-Ligand Interactions
Lancet D., Horovitz A. & Katchalski-Katzir E. (2007) The Lock-and-Key Principle, The State of the Art--100 Years On. Vol. 1. p. 25-71 Abstract
Erratum: AHI1, a pivotal neurodevelopment gene, and C6orf217 are associated with susceptibility to schizophrenia (European Journal of Human Genetics (2006) vol. 14 (1111-1119) 10.1038/sj.ejhg.5201675)
Amann-Zalcenstein D., Avidan N., Kanyas K. et_al. (2007) European Journal of Human Genetics. 15, 3, p. 387 Abstract
Erratum: Loss of olfactory receptor genes coincides with the acquisition of full trichromatic vision in primates (PLoS Biology (2007) 2, 1, DOI: 10.1371/journal.pbio.0020005)
Gilad Y., Wiebe V., Przeworski M., Lancet D. & Pääbo S. (2007) PLoS Biology. 5, 6, p. 1383 Abstract
Search for hand osteoarthritis susceptibility locus on chromosome 6p12.3-p12.1
Jakowlev K., Livshits G., Kalichman L., Ben-Asher E., Malkin I., Lancet D. & Kobyliansky E. (2007) Human Biology. 79, 1, p. 1-14 Abstract
The existence of osteoarthritis susceptibility loci on chromosome 6 for individuals suffering from hip and knee osteoarthritis has been suggested. We determined whether radiographic hand osteoarthritis in a demographically homogeneous population of European origin can be linked to loci on chromosome 6p12.3-p12.1. Nine single nucleotide polymorphisms (SNPs) were genotyped in 764 individuals (members of 189 nuclear and more complex two- or three-generation families). Radiographic hand osteoarthritis was characterized by two traits: (1) the total individual osteoarthritis score (PC1-OA) and (2) the osteophytes score (PC1-OS), obtained from the principal components analysis of sums of the Kellgren and Lawrence grade and of the osteophyte grades, respectively, for 14 joints on each hand. The contribution of genetic and environmental factors and of covariates such as age and body mass index to hand osteoarthritis was evaluated by variance components analysis. The association between the studied traits and selected DNA markers was evaluated by three types of transmission disequilibrium tests. The parent-offspring and sib-sib correlations were statistically significant for all studied traits. The additive genetic effects for PC1-OA and PC1-OS were estimated to be 43% and 37.9%, respectively. Transmission disequilibrium tests consistently revealed a statistically significant association (p values ranged from 0.017 to 0.030) between SNP rs1508632 and PC1-OS. In the tested cohort the putative genetic factors are influential enough to determine interindividual differences regarding the extent of hand osteoarthritis. SNP rs1508632 lies in immediate proximity to the TINAG gene, implicating it as a possible hand osteoarthritis susceptibility gene.
Mutations in olfactory signal transduction genes are not a major cause of human congenital general anosmia
Feldmesser E., Bercovich D., Avidan N., Halbertal S., Haim L., Gross-Isseroff R., Goshen S. & Lancet D. (2007) Chemical Senses. 32, 1, p. 21-30 Abstract
Anosmia affects the western world population, mostly the elderly, reaching 5% in subjects over the age of 45 years and strongly lowering their quality of life. A smaller minority (about 0.01%) is born without a sense of smell, afflicted with congenital general anosmia (CGA). No causative genes for human CGA have been identified yet, except for some syndromic cases such as Kallmann syndrome. In mice, however, deletion of any of the 3 main olfactory transduction components (guanosine triphosphate-binding protein, adenylyl cyclase, and the cyclic adenosine monophosphate-gated channel) causes profound reduction of physiological responses to odorants. In an attempt to identify human CGA-related mutations, we performed whole-genome linkage analysis in affected families, but no significant linkage signals were observed, probably due to the small size of families analyzed. We further carried out direct mutation screening in the 3 main olfactory transduction genes in 64 unrelated anosmic individuals. No potentially causative mutations were identified, indicating that transduction gene variations underlie human CGA rarely and that mutations in other genes have to be identified. The screened genes were found to be under purifying selection, suggesting that they play a crucial functional role not only in olfaction but also potentially in additional pathways.

2006

The association of DNA sequence variation at the MAOA genetic locus with quantitative behavioural traits in normal males
Rosenberg S., Templeton A. R., Feigin P. D., Lancet D., Beckmann J. S., Selig S., Hamer D. H. & Skorecki K. (2006) Human Genetics. 120, 4, p. 447-459 Abstract
Monoamine oxidase A (MAOA) catalyses the oxidative deamination of biogenic amines including neurotransmitters, mainly norepinephrine and serotonin in the brain and peripheral tissues. A nonsense mutation in the gene was shown to be involved in a rare X-linked behavioural syndrome, which includes impaired impulse control, aggression and borderline mental retardation (Brunner syndrome). Several recent studies have shown the association of genetic variation of a VNTR in the gene promoter with various pathological behavioural traits. In the present study the association of MAOA genetic variation with a large set of quantitative behavioural traits in normal individuals has been examined. DNA samples from 421 unrelated males were genotyped for 14 SNPs and for the promoter VNTR at the MAOA locus. An additional 16 SNPs were genotyped at apparently neutral loci across the X chromosome to serve as a genomic control for possible false positive associations due to population structure. Behavioural traits were measured using the NEO psychometric questionnaire, which is based on a 5-axis model of personality, and consists of 30 different quantitative traits. There was a robust association of the A2 ("straightforwardness") facet with common allelic variants at the promoter VNTR. Most of the tested traits were not associated with the VNTR despite reasonable power, thus demonstrating that the VNTR influence on quantitative behavioural traits in normal males may be very specific. In contrast, several traits of the C ("conscientiousness") axis were associated with less common SNP-defined haplotypes. Hence, it appears that common genetic variation at the VNTR contributes to the behavioural attribute of "straightforwardness", while rare haplotypes defined by SNPs downstream of the transcription start site may contribute to "conscientiousness". This study is used to address the validation, interpretation and limitation of genetic association studies of quantitative behavioural traits.
Expoldb: Expression linked polymorphism database with inbuilt tools for analysis of expression and simple repeats
Sharma V. K., Sharma A., Kumar N. et_al. (2006) BMC Genomics. 7, 258. Abstract
Background: Quantitative variation in gene expression has been proposed to underlie phenotypic variation among human individuals. A facilitating step towards understanding the basis for gene expression variability is associating genome-wide transcription patterns with potential cis modifiers of gene expression. Description: EXPOLDB, a novel database, is a new effort addressing this need by providing information on the variability of gene expression levels across individuals, as well as the presence and features of potentially polymorphic (TG/CA)_n repeats. EXPOLDB thus enables associating transcription levels with the presence and length of (TG/CA)_n repeats. One of the unique features of this database is the display of expression data for 5 pairs of monozygotic twins, which allows identification of genes whose variability in expression is influenced by non-genetic factors including environment. In addition to queries by gene name, EXPOLDB allows for queries by a pathway name. Users can also upload their list of HGNC (HUGO (The Human Genome Organisation) Gene Nomenclature Committee) symbols for interrogating expression patterns. The online application 'SimRep' can be used to find simple repeats in a given nucleotide sequence. To help illustrate primary applications, case examples of housekeeping genes and the RUNX gene family, as well as one example of glycolytic pathway genes, are provided. Conclusion: The uniqueness of EXPOLDB is in facilitating the association of genome-wide transcription variations with the presence and type of polymorphic repeats while offering the feature for identifying genes whose expression variability is influenced by non-genetic factors including environment. In addition, the database allows comprehensive querying including functional information on biochemical pathways of the human genes.
AHI1, a pivotal neurodevelopmental gene, and C6orf217 are associated with susceptibility to schizophrenia
Amann-Zalcenstein D., Avidan N., Kanyas K. et_al. (2006) European Journal of Human Genetics. 14, 10, p. 1111-1119 Abstract
Schizophrenia, a severe neuropsychiatric disorder, is believed to involve multiple genetic factors. A significant body of evidence supports a pivotal role for abnormalities of brain development in the disorder. Linkage signals for schizophrenia map to human chromosome 6q. To obtain a finer localization, we genotyped 180 single nucleotide polymorphisms (SNPs) in a young, inbred Arab-Israeli family sample with a limited number of founders. The SNPs were mostly within a ∼7Mb region around the strong linkage peak at 136.2Mb that we had previously mapped. The most significant genetic association with schizophrenia for single SNPs and haplotypes was within a 500kb genomic region of high linkage disequilibrium (LD) at 135.85Mb. In a different, outbred, nuclear family sample that was not appropriate for linkage analysis, under-transmitted haplotypes incorporating the same SNPs (but not the individual SNPs) were significantly associated with schizophrenia. The implicated genomic region harbors the Abelson Helper Integration Site 1 (AHI1) gene, which showed the strongest association signal, and an adjacent, primate-specific gene, C6orf217. Mutations in human AHI1 underlie the autosomal recessive Joubert Syndrome with brain malformation and mental retardation. Previous comparative genomic analysis has suggested accelerated evolution of AHI1 in the human lineage. C6orf217 has multiple splice isoforms and is expressed in brain but does not seem to encode a functional protein. The two genes appear in opposite orientations and their regulatory upstream regions overlap, which might affect their expression. Both AHI1 and C6orf217 appear to be highly relevant candidate genes for schizophrenia.
Ancient genomic architecture for mammalian olfactory receptor clusters
Aloni R., Olender T. & Lancet D. (2006) Genome Biology. 7, 10, R88. Abstract
Background: Mammalian olfactory receptor (OR) genes reside in numerous genomic clusters of up to several dozen genes. Whole-genome sequence alignment nets of five mammals allow their comprehensive comparison, aimed at reconstructing the ancestral olfactory subgenome. Results: We developed a new and general tool for genome-wide definition of genomic gene clusters conserved in multiple species. Syntenic orthologs, defined as gene pairs showing conservation of both genomic location and coding sequence, were subjected to a graph theory algorithm for discovering CLICs (clusters in conservation). When applied to ORs in five mammals, including the marsupial opossum, more than 90% of the OR genes were found within a framework of 48 multi-species CLICs, invoking a general conservation of gene order and composition. A detailed analysis of individual CLICs revealed multiple differences among species, interpretable through species-specific genomic rearrangements and reflecting complex mammalian evolutionary dynamics. One significant instance involves CLIC #1, which lacks a human member, implying the human-specific deletion of an OR cluster, whose mouse counterpart has been tentatively associated with isovaleric acid odorant detection. Conclusion: The identified multi-species CLICs demonstrate that most of the mammalian OR clusters have a common ancestry, preceding the split between marsupials and placental mammals. However, only two of these CLICs were capable of incorporating chicken OR genes, parsimoniously implying that all other CLICs emerged subsequent to the avian-mammalian divergence.
A probabilistic classifier for olfactory receptor pseudogenes
Menashe I., Aloni R. & Lancet D. (2006) BMC Bioinformatics. 7, 393. Abstract
Background: Olfactory receptors (ORs), the largest mammalian gene superfamily (900-1400 genes), include >50% pseudogenes in humans. While most of these inactive genes are identified via coding frame (nonsense) disruptions, seemingly intact genes may also be inactive due to other deleterious (missense) mutations. An ultimate assessment of the actual size of the functional human OR repertoire thus requires an accurate distinction between genes and pseudogenes. Results: To characterize inactive ORs with intact open reading frame, we have developed a probabilistic Classifier for Olfactory Receptor Pseudogenes (CORP). This algorithm is based on deviations from a functionally crucial consensus, constituting sixty highly conserved positions identified by a comparison of two evolutionarily-constrained OR repertoires (mouse and dog) with a small pseudogene fraction. We used a logistic regression analysis to assign appropriate coefficients to the conserved positions, thus achieving maximal separation between active and inactive ORs. Consequently, the algorithm identified only 5% of the mouse functional ORs as pseudogenes, setting an upper limit of 0.05 to the false positive detection. Finally we used this algorithm to classify the 384 purportedly intact human OR genes. Of these, 135 were predicted as likely encoding non-functional proteins, and 38 were segregating between active and inactive forms due to missense polymorphisms. Conclusion: We demonstrated that the CORP algorithm is capable of distinguishing between functional and non-functional OR genes with high precision even when the encoded protein would differ by a single amino acid. Using the CORP algorithm, we predict that ∼70% of human OR genes are likely non-functional pseudogenes, a much higher number than hitherto suspected. The method we present may be employed for better annotation of inactive members in other gene families as well.
Variations in the human olfactory receptor pathway
Menashe I. & Lancet D. (2006) Cellular and Molecular Life Sciences. 63, 13, p. 1485-1493 Abstract
Of all five senses, olfaction is the most complex molecular mechanism, as it comprises hundreds of receptor proteins enabling it to detect and discriminate thousands of odorants. Until lately, the understanding of this highly sophisticated sensory neuronal pathway has been rather sketchy. The sequencing of the human genome and the consequent advent of new genomic tools have opened new opportunities to better understand this multifaceted biological system. Here, we present the relevant progress made in the last decade and highlight the possible genetic mechanisms of human olfactory variability.
ATM haplotypes and breast cancer risk in Jewish high-risk women
Koren M., Kimmel G., Ben-Asher E., Gal I., Papa M. Z., Beckmann J. S., Lancet D., Shamir R. & Friedman E. (2006) British Journal of Cancer. 94, 10, p. 1537-1543 Abstract
While genetic factors clearly play a role in conferring breast cancer risk, the contribution of ATM gene mutations to breast cancer is still unsettled. To shed light on this issue, ATM haplotypes were constructed using eight SNPs spanning the ATM gene region (142 kb) in ethnically diverse non-Ashkenazi Jewish controls (n = 118) and high-risk (n = 142) women. Of the 28 haplotypes noted, four were encountered in frequencies of 5% or more and accounted for 85% of all haplotypes. Subsequently, ATM haplotyping of high-risk, non-Ashkenazi Jews was performed on 66 women with breast cancer and 76 asymptomatic. One SNP (rs228589) was significantly more prevalent among breast cancer cases compared with controls (P = 4 × 10^-9), and one discriminative ATM haplotype was significantly more prevalent among breast cancer cases (33.3%) compared with controls (3.8%), (P ≤ 10^-10). There was no significant difference in the SNP and haplotype distribution between asymptomatic high-risk and symptomatic women as a function of disease status. We conclude that a specific ATM SNP and a specific haplotype are associated with increased breast cancer risk in high-risk non-Ashkenazi Jews.
Widespread ectopic expression of olfactory receptor genes
Feldmesser E., Olender T., Khen M., Yanai I., Ophir R. & Lancet D. (2006) BMC Genomics. 7, 121. Abstract
Background: Olfactory receptors (ORs) are the largest gene family in the human genome. Although they are expected to be expressed specifically in olfactory tissues, some ectopic expression has been reported, with special emphasis on sperm and testis. The present study systematically explores the expression patterns of OR genes in a large number of tissues and assesses the potential functional implication of such ectopic expression. Results: We analyzed the expression of hundreds of human and mouse OR transcripts, via EST and microarray data, in several dozens of human and mouse tissues. Different tissues had specific, relatively small OR gene subsets which had particularly high expression levels. In testis, average expression was not particularly high, and very few highly expressed genes were found, none corresponding to ORs previously implicated in sperm chemotaxis. Higher expression levels were more common for genes with a non-OR genomic neighbor. Importantly, no correlation in expression levels was detected for human-mouse orthologous pairs. Also, no significant difference in expression levels was seen between intact and pseudogenized ORs, except for the pseudogenes of subfamily 7E, which has undergone a human-specific expansion. Conclusion: The OR superfamily as a whole shows widespread, locus-dependent and heterogeneous expression, in agreement with a neutral or near neutral evolutionary model for transcription control. These results cannot reject the possibility that small OR subsets might play functional roles in different tissues; however, considerable care should be exercised when offering a functional interpretation for ectopic OR expression based only on transcription information.
Compositional complementarity and prebiotic ecology in the origin of life
Hunding A., Kepes F., Lancet D., Minsky A., Norris V., Raine D., Sriram K. & Root-Bernstein R. (2006) BioEssays. 28, 4, p. 399-412 Abstract
We hypothesize that life began not with the first self-reproducing molecule or metabolic network, but as a prebiotic ecology of co-evolving populations of macromolecular aggregates (composomes). Each composome species had a particular molecular composition resulting from molecular complementarity among environmentally available prebiotic compounds. Natural selection acted on composomal species that varied in properties and functions such as stability, catalysis, fission, fusion and selective accumulation of molecules from solution. Fission permitted molecular replication based on composition rather than linear structure, while fusion created composomal variability. Catalytic functions provided additional chemical novelty resulting eventually in autocatalytic and mutually catalytic networks within composomal species. Composomal autocatalysis and interdependence allowed the Darwinian co-evolution of content and control (metabolism). The existence of chemical interfaces within complex composomes created linear templates upon which self-reproducing molecules (such as RNA) could be synthesized, permitting the evolution of informational replication by molecular templating. Mathematical and experimental tests are proposed.
Why do young women smoke? I. Direct and interactive effects of environment, psychological characteristics and nicotinic cholinergic receptor genes
Greenbaum L., Kanyas K., Karni O. et_al. (2006) Molecular Psychiatry. 11, 3, p. 312-322 Abstract
Despite the health hazards, cigarette smoking is disproportionately frequent among young women. A significant contribution of genetic factors to smoking phenotypes is well established. Efforts to identify susceptibility genes do not generally take into account possible interaction with environment, life experience and psychological characteristics. We recruited 501 female Israeli students aged 20-30 years, obtained comprehensive background data and details of cigarette smoking and administered a battery of psychological instruments. Smoking initiators (n = 242) were divided into subgroups with high (n = 127) and low (n = 115) levels of nicotine dependence based on their scores on the Fagerstrom Tolerance Questionnaire and genotyped with noninitiators (n = 142) for single nucleotide polymorphisms (SNPs) in 11 nicotinic cholinergic receptor genes. We found nominally significant (P -14, Nagelkerke r² = 0.30). For severity of nicotine dependence, two SNPs in CHRNA7 (rs1909884 and rs883473), one SNP in CHRNA5 (rs680244) and the interaction of a SNP in CHRNA7 (rs2337980) with neuroticism, were included in the model (P = 2.24 × 10^-7, Nagelkerke r² = 0.40). These findings indicate that background factors, psychological characteristics and genetic variation in nicotinic cholinergic receptors contribute independently or interactively to smoking initiation and to severity of nicotine dependence in young women.
The trace amine receptor 4 gene is not associated with schizophrenia in a sample linked to chromosome 6q23
Amann D., Avidan N., Kanyas K. et_al. (2006) Molecular Psychiatry. 11, 2, p. 119-121 Abstract
Genetic basis of olfactory deficits
Menashe I., Feldmesser E. & Lancet D. (2006) Genomic Disorders. Lupski J. R. & Stankiewicz P. (eds.). p. 101-113 Abstract
The completion of the human genome sequencing has opened new opportunities to better understand complex biological systems. In this realm, the human sense of smell is an excellent example of how genome analysis provides new information on genome organization and on deficits. Before the advent of genomic tools, the understanding of this highly sophisticated sensory neuronal pathway has been rather sketchy. In this chapter we summarize the relevant progress made in the last decade, and highlight the initial elucidation of two classes of olfactory deficits and their possible underlying genetic mechanisms.

2005

Assessing natural variations in gene expression in humans by comparing with monozygotic twins using microarrays
Sharma A., Sharma V., Horn-Saban S., Lancet D., Ramachandran S. & Brahmachari S. (2005) Physiological Genomics. 21, p. 117-123 Abstract
Quantitative variation in gene expression in humans is the outcome of various factors, including differences in genetic background, gender, age, and environment. However, the extent of the influence of these factors on gene expression is not clear. We attempted to address this issue by carrying out gene expression profiling in blood leukocytes with 13 individuals (including 5 pairs of monozygotic twins) on 10,000 genes using HG-U95Av2 oligonucleotide microarrays. The proportion of differentially expressed genes between monozygotic twins was low (up to 1.76%). Most of the variations belonged to the least variable category. These genes, exhibiting "random variations," did not show clear preference to any functional class, although "signaling and communication" and "immune and related functions" generally topped the list. The extent of variation in gene expression increased in comparisons between unrelated individuals (up to 14.13%). Most of the genes (89%) exhibiting random variations in twins also varied in expression in unrelated individuals. As with twins, signaling and communication topped the list, and substantial variations were observed in all three categories: least variable, moderately variable, and most variable. An important outcome of this study was that the housekeeping genes were nearly insensitive to random variations but appeared to be more susceptible to genetic differences. However, the highly expressed housekeeping genes exhibited low variation and appeared to be insensitive to all known factors. Gene expression profiling in monozygotic twins can provide useful data for the assessment of natural variation in gene expression in humans.
Alternative splicing and gene duplication are inversely correlated evolutionary mechanisms
Kopelman N., Lancet D. & Yanai I. (2005) Nature Genetics. 37, 6, p. 588-589 Abstract
Gene duplication and alternative splicing are distinct evolutionary mechanisms that provide the raw material for new biological functions. We explored their relationships in human and mouse and found an inverse correlation between the size of a gene's family and its use of alternatively spliced isoforms. A cross-organism analysis suggests that selection for genome-wide genic proliferation might be interchangeably met by either evolutionary mechanism.
LEMD3: The gene responsible for bone density disorders (Osteopoikilosis)
Ben-Asher E., Zelzer E. & Lancet D. (2005) Israel Medical Association Journal. 7, 4, p. 273-274 Abstract
Polymer GARD: Computer simulation of covalent bond formation in reproducing molecular assemblies
Shenhav B., Bar-Even A., Kafri R. & Lancet D. (2005) Origins of Life and Evolution of the Biosphere. 35, 2, p. 111-133 Abstract
The basic Graded Autocatalysis Replication Domain (GARD) model consists of a repertoire of small molecules, typically amphiphiles, which join and leave a non-covalent micelle-like assembly. Its replication behavior is due to occasional fission, followed by a homeostatic growth process governed by the assembly's composition. Limitations of the basic GARD model are its small finite molecular repertoire and the lack of a clear path from a 'monomer world' towards polymer-based living entities. We have now devised an extension of the model (polymer GARD or P-GARD), where a monomer-based GARD serves as a 'scaffold' for oligomer formation, as a result of internal chemical rules. We tested this concept with computer simulations of a simple case of monovalent monomers, whereby more complex molecules (dimers) are formed internally, in a manner resembling biosynthetic metabolism. We have observed events of dimer 'take-over' - the formation of compositionally stable, replication-prone quasi stationary states (composomes) that have appreciable dimer content. The appearance of novel metabolism-like networks obeys a time-dependent power law, reminiscent of evolution under punctuated equilibrium. A simulation under constant population conditions shows the dynamics of takeover and extinction of different composomes, leading to the generation of different population distributions. The P-GARD model offers a scenario whereby biopolymer formation may be a result of rather than a prerequisite for early life-like processes.
Modular genes with metazoan-specific domains have increased tissue specificity
Cohen-Gihon I., Lancet D. & Yanai I. (2005) Trends in Genetics. 21, 4, p. 210-213 Abstract
We have systematically examined the domain composition across a comprehensive set of tissue-specific, midrange and housekeeping genes as defined by their mode of expression in 52 normal mouse tissues. We show a definite correlation between the number of domains and the degree of tissue specificity. This trend is further supported by a novel analysis involving the time of origin of each domain. Genes containing metazoan-specific domains are more prevalent in signal transduction and cell-communication pathways, and are depleted in primary metabolism. Our analyses suggest that highly modular gene products have been recruited for tissue-specific functions that are required in complex organisms.
Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification
Yanai I., Benjamin H., Shmoish M. et_al. (2005) Bioinformatics. 21, 5, p. 650-659 Abstract
Motivation: Genes are often characterized dichotomously as either housekeeping or single-tissue specific. We conjectured that crucial functional information resides in genes with midrange profiles of expression. Results: To obtain such novel information genome-wide, we have determined the mRNA expression levels for one of the largest hitherto analyzed sets of 62839 probesets in 12 representative normal human tissues. Indeed, when using a newly defined graded tissue specificity index τ, valued between 0 for housekeeping genes and 1 for tissue-specific genes, genes with midrange profiles having 0.15 < τ < 0.85 were found to constitute >50% of all expression patterns. We developed a binary classification, indicating for every gene the l_B tissues in which it is overly expressed, and the 12 - l_B tissues in which it shows low expression. The 85 dominant midrange patterns with l_B = 2-11 were found to be bimodally distributed, and to contribute most significantly to the definition of tissue specification dendrograms. Our analyses provide a novel route to infer expression profiles for presumed ancestral nodes in the tissue dendrogram. Such definition has uncovered an unsuspected correlation, whereby de novo enhancement and diminution of gene expression go hand in hand. These findings highlight the importance of gene suppression events, with implications to the course of tissue specification in ontogeny and phylogeny.
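Illustrative sketch (not part of the abstract): the graded index τ described above is commonly written as τ = Σ(1 - x_i) / (N - 1), where x_i is a gene's expression in tissue i divided by its maximal expression across the N tissues. The Python below assumes that form. The function names and the default midrange cutoffs are hypothetical choices for this example, and the original study may have preprocessed expression values differently.

```python
from typing import Sequence


def tissue_specificity_tau(expression: Sequence[float]) -> float:
    """Graded tissue specificity index for one gene across N tissues.

    Assumed form: tau = sum(1 - x_i / max(x)) / (N - 1).
    Returns 0 for uniform (housekeeping-like) expression and 1 when the
    gene is expressed in a single tissue only.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("expression must have a positive maximum")
    # Normalize each tissue to the gene's peak level, then average the
    # shortfall from the peak over the remaining N - 1 tissues.
    return sum(1.0 - x / peak for x in expression) / (n - 1)


def is_midrange(tau: float, low: float = 0.15, high: float = 0.85) -> bool:
    """True when tau lies strictly between the two cutoffs (illustrative defaults)."""
    return low < tau < high
```

For example, a gene with expression levels 10, 5 and 0 in three tissues has normalized values 1, 0.5 and 0, giving τ = (0 + 0.5 + 1) / 2 = 0.75, which falls in the midrange under the default cutoffs.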
Genotype phenotype correlations in Israeli colorectal cancer patients
Starinsky S., Figer A., Ben-Asher E., Geva R., Flex D., Fidder H., Zidan J., Lancet D. & Friedman E. (2005) International Journal of Cancer. 114, 1, p. 58-73 Abstract
While genetic factors clearly play a key role in colorectal cancer (CRC) pathogenesis and in determining its phenotypic features, the precise genes that are involved are largely unknown. To gain insight into these genes, consecutive Israeli CRC patients were genotyped using SNPs from within candidate genes: APC, β-Catenin, K-RAS, DCC, P16, PTEN, RB1, P15, APOE, ERCC2, P53, MTHFR and hMSH2. Genotyping of consecutive, unselected colorectal cancer patients was done mostly by utilizing the MassARRAY technology (Sequenom) and to a lesser extent DGGE, ARMS and direct DNA sequencing. Correlation of genotypes with specific phenotypic features was carried out for all patients and separately for the Ashkenazim. Overall, 456 patients were analyzed, the majority (64.25%) being of Ashkenazi origin; mean age at diagnosis was 65.6 ± 14 (range 25-90 years), and the mean follow-up was 4.7 ± 0.28 (range 0-30 years). Statistically significant associations were noted between SNPs in β-catenin and APOE and a positive family history of cancer (β-catenin: p=0.034, APOE: p=0.033); tumor location and a DCC SNP (p=0.038) and the P53 R72P mutation and survival (p=0.0336). In Ashkenazi patients, ERCC2 and MTHFR genes' SNPs were associated with age at diagnosis (ERCC2: p=0.025, MTHFR: p=0.0005); a P53 polymorphism, APOE and Rb SNPs with a family history of cancer (P53 p=0.034; APOE p=0.04, Rb p=0.022); DCC SNP with tumor location (p=0.014); and p15 SNP with tumor grade (p=0.032). This preliminary study shows that genetic factors play a role in determining CRC phenotypic features and that a larger cohort with longer follow-up is clearly needed.
GeneTide - Terra Incognita Discovery Endeavor: A new transcriptome focused member of the GeneCards/GeneNote suite of databases
Shklar M., Strichman-Almashanu L., Shmueli O., Shmoish M., Safran M. & Lancet D. (2005) Nucleic Acids Research. 33, DATABASE ISS., p. D556-D561 Abstract
GeneCards® is an automatically mined database of human genes that strives to create, along with its auxiliary databases - GeneLoc, GeneNote and GeneAnnot - the most inclusive resource of gene-centered information of the human genome. GeneTide, the Gene Terra Incognita Discovery Endeavor (http://genecards.weizmann.ac.il/genetide/), the newest addition to this family, is a transcriptome-focused database which aims to enhance GeneCards with additional expressed sequence tag (EST)-based genes. This is achieved by comprehensively mapping >85% of the ∼5.6 million human ESTs currently available at dbEST to known genes by means of data mining and integration of genomic resources including UniGene, DoTS, AceView and in-house resources. GeneTide thus creates comprehensive links between ESTs and GeneCards genes. Furthermore, groups of unassociated transcripts serve as a basis for defining novel EST-based GeneCards Candidates (EGCs). These EGCs, nearly 25 000 of which were defined in version 0.3 of GeneTide, are further annotated with various parameters, including splicing evidence and expression data extracted from the GeneNote database, to determine their validity as possible de novo genes.
Early systems biology and prebiotic networks
Shenhav B., Solomon A., Lancet D. & Kafri R. (2005) Transactions On Computational Systems Biology I. p. 14-27 Abstract
Systems Biology constitutes tools and approaches aimed at deciphering complex biological entities. It is assumed that such complexity arose gradually, beginning from a few relatively simple molecules at life's inception, and culminating with the emergence of composite multicellular organisms billions of years later. The main point of the present paper is that very early in the evolution of life, molecular ensembles with high complexity may have arisen, which are best described and analyzed by the tools of Systems Biology. We show that modeled prebiotic mutually catalytic pathways have network attributes similar to those of present-day living cells. This includes network motifs and robustness attributes. We point out that early networks are weighted (graded), but that using a cutoff formalism one may probe their degree distribution and show that it approximates that of a random network. A question is then posed regarding the potential evolutionary mechanisms that may have led to the emergence of scale-free networks in modern cells.
Conservation anchors in the vertebrate genome
Aloni R. & Lancet D. (2005) Genome Biology. 6, 7, 115. Abstract
Genomic segments that do not code for proteins yet show high conservation among vertebrates have recently been identified by various computational methodologies. We refer to them as ANCORs (ancestral non-coding conserved regions). The frequency of individual ANCORs within the genome, along with their (correlated) interspecies identity scores, helps in assessing the probability that they function in transcription regulation or RNA coding.

2004

NIPBL gene responsible for Cornelia de Lange syndrome, a severe developmental disorder
Ben-Asher E. & Lancet D. (2004) Israel Medical Association Journal. 6, 9, p. 571-572 Abstract
Genomic profiling of interpopulation diversity guides prioritization of candidate-genes for autoimmunity
Grossman I., Avidan N., Singer C., Paperna T., Lancet D., Beckmann J. & Miller A. (2004) Genes and Immunity. 5, 6, p. 493-504 Abstract
Autoimmune diseases seem to have strong genetic attributes, and are affected to some extent by shared susceptibility loci. The latter potentially amount to hundreds of candidate genes (CG), creating the need for a prioritization strategy in genetic association studies. To form such a strategy, 26 autoimmune-related CG were genotyped for a total of 72 single nucleotide polymorphisms (SNPs) in three distinct Israeli ethnic populations: Ashkenazi Jews, Sephardic Jews and Arabs. Four quantitative criteria reflecting population stratification were analyzed: allele frequencies, haplotype frequencies, the F_st statistic for homozygotes distribution and linkage disequilibrium extents. According to the consequent interpopulation genomic diversity profiles, the genes were classified into conserved, intermediate and diversified gene groups. Our results demonstrate a correlation between the biological role of autoimmune-related CG and their interpopulation diversity profiles as classified by the different analyses. Annotation analysis suggests that genes more readily influenced by environmental conditions, such as immunological mediators, are 'population specific'. Conversely, genes showing genetic conservation across all populations are characterized by apoptotic and cleaving functions. We suggest a research strategy by which CG association studies should focus first on likely conserved gene categories, to increase the likelihood of attaining significant results and promote the development of gene-based therapies.
Is the G72/G30 locus associated with schizophrenia? Single nucleotide polymorphisms, haplotypes, and gene expression analysis
Korostishevsky M., Kaganovich M., Cholostoy A. et_al. (2004) Biological Psychiatry. 56, 3, p. 169-176 Abstract
Background The genes G72/G30 were recently implicated in schizophrenia in both Canadian and Russian populations. We hypothesized that 1) polymorphic changes in this gene region might be associated with schizophrenia in the Ashkenazi Jewish population and that 2) changes in G72/G30 gene expression might be expected in schizophrenic patients compared with control subjects. Methods Eleven single nucleotide polymorphisms (SNPs) encompassing the G72/G30 genes were typed in the genomic deoxyribonucleic acid (DNA) from 60 schizophrenic patients and 130 matched control subjects of Ashkenazi ethnic origin. Case-control comparisons were based on linkage disequilibrium (LD) and haplotype frequency estimations. Gene expression analysis of G72 and G30 was performed on 88 postmortem dorsolateral prefrontal cortex samples. Results Linkage disequilibrium analysis revealed two main SNP blocks. Haplotype analysis on block II, containing three SNPs external to the genes, demonstrated an association with schizophrenia. Gene expression analysis exhibited correlations between expression levels of the G72 and G30 genes, as well as a tendency toward overexpression of the G72 gene in schizophrenic brain samples of 44 schizophrenic patients compared with 44 control subjects. Conclusions It is likely that the G72/G30 region is involved in susceptibility to schizophrenia in the Ashkenazi population. The elevation in expression of the G72 gene coincides with the glutamatergic theory of schizophrenia.
A new gene for the Charcot-Marie-Tooth disorder
Ben-Asher E. & Lancet D. (2004) Israel Medical Association Journal. 6, 6, p. 376-377 Abstract
Recent innovations in the GeneCards suite
Safran A. & Lancet D. (2004) Briefings in Bioinformatics. 5, 2, p. 204-205 Abstract
GeneAnnot: Comprehensive two-way linking between oligonucleotide array probesets and GeneCards genes
Chalifa-Caspi V., Yanai I., Ophir R. et_al. (2004) Bioinformatics. 20, 9, p. 1457-1458 Abstract
Motivation: High density oligonucleotide arrays are usually annotated in a one-to-one fashion, with each probeset assigned to one gene. However, in reality, subsets of oligonucleotides in a probeset may match sequences within more than one gene, potentially leading to misinterpretations. Moreover, a gene is often represented by more than one probeset, and analyzing probe matches at the mRNA level can help one deduce whether these probesets are derived from the same or different splice variants. Results: The GeneAnnot system comprehensively documents the many-to-many relationship between oligonucleotide array probesets and annotated genes in GeneCards™. It performs pairwise alignments between the probe sequences and gene transcripts, and assigns sensitivity and specificity scores to each probeset/gene pair.
5-lipoxygenase activating protein (ALOX5AP): Association with cardiovascular infarction and stroke
Ben-Asher E. & Lancet D. (2004) Israel Medical Association Journal. 6, 5, p. 318-319 Abstract
The canine olfactory subgenome
Olender T., Fuchs T., Linhart C., Shamir R., Adams M., Kalush F., Khen M. & Lancet D. (2004) Genomics. 83, 3, p. 361-372 Abstract
We identified 971 olfactory receptor (OR) genes in the dog genome, estimated to constitute ∼80% of the canine OR repertoire. This was achieved by directed genomic DNA cloning of olfactory sequence tags as well as by mining the Celera canine genome sequences. The dog OR subgenome is estimated to have 12% pseudogenes, suggesting a functional repertoire similar to that of mouse and considerably larger than for humans. No novel OR families were discovered, but as many as 34 gene subfamilies were unique to the dog. "Fish-like" Class I ancient ORs constituted 18% of the repertoire, significantly more than in human and mouse. A set of 122 dog-human-mouse ortholog triplets was identified, with a relatively high fraction of Class I ORs. The elucidation of a large portion of the canine olfactory receptor gene superfamily, with some dog-specific attributes, may help us understand the unique chemosensory capacities of this species.
Prospects of a computational origin of life endeavor
Shenhav B. & Lancet D. (2004) Origins of Life and Evolution of the Biosphere. 34, 1-2, p. 181-194 Abstract
While the last century brought an exquisite understanding of the molecular basis of life, very little is known about the detailed chemical mechanisms that afforded the emergence of life on early earth. There is a broad agreement that the problem lies in the realm of chemistry, and likely resides in the formation and mutual interactions of carbon-based molecules in aqueous medium. Yet, present-day experimental approaches can only capture the synthesis and behavior of a few molecule types at a time. On the other hand, experimental simulations of prebiotic syntheses, as well as chemical analyses of carbonaceous meteorites, suggest that the early prebiotic hydrosphere contained many thousands of different compounds. The present paper explores the idea that given the limitations of test-tube approaches with regards to such a 'random chemistry' scenario, an alternative mode of analysis should be pursued. It is argued that as computational tools for the reconstruction of molecular interactions improve rapidly, it may soon become possible to perform adequate computer-based simulations of prebiotic evolution. We thus propose to launch a computational origin of life endeavor (http://ool.weizmann.ac.il/CORE), involving computer simulations of realistic complex prebiotic chemical networks. In the present paper we provide specific examples, based on a novel algorithmic approach, which constitutes a hybrid of molecular dynamics and stochastic chemistry. As one potential solution for the immense hardware requirements dictated by this approach, we have begun to implement an idle CPU harvesting scheme, under the title ool@home.
The lipid world: From catalytic and informational headgroups to micelle replication and evolution without nucleic acids
Bar-Even A., Shenhav B., Kafri R. & Lancet D. (2004) Life In The Universe: From The Miller Experiment To The Search For Life On Other Worlds. Vol. 7. p. 111-114 Abstract
A widespread notion is that life arose from a single molecular replicator, probably a self-copying polynucleotide, in an RNA World (Joyce, 2002). We have proposed an alternative Lipid World scenario as an early evolutionary step in the emergence of cellular life on Earth (Segre et al., 2001). This concept combines the potential chemical activities of lipids and other amphiphiles, with their capacity to undergo spontaneous self-organization into supramolecular structures, such as micelles and bilayers. In quantitative, chemically-realistic computer simulations of our Graded Autocatalysis Replication Domain (GARD) model (Segre et al., 1998), we have shown that prebiotic molecular networks, potentially existing within assemblies of lipid-like molecules, manifest a behavior similar to self reproduction or self-replication.
Graded artificial chemistry in restricted boundaries
Shenhav B., Kafri R. & Lancet D. (2004) Artificial Life IX: Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems. p. 501-506 Abstract
The question of the origin of life is addressed by artificial life research, particularly in the realm of artificial chemistry. Such artificial chemistry is described by our Graded Autocatalysis Replication Domain (GARD) model. GARD depicts an unorthodox scenario suggested for emergence of life - the 'lipid world'. The model concerns molecular assemblies with mutual catalysis in an environment containing a plethora of molecular species. Many aspects of GARD were amply discussed. Here we concentrate on the importance of size constraints as depicted by the basic model and several of its variants. Occasional fission of a GARD assembly, which restricts the assembly size, is crucial for generating compositional quasi-stationary states ('composomes'). In a spatial version of GARD, bounded environments yield spontaneous emergence of different ecologies. Limiting the size of a population of GARD assemblies gives rise to a complex population dynamics. The last example, with possible wider impact to chemistry and nano-technology, suggests that size limit can give rise to spontaneous symmetry breaking. This latter result is compared to the classic Frank's model for homo-chirality, which requires explicit inhibition. We conclude that size restrictions are fundamental in the field of origin of life and artificial life, not only in order to facilitate evolutionary processes, as previously suggested, but also, for augmenting the dynamics portrayed by different scenarios and models.
Probability rule for chiral recognition
Kafri R. & Lancet D. (2004) Chirality. 16, 6, p. 369-378 Abstract
Molecular chirality is of central interest in biological studies because enantiomeric compounds, while indistinguishable by most inanimate systems, show profoundly different properties in biochemical environments. Enantioselective separation methods, based on the differential recognition of two optical isomers by a chiral selector, have been amply documented. Also, great effort has been directed towards a theoretical understanding of the fundamental mechanisms underlying the chiral recognition process. Here we report a comprehensive data examination of enantioseparation measurements for over 72,000 chiral selector-selectand pairs from the chiral selection compendium CHIRBASE. The distribution of α = k_D/k_L values was found to follow a power law, equivalent to an exponential decay for chiral differential free energies. This observation is experimentally relevant in terms of the number of different individual or combinatorial selectors that need to be screened in order to observe α values higher than a preset minimum. A string model for enantiorecognition (SMED) formalism is proposed to account for this observation on the basis of an extended Ogston three-point interaction model. Partially overlapping molecular interaction domains are analyzed in terms of a string complementarity model for ligand-receptor complementarity. The results suggest that chiral selection statistics may be interpreted in terms of more general concepts related to biomolecular recognition.
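Illustrative note (not part of the abstract): the stated equivalence between a power law in α and an exponential decay in the differential free energy can be seen with a change of variables. The sketch below assumes α ≥ 1, a power-law exponent γ > 1 (a hypothetical symbol, not a value reported in the paper), and the standard relation between a selectivity ratio and the magnitude of the free energy difference.

```latex
% Assumptions: \alpha \ge 1, \gamma > 1, and \Delta\Delta G = RT \ln \alpha
\begin{align}
  p(\alpha) &\propto \alpha^{-\gamma}, \qquad
  \alpha = e^{\Delta\Delta G / RT}, \qquad
  \frac{d\alpha}{d(\Delta\Delta G)} = \frac{\alpha}{RT} \\
  p(\Delta\Delta G) &= p(\alpha)\,\frac{d\alpha}{d(\Delta\Delta G)}
  \;\propto\; \alpha^{-\gamma}\,\alpha
  \;=\; \exp\!\left[-(\gamma - 1)\,\frac{\Delta\Delta G}{RT}\right]
\end{align}
```

Under these assumptions, a power law with exponent γ in α corresponds to an exponential decay in ΔΔG with decay constant (γ - 1)/RT.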
Prediction of the odorant binding site of olfactory receptor proteins by human-mouse comparisons
Man O., Gilad Y. & Lancet D. (2004) Protein Science. 13, 1, p. 240-254 Abstract
Olfactory receptors (ORs) are a large family of proteins involved in the recognition and discrimination of numerous odorants. These receptors belong to the G-protein coupled receptor (GPCR) hyperfamily, for which little structural data are available. In this study we predict the binding site residues of OR proteins by analyzing a set of 1441 OR protein sequences from mouse and human. The central insight utilized is that functional contact residues would be conserved among pairs of orthologous receptors, but considerably less conserved among paralogous pairs. Using judiciously selected subsets of 218 ortholog pairs and 518 paralog pairs, we have identified 22 sequence positions that are both highly conserved among the putative orthologs and variable among paralogs. These residues are disposed on transmembrane helices 2 to 7, and on the second extracellular loop of the receptor. Strikingly, although the prediction makes no assumption about the location of the binding site, these amino acid positions are clustered around a pocket in a structural homology model of ORs, mostly facing the inner lumen. We propose that the identified positions constitute the odorant binding site. This conclusion is supported by the observation that all but one of the predicted binding site residues correspond to ligand-contact positions in other rhodopsin-like GPCRs.
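Illustrative sketch (not part of the abstract): the selection criterion described above, positions that are conserved within ortholog pairs yet variable within paralog pairs, can be expressed as a per-column filter over aligned sequence pairs. The Python below is a minimal sketch assuming pre-aligned, equal-length sequences. The function names, the identity-fraction score and both thresholds are hypothetical choices for this example and are not the scoring scheme used in the paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # two aligned sequences of equal length


def identity_fraction(pairs: Sequence[Pair], pos: int) -> float:
    """Fraction of pairs whose two residues at `pos` are identical (gaps never match)."""
    same = sum(1 for a, b in pairs if a[pos] == b[pos] and a[pos] != "-")
    return same / len(pairs)


def candidate_positions(
    orthologs: Sequence[Pair],
    paralogs: Sequence[Pair],
    min_ortho: float = 0.9,   # hypothetical conservation threshold
    max_para: float = 0.5,    # hypothetical variability threshold
) -> List[int]:
    """Alignment columns conserved among ortholog pairs but variable among paralog pairs."""
    length = len(orthologs[0][0])
    return [
        pos
        for pos in range(length)
        if identity_fraction(orthologs, pos) >= min_ortho
        and identity_fraction(paralogs, pos) <= max_para
    ]
```

With two toy ortholog pairs ("ACDE", "ACDE") and ("GCHE", "GCHE"), and two toy paralog pairs ("ACDE", "GCHE") and ("ACDE", "ACKE"), columns 0 and 2 pass the filter: both are identical within every ortholog pair but differ in at least half of the paralog pairs.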
Loss of olfactory receptor genes coincides with the acquisition of full trichromatic vision in primates
Gilad Y., Wiebel V., Przeworski M., Lancet D. & Paabo S. (2004) PLoS Biology. 2, 1, Abstract
Olfactory receptor (OR) genes constitute the molecular basis for the sense of smell and are encoded by the largest gene family in mammalian genomes. Previous studies suggested that the proportion of pseudogenes in the OR gene family is significantly larger in humans than in other apes and significantly larger in apes than in the mouse. To investigate the process of degeneration of the olfactory repertoire in primates, we estimated the proportion of OR pseudogenes in 19 primate species by surveying randomly chosen subsets of 100 OR genes from each species. We find that apes, Old World monkeys and one New World monkey, the howler monkey, have a significantly higher proportion of OR pseudogenes than do other New World monkeys or the lemur (a prosimian). Strikingly, the howler monkey is also the only New World monkey to possess full trichromatic vision, along with Old World monkeys and apes. Our findings suggest that the deterioration of the olfactory repertoire occurred concomitant with the acquisition of full trichromatic color vision in primates.
The olfactory receptor universe - From whole genome analysis to structure and evolution
Olender T., Feldmesser E., Atarot T., Eisenstein M. & Lancet D. (2004) Genetics and Molecular Research. 3, 4, p. 545-553 Abstract
Olfactory receptors (ORs) constitute the largest gene-family in the vertebrate genome. We have attempted to provide a comprehensive view of the OR universe through diverse tools of bioinformatics and computational biology. Among others, we have constructed the Human Olfactory Receptor Data Exploratorium (HORDE, http://bioportal.weizmann.ac.il/HORDE/) as a free online resource, which integrates information on ORs from different species. We studied the genomic organization of 853 human ORs and divided the repertoire into 135 clusters, accessible through our new cluster viewer feature. An analysis of intact and pseudogenized ORs in different clusters, as well as of OR expression patterns, is provided, relevant to OR transcription control. Coding single nucleotide polymorphisms were integrated; these are to be used for genotype-phenotype correlation studies. HORDE allows a unique opportunity for discerning protein structural and functional information of the individual OR proteins. By applying novel data analysis strategies to the >3000 OR genes of mouse, dog and human within HORDE, we have generated a set of refined rhodopsin-based homology models for ORs. For model improvement, we employed a novel analysis of specific positions along the seven transmembrane helices at which prolines generate helix-breaking kinks. The model shows family-specific structural features, including idiosyncratic kink patterns, which lead to significant differences in the inferred odorant binding site structure. Such analyses form a basis for a comprehensive sequence-based classification of OR proteins in terms of potential odorant binding specificities.

2003

Natural selection on the olfactory receptor gene family in humans and chimpanzees
Gilad Y., Bustamante C., Lancet D. & Paabo S. (2003) American Journal of Human Genetics. 73, 3, p. 489-501 Abstract
The olfactory receptor (OR) genes constitute the largest gene family in mammalian genomes. Humans have >1,000 OR genes, of which only ∼40% have an intact coding region and are therefore putatively functional. In contrast, the fraction of intact OR genes in the genomes of the great apes is significantly greater (68%-72%), suggesting that selective pressures on the OR repertoire vary among these species. We have examined the evolutionary forces that shaped the OR gene family in humans and chimpanzees by resequencing 20 OR genes in 16 humans, 16 chimpanzees, and one orangutan. We compared the variation at the OR genes with that at intergenic regions. In both humans and chimpanzees, OR pseudogenes seem to evolve neutrally. In chimpanzees, patterns of variability are consistent with purifying selection acting on intact OR genes, whereas, in humans, there is suggestive evidence for positive selection acting on intact OR genes. These observations are likely due to differences in lifestyle, between humans and great apes, that have led to distinct sensory needs.
A feature extraction method for chemical sensors in electronic noses
Carmel L., Levy S., Lancet D. & Harel D. (2003) Sensors And Actuators B-Chemical. 93, 1-3, p. 67-76 Abstract
We propose a new feature extraction method for use with chemical sensors. It is based on fitting a parametric analytic model of the sensor's response over time to the measured signal, and taking the set of best-fitting parameters as the features. The process of finding the features is fast and robust, and the resulting set of features is shown to significantly enhance the performance of subsequent classification algorithms. Moreover, the model that we have developed fits equally well to sensors of different technologies and embeddings, suggesting its applicability to a diverse repertoire of sensors and analytic devices.
An eNose algorithm for identifying chemicals and determining their concentration
Carmel L., Sever N., Lancet D. & Harel D. (2003) Sensors And Actuators B-Chemical. 93, 1-3, p. 77-83 Abstract
We propose an algorithm for use with multisensor systems that is capable of the following: (a) identify an analyte independently of its concentration; (b) estimate the concentration of the analyte, even if the system was not previously exposed to this concentration; (c) tell when an analyte is of a chemical type not previously presented to the system. The algorithm, based upon recent work of Hopfield, uses the multiplicity of sensors explicitly, and is intuitive and easy to implement. We have tested it against real data, and it exhibits high quality performance.
Different noses for different people
Menashe I., Man O., Lancet D. & Gilad Y. (2003) Nature Genetics. 34, 2, p. 143-144 Abstract
Of more than 1,000 human olfactory receptor genes, more than half seem to be pseudogenes. We investigated whether the most recent of these disruptions might still segregate with the intact form by genotyping 51 candidate genes in 189 ethnically diverse humans. The results show an unprecedented prevalence of segregating pseudogenes, identifying one of the most pronounced cases of functional population diversity in the human genome.
From subgenome analysis to protein structure
Man O., Atarot T., Sadot A., Olender T. & Lancet D. (2003) Current Opinion in Structural Biology. 13, 3, p. 353-358 Abstract
Groups of related genes abound in large eukaryotic genomes. In such 'subgenomes', homology modeling carried out for a few genes will probably have relevance to the entire group. Subgenomes also afford unique ways of determining protein structural information. In addition to analyses based on the quantification of residue variability in paralogs, two-way comparisons, both within and among species, help to disclose functional amino acids. Comparative studies of gene families throughout the mammalian genome will also help elucidate the functional significance of single nucleotide polymorphisms in coding regions.
Towards an odor communication system
Harel D., Carmel L. & Lancet D. (2003) Computational Biology and Chemistry. 27, 2, p. 121-133 Abstract
We propose a setup for an odor communication system. Its different parts are described, and ways to realize them are outlined. Our scheme enables an output device - the whiffer - to release an imitation of an odorant read in by an input device - the sniffer - upon command. The heart of the system is the novel algorithmic scheme that makes the scheme feasible. We are currently at work researching and developing some of the components that constitute the algorithm, and we hope that the description of the overall scheme in this paper will help to get other groups to join in this effort.
Mesobiotic emergence: Molecular and ensemble complexity in early evolution
Shenhav B., Segre D. & Lancet D. (2003) Advances in Complex Systems. 6, 1, p. 15-35 Abstract
In addition to the visible complexity expressed in the morphogenesis of multicellular organisms, two levels of microscopic complexity may be discerned within every living cell. The first level is related to covalently bonded structures, namely molecules. The second level has to do with the generation of non-covalent molecular assemblies. Origin of life research has largely focused on the first complexity level, i.e. the appearance of covalent biopolymers. We present a life emergence scenario based mainly on the second complexity level. We argue that homeostatic molecular ensembles, for which we have coined the term "mesobiotic," have assumed a half-way position between prebiotic organic synthesis and full-fledged cellular (biotic) life.
Human specific loss of olfactory receptor genes
Gilad Y., Man O., Paabo S. & Lancet D. (2003) Proceedings of the National Academy of Sciences of the United States of America. 100, 6, p. 3324-3327 Abstract
Olfactory receptor (OR) genes constitute the basis for the sense of smell and are encoded by the largest mammalian gene superfamily of >1,000 genes. In humans, >60% of these are pseudogenes. In contrast, the mouse OR repertoire, although of roughly equal size, contains only ≈20% pseudogenes. We asked whether the high fraction of nonfunctional OR genes is specific to humans or is a common feature of all primates. To this end, we have compared the sequences of 50 human OR coding regions, regardless of their functional annotations, to those of their putative orthologs in chimpanzees, gorillas, orangutans, and rhesus macaques. We found that humans have accumulated mutations that disrupt OR coding regions roughly 4-fold faster than any other species sampled. As a consequence, the fraction of OR pseudogenes in humans is almost twice as high as in the non-human primates, suggesting a human-specific process of OR gene disruption, likely due to a reduced chemosensory dependence relative to apes.
GeneLoc: Exon-based integration of human genome maps
Rosen N., Chalifa-Caspi V., Shmueli O., Adato A., Lapidot M., Stampnitzky J., Safran M. & Lancet D. (2003) Bioinformatics. 19, SUPPL. 1, p. i222-i224 Abstract
Motivation: Despite the numerous available whole-genome mapping resources, no comprehensive, integrated map of the human genome yet exists. Results: GeneLoc, software adjunct to GeneCards and UDB, integrates gene lists by comparing genomic coordinates at the exon level and assigns unique and meaningful identifiers to each gene. Availability: http://bioinfo.weizmann.ac.il/genecards and http://genecards.weizmann.ac.il/udb Supplementary information: http://bioinfo.weizmann.ac.il/cards-bin/AboutGCids.cgi, http://genecards.weizmann.ac.il/GeneLocAlg.html.
Computer simulation of protocells
Lancet D. (2003) Computational Methods In Systems Biology, Proceedings. 2602, p. 194-197 Abstract
Keywords: Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
Human Olfactory Receptors
Man O., Olender T. & Lancet D. (2003) Handbook of Cell Signaling. 1-3, p. 145-147 Abstract
[No abstract available]
Human gene-centric databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE
Safran M., Chalifa-Caspi V., Shmueli O. et_al. (2003) Nucleic Acids Research. 31, 1, p. 142-146 Abstract
Recent enhancements and current research in the GeneCards (GC) (http://bioinfo.weizmann.ac.il/cards/) project are described, including the addition of gene expression profiles and integrated gene locations. Also highlighted are the contributions of specialized associated human gene-centric databases developed at the Weizmann Institute. These include the Unified Database (UDB) (http://bioinfo.weizmann.ac.il/udb) for human genome mapping, the human Chromosome 21 database at the Weizmann Institute (CroW 21) (http://bioinfo.weizmann.ac.il/crow21), and the Human Olfactory Receptor Data Exploratorium (HORDE) (http://bioinfo.weizmann.ac.il/HORDE). The synergistic relationships amongst these efforts have positively impacted the quality, quantity and usefulness of the GeneCards gene compendium.
GeneNote: Whole genome expression profiles in normal human tissues
Shmueli O., Horn-Saban S., Chalifa-Caspi V., Shmoish M., Ophir R., Benjamin-Rodrig H., Safran M., Domany E. & Lancet D. (2003) Comptes Rendus Biologies. 326, 10-11, p. 1067-1072 Abstract
A novel data set, GeneNote (Gene Normal Tissue Expression), was produced to portray complete gene expression profiles in healthy human tissues using the Affymetrix GeneChip HG-U95 set, which includes 62 839 probe-sets. The hybridization intensities of two replicates were processed and analyzed to yield the complete transcriptome for twelve human tissues. Abundant novel information on tissue specificity provides a baseline for past and future expression studies related to diseases. The data is posted in GeneNote (http://genecards.weizmann.ac.il/genenote/), a widely used compendium of human genes (http://bioinfo.weizmann.ac.il/genecard).

2002

GeneCards™ 2002: An evolving human gene compendium
Safran M., Solomon I., Shmueli O. et_al. (2002) Proceedings - IEEE Computer Society Bioinformatics Conference, CSB 2002. p. 339 Abstract
GeneCards™ (http://bioinfo.weizmann.ac.il/cards/) is an automated, integrated database of human genes, genomic maps, proteins, and diseases, with software that retrieves, consolidates, searches, and displays human genome information. Over the past few years, the system has consistently added new features including sequence accessions, genomic locations, cDNA assemblies, orthologies, medical information, 3D protein structures, SNP summaries, and gene expression. In parallel, its infrastructure is being upgraded to use object-oriented Perl to produce, display, and search data that is formatted in Extensible Markup Language (XML, http://www.w3.org/XML), providing a basis for schema-driven display code and context-specific searches.
Pharmacogenetic development of personalized medicine: Multiple sclerosis treatment as a model
Kirstein-Grossman I., Beckmann J., Lancet D. & Miller A. (2002) Drug News and Perspectives. 15, 9, p. 558-567 Abstract
The goal of pharmacogenetics is to identify "genetic fingerprints" that may predict a patient's response to pharmaceutical treatment. The use of pharmacogenetics replaces the trial-and-error strategy, which governs much of our clinical decision-making regarding treatment allocation in current medical practice, with individually tailored therapy. We review a pharmacogenetic research model, which implements high-throughput single nucleotide polymorphism technology to establish the correlation between drug-responsiveness and genetic polymorphisms of Copaxone®-treated multiple sclerosis patients. Implementation of similar pharmacogenetic approaches may promote the development of personalized medicine in multiple sclerosis as well as in other diseases.
GeneCards™ 2002: Towards a complete, object-oriented, human gene compendium
Safran M., Solomon I., Shmueli O. et_al. (2002) Bioinformatics. 18, 11, p. 1542-1543 Abstract
Motivation: In the post-genomic era, functional analysis of genes requires a sophisticated interdisciplinary arsenal. Comprehensive resources are challenged to provide consistently improving, state-of-the-art tools. Results: GeneCards (Rebhan et al., 1998) has made innovative strides: (a) regular updates and enhancements incorporating new genes enriched with sequences, genomic locations, cDNA assemblies, orthologies, medical information, 3D protein structures, gene expression, and focused SNP summaries; (b) restructured software using object-oriented Perl and migration to schema-driven XML; and (c) pilot studies introducing methods to produce cards for novel and predicted genes.
Computational capacity of an odorant discriminator: The linear separability of curves
Caticha N., Tejada J., Lancet D. & Domany E. (2002) Neural Computation. 14, 9, p. 2201-2220 Abstract
We introduce and study an artificial neural network inspired by the probabilistic receptor affinity distribution model of olfaction. Our system consists of N sensory neurons whose outputs converge on a single processing linear threshold element. The system's aim is to model discrimination of a single target odorant from a large number p of background odorants within a range of odorant concentrations. We show that this is possible provided p does not exceed a critical value p_c and calculate the critical capacity α_c = p_c/N. The critical capacity depends on the range of concentrations in which the discrimination is to be accomplished. If the olfactory bulb may be thought of as a collection of such processing elements, each responsible for the discrimination of a single odorant, our study provides a quantitative analysis of the potential computational properties of the olfactory bulb. The mathematical formulation of the problem we consider is one of determining the capacity for linear separability of continuous curves, embedded in a large-dimensional space. This is accomplished here by a numerical study, using a method that signals whether the discrimination task is realizable, together with a finite-size scaling analysis.
DEFOG: A practical scheme for deciphering families of genes
Fuchs T., Malecova B., Linhart C. et_al. (2002) Genomics. 80, 3, p. 295-302 Abstract
We developed a novel efficient scheme, DEFOG (for "deciphering families of genes"), for determining sequences of numerous genes from a family of interest. The scheme provides a powerful means to obtain a gene family composition in species for which high-throughput genomic sequencing data are not available. DEFOG uses two key procedures. The first is a novel algorithm for designing highly degenerate primers based on a set of known genes from the family of interest. These primers are used in PCR reactions to amplify the members of the gene family. The second combines oligofingerprinting of the cloned PCR products with clustering of the clones based on their fingerprints. By selecting members from each cluster, a low-redundancy clone subset is chosen for sequencing. We applied the scheme to the human olfactory receptor (OR) genes. OR genes constitute the largest gene superfamily in the human genome, as well as in the genomes of other vertebrate species. DEFOG almost tripled the size of the initial repertoire of human ORs in a single experiment, and only 7% of the PCR clones had to be sequenced. Extremely high degeneracies, reaching over a billion combinations of distinct PCR primer pairs, proved to be very effective and yielded only 0.4% nonspecific products.
Population differences in haplotype structure within a human olfactory receptor gene cluster
Menashe I., Man O., Lancet D. & Gilad Y. (2002) Human Molecular Genetics. 11, 12, p. 1381-1390 Abstract
We investigated the population differences in patterns of single nucleotide polymorphisms (SNPs) for a 400 kb olfactory receptor (OR) gene cluster on human chromosome 17p13.3. Samples were drawn from 35 individuals, of four different ethnogeographical origins: Pygmies, Bedouins, Yemenite Jews and Ashkenazi Jews. Of the 74 SNPs identified, two segregated between pseudogenized and intact ORs, while a third involved a change in a highly conserved motif proposed to mediate ligand-induced signal transduction. Linkage disequilibrium (LD) was computed based on phase inference across the cluster using Clark's haplotype subtraction algorithm. We also calculated LD directly from the genotypes using the expectation-maximization (EM) algorithm. Both methods yielded very similar results. Our analyses revealed substantial differences in nucleotide diversity, haplotype distribution and LD patterns among the different human populations. In particular, the two Jewish populations had low haplotype diversity and negligible decay of LD across the entire genomic region. Intriguingly, the three functional SNPs segregated at different frequencies in the different ethnogeographical groups, with the Pygmies having higher frequencies of the intact OR genes. Our data suggests that OR genes may have evolved to create different functional repertoires in distinct human populations.
Test of a statistical model for molecular recognition in biological repertoires
Rosenwald S., Kafri R. & Lancet D. (2002) Journal of Theoretical Biology. 216, 3, p. 327-336 Abstract
A chance encounter between members of a random repertoire and a molecular target is characteristic of different biological systems, including the immune and olfactory pathways as well as combinatorial libraries. In such systems, the affinity between the target and members of the repertoire is distributed with a probability function describing the propensity of obtaining a particular affinity value. We have previously proposed a phenomenological receptor affinity distribution (RAD) formalism, which describes this probability function based on simple statistical considerations. In the present analysis, we use published data from diverse experimental systems, including phage display libraries, immunoglobulins and enzymes, to test the RAD model and to compare it to other affinity distribution formalisms. The RAD model is found to provide the best description for binding data for over eight orders of magnitude on the affinity scale, and to account for a relationship between repertoire size and the maximal obtainable affinity within different repertoires. This approach points to a potential universality of the rules that govern affinity distributions in biology.
Evidence for positive selection and population structure at the human MAO-A gene
Gilad Y., Rosenberg S., Przeworski M., Lancet D. & Skorecki K. (2002) Proceedings of the National Academy of Sciences of the United States of America. 99, 2, p. 862-867 Abstract
We report the analysis of human nucleotide diversity at a genetic locus known to be involved in a behavioral phenotype, the monoamine oxidase A gene. Sequencing of five regions totaling 18.8 kb and spanning 90 kb of the monoamine oxidase A gene was carried out in 56 male individuals from seven different ethnogeographic groups. We uncovered 41 segregating sites, which formed 46 distinct haplotypes. A permutation test detected substantial population structure in these samples. Consistent with differentiation between populations, linkage disequilibrium is higher than expected under panmixia, with no evidence of a decay with distance. The extent of linkage disequilibrium is not typical of nuclear loci and suggests that the underlying population structure may have been accentuated by a selective sweep that fixed different haplotypes in different populations, or by local adaptation. In support of this suggestion, we find both a reduction in levels of diversity (as measured by a Hudson-Kreitman-Aguade test with the DMD44 locus) and an excess of high frequency-derived variants, as expected after a recent episode of positive selection.
USH3A transcripts encode clarin-1, a four-transmembrane-domain protein with a possible role in sensory synapses
Adato A., Vreugde S., Joensuu T. et_al. (2002) European Journal of Human Genetics. 10, 6, p. 339-350 Abstract
Usher syndrome type 3 (USH3) is an autosomal recessive disorder characterised by the association of post-lingual progressive hearing loss, progressive visual loss due to retinitis pigmentosa and variable presence of vestibular dysfunction. Because the previously defined transcripts do not account for all USH3 cases, we performed further analysis and revealed the presence of additional exons embedded in longer human and mouse USH3A transcripts and three novel USH3A mutations. Expression of Ush3a transcripts was localised by whole mount in situ hybridisation to cochlear hair cells and spiral ganglion cells. The full length USH3A transcript encodes clarin-1, a four-transmembrane-domain protein, which defines a novel vertebrate-specific family of three paralogues. Limited sequence homology to stargazin, a cerebellar synapse four-transmembrane-domain protein, suggests a role for clarin-1 in hair cell and photoreceptor cell synapses, as well as a common pathophysiological pathway for different Usher syndromes.

2001

The molecular roots of compositional inheritance
Segre D., Shenhav B., Kafri R. & Lancet D. (2001) Journal of Theoretical Biology. 213, 3, p. 481-491 Abstract
Non-covalent compositional assemblies, made of monomeric mutually catalytic molecules, constitute an alternative to alphabet-based informational biopolymers as a mechanism of primordial inheritance. Such assemblies appear implicitly in many "Metabolism First" origin of life scenarios, and more explicitly in the Graded Autocatalysis Replication Domain (GARD) model [Segré et al. (2000). Proc. Natl Acad. Sci. U.S.A. 97, 4112-4117]. In the present work, we provide a detailed analysis of the quantitative molecular roots of such behavior. It is demonstrated that the fidelity of reproduction provided by a newly defined heritability measure η_s*, strongly depends on the values of molecular recognition parameters and on assembly size. We find that if the catalytic rate acceleration coefficients are distributed normally, transfer of compositional information becomes impossible, due to frequent "compositional error catastrophes". In contrast, if the catalytic acceleration rates obey a lognormal distribution, as actually predicted by a statistical formalism for molecular repertoires, high reproduction fidelity is obtained. There is also a clear dependence on assembly size N, whereby maximal η is seen in a narrow range around N ∼ 3.5N_G/λ, where N_G is the size of the primordial molecular repertoire and λ is a molecular interaction statistical parameter. Such relationships help define the physicochemical conditions that could underlie the early steps in pre-biotic evolution.
The RUNX3 gene - Sequence, structure and regulated expression
Bangsow C., Rubins N., Glusman G. et_al. (2001) Gene. 279, 2, p. 221-232 Abstract
The RUNX3 gene belongs to the runt domain family of transcription factors that act as master regulators of gene expression in major developmental pathways. In mammals the family includes three genes, RUNX1, RUNX2 and RUNX3. Here, we describe a comparative analysis of the human chromosome 1p36.1 encoded RUNX3 and mouse chromosome 4 encoded Runx3 genomic regions. The analysis revealed high similarities between the two genes in the overall size and organization and showed that RUNX3/Runx3 is the smallest in the family, but nevertheless exhibits all the structural elements characterizing the RUNX family. It also revealed that RUNX3/Runx3 bears a high content of the ancient mammalian repeat MIR. Together, these data delineate RUNX3/Runx3 as the evolutionary founder of the mammalian RUNX family. Detailed sequence analysis placed the two genes at a GC-rich H3 isochore with a sharp transition of GC content between the gene sequence and the downstream intergenic region. Two large conserved CpG islands were found within both genes, one around exon 2 and the other at the beginning of exon 6. RUNX1, RUNX2 and RUNX3 gene products bind to the same DNA motif, hence their temporal and spatial expression during development should be tightly regulated. Structure/function analysis showed that two promoter regions, designated P1 and P2, regulate RUNX3 expression in a cell type-specific manner. Transfection experiments demonstrated that both promoters were highly active in the GM1500 B-cell line, which endogenously expresses RUNX3, but were inactive in the K562 myeloid cell line, which does not express RUNX3.
Erratum: Initial sequencing and analysis of the human genome: International Human Genome Sequencing Consortium (Nature (2001) 409 (860-921))
Lander E. S., Linton L. M., Birren B. et_al. (2001) Nature. 412, 6846, p. 565-566 Abstract
The complete human olfactory subgenome
Glusman G., Yanai I., Rubin I. & Lancet D. (2001) Genome Research. 11, 5, p. 685-702 Abstract
Olfactory receptors likely constitute the largest gene superfamily in the vertebrate genome. Here we present the nearly complete human olfactory subgenome elucidated by mining the genome draft with gene discovery algorithms. Over 900 olfactory receptor genes and pseudogenes (ORs) were identified, two-thirds of which were not annotated previously. The number of extrapolated ORs is in good agreement with previous theoretical predictions. The sequence of at least 63% of the ORs is disrupted by what appears to be a random process of pseudogene formation. ORs constitute 17 gene families, 4 of which contain more than 100 members each. "Fish-like" Class I ORs, previously considered a relic in higher tetrapods, constitute as much as 10% of the human repertoire, all in one large cluster on chromosome 11. Their lower pseudogene fraction suggests a functional significance. ORs are disposed on all human chromosomes except 20 and Y, and nearly 80% are found in clusters of 6-138 genes. A novel comparative cluster analysis was used to trace the evolutionary path that may have led to OR proliferation and diversification throughout the genome. The results of this analysis suggest the following genome expansion history: first, the generation of a "tetrapod-specific" Class II OR cluster on chromosome 11 by local duplication, then a single-step duplication of this cluster to chromosome 1, and finally an avalanche of duplication events out of chromosome 1 to most other chromosomes. The results of the data mining and characterization of ORs can be accessed at the Human Olfactory Receptor Data Exploratorium Web site (http://bioinfo.weizmann.ac.il/HORDE).
Mouse-human orthology relationships in an olfactory receptor gene cluster
Lapidot M., Pilpel Y., Gilad Y., Falcovitz A., Sharon D., Haaf T. & Lancet D. (2001) Genomics. 71, 3, p. 296-306 Abstract
The olfactory receptor (OR) subgenome harbors the largest known gene family in mammals, disposed in clusters on numerous chromosomes. One of the best characterized OR clusters, located at human chromosome 17p13.3, has previously been studied by us in human and in other primates, revealing a conserved set of 17 OR genes. Here, we report the identification of a syntenic OR cluster in the mouse and the partial DNA sequence of many of its OR genes. A probe for the mouse M5 gene, orthologous to one of the OR genes in the human cluster (OR17-25), was used to isolate six PAC clones, all mapping by in situ hybridization to mouse chromosome 11B3-11B5, a region of shared synteny with human chromosome 17p13.3. Thirteen mouse OR sequences amplified and sequenced from these PACs allowed us to construct a putative physical map of the OR gene cluster at the mouse Olfr1 locus. Several points of evidence, including a strong similarity in subfamily composition and at least four cases of gene orthology, suggest that the mouse Olfr1 and the human 17p13.3 clusters are orthologous. A detailed comparison of the OR sequences within the two clusters helps trace their independent evolutionary history in the two species. Two types of evolutionary scenarios are discerned: cases of "true orthologous genes" in which high sequence similarity suggests a shared conserved function, as opposed to instances in which orthologous genes may have undergone independent diversification in the realm of "free reign" repertoire expansion.
Initial sequencing and analysis of the human genome
Lancet D. (2001) Nature. 409, 6822, p. 860-921 Abstract
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Estimating the size of the olfactory repertoire
Carmel L., Harel D. & Lancet D. (2001) Bulletin of Mathematical Biology. 63, 6, p. 1063-1078 Abstract
The concept of shape space, which has been successfully implemented in immunology, is used here to construct a model for the discrimination power of the olfactory system. Using reasonable assumptions on the behaviour of the biological system, we are able to estimate the number of distinct olfactory receptor types. Our estimated value of around 1000 receptor types is in good agreement with experimental data.
The human olfactory subgenome: From sequence to structure and evolution
Fuchs T., Glusman G., Horn-Saban S., Lancet D. & Pilpel Y. (2001) Human Genetics. 108, 1, p. 1-13 Abstract
Olfactory receptors (ORs) constitute the largest multigene family in multicellular organisms. Their evolutionary proliferation has been driven by the need to provide recognition capacity for millions of potential odorants with arbitrary chemical configurations. Human genome sequencing has provided a highly informative picture of the "olfactory subgenome", the repertoire of OR genes. We describe here an analysis of 224 human OR genes, a much larger number than hitherto systematically analyzed. These are derived by literature survey, data mining at 14 genomic clusters, and by an OR-targeted experimental sequencing strategy. The presented set contains at least 53% pseudogenes and is minimally divided into 11 gene families. One of these (no. 7) has undergone a particularly extensive expansion in primates. The analysis of this collection leads to insight into the origin of OR genes, suggesting a graded expansion through mammalian evolution. It also allows us to delineate a structural map of the respective proteins. A sequence database and analysis package is provided (http://bioinformatics.weizmann.ac.il/HORDE), which will be useful for analyzing human OR sequences genome-wide.
Visualizing large-scale genomic sequences: The GESTALT workbench produces interactive genome graphs for quick and intuitive sequence interpretation
Glusman G. & Lancet D. (2001) IEEE Engineering in Medicine and Biology Magazine. 20, 4, p. 49-54 Abstract
The Lipid World
Segre D., Ben-Eli D., Deamer D. & Lancet D. (2001) Origins of Life and Evolution of the Biosphere. 31, 1-2, p. 119-145 Abstract
The continuity of abiotically formed bilayer membranes with similar structures in contemporary cellular life, and the requirement for microenvironments in which large and small molecules could be compartmentalized, support the idea that amphiphilic boundary structures contributed to the emergence of life. As an extension of this notion, we propose here a 'Lipid World' scenario as an early evolutionary step in the emergence of cellular life on Earth. This concept combines the potential chemical activities of lipids and other amphiphiles, with their capacity to undergo spontaneous self-organization into supramolecular structures such as micelles and bilayers. In particular, the documented chemical rate enhancements within lipid assemblies suggest that energy-dependent synthetic reactions could lead to the growth and increased abundance of certain amphiphilic assemblies. We further propose that selective processes might act on such assemblies, as suggested by our computer simulations of mutual catalysis among amphiphiles. As demonstrated also by other researchers, such mutual catalysis within random molecular assemblies could have led to a primordial homeostatic system displaying rudimentary life-like properties. Taken together, these concepts provide a theoretical framework, and suggest experimental tests for a Lipid World model for the origin of life.
A missense mutation in a highly conserved region of CASQ2 is associated with autosomal recessive catecholamine-induced polymorphic ventricular tachycardia in Bedouin families from Israel
Lahat H., Pras E., Olender T. et_al. (2001) American Journal of Human Genetics. 69, 6, p. 1378-1384 Abstract
Catecholamine-induced polymorphic ventricular tachycardia (PVT) is characterized by episodes of syncope, seizures, or sudden death, in response to physical activity or emotional stress. Two modes of inheritance have been described: autosomal dominant and autosomal recessive. Mutations in the ryanodine receptor 2 gene (RYR2), which encodes a cardiac sarcoplasmic reticulum (SR) Ca²⁺-release channel, were recently shown to cause the autosomal dominant form of the disease. In the present report, we describe a missense mutation in a highly conserved region of the calsequestrin 2 gene (CASQ2) as the potential cause of the autosomal recessive form. The CASQ2 protein serves as the major Ca²⁺ reservoir within the SR of cardiac myocytes and is part of a protein complex that contains the ryanodine receptor. The mutation, which is in full segregation in seven Bedouin families affected by the disorder, converts a negatively charged aspartic acid into a positively charged histidine, in a highly negatively charged domain, and is likely to exert its deleterious effect by disrupting Ca²⁺ binding.
The UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase gene is mutated in recessive hereditary inclusion body myopathy
Eisenberg I., Avidan N., Potikha T. et_al. (2001) Nature Genetics. 29, 1, p. 83-87 Abstract
Hereditary inclusion body myopathy (HIBM; OMIM 600737) is a unique group of neuromuscular disorders characterized by adult onset, slowly progressive distal and proximal weakness and a typical muscle pathology including rimmed vacuoles and filamentous inclusions. The autosomal recessive form described in Jews of Persian descent is the HIBM prototype. This myopathy affects mainly leg muscles, but with an unusual distribution that spares the quadriceps. This particular pattern of weakness distribution, termed quadriceps-sparing myopathy (QSM), was later found in Jews originating from other Middle Eastern countries as well as in non-Jews. We previously localized the gene causing HIBM in Middle Eastern Jews on chromosome 9p12-13 (ref. 5) within a genomic interval of about 700 kb (ref. 6). Haplotype analysis around the HIBM gene region of 104 affected people from 47 Middle Eastern families indicates one unique ancestral founder chromosome in this community. By contrast, single non-Jewish families from India, Georgia (USA) and the Bahamas, with QSM and linkage to the same 9p12-13 region, show three distinct haplotypes. After excluding other potential candidate genes, we eventually identified mutations in the UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase (GNE) gene in the HIBM families: all patients from Middle Eastern descent shared a single homozygous missense mutation, whereas distinct compound heterozygotes were identified in affected individuals of families of other ethnic origins. Our findings indicate that GNE is the gene responsible for recessive HIBM.
Architecture and anatomy of the genomic locus encoding the human leukemia-associated transcription factor RUNX1/AML1
Levanon D., Glusman G., Bangsow T. et_al. (2001) Gene. 262, 1-2, p. 23-33 Abstract
The RUNX1 gene on human chromosome 21q22.12 belongs to the 'runt domain' gene family of transcription factors (also known as AML/CBFA/PEBP2α). RUNX1 is a key regulator of hematopoiesis and a frequent target of leukemia associated chromosomal translocations. Here we present a detailed analysis of the RUNX1 locus based on its complete genomic sequence. RUNX1 spans 260 kb and its expression is regulated through two distinct promoter regions, that are 160 kb apart. A very large CpG island complex marks the proximal promoter (promoter-2), and an additional CpG island is located at the 3′ end of the gene. Hitherto, 12 different alternatively spliced RUNX1 cDNAs have been identified. Genomic sequence analysis of intron/exon boundaries of these cDNAs has shown that all consist of properly spliced authentic coding regions. This indicates that the large repertoire of RUNX1 proteins, ranging in size between 20-52 kDa, are generated through usage of alternatively spliced exons some of which contain in frame stop codons. The gene's introns are largely depleted of repetitive sequences, especially of the LINE1 family. The RUNX1 locus marks the transition from a ∼1 Mb of gene-poor region containing only pseudogenes, to a gene-rich region containing several functional genes. A search for RUNX1 sequences that may be involved in the high frequency of chromosomal translocations revealed that a 555 bp long segment originating in chromosome 11 FLI1 gene was transposed into RUNX1 intron 4.1. This intron harbors the t(8;21) and t(3;21) chromosomal breakpoints involved in acute myeloid leukemia. Interestingly, the FLI1 homologous sequence contains a breakpoint of the t(11;22) translocation associated with Ewing's tumors, and may have a similar function in RUNX1.
Mucolipidosis type IV: Novel MCOLN1 mutations in Jewish and Mon-Jewish patients and the frequency of the disease in the Ashkenazi Jewish population
Bargal R., Avidan N., Olender T. et_al. (2001) Human Mutation. 17, 5, p. 397-402 Abstract
The gene MCOLN1 is mutated in Mucolipidosis type IV (MLIV), a neurodegenerative, recessive, lysosomal storage disorder. The disease is found in relatively high frequency among Ashkenazi Jews due to two founder mutations that comprise 95% of the MLIV alleles in this population [Bargal et al., 2000]. In this report we complete the mutation analysis of Jewish and non-Jewish MLIV patients whose DNA were available to us. Four novel mutations were identified in the MCOLN1 gene of severely affected patients: two missense, T232P and F465L; a nonsense, R322X; and an 11-bp insertion in exon 12. The nonsense mutation (R322X) was identified in two unrelated patients with different haplotypes in the MCOLN1 chromosomal region, indicating a mutation hotspot in this CpG site. An in-frame deletion (F408del) was identified in a patient with unusual mild psychomotor retardation. The frequency of MLIV in the general Jewish Ashkenazi population was estimated in a sample of 2,000 anonymous, unrelated individuals assayed for the two founder mutations. This analysis indicated a heterozygotes frequency of about 1/100. A preferred nucleotide numbering system for MCOLN1 mutations is presented and the issue of a screening program for the detection of high-risk families in the Jewish Ashkenazi population is discussed.

2000

Identification and characterization of coding single-nucleotide polymorphisms within a human olfactory receptor gene cluster
Sharon D., Gilad Y., Glusman G., Khen M., Lancet D. & Kalush F. (2000) Gene. 260, 1-2, p. 87-94 Abstract
Single-nucleotide polymorphisms (SNPs) were studied in 15 olfactory receptor (OR) coding regions, one control region and two noncoding sequences all residing within a 412 kb OR gene cluster on human chromosome 17p13.3, as well as in other G-protein coupled receptors (GPCRs). A total of 26 SNPs were identified in ORs, 21 of which are coding SNPs (cSNPs). The mean nucleotide diversity of OR coding regions was 0.078% (ranging from 0 to 0.16%), which is about twice higher than that of other GPCRs, and similar to the nucleotide diversity levels of noncoding regions along the human genome. The high polymorphism level in the OR coding regions might be due to a weak positive selection pressure acting on the OR genes. In two cases, OR genes have been found to share the same cSNP. This could be explained by recent gene conversion events, which might be a part of a concerted evolution mechanism acting on the OR superfamily. Using the genotype data of 85 unrelated individuals in 15 SNPs, we found linkage disequilibrium (LD) between pairs of SNPs located on the centromeric part of the cluster. On the other hand, no LD was found between SNPs located on the telomeric part of the cluster, suggesting the presence of several hot-spots for recombination within this cluster. Thus, different regions of this gene cluster may have been subject to different recombination rates.
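The nucleotide diversity figure quoted in the abstract above can be illustrated with a minimal sketch. The abstract does not state which estimator was used, so the function below assumes the standard pairwise definition (average fraction of differing sites over all pairs of aligned sequences). The function name and the toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average fraction of differing sites over all pairs of aligned sequences.

    Assumption: the standard pairwise estimator; the abstract does not
    specify how its diversity values were computed.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty and of equal length")

    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_differences / (len(pairs) * length)


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAT", "AAAA"]  # hypothetical data
    print(f"{100 * nucleotide_diversity(toy_alignment):.3f}%")
```

Multiplying the result by 100 gives a percentage comparable in form to the values reported in the abstract.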
The genomic structure of human olfactory receptor genes
Sosinsky A., Glusman G. & Lancet D. (2000) Genomics. 70, 1, p. 49-61 Abstract
The genomic and cDNA structures were studied for eight human olfactory receptor (OR) genes within the chromosome 17p13.3 cluster. A common gene structure was revealed, which included an ~1-kb intronless coding region terminated by a signal for polyadenylation and a variable number of upstream noncoding exons. The latter were found to be alternatively spliced, giving rise to different isoforms of OR mRNA. While the initial exons mostly agreed with previous computer predictions and were conserved within OR subfamilies, other upstream exons were novel and idiosyncratic. In some cases, repetitive sequences were involved in the generation of splice sites and putative transcription control elements. Such gene structure is consistent with early repertoire enhancement by retrogene generation, which was likely followed by extensive genomic duplication. Each OR gene had a unique signature of transcription factor elements, consistent with a combinatorial expression control mechanism. (C) 2000 Academic Press.
Dichotomy of single-nucleotide polymorphism haplotypes in olfactory receptor genes and pseudogenes
Gilad Y., Segre D., Skorecki K., Nachman M., Lancet D. & Sharon D. (2000) Nature Genetics. 26, 2, p. 221-224 Abstract
Substantial efforts are focused on identifying single-nucleotide polymorphisms (SNPs) throughout the human genome, particularly in coding regions (cSNPs), for both linkage disequilibrium and association studies. Less attention, however, has been directed to the clarification of evolutionary processes that are responsible for the variability in nucleotide diversity among different regions of the genome. We report here the population sequence diversity of genomic segments within a 450-kb cluster of olfactory receptor (OR) genes on human chromosome 17. We found a dichotomy in the pattern of nucleotide diversity between OR pseudogenes and introns on the one hand and the closely interspersed intact genes on the other. We suggest that weak positive selection is responsible for the observed patterns of genetic variation. This is inferred from a lower ratio of polymorphism to divergence in genes compared with pseudogenes or introns, high non-synonymous substitution rates in OR genes, and a small but significant overall reduction in variability in the entire OR gene cluster compared with other genomic regions. The dichotomy among functionally different segments within a short genomic distance requires high recombination rates within this OR cluster. Our work demonstrates the impact of weak positive selection on human nucleotide diversity, and has implications for the evolution of the olfactory repertoire.
Identification of the gene causing mucolipidosis type IV
Bargal R., Avidan N., Ben-Asher E. et_al. (2000) Nature Genetics. 26, 1, p. 118-121 Abstract
Mucolipidosis type IV (MLIV) is an autosomal recessive, neurodegenerative, lysosomal storage disorder characterized by psychomotor retardation and ophthalmological abnormalities including corneal opacities, retinal degeneration and strabismus. Most patients reach a maximal developmental level of 12-15 months. The disease was classified as a mucolipidosis following observations by electron microscopy indicating the lysosomal storage of lipids together with water-soluble, granulated substances. Over 80% of the MLIV patients diagnosed are Ashkenazi Jews, including severely affected and mildly affected patients. The gene causing MLIV was previously mapped to human chromosome 19p13.2-13.3 in a region of approximately 1 cM (ref. 7). Haplotype analysis in the MLIV gene region of over 70 MLIV Ashkenazi chromosomes indicated the existence of two founder chromosomes among 95% of the Ashkenazi MLIV families: a major haplotype in 72% and a minor haplotype in 23% of the MLIV chromosomes (ref. 7, and G.B., unpublished data). The remaining 5% are distinct haplotypes found only in single patients. The basic metabolic defect causing the lysosomal storage in MLIV has not yet been identified. Thus, positional cloning was an alternative to identify the MLIV gene. We report here the identification of a new gene in this human chromosomal region in which MLIV-specific mutations were identified.
Compositional genomes: Prebiotic information transfer in mutually catalytic noncovalent assemblies
Segré D., Ben-Eli D. & Lancet D. (2000) Proceedings of the National Academy of Sciences of the United States of America. 97, 8, p. 4112-4117 Abstract
Mutually catalytic sets of simple organic molecules have been suggested to be capable of self-replication and rudimentary chemical evolution. Previous models for the behavior of such sets have analyzed the global properties of short biopolymer ensembles by using graph theory and a mean field approach. In parallel, experimental studies with the autocatalytic formation of amphiphilic assemblies (e.g., lipid vesicles or micelles) demonstrated self-replication properties resembling those of living cells. Combining these approaches, we analyze here the kinetic behavior of small heterogeneous assemblies of spontaneously aggregating molecules, of the type that could form readily under prebiotic conditions. A statistical formalism for mutual rate enhancement is used to numerically simulate the detailed chemical kinetics within such assemblies. We demonstrate that a straightforward set of assumptions about kinetically enhanced recruitment of simple amphiphilic molecules, as well as about the spontaneous growth and splitting of assemblies, results in a complex population behavior. The assemblies manifest a significant degree of homeostasis, resembling the previously predicted quasi-stationary states of biopolymer ensembles (Dyson, F. J. (1982) J. Mol. Evol. 18, 344-350). Such emergent catalysis-driven, compositionally biased entities may be viewed as having rudimentary 'compositional genomes.' Our analysis addresses the question of how mutually catalytic metabolic networks, devoid of sequence-based biopolymers, could exhibit transfer of chemical information and might undergo selection and evolution. This computed behavior may constitute a demonstration of natural selection in populations of molecules without genetic apparatus, suggesting a pathway from random molecular assemblies to a minimal protocell.
Prebiotic evolution of amphiphilic assemblies far from equilibrium: From compositional information to sequence-based biopolymers
Segre D., Ben-Eli D. & Lancet D. (2000) Bioastronomy'99, A New Era In Bioastronomy, Proceedings. 213, p. 373-+ Abstract
The primordial emergence of biopolymers, agents of the genetic machinery in modern cells, is not less enigmatic than the emergence of the genetic code itself. Here we discuss how potential early replicating protocellular systems based on a rudimentary form of inheritance, a "compositional genome", could evolve towards the emergence of "alphabetic" polymers, predating the genetic code. A computer simulated evolutionary process based on our previously proposed kinetic model may help understand the appearance of chemical combinatorics through early natural selection.
Harvesting the human genome: The Israeli perspective
Ben-Asher E., Chalifa-Caspi V., Horn-Saban S. et_al. (2000) Israel Medical Association Journal. 2, 9, p. 657-664 Abstract
GESTALT: A workbench for automatic integration and visualization of large-scale genomic sequence analyses
Glusman G. & Lancet D. (2000) Bioinformatics. 16, 5, p. 482-483 Abstract
The GESTALT Workbench is a WWW-based tool for genomic sequence analysis, comparison and annotation, with strong emphasis on visualization. GESTALT integrates graphically the output of diverse sequence analysis algorithms producing an information-rich, interactive genomic map. Availability: The GESTALT Workbench, as well as a more detailed description, are available at http://bioinfo.weizmann.ac.il/GESTALT/.
Composing life
Segré D. & Lancet D. (2000) EMBO Reports. 1, 3, p. 217-222 Abstract
Textbooks often assert that life began with specialized complex molecules, such as RNA, that are capable of making their own copies. This scenario has serious difficulties, but an alternative has remained elusive. Recent research and computer simulations have suggested that the first steps toward life may not have involved biopolymers. Rather, non-covalent protocellular assemblies, generated by catalyzed recruitment of diverse amphiphilic and hydrophobic compounds, could have constituted the first systems capable of information storage, inheritance and selection. A complex chain of evolutionary events, yet to be deciphered, could then have led to the common ancestors of today's free-living cells, and to the appearance of DNA, RNA and protein enzymes.
Sequence, structure, and evolution of a complete human olfactory receptor gene cluster
Glusman G., Sosinsky A., Ben-Asher E. et_al. (2000) Genomics. 63, 2, p. 227-245 Abstract
The olfactory receptor (OR) gene cluster on human chromosome 17p13.3 was subjected to mixed shotgun automated DNA sequencing. The resulting 412 kb of genomic sequence include 17 OR coding regions, 6 of which are pseudogenes. Six of the coding regions were discovered only upon genomic sequencing, while the others were previously reported as partial sequences. A comparison of DNA sequences in the vicinity of the OR coding regions revealed a common gene structure with an intronless coding region and at least one upstream noncoding exon. Potential gene control regions including specific pyrimidine:purine tracts and Olf-1 sites have been identified. One of the pseudogenes apparently has evolved into a CpG island. Four extensive CpG islands can be discerned within the cluster, not coupled to specific OR genes. The cluster is flanked at its telomeric end by an unidentified open reading frame (C17orf2) with no significant similarity to any known protein. A high proportion of the cluster sequence (about 60%) belongs to various families of interspersed repetitive elements, with a clear predominance of LINE repeats. The OR genes in the cluster belong to two families and seven subfamilies, which show a relatively high degree of intermixing along the cluster, in seemingly random orientations. This genomic organization may be best accounted for by a complex series of evolutionary events. (C) 2000 Academic Press.
The olfactory receptor gene superfamily: Data mining, classification, and nomenclature
Glusman G., Bahar A., Sharon D., Pilpel Y., White J. & Lancet D. (2000) Mammalian Genome. 11, 11, p. 1016-1023 Abstract
The vertebrate olfactory receptor (OR) subgenome harbors the largest known gene family, which has been expanded by the need to provide recognition capacity for millions of potential odorants. We implemented an automated procedure to identify all OR coding regions from published sequences. This led us to the identification of 831 OR coding regions (including pseudogenes) from 24 vertebrate species. The resulting dataset was subjected to neighbor-joining phylogenetic analysis and classified into 32 distinct families, 14 of which include only genes from tetrapodan species (Class II ORs). We also report here the first identification of OR sequences from a marsupial (koala) and a monotreme (platypus). Analysis of these OR sequences suggests that the ancestral mammal had a small OR repertoire, which expanded independently in all three mammalian subclasses. Classification of 'fish-like' (Class I) ORs indicates that some of these ancient ORs were maintained and even expanded in mammals. A nomenclature system for the OR gene superfamily is proposed, based on a divergence evolutionary model. The nomenclature consists of the root symbol 'OR', followed by a family numeral, subfamily letter(s), and a numeral representing the individual gene within the subfamily. For example, OR3A1 is an OR gene of family 3, subfamily A, and OR7E12P is an OR pseudogene of family 7, subfamily E. The symbol is to be preceded by a species indicator. We have assigned the proposed nomenclature symbols for all 330 human OR genes in the database. A WWW tool for automated name assignment is provided.
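The symbol layout described in the abstract above (root 'OR', family numeral, subfamily letter(s), gene numeral) can be sketched as a small parser. Two points are assumptions drawn only from the examples OR3A1 and OR7E12P: that a trailing 'P' marks a pseudogene, and that the symbol is upper-case. The species indicator is not handled because its format is not given here. All names in the code are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ORSymbol(NamedTuple):
    family: int        # family numeral, e.g. 7 in OR7E12P
    subfamily: str     # subfamily letter(s), e.g. "E"
    gene: int          # individual gene numeral within the subfamily
    pseudogene: bool   # assumed: a trailing "P" marks a pseudogene


# Root "OR", family digits, subfamily letters, gene digits, optional "P".
_OR_PATTERN = re.compile(r"^OR(\d+)([A-Z]+)(\d+)(P?)$")


def parse_or_symbol(symbol: str) -> Optional[ORSymbol]:
    """Split an OR gene symbol into its parts, or return None if it does not match."""
    match = _OR_PATTERN.match(symbol)
    if match is None:
        return None
    family, subfamily, gene, suffix = match.groups()
    return ORSymbol(int(family), subfamily, int(gene), suffix == "P")


if __name__ == "__main__":
    for name in ("OR3A1", "OR7E12P", "GNE"):
        print(name, parse_or_symbol(name))
```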

1999

kPROT: A knowledge-based scale for the propensity of residue orientation in transmembrane segments. Application to membrane protein structure prediction
Pilpel Y., Ben-Tal N. & Lancet D. (1999) Journal of Molecular Biology. 294, 4, p. 921-935 Abstract
Modeling of integral membrane proteins and the prediction of their functional sites requires the identification of transmembrane (TM) segments and the determination of their angular orientations. Hydrophobicity scales predict accurately the location of TM helices, but are less accurate in computing angular disposition. Estimating lipid-exposure propensities of the residues from statistics of solved membrane protein structures has the disadvantage of relying on relatively few proteins. As an alternative, we propose here a scale of knowledge-based Propensities for Residue Orientation in Transmembrane segments (kPROT), derived from the analysis of more than 5000 non-redundant protein sequences. We assume that residues that tend to be exposed to the membrane are more frequent in TM segments of single-span proteins, while residues that prefer to be buried in the transmembrane bundle interior are present mainly in multi-span TMs. The kPROT value for each residue is thus defined as the logarithm of the ratio of its proportions in single and multiple TM spans. The scale is refined further by defining it for three discrete sections of the TM segment; namely, extracellular, central, and intracellular. The capacity of the kPROT scale to predict angular helical orientation was compared to that of alternative methods in a benchmark test, using a diversity of multi-span α-helical transmembrane proteins with a solved 3D structure. kPROT yielded an average angular error of 41°, significantly lower than that of alternative scales (62°-68°). The new scale thus provides a useful general tool for modeling and prediction of functional residues in membrane proteins. A WWW server (http://bioinfo.weizmann.ac.il/kPROT) is available for automatic helix orientation prediction with kPROT.
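The log-ratio definition in the abstract above can be sketched in a few lines. This is only an illustration of the idea, not the kPROT implementation. The direction of the ratio (single-span over multi-span), the natural logarithm, the pseudocount and the toy segments are all assumptions, and the refinement into extracellular, central and intracellular sections is omitted.

```python
import math
from collections import Counter


def residue_proportions(segments):
    """Fraction of each amino acid among all residues of the given TM segments."""
    counts = Counter("".join(segments))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def log_ratio_scale(single_span_segments, multi_span_segments, pseudo=1e-6):
    """Per-residue log ratio of proportions in single-span vs multi-span TM segments.

    Assumptions: ratio taken as single/multi, natural log, and a small
    pseudocount so residues absent from one set do not cause division by zero.
    """
    p_single = residue_proportions(single_span_segments)
    p_multi = residue_proportions(multi_span_segments)
    residues = set(p_single) | set(p_multi)
    return {
        aa: math.log((p_single.get(aa, 0.0) + pseudo) / (p_multi.get(aa, 0.0) + pseudo))
        for aa in residues
    }


if __name__ == "__main__":
    # Hypothetical toy segments, far too short to be real TM helices.
    scale = log_ratio_scale(["LLLA"], ["LAAS"])
    for aa, value in sorted(scale.items()):
        print(aa, round(value, 3))
```

Under the assumed sign convention, a positive value marks a residue that is relatively enriched in single-span segments and hence, by the abstract's reasoning, more likely to face the membrane.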
Primate evolution of an olfactory receptor cluster: Diversification by gene conversion and recent emergence of pseudogenes
Sharon D., Glusman G., Pilpel Y., Khen M., Gruetzner F., Haaf T. & Lancet D. (1999) Genomics. 61, 1, p. 24-36 Abstract
The olfactory receptor (OR) subgenome harbors the largest known gene family in mammals, disposed in clusters on numerous chromosomes. We have carried out a comparative evolutionary analysis of the best characterized genomic OR gene cluster, on human chromosome 17p13. Fifteen orthologs from chimpanzee (localized to chromosome 19p15), as well as key OR counterparts from other primates, have been identified and sequenced. Comparison among orthologs and paralogs revealed a multiplicity of gene conversion events, which occurred exclusively within OR subfamilies. These appear to lead to segment shuffling in the odorant binding site, an evolutionary process reminiscent of somatic combinatorial diversification in the immune system. We also demonstrate that the functional mammalian OR repertoire has undergone a rapid decline in the past 10 million years: while for the common ancestor of all great apes an intact OR cluster is inferred, in present-day humans and great apes the cluster includes nearly 40% pseudogenes.
A statistical chemistry approach to the origin of life
Segré D. & Lancet D. (1999) Chemtracts. 12, 6, p. 382-397 Abstract
We revisit some theoretical models dealing with the chemical emergence of lifelike properties in prebiotic systems. Special emphasis is given to models involving random assemblies of mutually catalytic organic molecules, as opposed to scenarios in which individual molecular species are endowed with the capacity of self-replication. We highlight here the challenge of tracing the very first steps of biogenesis, when self-replication, mutation, selection, and evolution may have been hardly recognizable. The models we discuss share the assumption that a large repertoire of relatively simple organic compounds could spontaneously form prebiotically, and the notion that a statistical approach, independent of detailed molecular properties, can uncover some general principles underlying biogenic processes. Fundamental models, put forward by Dyson and Kauffman, describe very early scenarios, whose statistical nature is reflected in the possibility of characterizing many random, mutually catalytic interactions with relatively few parameters. Further theoretical considerations indicate that mutually catalytic assemblies might also entail a primitive information transfer system, exclusively based on idiosyncratic chemical compositions, a situation described here as the inheritance of a 'compositional genome.' Amphiphilic molecules, due to their peculiar attributes, are suggested to potentially embody many of the properties necessary for these systems to emerge spontaneously, hinting to the possibility of an exclusively lipid-based origin of life. We stress that modern trends in molecular complementarity, combinatorial chemistry, and enzyme mimetics represent a source of conceptual and experimental information that can help extend previous models. This is exemplified here by the Graded Autocatalysis Replication Domain (GARD) model we developed, based on a statistical distribution of catalytic activities. A further extension of this model, the Amphiphile-GARD, aims at a more realistic and testable theoretical description of some scenarios for early prebiotic evolution. Errata available
Olfaction: Good reception in fruitfly antennae
Pilpel Y. & Lancet D. (1999) Nature. 398, 6725, p. 285-287 Abstract
Molecular biology of olfactory receptors
Pilpel Y., Sosinsky A. & Lancet D. (1999) Molecular Biology of the Brain. p. 93-104 Abstract
In order to elicit an olfactory response, a substance has to partition into the gas phase and diffuse into the nose. Such odorant molecules, usually low molecular-mass hydrophobic compounds, encounter the ciliated endings of sensory neuronal dendrites, which protrude into a mucus layer at the surface of the olfactory epithelium in the nasal cavity. Embedded in the membranes of such cilia are olfactory receptor (OR) proteins, which recognize odorants and elicit a transduction cascade that underlies the nerve cell response. The sensory axons project to the olfactory bulb in the brain, where they converge into synaptic structures called glomeruli. The specific convergence patterns of olfactory axons, which depend on OR expression, provide a model system for neuronal network development. Here, initial processing of odour information occurs, which is followed by additional analysis in higher olfactory brain centres.
The variable and conserved interfaces of modeled olfactory receptor proteins
Pilpel Y. & Lancet D. (1999) Protein Science. 8, 5, p. 969-977 Abstract
The accumulation of hundreds of olfactory receptor (OR) sequences, along with the recent availability of detailed models of other G-protein-coupled receptors, allows us to analyze the OR amino acid variability patterns in a structural context. A Fourier analysis of 197 multiply aligned olfactory receptor sequences showed an α-helical periodicity in the variability profile. This was particularly pronounced in the more variable transmembranal segments 3, 4, and 5. Rhodopsin-based homology modeling demonstrated that the inferred variable helical faces largely point to the interior of the receptor barrel. We propose that a set of 17 hypervariable residues, which point to the barrel interior and are more extracellularly disposed, constitute the odorant complementarity determining regions. While 12 of these residues coincide with established ligand-binding contact positions in other G-protein-coupled receptors, the rest are suggested to form an olfactory-unique aspect of the binding pocket. Highly conserved olfactory receptor-specific sequence motifs, found in the second and third intracellular loops, may comprise the G-protein recognition epitope. The prediction of olfactory receptor functional sites provides concrete suggestions of site-directed mutagenesis experiments for altering ligand and G-protein specificity.
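The Fourier analysis mentioned in the abstract above can be illustrated with a short sketch that measures how strongly a per-position variability profile repeats at a given angular step per residue. The abstract gives no computational details, so everything here is an assumption: the 100° step (the textbook α-helix value of 3.6 residues per turn), the mean subtraction, the function name and the synthetic profile.

```python
import math


def periodicity_power(profile, angle_deg):
    """Fourier power of a per-residue profile at a given angular step per residue.

    The mean is subtracted first so that a constant offset in the profile
    does not contribute to the power.
    """
    mean = sum(profile) / len(profile)
    step = math.radians(angle_deg)
    real = sum((v - mean) * math.cos(k * step) for k, v in enumerate(profile))
    imag = sum((v - mean) * math.sin(k * step) for k, v in enumerate(profile))
    return real * real + imag * imag


if __name__ == "__main__":
    # Synthetic profile that repeats every 3.6 positions (100 degrees per residue).
    helix_like = [math.cos(math.radians(100 * k)) for k in range(36)]
    for angle in (100, 160):
        print(angle, round(periodicity_power(helix_like, angle), 3))
```

A profile with helical periodicity gives a clear peak near 100° and little power at other steps, which is the kind of signal the abstract describes for transmembranal segments 3, 4, and 5.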
GARDobes: Primordial cell nano-precursors with organic catalysis, compositional genome and capacity to evolve
Segre D., Ben-Eli D., Pilpel Y., Kedem O. & Lancet D. (1999) Instruments, Methods, And Missions For Astrobiology Ii. 3755, p. 144-162 Abstract
The Graded Autocatalysis Replication Domain (GARD) model described here depicts an early primordial scenario, prior to the emergence of biopolymers, such as RNA or proteins. The model describes, with the help of statistical chemistry computer simulations, a collection of organic molecular species capable of rudimentary selection and evolution. The GARD model provides a rigorous kinetic analysis of simple sets of chemicals that manifest mutual catalysis. It is shown that catalytic closure can sustain self replication up to a critical dilution rate, related to the extent of mutual catalysis. The capacity for self replication in a mutually catalytic set is shown to be a graded property, quantitated by a critical parameter λ_ci. GARD could be a simple model for a primordial scenario, in which replication and catalysis are performed by the same set of molecules. GARDobes are proposed to be entities that embody a GARD system, endowed with a non-DNA `compositional genome', and are presumed to have replicated slowly and imperfectly through mutually catalytic networks. Therefore, they are not bound by the standard cellular size constraints: GARDobes may be as small as a few nanometers, with 20-50 nanometers being rather large and elaborate. Active GARDobes, if ever found on earth or on other planets, would be distinguished by a highly biased organic chemistry, i.e. having only a small subset of the possible molecules of any given class. Their fossils might still bear the hallmarks of such a bias, with narrow spectra of molecules such as Polycyclic Aromatic Hydrocarbons or even with enantiomeric excesses.

1998

Organization and evolution of olfactory receptor genes on human chromosome 11
Buettner J. A., Glusman G., Ben-Arie N., Ramos P., Lancet D. & Evans G. A. (1998) Genomics. 53, 1, p. 56-68 Abstract
Olfactory receptors (OR) are encoded by a large multigene family including hundreds of members dispersed throughout the human genome. Cloning and mapping studies have determined that a large proportion of the olfactory receptor genes are located on human chromosomes 6, 11, and 17, as well as distributed on other chromosomes. In this paper, we describe and characterize the organization of olfactory receptor genes on human chromosome 11 by using degenerate PCR-based probes to screen chromosome 11-specific and whole genome clone libraries for members of the OR gene family. OR genes were identified by DNA sequencing and then localized to regions of chromosome 11. Physical maps of several gene clusters were constructed to determine the chromosomal relationships between various members of the family. This work identified 25 new OR genes located on chromosome 11 in at least seven distinct regions. Three of these regions contain gene clusters that include additional members of this gene family not yet identified by sequencing. Phylogenetic analysis of the newly described OR genes suggests a mechanism for the generation of genetic diversity.
GeneCards: A novel functional genomics compendium with automated data mining and query reformulation support
Rebhan M., Chalifa-Caspi V., Prilusky J. & Lancet D. (1998) Bioinformatics. 14, 8, p. 656-664 Abstract
Motivation: Modern biology is shifting from the 'one gene one postdoc' approach to genomic analyses that include the simultaneous monitoring of thousands of genes. The importance of efficient access to concise and integrated biomedical information to support data analysis and decision making is therefore increasing rapidly, in both academic and industrial research. However, knowledge discovery in the widely scattered resources relevant for biomedical research is often a cumbersome and non-trivial task, one that requires a significant amount of training and effort. Results: To develop a model for a new type of topic-specific overview resource that provides efficient access to distributed information we designed a database called 'GeneCards'. It is a freely accessible Web resource that offers one hypertext (card) for each of the more than 7000 human genes that currently have an approved gene symbol published by the HUGO/GDB nomenclature committee. The presented information aims at giving immediate insight into current knowledge about the respective gene including a focus on its functions in health and disease. It is compiled by Perl scripts that automatically extract relevant information from several databases including SWISS-PROT, OMIM, Genatlas and GDB. Analyses of the interactions of users with the Web interface of GeneCards triggered development of easy-to-scan displays optimized for human browsing. Also, we developed algorithms that offer 'ready-to-click' query reformulation support to facilitate information retrieval and exploration. Many of the long-term users turn to GeneCards to quickly access information about the function of very large sets of genes, for example in the realm of large-scale expression studies using 'DNA chip' technology or two-dimensional protein electrophoresis. Availability: Freely available at http://bioinformatics.weizmann.ac.il/cards/ Contact: cards@bioinformatics.weizmann.ac.il.
Graded autocatalysis replication domain (GARD): kinetic analysis of self-replication in mutually catalytic sets
Segré D., Lancet D., Kedem O. & Pilpel Y. (1998) Origins of Life and Evolution of the Biosphere. 28, 4-6, p. 501-514 Abstract
A Graded Autocatalysis Replication Domain (GARD) model is proposed, which provides a rigorous kinetic analysis of simple chemical sets that manifest mutual catalysis. It is shown that catalytic closure can sustain self-replication up to a critical dilution rate, λ_c, related to the graded extent of mutual catalysis. We explore the behaviour of vesicles containing GARD species whose mutual catalysis is governed by a previously published statistical distribution. In the population thus generated, some GARD vesicles display a significantly higher replication efficiency than most others. GARD thus represents a simple model for primordial chemical selection of mutually catalytic sets.
Mutual catalysis in sets of prebiotic organic molecules: Evolution through computer simulated chemical kinetics
Segré D., Pilpel Y. & Lancet D. (1998) Physica A. 249, 1-4, p. 558-564 Abstract
A thorough outlook on the origin of life needs to delineate a chemically rigorous, self-consistent path from highly heterogeneous, random ensembles of relatively simple organic molecules, to an entity that has rudimentary life-like characteristics. Such entity should be endowed with a capacity to express variation, undergo mutation-like changes and manifest a simple evolutionary process. For simulating such system we developed the Graded Autocatalysis Replication Domain (GARD) model for explicit kinetic analysis of mutual catalysis in sets of random oligomers derived from energized precursor monomers. The kinetic properties of the GARD model are based on vesicle enclosure and expansion. With the additional assumption of spontaneous vesicle splitting, a GARD evolution scenario is envisaged as a consequence of pure chemical kinetics. Here we show how the GARD model can serve as a platform for investigating the dynamics of self-organization mechanisms in molecular evolutionary processes.
Genome dynamics, evolution, and protein modeling in the olfactory receptor gene superfamily
Sharon D., Glusman G., Pilpel Y., Horn-Saban S. & Lancet D. (1998) Olfaction And Taste Xii: An International Symposium. 855, p. 182-193 Abstract
The human olfactory subgenome represents several hundred olfactory receptor (OR) genes in a dozen or more clusters on several chromosomes. One OR gene cluster on human chromosome 17 has been characterized by us in detail. Based on a large-scale DNA sequence analysis, we have identified events of gene duplication and fusion as well as the generation of pseudogenes. The latter instances of 'gene death' could underlie the widespread phenomenon of human specific anosmias. Sixteen OR coding regions were found on this cluster, and six of them are pseudogenes. One of these pseudogenes, OR17-23, was found to be an intact open reading frame in an old world monkey. This may be a reflection of an OR repertoire diminution in man. A homology model of the OR protein was constructed by utilizing the rich information available on ~200 OR sequences. The putative odorant complementarity determining region (CDR) was found to consist of 20 hypervariable residues facing an interior cavity defined by transmembrane helices 3, 4 and 5. Such a model could be useful in analyzing additional OR gene sequences in the human genome in terms of odorant binding.
Mutually catalytic amphiphiles: Simulated chemical evolution and implications to exobiology
Segre D. & Lancet D. (1998) Exobiology: Matter, Energy, and Information in the Origin and Evolution of Life in the Universe. p. 123-131 Abstract
A description of the emergence of life should delineate a chemically rigorous gradual transition from random collections of simple organic molecules to spatially confined assemblies displaying rudimentary self-reproduction capacity. It has been suggested that large sets of mutually catalytic molecules, and not self-replicating information-carrying biopolymers, could have been the precursors of life. We present here a stochastic model in which the mutually catalytic molecules are spontaneously aggregating amphiphiles. When such amphiphiles exert on each other random catalytic effects, biased molecular compositions emerge, that are endowed with replication-like properties. This approach may have important consequences to the understanding of very early chemical evolution. It could also guide a search for extraterrestrial forms of very primitive life.

1997

Guidelines for human gene nomenclature (1997)
White J., McAlpine P., Antonarakis S. et_al. (1997) Genomics. 45, 2, p. 468-471 Abstract
GeneCards: Integrating information about genes, proteins and diseases
Rebhan M., Chalifa-Caspi V., Prilusky J. & Lancet D. (1997) Trends in Genetics. 13, 4, p. 163 Abstract
The UDP glycosyltransferase gene superfamily: Recommended nomenclature update based on evolutionary divergence
Mackenzie P., Owens I., Burchell B. et_al. (1997) Pharmacogenetics. 7, 4, p. 255-269 Abstract
This review represents an update of the nomenclature system for the UDP glucuronosyltransferase gene superfamily, which is based on divergent evolution. Since the previous review in 1991, sequences of many related UDP glycosyltransferases from lower organisms have appeared in the database, which expand our database considerably. At latest count, in animals, yeast, plants and bacteria there are 110 distinct cDNAs/genes whose protein products all contain a characteristic 'signature sequence' and, thus, are regarded as members of the same superfamily. Comparison of a relatedness tree of proteins leads to the definition of 33 families. It should be emphasized that at least six cloned UDP-GlcNAc N-acetylglucosaminyltransferases are not sufficiently homologous to be included as members of this superfamily and may represent an example of convergent evolution. For naming each gene, it is recommended that the root symbol UGT for human (Ugt for mouse and Drosophila), denoting 'UDP glycosyltransferase,' be followed by an Arabic number representing the family, a letter designating the subfamily, and an Arabic numeral denoting the individual gene within the family or subfamily, e.g. 'human UGT2B4' and 'mouse Ugt2b5'. We recommend the name 'UDP glycosyltransferase' because many of the proteins do not preferentially use UDP glucuronic acid, or their nucleotide sugar preference is unknown. Whereas the gene is italicized, the corresponding cDNA, transcript, protein and enzyme activity should be written with upper-case letters and without italics, e.g. 'human or mouse UGT1A1'. The UGT1 gene (spanning > 500 kb) contains at least 12 promoters/first exons, which can be spliced and joined with common exons 2 through 5, leading to different N-terminal halves but identical C-terminal halves of the gene products; in this scheme each first exon is regarded as a distinct gene (e.g. UGT1A1, UGT1A2,... UGT1A12). When an orthologous gene between species cannot be identified with certainty, as occurs in the UGT2B subfamily, sequential naming of the genes is being carried out chronologically as they become characterized. We suggest that the Human Gene Nomenclature Guidelines (http://www.gene.acl.ac.uk/nomenclature/guidelines.html) be used for all species other than the mouse and Drosophila. Thirty published human UGT1A1 mutant alleles responsible for clinical hyperbilirubinemias are listed herein, and given numbers following an asterisk (e.g. UGT1A1*30) consistent with the Human Gene Nomenclature Guidelines. It is anticipated that this UGT gene nomenclature system will require updating on a regular basis.

1996

Sequence analysis in the olfactory receptor gene cluster on human chromosome 17: Recombinatorial events affecting receptor diversity
Glusman G., Clifton S., Roe B. & Lancet D. (1996) Genomics. 37, 2, p. 147-160 Abstract
A cosmid clone covering a region of high olfactory receptor (OR) gene density inside the OR gene cluster on human chromosome 17 (17p13.3) was subjected to shotgun automated DNA sequencing. The resulting 40-kb sequence revealed three known OR coding regions, as well as a new OR pseudogene (OR17-25), fused to one of the previously identified OR genes (OR17-24). The suggested mechanism for the generation of this doublet structure involves an initial duplication mediated by flanking repeats and a subsequent deletion via nonhomologous recombination. Sequence analysis further suggests that the two other OR genes present in the cosmid (OR17-40 and OR17-228) may have evolved by ancient tandem duplication of an 11-kb fragment, mediated by recombination between mammalian-wide interspersed repeats. The duplicated genes appear to be complete and potentially functional. Their conserved structure reveals a long upstream intron and a previously uncharacterized 5' noncoding exon. No additional genes could be discerned in the cosmid, suggesting that the cluster may be part of a dedicated OR subgenome.
Overexpression, solubilization and purification of rat and human olfactory receptors
Nekrasova E., Sosinskaya A., Natochin M., Lancet D. & Gat U. (1996) European Journal of Biochemistry. 238, 1, p. 28-37 Abstract
The superfamily of olfactory receptor genes, whose products are thought to be activated by odorant ligands, is critical for odor recognition. Two olfactory receptors, olp4 from rat and OR17-4 from human, were overexpressed in Sf9 insect cells. The presence of the proteins in cell membranes was monitored by immunoblotting with peptide-specific polyclonal antibodies directed against the C-terminal sequences of these receptors and with a mAb against an N-terminal octapeptide epitope tag. A DNA sequence that codes for a His₆ tag, which binds tightly to a Ni²⁺-chelate-affinity column, was incorporated into the N-termini of both genes. The expressed olfactory receptors were found mainly in the cell-membrane fraction. The proteins were difficult to solubilize by many detergents and only lysophosphatidylcholine was found to be both suitable for efficient solubilization of the overexpressed olfactory receptors and compatible with the purification system used. After solubilization, the olfactory receptors were purified to near homogeneity by affinity chromatography on nickel nitrilotriacetic acid resin and by cation-exchange chromatography. Electrophoresis of the purified proteins and visualization with Coomassie Blue staining or by immunoblotting with specific antibodies, revealed bands of 32, 69 and 94 kDa, which were identified as the monomeric, dimeric and trimeric forms of the receptor proteins. The oligomeric forms were resistant to reduction and alkylation, and are therefore thought to be held together by non-covalent hydrophobic interactions that are resistant to SDS. This finding is similar to previous observations for other guanine-nucleotide-binding-regulatory-protein-coupled receptors. Reconstitution in phospholipid vesicles showed that the purified olfactory receptors insert specifically into the lipid bilayer. This provides a means to study functional reconstitution with putative transduction components such as olfactory guanine-nucleotide-binding-regulatory protein.
Positive selection moments identify potential functional residues in human olfactory receptors
Singer M., Weisinger-Lewin Y., Lancet D. & Shepherd G. (1996) Receptors and Channels. 4, 3, p. 141-147 Abstract
Correlated mutation analysis and molecular models of olfactory receptors have provided evidence that residues in the transmembrane domains form a binding pocket for odor ligands. As an independent test of these results, we have calculated positive selection moments for the alpha-helical sixth transmembrane domain (TM6) of human olfactory receptors. The moments can be used to identify residues that have been preferentially affected by positive selection and are thus likely to interact with odor ligands. The results suggest that residue 622, which is commonly a serine or threonine, could form critical H-bonds. In some receptors a dual-serine subsite, formed by residues 622 and 625, could bind hydroxyl determinants on odor ligands. The potential importance of these residues is further supported by site-directed mutagenesis in the beta-adrenergic receptor. The findings should be of practical value for future physiological studies, binding assays, and site-directed mutagenesis.

1994

Olfactory Receptor Proteins: Expression, Characterization and Partial Purification
Gat U., Nekrasova E., Lancet D. & Natochin M. (1994) European Journal of Biochemistry. 225, 3, p. 1157-1168 Abstract
A rat olfactory epithelium cDNA library was screened for olfactory receptor clones. One of the positively hybridizing cDNA clones was sequenced and found to encode a new member of the olfactory receptor superfamily. This cDNA, termed olp4, was used as a model of olfactory receptor for expression, both in vitro and in vivo. Expression of olp4, as well as of another previously cloned olfactory receptor (F5), was monitored by immunoprecipitation with a monoclonal antibody directed against a Flag peptide epitope tag, inserted at the N-terminus of the open reading frame, and a specific polyclonal antibody against a C-terminal peptide of olp4. Translation in vitro, followed by immunoprecipitation, showed a major olp4-specific band of 27-29 kDa. The olp4 and F5 polypeptides were found to be inserted into microsomal membranes as expected for integral membrane proteins. Expression in vivo of Flag-olp4 in Sf9 insect cells, using the baculovirus expression system, showed a specific polypeptide of the same size as the in vitro species, with an additional band of 34 kDa, which is most likely a glycosylated form. Fluorescence cytometry and immunohistochemical assays demonstrated the localization of the Flag-olp4 product on the cell surface of the infected host Sf9 cells, with the N-terminus and C-terminus in the proper orientation. Affinity chromatography was used for the partial purification of the olp4 polypeptide from infected Sf9 cells. The identification and purification of this expressed olfactory receptor polypeptide could open the way for further characterization and functional studies of the olfactory receptor superfamily members.
Olfactory receptor gene cluster on human chromosome 17: Possible duplication of an ancestral receptor repertoire
Ben-Arie N., Lancet D., Taylor C. et_al. (1994) Human Molecular Genetics. 3, 2, p. 229-235 Abstract
A gene superfamily of olfactory receptors (ORs) has recently been identified in a number of species. These receptors share a seven transmembrane domain structure with many neurotransmitter and hormone receptors, and are likely to underlie the recognition and G-protein-mediated transduction of odorant signals. Previously, OR genes cloned in different species were from random locations in the respective genomes. We report here the cloning of 16 human OR genes, all from chromosome 17 (17p13.3). The intronless coding regions are physically mapped (on 35 cosmids) in one 0.35 Mb long contiguous cluster, with an average intergenic separation of 15 kb. The human OR genes in the cluster belong to four different gene subfamilies, displaying as much sequence variability as any randomly selected group of ORs. This suggests that the cluster identified may be one of several copies of an ancestral OR gene repertoire whose existence may predate the divergence of mammals. The latter may have duplicated in some species to form the present mammalian OR gene repertoire, with several hundred genes. The human chromosome 17 OR gene cluster may thus be a good model for understanding human olfaction, as well as the ontogeny and phylogeny of the OR gene superfamily.
Exclusive receptors
Lancet D. (1994) Nature. 372, 6504, p. 321-322 Abstract
Emergence of order in small autocatalytic sets maintained far from equilibrium: application of a probabilistic receptor affinity distribution (RAD) model
Lancet D., Kedem O. & Pilpel Y. (1994) Berichte der Bunsengesellschaft/Physical Chemistry Chemical Physics. 98, 9, p. 1166-1169 Abstract
We examined the behavior of auto-catalytic sets of polymers by a computer simulation. Polymers are allowed to interact with each other, whereby each polymer molecule may catalyze the formation and degradation of others. The system is subjected to a set of thermodynamic and kinetic constraints, including a constant influx of free energy, which keeps the system away from chemical equilibrium and thus enables the effect of catalysis. The system is found to continuously change and probe many possible values in the composition space. In this simulation we make use of a Receptor Affinity Distribution (RAD) model to predict the probabilities of interaction and catalysis. Our results indicate that initially random sets of polymers, under the assumptions of the model, might accumulate information (i.e., clustering in the composition space). Sets will occupy a limited region of composition space, and temporarily reproduce themselves or disperse and give rise to other sets.

1993

Olfactory receptors
Lancet D. & Ben-Arie N. (1993) Current Biology. 3, 10, p. 668-674 Abstract
Expression of olfactory receptor and transduction genes during rat development
Margalit T. & Lancet D. (1993) Developmental Brain Research. 73, 1, p. 7-16 Abstract
The molecular components of olfactory reception and regulation are expressed in a tissue-specific manner. The functional attributes mediated by some of these proteins have been previously shown to display a well-defined developmental emergence during the last week of rat gestation. To gain a better understanding of the relations between chemosensory function and neuronal development, we studied the ontogeny of 7 olfactory-specific genes by quantitative PCR. Relative levels of expression during rat development were determined for each gene, starting at embryonic day 15 (E15) and ending at postnatal day 35 (P35). In addition, the level of expression of the different genes was quantified in juvenile rats. The onset of expression for olfactory receptors and the olfactory cation channel at embryonic day 19 (E19) coincides with the functional maturation of the sensory neurons. Olfactory G-protein and adenylyl cyclase are expressed earlier (∼ E16) while olfactory biotransformation enzymes appear later (E20-E21), just before birth. The sequence of developmental expression of olfactory receptor genes has possible implications to the establishment of neuronal connectivity in this sensory pathway.
Probability model for molecular recognition in biological receptor repertoires: Significance to the olfactory system
Lancet D., Sadovsky E. & Seidemann E. (1993) Proceedings of the National Academy of Sciences of the United States of America. 90, 8, p. 3715-3719 Abstract
A generalized phenomenological model is presented for stereospecific recognition between biological receptors and their ligands. We ask what is the distribution of binding constants ψ(K) between an arbitrary ligand and members of a large receptor repertoire, such as immunoglobulins or olfactory receptors. For binding surfaces with B potential subsite and S different types of subsite configurations, the number of successful elementary interactions obeys a binomial distribution. The discrete probability function ψ(K) is then derived with assumptions on α, the free energy contribution per elementary interaction. The functional form of ψ(K) may be universal, although the parameter values could vary for different ligand types. An estimate of the parameter values of ψ(K) for iodovanillin, an analog of odorants and immunological haptens, is obtained by equilibrium dialysis experiments with nonimmune antibodies. Based on a simple relationship, predicted by the model, between the size of a receptor repertoire and its average maximal affinity toward an arbitrary ligand, the size of the olfactory receptor repertoire (N_olf) is calculated as 300-1000, in very good agreement with recent molecular biological studies. A very similar estimate, N_olf = 500, is independently derived by relating a theoretical distribution of maxima for ψ(K) with published human olfactory threshold variations. The present model also has implications to the question of olfactory coding and to the analysis of specific anosmias, genetic deficits in perceiving particular odorants. More generally, the proposed model provides a better understanding of ligand specificity in biological receptors and could help in understanding their evolution.
Olfaction: From signal transduction and termination to human genome mapping
Lancet D., Gross-Isseroff R., Margalit T., Seidemann E. & Ben-Arie N. (1993) Chemical Senses. 18, 2, p. 217-225 Abstract
Keywords: MOLECULAR-BASIS; RECOGNITION; NEURONS; GENES; CDNA
Glutathione S-transferases in rat olfactory epithelium: Purification, molecular properties and odorant biotransformation
Ben-Arie N., Khen M. & Lancet D. (1993) Biochemical Journal. 292, 2, p. 379-384 Abstract
The olfactory epithelium is exposed to a variety of xenobiotic chemicals, including odorants and airborne toxic compounds. Recently, two novel, highly abundant, olfactory-specific biotransformation enzymes have been identified: cytochrome P-450olf1 and olfactory UDP-glucuronosyltransferase (UGT(olf)). The latter is a phase II biotransformation enzyme which catalyses the glucuronidation of alcohols, thiols, amines and carboxylic acids. Such covalent modification, which markedly affects lipid solubility and agonist potency, may be particularly important in the rapid termination of odorant signals. We report here the identification and characterization of a second olfactory phase II biotransformation enzyme, a glutathione S-transferase (GST). The olfactory epithelial cytosol shows the highest GST activity among the extrahepatic tissues examined. Significantly, olfactory epithelium had an activity 4-7 times higher than in other airway tissues, suggesting a role for this enzyme in chemoreception. The olfactory GST has been affinity-purified to homogeneity, and shown by h.p.l.c. and N-terminal amino acid sequencing to constitute mainly the Yb₁ and Yb₂ subunits, different from most other tissues that have mixtures of more enzyme classes. The identity of the olfactory enzymes was confirmed by PCR cloning and restriction enzyme analysis. Most importantly, the olfactory GSTs were found to catalyse glutathione conjugation of several odorant classes, including many unsaturated aldehydes and ketones, as well as epoxides. Together with UGT(olf), olfactory GST provides the necessary broad coverage of covalent modification capacity, which may be crucial for the acuity of the olfactory process.
Olfactory Receptors: Transduction, Diversity, Human Psychophysics and Genome Analysis
Lancet D., Ben-Arie N., Cohen S. et_al. (1993) Molecular Basis Of Smell And Taste Transduction. p. 131-146 Abstract
The emerging understanding of the molecular basis of olfactory mechanisms allows one to answer some long-standing questions regarding the complex recognition machinery involved. The ability of the olfactory system to detect chemicals at sub-nanomolar concentrations is explained by a plethora of amplification devices, including the coupling of receptors to second messenger generation through GTP-binding proteins. Specificity and selectivity may be understood in terms of a diverse repertoire of olfactory receptors of the seven-transmembrane-domain receptor superfamily, which are probably disposed on olfactory sensory neurons according to a clonal exclusion rule. Signal termination may be related to sets of biotransformation enzymes that process odorant molecules, as well as to receptor desensitization. Many of the underlying molecular components show specific expression in olfactory epithelium, with a well-orchestrated developmental sequence of emergence, possibly related to sensory neuronal function and connectivity requirements. A general model for molecular recognition in biological receptor repertoires allows a prediction of the number of olfactory receptors necessary to achieve efficient detection and sheds light on the analogy between the immune and olfactory systems. The molecular cloning and mapping of a human genomic olfactory receptor cluster on chromosome 17 provides insight into olfactory receptor diversity, polymorphism and evolution. Combined with future genotype-phenotype correlation, with particular reference to specific anosmia, as well as with computer-based molecular modelling, these studies may provide insight into the odorant specificity of olfactory receptors.

1992

Evidence for genetic determination in human twins of olfactory thresholds for a standard odorant
Gross-Isseroff R., Ophir D., Bartana A., Voet H. & Lancet D. (1992) Neuroscience Letters. 141, 1, p. 115-118 Abstract
Olfactory thresholds for four odorants were determined in groups of monozygotic and dizygotic human twins. Odorants were presented in an ascending dilution series in odorless solvent, using a three-way forced choice method. For two of the tested odorants, 5α-androst-16-en-3-one and isoamyl acetate, the thresholds showed a strong genetic component. This was demonstrated by respective values of 0.78 and 0.73 for the intraclass correlation difference, and of z = 3.69 and z = 2.71 in a within-pair difference analysis. The results for isoamyl acetate are novel, and suggest that genetic polymorphism in the affinity of odorant receptor proteins contributes to the (nearly normal) threshold distribution for this odorant.
Biotransformation Enzymes in Olfactory Signal Termination
Lancet D., Lazard D., Zupko K., Poria Y., Khen M., Margalit T. & Ben-Arie N. (1992) Journal of Basic and Clinical Physiology and Pharmacology. 3, Supplement, p. 86-87 Abstract
Olfactory reception: from transduction to human genetics.
Lancet D. (1992) Society of General Physiologists Series. 47, p. 73-91 Abstract

1991

The UDP Glucuronosyltransferase Gene Superfamily: Suggested Nomenclature Based on Evolutionary Divergence
Burchell B., Nebert D., Nelson D. et_al. (1991) DNA and Cell Biology. 10, 7, p. 487-494 Abstract
A nomenclature system for the UDP glucuronosyltransferase superfamily is proposed, based on divergent evolution of the genes. A total of 26 distinct cDNAs in five mammalian species have been sequenced to date. Comparison of the deduced amino acid sequences leads to the definition of two families and a total of three subfamilies. For naming each gene, we propose that the root symbol UGT for human (Ugt for mouse), representing "UDP glucuronosyltransferase," be followed by an Arabic number denoting the family, a letter designating the subfamily, and an Arabic numeral representing the individual gene within the family or sub-family (hyphen before the Arabic number for mouse), e.g., human UGT2B1 and murine Ugt2b-1. Whereas the gene and cDNA should be italicized, the corresponding transcript, protein, and enzyme activity should not be written with lowercase letters or in italics, e.g., human or murine UGT2B1. Recent experimental evidence suggests that several exons of the UGT1 gene might be shared, indicating that distinct UGT1 transcripts and proteins may arise via alternative splicing; the gene and gene product of alternative splicing will be designated with an asterisk, e.g., UGT1*6 and UGT1*6, respectively. When an orthologous gene between species cannot be identified with certainty, as occurs in the UGT2B subfamily, we recommend sequential naming of the genes chronologically as they become characterized. We suggest that the human nomenclature system be used for species other than the mouse. We anticipate that this UGT gene nomenclature system will require updating on a regular basis.
The strong scent of success
Lancet D. (1991) Nature. 351, p. 275-276 Abstract
Immunolocalization of cytochromes P450olf1 and P450olf2 in rat olfactory mucosa
Zupko K., Poria Y. & Lancet D. (1991) European Journal of Biochemistry. 196, 1, p. 51-58 Abstract
Previously, we described two olfactory-specific cytochromes P450: rat cytochrome P450olf1 (IIG1), identified by cDNA cloning, and bovine cytochrome P450olf2 (IIA), identified by peptide microsequencing of a transmembranal polypeptide (p52). Here we describe the preparation of polyclonal antisera against peptide sequences of these proteins and their use in the immunolocalization of cytochromes P450olf1 and P450olf2 in rat olfactory mucosa. Immunoreactivities related to both enzymes are found in the subepithelial Bowman's glands of olfactory mucosa. Practically no immunoreactivity was found in other rat tissues, including liver, lung, kidney and respiratory mucosa. In addition, double-labeling experiments demonstrated that cytochromes P450olf1 and P450olf2 are present in the same population of Bowman's glands. The olfactory-specific localization of cytochromes P450olf1 and P450olf2 is consistent with a role for these enzymes in the modification or clearance of odorants from the chemosensory tissue.
Most of the senses begin to make some sense
Lancet D. (1991) Nature. 353, 6347, p. 799-800 Abstract
Olfaction: Molecules to network
Lancet D. (1991) Nature. 351, 6324, p. 275 Abstract
Sweet Taste Transduction: A Molecular-Biological Analysis
Lancet D. & Ben-Arie N. (1991) Sweeteners. Vol. 450. p. 226-236 (ACS Symposium Series). Abstract
While the chemistry of sweet tasting compounds has been extensively studied (1-5), precious little has been known until recently on the cellular mechanisms of sweet taste transduction. Work in the authors' laboratory, as well as in several others, has begun to shed light on this problem. Specifically, evidence has accumulated in the last three years, suggesting that sweet taste receptor proteins (as yet unidentified) activate a membrane transduction cascade. This molecular chain of events appears to be very similar to that which is associated with receptors for hormones and neurotransmitters, as well as visual photoreceptors and olfactory receptors (6-8). The proposed transduction cascade includes (see Figure 1): (1) A transmembrane protein receptor that binds sweet compounds stereospecifically and subsequently undergoes a conformational transition. (2) A membrane amplifier GTP-binding protein (G-protein) of the stimulatory type (Gs). (3) The membrane enzyme adenylyl cyclase, that produces an intracellular second messenger cyclic AMP (cAMP).
Odorant signal termination by olfactory UDP glucuronosyl transferase
Lazard D., Zupko K., Poria Y., Nef P., Lazarovits J., Horn S., Khen M. & Lancet D. (1991) Nature. 349, 6312, p. 790-793 Abstract
The onset of olfactory transduction has been extensively studied^1-7, but considerably less is known about the molecular basis of olfactory signal termination^6,8,9. It has been suggested that the highly active cytochrome P₄₅₀ monooxygenases of olfactory neuroepithelium^10-12 are termination enzymes^5,8,11,12, a notion supported by the identification and molecular cloning of olfactory-specific cytochrome P₄₅₀s (refs. 13-16). But as reactions catalysed by cytochrome P₄₅₀ (refs 17, 18) often do not significantly alter volatility, lipophilicity or odour properties^9,11, cytochrome P₄₅₀ may not be solely responsible for olfactory signal termination. In liver and other tissues, drug hydroxylation by cytochrome P₄₅₀ is frequently followed by phase II biotransformation, for example by UDP glucuronosyl transferase (UGT), resulting in a major change of solubility and chemical properties^19. We report here the molecular cloning and expression of an olfactory-specific UGT. The olfactory enzyme, but not the one in liver microsomes, shows preference for odorants over standard UGT substrates. Furthermore, glucuronic acid conjugation abolishes the ability of odorants^1,20 to stimulate olfactory adenylyl cyclase. This, together with the known broad spectrum of drug-detoxification enzymes^17,19, supports a role for olfactory UGT in terminating diverse odorant signals.

1990

The sweet taste inhibitor methyl 4,6-dichloro-4,6-dideoxy-α-D-galactopyranoside inhibits sucrose stimulation of the chorda tympani nerve and of the adenylate cyclase in anterior lingual membranes of rats
Striem B., Yamamoto T., Naim M., Lancet D., Jakinovich W. & Zehavi U. (1990) Chemical Senses. 15, 5, p. 529-536 Abstract
The effects of the sweet taste inhibitor methyl 4,6-dichloro-4,6-dideoxy-α-D-galactopyranoside (MAD-diCl-Gal) and a few disaccharides, on the electrophysiological responses of the chorda tympani nerve and on adenylate cyclase in membranes prepared from the anterior tongue epithelium, were studied in rats. MAD-diCl-Gal inhibited the sucrose stimulation of whole chorda tympani responses, and this inhibition was reversible. In addition, MAD-diCl-Gal inhibited the sucrose stimulation of adenylate cyclase activity in lingual (gustatory) membranes in a dose-dependent manner. High concentrations of MAD-diCl-Gal abolished the sucrose induced adenylate cyclase activity. The disaccharides sucrose, maltose, trehalose and melibiose stimulated both chorda tympani nerve responses and adenylate cyclase activity. These stimulations were dose dependent. Sucrose was the most potent stimulator of the chorda tympani nerve. Other disaccharides resulted in lower responses than sucrose. Sucrose was also a more effective stimulus than maltose for adenylate cyclase activity. In contrast to electrophysiological data, trehalose and melibiose stimulated the adenylate cyclase activity to the same extent as sucrose. The results of this study support the suggestion of cAMP involvement in the cellular transduction of sweet taste in the rat.
Primary structure of cAMP-gated channel from bovine olfactory epithelium
Ludwig J., Margalit T., Eismann E., Lancet D. & Kaupp U. (1990) FEBS Letters. 270, 1-2, p. 24-29 Abstract
The complete amino-acid sequence of the bovine olfactory epithelium adenosine 3',5'-cyclic monophosphate (cAMP)-gated channel has been determined by cloning and sequencing its cDNA. It exhibits a high degree of sequence homology with the cGMP-gated channel of rod photoreceptors, suggesting that cyclic nucleotide-gated channels fall into a new family of genetically related proteins.
Identification and Biochemical Analysis of Novel Olfactory-Specific Cytochrome P-450IIA and UDP-Glucuronosyl Transferase
Lazard D., Tal N., Rubinstein M., Khen M., Lancet D. & Zupko K. (1990) Biochemistry. 29, 32, p. 7433-7440 Abstract
Two major transmembranal polypeptides of bovine olfactory epithelium were identified by SDS electrophoretic analysis of Triton X-114 solubilized membranes. Both polypeptides were present in large amounts in membranes of the olfactory epithelium but were barely detectable in membranes of the nasal respiratory epithelium. Both polypeptides are enriched in the deciliated epithelium as compared with isolated cilia. One of them is a glycoprotein with an apparent molecular mass of 56 kDa (gp56); the other is an unglycosylated protein with an apparent molecular mass of 52 kDa (p52). Sequence analysis of peptides obtained by CNBr cleavage of purified gp56 indicates that it is highly homologous to UDP-glucuronosyl transferase (UDPGT). Parallel analysis shows that p52 is highly homologous to cytochrome P-450 sequences of the IIA subfamily. This protein is assigned the name P-450olf2. Polyclonal antibodies were raised against synthetic peptides corresponding to gp56 and p52 peptide sequences. Immunoblots with these antibodies reveal the following properties of gp56 and p52: (1) they are enriched in the microsomal fraction of the bovine olfactory epithelium; (2) they are possibly specific to the olfactory epithelium, as we could not detect reactivity in microsomes derived from respiratory epithelium or lung, and only a very small amount of basal reactivity was seen with liver microsomes; (3) cross-reacting proteins exist in microsomes derived from the rat olfactory epithelium. These results are consistent with a mechanism whereby the microsomal enzymes are involved in odorant modification and clearance from the nasal tissue.

1989

Olfactory function following late repair of choanal atresia
Gross-Isseroff R., Ophir D., Marshak G., Ganchrow J. R., Beizer M. & Lancet D. (1989) Laryngoscope. 99, 11, p. 1165-1166 Abstract
Results of olfactory function tests (threshold determination and odor identification) in three cases of bilateral and one case of unilateral choanal atresia are reported. All four patients underwent successful repair of choanal atresia at relatively advanced ages (8 to 31 years). Test results showed that patients who had suffered from bilateral atresia had permanent olfactory deficits, while the patient who had suffered from unilateral atresia appeared to have normal olfactory acuity. Although these results should be interpreted with caution due to the small number of cases examined, they suggest the possibility that early sensory exposure might be needed for the normal development of central olfactory functions in analogy to the visual system.
Bovine olfactory cilia preparation: thiol-modulated odorant-sensitive adenylyl cyclase
LAZARD D., Barak Y. & Lancet D. (1989) Biochimica et Biophysica Acta - Molecular Cell Research. 1013, 1, p. 68-72 Abstract
We have characterized the adenylyl cyclase activity in a newly developed preparation of isolated olfactory cilia from the bovine chemosensory neuroepithelium. Like its counterparts from frog and rat, the ciliary enzyme was stimulated by guanine nucleotides, by forskolin, and by a variety of odorants in the presence of GTP. The main difference between the bovine olfactory cilia preparation and the frog and rat olfactory cilia preparation is that odorant stimulation of the bovine olfactory adenylyl cyclase is strongly inhibited by submillimolar concentrations of dithiothreitol. This inhibition is a consequence of a concomitant increase in the GTP-stimulated level and the decrease of the odorant stimulation of the enzyme. Nasal respiratory cilia have a much lower level of adenylyl cyclase activity and show no odorant stimulation. Owing to the large quantities of material available, the bovine olfactory cilia preparation is advantageous for studies of the proteins involved in chemosensory transduction.
Olfaction in Prolonged Administration of Pyridostygmine
ROTH Y., GLICKSON M., NEUMAN R., KARNI A., Lancet D., GROSSISSEROFF R., RAM Z. & GLOVINSKY Y. (1989) Journal of Clinical Pharmacology. 29, 4, p. 370-372 Abstract
Sweet tastants stimulate adenylate cyclase coupled to GTP-binding protein in rat tongue membranes
STRIEM B., PACE U., ZEHAVI U., NAIM M. & Lancet D. (1989) Biochemical Journal. 260, 1, p. 121-126 Abstract
Sucrose and other saccharides, which produce an appealing taste in rats, were found to significantly stimulate the activity of adenylate cyclase in membranes derived from the anterior-dorsal region of rat tongue. In control membranes derived from either tongue muscle or tongue non-sensory epithelium, the effect of sugars on adenylate cyclase activity was either much smaller or absent. Sucrose enhanced adenylate cyclase activity in a dose-related manner, and this activation was dependent on the presence of guanine nucleotides, suggesting the involvement of a GTP-binding protein ('G-protein'). The activation of adenylate cyclase by various mono- and di-saccharides correlated with their electrophysiological potency. Among non-sugar sweeteners, sodium saccharin activated the enzyme, whereas aspartame and neohesperidin dihydrochalcone did not, in correlation with their sweet-taste effectiveness in the rat. Sucrose activation of the enzyme was partly inhibited by Cu²⁺ and Zn²⁺, in agreement with their effect on electrophysiological sweet-taste responses. Our results are consistent with a sweet-taste transduction mechanism involving specific receptors, a guanine-nucleotide-binding protein and the cyclic AMP-generating enzyme adenylate cyclase.
Olfactory-specific cytochrome P-450. cDNA cloning of a novel neuroepithelial enzyme possibly involved in chemoreception
NEF P., Heldman J., LAZARD D., MARGALIT T., JAYE M., HANUKOGLU I. & Lancet D. (1989) Journal of Biological Chemistry. 264, 12, p. 6780-6785 Abstract
We isolated cDNA clones for cytochrome P-450 genes expressed in the olfactory neuroepithelium by screening a corresponding rat cDNA library. Sequence analysis and RNA blot hybridization revealed a new cytochrome P-450, designated cytochrome P-450olf1, which is the first reported cytochrome P-450 mRNA uniquely expressed in the chemosensory organ. Cytochrome P-450olf1 shows intermediate level of sequence similarity (38-53% identity) to several liver cytochrome P-450 enzymes, suggesting that it belongs to the cytochrome P-450II family, but defines a new subfamily (cytochrome P-450IIG) within it. Cytochrome P-450II enzymes are known to process diverse organic compounds, including odorants. This, together with the specificity of cytochrome P-450olf1 to the sensory neuroepithelium, may indicate a role for this protein in olfactory reception.
Olfactory adenylyl cyclase. Identification and purification of a novel enzyme form
Pfeuffer E., Mollner S., Lancet D. & Pfeuffer T. (1989) Journal of Biological Chemistry. 264, 31, p. 18803-18807 Abstract
Rat olfactory adenylyl cyclase has been identified by means of a monoclonal antibody BBC-2, which reacts with both Ca²⁺/calmodulin-sensitive and -insensitive forms of adenylyl cyclase (Mollner, S., and Pfeuffer, T. (1988) Eur. J. Biochem. 171, 265-271). The antibody recognized a 180-kDa polypeptide in olfactory cilia but not in deciliated olfactory epithelial membranes. A protein of the same mobility was observed when olfactory adenylyl cyclase was purified by forskolin-agarose affinity chromatography followed by radioiodination. Its identity was further established by cross-linking to [³²P]ADP-ribosylated G(sα) (GTP-binding protein), to yield a single radiolabeled product of M(r) ~ 220. Olfactory adenylyl cyclase has a ~3-fold higher turnover number, as assessed from stoichiometric binding of [³⁵S]guanosine 5'-(3-O-thio)triphosphate. Therefore, the considerably higher specific adenylyl cyclase activity in olfactory cilia must be due to a ~100-fold higher molar concentration of enzyme in this tissue.

1988

Expression of intermediate filaments and desmoplakin in vertebrate olfactory mucosa
Ophir D. & Lancet D. (1988) Anatomical Record. 221, 3, p. 754-760 Abstract
The expression of intermediate filaments (IF) and desmoplakin was investigated in frog, bovine, and human (fetal) olfactory mucosa. IF are tissue-specific molecular cytoskeletal markers; desmoplakin is the major desmosomal protein. Positive immunoreactivity was observed in the epithelium and in the subepithelial Bowman's glands to keratin and to desmoplakin, indicating the epithelial nature of this tissue. Desmin, neurofilaments, and glial fibrillary acidic protein (GFAP) were not detected in the mucosa. The absence of neurofilaments and GFAP in the tissue containing sensory neurons and glia-like supporting cells is a unique feature and may be related to the fact that the chemosensory neurons are situated in a bona fide epithelium and are known to undergo continuous turnover. In view of the controversy regarding the expression of vimentin, three antivimentin antibodies were used; weak or no labeling was found in the epithelium, whereas mesenchymal cells in the lamina propria were labeled with all three antibodies. Olfactory nerve fascicles in the lamina propria were heterogeneously labeled: VIM 13.2 gave very weak labeling; aVimAS showed mild labeling and SBV21 showed intensive labeling in the nerve fascicle. This heterogeneous labeling pattern may suggest that olfactory vimentin is distinct in reacting only with some of the antivimentin antibodies.
Concentration-dependent changes of perceived odor quality
Gross-Isseroff R. & Lancet D. (1988) Chemical Senses. 13, 2, p. 191-204 Abstract
In order to assess the dependence of perceived odor quality on odorant concentration, we studied 21 subjects. For eight subjects all possible pairs from a pool of six odorants at three decimal dilutions were presented, and subjects were requested to state whether members of the pair were qualitatively 'similar' or 'different'. It was found that while pairs with the same odorant at identical concentrations were judged 'similar' in >90% of the cases by all subjects, scores went down to ≤10% 'similar' judgements in some cases when the same odorant was presented at a 100-fold concentration difference. Large time-invariable differences were found among subjects and among odorants. For the additional 13 subjects, all possible pairs from a pool of four odorants at three decimal dilutions were presented. Subjects were instructed to state whether members of the pair were qualitatively 'same' or 'different', and were also requested to rank the degree of difference on a visual analogue scale. Results for this group were, in general, similar to the results of the former group of subjects and good agreement between the two tasks was found. The results suggest that variations in olfactory stimulus magnitude may be perceived as quality differences, as previously shown for vision and audition.
Molecular transduction in smell and taste
Lancet D., LAZARD D., Heldman J., Khen M. & NEF P. (1988) Cold Spring Harbor Symposia on Quantitative Biology. 53, 1, p. 343-348 Abstract

1987

Toward a Comprehensive Molecular Analysis of Olfactory Transduction
Lancet D., CHEN Z., CIOBOTARIU A., ECKSTEIN F., Khen M., Heldman J., OPHIR D., SHAFIR I. & PACE U. (1987) Annals of the New York Academy of Sciences. 510, 1, p. 27-32 Abstract
Olfactory sensitivity to androstenone in schizophrenic patients
Isseroff R. G., Stoler M., Ophir D., Lancet D. & Sirota P. (1987) Biological Psychiatry. 22, 7, p. 922-925 Abstract
The molecular basis of odor recognition
Lancet D. & Pace U. (1987) Trends in Biochemical Sciences. 12, C, p. 63-66 Abstract
Following decades of study and speculation, the molecular mechanisms of olfaction are beginning to be understood. Odorant receptors appear to activate a cyclic nucleotide enzyme cascade, including a GTP-binding protein, analogous with the processes of hormone, neurotransmitter and visual reception.

1986

Cyclic AMP-Dependent Protein Phosphorylation in Chemosensory Neurons: Identification of Cyclic Nucleotide-Regulated Phosphoproteins in Olfactory Cilia
Heldman J. & Lancet D. (1986) Journal of Neurochemistry. 47, 5, p. 1527-1533 Abstract
Chemosensory dendritic membranes (olfactory cilia) contain protein kinase activity that is stimulated by cyclic AMP and more efficiently by the nonhydrolyzable GTP analog guanosine 5'-O-(3-thio)triphosphate (GTPγS). In control nonsensory (respiratory) cilia, the cyclic AMP-dependent protein kinase is practically GTPγS-insensitive. GTPγS activation of the olfactory enzyme appears to be mediated by a stimulatory GTP-binding protein (G-protein) and adenylate cyclase previously shown to be enriched in the sensory membranes. Protein kinase C activity cannot be detected in the chemosensory cilia preparation under the conditions tested. Incubation of olfactory cilia with [γ-³²P]ATP leads to the incorporation of [³²P]phosphate into many polypeptides, four of which undergo covalent modification in a cyclic nucleotide-dependent manner. The phosphorylation of one polypeptide, pp24, is strongly and specifically enhanced by cyclic AMP at concentrations lower than 1 μM. This phosphoprotein is not present in respiratory cilia, but is seen also in membranes prepared from olfactory neuroepithelium after cilia removal. Cyclic AMP-dependent protein kinase and phosphoprotein pp24 may be candidate components of the molecular machinery that transduces odor signals.
Monoclonal antibodies to ciliary glycoproteins of frog olfactory neurons
CHEN Z., OPHIR D. & Lancet D. (1986) Brain Research. 368, 2, p. 329-338 Abstract
Monoclonal antibodies were produced against isolated frog olfactory cilia, a preparation enriched in dendritic extensions of the chemosensory neurons. Two antibodies, 18.1 and 35.6, were found to react against specific glycoproteins of the sensory organelles. These glycoproteins were identified by their differential binding to the lectins wheat germ agglutinin and Concanavalin A. The antibodies fluorescently labeled isolated olfactory cilia, as well as the ciliary surface layer of olfactory epithelium, whose extent was defined by anti-tubulin and anti-keratin antibodies. Respiratory epithelium (or other tissues) as well as isolated respiratory cilia were not labeled by antibodies 18.1 and 35.6, indicating tissue specificity. The olfactory-specific antibodies can be used as markers of the sensory epithelium and of the sensory regions of olfactory dendritic membranes. Antibody 18.1 recognized gp95, a specific and major integral membrane glycoprotein of frog olfactory cilia. Since gp95 has been suggested as a candidate olfactory receptor protein (Chen, Z. and Lancet, D., Proc. Natl. Acad. Sci. U.S.A., 81 (1984) 1859-1863), antibody 18.1 could also be useful for functional studies.
Changes in Olfactory Acuity Induced by Total Inferior Turbinectomy
Ophir D., Gross Isseroff R., Lancet D. & Marshak G. (1986) JAMA Otolaryngology - Head and Neck Surgery. 112, 2, p. 195-197 Abstract
The short-term and long-term effects of total inferior turbinectomy on smell acuity were assessed in two groups of patients. Olfactory thresholds were determined by a three-way forced-choice method, using four odorants. Resection of obstructive inferior turbinates resulted in a decrease in olfactory thresholds in 22 of 24 tested patients. No deleterious effect on smell acuity was observed in 16 patients tested 2½ years or more after surgery. Subjective assessment of olfactory acuity is unreliable. It is our intention to focus attention on an aspect of intranasal surgery not frequently reported. (Arch Otolaryngol Head Neck Surg 1986;112:195-197)
Vertebrate olfactory reception
Lancet D. (1986) Annual Review of Neuroscience. VOL. 9, p. 329-355 Abstract
Isolated frog olfactory cilia: A preparation of dendritic membranes from chemosensory neurons
CHEN Z., PACE U., Heldman J., Shapira A. & Lancet D. (1986) Journal of Neuroscience. 6, 8, p. 2146-2154 Abstract
Polypeptide gp95. A unique glycoprotein of olfactory cilia with transmembrane receptor properties
CHEN Z., PACE U., RONEN D. & Lancet D. (1986) Journal of Biological Chemistry. 261, 3, p. 1299-1305 Abstract
Polypeptide gp95 is a major glycoprotein present in preparations of isolated ciliary extensions from frog olfactory sensory neurons. We report here that gp95 is distinct among the ciliary polypeptides in having several properties that make it a plausible receptor candidate: it is specific to olfactory cilia, it has the appropriate bilayer density, and it is a transmembrane protein. Polypeptide gp95 has a uniquely high content of complex type oligosaccharides compared to other ciliary glycoproteins, a property which is used for its partial purification and can also serve as a probe for functional identification. The present biochemical characterization of frog gp95 and of its putative homologs from other species may open the way to a future assignment of its role in chemosensory reception.
Olfactory GTP-binding protein: Signal-transducing polypeptide of vertebrate chemosensory neurons
Pace U. & Lancet D. (1986) Proceedings of the National Academy of Sciences of the United States of America. 83, 13, p. 4947-4951 Abstract
The sense of smell involves the stimulation of sensory neurons by odorants to produce depolarization and action potentials. We show that olfactory responses may be mediated by a GTP-binding protein (G protein), a homolog of the visual, hormonal, and brain signal transducing polypeptides. The olfactory G protein is identified in isolated dendritic membranes (olfactory cilia preparations) of chemosensory neurons from three vertebrate species and is shown to mediate the stimulation by odorants of the highly active adenylate cyclase in these membranes. The G protein of olfactory neurons is most similar to G(s), the hormonal stimulatory GTP-binding protein. Its α subunit has a molecular weight of about 42,000, and it undergoes ADP-ribosylation catalyzed by cholera toxin that leads to adenylate cyclase activation. The slight difference in molecular weights of the frog olfactory and the liver G(s) α subunits and the higher sensitivity of olfactory adenylate cyclase to nonhydrolyzable GTP analogs are consistent with the possible existence of different G(s) variants. Signal amplification due to the olfactory G protein may be responsible for the unusual acuity of the sense of smell.

1985

An inexpensive microcomputer-based image-analysis system: novel applications to quantitative autoradiography
Isseroff A. & Lancet D. (1985) Journal of Neuroscience Methods. 12, 4, p. 265-275 Abstract
We describe a relatively inexpensive, yet versatile and powerful microcomputer-based image-analysis system, and its applications to processing of deoxyglucose autoradiographic data. Images are acquired via a video camera mounted on a light microscope or a light box, and digitized in 40 ms to 512 × 512 picture elements with 8-bit resolution (256 gray levels). The bit-mapped image analysis hardware can provide up to 256 colors for pseudo-color coding, and virtually instantaneous readout of brightness values for densitometry. The system is controlled by an 8-bit S-100 bus microcomputer, providing flexibility and ease of expansion. In addition to pseudo-color coding and densitometry, we have developed programs for averaging of successive sections, image subtraction and quantitative reconstruction of different planes of section from serial autoradiograms.
Odorant-sensitive adenylate cyclase may mediate olfactory reception
PACE U., HANSKI E., Salomon Y. & Lancet D. (1985) Nature. 316, 6025, p. 255-258 Abstract
The mechanism of the sense of smell has long been a subject for theory and speculation^1-5. More recently, the notion of odorant recognition by stereospecific protein receptors has gained wide acceptance^6-11, but the receptor molecules remained elusive^9-15. The recognition molecules are believed to be quite diverse^9,11,13-15, which would partly explain the unusual difficulties encountered in their isolation by conventional ligand-binding techniques^12,13. An alternative approach would be to probe the receptors through transductory components that may be common to all receptor types^7-9,12-14. Here we report the identification of one such transductory molecular component. This is an odorant-sensitive adenylate cyclase, present in very large concentrations in isolated dendritic membranes of olfactory sensory neurones. Odorant activation of the enzyme is ligand and tissue specific, and occurs only in the presence of GTP, suggesting the involvement of receptor(s) coupled to a guanine nucleotide binding protein (G-protein)^15-19. The olfactory G-protein is independently identified by labelling with bacterial toxins, and found to be similar to stimulatory G-proteins in other systems^17-19. Our results suggest a role for cyclic nucleotides in olfactory transduction^13,20-22, and point to a molecular analogy between olfaction and visual^15,16, hormone^17,18 and neurotransmitter^19 reception. Most importantly, the present findings reveal new ways to identify and isolate olfactory receptor proteins.

1984

Molecular view of olfactory reception
Lancet D. (1984) Trends in Neurosciences. 7, 2, p. 35-36 Abstract
Membrane proteins unique to vertebrate olfactory cilia: Candidates for sensory receptor molecules
CHEN Z. & Lancet D. (1984) Proceedings Of The National Academy Of Sciences Of The United States Of America-Biological Sciences. 81, 6 I, p. 1859-1863 Abstract
In search for olfactory receptor molecules, we carried out comprehensive electrophoretic mapping of membrane proteins in the cilia of frog olfactory epithelium. Seven polypeptides, extracted from isolated cilia by nonionic detergent, were unique to the sensory organelles, compared to nonsensory (respiratory) counterparts. Olfactory cilia contained 3-10 times more membrane-associated protein as compared to respiratory cilia, in agreement with reported densities of freeze-fracture intramembranous particles. Four of the olfactory polypeptides were major constituents of the ciliary membrane, each amounting to >10% of its total protein. Three major and one minor specific polypeptide were glycosylated, whereas membranes of nonsensory cilia were practically devoid of glycoproteins. A clear difference in surface composition was also shown by microscopic visualization of fluoresceinated lectin bound to intact isolated cilia. Two of the olfactory glycoproteins displayed pronounced heterogeneity of apparent molecular weight, which could partly be due to protein sequence diversity, as expected for odorant receptor molecules. The recently described inhibition of odorant-evoked sensory potentials by the lectin concanavalin A is consistent with the hypothesis that one or more of the specific glycoproteins described here plays a role in olfactory reception.

1980

N.m.r. investigation of hapten binding to the myeloma protein M460
Morris A. T., Lancet D., Pecht I., Givol D. & Dwek R. A. (1980) International Journal of Biological Macromolecules. 2, 1, p. 39-44 Abstract
The binding of the haptens DnpOH, Dnp-lysine and Dnp-aspartate to the mouse myeloma IgA protein was studied using ¹H 270 MHz nuclear magnetic resonance spectroscopy. The n.m.r. difference spectra showed fewer resonances perturbed than expected. This is explained in terms of chemical exchange between the T and R states of the protein as described by the kinetic scheme of Lancet and Pecht (Lancet, D. and Pecht, I. Proc. Natl Acad. Sci. USA 1976, 73, 3549-53). Large upfield chemical shifts were observed on the resonances of the hapten DnpOH on binding to M460. These are interpreted as indicating an aromatic environment for the Dnp ring. In contrast, the Dnp-aspartate resonances were not shifted at all, as would be expected from the observed rate constants using the kinetic scheme. The shifts observed on the hapten Dnp-lysine were much smaller than those observed for DnpOH. A range of possible values of the shifts was calculated for the T and R states, for Dnp-lysine and DnpOH. For both haptens the combining site environment differed between the T and R conformational states of M460, suggesting that the conformational change involves the combining site.
Structural studies of the membrane-associated products of the human major histocompatibility complex.
Orr H., Fuks A., Kaufman J., Lancet D., Ploegh H., Robb R., Strominger J. & Parham P. (1980) Advances in pathobiology. 7, p. 318-330 Abstract

1979

Effect of azathioprine on the affinity of antibodies against acetylcholine receptor: Analysis with purified antibodies
Schwartz M., Lancet D., TARRABHAZDAI R. & Fuchs S. (1979) Molecular Immunology. 16, 7, p. 483-487 Abstract
Rabbit anti-acetylcholine receptor (AChR) antibodies were purified on an AChR-toxin-Sepharose immunoadsorbent. The immunoadsorbent was prepared by first attaching toxin covalently to Sepharose and then reacting the toxin-Sepharose with AChR. Purified anti-AChR antibodies were utilized for studying the association between the immunosuppressive effect of azathioprine (Az) on AChR-induced experimental autoimmune myasthenia gravis and on the affinity of the antibodies. Mathematical analysis of the data obtained from binding experiments of ¹²⁵I-AChR to the purified antibodies suggests that treatment with Az decreases the amount of anti-AChR antibodies possessing high affinity values.
Folding Pathways of Immunoglobulin Domains. The Folding Kinetics of the Cγ3 Domain of Human IgG1
Pecht I., Isenman D. E. & Lancet D. (1979) Biochemistry. 18, 15, p. 3327-3336 Abstract
The in vitro folding kinetics of a fragment corresponding to an intact dimer of the Cγ3 domain of human IgG1 (pFc) were monitored via the large changes in tryptophan fluorescence which accompany these processes. In going from the guanidine hydrochloride (Gdn-HCl) induced unfolded state (4.0 M Gdn-HCl) to the native state (0.5 M Gdn-HCl), three well-separated first-order processes were observed having time constants of 5, 50, and 350 s and roughly equal amplitudes. These values were concentration independent, a fact consistent with there being no fluorescence change accompanying dimerization. These time constants are one to two orders of magnitude slower than those observed for proteins of similar size such as ribonuclease or cytochrome c, most probably reflecting the complex processes involved in forming the correct β-sheet arrangement of immunoglobulin domains. The corresponding unfolding transition is biphasic having time constant values of 50 and 500 s, the latter comprising 80% of the fluorescence change. These data indicate the presence of at least one species with intermediate fluorescence along the unfolding pathway. Gdn-HCl concentration jumps were also performed over various intervals within the transition zone. The results are not consistent with a fully reversible mechanism. In the absence of the intrachain disulfide bond, pFc exists in an unfolded state even at 0.5 M Gdn-HCl. In a concomitant refolding and reoxidation experiment (at 0.5 M Gdn-HCl and using an optimal disulfide interchange catalytic system), the time constant for disulfide formation was in the range of 80-200 s and the fluorescence change revealed a lag phase analyzable in terms of rate-limiting reoxidation and refolding times consistent with those observed for the initially disulfide bonded species. Under similar conditions but at 4 M Gdn-HCl, reoxidation was more than two orders of magnitude slower, suggesting that reoxidation is directed by a refolding nucleation event.

1978

Interactions between staphylococcal protein A and immunoglobulin domains
Lancet D., ISENMAN D., SJODAHL J., SJOQUIST J. & Pecht I. (1978) Biochemical and Biophysical Research Communications. 85, 2, p. 608-614 Abstract
The affinity and stoichiometry of interaction between staphylococcal protein A and different domains of immunoglobulins have been studied. Light scattering and tryptophan fluorescence quenching titrations along with direct binding assays were performed. The lack of binding to protein A of pFc fragment (corresponding to C_H3 domain of IgG) or of Facb derivative of rabbit IgG (which is devoid of the C_H3) suggests that the locus of protein A binding is at the interface between the C_H2 and C_H3 domains. This assignment is also supported by results of the tryptophan fluorescence quenching and C1 binding experiments.
Affinity and avidity of antibodies to the random polymer (T,G)-A-L and a related ordered synthetic polypeptide
Schwartz M., Lancet D., Mozes E. & Sela M. (1978) Immunochemistry. 15, 7, p. 477-481 Abstract
The affinity values of antibodies to the random synthetic antigen poly-(Tyr,Glu)-poly(dl Ala)-poly(Lys) [known as (T,G)-A-L] and to the ordered (Tyr-Tyr-Glu-Glu)-poly(dlAla)-poly(Lys) [known as (T-T-G-G)-A-L] were measured. Determinations of the association constants were performed by antigen binding capacity assay, ABC, using the whole antigen, and by equilibrium dialysis, using a radioactive conjugate of the ordered peptide T-T-G-G which was found to represent the major determinant of the random (T,G)-A-L. The affinity values of antibodies elicited by high responder mice to (T,G)-A-L and to (T-T-G-G)-A-L were found to be similar. However, a difference of two orders of magnitude was found between the values obtained by the two methods. This difference is partially explained by probability analysis. Upon immunizing low responder (H-2^k) mice with complexes of the antigens with methylated bovine serum albumin (MBSA), an increase in their antibody response was observed. The affinity values of these antibodies were found to be similar to those of high responder antibodies after immunization with either the antigens complexed to MBSA or with the antigens alone. On the basis of these data we conclude that low responder (H-2^k) B cells have the potential to produce antibodies of the same specificity and quality as the high responders to the random (T,G)-A-L and to the ordered (T-T-G-G)-A-L, and that the genetic defect in these mice should be located at the level of the B-cell acceptor for the T-cell signal.
Allostery in an immunoglobulin light-chain dimer: a chemical relaxation study
Lancet D., Licht A. & Pecht I. (1978) Biophysical Journal. 24, 1, p. 247-249 Abstract
Keywords: Biophysics
Hapten-linked conformational equilibria in immunoglobulins XRPC-24 and J-539 observed by chemical relaxation
VUKPAVLOVIC S., BLATT Y., GLAUDEMANS C., Lancet D. & Pecht I. (1978) Biophysical Journal. 24, 1, p. 161-174 Abstract
The interaction of oligogalactan haptens with the murine myeloma proteins XRPC-24 and J-539 has been investigated by the fluorescence temperature-jump method. The relaxation spectrum is composed of two processes, the faster representing hapten association and the slower a protein isomerization. In both cases the concentration dependence of relaxation times and amplitudes was consistent with the general mechanism formulated by Lancet and Pecht (1976, Proc. Natl. Acad. Sci. U.S.A. 73:3549), in which the equilibrium between two conformations of the protein is shifted by hapten binding. The intact proteins and their Fab fragment had identical kinetic behavior, indicating that the conformational changes are located in the Fab region. Temperature dependence analysis for protein J-539 permitted the calculation of activation parameters and led to a consistent energy profile for all the elementary steps. The conformational states are separated by large activation barriers, but have similar free energies. The results suggest that hapten-induced conformational changes in immunoglobulins are more general phenomena than was previously thought.

1977

Spectroscopic and Immunochemical Studies with Nitrobenzoxadiazolealanine, a Fluorescent Dinitrophenyl Analogue
Lancet D. & Pecht I. (1977) Biochemistry. 16, 23, p. 5150-5157 Abstract
The fluorescent nitro compound 4-(α-N-L-alanine)-7-nitrobenz-2-oxa-1,3-diazole (NBDA) is a structural and functional analogue of the 2,4-dinitrophenyl group (DNP). It binds to all induced anti-DNP antibodies examined and to several monoclonal immunoglobulins with nitroaromatic specificity. The fluorescence of NBDA is quenched upon binding to these proteins. Similar quenching of NBDA fluorescence is observed in the presence of aromatic amino acid analogues, and also upon binding to serum albumin and apomyoglobin. NBDA does not bind to immunoglobulins of unrelated specificity or to bovine trypsinogen. The absorption and fluorescence characteristics of NBDA in different solvents reveal large changes which correlate with medium polarity. A few important exceptions, however, exist, suggesting that NBDA is not a simple polarity probe and that its spectral properties are sensitive to specific binding interactions. The observed spectral parameters of NBDA when bound to immunoglobulins clearly indicate that binding does not occur through hydrophobic interactions only and suggest the formation of specific interactions such as a charge-transfer complex and hydrogen bonds. The IgA myeloma protein 460 binds NBDA with an association constant of 3.2 × 10⁵ M⁻¹ (at 25 °C). The bound hapten undergoes full quenching of its fluorescence and marked changes in its absorption spectrum. A large induced circular dichroism in the bound hapten's absorption is also observed. NBDA is the first environment-sensitive fluorescent probe reported to bind specifically to a homogeneous immunoglobulin. It may also be used to detect and characterize antinitroaromatic antibodies, even in crude preparations, and possibly on cell surfaces.
Hapten-induced allosteric transition in the light chain dimer of an immunoglobulin
Lancet D., Licht A., Schechter I. & Pecht I. (1977) Nature. 269, 5631, p. 827-829 Abstract
The dimer of immunoglobulin light chains (L₂) was shown by X-ray crystallography to be structurally homologous to the native immunoglobulin Fab fragment (HL) (refs 1-3). This led to the suggestion that L₂ is a model for a primitive antibody¹ and for the T-cell receptor^22,23. In L₂ 315 (from the mouse myeloma protein MOPC 315) functional homology was also implicated in that both L₂ and HL bind the same nitroaromatic haptens with similar fine specificity^4,5. We have considered the possibility that this homology extends also to hapten-induced conformational changes and that studies with L₂ will yield information on the nature of such processes in immunoglobulins. L₂ 315 constitutes a particularly good model system for such studies as it binds two hapten molecules per dimer at symmetry-related sites^4,5. It is thus possible to monitor the conformational effects of the first binding hapten through the mode of interaction with the second, as revealed by the shape of the saturation curve⁶. We report here that dinitrophenyl-lysine (DNPL) and its fluorescent analogue nitrobenzoxadiazole-alanine (NBDA) (ref. 7) both display sigmoidal binding curves, implying positive cooperativity in their interaction with L₂ 315. This observed cooperativity is shown to arise from a hapten-induced conformational transition which may be described by the allosteric model⁶. We propose that this allosteric transition involves changes in the relative position of the chains in L₂, lending support to suggestions^8,9 that a similar process occurs in the structurally homologous intact immunoglobulin. These findings are, therefore, relevant to the question of antigen-triggered effector functions of antibodies through an allosteric mechanism^8,10-12.
Thermodynamic and spectroscopic comparison of the binding sites of the mouse myeloma protein 315 and of its light chain dimer
Licht A., Lancet D., Schechter I. & Pecht I. (1977) FEBS Letters. 78, 2, p. 211-215 Abstract
Kinetics of antibody-hapten interactions.
Pecht I. & Lancet D. (1977) Chemical Relaxation in Molecular Biology. Pecht I. & Rigler R.(eds.). 1 ed. Vol. 24. p. 306-338 Abstract
The immune system, which is the major defence complex present in vertebrates against foreign cells and pathogens, is comprised of lymphocytes and their products, the antibodies. The latter carry out the function of recognition of antigenic determinants and the resultant triggering of a wide range of biological responses. Antibodies are all immunoglobulins, a group of multichain proteins whose similar gross structure may be expressed as (HL)₂ₙ, where H and L are the heavy and light polypeptide chains, respectively, linked together by noncovalent and disulfide bonds. For the IgG class n = 1, and for the IgM class n = 5. The light chains are 22,500 daltons, whereas the heavy chains vary according to class between 53,000 daltons for IgG and 75,000 daltons for IgE (EDELMAN and GALL, 1969). A typical property of antibodies, even from one individual animal and having the same specificity, is the pronounced heterogeneity found in their primary structure. This heterogeneity is confined, however, to the first 110 N-terminal residues of the two chains (variable region) and expresses the diversity of the antigen combining site. Certain stretches in the variable region were found to exhibit higher variability, and these hypervariable residues were proposed to form the main contact areas of the binding site (KABAT and WU, 1971). X-ray crystallography has verified this hypothesis (DAVIES et al., 1976; POLJAK, 1975).

1976

Oxidative titrations of Rhus vernicifera laccase and its specific interaction with hydrogen peroxide
Farver O., Goldberg M., Lancet D. & Pecht I. (1976) Biochemical and Biophysical Research Communications. 73, 2, p. 494-500 Abstract
The reaction of oxidized Rhus vernicifera laccase and H₂O₂ leads specifically to the formation of a stable, high affinity complex. It is characterized by an absorption band at 325 nm and is most probably formed with the type 3 site. Oxidative titrations of laccase show a different pathway from the reductive ones. This is also expressed in different Nernst coefficients observed for each half of the redox cycle (2 for reduction, 1 for oxidation). Oxidation of the type 3 site by H₂O₂ proceeds in a bimolecular reaction, whereas type 1 is oxidized in an indirect pathway.
Kinetic evidence for hapten-induced conformational transition in immunoglobulin MOPC 460
Lancet D. & Pecht I. (1976) Proceedings of the National Academy of Sciences of the United States of America. 73, 10, p. 3549-3553 Abstract
The kinetics of hapten binding to the homogeneous immunoglobulin A secreted by the murine plasmacytoma MOPC 460 was investigated by the chemical relaxation method. Two distinct relaxation times were observed in the binding equilibrium with three different haptens. A detailed concentration dependence analysis of relaxation times and amplitudes was performed with the hapten ε-N-(2,4-dinitrophenyl)-lysine (Dnp-Lys). The results support a mechanism in which two interconvertible conformational states of the protein bind the hapten with different association constants. Hapten binding shifts the equilibrium towards the better binding state. These observations constitute kinetic evidence for a conformational transition induced in the immunoglobulin by ligand binding to its antigen-binding site, and are in line with the allosteric hypothesis for the initiation of physiological functions by antigen-antibody association.