
  • Date: Thursday, 17 January 2019

    Special Guest Seminar with Ariel Schwartz

    Time: 10:00
    Title: “Deep Semantic Genome and Protein Representation for Annotation, Discovery, and Engineering”
    Location: Arthur and Rochelle Belfer Building for Biomedical Research, Botnar Auditorium
    Lecturer: Dr. Ariel Schwartz, Co-founder and Chief Technology Officer at Denovium Inc.
    Organizer: Department of Molecular Genetics
    Guest Lecture
    Abstract
    Computational assignment of function to proteins with no known homologs remains an unsolved problem. We have created a novel, function-based approach to protein annotation and discovery called D-SPACE (Deep Semantic Protein Annotation Classification and Exploration), which consists of a multi-task, multi-label deep neural network trained on over 70 million proteins. Unlike homology- and motif-based methods, D-SPACE encodes proteins as high-dimensional representations (embeddings), allowing accurate assignment of over 180,000 labels across 13 distinct tasks. The embeddings enable fast searches for functionally related proteins, including homologs undetectable by traditional approaches. D-SPACE annotates all 109 million proteins in UniProt in under 35 hours on a single computer and searches all of them in seconds. D-SPACE also quantifies the relative functional effect of mutations, enabling rapid in silico mutagenesis for protein engineering applications. It brings protein annotation, search, and other exploratory tasks together in a single cohesive model. We have recently extended this work from protein to DNA, enabling assignment of function to whole genomes and metagenomic contigs in seconds. Conserved genomic motifs, as well as the functional impact of mutations in both coding and non-coding regions, can be predicted directly from raw DNA sequence, without traditional comparative genomics approaches to motif detection such as multiple sequence alignments, position-specific scoring matrices (PSSMs), and profile hidden Markov models (HMMs).
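The abstract describes two uses of protein embeddings: searching for functionally related proteins and quantifying the effect of mutations. The following is a minimal, hypothetical sketch of both ideas, not D-SPACE itself. The `embed` function is an assumed stand-in (a normalized amino-acid 2-mer composition vector) for the trained deep network the talk describes; the names `embed`, `build_index`, `search`, and `mutation_scan` are illustrative and are not D-SPACE's API; and scoring a mutation as one minus the cosine similarity between wild-type and mutant embeddings is an assumption made for illustration only.

```python
import itertools
import math
from collections import Counter

# Hypothetical stand-in for a learned embedding: D-SPACE uses a trained deep
# network, whereas this sketch uses normalized amino-acid 2-mer composition.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
KMERS = ["".join(pair) for pair in itertools.product(AMINO_ACIDS, repeat=2)]


def embed(seq: str) -> list[float]:
    """Map a protein sequence to a unit-length 400-dimensional 2-mer vector."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    vec = [float(counts[k]) for k in KMERS]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def build_index(database: dict[str, str]) -> dict[str, list[float]]:
    """Embed every sequence once so that later searches only compare vectors."""
    return {name: embed(seq) for name, seq in database.items()}


def search(query: str, index: dict[str, list[float]], top_k: int = 3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    q = embed(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


def mutation_scan(seq: str) -> dict[str, float]:
    """Score each single-residue substitution by how far it moves the embedding.

    Assumption: effect = 1 - cosine(wild type, mutant); larger means a bigger shift.
    """
    wild_type = embed(seq)
    effects = {}
    for pos, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == original:
                continue
            mutant = seq[:pos] + aa + seq[pos + 1:]
            effects[f"{original}{pos + 1}{aa}"] = 1.0 - cosine(wild_type, embed(mutant))
    return effects


if __name__ == "__main__":
    index = build_index({"a": "MKTAYIAKQR", "b": "MKTAYIAKQL", "c": "GGGGGGGGGG"})
    print(search("MKTAYIAKQR", index))
    effects = mutation_scan("MKTAYIAKQR")
    print(sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:5])
```

Running the script ranks the exact query ("a") first and the single-residue variant ("b") second. Precomputing the index illustrates, in miniature, why embedding-based search can be fast: each query costs one embedding plus vector comparisons. A system at UniProt scale would plausibly replace this linear scan with an approximate nearest-neighbour index.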