ECE Departmental Seminar

Interpretable machine learning approaches for understanding functional genomics in the human brain

Prof. Daifeng Wang
Department of Biomedical Informatics
Stony Brook University

Friday, 3/8/19, 11:00am
Light Engineering 250

Abstract: Disorders of the brain affect nearly a fifth of the world’s population. Robust phenotype-genotype associations have been established for a number of brain disorders, including psychiatric diseases (e.g., schizophrenia, bipolar disorder). However, understanding the molecular causes of brain disorders is still a challenge. To address this, recent large scientific projects have generated comprehensive genomic datasets for the human brain -- e.g., the PsychENCODE consortium generated ~5,500 genotype, transcriptome, chromatin, and single-cell datasets from 1,866 individuals. Using these data, we have developed a set of interpretable machine learning approaches for deciphering functional genomic elements and linkages in the brain and psychiatric disorders. In particular, we have found ~79,000 brain-active enhancers and ~2.5M eQTLs comprising ~238K linkage-disequilibrium-independent SNPs. We deconvolved the bulk tissue expression across individuals using single-cell data and found that differences in the proportions of cell types explain >85% of the cross-population variation observed. Leveraging our QTLs and Hi-C datasets, we predicted a full regulatory network for the brain, linking TFs, enhancers, and target genes. For this, we used elastic-net modeling, which linearly combines the L1 and L2 regularizations, to determine TF-target causal relationships. Using the full regulatory network, we connected genes and epigenetic changes to GWAS variants for psychiatric disorders (e.g., connecting a total of 321 genes to SNPs for schizophrenia and finding new genes potentially associated with the disease). Additionally, we developed an interpretable deep-learning model embedding the physical regulatory network to predict phenotype from genotype. Our model uses a conditional Deep Boltzmann Machine architecture and introduces lateral connectivity at the visible layer to embed the biological structure learned from the regulatory network and QTL linkages. Further, we developed a rank-statistic-based interpretation scheme that allows us to functionally annotate hidden nodes and prioritize them relative to disorders. Our model improves disease prediction (6-fold compared to additive polygenic risk scores), highlights key genes for disorders, and allows imputation of missing transcriptome information from genotype data alone. We recently published this work in Science (Wang et al., Science, Dec 14, 2018, DOI: 10.1126/science.aat8464).
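
For readers unfamiliar with the elastic net mentioned in the abstract, the following is a minimal illustrative sketch, not the speaker's implementation. It fits a linear model by coordinate descent with a penalty that linearly combines the L1 and L2 terms. The function names, the parameters `alpha` and `l1_ratio`, the objective scaling (borrowed from a common convention), and the toy data (three hypothetical TFs predicting one target gene) are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the step induced by the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.2, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the assumed objective
        (1/(2n)) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    X: list of rows (samples x hypothetical TF expression values)
    y: list of hypothetical target-gene expression values
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            z = sum(c * c for c in col) / n
            denom = z + alpha * (1.0 - l1_ratio)
            new_wj = soft_threshold(rho, alpha * l1_ratio) / denom if denom else 0.0
            # Keep the residual consistent with the updated weight.
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
            w[j] = new_wj
    return w


# Toy data (hypothetical): the target depends on TF 0 and TF 2 but not TF 1.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2 * r[0] - 1 * r[2] for r in X]
weights = elastic_net(X, y)
print(weights)
```

In this toy setting the L1 part of the penalty sets the weight of the irrelevant TF to zero, while the L2 part shrinks the two remaining weights toward zero without eliminating them.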

Bio: Daifeng Wang is an assistant professor in the Department of Biomedical Informatics at Stony Brook University. He is also affiliated with the Departments of Electrical and Computer Engineering, Computer Science, and Applied Mathematics and Statistics. He previously worked at Yale University as a postdoctoral associate (2012 – 2015) and an associate research scientist (2015 – 2016). He obtained a Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin. His research focuses on developing interpretable machine learning approaches and bioinformatics tools to integrate and analyze multi-omics data for understanding functional genomics and gene regulation in human brain disorders and cancers; e.g., he is currently working on deciphering the functional genomics of deep phenotypes in human disease, aiming to discover the molecular mechanisms and genomic engineering principles for precision medicine.