Scientists use epigenetic marks to predict how DNA folds

Wednesday, November 1, 2017

De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture

Michele Di Pierro (a,1,2), Ryan R. Cheng (a,1), Erez Lieberman Aiden (a,b), Peter G. Wolynes (a,c,d), and José N. Onuchic (a,d)

Author Affiliations

(a) Center for Theoretical Biological Physics, Rice University, Houston, TX 77005;

(b) Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030;

(c) Department of Chemistry, Rice University, Houston, TX 77005;

(d) Department of Physics & Astronomy, Rice University, Houston, TX 77005

Contributed by José N. Onuchic, October 4, 2017 (sent for review August 24, 2017; reviewed by Jie Liang and Tamar Schlick)


In the nucleus of eukaryotic cells, the genome is organized in three dimensions in an architecture that depends on cell type. This organization is a key element of transcriptional regulation, and its disruption often leads to disease. We demonstrate that it is possible to predict how a genome will fold based on the epigenetic marks that decorate chromatin. Epigenetic marking patterns are used to predict the corresponding ensemble of 3D structures by leveraging both energy landscape theory and neural network-based machine learning. These predictions are extensively validated by the results of DNA-DNA ligation assays and fluorescence microscopy, which are found to be in exceptionally good agreement with theory.


Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which that locus resides, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even-numbered chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis that phase separation is the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible.
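To make the training scheme in the abstract more concrete, here is a minimal sketch of the odd/even chromosome split: a classifier learns to map per-locus mark signals to a chromatin type on odd-numbered chromosomes and is then evaluated on even-numbered ones. Everything in it is an assumption made for illustration. The data are synthetic, the sizes (6 marks, 3 types, 400 loci per chromosome) are arbitrary, and a plain softmax regression stands in for the neural network; this is not the MEGABASE model, and the MiChroM simulation step is not covered.

```python
import numpy as np

def make_synthetic_chromosome(rng, n_loci, type_profiles):
    """Fabricate one chromosome: a type label per locus plus noisy mark signals."""
    n_types, n_marks = type_profiles.shape
    types = rng.integers(0, n_types, size=n_loci)
    # Each type has a characteristic mean signal per mark; add unit Gaussian noise.
    signals = type_profiles[types] + rng.normal(0.0, 1.0, size=(n_loci, n_marks))
    return signals, types

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_types, lr=0.1, epochs=300):
    """Multinomial logistic regression trained by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_types))
    b = np.zeros(n_types)
    Y = np.eye(n_types)[y]  # one-hot labels
    for _ in range(epochs):
        grad = (softmax(X @ W + b) - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict_types(X, W, b):
    return np.argmax(X @ W + b, axis=1)

def stack(chroms):
    """Concatenate the loci of several chromosomes into one feature matrix and label vector."""
    X = np.vstack([signals for signals, _ in chroms])
    y = np.concatenate([types for _, types in chroms])
    return X, y

rng = np.random.default_rng(0)
N_MARKS, N_TYPES, N_LOCI = 6, 3, 400  # hypothetical sizes, not taken from the paper
type_profiles = rng.normal(0.0, 2.0, size=(N_TYPES, N_MARKS))

# Synthetic stand-ins for chromosomes 1..22.
chromosomes = {
    c: make_synthetic_chromosome(rng, N_LOCI, type_profiles) for c in range(1, 23)
}

# Train on odd-numbered chromosomes, hold out the even-numbered ones.
X_train, y_train = stack([chromosomes[c] for c in chromosomes if c % 2 == 1])
X_test, y_test = stack([chromosomes[c] for c in chromosomes if c % 2 == 0])

W, b = fit_softmax(X_train, y_train, N_TYPES)
accuracy = float(np.mean(predict_types(X_test, W, b) == y_test))
print(f"held-out accuracy on even-numbered chromosomes: {accuracy:.2f}")
```

Splitting by whole chromosome, rather than by randomly chosen loci, keeps neighboring and therefore correlated loci from appearing on both sides of the split, which is presumably why the abstract evaluates on entire held-out chromosomes.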

Keywords: epigenetics; machine learning; energy landscape theory; genomic architecture; Hi-C


(1) M.D.P. and R.R.C. contributed equally to this work.

(2) To whom correspondence may be addressed.

Author contributions: M.D.P., R.R.C., E.L.A., P.G.W., and J.N.O. designed research; M.D.P. and R.R.C. performed research; M.D.P. and R.R.C. contributed new reagents/analytic tools; M.D.P., R.R.C., E.L.A., P.G.W., and J.N.O. analyzed data; and M.D.P., R.R.C., E.L.A., P.G.W., and J.N.O. wrote the paper.

Reviewers: J.L., University of Illinois at Chicago; and T.S., New York University.

The authors declare no conflict of interest.

This article contains supporting information online.

Copyright © 2017 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).