BG, BMMB, MCIBS
My research aims to understand where transcription factors (TFs) bind in
the genome, and what they do once they get there. Many forces can affect
a TF's choice of binding targets once it is introduced into the nucleus.
The inherent DNA-binding preference of the protein will
specify the sites that could potentially be bound, but the vast majority
of high-affinity sequences will not in fact be occupied by the TF in
any given cell type. Binding selectivity is thus determined by the
regulatory environment of the cell: chromatin accessibility,
interactions with co-factors, DNA methylation, and histone
post-translational modifications all play roles in specifying the TF's
binding sites. These forces are context-specific, which allows the same
TF to target different binding sites in different cell types. However, a
TF's choice of binding targets is only part of the equation; many bound
sites do not seem to directly affect gene expression, and we still
understand little about how enhancers regulate genes that are thousands,
sometimes millions, of bases away in the genome.
Fortunately, high-throughput sequencing assays are giving us
unprecedented insight into the regulatory environment of the cell.
ChIP-seq and ChIP-exo allow us to profile TF and histone modification
occupancy at high resolution over the entire genome. RNA-seq lets us
profile global transcriptional activity. DNase-seq profiles the
genome-wide accessibility landscape, while new assays such as ChIA-PET
and Hi-C are opening a window on the three-dimensional architecture of
the nucleus. The challenge will be integrating these voluminous data
types into a cohesive understanding of cellular activity.
I believe that integrative machine-learning approaches that model the
biological and experimental processes that generate such data will help
us to understand the context-specific activity of transcription factors.