CENTRE: A Bioconductor package for cell type specific enhancer-promoter prediction


Author(s): Sara Lopez Ruiz de Vargas, Trisevgeni Rapakoulia, Persia Akbari-Omgba, Verena Laupert, Martin Vingron

Affiliation(s): Max Planck Institute for Molecular Genetics



Identifying active enhancer-promoter pairs is a crucial step towards understanding gene regulation, phenotypes and diseases. Several computational methods have been developed to predict enhancer-gene interactions, but they require many epigenomic and transcriptomic experimental assays to generate cell type (CT) specific predictions. Inferring enhancer-gene interactions therefore becomes a laborious and costly task, especially when looking for CT-specific contacts. We recently introduced CENTRE, a workflow for predicting CT-specific enhancer-target interactions with minimal experimental input. CENTRE uses an XGBoost classifier to predict enhancer-target interactions in a CT-specific manner, relying solely on gene expression data and ChIP-seq data for three histone modifications from the CT of interest. Leveraging the abundance of available datasets, CENTRE combines CT-agnostic statistics with CT-specific features. To make the software more accessible, we now implement the CENTRE workflow as a Bioconductor package and introduce a novel Hi-C module that lets users incorporate Hi-C features without providing Hi-C data themselves. The CENTRE Bioconductor package comprises four primary steps: the collection of enhancer-gene pairs within 500 kb of user-specified genes, the computation of generic features, the computation of CT-specific features, and the calculation of predictions from these features. Users interested in integrating Hi-C features can use the precomputed library of Directionality Indices (DI) and Insulation Scores (IS). CENTRE has been validated extensively across diverse datasets and cell types, consistently performing comparably to or better than existing algorithms that rely on extensive experimental data. The open-source code for CENTRE is available on GitHub at https://github.com/slrvv/CENTRE.
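
As an illustration of the first step only, the following is a minimal Python sketch of collecting enhancer-gene pairs within a 500 kb window of user-specified genes. It is not the package's implementation (CENTRE itself is an R/Bioconductor package). The function name `collect_pairs`, the tuple layouts, the use of a single transcription start site (TSS) per gene, and the choice to measure distance from the enhancer midpoint are all assumptions made for this example.

```python
from typing import List, Tuple

WINDOW_BP = 500_000  # 500 kb window, as stated in the abstract

# Hypothetical record layouts, chosen only for this sketch.
Gene = Tuple[str, str, int]           # (gene_id, chromosome, tss_position)
Enhancer = Tuple[str, str, int, int]  # (enhancer_id, chromosome, start, end)


def collect_pairs(genes: List[Gene], enhancers: List[Enhancer],
                  window: int = WINDOW_BP) -> List[Tuple[str, str, int]]:
    """Return (gene_id, enhancer_id, distance) for enhancers near each gene.

    Distance is measured from the enhancer midpoint to the gene TSS,
    which is an assumption of this sketch.
    """
    pairs = []
    for gene_id, gene_chrom, tss in genes:
        for enh_id, enh_chrom, start, end in enhancers:
            if enh_chrom != gene_chrom:
                continue  # only pair elements on the same chromosome
            midpoint = (start + end) // 2
            distance = abs(midpoint - tss)
            if distance <= window:
                pairs.append((gene_id, enh_id, distance))
    return pairs


# Toy coordinates, invented for illustration.
genes = [("geneA", "chr1", 1_000_000)]
enhancers = [
    ("enh1", "chr1", 1_200_000, 1_200_400),  # about 200 kb from the TSS
    ("enh2", "chr1", 1_600_000, 1_600_400),  # about 600 kb, outside the window
    ("enh3", "chr2", 1_000_100, 1_000_300),  # different chromosome
]
example_pairs = collect_pairs(genes, enhancers)
print(example_pairs)
```

The nested loop keeps the sketch short but scales quadratically; a genome-wide implementation would more plausibly rely on interval-overlap queries, such as those provided by GenomicRanges in Bioconductor.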