Datasets for Predicting TF Binding Using Virtual ChIP-seq

This repository contains datasets necessary for using the Virtual ChIP-seq software, such as matrices of correlation between transcription factor (TF) binding and gene expression, FIMO PWM scores for JASPAR motifs, and ChIP-seq data from the ENCODE and Cistrome databases.

Description: Virtual ChIP-seq requires the following datasets to predict transcription factor binding:

- chipExpDir_AtoH_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters A-H.
- chipExpDir_ItoZ_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters I-Z.
- refTables_V1.1.0.tar.gz: PhastCons genomic conservation, FIMO PWM scores for JASPAR motifs, and ChIP-seq data from the ENCODE and Cistrome databases.
- hg38_chrsize.tsv: Chromosome lengths in hg38.
- trainedModels_V1.0.0.tar.gz: Virtual ChIP-seq scikit-learn trained models saved in joblib format.
- <CellType>.tar.gz: Pre-calculated matrices suitable for training with other algorithms or re-training with Virtual ChIP-seq.

Some predictive features of TF binding are the same in every cell type and are stored together for simplicity in refTables_V1.0.0.tar.gz. You can use datasets from other cell types (named here as <CellType>.tar.gz) to re-train the model. The <CellType>.tar.gz files contain pre-calculated predictive features of transcription factor binding for 4 chromosomes (5, 10, 15, and 20). These features include:

- PhastCons genomic conservation
- FIMO scores for sequence motifs of TFs in the JASPAR database
- Chromatin accessibility
- TF binding in ENCODE + Cistrome DB datasets
- Virtual ChIP-seq expression score
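
As a minimal sketch of how the downloaded files might be prepared, the Python snippet below unpacks a .tar.gz archive and reads a chromosome-size table using only the standard library. The helper names (extract_archive, read_chrom_sizes), the destination directory, and the assumed layout of hg38_chrsize.tsv (two tab-separated columns: chromosome name, then length) are illustrative assumptions, not part of the Virtual ChIP-seq software; the internal layout of the archives is not described here, so the code only lists what it extracts.

```python
import csv
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter when this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def read_chrom_sizes(tsv_path):
    """Read chromosome lengths into a dict.

    Assumption: each row is "<chromosome>\t<length>". Rows whose second
    column is not an integer (for example a header line) are skipped.
    """
    sizes = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or not row[1].strip().isdigit():
                continue
            sizes[row[0]] = int(row[1])
    return sizes


if __name__ == "__main__":
    # File names are taken from the description; they must already be
    # downloaded into the working directory. "virchip_data" is hypothetical.
    for name in ("refTables_V1.1.0.tar.gz", "trainedModels_V1.0.0.tar.gz"):
        if Path(name).exists():
            members = extract_archive(name, "virchip_data")
            print(f"{name}: extracted {len(members)} entries")
    if Path("hg38_chrsize.tsv").exists():
        sizes = read_chrom_sizes("hg38_chrsize.tsv")
        print(f"Loaded lengths for {len(sizes)} chromosomes")
```

The trained models are stated to be in joblib format, so after extraction they could presumably be opened with joblib.load in an environment whose scikit-learn version is compatible with the one used for training; the file names inside the archive are not given here.
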
Authors:
Mehran Karimzadeh & Michael M. Hoffman
Lab: Hoffman
Year: 2018
Keywords: VirChIP, Virtual ChIP-seq, PhastCons, FIMO, ENCODE, CistromeDB, JASPAR

Citation

Citation not available.

Samples

Sample Type: N/A
Species: Human
Datatype: N/A
Technology: Various

Contact

This dataset is public.

Contact: Michael Hoffman
