CHuM: Cis-acting human mutation

Reference:

Montgomery S.B., Griffith O.L., Schuetz J.M., Brooks-Wilson A., Jones S.J.M. "A Computational Discrimination Strategy For Regulatory Polymorphisms In The Upstream Non-Coding Regions of Homo sapiens" (Submitted).

Supplemental Information:

Download Everything:

Download all data and scripts (warning: 140 MB in total; 132 MB of data and 8 MB of scripts)

Analysis pipeline:

There are 17 steps in the pipeline for generating and analyzing the data in this study. To run the pipeline, you will need the CHuM modules and scripts installed.

1. rSNP to Transcript Mapping: Using the ORegAnno rSNPs, get the ENST ids from EnsEMBL
2. rSNP to dbSNP Mapping: Using the ORegAnno rSNPs, get the dbSNP ids
3. rSNP to GFF: Turn the ORegAnno rSNPs into GFF data files
4. rSNP groups to process: Build a group file for the ufSNPs and rSNPs to process (grouped by ENST id)
5. Groups to GFF: Make a GFF file from the group file
6. Generate stacks: Get orthologues from EnsEMBL using BLASTZ_NET
7. Reciprocal blast stacks: Ensure orthologues are reciprocal best BLAST hits
8. Extended orthologue building (optional): Use the THOR package to find orthologous sequences from incomplete genomes (trace archives)
9. Make RSNPA XML file: Generate the XML processing file from sequences and ids
10. Run all property analyses for each SNP
11. Summarize results: Verify which values were missed in the previous step
12. Build a table of all the results
13. SVM work: Run the SVM
14. SVM scoring: Run cross-fold analysis on the SVM
15. Visualization: Generate plots of SNPs in transcripts
16. Range testing: Generate group files based on specific ranges (such as the 152 bp range used in the study)
17. Population testing: Supplementary analyses using HapMap population data
Download the pipeline scripts
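
As a rough illustration of three of the steps listed above (rSNP to GFF, grouping by ENST id, and the reciprocal best hit check), here is a minimal Python sketch. It is not taken from the CHuM scripts, which are distributed as Perl modules. The record fields, the example ids, the "rSNP" feature label, the key=value attribute style and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical SNP records; the ids and coordinates are placeholders, not study data.
EXAMPLE_SNPS = [
    {"id": "rsnp_1", "enst": "ENST_A", "chrom": "chr1", "pos": 1000, "strand": "+"},
    {"id": "rsnp_2", "enst": "ENST_A", "chrom": "chr1", "pos": 1100, "strand": "+"},
    {"id": "rsnp_3", "enst": "ENST_B", "chrom": "chr2", "pos": 500, "strand": "-"},
]


def snp_to_gff_line(snp):
    """Format one SNP record as a 9-column, tab-separated GFF line."""
    fields = [
        snp["chrom"],     # sequence name
        "ORegAnno",       # source
        "rSNP",           # feature type (assumed label)
        str(snp["pos"]),  # start
        str(snp["pos"]),  # end (a SNP covers a single base)
        ".",              # score, unused here
        snp["strand"],    # strand
        ".",              # frame/phase, not applicable
        f"ID={snp['id']};transcript_id={snp['enst']}",  # attributes (assumed style)
    ]
    return "\t".join(fields)


def group_by_transcript(snps):
    """Group SNP ids by the transcript (ENST id) they are mapped to."""
    groups = defaultdict(list)
    for snp in snps:
        groups[snp["enst"]].append(snp["id"])
    return dict(groups)


def reciprocal_best_hits(best_forward, best_reverse):
    """Keep pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    return {a: b for a, b in best_forward.items() if best_reverse.get(b) == a}


if __name__ == "__main__":
    for snp in EXAMPLE_SNPS:
        print(snp_to_gff_line(snp))
    print(group_by_transcript(EXAMPLE_SNPS))
    # Hypothetical best-hit tables: only h1/m1 point at each other.
    print(reciprocal_best_hits({"h1": "m1", "h2": "m2"}, {"m1": "h1", "m2": "h3"}))
```

Grouping by ENST id before any per-transcript work mirrors the group file described in step 4, and the reciprocal check discards any candidate orthologue whose best hit does not point back to the original sequence.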

Perl Modules:

Data:

Statistics:

R analysis commands and input analysis files