Description
The tracks available in this set have been generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the contribution of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) to the International Human Epigenome Consortium (IHEC).
The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled via the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) as it becomes available. More information about the project is available at www.epigenomes.ca.
Sample metadata is available at: CEMT Samples.
Methods
The wet lab protocols used are described in protocols.
The strand-specific mRNA-seq libraries were sequenced paired-end, and a FASTQ file was generated for each of the two mates. The reads were aligned to a genome + transcriptome reference (see JAGuaR: Repositioning of RNA-seq Reads) using Burrows-Wheeler Aligner (BWA, version 0.5.7) and SAMtools (version 0.1.13). The resulting BAM files were repositioned to GRCh37-lite using JAGuaR (version 2.0.2). The BAM files were annotated using in-house tools (including flagging of chastity-failed reads), and duplicates were marked using Picard Tools' MarkDuplicates.jar (version 1.71).
Using in-house tools, the BAM file was split by the strand of the originally sequenced cDNA fragment, and gzipped wig files were generated. The SAMtools flags "-F 516 -q 0" were used, and GRCh37-lite chromosome names were changed to UCSC chromosome names. An in-house RNA QC and analysis pipeline was used to generate a report containing a normalization constant for computing RPKM values. The constant was inferred from the total number of exonic reads (excluding mitochondrial reads, reads from ribosomal genes, and reads from the top 0.5% most highly expressed exons). The signal values in the wig files were scaled by this constant, and the scaled wig files were converted to bigWig files using UCSC tools.
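The "-F 516 -q 0" filter can be read as a bitmask. In the SAM flag convention, 516 = 4 + 512, that is, 0x4 (read unmapped) plus 0x200 (read failed platform/vendor quality checks). The 0x200 bit is presumably how the chastity-failed reads are excluded, although that is an inference rather than something stated here. A threshold of "-q 0" drops no reads on mapping quality, and the duplicate bit (0x400) is not part of the mask, so marked duplicates are not excluded by this filter. The sketch below illustrates the logic for a single read; the helper `passes_filter` is hypothetical and not part of the in-house tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the CEMT pipeline): mimics the effect of
# "-F 516 -q 0" on a single read, given its FLAG and MAPQ values.
EXCLUDE_MASK=516   # 4 (unmapped) + 512 (failed vendor QC)
MIN_MAPQ=0

passes_filter() {
    local flag=$1 mapq=$2
    # -F drops a read if ANY bit of the mask is set in its FLAG.
    if (( (flag & EXCLUDE_MASK) != 0 )); then
        return 1
    fi
    # -q drops a read whose MAPQ is below the threshold.
    (( mapq >= MIN_MAPQ ))
}

# 99   = properly paired first mate           -> kept
# 4    = unmapped                             -> skipped
# 611  = 99 + 512 (failed QC)                 -> skipped
# 1123 = 99 + 1024 (duplicate, not in mask)   -> kept
for flag in 99 4 611 1123; do
    if passes_filter "$flag" 0; then
        echo "FLAG $flag: kept"
    else
        echo "FLAG $flag: skipped"
    fi
done
```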
The command lines used were:
commands |
---|
$BWA_PATH/bwa aln -t 16 $GENOME_PLUS_JUNCTION_75_BP_REFERENCE $MATE_1_FASTQ > $MATE_1_SAI |
$BWA_PATH/bwa aln -t 16 $GENOME_PLUS_JUNCTION_75_BP_REFERENCE $MATE_2_FASTQ > $MATE_2_SAI |
$BWA_PATH/bwa sampe -s $GENOME_PLUS_JUNCTION_75_BP_REFERENCE $MATE_1_SAI $MATE_2_SAI $MATE_1_FASTQ $MATE_2_FASTQ | $SAMTOOLS_PATH/samtools view -bt $GENOME_PLUS_JUNCTION_75_BP_REFERENCE_INDEX - | $SAMTOOLS_PATH/samtools sort -n - $SORTED_PREFIX |
$PYTHON_PATH/python $JAGUAR_PATH/JAGuaR_v2.0.2/convertJunctions.py -f $BAM -i $GENOME_JUNCTION_75_BP_REFERENCE_INDEX_LOCATION -o $OUTDIR -m 37013 --samtools $SAMTOOLS_PATH |
$SAMTOOLS_PATH/samtools view -bS $JAGUAR_OUTPUT_NAME_SORTED_SAM > $NAME_SORTED_BAM |
$SAMTOOLS_PATH/samtools sort $NAME_SORTED_BAM $POSITION_SORTED_PREFIX |
$PICARD_PATH/MarkDuplicates.jar VALIDATION_STRINGENCY=SILENT I=$POSITION_SORTED_BAM O=$DUPS_FLAGGED_BAM M=$METRICS TMP_DIR=$TEMP ASSUME_SORTED=true QUIET=true CREATE_INDEX=false |
export BAM2WIG_OPTS="-s -F 516 -q 0 -n $OUTPUT_PREFIX -chr $CHR_BWA2UCSC_NAMES" |
java -jar -Xmx10G $BAM2WIG_PATH/BAM2WIG.jar -bamFile $INFILE $BAM2WIG_OPTS -out $OUTPUT_DIRECTORY; |
gunzip --stdout $SIGNAL_WIG | awk '{if($0~"^[-0-9]+$") printf $0 * $NORM "\n" ; else printf $0 "\n"}' | gzip > $RPKM_SIGNAL_WIG; |
$WIG2BIGWIG_PATH/wigToBigWig -clip $RPKM_SIGNAL_WIG $WIG2BIGWIG_PATH/hg19.chrom.sizes $RPKM_SIGNAL_BW |
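The scaling step in the commands above multiplies every data line of the wig file by the normalization constant and passes declaration lines through unchanged. A minimal sketch of the same idea follows, assuming (as the regular expression in the original command suggests) that each data line holds a single integer value. The function name `scale_wig`, the sample input, and the constant 0.25 are made up for illustration. The constant is passed with `awk -v`, which avoids depending on how `$NORM` is substituted into a single-quoted awk program.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale the integer data lines of a wig stream by a
# constant, leaving declaration lines untouched.
scale_wig() {
    local norm=$1
    awk -v norm="$norm" '
        /^[-0-9]+$/ { print $0 * norm; next }   # single integer value: scale it
        { print }                               # declaration lines: pass through
    '
}

# Made-up input and constant, for illustration only.
printf 'fixedStep chrom=chr1 start=1 step=1\n4\n0\n10\n' | scale_wig 0.25

# Possible use on a gzipped wig file (variable names as in the table above):
#   gunzip --stdout "$SIGNAL_WIG" | scale_wig "$NORM" | gzip > "$RPKM_SIGNAL_WIG"
```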
Display Scheme
The various assays have been color coded as follows:
Mark | Colour |
---|---|
RNA Seq |
Where possible, information about the tracks and data is included in the title. However, because the length of these fields is limited, not all details can always be included. The library information for each track is always given as the first field of the colon-delimited title string. Please refer to the metadata table included on this page to look up further details by library.
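Because the library is always the first colon-delimited field, it can be pulled out of a title with plain shell parameter expansion before looking it up in the metadata table. The helper name `library_from_title` and the example title are hypothetical; real titles may carry different fields after the first colon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the first colon-delimited field of a track title.
library_from_title() {
    local title=$1
    # Strip the longest suffix starting at a colon, keeping the first field.
    printf '%s\n' "${title%%:*}"
}

# The title below is invented for illustration only.
library_from_title "A00001:RNA-Seq:example tissue"
```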
Note
When analyzing data from different sources, please note that underlying data processing and handling procedures may be different.
Contacts
Please direct any questions to: edcc@bcgsc.ca