evidence module

class mavis.validate.evidence.GenomeEvidence(*pos, **kwargs)

Bases: mavis.validate.base.Evidence

compute_fragment_size(read, mate=None)
generate_window(breakpoint)

Given an input breakpoint, uses the current evidence settings to determine an appropriate window (range) in which to search for supporting reads.

Parameters:
  • breakpoint (Breakpoint) – the breakpoint we are generating the evidence window for
  • read_length (int) – the read length
  • call_error (int) – adds a buffer to the calculations; can be increased if confidence in the breakpoint calls is low
Returns:

the range from which reads should be fetched from the BAM file when looking for evidence for this event

Return type:

Interval
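
The padding idea behind a window calculation of this kind can be illustrated with a minimal sketch. This is not the library's implementation: the `Interval` tuple, the `sketch_window` function, and the `max_fragment_size` argument are hypothetical names introduced here, and the padding rule is an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for an inclusive genomic interval."""
    start: int
    end: int


def sketch_window(bp_start: int, bp_end: int, read_length: int,
                  max_fragment_size: int, call_error: int = 10) -> Interval:
    """Pad an inclusive breakpoint interval on both sides.

    Assumption: a supporting read pair can begin up to one maximum
    fragment length away from the breakpoint, and call_error widens
    the window further when the call is uncertain.
    """
    pad = max(read_length, max_fragment_size) + call_error
    # genomic coordinates are 1-based, so clamp the left edge at 1
    return Interval(max(1, bp_start - pad), bp_end + pad)


window = sketch_window(1000, 1010, read_length=100,
                       max_fragment_size=500, call_error=10)
print(window)  # Interval(start=490, end=1520)
```

Raising `call_error` in this sketch simply widens the window symmetrically, which matches the role the parameter description gives it.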

class mavis.validate.evidence.TranscriptomeEvidence(annotations, *pos, **kwargs)

Bases: mavis.validate.base.Evidence

compute_fragment_size(read, mate)
distance(start, end, strand='?', chrom=None)

Given the current list of transcripts, computes the putative exonic/intergenic distance between two genomic positions. Intronic positions are ignored.

Intergenic calculations are only done if the exonic-only calculation fails.
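
As an illustration of an exonic distance, the sketch below counts only exonic bases between two positions on a single transcript. The function name and the exon representation (sorted, inclusive `(start, end)` tuples) are assumptions made for this example, not the library's data model. Returning `None` marks the case where the exonic-only calculation fails and a caller would need a fallback.

```python
def exonic_distance(start, end, exons):
    """Count exonic steps between two positions on one transcript.

    exons: sorted, non-overlapping, inclusive (start, end) tuples
    (a hypothetical simplification of the annotation objects).
    Returns None when either position is not inside an exon.
    """
    if start > end:
        start, end = end, start

    def in_exon(pos):
        return any(s <= pos <= e for s, e in exons)

    if not (in_exon(start) and in_exon(end)):
        return None
    # exonic bases inside [start, end]; subtract one so that two
    # adjacent bases are a distance of 1 apart
    covered = sum(max(0, min(e, end) - max(s, start) + 1) for s, e in exons)
    return covered - 1


exons = [(100, 199), (300, 399)]
print(exonic_distance(150, 350, exons))  # 100 (the 100-base intron is skipped)
print(exonic_distance(150, 250, exons))  # None (250 is intronic)
```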

exon_boundary_shift_cigar(read)

Given an input read, converts deletions (D) in its CIGAR to skipped regions (N) when the deletion matches the exon boundaries. Also shifts alignments to correspond to the exon boundaries where possible.
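
The deletion-to-N conversion can be sketched as follows. This covers only the relabelling step, not the alignment shifting, and it is not the library's code: the function name, the plain `(operation, length)` tuple representation, and the half-open 0-based intron intervals are assumptions chosen for the example. The numeric operation codes follow the SAM specification.

```python
# CIGAR operation codes as defined in the SAM specification
M, I, D, N, S, H, P, EQ, X = range(9)
REF_CONSUMING = {M, D, N, EQ, X}


def deletions_to_introns(ref_start, cigar, introns):
    """Relabel deletions that coincide exactly with a known intron.

    ref_start: 0-based reference position of the first aligned base
    cigar: list of (operation, length) tuples
    introns: set of half-open 0-based (start, end) intervals
             (hypothetical input format)
    """
    pos = ref_start
    result = []
    for op, length in cigar:
        if op == D and (pos, pos + length) in introns:
            op = N  # the gap is a splice junction, not a true deletion
        result.append((op, length))
        if op in REF_CONSUMING:
            pos += length
    return result


cigar = [(M, 50), (D, 100), (M, 50)]
print(deletions_to_introns(150, cigar, {(200, 300)}))
# [(0, 50), (3, 100), (0, 50)]
```

A deletion that does not line up with an intron on both ends is left as `D`, since it may be a genuine deletion in the sample.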

generate_window(breakpoint)

Given an input breakpoint, uses the current evidence settings to determine an appropriate window (range) in which to search for supporting reads.

Parameters:
  • breakpoint (Breakpoint) – the breakpoint we are generating the evidence window for
  • annotations (dict of str and list of Gene) – the set of reference annotations: genes, transcripts, etc
  • read_length (int) – the read length
  • median_fragment_size (int) – the median insert size
  • call_error (int) – adds a buffer to the calculations; can be increased if confidence in the breakpoint calls is low
  • stdev_fragment_size – the standard deviation away from the median for regular (non STV) read pairs
Returns:

the range from which reads should be fetched from the BAM file when looking for evidence for this event

Return type:

Interval

min_cds_shift(pos, strand='?', chrom=None)
standardize_read(read)
traverse(start, distance, direction, strand='?', chrom=None)

Given a genomic position and a distance, uses the input transcripts to compute all possible genomic end positions at that distance when intronic positions are ignored.

Parameters:
  • start (int) – the genomic start position
  • distance (int) – the number of exonic/intergenic units to traverse
  • direction (ORIENT) – the direction to traverse, with respect to the positive/forward reference strand
  • transcripts (list of PreTranscript) – list of transcripts to use
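
To make the traversal idea concrete, here is a minimal sketch that walks a number of exonic bases from a start position and hops over introns, collecting one end position per transcript. It is an illustration under stated assumptions rather than the library's algorithm: `traverse_one` and `traverse_all` are hypothetical names, each transcript is reduced to a sorted list of inclusive exon tuples, `direction` is `+1` or `-1` instead of an `ORIENT` value, and the start position is required to lie inside an exon.

```python
def traverse_one(start, distance, direction, exons):
    """Walk `distance` exonic bases from `start`, skipping introns."""
    if direction < 0:
        # mirror the coordinates so only rightward walking is handled
        mirrored = sorted((-e, -s) for s, e in exons)
        return -traverse_one(-start, distance, 1, mirrored)
    idx = next((i for i, (s, e) in enumerate(exons) if s <= start <= e), None)
    if idx is None:
        raise ValueError('start must fall inside an exon in this sketch')
    pos, remaining = start, distance
    while remaining > 0:
        exon_end = exons[idx][1]
        step = min(remaining, exon_end - pos)
        pos += step
        remaining -= step
        if remaining == 0:
            break
        if idx + 1 < len(exons):
            # hop the intron: the next exon's first base is one step away
            idx += 1
            pos = exons[idx][0]
            remaining -= 1
        else:
            # past the last exon there is nothing left to skip
            pos += remaining
            remaining = 0
    return pos


def traverse_all(start, distance, direction, transcripts):
    """Collect the distinct end positions over every transcript."""
    return {traverse_one(start, distance, direction, exons)
            for exons in transcripts}


transcripts = [
    [(100, 199), (300, 399)],
    [(100, 199), (250, 399)],
]
print(sorted(traverse_all(150, 100, 1, transcripts)))  # [300, 350]
print(traverse_one(350, 100, -1, transcripts[0]))      # 150
```

Different transcripts can yield different end positions for the same start and distance, which is why a set of positions is returned rather than a single value.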