variant module

class mavis.annotate.variant.Annotation(bpp, transcript1=None, transcript2=None, proximity=5000, data=None, **kwargs)[source]

Bases: mavis.breakpoint.BreakpointPair

A fusion of two transcripts created by the associated breakpoint pair. Also holds the other annotations for overlapping, encompassed, and nearest genes.

Holds a breakpoint call and a set of transcripts; other information is gathered relative to these.

Parameters:
  • bpp (BreakpointPair) – the breakpoint pair call. Will be adjusted and then stored based on the transcripts
  • transcript1 (Transcript) – transcript at the first breakpoint
  • transcript2 (Transcript) – transcript at the second breakpoint
  • data (dict) – optional dictionary to hold related attributes
  • event_type (SVTYPE) – the type of event
add_gene(input_gene)[source]

Adds an input_gene to the current set of annotations, checking which set it should be added to.

Parameters:input_gene (input_gene) – the input_gene being added
flatten()[source]

generates a dictionary of the annotation information as strings

Returns:dictionary of attribute names and values
Return type:dict of str by str
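
To illustrate the "dict of str by str" return shape, here is a minimal sketch using a stand-in object. The class ToyAnnotation and its attribute names (gene1, gene2) are hypothetical and not part of mavis; the real method presumably gathers many more columns.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAnnotation:
    """Hypothetical stand-in for Annotation; not the mavis class."""
    event_type: str
    gene1: str = None
    gene2: str = None
    data: dict = field(default_factory=dict)

    def flatten(self):
        # Start from the free-form data dict, then add the fixed attributes
        row = dict(self.data)
        row.update({'event_type': self.event_type, 'gene1': self.gene1, 'gene2': self.gene2})
        # Stringify every name and value so the row can be written out directly
        return {str(k): str(v) for k, v in row.items()}


row = ToyAnnotation('deletion', gene1='GENE_A', data={'score': 3}).flatten()
```

In this sketch missing values are simply stringified; how the library represents them in its output is not stated in this reference.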
class mavis.annotate.variant.FusionTranscript[source]

Bases: mavis.annotate.genomic.UsTranscript

classmethod build(ann, reference_genome, min_orf_size=None, max_orf_cap=None, min_domain_mapping_match=None)[source]
Parameters:
  • ann (Annotation) – the annotation object we want to build a FusionTranscript for
  • reference_genome (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
Returns:

the newly built fusion transcript

Return type:

FusionTranscript

exon_number(exon)[source]
Parameters:exon (Exon) – the exon to be numbered
Returns:the number of the exon in the original transcript (prior to fusion)
Return type:int
get_cdna_seq(splicing_pattern, reference_genome=None, ignore_cache=False)[source]
Parameters:
  • splicing_pattern (list of int) – the list of splicing positions
  • reference_genome (dict of Bio.SeqRecord by str) – dict of reference seq by template/chr name
Returns:

the spliced cDNA seq

Return type:

str

get_seq(reference_genome=None, ignore_cache=False)[source]
map_region_to_genome(chr, interval_on_fusion, genome_interval, flipped=False)[source]
mavis.annotate.variant.annotate_events(bpps, annotations, reference_genome, max_proximity=5000, min_orf_size=200, min_domain_mapping_match=0.95, max_orf_cap=3, log=<function devnull>, filters=None)[source]
Parameters:
  • bpps (list of BreakpointPair) – list of events
  • annotations – reference annotations
  • reference_genome (dict of string by string) – dictionary of reference sequences by name
  • max_proximity (int) – see max_proximity
  • min_orf_size (int) – see min_orf_size
  • min_domain_mapping_match (float) – see min_domain_mapping_match
  • max_orf_cap (int) – see max_orf_cap
  • log (callable) – callable function to take in strings and time_stamp args
  • filters (list of callable) – list of functions taking in a list and returning a list for filtering
Returns:

list of the putative annotations

Return type:

list of Annotation
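
The filters parameter is described as a list of callables that each take a list and return a list. A minimal sketch of how such a chain might be applied follows; apply_filters and the two example filters are hypothetical helpers written for illustration, not mavis functions.

```python
def apply_filters(items, filters=None):
    """Pass a list through each filter in order (hypothetical helper)."""
    result = list(items)
    for func in filters or []:
        result = func(result)
    return result


def drop_empty(items):
    # Example filter: remove falsy entries
    return [x for x in items if x]


def dedupe(items):
    # Example filter: keep the first occurrence of each entry, preserving order
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen


filtered = apply_filters(['a', '', 'b', 'a'], [drop_empty, dedupe])
```

Because every filter shares the same list-in, list-out shape, filters can be reordered or omitted without changing the calling code.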

mavis.annotate.variant.choose_more_annotated(ann_list)[source]

For a given set of annotations, if some annotations contain transcripts and others are simply intergenic regions, discard the intergenic region annotations.

Similarly, if there are annotations where both breakpoints fall in a transcript and annotations where one or more breakpoints land in an intergenic region, discard those that land in the intergenic region.

Parameters:ann_list (list of Annotation) – list of input annotations

Warning

Input annotations are assumed to describe the same event (the same validation_id). The logic used would not apply to different events.

Returns:the filtered list
Return type:list of Annotation
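
The tiered filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: ToyAnn is a hypothetical stand-in for Annotation, and None is used here to mark a breakpoint that lands in an intergenic region.

```python
from collections import namedtuple

# Hypothetical stand-in: None marks a breakpoint in an intergenic region
ToyAnn = namedtuple('ToyAnn', ['name', 'transcript1', 'transcript2'])


def choose_more_annotated_sketch(ann_list):
    """Keep only the annotations in the best-annotated tier that is non-empty."""
    tiers = {2: [], 1: [], 0: []}
    for ann in ann_list:
        # Count how many of the two breakpoints fall in a transcript
        n_transcripts = sum(t is not None for t in (ann.transcript1, ann.transcript2))
        tiers[n_transcripts].append(ann)
    # Prefer two transcripts, then one, then fully intergenic
    for n in (2, 1, 0):
        if tiers[n]:
            return tiers[n]
    return []


anns = [
    ToyAnn('both', 'T1', 'T2'),
    ToyAnn('one', 'T1', None),
    ToyAnn('none', None, None),
]
kept = choose_more_annotated_sketch(anns)
```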
mavis.annotate.variant.choose_transcripts_by_priority(ann_list)[source]

For each set of annotations with the same combination of genes, choose the annotation with the most “best_transcripts” or the most “alphanumeric” choice of transcripts. Throws an error if they are identical.

Parameters:ann_list (list of Annotation) – input annotations

Warning

Input annotations are assumed to describe the same event (the same validation_id). The logic used would not apply to different events.

Returns:the filtered list
Return type:list of Annotation
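
One possible reading of this selection rule is sketched below. Everything here is an assumption made for illustration: ToyTx, ToyAnn, and the is_best flag are hypothetical stand-ins, "most alphanumeric" is interpreted as "first in sorted name order", and ValueError stands in for whatever error the library actually raises.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-ins for transcripts and annotations
ToyTx = namedtuple('ToyTx', ['gene', 'name', 'is_best'])
ToyAnn = namedtuple('ToyAnn', ['transcript1', 'transcript2'])


def gene_pair(ann):
    return (ann.transcript1.gene, ann.transcript2.gene)


def priority_key(ann):
    txs = (ann.transcript1, ann.transcript2)
    # More "best" transcripts sorts first; ties fall back to transcript names
    best_count = sum(t.is_best for t in txs)
    return (-best_count, tuple(t.name for t in txs))


def choose_by_priority_sketch(ann_list):
    chosen = []
    # groupby needs its input sorted by the same key it groups on
    for _, group in groupby(sorted(ann_list, key=gene_pair), key=gene_pair):
        ranked = sorted(group, key=priority_key)
        if len(ranked) > 1 and priority_key(ranked[0]) == priority_key(ranked[1]):
            raise ValueError('annotations are identical in priority')
        chosen.append(ranked[0])
    return chosen


anns = [
    ToyAnn(ToyTx('G1', 'T1b', False), ToyTx('G2', 'T2a', True)),
    ToyAnn(ToyTx('G1', 'T1a', True), ToyTx('G2', 'T2a', True)),
    ToyAnn(ToyTx('G3', 'T3a', False), ToyTx('G4', 'T4a', False)),
]
chosen = choose_by_priority_sketch(anns)
```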
mavis.annotate.variant.determine_prime(transcript, breakpoint)[source]

Determines which side of the transcript (5’ or 3’) is ‘kept’, given the breakpoint.

Parameters:
  • transcript (Transcript) – the transcript
  • breakpoint (Breakpoint) – the breakpoint
Returns:

5’ or 3’

Return type:

PRIME

Raises:

AttributeError – if the orientation of the breakpoint or the strand of the transcript is not specified
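
The relationship between strand, breakpoint orientation, and the retained end can be sketched as below. This is a simplified illustration, not the library code: plain strings ('+'/'-' for strand, 'L'/'R' for the retained side) and the integers 5 and 3 are assumptions standing in for the library's own constants such as PRIME.

```python
def determine_prime_sketch(strand, orient):
    """Return 5 or 3 for the retained end, given strand ('+'/'-') and orient ('L'/'R')."""
    if strand not in ('+', '-') or orient not in ('L', 'R'):
        raise AttributeError('strand and orientation must both be specified')
    # On the forward strand the 5' end has the lower coordinates, so keeping the
    # left side keeps the 5' end; on the reverse strand the relationship is mirrored.
    keeps_left = orient == 'L'
    forward = strand == '+'
    return 5 if keeps_left == forward else 3


prime = determine_prime_sketch('+', 'L')
```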

mavis.annotate.variant.flatten_fusion_transcript(spliced_fusion_transcript)[source]
mavis.annotate.variant.flatten_fusion_translation(translation)[source]

For a given fusion product (translation), gathers the information to be output to the tabbed files.

Parameters:translation (Translation) – the translation which is on the fusion transcript
Returns:the dictionary of column names to values
Return type:dict
mavis.annotate.variant.overlapping_transcripts(ref_ann, breakpoint)[source]
Parameters:
  • ref_ann (dict of list of Gene by str) – the reference list of genes split by chromosome
  • breakpoint (Breakpoint) – the breakpoint in question
Returns:

a list of possible transcripts

Return type:

list of UsTranscript
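
A lookup of this kind can be sketched as a per-chromosome interval overlap test. The classes below (ToyTranscript, ToyGene, ToyBreakpoint) are hypothetical stand-ins for the mavis types, and inclusive coordinates are assumed; the real function may apply additional checks.

```python
from collections import namedtuple

# Hypothetical stand-ins for the mavis classes; coordinates are inclusive
ToyTranscript = namedtuple('ToyTranscript', ['name', 'start', 'end'])
ToyGene = namedtuple('ToyGene', ['name', 'transcripts'])
ToyBreakpoint = namedtuple('ToyBreakpoint', ['chr', 'start', 'end'])


def overlapping_transcripts_sketch(ref_ann, breakpoint):
    """Collect transcripts on the breakpoint's chromosome whose span overlaps it."""
    hits = []
    for gene in ref_ann.get(breakpoint.chr, []):
        for transcript in gene.transcripts:
            # Two inclusive intervals overlap when neither ends before the other starts
            if transcript.start <= breakpoint.end and breakpoint.start <= transcript.end:
                hits.append(transcript)
    return hits


ref_ann = {
    '1': [ToyGene('G1', [ToyTranscript('T1', 100, 500), ToyTranscript('T2', 600, 900)])],
    '2': [ToyGene('G2', [ToyTranscript('T3', 100, 500)])],
}
found = overlapping_transcripts_sketch(ref_ann, ToyBreakpoint('1', 450, 460))
```

Keying the reference annotations by chromosome, as the ref_ann parameter describes, means only genes on the breakpoint's chromosome need to be scanned.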