constants module
module responsible for small utility functions and constants used throughout the structural_variant package
-
mavis.constants.
CALL_METHOD
= MavisNamespace(CONTIG='contig', FLANK='flanking reads', INPUT='input', SPAN='spanning reads', SPLIT='split reads', _defns={}, _types={'CONTIG': <class 'str'>, 'SPLIT': <class 'str'>, 'FLANK': <class 'str'>, 'SPAN': <class 'str'>, 'INPUT': <class 'str'>})
MavisNamespace – holds controlled vocabulary for allowed call methods
CONTIG: a contig was assembled and aligned across the breakpoints
SPLIT: the event was called by split reads
FLANK: the event was called by flanking read pairs
SPAN: the event was called by spanning reads
-
mavis.constants.
CIGAR
= MavisNamespace(D=2, EQ=7, H=5, I=1, M=0, N=3, P=6, S=4, X=8, _defns={}, _types={'M': <class 'int'>, 'I': <class 'int'>, 'D': <class 'int'>, 'N': <class 'int'>, 'S': <class 'int'>, 'H': <class 'int'>, 'P': <class 'int'>, 'X': <class 'int'>, 'EQ': <class 'int'>})
MavisNamespace – Enum-like. For readable cigar values
M: alignment match (can be a sequence match or mismatch)
I: insertion to the reference
D: deletion from the reference
N: skipped region from the reference
S: soft clipping (clipped sequences present in SEQ)
H: hard clipping (clipped sequences NOT present in SEQ)
P: padding (silent deletion from padded reference)
EQ: sequence match (=)
X: sequence mismatch
note: descriptions are taken from the samfile documentation
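As an illustration of how these codes are typically used, the sketch below sums the reference and query lengths of an alignment given as (operation, length) pairs. The dictionary copies the values listed above so the example runs without MAVIS installed; the helper names `reference_length` and `query_length` are hypothetical and are not part of `mavis.constants`. Which operations consume the reference or the query follows the SAM format specification.

```python
# Codes copied from the CIGAR namespace documented above (stand-in for mavis.constants.CIGAR).
CIGAR = {'M': 0, 'I': 1, 'D': 2, 'N': 3, 'S': 4, 'H': 5, 'P': 6, 'EQ': 7, 'X': 8}

# Operations that advance along the reference, per the SAM specification.
REFERENCE_CONSUMING = {CIGAR['M'], CIGAR['D'], CIGAR['N'], CIGAR['EQ'], CIGAR['X']}
# Operations that advance along the read sequence (SEQ).
QUERY_CONSUMING = {CIGAR['M'], CIGAR['I'], CIGAR['S'], CIGAR['EQ'], CIGAR['X']}


def reference_length(cigar_tuples):
    """Number of reference bases covered by the alignment (hypothetical helper)."""
    return sum(length for op, length in cigar_tuples if op in REFERENCE_CONSUMING)


def query_length(cigar_tuples):
    """Number of read bases described by the alignment (hypothetical helper)."""
    return sum(length for op, length in cigar_tuples if op in QUERY_CONSUMING)


# 5 soft-clipped bases, 20 matches, a 3-base deletion, 1 mismatch, a 2-base insertion
example = [
    (CIGAR['S'], 5),
    (CIGAR['EQ'], 20),
    (CIGAR['D'], 3),
    (CIGAR['X'], 1),
    (CIGAR['I'], 2),
]
print(reference_length(example), query_length(example))
```

Using the named codes rather than bare integers keeps comparisons such as `op == CIGAR['D']` readable, which is the stated purpose of the namespace.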
-
mavis.constants.
COLUMNS
= MavisNamespace(_defns={}, _types={'tracking_id': <class 'str'>, 'library': <class 'str'>, 'cluster_id': <class 'str'>, 'cluster_size': <class 'str'>, 'validation_id': <class 'str'>, 'annotation_id': <class 'str'>, 'product_id': <class 'str'>, 'event_type': <class 'str'>, 'pairing': <class 'str'>, 'inferred_pairing': <class 'str'>, 'gene1': <class 'str'>, 'gene1_direction': <class 'str'>, 'gene2': <class 'str'>, 'gene2_direction': <class 'str'>, 'gene1_aliases': <class 'str'>, 'gene2_aliases': <class 'str'>, 'gene_product_type': <class 'str'>, 'transcript1': <class 'str'>, 'transcript2': <class 'str'>, 'fusion_splicing_pattern': <class 'str'>, 'fusion_cdna_coding_start': <class 'str'>, 'fusion_cdna_coding_end': <class 'str'>, 'fusion_mapped_domains': <class 'str'>, 'fusion_sequence_fasta_id': <class 'str'>, 'fusion_sequence_fasta_file': <class 'str'>, 'annotation_figure': <class 'str'>, 'annotation_figure_legend': <class 'str'>, 'genes_encompassed': <class 'str'>, 'genes_overlapping_break1': <class 'str'>, 'genes_overlapping_break2': <class 'str'>, 'genes_proximal_to_break1': <class 'str'>, 'genes_proximal_to_break2': <class 'str'>, 'break1_chromosome': <class 'str'>, 'break1_position_start': <class 'str'>, 'break1_position_end': <class 'str'>, 'break1_orientation': <class 'str'>, 'exon_last_5prime': <class 'str'>, 'exon_first_3prime': <class 'str'>, 'break1_strand': <class 'str'>, 'break1_seq': <class 'str'>, 'break2_chromosome': <class 'str'>, 'break2_position_start': <class 'str'>, 'break2_position_end': <class 'str'>, 'break2_orientation': <class 'str'>, 'break2_strand': <class 'str'>, 'break2_seq': <class 'str'>, 'opposing_strands': <class 'str'>, 'stranded': <class 'str'>, 'protocol': <class 'str'>, 'disease_status': <class 'str'>, 'tools': <class 'str'>, 'call_method': <class 'str'>, 'break1_ewindow': <class 'str'>, 'break1_ewindow_count': <class 'str'>, 'break1_ewindow_practical_coverage': <class 'str'>, 'break1_homologous_seq': <class 'str'>, 
'break1_split_read_names': <class 'str'>, 'break1_split_reads': <class 'str'>, 'break1_split_reads_forced': <class 'str'>, 'break2_ewindow': <class 'str'>, 'break2_ewindow_count': <class 'str'>, 'break2_ewindow_practical_coverage': <class 'str'>, 'break2_homologous_seq': <class 'str'>, 'break2_split_read_names': <class 'str'>, 'break2_split_reads': <class 'str'>, 'break2_split_reads_forced': <class 'str'>, 'contig_alignment_query_consumption': <class 'str'>, 'contig_alignment_score': <class 'str'>, 'contig_alignment_query_name': <class 'str'>, 'contig_read_depth': <class 'str'>, 'contig_break1_read_depth': <class 'str'>, 'contig_break2_read_depth': <class 'str'>, 'contig_blat_rank': <class 'str'>, 'contig_build_score': <class 'str'>, 'contig_remap_score': <class 'str'>, 'contig_remap_coverage': <class 'str'>, 'contig_remapped_read_names': <class 'str'>, 'contig_remapped_reads': <class 'str'>, 'contig_seq': <class 'str'>, 'contig_strand_specific': <class 'str'>, 'contigs_aligned': <class 'str'>, 'contigs_assembled': <class 'str'>, 'spanning_reads': <class 'str'>, 'spanning_read_names': <class 'str'>, 'flanking_median_fragment_size': <class 'str'>, 'flanking_pairs': <class 'str'>, 'flanking_pairs_compatible': <class 'str'>, 'flanking_pairs_read_names': <class 'str'>, 'flanking_pairs_compatible_read_names': <class 'str'>, 'flanking_stdev_fragment_size': <class 'str'>, 'linking_split_read_names': <class 'str'>, 'linking_split_reads': <class 'str'>, 'raw_break1_half_mapped_reads': <class 'str'>, 'raw_break1_split_reads': <class 'str'>, 'raw_break2_half_mapped_reads': <class 'str'>, 'raw_break2_split_reads': <class 'str'>, 'raw_flanking_pairs': <class 'str'>, 'raw_spanning_reads': <class 'str'>, 'untemplated_seq': <class 'str'>, 'filter_comment': <class 'str'>, 'cdna_synon': <class 'str'>, 'protein_synon': <class 'str'>}, annotation_figure='annotation_figure', annotation_figure_legend='annotation_figure_legend', annotation_id='annotation_id', 
break1_chromosome='break1_chromosome', break1_ewindow='break1_ewindow', break1_ewindow_count='break1_ewindow_count', break1_ewindow_practical_coverage='break1_ewindow_practical_coverage', break1_homologous_seq='break1_homologous_seq', break1_orientation='break1_orientation', break1_position_end='break1_position_end', break1_position_start='break1_position_start', break1_seq='break1_seq', break1_split_read_names='break1_split_read_names', break1_split_reads='break1_split_reads', break1_split_reads_forced='break1_split_reads_forced', break1_strand='break1_strand', break2_chromosome='break2_chromosome', break2_ewindow='break2_ewindow', break2_ewindow_count='break2_ewindow_count', break2_ewindow_practical_coverage='break2_ewindow_practical_coverage', break2_homologous_seq='break2_homologous_seq', break2_orientation='break2_orientation', break2_position_end='break2_position_end', break2_position_start='break2_position_start', break2_seq='break2_seq', break2_split_read_names='break2_split_read_names', break2_split_reads='break2_split_reads', break2_split_reads_forced='break2_split_reads_forced', break2_strand='break2_strand', call_method='call_method', cdna_synon='cdna_synon', cluster_id='cluster_id', cluster_size='cluster_size', contig_alignment_query_consumption='contig_alignment_query_consumption', contig_alignment_query_name='contig_alignment_query_name', contig_alignment_score='contig_alignment_score', contig_blat_rank='contig_blat_rank', contig_break1_read_depth='contig_break1_read_depth', contig_break2_read_depth='contig_break2_read_depth', contig_build_score='contig_build_score', contig_read_depth='contig_read_depth', contig_remap_coverage='contig_remap_coverage', contig_remap_score='contig_remap_score', contig_remapped_read_names='contig_remapped_read_names', contig_remapped_reads='contig_remapped_reads', contig_seq='contig_seq', contig_strand_specific='contig_strand_specific', contigs_aligned='contigs_aligned', contigs_assembled='contigs_assembled', 
disease_status='disease_status', event_type='event_type', exon_first_3prime='exon_first_3prime', exon_last_5prime='exon_last_5prime', filter_comment='filter_comment', flanking_median_fragment_size='flanking_median_fragment_size', flanking_pairs='flanking_pairs', flanking_pairs_compatible='flanking_pairs_compatible', flanking_pairs_compatible_read_names='flanking_pairs_compatible_read_names', flanking_pairs_read_names='flanking_pairs_read_names', flanking_stdev_fragment_size='flanking_stdev_fragment_size', fusion_cdna_coding_end='fusion_cdna_coding_end', fusion_cdna_coding_start='fusion_cdna_coding_start', fusion_mapped_domains='fusion_mapped_domains', fusion_sequence_fasta_file='fusion_sequence_fasta_file', fusion_sequence_fasta_id='fusion_sequence_fasta_id', fusion_splicing_pattern='fusion_splicing_pattern', gene1='gene1', gene1_aliases='gene1_aliases', gene1_direction='gene1_direction', gene2='gene2', gene2_aliases='gene2_aliases', gene2_direction='gene2_direction', gene_product_type='gene_product_type', genes_encompassed='genes_encompassed', genes_overlapping_break1='genes_overlapping_break1', genes_overlapping_break2='genes_overlapping_break2', genes_proximal_to_break1='genes_proximal_to_break1', genes_proximal_to_break2='genes_proximal_to_break2', inferred_pairing='inferred_pairing', library='library', linking_split_read_names='linking_split_read_names', linking_split_reads='linking_split_reads', opposing_strands='opposing_strands', pairing='pairing', product_id='product_id', protein_synon='protein_synon', protocol='protocol', raw_break1_half_mapped_reads='raw_break1_half_mapped_reads', raw_break1_split_reads='raw_break1_split_reads', raw_break2_half_mapped_reads='raw_break2_half_mapped_reads', raw_break2_split_reads='raw_break2_split_reads', raw_flanking_pairs='raw_flanking_pairs', raw_spanning_reads='raw_spanning_reads', spanning_read_names='spanning_read_names', spanning_reads='spanning_reads', stranded='stranded', tools='tools', tracking_id='tracking_id', 
transcript1='transcript1', transcript2='transcript2', untemplated_seq='untemplated_seq', validation_id='validation_id')
MavisNamespace – Column names for i/o files used throughout the pipeline
- annotation_figure_legend
- annotation_figure
- annotation_id
- break1_chromosome
- break1_ewindow_count
- break1_ewindow_practical_coverage
- break1_ewindow
- break1_homologous_seq
- break1_orientation
- break1_position_end
- break1_position_start
- break1_seq
- break1_split_reads_forced
- break1_split_reads
- break1_strand
- break2_chromosome
- break2_ewindow_count
- break2_ewindow_practical_coverage
- break2_ewindow
- break2_homologous_seq
- break2_orientation
- break2_position_end
- break2_position_start
- break2_seq
- break2_split_reads_forced
- break2_split_reads
- break2_strand
- call_method
- cdna_synon
- cluster_id
- cluster_size
- contig_alignment_cigar
- contig_alignment_query_name
- contig_alignment_reference_start
- contig_alignment_score
- contig_build_score
- contig_remap_coverage
- contig_remap_score
- contig_remapped_read_names
- contig_remapped_reads
- contig_seq
- contig_strand_specific
- contigs_aligned
- contigs_assembled
- event_type
- flanking_median_fragment_size
- flanking_pairs_compatible
- flanking_pairs
- flanking_stdev_fragment_size
- fusion_cdna_coding_end
- fusion_cdna_coding_start
- fusion_mapped_domains
- fusion_sequence_fasta_file
- fusion_sequence_fasta_id
- fusion_splicing_pattern
- gene1_aliases
- gene1_direction
- gene1
- gene2_aliases
- gene2_direction
- gene2
- gene_product_type
- genes_encompassed
- genes_overlapping_break1
- genes_overlapping_break2
- genes_proximal_to_break1
- genes_proximal_to_break2
- inferred_pairing
- library
- linking_split_reads
- opposing_strands
- pairing
- product_id
- protein_synon
- protocol
- raw_break1_split_reads
- raw_break2_split_reads
- raw_flanking_pairs
- raw_spanning_reads
- spanning_read_names
- spanning_reads
- stranded
- tools
- tracking_id
- transcript1
- transcript2
- untemplated_seq
- validation_id
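A minimal sketch of reading rows keyed by these column names follows. It assumes the i/o files are tab-delimited with a header row, which the text above does not state, and it uses plain string constants in place of `COLUMNS` so that it runs without MAVIS installed. The sample row is made up for illustration.

```python
import csv
import io

# Stand-ins for COLUMNS.break1_chromosome etc.; each attribute's value equals its name.
BREAK1_CHROMOSOME = 'break1_chromosome'
BREAK1_POSITION_START = 'break1_position_start'
EVENT_TYPE = 'event_type'

# Made-up, tab-delimited content (assumed layout, not taken from a real MAVIS file).
text = (
    'break1_chromosome\tbreak1_position_start\tevent_type\n'
    '1\t1000\tdeletion\n'
)

rows = list(csv.DictReader(io.StringIO(text), delimiter='\t'))
for row in rows:
    # Referring to columns through constants avoids typos in repeated string literals.
    print(row[BREAK1_CHROMOSOME], int(row[BREAK1_POSITION_START]), row[EVENT_TYPE])
```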
-
mavis.constants.
DISEASE_STATUS
= MavisNamespace(DISEASED='diseased', NORMAL='normal', _defns={}, _types={'DISEASED': <class 'str'>, 'NORMAL': <class 'str'>})
MavisNamespace – holds controlled vocabulary for allowed disease status
DISEASED: diseased
NORMAL: normal
-
mavis.constants.
GENE_PRODUCT_TYPE
= MavisNamespace(ANTI_SENSE='anti-sense', SENSE='sense', _defns={}, _types={'SENSE': <class 'str'>, 'ANTI_SENSE': <class 'str'>})
MavisNamespace – controlled vocabulary for gene products
SENSE: the gene product is a sense fusion
ANTI_SENSE: the gene product is anti-sense
-
mavis.constants.
GIEMSA_STAIN
= MavisNamespace(ACEN='acen', GNEG='gneg', GPOS100='gpos100', GPOS25='gpos25', GPOS50='gpos50', GPOS75='gpos75', GVAR='gvar', STALK='stalk', _defns={}, _types={'GNEG': <class 'str'>, 'GPOS50': <class 'str'>, 'GPOS75': <class 'str'>, 'GPOS25': <class 'str'>, 'GPOS100': <class 'str'>, 'ACEN': <class 'str'>, 'GVAR': <class 'str'>, 'STALK': <class 'str'>})
MavisNamespace – holds controlled vocabulary relating to stains of chromosome bands
-
class
mavis.constants.
MavisNamespace
(**kwargs)
Bases: argparse.Namespace
Namespace to hold module constants
Example
>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.thing
1
>>> nspace.otherthing
2
-
add
(attr, *pos, **kwargs)
Add an attribute to the namespace. Optionally include cast_type and definition
Example
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, int, 'I am a thing')
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, int)
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1)
>>> nspace = MavisNamespace()
>>> nspace.add('thing', value=1, cast_type=int, defn='I am a thing')
-
define
(attr, *pos)
Get the definition of a given attribute or return a default (when given) if the attribute does not exist
Returns: definition for the attribute
Return type: str
Raises: KeyError – the attribute does not exist and a default was not given
Example
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, defn='I am a thing')
>>> nspace.add('otherthing', 2)
>>> nspace.define('thing')
'I am a thing'
>>> nspace.define('otherthing')
Traceback (most recent call last):
....
>>> nspace.define('otherthing', 'I am some other thing')
'I am some other thing'
-
enforce
(value)
checks that the current namespace has a given value
Returns: the input value
Raises: KeyError – the value did not exist
Example
>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.enforce(1)
1
>>> nspace.enforce(3)
Traceback (most recent call last):
....
-
flatten
()
returns the namespace (minus types and definitions) as a dictionary
Example
>>> MavisNamespace(thing=1, otherthing=2).flatten()
{'thing': 1, 'otherthing': 2}
-
get
(key, *pos)
get an attribute, return a default (if given) if the attribute does not exist
Example
>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.get('thing', 2)
1
>>> nspace.get('nonexistant_thing', 2)
2
>>> nspace.get('nonexistant_thing')
Traceback (most recent call last):
....
-
items
()
Example
>>> MavisNamespace(thing=1, otherthing=2).items()
[('thing', 1), ('otherthing', 2)]
-
keys
()
get the attribute keys as a list
Example
>>> MavisNamespace(thing=1, otherthing=2).keys()
['thing', 'otherthing']
-
reserved_attr
= ['_types', '_defns']
-
reverse
(value)
for a given value, return the associated key
Parameters: value – the value to get the key/attribute name for
Raises:
Example
>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.reverse(1)
'thing'
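To make the lookup methods documented above concrete, here is a minimal stand-in built on `argparse.Namespace`. It is a sketch under stated assumptions and not the MAVIS implementation: the class name `SketchNamespace` is hypothetical, type casting and definitions are omitted, and the exception raised by `get` for a missing key without a default, as well as the behaviour of `reverse` for missing or repeated values, are guesses, since the text above does not specify them.

```python
import argparse


class SketchNamespace(argparse.Namespace):
    """Hypothetical stand-in mirroring get/enforce/reverse as documented above."""

    def get(self, key, *pos):
        # Return the attribute if present, else the optional positional default.
        try:
            return vars(self)[key]
        except KeyError:
            if pos:
                return pos[0]
            raise  # assumption: the real exception type is not stated in the docs

    def enforce(self, value):
        # The docs say a KeyError is raised when the value does not exist.
        if value not in vars(self).values():
            raise KeyError('value is not a member of the namespace', value)
        return value

    def reverse(self, value):
        # Assumption: refuse to reverse a value that maps to zero or several keys.
        matches = [key for key, val in vars(self).items() if val == value]
        if len(matches) != 1:
            raise KeyError('value does not map to exactly one key', value)
        return matches[0]


nspace = SketchNamespace(thing=1, otherthing=2)
print(nspace.get('thing', 2), nspace.get('missing', 2), nspace.reverse(1))
```

A controlled vocabulary like this lets input validation be written as `nspace.enforce(user_value)`, failing early on values outside the allowed set.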
-
-
mavis.constants.
NA_MAPPING_QUALITY
= 255
int – mapping quality value to indicate mapping was not performed/calculated
-
mavis.constants.
ORIENT
= MavisNamespace(LEFT='L', NS='?', RIGHT='R', _defns={}, _types={'LEFT': <class 'str'>, 'RIGHT': <class 'str'>, 'NS': <class 'str'>}, compare=<function <lambda>>, expand=<function <lambda>>)
MavisNamespace – holds controlled vocabulary for allowed orientation values
LEFT: left with respect to the positive/forward strand
RIGHT: right with respect to the positive/forward strand
NS: orientation is not specified
-
mavis.constants.
PRIME
= MavisNamespace(FIVE=5, THREE=3, _defns={}, _types={'FIVE': <class 'int'>, 'THREE': <class 'int'>})
MavisNamespace – holds controlled vocabulary
FIVE: five prime
THREE: three prime
-
mavis.constants.
PROTOCOL
= MavisNamespace(GENOME='genome', TRANS='transcriptome', _defns={}, _types={'GENOME': <class 'str'>, 'TRANS': <class 'str'>})
MavisNamespace – holds controlled vocabulary for allowed protocol values
GENOME: genome
TRANS: transcriptome
-
mavis.constants.
PYSAM_READ_FLAGS
= MavisNamespace(BLAT_ALIGNMENTS='ba', BLAT_PERCENT_IDENTITY='bi', BLAT_PMS='bp', BLAT_RANK='br', BLAT_SCORE='bs', FIRST_IN_PAIR=64, LAST_IN_PAIR=128, MATE_REVERSE=32, MATE_UNMAPPED=8, MULTIMAP=1, RECOMPUTED_CIGAR='rc', REVERSE=16, SECONDARY=256, SUPPLEMENTARY=2048, TARGETED_ALIGNMENT='ta', UNMAPPED=4, _defns={}, _types={'REVERSE': <class 'int'>, 'MATE_REVERSE': <class 'int'>, 'UNMAPPED': <class 'int'>, 'MATE_UNMAPPED': <class 'int'>, 'FIRST_IN_PAIR': <class 'int'>, 'LAST_IN_PAIR': <class 'int'>, 'SECONDARY': <class 'int'>, 'MULTIMAP': <class 'int'>, 'SUPPLEMENTARY': <class 'int'>, 'TARGETED_ALIGNMENT': <class 'str'>, 'RECOMPUTED_CIGAR': <class 'str'>, 'BLAT_RANK': <class 'str'>, 'BLAT_SCORE': <class 'str'>, 'BLAT_ALIGNMENTS': <class 'str'>, 'BLAT_PERCENT_IDENTITY': <class 'str'>, 'BLAT_PMS': <class 'str'>})
MavisNamespace – Enum-like. For readable PYSAM flag constants
MULTIMAP: template having multiple segments in sequencing
UNMAPPED: segment unmapped
MATE_UNMAPPED: next segment in the template unmapped
REVERSE: SEQ being reverse complemented
MATE_REVERSE: SEQ of the next segment in the template being reverse complemented
FIRST_IN_PAIR: the first segment in the template
LAST_IN_PAIR: the last segment in the template
SECONDARY: secondary alignment
SUPPLEMENTARY: supplementary alignment
note: descriptions are taken from the samfile documentation
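The integer members above are bit masks, so a read's FLAG value can be tested against them with a bitwise AND. The sketch below decodes a flag into the names listed above; the dictionary copies the documented integer values so it runs standalone, and `decode_flag` is a hypothetical helper, not a function in `mavis.constants`. Bits that this namespace does not list are ignored.

```python
# Integer members copied from PYSAM_READ_FLAGS above (the two-letter string members
# such as BLAT_RANK='br' are not bit flags and are left out here).
READ_FLAGS = {
    'MULTIMAP': 1,
    'UNMAPPED': 4,
    'MATE_UNMAPPED': 8,
    'REVERSE': 16,
    'MATE_REVERSE': 32,
    'FIRST_IN_PAIR': 64,
    'LAST_IN_PAIR': 128,
    'SECONDARY': 256,
    'SUPPLEMENTARY': 2048,
}


def decode_flag(flag):
    """Return the sorted names of the listed bits that are set in `flag`."""
    return sorted(name for name, bit in READ_FLAGS.items() if flag & bit)


# 81 = 64 + 16 + 1
print(decode_flag(81))
```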
-
mavis.constants.
STRAND
= MavisNamespace(NEG='-', NS='?', POS='+', _defns={}, _types={'POS': <class 'str'>, 'NEG': <class 'str'>, 'NS': <class 'str'>}, compare=<function <lambda>>, expand=<function <lambda>>)
MavisNamespace – holds controlled vocabulary for allowed strand values
POS: the positive/forward strand
NEG: the negative/reverse strand
NS: strand is not specified
-
mavis.constants.
SUBCOMMAND
= MavisNamespace(ANNOTATE='annotate', CHECKER='checker', CLUSTER='cluster', CONFIG='config', CONVERT='convert', OVERLAY='overlay', PAIR='pairing', PIPELINE='pipeline', SUMMARY='summary', VALIDATE='validate', _defns={}, _types={'ANNOTATE': <class 'str'>, 'VALIDATE': <class 'str'>, 'PIPELINE': <class 'str'>, 'CLUSTER': <class 'str'>, 'PAIR': <class 'str'>, 'SUMMARY': <class 'str'>, 'CHECKER': <class 'str'>, 'CONFIG': <class 'str'>, 'CONVERT': <class 'str'>, 'OVERLAY': <class 'str'>})
MavisNamespace – holds controlled vocabulary for allowed pipeline stage values
- annotate
- validate
- pipeline
- cluster
- pairing
- summary
- checker
- config
- convert
- overlay
-
mavis.constants.
SVTYPE
= MavisNamespace(DEL='deletion', DUP='duplication', INS='insertion', INV='inversion', ITRANS='inverted translocation', TRANS='translocation', _defns={}, _types={'DEL': <class 'str'>, 'TRANS': <class 'str'>, 'ITRANS': <class 'str'>, 'INV': <class 'str'>, 'INS': <class 'str'>, 'DUP': <class 'str'>})
MavisNamespace – holds controlled vocabulary for acceptable structural variant classifications
DEL: deletion
TRANS: translocation
ITRANS: inverted translocation
INV: inversion
INS: insertion
DUP: duplication
-
mavis.constants.
float_fraction
(num)
cast input to a float
Parameters: num – input to cast
Returns: float
Raises: TypeError – if the input cannot be cast to a float or the number is not between 0 and 1
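The behaviour described above can be sketched as follows. This is an illustrative re-implementation, not the MAVIS source: the name `float_fraction_sketch` is hypothetical, and treating 0 and 1 themselves as valid is an assumption, since the text only says "between 0 and 1".

```python
def float_fraction_sketch(num):
    """Cast `num` to a float and require it to lie in [0, 1] (bounds assumed inclusive)."""
    try:
        value = float(num)
    except (TypeError, ValueError):
        # The docs name TypeError for both failure modes, so ValueError is converted.
        raise TypeError('input could not be cast to a float', num) from None
    if not 0 <= value <= 1:
        raise TypeError('input must be a value between 0 and 1', num)
    return value


print(float_fraction_sketch('0.25'))
```

A function of this shape can be passed as the `type=` argument of `argparse.ArgumentParser.add_argument`, because argparse reports a `TypeError` raised by a type callable as a usage error.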
-
mavis.constants.
reverse_complement
(s)
wrapper for the Bio.Seq reverse_complement method
Parameters: s (str) – the input DNA sequence
Returns: the reverse complement of the input sequence
Return type: str
Warning
assumes the input is a DNA sequence
Example
>>> reverse_complement('ATCCGGT')
'ACCGGAT'
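For readers without Biopython at hand, the same operation on plain A/C/G/T input can be sketched with `str.translate`. This swaps in a different technique from the Bio.Seq wrapper described above: the name `reverse_complement_sketch` is hypothetical, and unlike Bio.Seq it leaves any character outside A/C/G/T (for example IUPAC ambiguity codes) unchanged rather than complementing it.

```python
# Translation table mapping each base to its complement, preserving case.
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement_sketch(s):
    """Complement each base, then reverse the string (DNA input assumed)."""
    return s.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch('ATCCGGT'))
```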