read module
-
class mavis.bam.read.SamRead(reference_name=None, next_reference_name=None, alignment_score=None, **kwargs)
Bases: pysam.libcalignedsegment.AlignedSegment
Subclass that extends the pysam.AlignedSegment class, adding some utility methods and convenient representations.
Allows next_reference_name and reference_name to be set directly so that the read does not depend on a BAM header.
-
alignment_id
-
next_reference_name
-
query_length
-
reference_name
-
-
mavis.bam.read.breakpoint_pos(read, orient='?')
Assumes the breakpoint is the position following soft clipping on the side with more soft clipping (unless an orientation has been specified).
Parameters:
- read (AlignedSegment) – the read object
- orient (ORIENT) – the orientation
Returns: the position of the breakpoint in the input read
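The rule described above can be sketched without pysam by working on a plain CIGAR tuple list. This is an illustration only, not the MAVIS implementation: the function name `sketch_breakpoint_pos`, the `"L"`/`"R"`/`"?"` orientation labels, the 0-based coordinates, and the reading of `"R"` as "the retained sequence lies to the right of the breakpoint" are all assumptions made for this example.

```python
SOFT_CLIP = 4  # pysam CIGAR operation code for soft clipping (S)


def sketch_breakpoint_pos(cigar, reference_start, reference_end, orient="?"):
    """Hypothetical helper: pick a breakpoint position from soft clipping.

    cigar: list of (operation, length) tuples in pysam's convention
    reference_start: 0-based start of the aligned portion
    reference_end: 0-based end (exclusive) of the aligned portion
    orient: "L", "R", or "?" when unspecified (labels assumed here)
    """
    left = cigar[0][1] if cigar[0][0] == SOFT_CLIP else 0
    right = cigar[-1][1] if cigar[-1][0] == SOFT_CLIP else 0
    if left == 0 and right == 0:
        raise ValueError("read has no soft clipping")

    if orient == "?":
        # no orientation given: use the side with more soft clipping
        orient = "R" if left > right else "L"

    if orient == "R":
        # clipped on the left, so the first aligned base is the breakpoint
        if left == 0:
            raise ValueError("no soft clipping on the left side")
        return reference_start
    # clipped on the right, so the last aligned base is the breakpoint
    if right == 0:
        raise ValueError("no soft clipping on the right side")
    return reference_end - 1
```

For a read with CIGAR `10S40M3S` aligned at 100 to 140, the left side carries more soft clipping, so this sketch returns the first aligned position.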
-
mavis.bam.read.calculate_alignment_score(read, consec_bonus=1)
Calculates a score for comparing alignments.
Parameters: read (pysam.AlignedSegment) – the input read
Returns: the score
Return type: float
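The page does not state the scoring formula, so the following is only one plausible reading of what a `consec_bonus` parameter could do: each run of exact matches earns its length plus a bonus per consecutive pair, normalised by the best score possible for the aligned span. The function name `sketch_alignment_score` and the formula itself are assumptions for illustration.

```python
EQ = 7  # pysam CIGAR code for a sequence match (=)
REF_CONSUMING = {0, 2, 3, 7, 8}  # M, D, N, =, X advance along the reference


def sketch_alignment_score(cigar, consec_bonus=1):
    """Hypothetical score in [0, 1] that rewards long runs of exact matches.

    cigar: list of (operation, length) tuples using =/X rather than M
    """
    span = sum(length for op, length in cigar if op in REF_CONSUMING)
    if span == 0:
        raise ValueError("alignment does not cover any reference positions")

    # best case: the whole span is a single run of matches
    max_score = span + (span - 1) * consec_bonus

    score = 0
    for op, length in cigar:
        if op == EQ:
            # each matched base scores 1, each adjacent matched pair scores the bonus
            score += length + (length - 1) * consec_bonus
    return score / max_score
```

Under this formula an uninterrupted `10=` scores 1.0, while `5=1X4=` scores lower because the mismatch both removes a match and breaks the run.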
-
mavis.bam.read.convert_events_to_softclipping(read, orientation, max_event_size, min_anchor_size=None)
Given an alignment, simplifies it by grouping everything past the first anchor, including the first event considered too large, and unaligning it, converting it to soft clipping.
-
mavis.bam.read.get_samtools_version()
Executes a subprocess to try to run samtools and parses the version number from the output.
Example
>>> get_samtools_version()
(1, 2, 1)
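A minimal sketch of the same idea, separating the parsing from the subprocess call so the parsing can be checked without samtools installed. The names `parse_version` and `sketch_samtools_version` are hypothetical. The use of `samtools --version` and the assumed output shapes (`samtools 1.2.1` or `Version: 1.9 (...)`) may not match what MAVIS actually runs or parses.

```python
import re
import subprocess

VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(text):
    """Hypothetical parser: return (major, minor, patch) from samtools output."""
    for line in text.splitlines():
        lowered = line.lower()
        # only consider lines that look like they announce the version
        if "version" in lowered or lowered.startswith("samtools"):
            match = VERSION_PATTERN.search(line)
            if match:
                # a missing patch component is reported as 0
                return tuple(int(part) if part else 0 for part in match.groups())
    raise ValueError("could not find a version number in the output")


def sketch_samtools_version():
    """Run samtools and parse its version; raises FileNotFoundError if it is not installed."""
    proc = subprocess.run(
        ["samtools", "--version"], capture_output=True, text=True, check=False
    )
    return parse_version(proc.stdout + proc.stderr)
```

Returning a tuple of integers lets callers compare versions directly, for example `sketch_samtools_version() >= (1, 3, 0)`.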
-
mavis.bam.read.map_ref_range_to_query_range(read, ref_range)
Parameters:
- ref_range (Interval) – 1-based inclusive
- read (pysam.AlignedSegment) – read used for the mapping
Returns: 1-based inclusive range
-
mavis.bam.read.nsb_align(ref, seq, weight_of_score=0.5, min_overlap_percent=1, min_match=0, min_consecutive_match=1, scoring_function=<function calculate_alignment_score>)
Given a reference string and a smaller sequence string, computes the best non-space-breaking alignment, i.e. an alignment that does not allow for indels (straight-match). Positions in the aligned segments are given relative to the length of the reference sequence (1-based).
Parameters:
- ref (str) – the reference sequence
- seq (str) – the sequence being aligned
- weight_of_score (float) – when scoring alignments, the amount of weight to place on the cigar match; should be a number between 0 and 1
- min_overlap_percent (float) – the minimum amount of overlap of the input sequence with the reference; should be a number between 0 and 1
- min_match (float) – the minimum number of matches compared to total
- scoring_function (callable) – any function that takes a read as input and returns a float used in comparing alignments to choose the best alignment
Returns: list of aligned segments
Note
Using a higher min_match may improve performance, as low-quality alignments are rejected more quickly. However, this may also result in no match being returned when there is no high-quality match to be found.
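The core of a gapless (straight-match) alignment can be sketched as sliding the sequence along the reference and counting matching positions at each offset. This simplified version is not the MAVIS implementation: it only handles full overlap (the `min_overlap_percent=1` case), ignores `weight_of_score` and `min_consecutive_match`, and returns plain tuples rather than aligned segments. The name `sketch_gapless_align` is hypothetical.

```python
def sketch_gapless_align(ref, seq, min_match=0.0):
    """Hypothetical helper: best indel-free placements of seq within ref.

    Returns a list of (start, matches) tuples, with start 1-based on ref.
    Several placements are returned when they tie for the best match count.
    """
    if not seq or len(seq) > len(ref):
        raise ValueError("seq must be non-empty and no longer than ref")

    best_starts = []
    best_matches = -1
    for start in range(len(ref) - len(seq) + 1):
        window = ref[start:start + len(seq)]
        matches = sum(1 for r, s in zip(window, seq) if r == s)
        # reject placements below the required fraction of matches
        if matches / len(seq) < min_match:
            continue
        if matches > best_matches:
            best_starts, best_matches = [start], matches
        elif matches == best_matches:
            best_starts.append(start)

    # convert 0-based offsets to 1-based reference positions
    return [(start + 1, best_matches) for start in best_starts]
```

With `min_match=1` only perfect placements survive, which also illustrates the note above: a strict threshold can leave the result list empty.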
-
mavis.bam.read.orientation_supports_type(read, event_type)
Checks if the orientation is compatible with the type of event.
Parameters:
- read (AlignedSegment) – a read from the pair
- event_type (SVTYPE) – the type of event to check
Returns:
- True – the read pair is in the correct orientation for this event type
- False – the read is not in the correct orientation
Return type: bool
-
mavis.bam.read.read_pair_type(read)
Assumptions based on Illumina pairs: only 4 possible combinations.
Parameters: read (AlignedSegment) – the input read
Returns: the type of input read pair
Return type: READ_PAIR_TYPE
Raises: NotImplementedError – for any read that does not fall into the four expected configurations (see below)
++++> <---- is LR same-strand
++++> ++++> is LL opposite
<---- <---- is RR opposite
<---- ++++> is RL same-strand
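The four configurations in the table can be reproduced from two strand flags and the relative order of the reads. In this sketch the function name `sketch_read_pair_type`, the string return values, and the `read_is_first` argument (whether the read lies before its mate on the reference) are assumptions for illustration. The real function takes a read object and returns a READ_PAIR_TYPE value.

```python
def sketch_read_pair_type(is_reverse, mate_is_reverse, read_is_first=True):
    """Hypothetical helper: classify a pair as "LR", "LL", "RR", or "RL".

    is_reverse: True if the read aligns to the reverse strand
    mate_is_reverse: True if its mate aligns to the reverse strand
    read_is_first: True if the read is positioned before its mate
    """
    if not read_is_first:
        # describe the pair in reference order, leftmost read first
        is_reverse, mate_is_reverse = mate_is_reverse, is_reverse

    # a forward read (++++>) is labelled L, a reverse read (<----) is labelled R
    first = "R" if is_reverse else "L"
    second = "R" if mate_is_reverse else "L"
    return first + second
```

Ordering the pair by reference position first means the same pair yields the same label whichever of its two reads is passed in.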
-
mavis.bam.read.sequenced_strand(read, strand_determining_read=2)
Determines the strand that was sequenced.
Parameters:
- read (AlignedSegment) – the read being used to determine the strand
- strand_determining_read (int) – which read in the read pair is the same as the sequenced strand
Returns: the strand that was sequenced
Raises: ValueError – if strand_determining_read is not 1 or 2
Warning
If the input pair is unstranded, the result will not be representative of the strand sequenced, since the assumed convention is not followed.
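The convention described above can be sketched with two flags: if the read is the strand-determining one, its aligned strand is the sequenced strand, otherwise the opposite strand is. The name `sketch_sequenced_strand` and the `"+"`/`"-"` return values are assumptions for this example, and the sketch omits any checks for unmapped or unpaired reads.

```python
def sketch_sequenced_strand(is_read1, is_reverse, strand_determining_read=2):
    """Hypothetical helper: infer the sequenced strand for a stranded pair.

    is_read1: True if this is the first read of the pair
    is_reverse: True if this read aligns to the reverse strand
    strand_determining_read: 1 or 2, the read that matches the sequenced strand
    """
    if strand_determining_read not in (1, 2):
        raise ValueError("strand_determining_read must be 1 or 2")

    read_number = 1 if is_read1 else 2
    aligned_strand = "-" if is_reverse else "+"

    if read_number == strand_determining_read:
        # this read has the same strand as the sequenced molecule
        return aligned_strand
    # the other read of the pair aligns to the opposite strand
    return "+" if aligned_strand == "-" else "-"
```

For example, with the default `strand_determining_read=2`, a reverse-aligned second read gives `"-"`, while a reverse-aligned first read gives `"+"`.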