blat module
In general, the coordinates in psl files are “zero-based, half-open.” The first base in a sequence is numbered
zero rather than one. When representing a range, the end coordinate is not included in the range. Thus the first
100 bases of a sequence are represented as 0-100, and the second 100 bases are represented as 100-200. The .psl
format has one other unusual feature, which concerns how coordinates are handled on the
negative strand. In the qStart/qEnd fields, the coordinates are where it matches from the point of view of the forward
strand (even when the match is on the reverse strand). However, in the qStarts[] list, the coordinates are reversed.
– http://wiki.bits.vib.be/index.php/Blat
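The coordinate conventions described above can be illustrated with a minimal sketch. The helper names `interval_length` and `block_to_forward` are hypothetical and are not part of mavis.blat; the negative-strand conversion assumes the usual PSL convention that a qStarts[] entry on the '-' strand is measured from the end of the query.

```python
def interval_length(start, end):
    """Length of a zero-based, half-open interval [start, end)."""
    return end - start


def block_to_forward(q_size, block_start, block_size, strand):
    """Map one qStarts[] block to forward-strand coordinates.

    Hypothetical helper: assumes that on the '-' strand the block start
    is counted from the end of the query sequence.
    Returns a zero-based, half-open (start, end) pair.
    """
    if strand == '-':
        return q_size - (block_start + block_size), q_size - block_start
    return block_start, block_start + block_size


# The first and second 100 bases of a sequence
first_100 = (0, 100)
second_100 = (100, 200)
```

With these definitions, a 100-base block starting at 0 on the negative strand of a 1000-base query maps to (900, 1000) on the forward strand.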
class mavis.blat.Blat
Bases: object
static millibad(row, is_protein=False, is_mrna=True)
Used in calculating percent identity. This is a direct translation of the Perl code at https://genome.ucsc.edu/FAQ/FAQblat.html#blat4
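As a rough illustration of the calculation, the sketch below follows the Perl routine published in the UCSC FAQ as I understand it. It is not the MAVIS implementation: the function names are hypothetical, the row is assumed to be a dict keyed by the standard PSL column names, and the protein size multiplier of 3 is an assumption.

```python
import math


def millibad_sketch(row, is_protein=False, is_mrna=True):
    """Hypothetical re-creation of the UCSC 'milliBad' value (parts per thousand)."""
    size_mul = 3 if is_protein else 1  # assumption: 3 for protein, 1 for DNA
    q_ali_size = size_mul * (row['qEnd'] - row['qStart'])
    t_ali_size = row['tEnd'] - row['tStart']
    if min(q_ali_size, t_ali_size) <= 0:
        return 0
    size_dif = q_ali_size - t_ali_size
    if size_dif < 0:
        # for mRNA, a longer target span (introns) is not penalized
        size_dif = 0 if is_mrna else -size_dif
    insert_factor = row['qNumInsert']
    if not is_mrna:
        insert_factor += row['tNumInsert']
    total = size_mul * (row['matches'] + row['repMatches'] + row['misMatches'])
    if total == 0:
        return 0
    # size_dif >= 0 here, so the log term is non-negative and rounds half up
    size_penalty = int(3 * math.log(1 + size_dif) + 0.5)
    return 1000 * (row['misMatches'] * size_mul + insert_factor + size_penalty) / total


def percent_identity_sketch(row, is_protein=False, is_mrna=True):
    """Percent identity derived from the milliBad value."""
    return 100.0 - millibad_sketch(row, is_protein, is_mrna) * 0.1


example_row = {
    'matches': 95, 'misMatches': 5, 'repMatches': 0,
    'qNumInsert': 0, 'tNumInsert': 0,
    'qStart': 0, 'qEnd': 100, 'tStart': 0, 'tEnd': 100,
}
```

For `example_row`, which has 5 mismatches in a gapless 100-base alignment, the sketch gives a milliBad of 50 and a percent identity of 95.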
static pslx_row_to_pysam(row, bam_cache, reference_genome)
Given a ‘row’ read from a pslx file, converts the row to a BlatAlignedSegment object.
Parameters:
static score(row, is_protein=False)
A direct translation of the UCSC guidelines on replicating the web BLAT score: https://genome.ucsc.edu/FAQ/FAQblat.html#blat4
Below are the lines from the Perl code that I’ve rewritten in Python (sizeMul = 1 for DNA):
my $sizeMul = pslIsProtein($blockCount, $strand, $tStart, $tEnd, $tSize, $tStarts, $blockSizes);
my $pslScore = $sizeMul * ($matches + ($repMatches >> 1)) - $sizeMul * $misMatches - $qNumInsert - $tNumInsert;
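The score formula quoted above can be sketched in Python as follows. This is an illustration only, not the MAVIS source: `psl_score_sketch` is a hypothetical name, the row is assumed to be a dict keyed by the standard PSL column names, and the size multiplier of 3 for protein alignments is an assumption (the text above only states that it is 1 for DNA).

```python
def psl_score_sketch(row, is_protein=False):
    """Hypothetical re-creation of the web BLAT score from a PSL row."""
    size_mul = 3 if is_protein else 1  # assumption: 3 for protein, 1 for DNA
    return (
        size_mul * (row['matches'] + (row['repMatches'] >> 1))  # repeats count half
        - size_mul * row['misMatches']
        - row['qNumInsert']
        - row['tNumInsert']
    )


score_row = {
    'matches': 90, 'repMatches': 5, 'misMatches': 3,
    'qNumInsert': 1, 'tNumInsert': 2,
}
```

For `score_row`, the DNA score is 90 + 2 - 3 - 1 - 2 = 86; the right shift halves `repMatches` with integer truncation, as in the Perl original.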
-
static