[Bioperl-l] StandAloneBlast or bl2seq quietly converts Ns to Ts

Sam Kalat sam.kalat at gmail.com
Mon May 9 15:40:02 EDT 2005


I'm not sure if this is a quirk of bl2seq or of BioPerl.  My task is
to compare sequences that came from the same trace file, but were
processed differently: with different basecallers, trimmers, screens,
and the like.  I take two sequences at a time that come from the same
source, and BLAST them against each other using StandAloneBlast with
bl2seq.  I noticed in testing that when I BLAST a sequence against
itself, the comparison frequently isn't perfect: the fraction of
identical bases might be somewhere in the 90s (percent).

On examination I see stuff like this (fake data shown):

Query 1: ctgactgannnnnnnctgatcgatcgtacgtacg
Sbjct 1: ctgactgatttttttctgatcgatcgtacgtacg

The query was supposed to be the same as the subject, but every N in
the input shows up as a T in the subject while staying an N in the
query, so the two don't match up perfectly.  I don't know why T was
chosen, but it is always T.
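
For illustration only, here is a minimal Python sketch of the check
described above.  It is not the BioPerl script in question, and the
function name compare_aligned is hypothetical.  It assumes two gap-free
aligned strings of equal length, such as the fake Query/Sbjct lines
shown, and reports the fraction of identical bases along with every
position where they differ.

```python
def compare_aligned(query, sbjct):
    """Return (fraction_identical, mismatches) for two aligned strings.

    mismatches is a list of (1-based position, query base, subject base).
    Assumes both strings have the same length and contain no gaps.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    if not query:
        raise ValueError("aligned strings must not be empty")

    mismatches = [
        (pos, q, s)
        for pos, (q, s) in enumerate(zip(query.lower(), sbjct.lower()), start=1)
        if q != s
    ]
    fraction_identical = 1 - len(mismatches) / len(query)
    return fraction_identical, mismatches


# The fake data from the example alignment.
query = "ctgactgannnnnnnctgatcgatcgtacgtacg"
sbjct = "ctgactgatttttttctgatcgatcgtacgtacg"

identity, mismatches = compare_aligned(query, sbjct)

# True only if every mismatch is an N in the query paired with a T in the subject.
only_n_to_t = all(q == "n" and s == "t" for _, q, s in mismatches)

print(f"fraction identical: {identity:.3f}")
print(f"mismatching positions: {[pos for pos, _, _ in mismatches]}")
print(f"all mismatches are N -> T: {only_n_to_t}")
```

Running the same check on real HSP strings would show whether every
mismatch in a self-comparison is an N paired with a T, or whether
something else is also going on.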

Does anyone know if this is intentional behavior?  Ultimately it means that
all Ns in sequences treated this way are mismatches.  It seems weird
to me because the sequence in question didn't have a string of Ts, and
now anything that does have a string of Ts will be more likely to
match.

Code is available on request, but it doesn't try to do anything out of
the ordinary, and it runs without errors.

Thanks in advance
Sam Kalat


