[Bioperl-l] Small word sizes with BLAST (WU, NCBI)
Andrew Walsh
paeruginosa at hotmail.com
Wed Mar 3 14:51:59 EST 2004
Hello,
My question is not really related to a specific Bioperl library, so I
apologize. If there is a specific 'BLAST' newsgroup, I will be happy to
post there. But I was hoping somebody on the Bioperl list had some
experience doing nucleic acid searches with small word sizes.
I would like to search for short (5-7 bp) matches between an oligo sequence
and a database of ~100,000 mRNAs. I've tried doing this with WU-BLAST and
NCBI-BLAST. NCBI-BLAST does not allow word sizes below 7, so I've tried
lots of different command line parameters for WU-BLAST.
I've tried these searches with versions 2.0a19 (alpha) and 2.0 of WU-BLAST.
I get quite strange results when I start lowering the word size below the
default (11). For example, with the alpha version, I get more hits with a
word size of 10 than I do with a word size of 7. With version 2.0, I
get the same number of hits with word sizes 10 and 7. I've checked this by
hand, and the 'missing' hits do in fact contain stretches of 7 contiguous
matching bp.
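To make that hand check concrete, here is a rough Python sketch of one way
it could be automated. It is not part of the BLAST runs above: the function
name shared_kmers and the two example sequences are made up for
illustration, and it only looks at the forward strand, so a real check would
also need to test the reverse complement of the oligo.

```python
def shared_kmers(oligo, target, k=7):
    """Return (oligo_pos, target_pos, kmer) for each exact k-mer shared
    by the two sequences. Positions are 0-based; forward strand only."""
    oligo, target = oligo.upper(), target.upper()

    # Index every k-mer of the oligo by its start position(s).
    index = {}
    for i in range(len(oligo) - k + 1):
        index.setdefault(oligo[i:i + k], []).append(i)

    hits = []
    # Slide a window of length k along the target and look each k-mer up.
    for j in range(len(target) - k + 1):
        kmer = target[j:j + k]
        for i in index.get(kmer, ()):
            hits.append((i, j, kmer))
    return hits


# Hypothetical sequences, chosen only to show the output format.
oligo = "ACGTACGGTCA"
target = "TTTACGGTCATTT"
for oligo_pos, target_pos, kmer in shared_kmers(oligo, target, k=7):
    print(oligo_pos, target_pos, kmer)
```

Any database sequence for which this returns at least one hit contains an
exact 7 bp match to the oligo, which is the property the 'missing' hits were
checked for.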
Here is an example of one of the command lines I've tried running:
blastn human_refseq.fasta seq3.fasta W=5 S=5 M=1 V=100000 B=100000
I've tried adjusting every parameter I thought would affect the search
results, but still cannot recover the 'missing' hits.
Maybe BLAST is the wrong tool for this. I'd just like something that's
fast. If anyone has some advice, it would be greatly appreciated.
Thanks a lot,
Andrew