[Bioperl-l] BLAST parameters

P B itatsumaki@hotmail.com
Fri, 09 Aug 2002 17:13:44 +0000


Hi all, a newbie question I think.

I haven't used bioperl before, so some of these questions might be a little 
dumb, so flame away where needed.  Let me first give the goal, in case I'm 
missing something conceptual here:

Goal:
I have a long list of sequences (15,000) that I would like to identify.  In 
particular, I want to find out what (rat) cluster they most likely 
represent.

Approach:
- submit genes one by one to remote BLAST.  It's a lot of BLASTing, so I'm 
waiting 60 seconds between submissions (I do realize this will take about 
10 days, btw, and I don't have access to a local BLAST)
- retrieve the BLAST results and parse out the top ten hits by e-value or 
bit-score (I haven't decided which; is there a reason to prefer expectation 
values to the normalized bit-scores?)
- for each of the top 10 hits, parse out the genbank accession
- use this accession to determine the corresponding cluster (I expect I will 
have to download the unigene .dat file to do this)
- if I can assign a conclusive identity to the sequence, great; if not, store 
the results for future analysis
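
For the accession-to-cluster step, this is roughly what I have in mind, 
sketched in Python rather than Perl just to pin down the logic.  I'm assuming 
the unigene .dat file is made of records that start with an "ID" line, list 
their member sequences on "SEQUENCE" lines carrying an "ACC=" field, and end 
with "//".  I haven't verified that layout, and the function name and the 
accessions in the sample are made up.

```python
from typing import Dict, Iterable


def build_accession_index(lines: Iterable[str]) -> Dict[str, str]:
    """Map GenBank accession -> cluster ID.

    Assumed (unverified) record layout:
        ID          Rn.1
        SEQUENCE    ACC=AA000001; NID=g1
        //
    """
    index: Dict[str, str] = {}
    cluster_id = None
    for raw in lines:
        parts = raw.split(None, 1)
        if not parts:
            continue  # blank line
        tag = parts[0]
        if tag == "ID" and len(parts) > 1:
            cluster_id = parts[1].strip()
        elif tag == "//":
            cluster_id = None  # end of record
        elif tag == "SEQUENCE" and cluster_id is not None and len(parts) > 1:
            for field in parts[1].split(";"):
                field = field.strip()
                if field.startswith("ACC="):
                    # Drop a trailing ".version" so lookups use the bare accession.
                    accession = field[len("ACC="):].split(".")[0]
                    index[accession] = cluster_id
    return index


# Hypothetical sample data, only to show the expected shape of the input.
SAMPLE = """\
ID          Rn.1
TITLE       example cluster one
SEQUENCE    ACC=AA000001; NID=g1; CLONE=x
SEQUENCE    ACC=AA000002.1; NID=g2
//
ID          Rn.2
SEQUENCE    ACC=AA000003; NID=g3
//
""".splitlines()

index = build_accession_index(SAMPLE)
print(index.get("AA000002"))
```

With an index like this built once, each of the top 10 accessions becomes a 
single dictionary lookup, and a hit whose accession is not in any cluster 
simply comes back as nothing.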

I hope to be able to automatically identify 70-80% of the sequences using 
selection criteria like:
2 top hits for same cluster
3 of the top 5 hits for same cluster
6 of the top 10 hits for same cluster
or something similar.  The assignments don't have to be perfect, just 
reasonably close.
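
To make those criteria concrete, here is a rough sketch of the voting rule, 
again in Python just to show the logic.  It assumes the hits have already 
been sorted best-first and translated to cluster IDs, with None for a hit 
that maps to no cluster; the function name and the rule table are my own 
placeholders, and the thresholds are just the ones listed above.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# (look at the top N hits, require at least K of them in the same cluster)
RULES: List[Tuple[int, int]] = [(2, 2), (5, 3), (10, 6)]


def assign_cluster(
    ranked_clusters: Sequence[Optional[str]],
    rules: Sequence[Tuple[int, int]] = RULES,
) -> Optional[str]:
    """Return the cluster picked by the first rule that is satisfied, else None."""
    for top_n, needed in rules:
        # Hits with no known cluster still occupy a rank but cast no vote.
        window = [c for c in ranked_clusters[:top_n] if c is not None]
        if not window:
            continue
        cluster, count = Counter(window).most_common(1)[0]
        if count >= needed:
            return cluster
    return None  # inconclusive: keep the raw hits for later analysis


hits = ["Rn.1", "Rn.2", "Rn.2", "Rn.2", "Rn.3"]
print(assign_cluster(hits))
```

Because every threshold is more than half of its window, two clusters can 
never both satisfy the same rule, so there are no ties to break.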

Now, my (first) two problems involve submitting the BLAST to NCBI.  I'm 
doing a test case with a 3-sequence FASTA file, btw.  What I would like is 
to restrict my BLAST searches to "Rattus norvegicus" as you can on the NCBI 
web-site under advanced options.

In addition, I would like to be able to submit customized nucleotide 
substitution matrices to use with the BLAST.

The latter point isn't as critical, but I really would like to avoid getting 
back a pile of BLAST hits and then having to filter out the non-rat ones, if 
possible.

The RemoteBlast module accepts an @params array in its ->new() method, but I 
don't know what the parameters I would like to use are called.

Any comments, suggestions, ideas are very much welcome.
Thanks in advance!
Tats
