[Bioperl-l] BLAST parameters

Brian Osborne brian_osborne@cognia.com
Fri, 9 Aug 2002 13:35:18 -0400


Tats,

I just added this to bptutorial.pl; you might find it useful:

You may want to change a parameter of the remote job; this example
shows how to change the substitution matrix:

$Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM25';

For a description of the many CGI parameters, see:

http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
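As I understand it, the entries of %HEADER end up as CGI name/value pairs sent to NCBI. The sketch below needs only core Perl and never contacts NCBI; it just shows how such a hash becomes a URL-encoded query string. The helpers uri_encode and build_query are hypothetical (not part of Bioperl), and the parameter names PROGRAM, DATABASE and ENTREZ_QUERY, plus the "Rattus norvegicus[ORGN]" organism syntax, are assumptions to check against the urlapi page above.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes plain ASCII input.
sub uri_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: join a hash of CGI parameters into a query string.
# Keys are sorted so the output is deterministic.
sub build_query {
    my (%params) = @_;
    return join '&',
        map { uri_encode($_) . '=' . uri_encode($params{$_}) }
        sort keys %params;
}

my %header = (
    CMD          => 'Put',
    PROGRAM      => 'blastn',
    DATABASE     => 'nr',
    ENTREZ_QUERY => 'Rattus norvegicus[ORGN]',   # assumed parameter name
);

print build_query(%header), "\n";
# prints: CMD=Put&DATABASE=nr&ENTREZ_QUERY=Rattus%20norvegicus%5BORGN%5D&PROGRAM=blastn
```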


Brian O.


-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org] On
Behalf Of P B
Sent: Friday, August 09, 2002 1:14 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] BLAST parameters

Hi all, this is a newbie question, I think.

I haven't used Bioperl before, so some of these questions might be a little
dumb; flame away where needed.  Let me first give the goal, in case I'm
missing something conceptual here:

Goal:
I have a long list of sequences (15,000) that I would like to identify.  In
particular, I want to find out which rat UniGene cluster each one most
likely represents.

Approach:
- submit the sequences one by one to remote BLAST (it's a lot of BLASTing, so
I'm waiting 60 seconds between submissions; I do realize this will take about
10 days, and I don't have access to a local BLAST)
- retrieve the BLAST results and parse out the top ten hits by e-value or
bit-score (I'm undecided: is there a reason to prefer expectation values over
the normalized bit-scores?)
- for each of the top 10 hits, parse out the GenBank accession
- use this accession to determine the corresponding cluster (I expect I will
have to download the UniGene .dat file to do this)
- if I can assign a conclusive identity to the sequence, great; if not, store
the results for future analysis
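The hit-to-cluster steps above can be sketched in core Perl without network access. This assumes the UniGene flat file groups each cluster into a record that starts with an "ID" line, lists member sequences on "SEQUENCE ACC=...;" lines and ends with "//"; check that against the real file. The hit hashrefs (with accession and evalue fields) stand in for whatever the BLAST parser returns, and read_unigene and top_clusters are hypothetical names.

```perl
use strict;
use warnings;

# Build an accession => cluster ID map from UniGene-style records.
# Assumed format: "ID  Rn.1234", "SEQUENCE  ACC=X12345.1; ...", "//".
sub read_unigene {
    my ($fh) = @_;
    my (%cluster_of, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^ID\s+(\S+)/) {
            $current = $1;
        } elsif (defined $current && $line =~ /^SEQUENCE\s+ACC=([^.;\s]+)/) {
            $cluster_of{$1} = $current;    # version suffix (.1) dropped
        } elsif ($line =~ m{^//}) {
            undef $current;                # end of record
        }
    }
    return \%cluster_of;
}

# Sort hits by ascending e-value, keep the first $n and look up each
# accession's cluster; accessions missing from the map give undef.
sub top_clusters {
    my ($hits, $cluster_of, $n) = @_;
    my @sorted = sort { $a->{evalue} <=> $b->{evalue} } @$hits;
    splice(@sorted, $n) if @sorted > $n;
    my @clusters;
    for my $hit (@sorted) {
        (my $acc = $hit->{accession}) =~ s/\.\d+$//;    # drop version suffix
        push @clusters, $cluster_of->{$acc};
    }
    return @clusters;
}
```

Sorting by bit-score instead would just mean comparing a bits field in descending order.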

I hope to be able to automatically identify 70-80% of the sequences using
selection criteria like:
- the top 2 hits in the same cluster
- 3 of the top 5 hits in the same cluster
- 6 of the top 10 hits in the same cluster
or something similar.  The assignments don't have to be perfect, just
reasonably close.
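The selection criteria amount to a small voting rule over the best-first list of cluster IDs. This sketch reads the first criterion as "the first two hits share a cluster" and "k of the top n" as "some cluster occurs at least k times among the first n hits"; assign_cluster is a hypothetical name, and hits with no known cluster (undef) never count as votes.

```perl
use strict;
use warnings;

# Rules as [n, k]: some cluster must appear at least k times in the top n.
# [2, 2] means the two best hits share a cluster.
my @RULES = ([2, 2], [5, 3], [10, 6]);

# Return the cluster chosen by the first rule that fires, or undef.
sub assign_cluster {
    my (@clusters) = @_;    # cluster IDs ordered best hit first
    for my $rule (@RULES) {
        my ($n, $k) = @$rule;
        my $last = $n < @clusters ? $n - 1 : $#clusters;
        my %count;
        $count{$_}++ for grep { defined } @clusters[0 .. $last];
        # Check the most frequent cluster first (ties broken by name).
        for my $c (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
            return $c if $count{$c} >= $k;
        }
    }
    return;    # no rule fired: keep the results for later analysis
}
```

Sequences for which assign_cluster returns undef are the ones to store for manual follow-up.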

Now, my (first) two problems involve submitting the BLAST searches to NCBI.
I'm doing a test case with a 3-sequence FASTA file, btw.  What I would like
is to restrict my BLAST searches to "Rattus norvegicus", as you can on the
NCBI website under advanced options.

In addition, I would like to be able to submit customized nucleotide
substitution matrices to use with the BLAST.

The latter point isn't as critical, but I would really like to avoid getting
back a pile of BLAST hits and having to filter out the non-rat ones, if
possible.
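If the search cannot be restricted on the NCBI side, a fallback is to drop non-rat hits after the fact. This sketch assumes each hit carries its definition line in a description field and that rat entries mention "Rattus norvegicus" there; that is only a heuristic, since definition lines often but not always name the organism. filter_by_organism is a hypothetical helper.

```perl
use strict;
use warnings;

# Keep only hits whose description mentions the organism
# (case-insensitive literal match; hits without a description are dropped).
sub filter_by_organism {
    my ($organism, @hits) = @_;
    my $re = qr/\Q$organism\E/i;
    return grep { defined $_->{description} && $_->{description} =~ $re } @hits;
}
```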

The RemoteBlast module accepts a @params array in its ->new() method,
but I don't know what to call the parameters I would like to use.
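Bioperl constructors take a flat list of dash-prefixed name/value pairs, so @params is just such a list. The sketch below uses a hypothetical Demo::Factory class (not Bioperl code) to show how that list is unpacked. The names -prog, -data and -expect are the ones I believe the RemoteBlast documentation uses, but check them against the module's POD for your version.

```perl
use strict;
use warnings;

package Demo::Factory;    # hypothetical stand-in, not Bio::Tools::Run::RemoteBlast

sub new {
    my ($class, @args) = @_;
    die "odd number of arguments\n" if @args % 2;
    my %opt = @args;
    my %self;
    # Normalise keys: lower-case the name and strip the leading dash.
    while (my ($k, $v) = each %opt) {
        (my $name = lc $k) =~ s/^-//;
        $self{$name} = $v;
    }
    return bless \%self, $class;
}

package main;

# "-prog => ..." is the usual Perl idiom: => quotes prog, unary minus adds "-".
my @params = (
    -prog   => 'blastn',
    -data   => 'nr',
    -expect => '1e-10',
);
my $factory = Demo::Factory->new(@params);
print "$_ = $factory->{$_}\n" for sort keys %$factory;
```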

Any comments, suggestions, ideas are very much welcome.
Thanks in advance!
Tats


_______________________________________________
Bioperl-l mailing list
Bioperl-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-l