[Bioperl-l] BLAST parameters

P B itatsumaki@hotmail.com
Fri, 09 Aug 2002 19:27:42 +0000


Hi Brian,

Thanks!  It will be great to know how to change parameters like this.  My
one question is: what is that line of code,
$Bio::...HEADER{'MATRIX_NAME'} = 'BLOSUM25', actually doing?  Is HEADER a
hash in the RemoteBlast namespace?  It's not a crucial point, but I like
knowing what's actually going on as much as possible.
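
My guess, and it is only a guess until someone confirms it, is that HEADER
is an ordinary package variable that the module reads when it builds the
request.  Here is a minimal sketch of that mechanism in plain Perl.
My::RemoteJob, its EXPECT key, and query_string() are made up for
illustration; none of this is the actual RemoteBlast source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module that keeps its request parameters
# in a package-level hash (an assumption for illustration only).
package My::RemoteJob;

our %HEADER = (
    MATRIX_NAME => 'BLOSUM62',
    EXPECT      => '1e-10',
);

# Build a query string from whatever %HEADER holds at call time.
sub query_string {
    return join '&', map { "$_=$HEADER{$_}" } sort keys %HEADER;
}

package main;

# A fully qualified name reaches the other package's hash directly,
# so this assignment changes what query_string() sees afterwards.
$My::RemoteJob::HEADER{'MATRIX_NAME'} = 'BLOSUM25';

print My::RemoteJob::query_string(), "\n";
# prints: EXPECT=1e-10&MATRIX_NAME=BLOSUM25
```

If that is what is going on, the assignment changes a setting shared by
every object of the class rather than a setting on one object.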

Thanks again,
Tats

>From: "Brian Osborne" <brian_osborne@cognia.com>
>To: "P B" <itatsumaki@hotmail.com>, <bioperl-l@bioperl.org>
>Subject: RE: [Bioperl-l] BLAST parameters
>Date: Fri, 9 Aug 2002 13:35:18 -0400
>
>Tats,
>
>I just added this to bptutorial.pl, you might find it useful:
>
>You may want to change some parameter of the remote job and this example
>shows how to change the matrix:
>
>$Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM25';
>
>For a description of the many CGI parameters see:
>
>http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
>
>
>Brian O.
>
>
>-----Original Message-----
>From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
>Behalf Of P B
>Sent: Friday, August 09, 2002 1:14 PM
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] BLAST parameters
>
>Hi all, a newbie question I think.
>
>I haven't used bioperl before, so some of these questions might be a little
>dumb, so flame away where needed.  Let me first give the goal, in case I'm
>missing something conceptual here:
>
>Goal:
>I have a long list of sequences (15,000) that I would like to identify.  In
>particular, I want to find out what (rat) cluster they most likely
>represent.
>
>Approach:
>- submit genes one by one to remote BLAST (it's a lot of BLASTing, so I'm
>waiting 60 seconds between submissions; I do realize this will take about
>10 days, btw, and I don't have access to a local BLAST)
>- retrieve the BLAST results and parse out the top ten hits by e-value or
>bit-score (undecided if there is a reason to prefer expectation values to
>the normalized bit-scores?)
>- for each of the top 10 hits, parse out the genbank accession
>- use this accession to determine the corresponding cluster (I expect I
>will have to download the unigene .dat file to do this)
>- if I can assign a conclusive identity to the sequence, great; if not,
>store the results for future analysis
>
>I hope to be able to automatically identify 70-80% of the sequences using
>selection criteria like:
>2 top hits for same cluster
>3 of the top 5 hits for same cluster
>6 of the top 10 hits for same cluster
>or something similar.  The assignations don't have to be perfect, just
>reasonably close.
>
>Now, my (first) two problems involve submitting the BLAST to NCBI.  I'm
>doing a test case with a 3-sequence FASTA file, btw.  What I would like is
>to restrict my BLAST searches to "Rattus norvegicus" as you can on the NCBI
>web-site under advanced options.
>
>In addition, I would like to be able to submit customized nucleotide
>substitution matrices to use with the BLAST.
>
>That latter point isn't as critical, but I really would like to avoid
>having to get back a pile of BLAST hits and have to filter through non-rat
>hits if possible.
>
>The RemoteBlast module accepts an @params array to its ->new() method,
>but I don't know what to call these parameters that I would like to use.
>
>Any comments, suggestions, ideas are very much welcome.
>Thanks in advance!
>Tats
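
To make the selection criteria quoted above concrete, here is a rough
sketch of how I picture the cluster assignment.  It assumes the hits have
already been mapped to cluster IDs and ranked best first, and it reads "2
top hits" as "both of the top two hits".  assign_cluster() and the Rn.*
IDs are made-up names for illustration, not anything from bioperl.

```perl
use strict;
use warnings;

# Return the cluster ID if the ranked hits satisfy one of the rules,
# or nothing (undef in scalar context) if the result is inconclusive.
# $clusters is an array ref of cluster IDs, best hit first; an entry
# may be undef when a hit maps to no cluster.
sub assign_cluster {
    my ($clusters) = @_;

    # [needed, window]: at least "needed" of the top "window" hits agree.
    my @rules = ( [ 2, 2 ], [ 3, 5 ], [ 6, 10 ] );

    for my $rule (@rules) {
        my ( $needed, $window ) = @$rule;

        # Do not read past the end of a short hit list.
        my $last = $window < @$clusters ? $window - 1 : $#$clusters;

        my %count;
        for my $id ( @{$clusters}[ 0 .. $last ] ) {
            next unless defined $id;
            $count{$id}++;
        }

        # Each rule needs a majority of its window, so at most one
        # cluster can satisfy it.
        for my $id ( sort keys %count ) {
            return $id if $count{$id} >= $needed;
        }
    }
    return;
}

my @hits    = qw(Rn.100 Rn.200 Rn.100 Rn.300 Rn.100);
my $cluster = assign_cluster( \@hits );
print defined $cluster ? $cluster : 'inconclusive', "\n";
# prints: Rn.100  (3 of the top 5 hits agree)
```

Anything that comes back inconclusive would be stored for later analysis,
as in the last step of the approach.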

