[Bioperl-l] cdd-search with remoteblast?

Chris Fields cjfields1 at gmail.com
Fri Jul 10 18:04:43 UTC 2009


Malcolm,

Nice!  Go ahead and add the test in; we can look at trying to get  
CDD_SEARCH working at some point but this is a nice workaround.

chris

On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:

> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates  
> the following.  Is it appropriate to submit it?
>
> Jonas, OK, I was a little quick on the gun... but I've got it now.
>
> You don't need to change the wrapper.  Here is what you need to do:
>
> # 1) set your database like this:
>
> -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html 
>  for other cdd database options
>
> # 2) add this line before submitting the job:
> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
>
> You're in - No other changes needed.
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
>> Sent: Friday, July 10, 2009 4:18 AM
>> To: BioPerl List; Cook, Malcolm; Chris Fields
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Hi,
>> I tried to do what Malcom proposed my ($prog = 'rpsblast';
>> my $db   =
>> 'CDD';)  but that didn't work.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Value rpsblast for PUT parameter PROGRAM does not match
>> expression t?blast[ pnx]. Rejecting.
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>> C:/Perl/site/lib/Bio/Tools
>> /Run/RemoteBlast.pm:329
>> STACK: Bio::Tools::Run::RemoteBlast::new
>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
>> ast.pm:257
>> STACK: blast_a_seq2.pm:14
>> -----------------------------------------------------------
>> So I should try to "change the wrapper to allow 'rpsblast'",
>> right? Could You tell me how to do that, please? So sorry but
>> I have no idea yet...:) If that doesn't work, is there any
>> other way to run cdd-searches with perl?
>> Thank you so much!
>> Regards, Jonas
>>
>> ----- Original Message -----
>> From: "Chris Fields" <cjfields1 at gmail.com>
>> To: "Cook, Malcolm" <MEC at stowers.org>
>> Cc: "'Jonas Schaer'" <Brotelzwieb at gmx.de>; "'BioPerl List'"
>> <bioperl-l at lists.open-bio.org>; "'Smithies, Russell'"
>> <Russell.Smithies at agresearch.co.nz>; <cjfields at bioperl.org>
>> Sent: Thursday, July 09, 2009 9:19 PM
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>>
>>> I've scheduled this tentatively for the 1.6 release series (just not
>>> sure when yet).  It may work as is, but I haven't tried it out yet
>>> (and am hazarding to guess it only retrieves the single main RID at
>>> the moment).
>>>
>>> chris
>>>
>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>>>
>>>> Jonas,
>>>>
>>>> If you want to continue to use the bioperl remoteblast interface,
>>>> probably what you should do is simply call it twice.
>>>>
>>>> Once, as you already know how to do, which will return without CDD
>>>> results.
>>>>
>>>> Secondly, to get the CDD results, call remoteblast a second time.
>>>> This time, using
>>>> -database => 'CDD'
>>>> -program => 'rpsblast'
>>>>
>>>> However, the wrapper may object to the 'rpsblast' program.  It is
>>>> not listed in the POD -
>>>>
>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
>> emoteBlast.pm)
>>>>   If so, my guess is that changing the perl wrapper to allow
>>>> rpsblast will "just work" (tm).  I've cc:ed
>> cjfields at bioperl.org for
>>>> his opinion on this.
>>>>
>>>> Also, you might want to perform the CDD search first, especially if
>>>> you are streaming results to eyeball that might like something to
>>>> look at while the second (presumably longer) search is running.
>>>>
>>>> Cheers,
>>>>
>>>> Malcolm Cook
>>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Jonas Schaer
>>>>> Sent: Thursday, July 09, 2009 5:16 AM
>>>>> To: BioPerl List; Smithies, Russell
>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>>
>>>>> Hi guys,
>>>>> Thank you all so much for your help and patience :). Of
>>>>> course you were right and I finaly found the right
>>>>> put-parameter to get exactly the same hits as on the homepage.
>>>>> I do have an other question though :)...
>>>>> I now want to include a search for conserved domains, but
>>>>> when I try to use the CDD_SEARCH-parameter
>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
>>>>> sub:CDD_SEARCH)
>>>>> like the other put-parameters the way chris once told
>>>>> me(works fine with the other params):
>>>>>
>>>>> my %put = (
>>>>>   WORD_SIZE => 3,
>>>>>   HITLIST_SIZE => 100,
>>>>>   THRESHOLD => 11,
>>>>>   FILTER => 'R',
>>>>>   GENETIC_CODE => 1,
>>>>>   CDD_SEARCH => 'on'
>>>>> ###I tried it
>>>>> with 'true' and '1', too.
>>>>>
>>>>> );
>>>>>
>>>>> for my $putName (keys %put) {
>>>>>   $factory->submit_parameter($putName,$put{$putName});
>>>>> }
>>>>>
>>>>>
>>>>> ...an exception is thrown:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw
>> C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>>> C:/Perl/site/lib/Bio/Tools
>>>>> /Run/RemoteBlast.pm:325
>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>>>> STACK: main::blast_it firsteval0.8.pm:288
>>>>> STACK: firsteval0.8.pm:35
>>>>> ----------------------------------------------------------- .
>>>>> I guess somehow this could be the solution to my problem:
>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
>>>>> ub:RID-for-Simultaneous
>>>>> , but unfortunately I don't understand what to do.
>>>>> I'm so sorry to bother you with this but please help me once
>>>>> more...:)
>>>>>
>>>>> Best regards and thanks in advance,
>>>>> Jonas
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
>>>>> To: "'Jonas Schaer'" <Brotelzwieb at gmx.de>
>>>>> Cc: "'Chris Fields'" <cjfields at illinois.edu>; "'BioPerl List'"
>>>>> <bioperl-l at lists.open-bio.org>
>>>>> Sent: Monday, July 06, 2009 10:56 PM
>>>>> Subject: RE: [Bioperl-l] different results with
>> remote-blast skript
>>>>>
>>>>>
>>>>> Hi Jonas,
>>>>> You can't just play with the BLAST parameters and hope
>> for a "better"
>>>>> result.
>>>>> I'd suggest that if you aren't sure what they do, you should
>>>>> leave them
>>>>> alone as small changes can make huge differences in the
>>>>> output - it's quite
>>>>> possible to miss finding what you're looking for by using
>> the wrong
>>>>> parameters.
>>>>> If all else fails, read the blast manual:
>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
>>>>> _all.html
>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>>>>> Or Read Ian Korfs' excellent book:
>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>>>>
>>>>> Don't worry about the integer overflow bug as there's nothing
>>>>> you can do
>>>>> about it. If you're interested, Google and Wikipedia are your
>>>>> friends:
>>>>> http://en.wikipedia.org/wiki/Integer_overflow
>>>>>
>>>>>
>>>>> Russell
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>>>>> To: BioPerl List; Chris Fields
>>>>>> Subject: Re: [Bioperl-l] different results with
>> remote-blast skript
>>>>>>
>>>>>> Hi guys, thanks for your answers so far.
>>>>>> @jason: integer overflow in blast.... sorry, but what do
>>>>> you mean by that?
>>>>>> how can I fix it...?
>>>>>>
>>>>>> Since I never really changed any parameters I thought them
>>>>> all to be
>>>>>> default.
>>>>>> whatever, I tried to get "better" results with my prog
>> by changing
>>>>>> these:
>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>
>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
>>>>> STICS'} =
>>>>>> '1';
>>>>>> with no effect...I guess these were default values anyway.
>>>>>>
>>>>>> So please maybe you can tell me all the other parameters I
>>>>> can change with
>>>>>> my
>>>>>> perl-skript AND how to do that?
>>>>>> Unfortunately both, perl and the blast-algorithm are pretty
>>>>> much new to
>>>>>> me,
>>>>>> maybe thats why I just cannot find out how to do that on my
>>>>> own... :/
>>>>>>
>>>>>> Here is the output I get with my remote-blast skript:
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>> ###################################
>>>>>> Query Name:
>>>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>>>>>> L
>>>>>>       hit name is ref|XP_001702807.1|
>>>>>>               score is 442
>>>>>> BLASTP 2.2.21+
>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
>>>>> A. Schaffer,
>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>> Lipman (1997),
>>>>>> "Gapped
>>>>>> BLAST and PSI-BLAST: a new generation of protein database search
>>>>>> programs",
>>>>>> Nucleic Acids Res. 25:3389-3402.
>>>>>>
>>>>>>
>>>>>> Reference for composition-based statistics: Alejandro A.
>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
>>>>> John L. Spouge,
>>>>>> Yuri
>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
>>>>> "Improving the
>>>>>> accuracy of PSI-BLAST protein database searches with
>>>>> composition-based
>>>>>> statistics and other refinements", Nucleic Acids Res.
>> 29:2994-3005.
>>>>>>
>>>>>>
>>>>>> RID: 53STX5G2013
>>>>>>
>>>>>>
>>>>>> Database: All non-redundant GenBank CDS
>>>>>> translations+PDB+SwissProt+PIR+PRF excluding
>> environmental samples
>>>>>> from WGS projects
>>>>>>          9,252,587 sequences; 3,169,972,781 total letters Query=
>>>>>>
>>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>>>>>>
>>>>>
>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
>>>>>> ATGPDPDDEYE
>>>>>> Length=150
>>>>>>
>>>>>>
>>>>>>
>>>>>      Score
>>>>>> E
>>>>>> Sequences producing significant alignments:
>>>>>     (Bits)
>>>>>> Value
>>>>>>
>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>> reinhard...   174
>>>>>> 2e-42
>>>>>>
>>>>>>
>>>>>> ALIGNMENTS
>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
>> reinhardtii]
>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>>>>> Length=303
>>>>>>
>>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
>>>>> Composition-based
>>>>>> stats.
>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
>>>>> Gaps = 0/150
>>>>>> (0%)
>>>>>>
>>>>>> Query  1
>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
>>>>>> 60
>>>>>>
>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>> Sbjct  154
>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>> 213
>>>>>>
>>>>>> Query  61
>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>> 120
>>>>>>
>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>> Sbjct  214
>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>> 273
>>>>>>
>>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>>>>>           AWHERDDNAFRQAHQNTAMATGPDPDDEYE
>>>>>> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>>>>>
>>>>>>
>>>>>>
>>>>>> Database: All non-redundant GenBank CDS
>>>>>> translations+PDB+SwissProt+PIR+PRF
>>>>>> excluding environmental samples from WGS projects
>>>>>>   Posted date:  Jul 5, 2009  4:41 AM
>>>>>> Number of letters in database: -1,124,994,511
>>>>>> Number of sequences in database:  9,252,587
>>>>>>
>>>>>> Lambda     K      H
>>>>>>  0.309    0.122    0.345
>>>>>> Gapped
>>>>>> Lambda     K      H
>>>>>>  0.267   0.0410    0.140
>>>>>> Matrix: BLOSUM62
>>>>>> Gap Penalties: Existence: 11, Extension: 1
>>>>>> Number of Sequences: 9252587
>>>>>> Number of Hits to DB: 60273703
>>>>>> Number of extensions: 1448367
>>>>>> Number of successful extensions: 2103
>>>>>> Number of sequences better than 10: 0
>>>>>> Number of HSP's better than 10 without gapping: 0
>>>>>> Number of HSP's gapped: 2113
>>>>>> Number of HSP's successfully gapped: 0
>>>>>> Length of query: 150
>>>>>> Length of database: 3169972781
>>>>>> Length adjustment: 113
>>>>>> Effective length of query: 37
>>>>>> Effective length of database: 2124430450
>>>>>> Effective search space: 78603926650
>>>>>> Effective search space used: 78603926650
>>>>>> T: 11
>>>>>> A: 40
>>>>>> X1: 16 (7.1 bits)
>>>>>> X2: 38 (14.6 bits)
>>>>>> X3: 64 (24.7 bits)
>>>>>> S1: 42 (20.8 bits)
>>>>>> S2: 74 (33.1 bits)
>>>>>>
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>> ###################################
>>>>>> and here are the hits (?) of the blast-algorithm on the
>>>>> ncbi-homepage with
>>>>>> the same query of course:
>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>> reinhard...   300
>>>>>> 3e-80
>>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
>>>>> [Acyrtho...  36.2
>>>>>> 1.1
>>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
>>>>> [Blautia...  35.4
>>>>>> 1.8
>>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
>>>>> brazil...  34.3
>>>>>> 4.2
>>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
>>>>> [Aspergillus n...  33.5
>>>>>> 6.0
>>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
>>>>> [Methyloba...  33.5
>>>>>> 7.0
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>> ###################################at
>>>>>> least the first hit is the same, but even there there is a
>>>>> different score
>>>>>> and e-value.
>>>>>>
>>>>>> thanks so much for any help :)
>>>>>> regards, jonas
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Chris Fields" <cjfields at illinois.edu>
>>>>>> To: "Jason Stajich" <jason at bioperl.org>
>>>>>> Cc: "Smithies, Russell"
>>>>> <Russell.Smithies at agresearch.co.nz>; "'BioPerl
>>>>>> List'" <bioperl-l at lists.open-bio.org>; "'Jonas Schaer'"
>>>>>> <Jonas_Schaer at gmx.de>
>>>>>> Sent: Monday, July 06, 2009 12:51 AM
>>>>>> Subject: Re: [Bioperl-l] different results with
>> remote-blast skript
>>>>>>
>>>>>>
>>>>>>> That inspires confidence ;>
>>>>>>>
>>>>>>> chris
>>>>>>>
>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>>>>
>>>>>>>> integer overflow in blast....
>>>>>>>>
>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>>>>
>>>>>>>>> I'd guess it's a difference in the parameters used.
>>>>>>>>> Interesting that both have the number of letters in the db as
>>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
>>>>>>>>>
>>>>>>>>> Stats from your remote_blast:
>>>>>>>>>
>>>>>>>>> 'stats' => {
>>>>>>>>>          'S1' => '42',
>>>>>>>>>          'S1_bits' => '20.8',
>>>>>>>>>          'lambda' => '0.309',
>>>>>>>>>          'entropy' => '0.345',
>>>>>>>>>          'kappa_gapped' => '0.0410',
>>>>>>>>>          'T' => '11',
>>>>>>>>>          'kappa' => '0.122',
>>>>>>>>>          'X3_bits' => '24.7',
>>>>>>>>>          'X1' => '16',
>>>>>>>>>          'lambda_gapped' => '0.267',
>>>>>>>>>          'X2' => '38',
>>>>>>>>>          'S2' => '74',
>>>>>>>>>          'seqs_better_than_cutoff' => '0',
>>>>>>>>>          'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>>>>>>>          'Hits_to_DB' => '60102303',
>>>>>>>>>          'dbletters' => '-1125070205',
>>>>>>>>>          'A' => '40',
>>>>>>>>>          'num_successful_extensions' => '2004',
>>>>>>>>>          'num_extensions' => '1436892',
>>>>>>>>>          'X1_bits' => '7.1',
>>>>>>>>>          'X3' => '64',
>>>>>>>>>          'entropy_gapped' => '0.140',
>>>>>>>>>          'dbentries' => '9252258',
>>>>>>>>>          'X2_bits' => '14.6',
>>>>>>>>>          'S2_bits' => '33.1'
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>>>>
>>>>>>>>> Database: All non-redundant GenBank CDS
>>>>> translations+PDB+SwissProt
>>>>>>>>> +PIR+PRF
>>>>>>>>> excluding environmental samples from WGS projects
>>>>>>>>> Posted date:  Jul 4, 2009  4:41 AM
>>>>>>>>> Number of letters in database: -1,125,070,205
>>>>>>>>> Number of sequences in database:  9,252,258
>>>>>>>>>
>>>>>>>>> Lambda     K      H
>>>>>>>>> 0.309    0.124    0.340
>>>>>>>>> Gapped
>>>>>>>>> Lambda     K      H
>>>>>>>>> 0.267   0.0410    0.140
>>>>>>>>> Matrix: BLOSUM62
>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1
>>>>>>>>> Number of Sequences: 9252258
>>>>>>>>> Number of Hits to DB: 86493230
>>>>>>>>> Number of extensions: 3101413
>>>>>>>>> Number of successful extensions: 9001
>>>>>>>>> Number of sequences better than 100: 65
>>>>>>>>> Number of HSP's better than 100 without gapping: 0
>>>>>>>>> Number of HSP's gapped: 9000
>>>>>>>>> Number of HSP's successfully gapped: 66
>>>>>>>>> Length of query: 150
>>>>>>>>> Length of database: 3169897087
>>>>>>>>> Length adjustment: 113
>>>>>>>>> Effective length of query: 37
>>>>>>>>> Effective length of database: 2124391933
>>>>>>>>> Effective search space: 78602501521
>>>>>>>>> Effective search space used: 78602501521
>>>>>>>>> T: 11
>>>>>>>>> A: 40
>>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>>> S2: 65 (29.6 bits)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>>>>> To: BioPerl List
>>>>>>>>>> Subject: [Bioperl-l] different results with
>> remote-blast skript
>>>>>>>>>>
>>>>>>>>>> Hi again :)
>>>>>>>>>> please, I only have this little question:
>>>>>>>>>> why do I get different results with my remote::blast
>>>>> perl skript
>>>>>>>>>> then on the
>>>>>>>>>> ncbi blast homepage?
>>>>>>>>>> I am using blastp, the query is an amino-sequence (different
>>>>>>>>>> results with any
>>>>>>>>>> sequence, differences not only in number of hits but
>> even in e-
>>>>>>>>>> values, scores
>>>>>>>>>> etc...), the database is 'nr'.
>>>>>>>>>> PLEASE help me,
>>>>>>>>>> thank you in advance,
>>>>>>>>>> Jonas
>>>>>>>>>>
>>>>>>>>>> ps: my skript:
>>>>>>>>>>
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>>>>>> ##
>>>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>>>> use strict;
>>>>>>>>>> my @blast_report;
>>>>>>>>>> my $prog = 'blastp';
>>>>>>>>>> my $db   = 'nr';
>>>>>>>>>> my $e_val= '1e-10';
>>>>>>>>>> #my $e_val= '10';
>>>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>>>      '-data' => $db,
>>>>>>>>>>      '-expect' => $e_val,
>>>>>>>>>>      '-readmethod' => 'SearchIO' );
>>>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>>>> $
>>>>>>>>>> Bio
>>>>>>>>>>
>>>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>>>>>>>> = '1';
>>>>>>>>>>
>>>>>>>>>> my
>>>>>>>>>> $
>>>>>>>>>> blast_seq
>>>>>>>>>>
>>>>>
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>>>>>>>>
>>>>>>
>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
>>>>> AFRQAHQNTAMATGPD
>>>>>>>>>> PDDEYE';
>>>>>>>>>> #$v is just to turn on and off the messages
>>>>>>>>>> my $v = 1;
>>>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>>>>>>>> 'Bio::PrimarySeq');
>>>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq,
>> -display_id =>
>>>>>>>>>> "$blast_seq");
>>>>>>>>>> my $filename='temp2.out';
>>>>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>>>> while ( my @rids = $factory->each_rid )
>>>>>>>>>> {
>>>>>>>>>>     foreach my $rid ( @rids )
>>>>>>>>>>     {
>>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>         if( !ref($rc) )
>>>>>>>>>>         {
>>>>>>>>>>             if( $rc < 0 )
>>>>>>>>>>             {
>>>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>>>             }
>>>>>>>>>>             print STDERR "." if ( $v > 0 );
>>>>>>>>>>         }
>>>>>>>>>>             else
>>>>>>>>>>             {
>>>>>>>>>>                 my $result = $rc->next_result();
>>>>>>>>>>                 $factory->save_output($filename);
>>>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>>>                 print "\nQuery Name: ",
>>>>> $result->query_name(),
>>>>>>>>>> "\n";
>>>>>>>>>>                 while ( my $hit = $result->next_hit )
>>>>>>>>>>                 {
>>>>>>>>>>                     next unless ( $v > 0);
>>>>>>>>>>                     print "\thit name is ",
>> $hit->name, "\n";
>>>>>>>>>>                     while( my $hsp = $hit->next_hsp )
>>>>>>>>>>                     {
>>>>>>>>>>                         print "\t\tscore is ",
>>>>> $hsp->score, "\n";
>>>>>>>>>>                     }
>>>>>>>>>>                 }
>>>>>>>>>>             }
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>>>>> return @blast_report;
>>>>>>>>>>
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>>>>>> ####
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>> =
>>>>>>>>> =
>>>>>>>>>
>>>>>
>> =====================================================================
>>>>>>>>> Attention: The information contained in this message and/or
>>>>>>>>> attachments
>>>>>>>>> from AgResearch Limited is intended only for the
>>>>> persons or entities
>>>>>>>>> to which it is addressed and may contain confidential and/or
>>>>>>>>> privileged
>>>>>>>>> material. Any review, retransmission, dissemination
>> or other use
>>>>>>>>> of, or
>>>>>>>>> taking of any action in reliance upon, this information
>>>>> by persons or
>>>>>>>>> entities other than the intended recipients is prohibited by
>>>>>>>>> AgResearch
>>>>>>>>> Limited. If you have received this message in error,
>>>>> please notify
>>>>>>>>> the
>>>>>>>>> sender immediately.
>>>>>>>>> =
>>>>>>>>> =
>>>>>>>>>
>>>>>
>> =====================================================================
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jason Stajich
>>>>>>>> jason at bioperl.org
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>> --------------------------------------------------------------
>>>>> ----------------
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>> No virus found in this incoming message.
>>>>>> Checked by AVG - www.avg.com
>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
>>>>> Date: 07/05/09
>>>>>> 05:53:00
>>>>>
>>>>>
>>>>> --------------------------------------------------------------
>>>>> ------------------
>>>>>
>>>>>
>>>>>
>>>>> No virus found in this incoming message.
>>>>> Checked by AVG - www.avg.com
>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
>>>>> Date: 07/05/09
>>>>> 17:54:00
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>
>>
>> --------------------------------------------------------------
>> ------------------
>>
>>
>>
>> No virus found in this incoming message.
>> Checked by AVG - www.avg.com
>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
>> Date: 07/09/09
>> 05:55:00
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list