[Bioperl-l] RemoteBlast problem
Scott Markel
smarkel at scitegic.com
Fri Oct 28 16:53:03 EDT 2005
Tom,
I haven't seen your exact problem, but I recently saw something
similar. In my case (blastn, NM_021267, Chromosome DB, default
parameters), one of the hits has a comment about the sequence
features.
===============================================
>gi|23618947|ref|NC_004331.1| Download subject sequence spanning the HSP Plasmodium falciparum 3D7 chromosome 13, *** SEQUENCING IN PROGRESS
***, 33 ordered pieces
Length=2732359
Features in this part of subject sequence:
hypothetical protein
Score = 46.1 bits (23), Expect = 0.27
Identities = 23/23 (100%), Gaps = 0/23 (0%)
Strand=Plus/Minus
Query 830 ACATCCCCTTCTACTTCTTCTTC 852
|||||||||||||||||||||||
Sbjct 1369466 ACATCCCCTTCTACTTCTTCTTC 1369444
===============================================
BioPerl's parser is choking on the extra lines.
Scott
Thomas J Keller wrote:
> Greetings,
> I'm using perl 5.8.6 and the fink installation of bioperl 1.4.5 on an
> Apple G5 running OS X 10.4.2
> running my script with a simple fasta dna sequence file:
> $ bp_remote_blast2.pl -p blastx -d nr -i test.fa
>
> Here's the error:
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query 78
> HRRPSFSACRCVLSASSVFPSRLGNNYITAAGAQVLAEGLRGNTSLQFLG 227
> STACK Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/
> SearchIO/blast.pm:1151
> STACK toplevel /Users/kellert/Sandbox/Perlscripts/bp_remote_blast2.pl:90
>
>
> It looks like Bio::SearchIO::blast
> is choking on the result from Bio::Tools::Run::RemoteBlast
>
> I lifted this from Jason's exampleL bp_remote_blast but modified it to
> use SearchIO method instead of BPLite:
>
> use warnings;
> use strict;
> use vars qw($USAGE);
>
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SeqIO;
> use Bio::SearchIO;
> use Getopt::Long;
>
> $USAGE = "remote_blast.pl [-h] [-p prog] [-d db] [-e expect] [-f
> seqformat] -i seqfile\n";
>
> my ($prog, $db, $expect );
>
> my ($sequencefile,$sequenceformat,$help) = (undef, 'fasta',undef);
>
> &GetOptions('prog|p=s' => \$prog,
> 'db|d=s' => \$db,
> 'expect|e=s' => \$expect,
> 'input|i=s' => \$sequencefile,
> 'format|f=s' => \$sequenceformat,
> 'help|h' => \$help,
> );
>
> if( $help ) {
> exec('perldoc', $0);
> die;
> }
>
> if( !defined $prog ) {
> die($USAGE . "\n\tMust specify a valid program name ([t]blast
> [pxn])\n");
> }
> if( !defined $db ) {
> die($USAGE . "\n\tMust specify a db (e.g. \'nr\') to search\n");
> }
> if( !defined $sequencefile ) {
> die($USAGE . "\n\tMust specify a path to a sequence file.\n");
> }
>
> my $factory = new Bio::Tools::Run::RemoteBlast ('-prog' => $prog,
> '-data' => $db,
> '-expect' => $expect,
> 'readmethod' => 'SearchIO', #use
> SearchIO to parse
> );
>
> # submit_blast can only currenly handle fasta format files so I'll
> # preprocess outside of the module but I'd rather be sure here
>
> my $input;
> if( $sequenceformat !~ /fasta/ ) {
> my @seqs;
> my $seqio = new Bio::SeqIO('-format' => $sequenceformat,
> '-file' => $sequencefile );
> while( my $seq = $seqio->next_seq() ) {
> push @seqs, $seq;
> }
> $input = \@seqs;
> } else {
> $input = $sequencefile;
> }
> my $r = $factory->submit_blast($input);
>
> my $v = 1;
> ## set to 0 to turn off Sanity check messages
> print STDERR "retrieving blasts for $input ...\n" if( $v > 0);
>
> while ( my @rids = $factory->each_rid ) {
> foreach my $rid ( @rids ) {
> my $rc = $factory->retrieve_blast($rid);
> if( !ref($rc) ) {
> if( $rc < 0 ) {
> $factory->remove_rid($rid);
> }
> print STDERR "." if ( $v > 0 );
> sleep 5;
> } else {
> my $result = $rc->next_result();
> next unless ($result);
> print "\nQuery Name: ", $result->query_name(), "\n";
> #save the output
> my $filename = $result->query_name()."\.out";
> print STDERR "Saving result to $filename.\n" if( $v > 0);
> $factory->save_output($filename);
> $factory->remove_rid($rid);
> print "\nQuery Name: ", $result->query_name(), "\n";
> while ( my $hit = $result->next_hit ) {
> next unless ( $v > 0);
> print "\thit name is ", $hit->name, "\n";
> while( my $hsp = $hit->next_hsp ) {
> print "\t\tscore is ", $hsp- >score,
> "\n";
> }
> }
> }
> }
> }
>
> I'm guessing either NCBI has changed it blast format and bioperl 1.4 no
> longer works, or I'm missing something that should be obvious.
>
> Help much apprecieated.
>
> Tom K
>
>
> Thomas J. Keller, Ph.D.
> Director, MMI Core Facility
> Oregon Health & Science University
> 3181 SW Sam Jackson Park Rd.
> Portland, OR, USA, 97239
>
> http://www.ohsu.edu/research/core
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Scott Markel, Ph.D.
Principal Bioinformatics Architect email: smarkel at scitegic.com
SciTegic Inc. mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253
San Diego, CA 92123 fax: +1 858 279 8804
USA web: http://www.scitegic.com
More information about the Bioperl-l
mailing list