[Bioperl-l] blastxml format

Chris Fields cjfields at uiuc.edu
Wed Oct 25 16:56:46 UTC 2006


> Thanks for the reply. I've already tried this but I got exactly the same >
> results as before.
> What other can I try? 
> Massimo

If you don't mind me asking, what version of perl and Bioperl are you using,
and what version of BLAST is used?  

I want to point out there are a number of problems with your script, now I
have had a chance to look at it.  

1) You have the SearchIO format set to 'blast'.  It should be 'blastxml' if
you are parsing XML format.  

2) Every time you call next_result() you iterate through each BLAST report.
In effect, you're doing something like this:

  my $result = $in->next_result();
   ....# do something here (in first BLAST report)
 
  while ($result = $in->next_result()) { # change to second BLAST report
      # more stuff here (in second BLAST report, if there is one)
  }

I don't know if it's intentional though, but it's something to point out.

3) You also use raw_score(), which doesn't return a value for me (this may
be related to the bioperl version, which is why I asked above).  If you use
$hit->bits() or $hit->significance() you can get the bits or hit evalue,
respectively.

4) Also, I didn't see a difference with the two XML tags
<BlastOutput_query-def> and <Iteration_query-def> using BLAST 2.2.15 output
(WebBLAST at NCBI), which makes sense since they should originate from the
same query sequence anyway.  This could be related to the BLAST version.

Here's my version of your script, using WinXP and bioperl-live (CVS):

use Bio::SearchIO;
my $file = shift @ARGV;

my $in = new Bio::SearchIO(-format => 'blastxml',
                            -file   => $file);

open OUTFILE, ">parsed_blastn_danio.txt" || 
die "Could not open file, stopped";

while(my $result = $in->next_result ) {
    print OUTFILE $result->algorithm, "\n";
    print OUTFILE $result->database_name, "\n";
    print OUTFILE "Score", "\t",
                  "Description", "\t",
                  "NCBI gi identifiers", "\t",
                  "GenBank Accession", "\n";
    print OUTFILE $result->query_description, "\n";
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            my $acc=$hit->name;
            my $description= $hit->description;
            if ($acc =~ /gi\|(\d+)\|\w+\|(\w+)\.\d/) {
                print OUTFILE $hit->bits, "\t", # Score
                  $hit->description, "\t", # Description
                  $1, "\t", $2, "\n";
            }
        }
    }
}

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

...





More information about the Bioperl-l mailing list