[Bioperl-l] blastxml format

Massimo Ubaldi massimo.ubaldi at gmail.com
Wed Oct 25 14:28:52 UTC 2006


Hi,
I'm using the script below to parse blastn output for multiple query
sequences. I got the output from the BLAST web interface, asking for
XML-formatted output.
Everything works fine except that I cannot print the name of each input
sequence (see below).
That is, with the line $result->query_description (see below) I get just
the name of the first sequence. In fact, this is the name defined by the
<BlastOutput_query-def> tag.
What I really want is to extract the name defined by each
<Iteration_query-def> tag.
I have searched the bioperl mailing list and other sources but did not
find anything that solves this.
Can somebody help me?
Thanks a lot,
Massimo


This is an example of the output I get:

MRDNA_probe
46.1    PREDICTED: Danio rerio similar to mineralocorticoid receptor form B
(LOC562171), mRNA    68354945    XM_685568
81.8    Danio rerio VDR-B mRNA, partial cds    68132043    DQ017633
PREDICTED: Danio rerio similar to Rarab protein (LOC560679), mRNA
68420187    XM_684078

This is what I'd like to get:
MRDNA_probe
46.1    PREDICTED: Danio rerio similar to mineralocorticoid receptor form B
(LOC562171), mRNA    68354945    XM_685568
VDRacterm_probe
81.8    Danio rerio VDR-B mRNA, partial cds    68132043    DQ017633
ARalpcterm_probe
PREDICTED: Danio rerio similar to Rarab protein (LOC560679), mRNA
68420187    XM_684078

This is the script
#!/usr/bin/perl
use strict;
use Bio::SearchIO;
# The input is BLAST XML, so it needs the 'blastxml' parser.
my $in = Bio::SearchIO->new(-format => 'blastxml',
                            -file   => 'Blastn_danio.bls');
open OUTFILE, ">parsed_blastn_danio.txt"
    or die "Could not open file, stopped";
my $header_done = 0;

while ( my $result = $in->next_result ) {

    # Print the run-level header once, without consuming a result
    # before the loop (that would skip the first query's hits).
    unless ( $header_done++ ) {
        print OUTFILE $result->algorithm, "\n";
        print OUTFILE $result->database_name, "\n";
        print OUTFILE "Score", "\t", "Description", "\t",
            "NCBI gi identifiers", "\t", "GenBank Accession", "\n";
    }

    print OUTFILE $result->query_description, "\n";
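    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {

            # Expect hit names of the form gi|<number>|<db>|<accession>.<version>
            # and capture the gi number and the unversioned accession.
            # Capturing into variables avoids printing stale $1/$2
            # when a name does not match.
            my ($gi, $acc) = $hit->name =~ /gi\|(\d+)\|\w+\|(\w+)\.\d/;
            $gi  = '' unless defined $gi;
            $acc = '' unless defined $acc;

            print OUTFILE
                $hit->raw_score,   "\t",    # Score
                $hit->description, "\t",    # Description
                $gi, "\t", $acc, "\n";      # gi number, accession
        }
    }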
}
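
To make clear which names I am after, here is a minimal sketch that reads
the <Iteration_query-def> values straight from the XML with a regular
expression, without going through Bio::SearchIO. It is only a stopgap, not
the BioPerl way of doing it. The XML fragment is a hypothetical, heavily
trimmed stand-in for my real file, the helper name iteration_query_defs is
made up for this example, and the regex assumes each element holds plain
text with no nested markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, heavily trimmed BLAST XML: one BlastOutput holding
# three iterations, one per query sequence.
my $xml = <<'END_XML';
<BlastOutput>
  <BlastOutput_query-def>MRDNA_probe</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>MRDNA_probe</Iteration_query-def>
    </Iteration>
    <Iteration>
      <Iteration_query-def>VDRacterm_probe</Iteration_query-def>
    </Iteration>
    <Iteration>
      <Iteration_query-def>ARalpcterm_probe</Iteration_query-def>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
END_XML

# Return the text of every <Iteration_query-def> element, in file order.
# Must be called in list context; assumes no nested markup in the element.
sub iteration_query_defs {
    my ($text) = @_;
    return $text =~ m{<Iteration_query-def>([^<]*)</Iteration_query-def>}g;
}

my @names = iteration_query_defs($xml);
print "$_\n" for @names;
```

On the fragment above this prints MRDNA_probe, VDRacterm_probe and
ARalpcterm_probe, one per line, whereas <BlastOutput_query-def> only ever
carries the first of them.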


