[Bioperl-l] blastxml format
Massimo Ubaldi
massimo.ubaldi at gmail.com
Wed Oct 25 14:28:52 UTC 2006
Hi
I'm using the script below to parse blastn output for multiple query
sequences. I got the output from the BLAST web interface, asking for
XML-formatted output.
Everything works fine except that I cannot print the name of each input
sequence (see below).
That is, with $result->query_description (see the script below) I get only
the name of the first sequence. In fact, this is the name defined by the
<BlastOutput_query-def> tag.
What I really want is to extract the name that is defined by the
<Iteration_query-def> tag.
I have searched the bioperl mailing list and other sources, but I did not
find anything that solves this.
Can somebody help me?
Thanks a lot
Massimo
This is an example of the output I get
MRDNA_probe
46.1 PREDICTED: Danio rerio similar to mineralocorticoid receptor form B (LOC562171), mRNA 68354945 XM_685568
81.8 Danio rerio VDR-B mRNA, partial cds 68132043 DQ017633
PREDICTED: Danio rerio similar to Rarab protein (LOC560679), mRNA 68420187 XM_684078
This is what I'd like to get
MRDNA_probe
46.1 PREDICTED: Danio rerio similar to mineralocorticoid receptor form B (LOC562171), mRNA 68354945 XM_685568
VDRacterm_probe
81.8 Danio rerio VDR-B mRNA, partial cds 68132043 DQ017633
ARalpcterm_probe
PREDICTED: Danio rerio similar to Rarab protein (LOC560679), mRNA 68420187 XM_684078
This is the script
#!/usr/bin/perl
use strict;
use Bio::SearchIO;
my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => 'Blastn_danio.bls');
open OUTFILE, ">parsed_blastn_danio.txt" or die "Could not open file, stopped";
my $result = $in->next_result;
print OUTFILE $result->algorithm, "\n";
print OUTFILE $result->database_name, "\n";
print OUTFILE "Score", "\t", "Description", "\t", "NCBI gi identifiers",
    "\t", "GenBank Accession", "\n";
while( $result = $in->next_result ) {
    print OUTFILE $result->query_description, "\n";
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            my $acc = $hit->name;
            my $description = $hit->description;
            $acc =~ /gi\|(\d+)\|\w+\|(\w+)\.\d/;
            print OUTFILE
                $hit->raw_score, "\t",    # Score
                $hit->description, "\t",  # Description
                $1, "\t", $2, "\n";
        }
    }
}
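As a side note, the regular expression in the script is what splits the hit
name into the gi number and the accession. Here is a minimal standalone
sketch of that step, assuming hit names of the form
gi|NUMBER|DB|ACCESSION.VERSION| (the example string and the helper name
parse_hit_name are my own assumptions for illustration, not taken from the
report):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (gi, accession) for a hit name, or an
# empty list when the name does not match the expected layout.
sub parse_hit_name {
    my ($name) = @_;
    # Same pattern as in the script: gi number, database tag,
    # accession, then a dot and the first version digit.
    my ($gi, $accession) = $name =~ /gi\|(\d+)\|\w+\|(\w+)\.\d/;
    return unless defined $gi;
    return ($gi, $accession);
}

# Assumed example name, rebuilt from the gi number and accession that
# appear in the output above; the real string may differ.
my $name = 'gi|68354945|ref|XM_685568.1|';

if (my ($gi, $accession) = parse_hit_name($name)) {
    print "$gi\t$accession\n";
} else {
    print "no match for $name\n";
}
```

Checking that the match succeeded before printing avoids reusing $1 and $2
from an earlier hit when a name has a different layout.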