[Bioperl-l] Problem Parsing BLAST output to annotate FASTA file
Andreas Leimbach
andreas.leimbach at uni-wuerzburg.de
Wed Feb 20 16:14:29 UTC 2013
Hi Ann,
I agree with Adam, but I was already writing my email, while his came
in. Hope it helps:
I hope I understand correctly what you want to do.
Just to clarify, you queried a protein blast database with blastx and
nucleotide queries. Now you want to associate the protein description
for the FIRST blast hit with the corresponding nucleotide fasta file. Is
that correct?
You have to put the two while loops into one another. Or associate the
blast hits with the query descriptions. But it's not feasible to take
the first blast hit and the first nucleotide fasta seq, then the 2nd of
both etc, as Adam already pointed out.
You would have to iterate through both at the same time. I.e. take the
first blast hit, then iterate through the nucleotide fasta until you
find the hit. Then take the 2nd blast hit and iterate through the
nucleotide fasta etc. It's probably easiest to do this in a hash.
Something along the lines of (not tested I just punched that in the E-Mail):
my %hits;
my $hit_desc;
my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
"$ARGV[0]");
while (my $result = $search_in->next_result) {
while (my $hit = $result->next_hit) {
while (my $hsp = $hit->next_hsp) {
if ($hit->description eq $hit_desc) { # Only want the first blast hit
next;
}
my $hit_desc = $hit->description;
$hits{$result->query_description} = $hit_desc;
}
}
}
my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
foreach my $query (keys %hits) {
while (my $seqobj = $seqio->next_seq) {
if ($seqobj->display_id eq $query) {
print ">$hits{$query}\n";
my $nuc = $seqobj->seq();
print $nuc, "\n";
}
You might want to put some evalue cutoff in there to only score
significant hits. Also if your nucleotide query multi-fasta file is very
large, you might consider creating an index first:
http://www.bioperl.org/wiki/HOWTO:Local_Databases#Bio::Index
Hope that helps!
Cheers,
Andreas
P.S.: Please next time include version numbers for BioPerl and Perl and
a little more detail what you want to do. ;-)
--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany
Tel.: +49 (0)551 39 3843
E-Mail: andreas.leimbach at uni-wuerzburg.de
On 20.2.13 06:20, Ann Gregory wrote:
> Hi BioPerl,
>
> I am having issues with a BioPerl script. I have a blastxml file from a
> blastx blast and the original multifasta file containing the original
> nucleotides sequences.
>
> I want to take the blast result (ie. the blast description) and annotate my
> multifasta file.
>
> I have written 2 while loops that extract the blast descriptions as well as
> the nucleotide sequence from the multifasta file.
>
> My problem is that I cannot incorporate one of the while loops into the
> other without loosing the loop property of one of the loops. I would like
> to take the 1st blast description, then the 1st nucleotide sequence, then
> the 2nd blast description, then the 2nd nucleotide sequence and so
> on...just can figure out how to alternate the results.
>
> See script below:
>
>
> use warnings;
> use strict;
> use Bio::SearchIO;
> use Bio::SeqIO;
>
>
> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
> "$ARGV[0]");
> while (my $result = $search_in->next_result) {
> while (my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
> my $qd = $hit->description;
> print $qd, "\n";
> }
> }
> }
>
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
> while (my $seqobj = $seqio->next_seq) {
> my $nuc = $seqobj->seq();
> print $nuc, "\n";
> }--
> Ann (Nina) Gregory
> Graduate Student
> Rich Lab / Sullivan Lab
> Soil, Water, Environmental Science Department
> University of Arizona
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list