[Bioperl-l] Problem Parsing BLAST output to annotate FASTA file
Andreas Leimbach
andreas.leimbach at uni-wuerzburg.de
Wed Feb 20 17:00:51 UTC 2013
Hey Ann,
damn, it 's not my best day ... Anyways, I wouldn't work with
List::MoreUtils's each_array function, as this assumes that the blast
hits and the nucleotide queries are in the same order (as Adam pointed
out). Rather use a hash which associates a key to a certain value. Also,
the hash can be used to skip sequences that have no hits.
Here's my new version:
my %hits;
my $hit_desc;
my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
"$ARGV[0]");
while (my $result = $search_in->next_result) {
while (my $hit = $result->next_hit) {
while (my $hsp = $hit->next_hsp) {
$hits{$result->query_description} = $hit->description; # hash: associate
query_desc (key) with hit_desc (value)
last; # jump out of the while loop; this should resolve getting only the
first hit
}
last; # see above
}
}
my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
while (my $seqobj = $seqio->next_seq) {
if ($hits{$seqobj->display_id}) { # only true if display_id associated
with hit_desc and should skip seqs without hits
print ">$hits{$seqobj->display_id}\n";
my $nuc = $seqobj->seq();
print $nuc, "\n";
}
}
Cheers,
Andreas
P.S.: I redirected your mail to the BioPerl mailing list, others might
profit from my mistakes ;-) ...
--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany
Tel.: +49 (0)551 39 3843
E-Mail: andreas.leimbach at uni-wuerzburg.de
On 20.2.13 17:35, Ann Gregory wrote:
> Hi Andreas,
>
> Thanks for you help! I don't understand how this gets the first blast hit:
>
> if ($hit->description eq $hit_desc) { # Only want the first blast hit
> next;
> }
>
> I tried this and seems to be working...but I can't get the 1st blast hit
> or skip the sequences that had no hits. Do you know any quick fixes?
>
> *
> use warnings;
> use strict;
> use Bio::SearchIO;
> use Bio::SeqIO;
> use List::MoreUtils qw(each_array);
>
> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
> "$ARGV[0]");
> my @ids;
> while (my $result = $search_in->next_result) {
> while (my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
> my $match = $result->num_hits;
> push(@ids, $qd);
> }
> }
> }
> }
>
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
> my @seqs;
> while (my $seqobj = $seqio->next_seq) {
> my $nuc = $seqobj->seq();
> push(@seqs, $nuc);
> }
>
> my $it = each_array(@ids, at seqs);
> while(my($ids,$seqs)=$it->()){
> print $ids, "\n", $seqs, "\n";
> }
> *
>
> Thanks again!
> ~Ann
>
> On Wed, Feb 20, 2013 at 9:24 AM, Andreas Leimbach
> <andreas.leimbach at uni-wuerzburg.de
> <mailto:andreas.leimbach at uni-wuerzburg.de>> wrote:
>
> oops, I just realized I had one loop to much in there. Adam is
> correct. Sorry.
>
> The last part of the code I send you should look like this:
>
>
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
> while (my $seqobj = $seqio->next_seq) {
> print ">$hits{$seqobj->display_id}\__n";
>
> my $nuc = $seqobj->seq();
> print $nuc, "\n";
> }
>
>
> Cheers,
> Andreas
>
>
> --
> Andreas Leimbach
> Universität Münster
> Institut für Hygiene
> Mendelstr. 7
> D-48149 Münster
> Germany
>
> Tel.: +49 (0)551 39 3843 <tel:%2B49%20%280%29551%2039%203843>
> E-Mail: andreas.leimbach at uni-__wuerzburg.de
> <mailto:andreas.leimbach at uni-wuerzburg.de>
>
> On 20.2.13 06:20, Ann Gregory wrote:
>
> Hi BioPerl,
>
> I am having issues with a BioPerl script. I have a blastxml file
> from a
> blastx blast and the original multifasta file containing the
> original
> nucleotides sequences.
>
> I want to take the blast result (ie. the blast description) and
> annotate my
> multifasta file.
>
> I have written 2 while loops that extract the blast descriptions
> as well as
> the nucleotide sequence from the multifasta file.
>
> My problem is that I cannot incorporate one of the while loops
> into the
> other without loosing the loop property of one of the loops. I
> would like
> to take the 1st blast description, then the 1st nucleotide
> sequence, then
> the 2nd blast description, then the 2nd nucleotide sequence and so
> on...just can figure out how to alternate the results.
>
> See script below:
>
>
> use warnings;
> use strict;
> use Bio::SearchIO;
> use Bio::SeqIO;
>
>
> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
> "$ARGV[0]");
> while (my $result = $search_in->next_result) {
> while (my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
> my $qd = $hit->description;
> print $qd, "\n";
> }
> }
> }
>
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file =>
> "$ARGV[1]");
> while (my $seqobj = $seqio->next_seq) {
> my $nuc = $seqobj->seq();
> print $nuc, "\n";
> }--
> Ann (Nina) Gregory
> Graduate Student
> Rich Lab / Sullivan Lab
> Soil, Water, Environmental Science Department
> University of Arizona
> _________________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
> http://lists.open-bio.org/__mailman/listinfo/bioperl-l
> <http://lists.open-bio.org/mailman/listinfo/bioperl-l>
>
>
>
>
> --
> Ann (Nina) Gregory
> Graduate Student
> Rich Lab / Sullivan Lab
> Soil, Water, Environmental Science Department
> University of Arizona
>
>
>
More information about the Bioperl-l
mailing list