[Bioperl-l] Problem Parsing BLAST output to annotate FASTA file
Fields, Christopher J
cjfields at illinois.edu
Wed Feb 20 18:24:58 UTC 2013
If this is meant to be something done using the same FASTA files for a bunch of BLAST reports, might be worth setting up a flat file index and using that to look up and grab the sequences; it should be a LOT faster, just the first pass (generation of the initial index) would take a little time. Look at Bio::DB::Fasta for an example.
chris
On Feb 20, 2013, at 11:00 AM, Andreas Leimbach <andreas.leimbach at uni-wuerzburg.de>
wrote:
> Hey Ann,
>
> damn, it 's not my best day ... Anyways, I wouldn't work with List::MoreUtils's each_array function, as this assumes that the blast hits and the nucleotide queries are in the same order (as Adam pointed out). Rather use a hash which associates a key to a certain value. Also, the hash can be used to skip sequences that have no hits.
> Here's my new version:
>
> my %hits;
> my $hit_desc;
> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
> "$ARGV[0]");
> while (my $result = $search_in->next_result) {
> while (my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
> $hits{$result->query_description} = $hit->description; # hash: associate query_desc (key) with hit_desc (value)
> last; # jump out of the while loop; this should resolve getting only the first hit
> }
> last; # see above
> }
> }
>
>
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
> while (my $seqobj = $seqio->next_seq) {
> if ($hits{$seqobj->display_id}) { # only true if display_id associated with hit_desc and should skip seqs without hits
> print ">$hits{$seqobj->display_id}\n";
> my $nuc = $seqobj->seq();
> print $nuc, "\n";
> }
> }
>
> Cheers,
> Andreas
>
> P.S.: I redirected your mail to the BioPerl mailing list, others might profit from my mistakes ;-) ...
>
> --
> Andreas Leimbach
> Universität Münster
> Institut für Hygiene
> Mendelstr. 7
> D-48149 Münster
> Germany
>
> Tel.: +49 (0)551 39 3843
> E-Mail: andreas.leimbach at uni-wuerzburg.de
>
> On 20.2.13 17:35, Ann Gregory wrote:
>> Hi Andreas,
>>
>> Thanks for you help! I don't understand how this gets the first blast hit:
>>
>> if ($hit->description eq $hit_desc) { # Only want the first blast hit
>> next;
>> }
>>
>> I tried this and seems to be working...but I can't get the 1st blast hit
>> or skip the sequences that had no hits. Do you know any quick fixes?
>>
>> *
>> use warnings;
>> use strict;
>> use Bio::SearchIO;
>> use Bio::SeqIO;
>> use List::MoreUtils qw(each_array);
>>
>> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
>> "$ARGV[0]");
>> my @ids;
>> while (my $result = $search_in->next_result) {
>> while (my $hit = $result->next_hit) {
>> while (my $hsp = $hit->next_hsp) {
>> my $match = $result->num_hits;
>> push(@ids, $qd);
>> }
>> }
>> }
>> }
>>
>> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
>> my @seqs;
>> while (my $seqobj = $seqio->next_seq) {
>> my $nuc = $seqobj->seq();
>> push(@seqs, $nuc);
>> }
>>
>> my $it = each_array(@ids, at seqs);
>> while(my($ids,$seqs)=$it->()){
>> print $ids, "\n", $seqs, "\n";
>> }
>> *
>>
>> Thanks again!
>> ~Ann
>>
>> On Wed, Feb 20, 2013 at 9:24 AM, Andreas Leimbach
>> <andreas.leimbach at uni-wuerzburg.de
>> <mailto:andreas.leimbach at uni-wuerzburg.de>> wrote:
>>
>> oops, I just realized I had one loop to much in there. Adam is
>> correct. Sorry.
>>
>> The last part of the code I send you should look like this:
>>
>>
>> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
>> while (my $seqobj = $seqio->next_seq) {
>> print ">$hits{$seqobj->display_id}\__n";
>>
>> my $nuc = $seqobj->seq();
>> print $nuc, "\n";
>> }
>>
>>
>> Cheers,
>> Andreas
>>
>>
>> --
>> Andreas Leimbach
>> Universität Münster
>> Institut für Hygiene
>> Mendelstr. 7
>> D-48149 Münster
>> Germany
>>
>> Tel.: +49 (0)551 39 3843 <tel:%2B49%20%280%29551%2039%203843>
>> E-Mail: andreas.leimbach at uni-__wuerzburg.de
>> <mailto:andreas.leimbach at uni-wuerzburg.de>
>>
>> On 20.2.13 06:20, Ann Gregory wrote:
>>
>> Hi BioPerl,
>>
>> I am having issues with a BioPerl script. I have a blastxml file
>> from a
>> blastx blast and the original multifasta file containing the
>> original
>> nucleotides sequences.
>>
>> I want to take the blast result (ie. the blast description) and
>> annotate my
>> multifasta file.
>>
>> I have written 2 while loops that extract the blast descriptions
>> as well as
>> the nucleotide sequence from the multifasta file.
>>
>> My problem is that I cannot incorporate one of the while loops
>> into the
>> other without loosing the loop property of one of the loops. I
>> would like
>> to take the 1st blast description, then the 1st nucleotide
>> sequence, then
>> the 2nd blast description, then the 2nd nucleotide sequence and so
>> on...just can figure out how to alternate the results.
>>
>> See script below:
>>
>>
>> use warnings;
>> use strict;
>> use Bio::SearchIO;
>> use Bio::SeqIO;
>>
>>
>> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
>> "$ARGV[0]");
>> while (my $result = $search_in->next_result) {
>> while (my $hit = $result->next_hit) {
>> while (my $hsp = $hit->next_hsp) {
>> my $qd = $hit->description;
>> print $qd, "\n";
>> }
>> }
>> }
>>
>> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file =>
>> "$ARGV[1]");
>> while (my $seqobj = $seqio->next_seq) {
>> my $nuc = $seqobj->seq();
>> print $nuc, "\n";
>> }--
>> Ann (Nina) Gregory
>> Graduate Student
>> Rich Lab / Sullivan Lab
>> Soil, Water, Environmental Science Department
>> University of Arizona
>> _________________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>> http://lists.open-bio.org/__mailman/listinfo/bioperl-l
>> <http://lists.open-bio.org/mailman/listinfo/bioperl-l>
>>
>>
>>
>>
>> --
>> Ann (Nina) Gregory
>> Graduate Student
>> Rich Lab / Sullivan Lab
>> Soil, Water, Environmental Science Department
>> University of Arizona
>>
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list