[Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file

Mark A. Jensen maj at fortinbras.us
Sun Nov 29 02:55:42 UTC 2009


Hi Tim--
There's a bug in my code; should be
for my $hit ($result->hits) {
...
}
and you're right about the comma. My bad.

But I don't think you need this-- you're already looping over your
query sequences and doing blastn on each one. So in the middle of
your loop, you can simply write the blast result that you got:

my $blio = Bio::SearchIO->new( -file => 
">".$query->id.".bls", -format=>"blast" );
$blio->write_result($result);

and forget about the foreach my $qid loop entirely.

The files should show up in the directory from which you're
running the script.
cheers, MAJ



----- Original Message ----- 
From: "Tim Koehler" <timbourine81 at googlemail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Thursday, November 26, 2009 11:02 AM
Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew 
file


ups, sent too early...

Hey Mark,

thanks for the answer. But I am still struggling, especially where to put in
your code.

Here ist the code I have, so far:

#!/usr/bin/perl -w

### should I put your code here as push is a perl command?
my %hits_by_query;
for ($result->hits) {
### I inserted a comma after name}}; if there is no comma, there was the
error: Scalar found where operator expected at
12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit"
###        (Missing operator before  $hit?)
###Useless use of push with no values at
12_BLAST_two_sequence_each_query_one_file.PL line 7.
###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near
"} $hit"
###BEGIN not safe after errors--compilation aborted at
12_BLAST_two_sequence_each_query_one_file.PL line 13.
 push @{$hits_by_query{$hit->name}}, $hit;
###here, every time this terror appears: Name "main::result" used only once:
possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
###error: Can't call method "hits" on an undefined value at
12_BLAST_two_sequence_each_query_one_file.PL line 5.
}


use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::Search::Result::BlastResult;

my $Seq_in = Bio::SeqIO->new (
-file =>
"/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta",
-format => 'fasta'
);
while (my $query = $Seq_in->next_seq()) {
my $factory = Bio::Tools::Run::StandAloneBlast->new(
'program' => 'blastn',
'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db',
_READMETHOD => "Blast"
);

my $blast_report = $factory->blastall($query);

### Should I need to use a module? are the commands here at the right
position? errors, e.g., Global symbol "$hit" requires explicit package name
#my %hits_by_query;
#for ($result->hits) {
### inserted comma after name}}
# push @{$hits_by_query{$hit->name}}, $hit;
#}

foreach my $qid ( keys %hits_by_query ) {
 my $result = Bio::Search::Result::BlastResult->new();
 $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
 my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
 $blio->write_result($result);
}

###where are the files stored? what is their name. Sorry, but I cannot get
behind that :(

while( my $result = $blast_report->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
   ## $hit is a Bio::Search::Hit::HitI compliant object
   while( my $hsp = $hit->next_hsp ) {
    ## $hsp is a Bio::Search::HSP::HSPI compliant object
    if( $hsp->length('total') > 50 ) {
     if ( $hsp->percent_identity >= 75 ) {
     print  "Query= ",        $result->query_name,
        "Hit= ",        $hit->name,
            "Length= ",     $hsp->length('total'),
            "Percent_id= ", $hsp->percent_identity,
        "Subject=",        $hsp->hit_string,"\n";
     }
    }
   }
  }
}
}

Again, a big thanks in advance :)

All the best,

Tim


On Thu, Nov 26, 2009 at 4:52 PM, Tim <timbourine81 at gmail.com> wrote:

> Hey Mark,
>
> thanks for the answer
>
> On 25.11.2009 20:21, Mark A. Jensen wrote:
> > whoops: change the following line:
> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
> >
> > to
> >
> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
> >
> > (I always forget that...)
> > MAJ
> >
> > ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> > To: "Tim" <timbourine81 at gmail.com>; <bioperl-l at lists.open-bio.org>
> > Sent: Wednesday, November 25, 2009 1:20 PM
> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each
> > queryinnew file
> >
> >
> >> hey Tim--
> >>
> >> Sound like you need to go about collecting your queries inside out:
> >>
> >> my %hits_by_query;
> >> for ($result->hits) {
> >>  push @{$hits_by_query{$hit->name}} $hit;
> >> }
> >>
> >> I believe now each hash element, keyed by the query name, will contain
> >> an arrayref to the set of hits assoc with that query.
> >>> From here, I believe
> >>
> >> use Bio::Search::Result::BlastResult;
> >> use Bio::SearchIO;
> >>
> >> foreach my $qid ( keys %hits_by_query ) {
> >>  my $result = Bio::Search::Result::BlastResult->new();
> >>  $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
> >>  my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast'
> );
> >>  $blio->write_result($result);
> >> }
> >>
> >> will do what you want.
> >>
> >> hope this helps -
> >> Mark
> >>
> >> ----- Original Message ----- From: "Tim" <timbourine81 at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Wednesday, November 25, 2009 12:40 PM
> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each
> >> query innew file
> >>
> >>
> >>> Dear bioperl users,
> >>>
> >>> I am a real newbie and have - maybe a very trivial - question.
> >>>
> >>> I searched the mailing list archive and many howtos but I have not
> found
> >>> a concrete answer to my problem. So hopefully you can help me :)
> >>>
> >>> Background: I use the latest Bioperl version (installed it two weeks
> >>> before).
> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file
> >>> including different sequences, I get a BLAST output with many queries
> >>> each having several hits / sbjcts.
> >>>
> >>> My problem is how to parse *all* hits of *one* query into a single new
> >>> file. And this for all the queries I have in my BLAST output file.
> >>>
> >>> Or is it better the other way round; first to make fasta files with
> only
> >>> single sequences inside and BLAST each file? But how can I automize
> that
> >>> using Bioperl?
> >>>
> >>> I tried Bio::SearchIO but can only parse all queries and their
> >>> respective hits in only one file...
> >>> I think iteration is also necessary here, but I do not really know how
> >>> to include that into Bio::SearchIO.
> >>> Or do I have to use Module:Bio::Index::Blast?
> >>>
> >>> I can index a file (see below), but I have no idea what comes next...
> >>>
> >>> ###How I index a file...
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
> >>>
> >>> use Bio::Index::Fasta;
> >>>
> >>>
> >>> $file_name = "8_to_BLAST_two_seq_index.fasta";
> >>> $id = "48882";
> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
> >>> -write_flag => 1);
> >>> $inx->make_index($file_name);
> >>>
> >>>
> >>> Hopefully, you can give me at least hints what to look for.
> >>>
> >>> A big THANKS in advance!
> >>>
> >>> Cheers,
> >>>
> >>> Tim
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >
>
> Tim Köhler
MPI for Terrestrial Microbiology
Karl-von-Frisch-Straße
D-35043 Marburg / Germany

Email: koehlerd at mpi-marburg.mpg.de
Phone: +49 6421 178-740
Fax:   +49 6421 178-999

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list