[Bioperl-l] How to parse BLAST output - all hits of each queryinnew file
Tim Koehler
timbourine81 at googlemail.com
Mon Nov 30 17:23:58 UTC 2009
Hello everybody,
thanks a lot for the overwhelming answers! All these codes are different
flavors and worked all.
For me the added code works the best. But I think I found a bug in
...Bio/SearchIO/blast.pm.
There the DEFAULT_BLAST_... variable is set to
Bio::Search::Writer::HitTableWriter instead of
Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to
HTMLResultWriter
and others.
So again: THANKS for the support!
Cheers,
Tim
#!/usr/bin/perl -w
use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
### add here the writer you want
use Bio::SearchIO::Writer::HitTableWriter;
use Bio::Search::Result::BlastResult;
use Data::Dumper;
my $Seq_in = Bio::SeqIO->new( -file =>
"/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta",
-format => "fasta" );
while ( my $query = $Seq_in->next_seq() ) {
warn "Processing ",$query->id, "\n";
my $factory =
Bio::Tools::Run::StandAloneBlast->new(
program => "blastn",
database =>
"/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db",
_READMETHOD => "Blast"
);
my $blast_report = $factory->blastall($query);
sleep 5;
# just write the result we got for this query into a
#new blast-formatted file...named after the id of the query seq...
my $result = $blast_report->next_result;
my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format =>
"blast" ) or die $!;
$blio->write_result($result);
# below, just looking at the current blast result
###this does not appear in the output files
while ( my $result = $blast_report->next_result ) {
## $result is a Bio::Search::Result::ResultI compliant object
while ( my $hit = $result->next_hit ) {
## $hit is a Bio::Search::Hit::HitI compliant object
while ( my $hsp = $hit->next_hsp ) {
## $hsp is a Bio::Search::HSP::HSPI compliant object
if ( $hsp->length('total') > 50 ) {
if ( $hsp->percent_identity >= 75 ) {
print "Query= ", $result->query_name,
"Hit= ", $hit->name,
"Length= ", $hsp->length('total'),
"Percent_id= ", $hsp->percent_identity,
"Subject=", $hsp->hit_string, "\n";
}
}
}
}
}
}
On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell <
Russell.Smithies at agresearch.co.nz> wrote:
> Changed it to a generic result and added a writer and it seems tio work:
>
>
>
> foreach my $qid ( keys %hits_by_query ) {
>
> warn "qid = $qid\n";
>
> my $res = Bio::Search::Result::GenericResult->new(-algorithm =>
> "blastn") or die $!;
>
> # print Dumper $res;
>
> foreach my $h ( @{ $hits_by_query{$qid} } ){
>
> warn "adding hit ", $h->name, "\n";
>
> $res->add_hit($h) if defined($h);
>
> }
>
> my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
>
> my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file =>
> ">$qid\.bls\.html", -format => "blast" ) or die $!;
>
> $blio->write_result($res);
>
> }
>
>
>
>
>
> *From:* Mark A. Jensen [mailto:maj at fortinbras.us]
> *Sent:* Monday, 30 November 2009 10:19 a.m.
> *To:* Smithies, Russell; 'Tim Koehler'
>
> *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each
> queryinnew file
>
>
>
> My thought here was that since Tim's already going one at a time thru
>
> his queries, my scrap was not really necessary:
>
>
>
> use strict;
>
> use Bio::Tools::Run::StandAloneBlast;
>
> use Bio::SeqIO;
>
> use Bio::SearchIO;
>
> use Bio::Search::Result::BlastResult;
>
>
>
> use Data::Dumper;
>
>
>
> my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta",
>
> -format => "fasta" );
>
>
>
> while ( my $query = $Seq_in->next_seq() ) {
>
> warn "Processing ",$query->id, "\n";
>
> my $factory =
>
> Bio::Tools::Run::StandAloneBlast->new(
>
> program => "blastn",
>
> database =>
> "/data/databases/flatfile/illuminati_blastdata/nt",
>
> _READMETHOD => "Blast"
>
> );
>
>
>
> my $blast_report = $factory->blastall($query);
>
> sleep 5;
>
>
>
> # just write the result we got for this query into a
>
> #new blast-formatted file...named after the id of the query seq...
>
> my $result = $blast_report->next_result;
>
> my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format =>
> "blast" ) or die $!;
>
> $blio->write_result($result);
>
>
>
> # below, just looking at the current blast result
>
> while ( my $result = $blast_report->next_result ) {
>
> ## $result is a Bio::Search::Result::ResultI compliant object
>
> while ( my $hit = $result->next_hit ) {
>
> ## $hit is a Bio::Search::Hit::HitI compliant object
>
> while ( my $hsp = $hit->next_hsp ) {
>
> ## $hsp is a Bio::Search::HSP::HSPI compliant object
>
> if ( $hsp->length('total') > 50 ) {
>
> if ( $hsp->percent_identity >= 75 ) {
>
> print "Query= ", $result->query_name,
>
> "Hit= ", $hit->name,
>
> "Length= ", $hsp->length('total'),
>
> "Percent_id= ", $hsp->percent_identity,
>
> "Subject=", $hsp->hit_string, "\n";
>
> }
>
> }
>
> }
>
> }
>
> }
>
> }
>
> ----- Original Message -----
>
> *From:* Smithies, Russell <Russell.Smithies at agresearch.co.nz>
>
> *To:* 'Tim Koehler' <timbourine81 at googlemail.com> ; 'maj at fortinbras.us'<%27maj at fortinbras.us%27>
>
> *Sent:* Sunday, November 29, 2009 3:58 PM
>
> *Subject:* RE: [Bioperl-l] How to parse BLAST output - all hits of each
> queryinnew file
>
>
>
> Hi Tim
>
> With various people writing the “howtos” and other docs, the examples are
> bound to have differing names for the variables used but as long as you’re
> consistent, it should all fit together.
>
>
>
> I think I’ve almost got your code working, just getting errors from
> Bio::Search::Result::BlastResult which I’m not entirely sure how to use.
> Perhaps Mark can get this bit going?
>
>
>
> --Russell
>
> ===============================
>
>
>
> use strict;
>
> use Bio::Tools::Run::StandAloneBlast;
>
> use Bio::SeqIO;
>
> use Bio::SearchIO;
>
> use Bio::Search::Result::BlastResult;
>
>
>
> use Data::Dumper;
>
>
>
> my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta",
>
> -format => "fasta" );
>
>
>
> while ( my $query = $Seq_in->next_seq() ) {
>
> warn "Processing ",$query->id, "\n";
>
> my $factory =
>
> Bio::Tools::Run::StandAloneBlast->new(
>
> program => "blastn",
>
> database =>
> "/data/databases/flatfile/illuminati_blastdata/nt",
>
> _READMETHOD => "Blast"
>
> );
>
>
>
> my $blast_report = $factory->blastall($query);
>
> sleep 5;
>
>
>
>
>
> my %hits_by_query;
>
>
>
> while ( my $result = $blast_report->next_result ) {
>
> foreach my $hit ( $result->hits ) {
>
> warn "Pushed a hit for ",$hit->name, "\n";
>
> push( @{ $hits_by_query{ $hit->name } }, $hit );
>
> }
>
> }
>
>
>
> foreach my $qid ( keys %hits_by_query ) {
>
> warn "qid = $qid\n";
>
> my $res = Bio::Search::Result::BlastResult->new() or die $!;
>
> print Dumper $res;
>
> foreach my $h ( @{ $hits_by_query{$qid} } ){
>
> warn "adding hit ", $h->name, "\n";
>
> $res->add_hit($h) if defined($h);
>
> }
>
> my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format =>
> "blast" ) or die $!;
>
> $blio->write_result($res);
>
> }
>
>
>
> while ( my $result = $blast_report->next_result ) {
>
> ## $result is a Bio::Search::Result::ResultI compliant object
>
> while ( my $hit = $result->next_hit ) {
>
> ## $hit is a Bio::Search::Hit::HitI compliant object
>
> while ( my $hsp = $hit->next_hsp ) {
>
> ## $hsp is a Bio::Search::HSP::HSPI compliant object
>
> if ( $hsp->length('total') > 50 ) {
>
> if ( $hsp->percent_identity >= 75 ) {
>
> print "Query= ", $result->query_name,
>
> "Hit= ", $hit->name,
>
> "Length= ", $hsp->length('total'),
>
> "Percent_id= ", $hsp->percent_identity,
>
> "Subject=", $hsp->hit_string, "\n";
>
> }
>
> }
>
> }
>
> }
>
> }
>
> }
>
> ===============================
>
>
>
> *From:* Tim Koehler [mailto:timbourine81 at googlemail.com]
> *Sent:* Friday, 27 November 2009 10:24 p.m.
> *To:* Smithies, Russell; maj at fortinbras.us
> *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each
> queryinnew file
>
>
>
> Hey guys,
>
> please, do not get me wrong that I wanted to put the workload on you. So
> far I only found the HowTo's but in there in some way the language changed
> with time (e.g. $in to $Seq_in) or some things I simply could not find.
> Now I got a tip where else to search: the scrapbook and deobfuscator.
>
> I immediately will have a look at that.
>
> This is the first time for me touching linux / perl commands; that's why I
> thought after several days of trial and many errors ;) asking the
> mailinglist.
>
> I was very happy about your fast answers!
>
> Cheers and a nice weekend,
>
> Tim
>
> On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler <timbourine81 at googlemail.com>
> wrote:
>
> ups, sent too early...
>
> Hey Mark,
>
> thanks for the answer. But I am still struggling, especially where to put
> in your code.
>
> Here ist the code I have, so far:
>
> #!/usr/bin/perl -w
>
> ### should I put your code here as push is a perl command?
>
>
> my %hits_by_query;
> for ($result->hits) {
>
> ### I inserted a comma after name}}; if there is no comma, there was the
> error: Scalar found where operator expected at
> 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit"
> ### (Missing operator before $hit?)
> ###Useless use of push with no values at
> 12_BLAST_two_sequence_each_query_one_file.PL line 7.
> ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7,
> near "} $hit"
> ###BEGIN not safe after errors--compilation aborted at
> 12_BLAST_two_sequence_each_query_one_file.PL line 13.
>
>
> push @{$hits_by_query{$hit->name}}, $hit;
>
> ###here, every time this terror appears: Name "main::result" used only
> once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
> ###error: Can't call method "hits" on an undefined value at
> 12_BLAST_two_sequence_each_query_one_file.PL line 5.
>
>
> }
>
>
> use strict;
> use Bio::Tools::Run::StandAloneBlast;
> use Bio::SeqIO;
> use Bio::SearchIO;
>
> use Bio::Search::Result::BlastResult;
>
> my $Seq_in = Bio::SeqIO->new (
> -file =>
> "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta",
> -format => 'fasta'
> );
> while (my $query = $Seq_in->next_seq()) {
>
>
> my $factory = Bio::Tools::Run::StandAloneBlast->new(
>
> 'program' => 'blastn',
> 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db',
> _READMETHOD => "Blast"
> );
>
> my $blast_report = $factory->blastall($query);
>
> ### Should I need to use a module? are the commands here at the right
> position? errors, e.g., Global symbol "$hit" requires explicit package name
> #my %hits_by_query;
> #for ($result->hits) {
> ### inserted comma after name}}
> # push @{$hits_by_query{$hit->name}}, $hit;
> #}
>
>
>
> foreach my $qid ( keys %hits_by_query ) {
> my $result = Bio::Search::Result::BlastResult->new();
> $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
> my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
> $blio->write_result($result);
> }
>
> ###where are the files stored? what is their name. Sorry, but I cannot get
> behind that :(
>
> while( my $result = $blast_report->next_result ) {
> ## $result is a Bio::Search::Result::ResultI compliant object
>
>
> while( my $hit = $result->next_hit ) {
>
> ## $hit is a Bio::Search::Hit::HitI compliant object
>
>
> while( my $hsp = $hit->next_hsp ) {
>
> ## $hsp is a Bio::Search::HSP::HSPI compliant object
> if( $hsp->length('total') > 50 ) {
> if ( $hsp->percent_identity >= 75 ) {
> print "Query= ", $result->query_name,
> "Hit= ", $hit->name,
> "Length= ", $hsp->length('total'),
> "Percent_id= ", $hsp->percent_identity,
> "Subject=", $hsp->hit_string,"\n";
> }
> }
> }
> }
> }
> }
>
> Again, a big thanks in advance :)
>
> All the best,
>
> Tim
>
> On Thu, Nov 26, 2009 at 4:52 PM, Tim <timbourine81 at gmail.com> wrote:
>
> Hey Mark,
>
> thanks for the answer
>
>
>
>
> On 25.11.2009 20:21, Mark A. Jensen wrote:
> > whoops: change the following line:
> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
> >
> > to
> >
> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
> >
> > (I always forget that...)
> > MAJ
> >
> > ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> > To: "Tim" <timbourine81 at gmail.com>; <bioperl-l at lists.open-bio.org>
> > Sent: Wednesday, November 25, 2009 1:20 PM
> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each
> > queryinnew file
> >
> >
> >> hey Tim--
> >>
> >> Sound like you need to go about collecting your queries inside out:
> >>
> >> my %hits_by_query;
> >> for ($result->hits) {
> >> push @{$hits_by_query{$hit->name}} $hit;
> >> }
> >>
> >> I believe now each hash element, keyed by the query name, will contain
> >> an arrayref to the set of hits assoc with that query.
> >>> From here, I believe
> >>
> >> use Bio::Search::Result::BlastResult;
> >> use Bio::SearchIO;
> >>
> >> foreach my $qid ( keys %hits_by_query ) {
> >> my $result = Bio::Search::Result::BlastResult->new();
> >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
> >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast'
> );
> >> $blio->write_result($result);
> >> }
> >>
> >> will do what you want.
> >>
> >> hope this helps -
> >> Mark
> >>
> >> ----- Original Message ----- From: "Tim" <timbourine81 at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Wednesday, November 25, 2009 12:40 PM
> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each
> >> query innew file
> >>
> >>
> >>> Dear bioperl users,
> >>>
> >>> I am a real newbie and have - maybe a very trivial - question.
> >>>
> >>> I searched the mailing list archive and many howtos but I have not
> found
> >>> a concrete answer to my problem. So hopefully you can help me :)
> >>>
> >>> Background: I use the latest Bioperl version (installed it two weeks
> >>> before).
> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file
> >>> including different sequences, I get a BLAST output with many queries
> >>> each having several hits / sbjcts.
> >>>
> >>> My problem is how to parse *all* hits of *one* query into a single new
> >>> file. And this for all the queries I have in my BLAST output file.
> >>>
> >>> Or is it better the other way round; first to make fasta files with
> only
> >>> single sequences inside and BLAST each file? But how can I automize
> that
> >>> using Bioperl?
> >>>
> >>> I tried Bio::SearchIO but can only parse all queries and their
> >>> respective hits in only one file...
> >>> I think iteration is also necessary here, but I do not really know how
> >>> to include that into Bio::SearchIO.
> >>> Or do I have to use Module:Bio::Index::Blast?
> >>>
> >>> I can index a file (see below), but I have no idea what comes next...
> >>>
> >>> ###How I index a file...
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
> >>>
> >>> use Bio::Index::Fasta;
> >>>
> >>>
> >>> $file_name = "8_to_BLAST_two_seq_index.fasta";
> >>> $id = "48882";
> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
> >>> -write_flag => 1);
> >>> $inx->make_index($file_name);
> >>>
> >>>
> >>> Hopefully, you can give me at least hints what to look for.
> >>>
> >>> A big THANKS in advance!
> >>>
> >>> Cheers,
> >>>
> >>> Tim
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >
>
> Tim Köhler
> MPI for Terrestrial Microbiology
> Karl-von-Frisch-Straße
> D-35043 Marburg / Germany
>
> Email: koehlerd at mpi-marburg.mpg.de
> Phone: +49 6421 178-740
> Fax: +49 6421 178-999
>
>
>
>
> ------------------------------
>
> *Attention: *The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> ------------------------------
>
>
>
>
More information about the Bioperl-l
mailing list