[Bioperl-l] How to parse BLAST output - all hits of each queryinnew file
Tim Koehler
timbourine81 at googlemail.com
Thu Nov 26 16:02:30 UTC 2009
ups, sent too early...
Hey Mark,
thanks for the answer. But I am still struggling, especially where to put in
your code.
Here ist the code I have, so far:
#!/usr/bin/perl -w
### should I put your code here as push is a perl command?
my %hits_by_query;
for ($result->hits) {
### I inserted a comma after name}}; if there is no comma, there was the
error: Scalar found where operator expected at
12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit"
### (Missing operator before $hit?)
###Useless use of push with no values at
12_BLAST_two_sequence_each_query_one_file.PL line 7.
###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near
"} $hit"
###BEGIN not safe after errors--compilation aborted at
12_BLAST_two_sequence_each_query_one_file.PL line 13.
push @{$hits_by_query{$hit->name}}, $hit;
###here, every time this terror appears: Name "main::result" used only once:
possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
###error: Can't call method "hits" on an undefined value at
12_BLAST_two_sequence_each_query_one_file.PL line 5.
}
use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
my $Seq_in = Bio::SeqIO->new (
-file =>
"/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta",
-format => 'fasta'
);
while (my $query = $Seq_in->next_seq()) {
my $factory = Bio::Tools::Run::StandAloneBlast->new(
'program' => 'blastn',
'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db',
_READMETHOD => "Blast"
);
my $blast_report = $factory->blastall($query);
### Should I need to use a module? are the commands here at the right
position? errors, e.g., Global symbol "$hit" requires explicit package name
#my %hits_by_query;
#for ($result->hits) {
### inserted comma after name}}
# push @{$hits_by_query{$hit->name}}, $hit;
#}
foreach my $qid ( keys %hits_by_query ) {
my $result = Bio::Search::Result::BlastResult->new();
$result->add_hit($_) for ( @{$hits_by_query{$qid}} );
my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
$blio->write_result($result);
}
###where are the files stored? what is their name. Sorry, but I cannot get
behind that :(
while( my $result = $blast_report->next_result ) {
## $result is a Bio::Search::Result::ResultI compliant object
while( my $hit = $result->next_hit ) {
## $hit is a Bio::Search::Hit::HitI compliant object
while( my $hsp = $hit->next_hsp ) {
## $hsp is a Bio::Search::HSP::HSPI compliant object
if( $hsp->length('total') > 50 ) {
if ( $hsp->percent_identity >= 75 ) {
print "Query= ", $result->query_name,
"Hit= ", $hit->name,
"Length= ", $hsp->length('total'),
"Percent_id= ", $hsp->percent_identity,
"Subject=", $hsp->hit_string,"\n";
}
}
}
}
}
}
Again, a big thanks in advance :)
All the best,
Tim
On Thu, Nov 26, 2009 at 4:52 PM, Tim <timbourine81 at gmail.com> wrote:
> Hey Mark,
>
> thanks for the answer
>
> On 25.11.2009 20:21, Mark A. Jensen wrote:
> > whoops: change the following line:
> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
> >
> > to
> >
> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
> >
> > (I always forget that...)
> > MAJ
> >
> > ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> > To: "Tim" <timbourine81 at gmail.com>; <bioperl-l at lists.open-bio.org>
> > Sent: Wednesday, November 25, 2009 1:20 PM
> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each
> > queryinnew file
> >
> >
> >> hey Tim--
> >>
> >> Sound like you need to go about collecting your queries inside out:
> >>
> >> my %hits_by_query;
> >> for ($result->hits) {
> >> push @{$hits_by_query{$hit->name}} $hit;
> >> }
> >>
> >> I believe now each hash element, keyed by the query name, will contain
> >> an arrayref to the set of hits assoc with that query.
> >>> From here, I believe
> >>
> >> use Bio::Search::Result::BlastResult;
> >> use Bio::SearchIO;
> >>
> >> foreach my $qid ( keys %hits_by_query ) {
> >> my $result = Bio::Search::Result::BlastResult->new();
> >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
> >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast'
> );
> >> $blio->write_result($result);
> >> }
> >>
> >> will do what you want.
> >>
> >> hope this helps -
> >> Mark
> >>
> >> ----- Original Message ----- From: "Tim" <timbourine81 at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Wednesday, November 25, 2009 12:40 PM
> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each
> >> query innew file
> >>
> >>
> >>> Dear bioperl users,
> >>>
> >>> I am a real newbie and have - maybe a very trivial - question.
> >>>
> >>> I searched the mailing list archive and many howtos but I have not
> found
> >>> a concrete answer to my problem. So hopefully you can help me :)
> >>>
> >>> Background: I use the latest Bioperl version (installed it two weeks
> >>> before).
> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file
> >>> including different sequences, I get a BLAST output with many queries
> >>> each having several hits / sbjcts.
> >>>
> >>> My problem is how to parse *all* hits of *one* query into a single new
> >>> file. And this for all the queries I have in my BLAST output file.
> >>>
> >>> Or is it better the other way round; first to make fasta files with
> only
> >>> single sequences inside and BLAST each file? But how can I automize
> that
> >>> using Bioperl?
> >>>
> >>> I tried Bio::SearchIO but can only parse all queries and their
> >>> respective hits in only one file...
> >>> I think iteration is also necessary here, but I do not really know how
> >>> to include that into Bio::SearchIO.
> >>> Or do I have to use Module:Bio::Index::Blast?
> >>>
> >>> I can index a file (see below), but I have no idea what comes next...
> >>>
> >>> ###How I index a file...
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
> >>>
> >>> use Bio::Index::Fasta;
> >>>
> >>>
> >>> $file_name = "8_to_BLAST_two_seq_index.fasta";
> >>> $id = "48882";
> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
> >>> -write_flag => 1);
> >>> $inx->make_index($file_name);
> >>>
> >>>
> >>> Hopefully, you can give me at least hints what to look for.
> >>>
> >>> A big THANKS in advance!
> >>>
> >>> Cheers,
> >>>
> >>> Tim
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >
>
> Tim Köhler
MPI for Terrestrial Microbiology
Karl-von-Frisch-Straße
D-35043 Marburg / Germany
Email: koehlerd at mpi-marburg.mpg.de
Phone: +49 6421 178-740
Fax: +49 6421 178-999
More information about the Bioperl-l
mailing list