[Bioperl-l] help using SEARCH IO to parse blast results
Chris Fields
cjfields at uiuc.edu
Wed Nov 28 13:57:27 UTC 2007
I had some code which does this which I committed yesterday to CVS; it
catches the GI for the query and the hits:
$result->query_gi;
$hit->ncbi_gi;
I am in the midst of fixing additional problems with WU-BLAST parsing
but you are more than welcome to try it.
chris
On Nov 27, 2007, at 3:32 PM, alison waller wrote:
> Thanks Everyone,
>
> Your edits worked Dave, however after looking at the output I
> realized that
> I only want information on the top hsp per query returned. For
> example some
> of the querys the top hit has two hsps so it returned both.
>
> I tried to further edit it, but after 3 attempts they are all
> failing, I
> think due to me using the loops wrong.
>
> I also have another problem, I also want to retrieve the gi, this
> doesn't
> seem to be straight forward as it should. I found another method
> _get_seq_identifiers, but this looks awkward, isn't there and object
> for the
> gi?
>
> I've pasted my non-working script below if there are any suggestions
> on how
> to get it to print out just the first hsp per hit, that would be
> great.
>
> Thanks,
>
>
> #!/usr/local/bin/perl -w
>
>
> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
> # alison waller November 2007
>
>
> use strict;
> use warnings;
> use Bio::SearchIO;
>
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of
> hits>\n";
> if (@ARGV != 2) { die $usage; }
>
>
> my $infile = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit = $ARGV[1]; # to specify in the command line how many hits
> # to report for each query
>
>
> #open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!
> \n";
>
>
> my $report = new Bio::SearchIO(
> -file => "$infile",
> -format => "blast"
> );
>
>
> print OUT
>
> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent
> \tgaps\t
> strand\tHstrand\n";
>
>
> # Go through BLAST reports one by one
> while (my $result = $report->next_result) {
> my $i=0;
> while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
> while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
>
>
> # Print some tab-delimited data about this hit
> print OUT $result->query_name, "\t";
> print OUT $hit->name, "\t";
> print OUT $hit->significance, "\t";
> print OUT $hit->bits, "\t";
> print OUT $hsp->evalue, "\t";
> print OUT $hsp->percent_identity, "\t";
> print OUT $hsp->length('total'), "\t";
> print OUT $hsp->num_identical, "\t";
> print OUT $hsp->gaps, "\t";
> print OUT $hsp->strand('query'), "\t";
> print OUT $hsp->strand('hit'), "\n";
> }
> }
> if ($i == 0) { print OUT "no hits\n"; }
>
> }
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, November 27, 2007 4:01 PM
> To: Smithies, Russell
> Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>
> The hits/HSPs are generally in the order they appear in the report.
>
> If you are looking for best/worst HSP after parsing you can use the
> $hit->hsp() method:
>
> # best and worst
> my $best = $hit->hsp('best'); # also 'first'
> my $worst = $hit->hsp('worst'); # also last
>
> The SearchIO text BLAST parser also has several options implemented
> for finer control:
>
> -inclusion_threshold => e-value threshold for inclusion in the
> PSI-BLAST score matrix model (blastpgp)
> -signif => float or scientific notation number to be used
> as a P- or Expect value cutoff
> -score => integer or scientific notation number to be used
> as a blast score value cutoff
> -bits => integer or scientific notation number to be used
> as a bit score value cutoff
> -hit_filter => reference to a function to be used for
> filtering hits based on arbitrary criteria.
> All hits of each BLAST report must satisfy
> this criteria to be retained.
> If a hit fails this test, it is ignored.
> This function should take a
> Bio::Search::Hit::BlastHit.pm object as its first
> argument and return true
> if the hit should be retained.
> Sample filter function:
> -hit_filter => sub { $hit = shift;
> $hit->gaps == 0; },
> (Note: -filt_func is synonymous with -hit_filter)
> -overlap => integer. The amount of overlap to permit between
> adjacent HSPs when tiling HSPs. A reasonable
> value is 2.
> Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.
>
> chris
>
> On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:
>
>> Do the hits need to be sorted first or is this done automagicly?
>> I ask this as I know Megablast doesn't provide sorted output for
>> most of
>> it's formats.
>>
>> Russell
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-
>>> bio.org] On Behalf Of Dave Messina
>>> Sent: Wednesday, 28 November 2007 6:56 a.m.
>>> To: alison waller
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>>>
>>> Hi Alison,
>>> As Sendu mentioned, the key bit is adding a condition to the hit
>>> loop
>> to
>>> limit the number of hits that are printed. I didn't test the below
>>> extensively, but give it a try...
>>>
>>>
>>> Dave
>>>
>>>
>>>
>>> #!/usr/local/bin/perl -w
>>>
>>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
>>> # alison waller November 2007
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SearchIO;
>>>
>>> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of
>> hits>\n";
>>> if (@ARGV != 2) { die $usage; }
>>>
>>> my $infile = $ARGV[0];
>>> my $outfile = $infile . '.parsed';
>>> my $tophit = $ARGV[1]; # to specify in the command line how many
>>> hits
>>> # to report for each query
>>>
>>> #open( IN, $infile ) || die "Can't open inputfile $infile! $!
>>> \n";
>>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile!
>> $!\n";
>>>
>>> my $report = new Bio::SearchIO(
>>> -file => "$infile",
>>> -format => "blast"
>>> );
>>>
>>> print OUT
>>>
>> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent
>> \tga
>> ps\t
>>> Qstrand\tHstrand\n";
>>>
>>> # Go through BLAST reports one by one
>>> while ( my $result = $report->next_result ) {
>>> my $i = 0;
>>> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) {
>>> while ( my $hsp = $hit->next_hsp ) {
>>>
>>> # Print some tab-delimited data about this hit
>>> print OUT $result->query_name, "\t";
>>> print OUT $hit->name, "\t";
>>> print OUT $hit->significance, "\t";
>>> print OUT $hit->bits, "\t";
>>> print OUT $hsp->evalue, "\t";
>>> print OUT $hsp->percent_identity, "\t";
>>> print OUT $hsp->length('total'), "\t";
>>> print OUT $hsp->num_identical, "\t";
>>> print OUT $hsp->gaps, "\t";
>>> print OUT $hsp->strand('query'), "\t";
>>> print OUT $hsp->strand('hit'), "\n";
>>> }
>>> }
>>>
>>> if ($i == 0) { print OUT "no hits\n"; }
>>> }
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> =
>> =
>> =====================================================================
>> Attention: The information contained in this message and/or
>> attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or
>> privileged
>> material. Any review, retransmission, dissemination or other use of,
>> or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by
>> AgResearch
>> Limited. If you have received this message in error, please notify
>> the
>> sender immediately.
>> =
>> =
>> =====================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
More information about the Bioperl-l
mailing list