[Bioperl-l] Error parsing BLAST report
Chris Fields
cjfields at uiuc.edu
Thu Aug 17 15:58:17 UTC 2006
Robert,
This sounds like a possible bug; the error you are getting is from BLAST
2.2.11 output, so it should work. The BLAST parsing errors fixed in CVS had
to do with parsing BLAST 2.2.13 (and later) output.
Could you add this as a bug to Bugzilla with your test script and test case
data that generates the error?
How to post the bug:
http://www.bioperl.org/wiki/Bugs
Bugzilla:
http://bugzilla.open-bio.org/
Chris
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Freimuth, Robert
> Sent: Wednesday, August 16, 2006 11:42 PM
> To: Brian Osborne; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Error parsing BLAST report
>
> Hi,
>
> Thank you for your reply. I downloaded bioperl-1.5.1 from
> http://bioperl.org/DIST/ and installed it (which appeared successful),
> but the one-liner:
>
> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION, "\n"'
>
> prints 1.5 (I expected 1.5.1).
>
> When I run the test case that I reported earlier, I get the following
> output:
>
> -------------------- WARNING ---------------------
> MSG: There is no HSP data for hit 'ENSP00000327738'.
> You have called a method (Bio::Search::Hit::GenericHit::length_aln)
> that requires HSP data and there was no HSP data for this hit,
> most likely because it was absent from the BLAST report.
> Note that by default, BLAST lists alignments for the first 250 hits,
> but it lists descriptions for 500 hits. If this is the case,
> and you care about these hits, you should re-run BLAST using the
> -b option (or equivalent if not using blastall) to increase the number
> of alignments.
>
> ---------------------------------------------------
> Alignment length for ENSP00000327738 is -
> Alignment length for ENSP00000350182 is 250
> Alignment length for ENSP00000327738 is 398
>
> Could someone that is running 1.5.1 please verify the output of the
> one-liner above (did I somehow get the wrong file from the ftp site?)
> and try to reproduce the error with the test case?
>
> Thanks for the help. I'm stumped.
>
> Bob
>
> > -----Original Message-----
> > From: Brian Osborne [mailto:osborne1 at optonline.net]
> > Sent: Wednesday, August 16, 2006 6:21 PM
> > To: Freimuth, Robert; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Error parsing BLAST report
> >
> > Robert,
> >
> > The standard answer to a complaint about SearchIO these days
> > is to upgrade
> > to version 1.5.1 - what Bioperl version are you using?
> >
> > Brian O.
> >
> >
> > On 8/16/06 3:56 PM, "Freimuth, Robert"
> > <freimuth at pathology.wustl.edu> wrote:
> >
> > > Hello,
> > >
> > > I'm trying to parse a BLAST report using the following code:
> > >
> > > use warnings;
> > > use strict;
> > >
> > > use Bio::SearchIO;
> > >
> > > my $file = 'NP_006065_blast.out';
> > >
> > > my $searchio = new Bio::SearchIO( -format => 'blast',
> > > -file => $file );
> > >
> > > while( my $result = $searchio->next_result() )
> > > {
> > > while( my $hit = $result->next_hit )
> > > {
> > > my $hit_acc_num = $hit->accession();
> > >
> > > # get the total length of the aligned region
> > for query or
> > > sbjct seq
> > > # (includes all HSPs, calculated after tiling)
> > >
> > > my $align_len = $hit->length_aln( 'query' );
> > >
> > > print "Alignment length for $hit_acc_num is
> > $align_len\n";
> > > }
> > > }
> > >
> > > There are 104 one-line descriptions in the report, and
> > alignments for
> > > each one of them (the blast report was created using
> > > b_num_alignments_shown => 500 and v_num_descriptions_shown => 500).
> > > However, when I run the above code I get 14 errors like the
> > following:
> > >
> > > -------------------- WARNING ---------------------
> > > MSG: There is no HSP data for hit 'ENSP00000327738'.
> > > You have called a method (Bio::Search::Hit::GenericHit::length_aln)
> > > that requires HSP data and there was no HSP data for this hit,
> > > most likely because it was absent from the BLAST report.
> > > Note that by default, BLAST lists alignments for the first 250 hits,
> > > but it lists descriptions for 500 hits. If this is the case,
> > > and you care about these hits, you should re-run BLAST using the
> > > -b option (or equivalent if not using blastall) to increase
> > the number
> > > of alignments.
> > >
> > > ---------------------------------------------------
> > >
> > > There is an alignment for this (and the other 13 sequences) in the
> > > report. In fact, if I edit the report and delete all but the
> > > description and the alignment for ENSP00000327738, it
> > parses fine (no
> > > error).
> > >
> > > I continued editing the report and produced the following
> > minimal test
> > > case that reproduces the error. Note that the description for
> > > ENSP00000350182 appears twice, BUT THE ERROR IS FOR ENSP00000327738.
> > >
> > > *********** BLAST REPORT FOR TEST CASE ***********
> > >
> > > BLASTP 2.2.11 [Jun-05-2005]
> > >
> > >
> > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> > > Schaffer,
> > > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> > > "Gapped BLAST and PSI-BLAST: a new generation of protein
> > database search
> > > programs", Nucleic Acids Res. 25:3389-3402.
> > >
> > > Query= NP_006065
> > > (442 letters)
> > >
> > > Database: Homo_sapiens.NCBI36.apr.pep.fa
> > > 48,851 sequences; 23,910,368 total letters
> > >
> > > Searching..................................................done
> > >
> > >
> > Score
> > > E
> > > Sequences producing significant alignments:
> > (bits)
> > > Value
> > >
> > > ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...
> > > 120 3e-27
> > > ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...
> > > 120 3e-27
> > > ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...
> > > 115 8e-26
> > >
> > >> ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1
> > > gene:ENSG00000137397
> > > transcript:ENST00000357569
> > > Length = 425
> > >
> > > Score = 120 bits (301), Expect = 3e-27
> > > Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261
> > > (8%)
> > >
> > > Query: 9
> > IEKEVTCPICLELLTEPLSLDCGHSFCQACITAKIKESVIISRGESSCPVCQTRFQPGNL
> > > 68
> > > +++EV CPICL++L +P+++DCGH+FC CIT +I E+ S G
> > CP+C+T + +
> > > Sbjct: 10
> > LQEEVICPICLDILQKPVTIDCGHNFCLKCIT-QIGET---SCGFFKCPLCKTSVRKNAI
> > > 65
> > >
> > > Query: 69
> > RPNRHLANIVERVKEVKMSP-QEGQKRDVCEHHGKKLQIFCKEDGKVICWVCELSQEHQG
> > > 127
> > > R N L N+VE+++ ++ S Q +K C H + FC++DGK
> > +C+VC S++H+
> > > Sbjct: 66
> > RFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCCESKDHKS
> > > 125
> > >
> > > Query: 128
> > HQTFRINEVVKECQEKLQVALQRLIKEDQEAEKLED------DIRQERTAWKIERQKILK
> > > 181
> > > H I E + Q ++Q +Q L ++++E +++ D+ ++
> > + E+Q+IL
> > > Sbjct: 126
> > HNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQV--EHEKQRILT
> > > 183
> > >
> > > Query: 182
> > GFNEMRVILDNEEQRELQKL----EEGEVNVLDNLAAATDQLVQQRQDASTLISDLQRRL
> > > 237
> > > F + +L+ E+ L ++ EG +A+ QL D
> > L+ L+ +
> > > Sbjct: 184
> > EFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQL----NDLKKLVDSLKTKQ
> > > 239
> > >
> > > Query: 238 TGSSVEMLQDVIDVMKRSESW 258
> > > ++L+ + RSE +
> > > Sbjct: 240 NMPPRQLLEVTQPHLPRSEEF 260
> > >
> > >
> > >> ENSP00000327738 pep:known-ccds
> > > chromosome:NCBI36:4:189297592:189305643:1
> > > gene:ENSG00000184108 transcript:ENST00000332517
> > > CCDS3851.1
> > > Length = 468
> > >
> > > Score = 115 bits (289), Expect = 8e-26
> > > Identities = 101/410 (24%), Positives = 180/410 (43%),
> > Gaps = 39/410
> > > (9%)
> > >
> > > Query: 8
> > DIEKEVTCPICLELLTEPLSLDCGHSFCQACITAKIKESVIISRGESSCPVCQTRFQPGN
> > > 67
> > > ++ +E+TC ICL+ + P++ +CGHSFC C+ +E
> > SCP C + +
> > > Sbjct: 9
> > NLREELTCFICLDYFSSPVTTECGHSFCLVCLLRSWEE----HNTPLSCPECWRTLEGPH
> > > 64
> > >
> > > Query: 68
> > LRPNRHLANIVERVKEVKMSPQEGQKRDVCEHHGK-----KLQIFCKEDGKVICWVCELS
> > > 122
> > > + N L + ++++ Q Q D +G+ K ++
> > G ++
> > > Sbjct: 65
> > FQSNERLGRLASIARQLR--SQVLQSEDEQGSYGRMPTTAKALSDDEQGGSAF-----VA
> > > 117
> > >
> > > Query: 123
> > QEHQGHQTFRINEVVKECQEKLQVALQRLIKEDQEA------EKLEDDIRQERTAWKIER
> > > 176
> > > Q H ++ +E + +EKLQ L L +EA EK
> > + QE T K +
> > > Sbjct: 118
> > QSHGANRVHLSSEAEEHHREKLQEILNLLRVRRKEAQAVLTHEKERVKLCQEET--KTCK
> > > 175
> > >
> > > Query: 177
> > QKILKGFNEMRVILDNEEQRELQKLEEGEVNVLDNLAAATDQLVQQRQDASTLISDLQRR
> > > 236
> > > Q ++ + +M L EEQ +LQ LE+ E + L +L QQ +
> > S +I+ ++
> > > Sbjct: 176
> > QVVVSEYMKMHQFLKEEEQLQLQLLEQEEKENMRKLRNNEIKLTQQIRSLSKMIAQIESS
> > > 235
> > >
> > > Query: 237
> > LTGSSVEMLQDVIDVMKRSESWTXXXXXXXXXXXXXXFRVPDLSGMLQVLKELTDVQYYW
> > > 296
> > > S+ E L++V ++RSE + ++GM ++L++ +
> > > Sbjct: 236
> > SQSSAFESLEEVRGALERSE----PLLLQCPEATTTELSLCRITGMKEMLRKFS------
> > > 285
> > >
> > > Query: 297
> > VDVMLNPGSATSNVAISVDQRQVKTVRTCTFKNSNPCDF-SAFGVFGCQYFSSGKYYWEV
> > > 355
> > > ++ L+P +A + + +S D + VK + NP F + V G
> > Q F+SG++YWEV
> > > Sbjct: 286
> > TEITLDPATANAYLVLSEDLKSVKYGGSRQQLPDNPERFDQSATVLGTQIFTSGRHYWEV
> > > 345
> > >
> > > Query: 356 DVSGKIAWILGVHSKISSLNKRKSSGFAFDPSVNYSKVYSRYRPQYGYWV 405
> > > +V K W +G+ S + P +S + + Y WV
> > > Sbjct: 346 EVGNKTEWEVGICKDSVS----RKGNLPKPPGDLFSLIGLKIGDDYSLWV 391
> > >
> > >
> > > Database: Homo_sapiens.NCBI36.apr.pep.fa
> > > Posted date: Jun 15, 2006 8:56 PM
> > > Number of letters in database: 23,910,368
> > > Number of sequences in database: 48,851
> > >
> > > Lambda K H
> > > 0.319 0.133 0.398
> > >
> > > Gapped
> > > Lambda K H
> > > 0.267 0.0410 0.140
> > >
> > >
> > > Matrix: BLOSUM62
> > > Gap Penalties: Existence: 11, Extension: 1
> > > Number of Hits to DB: 20,900,506
> > > Number of Sequences: 48851
> > > Number of extensions: 899179
> > > Number of successful extensions: 6075
> > > Number of sequences better than 1.0e-25: 105
> > > Number of HSP's better than 0.0 without gapping: 18
> > > Number of HSP's successfully gapped in prelim test: 87
> > > Number of HSP's that attempted gapping in prelim test: 5632
> > > Number of HSP's gapped (non-prelim): 157
> > > length of query: 442
> > > length of database: 23,910,368
> > > effective HSP length: 107
> > > effective length of query: 335
> > > effective length of database: 18,683,311
> > > effective search space: 6258909185
> > > effective search space used: 6258909185
> > > T: 11
> > > A: 40
> > > X1: 16 ( 7.4 bits)
> > > X2: 38 (14.6 bits)
> > > X3: 64 (24.7 bits)
> > > S1: 41 (21.8 bits)
> > > S2: 289 (115.9 bits)
> > >
> > > *********** END BLAST REPORT FOR TEST CASE ***********
> > >
> > > Any ideas?
> > >
> > > Thanks,
> > >
> > > Bob
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list