[Bioperl-l] BLAST Parsing Bug?
Paul Boutros
pcboutro@engmail.uwaterloo.ca
Tue, 20 Aug 2002 14:18:21 -0400 (EDT)
Okay, same code, new BLAST report, new error. The new BLAST report was
run with these parameters:
-l (restricting it to a subset of GIs)
-v 10 (restrict to 10 hits)
-e 0.3 (expectation value)
The error seems to suggest that there is an empty line somewhere it
isn't expected (i.e. midline = "\n").
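One way to test that guess is to look at what actually surrounds the line named in the exception. Below is a minimal sketch in plain Perl, assuming the report is saved as a text file. The helper name context_around and its arguments are hypothetical and not part of BioPerl; it only prints the neighbouring lines with their line endings made visible.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the lines around the
# first line containing $marker, with \r and \n shown as visible text.
sub context_around {
    my ($lines, $marker, $radius) = @_;
    $radius = 2 unless defined $radius;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ /\Q$marker\E/;
        my $from = $i - $radius < 0        ? 0        : $i - $radius;
        my $to   = $i + $radius > $#$lines ? $#$lines : $i + $radius;
        my @out;
        for my $j ($from .. $to) {
            my $shown = $lines->[$j];
            $shown =~ s/\r/\\r/g;    # make carriage returns visible
            $shown =~ s/\n/\\n/g;    # make newlines visible
            push @out, sprintf("%d: [%s]", $j + 1, $shown);
        }
        return @out;
    }
    return;    # marker not found
}

# Usage: perl context.pl report.out
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $fh;    # keep \r\n intact on Windows
    my @lines = <$fh>;
    close $fh;
    print "$_\n" for context_around(\@lines, 'Subset of the database(s)');
}
```

A line shown as [\n] or [\r\n] directly before the marker would be consistent with the empty-line guess; this only inspects the file and does not change how the parser behaves.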
Any comments? I've followed the suggestion of testing this both with
ActiveState and by downloading the libraries and installing them directly.
I get the same results either way.
I also verified all the dependencies are present and up-to-date.
Paul
error:
------------- EXCEPTION -------------
MSG: no data for midline Subset of the database(s) listed below
STACK Bio::SearchIO::blast::next_result C:/Perl/site/lib/Bio/SearchIO/blast.pm:566
STACK toplevel blastp~1.pl:9
--------------------------------------
new blast file:
BLASTN 2.2.3 [Apr-24-2002]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= H3001A01-3(C0001A09-3)
(362 letters)
Database: est_others
5,032,538 sequences; 2,449,699,975 total letters
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
gb|BI292210.1|BI292210 UI-R-DN0-civ-m-09-0-UI.s1 UI-R-DN0 Rattus...   274  1e-072
gb|BF290726.1|BF290726 EST455317 Rat Gene Index, normalized rat,...   266  3e-070
gb|BI301460.1|BI301460 UI-R-DN0-cit-e-07-0-UI.s1 UI-R-DN0 Rattus...   260  2e-068
>gb|BI292210.1|BI292210 UI-R-DN0-civ-m-09-0-UI.s1 UI-R-DN0 Rattus
norvegicus cDNA clone
UI-R-DN0-civ-m-09-0-UI 3'
Length = 468
Score = 274 bits (138), Expect = 1e-072
Identities = 168/178 (94%)
Strand = Plus / Plus
Query: 1 aagatttatttatttattccatgtataggaatacactgtagctgtcttcagacacaccag 60
|||||||||||||||||| ||| ||| |||||||||||||||||||||||| |||||||
Sbjct: 20 aagatttatttatttattttatgcatatgaatacactgtagctgtcttcagatacaccag 79
Query: 61 aagagggcatcagatctcattgcagatggctgtgagccaccatgtggttgctgggatttg 120
||||||||||||||||| ||| ||||||| ||||||||||||||||||||||||||||||
Sbjct: 80 aagagggcatcagatcttattacagatggttgtgagccaccatgtggttgctgggatttg 139
Query: 121 aactcaggacctctggaagagcagtcggtgctcttaaccgctgagccatctctccagc 178
|||||||||||||||||||||||||| |||||||||||| ||||||||||||||||||
Sbjct: 140 aactcaggacctctggaagagcagtcagtgctcttaaccactgagccatctctccagc 197
>gb|BF290726.1|BF290726 EST455317 Rat Gene Index, normalized rat, Rattus
norvegicus cDNA
Rattus norvegicus cDNA clone RGIIB68 3' sequence
Length = 223
Score = 266 bits (134), Expect = 3e-070
Identities = 167/178 (93%)
Strand = Plus / Plus
Query: 1 aagatttatttatttattccatgtataggaatacactgtagctgtcttcagacacaccag 60
|||||||||||||||||| ||||||| || |||||||||||||||||||||||||||||
Sbjct: 1 aagatttatttatttatttcatgtatgtgagtacactgtagctgtcttcagacacaccag 60
Query: 61 aagagggcatcagatctcattgcagatggctgtgagccaccatgtggttgctgggatttg 120
|||||||||||||||| ||| | ||||| |||||| ||||||||||||||||||| |||
Sbjct: 61 aagagggcatcagatcccatcacggatggttgtgaggcaccatgtggttgctgggaattg 120
Query: 121 aactcaggacctctggaagagcagtcggtgctcttaaccgctgagccatctctccagc 178
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 aactcaggacctctggaagagcagtcggtgctcttaaccgctgagccatctctccagc 178
>gb|BI301460.1|BI301460 UI-R-DN0-cit-e-07-0-UI.s1 UI-R-DN0 Rattus
norvegicus cDNA clone
UI-R-DN0-cit-e-07-0-UI 3'
Length = 524
Score = 260 bits (131), Expect = 2e-068
Identities = 164/175 (93%)
Strand = Plus / Plus
Query: 4 atttatttatttattccatgtataggaatacactgtagctgtcttcagacacaccagaag 63
||||||||||||||| | |||| || |||| |||||||||||||||||||||||||||
Sbjct: 210 atttatttatttatttattatatatgagtacattgtagctgtcttcagacacaccagaag 269
Query: 64 agggcatcagatctcattgcagatggctgtgagccaccatgtggttgctgggatttgaac 123
|||||||||||||||||| ||||||| |||||||||||||||||||||||||||||||||
Sbjct: 270 agggcatcagatctcattacagatggttgtgagccaccatgtggttgctgggatttgaac 329
Query: 124 tcaggacctctggaagagcagtcggtgctcttaaccgctgagccatctctccagc 178
||||||||||||||||||||||| ||||||||||||||||||||| |||||||||
Sbjct: 330 tcaggacctctggaagagcagtcagtgctcttaaccgctgagccacctctccagc 384
Subset of the database(s) listed below
Number of letters searched: 123,827,604
Number of sequences searched: 285,629
Database: est_others
Posted date: Aug 15, 2002 12:08 PM
Number of letters in database: 333,332,922
Number of sequences in database: 0
Database: c:\docume~1\paul\blast\data\est_others.01
Posted date: Aug 15, 2002 12:21 PM
Number of letters in database: 333,333,126
Number of sequences in database: 734,123
Database: c:\docume~1\paul\blast\data\est_others.02
Posted date: Aug 15, 2002 12:33 PM
Number of letters in database: 333,332,951
Number of sequences in database: 710,185
Database: c:\docume~1\paul\blast\data\est_others.03
Posted date: Aug 15, 2002 12:45 PM
Number of letters in database: 333,332,998
Number of sequences in database: 651,575
Database: c:\docume~1\paul\blast\data\est_others.04
Posted date: Aug 15, 2002 12:56 PM
Number of letters in database: 333,332,826
Number of sequences in database: 637,159
Database: c:\docume~1\paul\blast\data\est_others.05
Posted date: Aug 15, 2002 1:07 PM
Number of letters in database: 333,333,104
Number of sequences in database: 630,795
Database: c:\docume~1\paul\blast\data\est_others.06
Posted date: Aug 15, 2002 1:19 PM
Number of letters in database: 333,332,943
Number of sequences in database: 650,535
Database: c:\docume~1\paul\blast\data\est_others.07
Posted date: Aug 15, 2002 1:28 PM
Number of letters in database: 116,369,105
Number of sequences in database: 227,351
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 49,566
Number of Sequences: 4241723
Number of extensions: 49566
Number of successful extensions: 21421
Number of sequences better than 0.3: 2335
length of query: 362
length of database: 123,827,604
effective HSP length: 18
effective length of query: 344
effective length of database: 118,686,282
effective search space: 40828081008
effective search space used: 40828081008
T: 0
A: 40
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 19 (38.2 bits)
BLASTN 2.2.3 [Apr-24-2002]
On Tue, 20 Aug 2002, Jason Stajich wrote:
> That is because the parser expects a full BLAST report; you are giving
> it a report that has hits but no HSPs.
>
> At some point we can adapt the module to parse these types of reports, but
> for now it is only going to work with reports that have the full
> alignments included.
>
> -jason
>
> On Tue, 20 Aug 2002, Paul Boutros wrote:
>
> > Hello,
> >
> > I am just starting with Bioperl, trying to evaluate how useful it will be
> > for our group. I'm struggling with getting it to work on my first few
> > steps here, though. I would like to use the SearchIO system to parse a
> > blast-results file and I get strange results.
> >
> > System: Win2k Pro (sp3)
> > Perl: 5.6.1 ActiveState build 631 (all packages are updated)
> > BioPerl: 1.00.2
> >
> > The basic problem is that the parser isn't finding any of the hits. At
> > all. So the code below comes back with $count=0 for every record in the
> > BLAST output file. Any ideas what I'm doing wrong?
> >
> > Paul
> >
> >
> > Code:
> > use strict;
> > use Bio::SearchIO;
> > # Open the BLAST report with the SearchIO parser
> > my $searchio = Bio::SearchIO->new(
> >     -format => 'blast',
> >     -file   => '15k5prime.out',
> > );
> > # Count the hits reported for each query in the file
> > while (my $result = $searchio->next_result()) {
> >
> >     my $count = 0;
> >
> >     print "Name: ", $result->query_name(), "\n";
> >
> >     while (my $hit = $result->next_hit()) {
> >         $count++;
> >     }
> >
> >     print "Count: $count\n";
> >
> > }
> >
> > Blast File Fragment:
> >
> > BLASTN 2.2.3 [Apr-24-2002]
> >
> >
> > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> > "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> > programs", Nucleic Acids Res. 25:3389-3402.
> >
> > Query= H3001A01-5
> > (589 letters)
> >
> > Database: est_others
> > 5,032,538 sequences; 2,449,699,975 total letters
> >
> >
> >
> >                                                                  Score    E
> > Sequences producing significant alignments:                      (bits) Value
> >
> > gb|BQ206993.1|BQ206993 UI-R-DZ1-cnm-h-16-0-UI.s1 UI-R-DZ1 Rattus...   200  3e-050
> > gb|BM386877.1|BM386877 UI-R-CN1-cjh-d-20-0-UI.s1 UI-R-CN1 Rattus...   198  1e-049
> > gb|BI301905.1|BI301905 UI-R-DL0-cio-k-03-0-UI.s1 UI-R-DL0 Rattus...   198  1e-049
> > gb|BI301460.1|BI301460 UI-R-DN0-cit-e-07-0-UI.s1 UI-R-DN0 Rattus...   198  1e-049
> > gb|BG371847.1|BG371847 UI-R-CV0-brj-a-09-0-UI.s1 UI-R-CV0 Rattus...   198  1e-049
> > gb|BE115424.1|BE115424 UI-R-BS1-axu-f-02-0-UI.s1 UI-R-BS1 Rattus...   198  1e-049
> > gb|AA819696.1|AA819696 UI-R-A0-bh-d-10-0-UI.s1 UI-R-A0 Rattus no...   192  6e-048
> > gb|BM383271.1|BM383271 UI-R-DS0-cje-i-16-0-UI.s1 UI-R-DS0 Rattus...   190  2e-047
> > gb|BI292210.1|BI292210 UI-R-DN0-civ-m-09-0-UI.s1 UI-R-DN0 Rattus...   190  2e-047
> > gb|BI284655.1|BI284655 UI-R-DE0-cac-f-05-0-UI.s1 UI-R-DE0 Rattus...   190  2e-047
> >
> > Subset of the database(s) listed below
> > Number of letters searched: 123,827,604
> > Number of sequences searched: 285,629
> >
> > Database: est_others
> > Posted date: Aug 15, 2002 12:08 PM
> > Number of letters in database: 333,332,922
> > Number of sequences in database: 0
> >
> > Database: c:\docume~1\paul\blast\data\est_others.01
> > Posted date: Aug 15, 2002 12:21 PM
> > Number of letters in database: 333,333,126
> > Number of sequences in database: 734,123
> >
> > Database: c:\docume~1\paul\blast\data\est_others.02
> > Posted date: Aug 15, 2002 12:33 PM
> > Number of letters in database: 333,332,951
> > Number of sequences in database: 710,185
> >
> > Database: c:\docume~1\paul\blast\data\est_others.03
> > Posted date: Aug 15, 2002 12:45 PM
> > Number of letters in database: 333,332,998
> > Number of sequences in database: 651,575
> >
> > Database: c:\docume~1\paul\blast\data\est_others.04
> > Posted date: Aug 15, 2002 12:56 PM
> > Number of letters in database: 333,332,826
> > Number of sequences in database: 637,159
> >
> > Database: c:\docume~1\paul\blast\data\est_others.05
> > Posted date: Aug 15, 2002 1:07 PM
> > Number of letters in database: 333,333,104
> > Number of sequences in database: 630,795
> >
> > Database: c:\docume~1\paul\blast\data\est_others.06
> > Posted date: Aug 15, 2002 1:19 PM
> > Number of letters in database: 333,332,943
> > Number of sequences in database: 650,535
> >
> > Database: c:\docume~1\paul\blast\data\est_others.07
> > Posted date: Aug 15, 2002 1:28 PM
> > Number of letters in database: 116,369,105
> > Number of sequences in database: 227,351
> >
> > Lambda K H
> > 1.37 0.711 1.31
> >
> > Gapped
> > Lambda K H
> > 1.37 0.711 1.31
> >
> >
> > Matrix: blastn matrix:1 -3
> > Gap Penalties: Existence: 5, Extension: 2
> > Number of Hits to DB: 69,708
> > Number of Sequences: 4241723
> > Number of extensions: 69708
> > Number of successful extensions: 25280
> > Number of sequences better than 0.3: 2163
> > length of query: 589
> > length of database: 123,827,604
> > effective HSP length: 18
> > effective length of query: 571
> > effective length of database: 118,686,282
> > effective search space: 67769867022
> > effective search space used: 67769867022
> > T: 0
> > A: 40
> > X1: 6 (11.9 bits)
> > X2: 15 (29.7 bits)
> > S1: 12 (24.3 bits)
> > S2: 19 (38.2 bits)
> > BLASTN 2.2.3 [Apr-24-2002]
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
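Jason's point in the quoted reply, that the parser needs reports with the full alignments, suggests a quick pre-check before handing a file to SearchIO. The following is only a sketch in plain Perl under stated assumptions: the input is the raw text of a single NCBI BLAST report, and hit IDs are pipe-delimited like gb|BI292210.1|BI292210. The function name count_hits_and_alignments is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical pre-check (not part of BioPerl): count the one-line hit
# summaries and the alignment sections in the text of a single report.
sub count_hits_and_alignments {
    my ($text) = @_;
    my ($summaries, $alignments, $in_summary) = (0, 0, 0);
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^Sequences producing significant alignments/) {
            $in_summary = 1;    # summary table starts here
        }
        elsif ($line =~ /^>/) {
            $in_summary = 0;    # first alignment header ends the summary
            $alignments++;
        }
        elsif ($line =~ /^\s*(?:Database:|Subset of the database)/) {
            $in_summary = 0;    # trailer reached
        }
        elsif ($in_summary && $line =~ /^\S+\|\S/) {
            $summaries++;       # assumes IDs of the form gb|ACC|NAME
        }
    }
    return ($summaries, $alignments);
}

# Usage: perl precheck.pl report.out
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my ($n_sum, $n_aln) = count_hits_and_alignments($text);
    print "summary lines: $n_sum, alignment sections: $n_aln\n";
    print "Hits are listed but no alignments are included\n"
        if $n_sum && !$n_aln;
}
```

A report like the first one in this thread would give ten summary lines and zero alignment sections, which matches the case Jason describes as unsupported.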