[Bioperl-l] Parsing BLASTP or TBLASTN reveals subtle query_length = 0 bug
Matthew Vaughn
vaughn at cshl.org
Fri Jun 6 12:55:44 EDT 2003
I've got some large BLASTP and TBLASTN reports to extract data from, and
I've run into some issues that I think are coming from the
Bio::SearchIO::psiblast parser.
Essentially, instead of $result->query_length returning the length of
the query sequence, it is returning zero. The reports are coming from
the most recent BLAST release, but I've run into this same problem
parsing reports from a couple point releases back. I took a look at
the raw BLAST files and have uncovered a pattern that is illustrated in
the following 4 test cases. In each of the cases labeled 'FAILURE CASE',
there is a blank line after the Query description, before the length of
the query is given; both of these results return a query_length of 0.
Contrast this with the cases labeled 'SUCCESS CASE', where the length
line follows the description directly and the proper length is returned.
Presumably, the extra blank line is confusing the BLAST parser.
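To check how widespread the pattern is across a large report, a small scan like the one below could help. It is only a sketch, written in Python so that it stands alone without Bioperl. The function name scan_queries and the regular expression are my own assumptions about the header layout ("Query=" line, optional description continuation lines, then "(N letters)"), not anything taken from the Bio::SearchIO code.

```python
import re

# Assumed header layout: a "Query= ..." line, optional description
# continuation lines, possibly blank lines, then "(N letters)".
LETTERS = re.compile(r"^\s*\(([\d,]+) letters\)\s*$")

def scan_queries(lines):
    """Yield (query_id, length, blank_before_length) for each query header."""
    query_id = None      # id of the header we are currently inside, if any
    saw_blank = False    # True once a blank line appears inside that header
    for line in lines:
        if line.startswith("Query="):
            fields = line[len("Query="):].split()
            query_id = fields[0] if fields else ""
            saw_blank = False
        elif query_id is not None:
            m = LETTERS.match(line)
            if m:
                length = int(m.group(1).replace(",", ""))
                yield query_id, length, saw_blank
                query_id = None
            elif not line.strip():
                saw_blank = True
```

Run over an open report file, e.g. `for qid, length, blank in scan_queries(open(path)): ...` (with `path` being whatever report is at hand), the entries where the third field is true are the ones I would expect to come back with a query_length of 0, if the blank line really is the cause.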
-FAILURE CASE 1-
Query= At2g02830.1 68409.m00200 retroelement pol polyprotein -related

         (104 letters)
Database: athrep.ref
           457 sequences; 1,462,624 total letters
Searching.done
                                                   Score    E
Sequences producing significant alignments:        (bits) Value
ATCOPIA62_I                                          157  6e-41
ATCOPIA11I                                           102  4e-24
..
-FAILURE CASE 2-
Query= At2g04140.1 68409.m00353 retroelement pol polyprotein -related

         (88 letters)
Database: athrep.ref
           457 sequences; 1,462,624 total letters
Searching.done
                                                   Score    E
Sequences producing significant alignments:        (bits) Value
META1_I                                              179  1e-47
ATCOPIA28_I                                          177  6e-47
..
-SUCCESS CASE 1-
Query= At2g01022.1 68409.m00001 polyprotein, putative similar to
polyprotein [Ananas comosus] GI:2995405; contains Pfam profile
PF00078: Reverse transcriptase (RNA-dependent DNA polymerase)
         (660 letters)
Database: athrep.ref
           457 sequences; 1,462,624 total letters
Searching.done
                                                   Score    E
Sequences producing significant alignments:        (bits) Value
ATGP1I                                              1203  0.0
ATGP2I                                               870  0.0
..
-SUCCESS CASE 2-
Query= At2g03080.1 68409.m00227 reverse transcriptase -related
         (137 letters)
Database: athrep.ref
           457 sequences; 1,462,624 total letters
Searching.done
                                                   Score    E
Sequences producing significant alignments:        (bits) Value
META1_I                                              231  5e-63
ATCOPIA28_I                                          229  3e-62
..
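As a possible stopgap until the parser is fixed, the reports could be cleaned up before parsing by dropping any blank lines that sit directly in front of the "(N letters)" line. The sketch below is again in Python and independent of Bioperl. The helper name drop_blank_before_length is hypothetical, and I have not verified that feeding the cleaned report to Bio::SearchIO actually restores query_length; that rests on the assumption that the blank line is the whole problem.

```python
import re

# Matches the query length line, e.g. "         (104 letters)".
LETTERS = re.compile(r"^\s*\([\d,]+ letters\)\s*$")

def drop_blank_before_length(lines):
    """Return a copy of lines without the blank lines that directly
    precede a "(N letters)" line inside a "Query=" header.
    All other lines, including other blank lines, are kept in order."""
    out = []
    held = []            # blank lines seen in the header, not yet emitted
    in_header = False    # True between a "Query=" line and its length line
    for line in lines:
        if in_header and not line.strip():
            held.append(line)
            continue
        if in_header and LETTERS.match(line):
            held = []    # discard the blanks right before the length line
            in_header = False
        out.extend(held)  # blanks followed by anything else are restored
        held = []
        out.append(line)
        if line.startswith("Query="):
            in_header = True
    out.extend(held)
    return out
```

The cleaned lines can be written to a temporary file and that file handed to the parser in place of the original report. Headers that already look like the success cases pass through unchanged.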
--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
Delbruck Laboratory / Martienssen Group
1 Bungtown Road
Cold Spring Harbor, NY 11724
phone: (516) 422-4128