[Biopython-dev] BUG: blastparser: expect(2)
thomas at cbs.dtu.dk
Fri Aug 11 07:47:38 EDT 2000
Hi,
The blastparser fails when reading a blastall result that was produced with the "-g F" option
(-g  Perform gapped alignment (not available with tblastx) [T/F], default = T).
"Expect(2)" means that there are 2 alignments for the same Sbjct (see the blast file below).
c ya
-thomas
example code
##############################################
from Bio.Blast import NCBIStandalone
from Bio.Data import IUPACData
file = 'test.blastn'
parser = NCBIStandalone.BlastParser()
iter = NCBIStandalone.Iterator(handle = open(file), parser = parser)
while 1:
    rec = iter.next()
    if not rec: break
#############
results in:
##############################################
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 587, in _parse
    dh.score = _safe_int(dh.score)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 1469, in _safe_int
    return long(str)
ValueError: invalid literal for long(): 5e-45
#########
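A note on the traceback: the value that fails to convert, 5e-45, is the E value from the one-line descriptions, and the conversion is applied to the score field. That would be consistent with the extra "N" column in the ungapped report shifting the fields by one, although I have not checked this against the parser source. A minimal sketch of splitting such a line from the right, assuming the caller already knows whether the table header ended in "N" (split_description_line and has_n_column are made-up names for illustration, not part of Bio.Blast.NCBIStandalone):

```python
def split_description_line(line, has_n_column):
    """Split a one-line description into (title, score, e_value, n).

    Hypothetical helper, not Biopython's implementation.
    has_n_column: True when the table header ends in "N", as in the
    report from this message. n is None when the column is absent.
    """
    fields = line.split()
    n = None
    if has_n_column:
        # Last column is the number of alignments for this subject.
        n = int(fields.pop())
    # Working from the right: E value first, then the bit score.
    e_value = float(fields.pop())
    score = int(fields.pop())
    # Whatever is left is the (possibly truncated) title.
    return " ".join(fields), score, e_value, n


line = "ENST00000022209 Gene:ENSG00000020685 Clone:AC012263 Cont...   153  5e-45   2"
title, score, e_value, n = split_description_line(line, has_n_column=True)
```

Calling the same function with has_n_column=False on that line raises a ValueError when it tries to turn "5e-45" into an integer, which mirrors the failure shown above.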
the blast file:
##############################################
BLASTN 2.0.14 [Jun-29-2000]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= HUMAGCGB
(100 letters)
Database: ./ensembl.cdna
37,720 sequences; 24,543,038 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value N
ENST00000022209 Gene:ENSG00000020685 Clone:AC012263 Cont... 153 5e-45 2
ENST00000008890 Gene:ENSG00000008430 Clone:AC007637 Cont... 28 13 1
>ENST00000022209 Gene:ENSG00000020685 Clone:AC012263 Contig:AC012263.00001
Length = 2673
Score = 46.1 bits (23), Expect(2) = 5e-45
Identities = 23/23 (100%)
Strand = Plus / Plus
Query: 1 atggagaccgtggtttgcccaag 23
|||||||||||||||||||||||
Sbjct: 1742 atggagaccgtggtttgcccaag 1764
Score = 153 bits (77), Expect(2) = 5e-45
Identities = 77/77 (100%)
Strand = Plus / Plus
Query: 24 gccctgggaagagaggcggaaacggagaagcctttccagtgaccgtgggaggacaaccca 83
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1764 gccctgggaagagaggcggaaacggagaagcctttccagtgaccgtgggaggacaaccca 1823
Query: 84 ttcaccatatgaggaac 100
|||||||||||||||||
Sbjct: 1824 ttcaccatatgaggaac 1840
>ENST00000008890 Gene:ENSG00000008430 Clone:AC007637 Contig:AC007637.00001
Length = 1530
Score = 28.2 bits (14), Expect = 13
Identities = 14/14 (100%)
Strand = Plus / Plus
Query: 26 cctgggaagagagg 39
||||||||||||||
Sbjct: 57 cctgggaagagagg 70
Database: ./ensembl.cdna
Posted date: Aug 3, 2000 1:07 PM
Number of letters in database: 24,543,038
Number of sequences in database: 37,720
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Number of Hits to DB: 3
Number of Sequences: 37720
Number of extensions: 3
Number of successful extensions: 3
Number of sequences better than 10.0: 2
length of query: 100
length of database: 24,543,038
effective HSP length: 16
effective length of query: 84
effective length of database: 23,939,518
effective search space: 2010919512
effective search space used: 2010919512
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 10 (19.8 bits)
S1: 12 (24.3 bits)
S2: 14 (28.2 bits)
BLASTN 2.0.14 [Jun-29-2000]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= HUMAGCGB
(100 letters)
Database: ./ensembl.cdna
37,720 sequences; 24,543,038 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value N
ENST00000022209 Gene:ENSG00000020685 Clone:AC012263 Cont... 153 5e-45 2
ENST00000008890 Gene:ENSG00000008430 Clone:AC007637 Cont... 28 13 1
>ENST00000022209 Gene:ENSG00000020685 Clone:AC012263 Contig:AC012263.00001
Length = 2673
Score = 46.1 bits (23), Expect(2) = 5e-45
Identities = 23/23 (100%)
Strand = Plus / Plus
Query: 1 atggagaccgtggtttgcccaag 23
|||||||||||||||||||||||
Sbjct: 1742 atggagaccgtggtttgcccaag 1764
Score = 153 bits (77), Expect(2) = 5e-45
Identities = 77/77 (100%)
Strand = Plus / Plus
Query: 24 gccctgggaagagaggcggaaacggagaagcctttccagtgaccgtgggaggacaaccca 83
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1764 gccctgggaagagaggcggaaacggagaagcctttccagtgaccgtgggaggacaaccca 1823
Query: 84 ttcaccatatgaggaac 100
|||||||||||||||||
Sbjct: 1824 ttcaccatatgaggaac 1840
>ENST00000008890 Gene:ENSG00000008430 Clone:AC007637 Contig:AC007637.00001
Length = 1530
Score = 28.2 bits (14), Expect = 13
Identities = 14/14 (100%)
Strand = Plus / Plus
Query: 26 cctgggaagagagg 39
||||||||||||||
Sbjct: 57 cctgggaagagagg 70
Database: ./ensembl.cdna
Posted date: Aug 3, 2000 1:07 PM
Number of letters in database: 24,543,038
Number of sequences in database: 37,720
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Number of Hits to DB: 3
Number of Sequences: 37720
Number of extensions: 3
Number of successful extensions: 3
Number of sequences better than 10.0: 2
length of query: 100
length of database: 24,543,038
effective HSP length: 16
effective length of query: 84
effective length of database: 23,939,518
effective search space: 2010919512
effective search space used: 2010919512
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 10 (19.8 bits)
S1: 12 (24.3 bits)
S2: 14 (28.2 bits)
########
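The HSP score lines in the file above come in two forms, "Expect = 13" and "Expect(2) = 5e-45". A rough sketch of a pattern that accepts both, assuming the line layout is exactly as shown in this report (parse_score_line is a made-up name for illustration and is not how NCBIStandalone does it):

```python
import re

# Hypothetical pattern, written against the score lines in this report only.
# The "(n)" after Expect is optional; when present it is the number of
# alignments for the same Sbjct.
_SCORE_LINE = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits\s*\((?P<score>\d+)\),\s*"
    r"Expect(?:\((?P<count>\d+)\))?\s*=\s*(?P<expect>[^\s,]+)"
)


def parse_score_line(line):
    """Return (bits, score, expect, count) for one HSP score line."""
    match = _SCORE_LINE.search(line)
    if match is None:
        raise ValueError("not a score line: %r" % line)
    count = match.group("count")
    return (
        float(match.group("bits")),
        int(match.group("score")),
        float(match.group("expect")),
        int(count) if count else 1,  # plain "Expect" means a single alignment
    )


gapless = parse_score_line(" Score = 46.1 bits (23), Expect(2) = 5e-45")
single = parse_score_line(" Score = 28.2 bits (14), Expect = 13")
```

Here gapless is (46.1, 23, 5e-45, 2) and single is (28.2, 14, 13.0, 1).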
--
Sicheritz Ponten Thomas E. CBS, Department of Biotechnology
blippblopp at linux.nu The Technical University of Denmark
CBS: +45 45 252485 Building 208, DK-2800 Lyngby
Fax +45 45 931585 http://www.cbs.dtu.dk/thomas/index.html
De Chelonian Mobile ... The Turtle Moves ...