blastn

thomas at cbs.dtu.dk
Sat Aug 5 05:05:11 EDT 2000


Full_Name: thomas sichertiz-ponten
Module: Blast/NCBIStandalone
Version: 
OS: linux, IRIX
Submission from: molev106.ebc.uu.se (130.238.82.106)


Problem:
Cannot parse a file containing multiple blastn results, possibly because the
parser expects a hardcoded amount of whitespace?

#script .....

import sys, os
sys.path.insert(0, os.path.expanduser('~thomas/cbs/python/biopython'))
from Bio.Blast import NCBIStandalone
from Bio.Data import IUPACData


file = 'blasttest.blastn'
parser = NCBIStandalone.BlastParser()
iter = NCBIStandalone.Iterator(handle = open(file), parser = parser)

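# Read records one at a time; the traceback below is raised from iter.next().
while 1:
    res = iter.next()
    if res is None:   # next() returns None once there are no more records
        break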
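
To narrow down which of the concatenated reports triggers the error, the file
could be split at the banner lines and each piece parsed on its own. The
following is only a minimal sketch: it assumes every report begins with a
"BLASTN <version>" banner line, as in the attached file. split_reports is a
hypothetical helper, not part of Biopython, and it is written for current
Python rather than the interpreter used above.

```python
import re

# Assumption: each report starts with a banner such as
# "BLASTN 2.0.14 [Jun-29-2000]", as seen in blasttest.blastn.
BANNER_RE = re.compile(r'^BLASTN \d')


def split_reports(text):
    """Split concatenated blastn output into one string per report.

    Any lines that precede the first banner are kept with the first report.
    """
    reports = []
    current = []
    for line in text.splitlines(True):  # True keeps the line endings
        # A banner line closes the report collected so far.
        if BANNER_RE.match(line) and current:
            reports.append(''.join(current))
            current = []
        current.append(line)
    if current:
        reports.append(''.join(current))
    return reports
```

Each returned string could then be handed to the parser separately, so that a
failure points at a single report instead of the whole file.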

---- SNIP ----- SNIP ------
# result 
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/tmp/python-Oq3ztf", line 18, in ?
    res = iter.next()
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 1199, in next
    return self._parser.parse(File.StringHandle(data))
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 463, in parse
    self._scanner.feed(handle, self._consumer)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 68, in feed
    self._scan_rounds(uhandle, consumer)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 121, in _scan_rounds
    self._scan_alignments(uhandle, consumer)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 226, in _scan_alignments
    self._scan_pairwise_alignments(uhandle, consumer)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 236, in _scan_pairwise_alignments
    self._scan_one_pairwise_alignment(uhandle, consumer)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 241, in _scan_one_pairwise_alignment
    self._scan_alignment_header(uhandle, consumer)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 267, in _scan_alignment_header
    read_and_call(uhandle, consumer.noevent, start='          ')
  File "/home/genome6/thomas/cbs/python/biopython/Bio/ParserSupport.py", line 140, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with '          ':
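
The traceback suggests that the scanner requires a line in the alignment
header to begin with a fixed run of spaces. As a minimal sketch of the
difference between such a fixed-prefix check and a whitespace-tolerant one:
the function names below are hypothetical, the prefix is assumed to be the
ten spaces shown in the traceback, and none of this is the actual
NCBIStandalone code. It is written for current Python.

```python
import re

# Assumption: the hardcoded prefix is ten spaces, as it appears in the traceback.
FIXED_PREFIX = ' ' * 10


def starts_with_fixed_prefix(line):
    """Strict check: the line must begin with the hardcoded run of spaces."""
    return line.startswith(FIXED_PREFIX)


# Tolerant check: any amount of leading whitespace before "Length = <n>".
LENGTH_RE = re.compile(r'^\s*Length\s*=\s*([\d,]+)')


def parse_length(line):
    """Return the sequence length as an int, or None if this is not a Length line."""
    match = LENGTH_RE.match(line)
    if match is None:
        return None
    # Remove thousands separators such as "24,543,038" before converting.
    return int(match.group(1).replace(',', ''))
```

With the strict check, a blank line or a line indented by fewer spaces is
rejected, which is the kind of failure reported above. The tolerant check
accepts a "Length =" line regardless of how far it is indented.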


--- SNIP --- SNIP -----
#blasttest.blastn
BLASTN 2.0.14 [Jun-29-2000]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= M15353
         (100 letters)

Database: ensembl.cdna
           37,720 sequences; 24,543,038 total letters

Searching..................................................done

                                                               Score     E
Sequences producing significant alignments:                    (bits)  Value  N

ENST00000044731 Gene:ENSG00000041402 Clone:AC060233 Cont...   182  4e-46  1
ENST00000041234 Gene:ENSG00000038511 Clone:AC015993 Cont...   163  3e-40  1

>ENST00000044731 Gene:ENSG00000041402 Clone:AC060233 Contig:AC060233.00036
          Length = 654

 Score =  182 bits (92), Expect = 4e-46
 Identities = 98/100 (98%)
 Strand = Plus / Plus

                                                                       
Query: 1   atggcgactgtcgaaccggaaaccacccctactcctaatcccccgactacagaagaggag 60
           |||||||| ||||||||||||||||||||||||||||||||||||||||||||| |||||
Sbjct: 1   atggcgaccgtcgaaccggaaaccacccctactcctaatcccccgactacagaaaaggag 60

                                                   
Query: 61  aaaacggaatctaatcaggaggttgctaacccagaacact 100
           ||||||||||||||||||||||||||||||||||||||||
Sbjct: 61  aaaacggaatctaatcaggaggttgctaacccagaacact 100


>ENST00000041234 Gene:ENSG00000038511 Clone:AC015993 Contig:AC015993.00011
          Length = 361

 Score =  163 bits (82), Expect = 3e-40
 Identities = 82/82 (100%)
 Strand = Plus / Plus

                                                                       
Query: 19  gaaaccacccctactcctaatcccccgactacagaagaggagaaaacggaatctaatcag 78
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1   gaaaccacccctactcctaatcccccgactacagaagaggagaaaacggaatctaatcag 60

                                 
Query: 79  gaggttgctaacccagaacact 100
           ||||||||||||||||||||||
Sbjct: 61  gaggttgctaacccagaacact 82


  Database: ensembl.cdna
    Posted date:  Aug 3, 2000  1:07 PM
  Number of letters in database: 24,543,038
  Number of sequences in database:  37,720
  
Lambda     K      H
    1.37    0.711     1.31 


Matrix: blastn matrix:1 -3
Number of Hits to DB: 2
Number of Sequences: 37720
Number of extensions: 2
Number of successful extensions: 2
Number of sequences better than 10.0: 2
length of query: 100
length of database: 24,543,038
effective HSP length: 16
effective length of query: 84
effective length of database: 23,939,518
effective search space: 2010919512
effective search space used: 2010919512
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 10 (19.8 bits)
S1: 12 (24.3 bits)
S2: 14 (28.2 bits)
BLASTN 2.0.14 [Jun-29-2000]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= X76013
         (100 letters)

Database: ensembl.cdna
           37,720 sequences; 24,543,038 total letters

Searching..................................................done

                                                               Score     E
Sequences producing significant alignments:                    (bits)  Value  N

ENST00000040999 Gene:ENSG00000038136 Clone:AC016581 Cont...    34  0.20  1

>ENST00000040999 Gene:ENSG00000038136 Clone:AC016581 Contig:AC016581.00002
          Length = 438

 Score = 34.2 bits (17), Expect = 0.20
 Identities = 17/17 (100%)
 Strand = Plus / Plus

                           
Query: 38 tcggcctgagcgagcag 54
          |||||||||||||||||
Sbjct: 29 tcggcctgagcgagcag 45


  Database: ensembl.cdna
    Posted date:  Aug 3, 2000  1:07 PM
  Number of letters in database: 24,543,038
  Number of sequences in database:  37,720
  
Lambda     K      H
    1.37    0.711     1.31 


Matrix: blastn matrix:1 -3
Number of Hits to DB: 2
Number of Sequences: 37720
Number of extensions: 2
Number of successful extensions: 2
Number of sequences better than 10.0: 1
length of query: 100
length of database: 24,543,038
effective HSP length: 16
effective length of query: 84
effective length of database: 23,939,518
effective search space: 2010919512
effective search space used: 2010919512
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 10 (19.8 bits)
S1: 12 (24.3 bits)
S2: 14 (28.2 bits)
BLASTN 2.0.14 [Jun-29-2000]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= U66617
         (100 letters)

Database: ensembl.cdna
           37,720 sequences; 24,543,038 total letters

Searching..................................................done

                                                               Score     E
Sequences producing significant alignments:                    (bits)  Value  N

ENST00000038861 Gene:ENSG00000036360 Clone:AC025361 Cont...   198  6e-51  1
ENST00000010117 Gene:ENSG00000007819 Clone:AL031228 Cont...    32  0.81  1

>ENST00000038861 Gene:ENSG00000036360 Clone:AC025361 Contig:AC025361.00005
          Length = 605

 Score =  198 bits (100), Expect = 6e-51
 Identities = 100/100 (100%)
 Strand = Plus / Plus

                                                               
====> MESSAGE TRUNCATED AT 8192 <====




