[Biopython-dev] Notification: incoming/43

biopython-bugs at bioperl.org
Thu Sep 27 05:22:02 EDT 2001


JitterBug notification

new message incoming/43

Message summary for PR#43
	From: mkersz at pasteur.fr
	Subject: GenBank parser fails (on large files?)
	Date: Thu, 27 Sep 2001 05:22:01 -0400
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From mkersz at pasteur.fr Thu Sep 27 05:22:02 2001
Received: from localhost (localhost [127.0.0.1])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f8R9M1p18288
	for <biopython-bugs at pw600a.bioperl.org>; Thu, 27 Sep 2001 05:22:01 -0400
Date: Thu, 27 Sep 2001 05:22:01 -0400
Message-Id: <200109270922.f8R9M1p18288 at pw600a.bioperl.org>
From: mkersz at pasteur.fr
To: biopython-bugs at bioperl.org
Subject: GenBank parser fails (on large files?)

Full_Name: Michel Kerszberg
Module: GenBank
Version: 1.00a3
OS: linux 2.2
Submission from: cache.pasteur.fr (157.99.64.13)


Fetch

ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Mycobacterium_tuberculosis_H37Rv/AL123456.gbk

and open it with:

from Bio import GenBank

file_handle = open( ... ,'r')
pars = GenBank.FeatureParser()
iter = GenBank.Iterator(file_handle, pars)
rec = iter.next()

This fails with:

rec = iter.next()
  File "/usr/lib/python2.0/site-packages/Bio/GenBank/__init__.py", line 182, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.0/site-packages/Bio/GenBank/__init__.py", line 260, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.0/site-packages/Bio/GenBank/__init__.py", line 1108, in feed
    self._parser.parseFile(handle)
  File "/usr/lib/python2.0/site-packages/Martel/Parser.py", line 205, in parseFile
    self.parseString(fileobj.read())
  File "/usr/lib/python2.0/site-packages/Martel/Parser.py", line 233, in parseString
    self._err_handler.fatalError(result)
  File "/var/tmp/python-root//usr/lib/python2.0/xml/sax/handler.py", line 38, in fatalError
Martel.Parser.ParserPositionException: error parsing at or beyond character 42

Character 42 is in the first line of the record, which seems
correctly formatted. No amount of massaging of the
file seems to help.
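
One way to narrow this down is to look at the raw characters around the
reported offset, since repr() makes tabs, carriage returns, and other
invisible characters visible. A minimal sketch, assuming the offset is
counted from the start of the record text; show_context is a hypothetical
helper, not part of Biopython or Martel, and the sample string is a
stand-in rather than the real record:

```python
def show_context(text, position, width=20):
    """Return repr() of the text just before and from a character offset.

    Hypothetical helper for illustration; not part of Biopython or Martel.
    """
    start = max(0, position - width)
    end = min(len(text), position + width)
    return repr(text[start:position]), repr(text[position:end])


# Stand-in text; in practice this would be the record read from the file.
sample = "0123456789" * 6
before, after = show_context(sample, 42, width=5)
print("before offset 42:", before)
print("from offset 42:  ", after)
```

Running the same helper on the first few hundred characters of the real
file might show whether the first line contains anything unexpected at or
just before offset 42.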

I have seen this problem reported with other large
GenBank records.




