[BioPython] GenBank records again
Andreas Kuntzagk
andreas.kuntzagk at mdc-berlin.de
Thu Feb 27 09:17:25 EST 2003
Hi,
> Thank you very much for the GenBank record things. Now I am trying to
> retrieve protein sequences with a file of GenBank ids. My script is the following:
>
> from Bio import GenBank
> import sys
>
> file = sys.argv[1]
> fp1 = open(file, 'r+') #file of gi
> ids = fp1.read()
>
> lids = ids.split()
> recNum = len(lids)
>
> protein_ncbi_dict = GenBank.NCBIDictionary(database='protein',
> format='gp', parser=GenBank.FeatureParser())
>
> for i in range(0, recNum):
>     gb_record = protein_ncbi_dict[lids[i]]
>     print '>'+ gb_record.id[0:-2] + ' ' + gb_record.seq.data
>
> The script works well most of the time, but sometimes it gives an error
> message:
>
> Traceback (most recent call last):
>   File "getGBRecords.py", line 25, in ?
>     gb_record = protein_ncbi_dict[lids[i]]
>   File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1563, in __getitem__
>     return self.parser.parse(handle)
>   File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1255, in feed
>     self._parser.parseFile(handle)
>   File "/bio/python2.2/lib/python2.2/site-packages/Martel/Parser.py", line 338, in parseFile
>     self.parseString(fileobj.read())
>   File "/bio/python2.2/lib/python2.2/site-packages/Martel/Parser.py", line 366, in parseString
>     self._err_handler.fatalError(result)
>   File "/bio/python2.2/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
>     raise exception
> Martel.Parser.ParserPositionException: error parsing at or beyond character 14
>
>
> What is the reason for the problem? It seems that the problem is in the
> parser part, but I just don't know why. Can anybody help?
It would probably help if you could post the ids for which this happens.
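One way to find those ids is to wrap each lookup in a try/except and collect the ones that fail. This is only a sketch: find_failing_ids is a made-up helper name, not part of Biopython, and lookup stands for whatever dictionary-like object you use (protein_ncbi_dict in your script). The stand-in dictionary in the usage lines is invented for illustration.

```python
def find_failing_ids(ids, lookup):
    """Split ids into those that can be fetched and those that raise.

    Hypothetical helper, not part of Biopython. `lookup` is any object
    that supports lookup[some_id], e.g. a dictionary-like record source.
    Returns (good, bad): good maps id -> record, bad is a list of
    (id, exception class name) pairs in the order they were tried.
    """
    good = {}
    bad = []
    for gi in ids:
        try:
            good[gi] = lookup[gi]
        except Exception as exc:  # parse errors, missing ids, network errors
            bad.append((gi, type(exc).__name__))
    return good, bad


# Usage with an invented stand-in for the real record source:
demo_source = {"111": "record-111", "333": "record-333"}
good, bad = find_failing_ids(["111", "222", "333"], demo_source)
for gi, error_name in bad:
    print(gi + " failed with " + error_name)
```

Running the whole id file through something like this once should give a short list of ids that can be posted to the list.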
You could also use
parser=GenBank.FeatureParser(debug=2)
This gives some information about where the parser chokes. (It is quite
noisy, though, and not easy to understand.)
I think "character 14" means the problem is somewhere near the beginning
of the entry.
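If you can get hold of the raw text of a failing entry, printing the text around the reported offset may show what the parser tripped over. A minimal sketch, assuming the raw record is already available as a string (how you obtain it is up to you). show_context is a hypothetical helper name, and the sample string is made up, not a real entry.

```python
def show_context(text, pos, width=20):
    """Return the text around character offset `pos`.

    Hypothetical helper: a <<HERE>> marker is placed at `pos`, with up
    to `width` characters of context on either side.
    """
    start = max(0, pos - width)
    end = min(len(text), pos + width)
    return text[start:pos] + "<<HERE>>" + text[pos:end]


# Invented sample text, standing in for the start of a real entry:
sample = "LOCUS       EXAMPLE_ENTRY   100 aa\nDEFINITION  made-up entry.\n"
# repr() makes newlines and other odd characters visible.
print(repr(show_context(sample, 14)))
```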
Ciao, Andreas