[BioPython] Cannot parse ApE plasmid editor GenBank file
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Mon Jun 25 14:31:49 UTC 2007
Hi Peter,
I have retried the current CVS version of Biopython with a file regenerated
by the fixed version of the ApE editor. Unfortunately, I got:
$ python generate_image_from_genbank.py
Traceback (most recent call last):
  File "generate_image_from_genbank.py", line 7, in ?
    genbank_entry = parser.parse(fhandle)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 187, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 876, in _feed_first_line
    raise SyntaxError('Did not recognise the LOCUS line layout:\n' + line)
SyntaxError: Did not recognise the LOCUS line layout:
LOCUS pBL-RLuc-GBB+3-III 5391 bp ds-DNA circular 14-JUN-2007
What's wrong with the LOCUS line now? BioPerl from CVS can read it, and
I thought it was already following the current spec. ;-)
Thanks for your help,
Martin
Peter wrote:
> Martin MOKREJŠ wrote:
>> Hi Peter, Chris and others, here I am passing the answer from Wayne
>> back, sorry for the difficult cross-communication.
>
> Thank you both, Martin & Wayne.
>
> Wayne Davis wrote:
>> [the] locus line I'm using is the old standard (some older parsers
>> wanted it that way).
>
> That's worth knowing - thank you. Given that, maybe we (Biopython)
> should try to parse these files (which, aside from the missing
> identifier in the LOCUS line, should be fairly simple). On the other
> hand, I doubt many people still use this particular old format.
>
> Wayne Davis wrote:
>>> I've updated to write the new standard, if your
>>> program isn't flexible enough to read the old style locus lines.
>
> That's good news. Martin - will this solve your problem, or do you
> think we should also update Biopython to cope with these "old style"
> LOCUS lines (which also lack identifiers)?
>
> Wayne Davis wrote:
>>> We encourage software developers to switch to a token-based LOCUS
>>> parsing approach, rather than a column-specific approach. If this
>>> is done, then future changes to the LOCUS line that affect only the
>>> spacing of its data values will not require any modifications to
>>> software.
>
> Easier said than done, as some fields can also contain white space.
> However, Howard Salis has some interesting code to tackle this attached
> to Biopython bug 2294.
>
> Peter wrote:
>>> The next six lines of that example file (elh/pNEX3.gb) have no
>>> values - as Chris Fields pointed out on the Biopython mailing list,
>>> the NCBI likes to use a dot/period as a place holder.
>>>
>>> The spec does explicitly say that the KEYWORDS can be omitted, but
>>> seems to assume the other lines are expected. Biopython should be
>>> happy if these lines are just omitted.
>
> Just to correct myself, many of those fields are described as mandatory
> single entries further up in the documentation - so using a dot/period
> (as Wayne has done for the ApE plasmid editor) does seem the best solution.
>
> Quoting: ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
>> 3.4.2 Entry Organization
>> ...
>> The following is a brief description of each entry field. Detailed
>> information about each field may be found in Sections 3.4.4 to 3.4.15.
>>
>> LOCUS ... Mandatory keyword/exactly one record. DEFINITION ...
>> Mandatory keyword/one or more records. ACCESSION ... Mandatory
>> keyword/one or more records. VERSION... Mandatory keyword/exactly one
>> record. ...
>
> KEYWORDS, SOURCE and ORGANISM are described as mandatory in all annotated
> entries (so not mandatory in general). COMMENT is optional.
>
> Peter
>
>
>
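For illustration, the token-based LOCUS parsing recommended in the quoted
text could look roughly like the sketch below. This is not Biopython's
Scanner code and not the code attached to bug 2294; the function name
parse_locus_tokens, the field names and the classification rules are all
assumptions made for this example. It also assumes that no field contains
white space, which, as Peter notes, does not hold in general.

```python
import re

# Hypothetical helper, for illustration only (not part of Biopython).
# Assumes every field is a single whitespace-free token.
DATE_RE = re.compile(r"^\d{2}-[A-Z]{3}-\d{4}$")
TOPOLOGIES = ("linear", "circular")


def parse_locus_tokens(line):
    """Split a LOCUS line on white space and classify each token."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)

    name, length, unit = tokens[1], tokens[2], tokens[3]
    if not length.isdigit() or unit not in ("bp", "aa"):
        raise ValueError("Expected '<length> bp' or '<length> aa': %r" % line)

    fields = {
        "name": name,
        "length": int(length),
        "unit": unit,
        "molecule": None,
        "topology": None,
        "division": None,
        "date": None,
    }
    # The remaining tokens are recognised by their shape, not their column.
    for token in tokens[4:]:
        if DATE_RE.match(token):
            fields["date"] = token
        elif token.lower() in TOPOLOGIES:
            fields["topology"] = token.lower()
        elif "DNA" in token.upper() or "RNA" in token.upper():
            fields["molecule"] = token
        elif len(token) == 3 and token.isupper():
            fields["division"] = token
        else:
            raise ValueError("Unrecognised LOCUS token: %r" % token)
    return fields


locus = parse_locus_tokens(
    "LOCUS pBL-RLuc-GBB+3-III 5391 bp ds-DNA circular 14-JUN-2007"
)
```

On the LOCUS line from the traceback above, this sketch reports the name,
length, molecule type, topology and date, and leaves "division" as None
because no three-letter division code is present on that line.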
--
Dr. Martin Mokrejs
Dept. of Genetics and Microbiology
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pGL3R.gb.gz
Type: application/x-tar
Size: 3117 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20070625/f6e08a5a/attachment-0002.tar>