[BioPython] Cannot parse GenBank file
Chris Fields
cjfields at uiuc.edu
Tue Jun 5 16:07:41 UTC 2007
One thing I missed, which explains the biopython error: the LOCUS line
is missing the locus identifier (see the NCBI example record link).
This doesn't choke the bioperl parser, but it appears to stop the
biopython parser in its tracks (maybe a feature instead of a bug!).
You should try adding a unique identifier (maybe the name of the file
or record) to the LOCUS line to see if it works:
LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006
The bioperl parser in CVS writes out the correct alphabet when this
is added:
LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006
I'll try adding a warning to the bioperl parser for this.
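
If editing the file by hand isn't practical, the same fix can be scripted.
The following is only a rough sketch, not part of either parser: the
function name add_locus_name and the regular expression are my own
assumptions, and it does not try to restore GenBank's fixed-column
spacing, so whether a given parser accepts the result is untested.

```python
import re

# Assumption: a LOCUS line "lacks a name" when the first field after the
# LOCUS keyword is already the sequence length (digits followed by "bp").
_MISSING_NAME = re.compile(r"^LOCUS\s+(\d+\s+bp\b.*)$")


def add_locus_name(lines, name):
    """Return a copy of lines with name inserted into nameless LOCUS lines.

    Hypothetical helper for illustration; lines that already carry an
    identifier, and all non-LOCUS lines, are passed through unchanged.
    """
    fixed = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep whatever line ending was there
        match = _MISSING_NAME.match(body)
        if match:
            # Column alignment is not attempted here.
            line = "LOCUS       {} {}{}".format(name, match.group(1), ending)
        fixed.append(line)
    return fixed
```

Something like add_locus_name(open("testfile.gb").readlines(), "testfile")
(the file name is just a placeholder) would give lines you can write back
out and hand to the parser.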
chris
On Jun 5, 2007, at 10:28 AM, Chris Fields wrote:
> Martin,
>
> The example file you give in the bioperl bugzilla report has several
> blank annotation lines which may lead to additional problems. When
> the BioPerl SeqIO parser finds annotation fields (SOURCE, ORGANISM,
> DEFINITION, etc) then it expects there will also be relevant data
> (text descriptions) accompanying it; I assume the BioPython parser
> expects likewise though I may be wrong.
>
> AFAIK the inclusion of field names w/o text isn't GenBank/EMBL-
> compliant. GenBank records lacking text either have a '.' instead or
> are left out entirely:
>
> http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
>
> We could add a fix but you should probably contact the ApE developers
> and request that field names w/o text be left out or have '.' added.
>
> chris
>
> On Jun 5, 2007, at 9:04 AM, Martin MOKREJŠ wrote:
>
>> Ezequiel Panepucci wrote:
>>>> genbank entry = parser.parse(fhandle)
>>>
>>> there is a space character between "genbank" and "entry".
>>> It is a syntax error.
>>> I suppose you meant "genbank_entry" ?
>>
>> Yes, the next command was right and has shown the error. Sorry, I forgot
>> to delete the first attempt. ;-)
>>
>> >>> genbank_entry = parser.parse(fhandle)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 187, in parse
>>     self._scanner.feed(handle, self._consumer)
>>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
>>     self._feed_first_line(consumer, self.line)
>>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 835, in _feed_first_line
>>     assert False, \
>> AssertionError: Did not recognise the LOCUS line layout:
>> LOCUS 6499 bp ds-DNA linear 02-AUG-2006
>>
>> >>>
>>
>> Martin
>> _______________________________________________
>> BioPython mailing list - BioPython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign