[BioPython] Cannot parse ApE plasmid editor GenBank file
Chris Fields
cjfields at uiuc.edu
Tue Jun 5 21:46:07 UTC 2007
On Jun 5, 2007, at 4:11 PM, Peter wrote:
> Chris Fields wrote:
>> Note that the presence of the locus name appears to be required
>> according to the GenBank release notes. There is no optional
>> designation for the LOCUS line (it is mandatory as stated in sec.
>> 3.4.2), and the locus name appears in the line for all records
>> (sec. 3.5.4).
>
> I agree that valid GenBank files should indeed have a locus name in
> the LOCUS line. If it doesn't cause too many issues, then maybe we
> should allow such files as input.
>
> Having just gone over the Biopython code, if the locus name is
> missing but there is nothing else wrong with the LOCUS line,
> Biopython will give a slightly cryptic AssertionError, "Cannot
> parse the name and length in the LOCUS line"
>
> I could make the parser cope with missing locus names, but on
> reflection, that may just cause worse problems further downstream
> (e.g. trying to index the file). One option is to auto-generate an
> identifier...
>
> Let's wait and see what Wayne's new version of ApE plasmid editor
> outputs for "GenBank format" - maybe he will include some sort of
> locus name.
>
> Peter
In BioPerl you can optionally pass in a custom generator
(specifically a code reference) to generate the LOCUS, ACCESSION,
VERSION, and KEYWORDS lines if needed. You might be able to do
something similar for your parser, though I'm not yet familiar
enough with Python to work out how...
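
For what it's worth, here is a rough sketch of how the callback idea
might look in Python. It is not Biopython's parser or its API:
parse_locus_line, make_counter_generator and the name_generator
argument are all made-up names for illustration, and the LOCUS
handling is deliberately simplified (it only looks for the length
followed by "bp" or "aa"). The caller passes in a function, and the
parser only calls it when the locus name is missing.

```python
import itertools


def make_counter_generator(prefix="unnamed"):
    """Return a callback that yields prefix_1, prefix_2, ... (hypothetical helper)."""
    counter = itertools.count(1)

    def generate(locus_line):
        # The raw line is passed in so a smarter generator could inspect it.
        return "%s_%i" % (prefix, next(counter))

    return generate


def parse_locus_line(line, name_generator=None):
    """Return (name, length) from a simplified LOCUS line.

    If the locus name is missing and name_generator is given, it is
    called with the raw line to supply a name. Otherwise a ValueError
    is raised.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)

    # Find the unit token; the sequence length is the token just before it.
    unit_index = None
    for i, token in enumerate(tokens):
        if token in ("bp", "aa"):
            unit_index = i
            break
    if unit_index is None or unit_index < 2:
        raise ValueError("Cannot find the length in the LOCUS line: %r" % line)

    length_token = tokens[unit_index - 1]
    if not length_token.isdigit():
        raise ValueError("Length is not a number in the LOCUS line: %r" % line)

    # Anything between LOCUS and the length is taken to be the locus name.
    name_tokens = tokens[1:unit_index - 1]
    if name_tokens:
        name = name_tokens[0]
    elif name_generator is not None:
        name = name_generator(line)
    else:
        raise ValueError("LOCUS line has no locus name: %r" % line)

    return name, int(length_token)


if __name__ == "__main__":
    gen = make_counter_generator("ape")
    print(parse_locus_line("LOCUS       pUC19    2686 bp    DNA   circular", gen))
    print(parse_locus_line("LOCUS                5000 bp ds-DNA circular", gen))
```

With no generator supplied, a nameless LOCUS line still fails, so the
default behaviour stays strict and the caller has to opt in to
auto-generated names.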
chris