[BioPython] Cannot parse ApE plasmid editor GenBank file
Peter
biopython at maubp.freeserve.co.uk
Tue Jun 5 21:11:36 UTC 2007
Chris Fields wrote:
> Note that the presence of the locus name appears to be required
> according to the GenBank release notes. There is no optional
> designation for the LOCUS line (it is mandatory as stated in sec.
> 3.4.2), and the locus name appears in the line for all records (sec.
> 3.5.4).
I agree that valid GenBank files should indeed have a locus name in the
LOCUS line. Still, if it doesn't cause too many issues, maybe we should
accept files that lack one as input.
I have just gone over the Biopython code. If the locus name is missing
but there is nothing else wrong with the LOCUS line, Biopython will give
a slightly cryptic AssertionError: "Cannot parse the name and length in
the LOCUS line".
I could make the parser cope with missing locus names, but on
reflection, that may just cause worse problems further downstream (e.g.
trying to index the file). One option is to auto-generate an identifier...
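To make the auto-generated identifier idea concrete, here is a rough
sketch done as a pre-processing step outside the parser. It is not
Biopython code: the function name ensure_locus_names and the "unnamed"
prefix are made up for illustration, and it assumes a nameless LOCUS
line has the sequence length directly after the LOCUS keyword, followed
by "bp". The rebuilt line is whitespace-separated rather than
column-aligned, so whether a given parser accepts it is untested here.

```python
import itertools


def ensure_locus_names(lines, prefix="unnamed"):
    """Yield lines, adding a placeholder name to nameless LOCUS lines.

    Hypothetical helper, not part of Biopython. A LOCUS line is treated
    as nameless when its second field is a number and its third is "bp".
    """
    counter = itertools.count(1)
    for line in lines:
        fields = line.split()
        if (
            len(fields) >= 3
            and fields[0] == "LOCUS"
            and fields[1].isdigit()
            and fields[2] == "bp"
        ):
            # Keep the original line ending, if there was one.
            ending = "\n" if line.endswith("\n") else ""
            name = "%s_%d" % (prefix, next(counter))
            line = "LOCUS       " + " ".join([name] + fields[1:]) + ending
        yield line
```

The generated names are only unique within one pass over one file, so
this does not settle the indexing question; it just avoids the
AssertionError for otherwise well-formed records.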
Let's wait and see what Wayne's new version of ApE plasmid editor outputs
for "GenBank format" - maybe he will include some sort of locus name.
Peter