[BioPython] Cannot parse/convert embl formatted files
Peter (BioPython List)
biopython at maubp.freeserve.co.uk
Thu Aug 17 14:41:54 UTC 2006
I've added a comment to the bug too:
http://bugzilla.open-bio.org/show_bug.cgi?id=2076
Martin MOKREJŠ wrote:
> No, the missing closing quotes should be added. Or better to say,
> the parser should terminate previous feature when it reaches beginning
> of the next feature. I wish this is feasible.
Missing closing quotes are a tricky issue. I have seen valid files with
text like /word= inside a quoted entry, so the parser cannot simply treat
every /word= as the start of a new qualifier.
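To illustrate the ambiguity, here is a minimal sketch in plain Python. It is not the Biopython parser and not a proposed fix. The function name flag_suspect_lines and the input convention (qualifier lines with the "FT" prefix and indentation already stripped) are assumptions made for this example. It only flags lines that look like a new qualifier while a quoted value is still open; it cannot tell a broken file from a valid one.

```python
import re

# Matches text that looks like the start of a qualifier, e.g. /note= or /word=
QUALIFIER_START = re.compile(r"^/\w+=")


def flag_suspect_lines(qualifier_lines):
    """Return indices of lines that look like a new qualifier
    while the previous quoted value has not been closed yet.

    Hypothetical helper: a flagged line is either a new qualifier
    following a missing closing quote, or valid text that merely
    starts with /word= inside a quoted value. A human has to decide.
    """
    suspects = []
    inside_quotes = False
    for index, line in enumerate(qualifier_lines):
        text = line.strip()
        if inside_quotes and QUALIFIER_START.match(text):
            suspects.append(index)
            # Assume a new qualifier started here so scanning can resume.
            inside_quotes = False
        # An odd number of quote characters toggles the open/closed state
        # (a doubled "" inside a value counts as two and leaves it unchanged).
        if text.count('"') % 2 == 1:
            inside_quotes = not inside_quotes
    return suspects


# Broken input: the first value is never closed.
broken = ['/product="hypothetical protein', '/note="see /gene=abc"', '/db_xref="X:1"']
# Valid input: a wrapped value whose second line happens to start with /word=
valid = ['/note="similar to', '/word= in another entry"']

print(flag_suspect_lines(broken))
print(flag_suspect_lines(valid))
```

Both inputs get line 1 flagged, which is exactly the problem: from the text alone the two cases look the same.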
> I think the recipe in
> http://biopython.org/DIST/docs/cookbook/genbank_to_fasta.html chokes on those
> unterminated lines.
The FormatIO system itself is very fragile with "broken" input files.
It also doesn't work very well with large files. We (the BioPython
developers) have been talking about replacing it in a future release.
> Please add the missing import line to the above document. I have cleaned up
> my Trash so you have to get it from biopython archives from the very first
> message I think. ;)
Found it, you pointed out that in addition to this line:
from Bio import formats
we also need:
from Bio.FormatIO import FormatIO
> Sorry for the confusion. It took me a while to re-create the broken files
> and figure out all the steps again.
> Martin
Thanks Martin.
Have you been in touch with the Italian group to ask them if they can
include the closing quotes in the EMBL files?
Peter