[BioPython] Cannot parse/convert embl formatted files
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Thu Aug 17 10:48:12 UTC 2006
Hi Peter,
Sorry for the delay in answering. Yes, I realized later that the file
format is fixed-width, when the parser choked because some position held
a word character where it expected a space. :(
I have edited the files so that the field contains just "DNA " or "RNA ",
followed by as many spaces as necessary to keep the columns aligned. ;-)
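
For anyone hitting the same problem, here is a minimal Python sketch of
option (b) quoted below, i.e. simplifying the molecule type in the EMBL ID
line before conversion. It is not what I actually ran; the function names
and the replacement table are my own assumptions, and the table only covers
the values mentioned in this thread. It also assumes that ID lines start
with "ID" followed by spaces and that the fields are separated by
semicolons, so no re-padding is needed on the EMBL side.

```python
import re

# Molecule types reported in this thread, mapped to the plain value.
# This table is an assumption; extend it for other values you encounter.
REPLACEMENTS = {
    "genomic DNA": "DNA",
    "unassigned DNA": "DNA",
    "pre-RNA": "RNA",
}

# One alternation matching any of the known offending values.
_PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def simplify_id_line(line):
    """Simplify the molecule type if this is an EMBL ID line.

    Any other line is returned unchanged.
    """
    if not line.startswith("ID "):
        return line
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], line)


def simplify_embl(in_handle, out_handle):
    """Copy an EMBL file line by line, rewriting only the ID lines."""
    for line in in_handle:
        out_handle.write(simplify_id_line(line))
```

Typical use would be to open the original file for reading and a new file
for writing, pass both handles to simplify_embl(), and then run the
converter on the new file. Lines other than ID lines are left untouched,
so "genomic DNA" in a description or feature qualifier is not changed.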
martin
Peter wrote:
> Martin MOKREJŠ wrote:
>
>> Finally, the LOCUS lines had unexpected values like pre-RNA, genomic DNA,
>> unassigned DNA, etc. I imagine those are some remnants from the EMBL data
>> and such values never occur in original GenBank files ... you're the judge here.
>
>
> I've had a look at bug 2072 and for that example it looks like the
> BioPerl converter tried to squeeze "genomic DNA" into what I thought
> was a seven character field (or eight if you allow it to steal the
> following space). The extra characters seem to have pushed the later
> fields of "linear", division "FUN" and date out of position.
>
> How is your Perl? You could try:
>
> (a) Editing the BioPerl conversion script to make a few substitutions
> to the sequence type like "genomic DNA" or "unassigned DNA" to just
> "DNA"
>
> Or,
>
> (b) Editing the input EMBL file to make the same change in the ID line
> at the start of each record.
>
> Peter
>
>
--
Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs