[BioPython] Cannot parse/convert embl formatted files

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Thu Aug 17 10:48:12 UTC 2006


Hi Peter,
  sorry for the delay in my answer. Yes, I realized later that the
file format is fixed-width, when the parser choked because at some position
there was a word character instead of a space. :(

  I have edited the files to contain just "DNA   " or "RNA   ", with
as many trailing spaces as necessary to keep the columns aligned. ;-)
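
  For anyone who prefers to script it, option (b) from Peter's message
quoted below could be sketched roughly like this in Python. It is only a
sketch: the function names are made up for illustration, the list of
values covers just the ones mentioned in this thread, and it assumes the
molecule type sits in a semicolon-separated field of the EMBL ID line,
so no padding is done there.

```python
import re

# Molecule-type values mentioned in this thread, mapped to the bare type.
# Illustrative only; extend the list for other values found in your files.
_REPLACEMENTS = [
    (re.compile(r"\b(?:genomic|unassigned) DNA\b"), "DNA"),
    (re.compile(r"\bpre-RNA\b"), "RNA"),
]


def simplify_id_line(line):
    """Reduce the molecule type to plain DNA/RNA, but only on ID lines."""
    if not line.startswith("ID "):
        return line  # leave every other line type untouched
    for pattern, bare in _REPLACEMENTS:
        line = pattern.sub(bare, line)
    return line


def simplify_embl(in_path, out_path):
    """Copy an EMBL flat file, rewriting the ID line of each record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(simplify_id_line(line))
```

Writing to a new file rather than editing in place keeps the original
EMBL data around in case a substitution turns out to be too greedy.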
martin


Peter wrote:
> Martin MOKREJŠ wrote:
> 
>> Finally, the LOCUS lines had unexpected values like pre-RNA, genomic DNA,
>> unassigned DNA, etc. I imagine those are some remnants from the EMBL data
>> and such values never exist in original GenBank files ... you're the judge here.
> 
> 
> I've had a look at bug 2072 and for that example it looks like the
> BioPerl converter tried to squeeze  "genomic DNA" into what I thought
> was a seven character field (or eight if you allow it to steal the
> following space).  The extra characters seem to have pushed the later
> fields of "linear", division "FUN" and date out of position.
> 
> How is your Perl?  You could try:
> 
> (a) Editing the BioPerl conversion script to make a few substitutions
> to the sequence type like "genomic DNA" or "unassigned DNA" to just
> "DNA"
> 
> Or,
> 
> (b) Editing the input EMBL file to make the same change in the ID line
> at the start of each record.
> 
> Peter
> 
> 

-- 
Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs


