[Biojava-l] EMBL parsing problems

saerts saerts at mailserv.esat.kuleuven.ac.be
Tue Jan 28 17:27:25 EST 2003


Hi,

When I parse a sequence exported for an Ensembl mouse gene (using the
Export Data function at www.ensembl.org), three problems currently appear.
I have attached the exported sequence with gene features for Igf1.

1. Some of the exon locations start with ".0:" (for example,
.0:44020..44591). Is this a bug in the EMBL formatting at Ensembl?

2. The first annotation of a CDS feature is written on the next line after
CDS, and the EMBL parser does not find it.
I think this is also a bug at Ensembl?

3. Some of the lines cannot be parsed. For example, the parser writes to
System.out: "This line could not be parsed: exon            2001..2159"
I don't understand this one: I cannot see anything wrong with these features.
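
A possible stopgap for problem 1 is to strip the ".0:" prefix from each
feature line before the text reaches the parser. The sketch below is plain
Java and does not use any BioJava API. The class name EnsemblLocationCleaner
and the method clean are hypothetical, and it assumes that the coordinates
after ".0:" refer to the exported sequence itself, which I have not confirmed.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: removes the ".0:" prefix seen in Ensembl exon locations. */
public class EnsemblLocationCleaner {

    // Match ".0:" only when it is NOT preceded by a letter or digit (so a
    // genuine remote reference such as "J00194.0:100..202" is left alone)
    // and only when it is followed by a digit (the start coordinate).
    private static final Pattern EMPTY_ACCESSION_PREFIX =
            Pattern.compile("(?<![A-Za-z0-9])\\.0:(?=\\d)");

    /** Returns the line with every bare ".0:" prefix removed. */
    public static String clean(String featureLine) {
        return EMPTY_ACCESSION_PREFIX.matcher(featureLine).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample locations taken from the parser output in this message.
        System.out.println(clean("exon            .0:44020..44591"));
        System.out.println(clean("exon            complement(.0:13156..13348)"));
        // A line without the prefix passes through unchanged.
        System.out.println(clean("exon            2001..2159"));
    }
}
```

Each line of the .embl file would be passed through clean() before parsing.
This only addresses the ".0:" prefix; it does not explain problems 2 and 3.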

Thank you in advance!

Stein.


#########
#output when parsing the attached .embl file
################
From must be less than To: exon            .0:44020..44591
From must be less than To: exon            .0:44020..44364
From must be less than To: exon            .0:44020..44364
This line could not be parsed: exon            complement(.0:13156..13348)
From must be less than To: exon            .0:46248..46337
This line could not be parsed: exon            complement(245..653)
This line could not be parsed: exon            2001..2159
This line could not be parsed: exon            2003..2159
This line could not be parsed: exon            2003..2159
This line could not be parsed: exon            2003..2159
This line could not be parsed: exon            50907..51088
This line could not be parsed: exon            50907..51088
This line could not be parsed: exon            50907..51088
This line could not be parsed: exon            50907..51088
This line could not be parsed: exon            52586..52637
This line could not be parsed: exon            52586..52637
This line could not be parsed: exon            68128..69089
This line could not be parsed: exon            68128..69089
This line could not be parsed: exon            68128..69254
This line could not be parsed: exon            68132..69089

