[Biojava-l] Parsing EMBL files from Ensembl

Stein Aerts stein.aerts@esat.kuleuven.ac.be
Thu, 21 Nov 2002 09:23:58 +0100


As of today, something appears to have changed in the "export data" function 
of Ensembl. When I retrieve a gene by its Ensembl ID, e.g. 
ENSG00000110092 with 2000 bp on either side, and request only gene 
features, the resulting EMBL-formatted file used to have (until yesterday) 
ID = ENSG00000110092, but now it has this ID line:

ID   Chromosome 11 71948701 to 71966070  ENSEMBL; DNA; HUM; 17370 BP.
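
The new ID line carries the exported region rather than the gene identifier. As a minimal sketch of how that line could be picked apart before handing the file to a parser, something like the following might do. The class name EnsemblIdLine and the regular expression are my own assumptions, derived from the single example line above; they are not part of BioJava or of any Ensembl format specification.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
final class EnsemblIdLine {
    // Modelled on the one example seen:
    // "ID   Chromosome 11 71948701 to 71966070  ENSEMBL; DNA; HUM; 17370 BP."
    private static final Pattern REGION = Pattern.compile(
        "^ID\\s+Chromosome\\s+(\\S+)\\s+(\\d+)\\s+to\\s+(\\d+)\\s+.*?(\\d+)\\s+BP\\.\\s*$");

    /** Returns {chromosome, start, end, length}, or null if the line is not in the region form. */
    static String[] parse(String line) {
        Matcher m = REGION.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] f = parse("ID   Chromosome 11 71948701 to 71966070  ENSEMBL; DNA; HUM; 17370 BP.");
        // The inclusive span end - start + 1 can be compared with the declared length.
        long span = Long.parseLong(f[2]) - Long.parseLong(f[1]) + 1;
        System.out.println("chr" + f[0] + " " + f[1] + "-" + f[2]
            + " span=" + span + " declared=" + f[3]);
    }
}
```

For the line above, the inclusive span 71966070 - 71948701 + 1 equals the declared 17370 BP, so the coordinates in the ID line do describe the exported region.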

More importantly, the BioJava EMBL parser cannot parse the file. 
It reports:

This line could not be parsed: CDS             join(-1151..-840,1654..1777,1995..2434)
This line could not be parsed:                 12014..12175)
This line could not be parsed: CDS             join(3927..4460,4728..4887,8890..9038,12014..12178)
This line could not be parsed: exon            -1151..-840
This line could not be parsed: exon            1654..1777

and so on.
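
Some of the rejected lines, though not all, contain negative coordinates (e.g. -1151..-840). Whether those are what upsets the parser is not established here. As a quick diagnostic, a sketch like the one below could list the negative coordinates in a feature location string. The class NegativeCoordinateCheck is hypothetical and uses only java.util.regex, no BioJava calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of BioJava.
final class NegativeCoordinateCheck {
    // A '-' followed by digits and not preceded by a digit,
    // so "-840" in "..-840" matches but a digit-dash-digit run would not.
    private static final Pattern NEGATIVE = Pattern.compile("(?<!\\d)-\\d+");

    /** Returns every negative coordinate found in a location such as "join(-1151..-840,1654..1777)". */
    static List<String> negativeCoordinates(String location) {
        List<String> found = new ArrayList<>();
        Matcher m = NEGATIVE.matcher(location);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(negativeCoordinates("join(-1151..-840,1654..1777,1995..2434)"));
        System.out.println(negativeCoordinates("join(3927..4460,4728..4887,8890..9038,12014..12178)"));
    }
}
```

Run on the two CDS locations above, this flags the first and leaves the second empty, which suggests that negative coordinates alone do not explain every rejected line.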

Could anyone help me out with this?

Thanks a lot,
Stein Aerts.