[Biojava-l] Ensembl gene parsing

Ewan Birney birney at ebi.ac.uk
Wed Jan 29 09:08:22 EST 2003


On Wed, 29 Jan 2003, Stein Aerts wrote:

> Hi,
> 
> When currently parsing an exported sequence of an Ensembl mouse gene 
> (using the Export Data function at www.ensembl.org) there appear to be 3 
> problems:
> I tried to attach an example of an exported sequence of the Igf1 gene 
> but then the message was bounced because of a suspicious header...
> 
> 1. Some of the exon locations start with .0:
> I think this is a bug in the EMBL formatting at Ensembl?

Yes, this is almost certainly a fault at our end, and I think I know where 
it is.

> 
> FT   exon            .0:44020..44364
> FT                   /exon_id="ENSMUSE00000233709"
> FT                   /start_phase=0
> FT                   /end_phase=0
> 
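Until the export is fixed, one possible client-side workaround is to strip
the stray ".0:" prefix before handing the file to the parser. This is only a
minimal sketch: the class name FtLocationCleaner and the method clean are
made up for illustration and are not part of BioJava, and it assumes the
prefix always appears exactly as ".0:" directly before the location on a
feature-key line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
public class FtLocationCleaner {
    // "FT", three spaces, a feature key, padding, then the spurious ".0:".
    // Qualifier lines ("FT" followed by many spaces, then "/...") do not
    // match because the key must start right after the three spaces.
    private static final Pattern BAD_PREFIX =
        Pattern.compile("^(FT {3}\\S+\\s+)\\.0:(.*)$");

    /** Returns the line with a leading ".0:" removed from its location. */
    public static String clean(String line) {
        Matcher m = BAD_PREFIX.matcher(line);
        return m.matches() ? m.group(1) + m.group(2) : line;
    }

    public static void main(String[] args) {
        System.out.println(clean("FT   exon            .0:44020..44364"));
    }
}
```

Lines that do not match the pattern are returned unchanged, so the filter
can be applied to every line of the exported file.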
> 
> 
> 2. The first annotation of a CDS feature is written on the next line 
> after CDS. This is not found by the EMBL parser.
> I think that is also a bug at Ensembl?
> 

This is probably a line-length issue. I wonder what the right thing to do 
here is... Hmmm

> FT   CDS             
> FT                   /gene="ENSMUSG00000020053"
> 
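In the meantime it may help to find the affected features before parsing.
The sketch below only reports feature-key lines that carry no location, as
in the CDS example above; it does not try to reconstruct the missing
location. EmptyLocationFinder and its methods are hypothetical names for
illustration, not BioJava API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BioJava.
public class EmptyLocationFinder {
    /** True for a feature-key line such as "FT   CDS" with no location. */
    public static boolean isKeyWithoutLocation(String line) {
        // "FT", three spaces, a feature key, then nothing but whitespace.
        return line.matches("FT {3}\\S+\\s*");
    }

    /** Returns the 1-based numbers of lines whose feature key has no location. */
    public static List<Integer> find(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (isKeyWithoutLocation(lines.get(i))) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "FT   CDS             ",
            "FT                   /gene=\"ENSMUSG00000020053\"",
            "FT   exon            2001..2159");
        System.out.println(find(lines));
    }
}
```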
> 
> 
> 3. Some of the lines cannot be parsed, for example the parser writes to 
> System.out: "This line could not be parsed: exon            2001..2159"
> This one I don't understand; I cannot see a problem with these features.
> 
> FT   exon            2001..2159
> FT                   /exon_id="ENSMUSE00000248454"
> FT                   /start_phase=0
> FT                   /end_phase=0
> 
> 
> 
> Thank you in advance!
> 

Stein - have you tried Mart inside Ensembl? For most people, this is a far 
easier way to get bulk downloads in a very easy-to-parse format.


http://www.ensembl.org/Homo_sapiens/martview


Choose feature list and/or gene structure when you get to the output stage.



The Ensembl bugs should be fixed of course... ;)



> Stein.
> 
> -- 
> Stein Aerts BioI at SISTA
> K.U.Leuven ESAT-SCD Belgium
> http://www.esat.kuleuven.ac.be/~dna/BioI
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney at ebi.ac.uk>. 
-----------------------------------------------------------------
