[Biojava-l] Ensembl gene parsing
Ewan Birney
birney at ebi.ac.uk
Wed Jan 29 09:08:22 EST 2003
On Wed, 29 Jan 2003, Stein Aerts wrote:
> Hi,
>
> When currently parsing an exported sequence of an Ensembl mouse gene
> (using the Export Data function at www.ensembl.org) there appear to be 3
> problems:
> I tried to attach an example of an exported sequence of the Igf1 gene
> but then the message was bounced because of a suspicious header...
>
> 1. Some of the exon locations start with .0:
> I think this is a bug in the EMBL formatting at Ensembl?
Yes, this is almost certainly a fault at our end, and I think I know where
it is.
>
> FT exon .0:44020..44364
> FT /exon_id="ENSMUSE00000233709"
> FT /start_phase=0
> FT /end_phase=0
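Until the export is fixed, one parser-side workaround is to strip the stray prefix before the location reaches the parser. The EMBL feature table does allow remote references of the form accession.version:range, so ".0:" looks like such a reference with the accession missing. A minimal sketch, assuming the only malformed form is a leading ".N:" as in the line above; EmblLocationFixer and stripEmptyReference are hypothetical names, not part of BioJava:

```java
import java.util.regex.Pattern;

public class EmblLocationFixer {
    // Matches a leading ".<digits>:" that has no accession in front of it, e.g. ".0:".
    // A genuine remote reference such as "J00194.1:100..202" does not start with a dot
    // and is therefore left alone.
    private static final Pattern EMPTY_REFERENCE = Pattern.compile("^\\.\\d+:");

    /** Returns the location string with a leading empty ".N:" reference removed. */
    public static String stripEmptyReference(String location) {
        return EMPTY_REFERENCE.matcher(location).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripEmptyReference(".0:44020..44364")); // 44020..44364
        System.out.println(stripEmptyReference("2001..2159"));      // 2001..2159
    }
}
```

This only rewrites the location text; it does not touch the parser itself, so it can be dropped once the exported files are correct.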
>
>
>
> 2. The first annotation of a CDS feature is written on the next line
> after CDS. This is not found by the EMBL parser.
> I think that this is also a bug at Ensembl?
>
This is probably a line-length issue. I wonder what the right thing to do
here is... Hmmm
> FT CDS
> FT /gene="ENSMUSG00000020053"
>
>
>
> 3. Some of the lines cannot be parsed, for example the parser writes to
> System.out: "This line could not be parsed: exon 2001..2159"
> This one I don't understand, I cannot see a problem for these features?
>
> FT exon 2001..2159
> FT /exon_id="ENSMUSE00000248454"
> FT /start_phase=0
> FT /end_phase=0
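The thread does not establish why this line is rejected. One way to narrow it down is to check the line independently of column positions: split it on whitespace and test the location against the plain start..end form. A rough sketch; FtLineCheck and isSimpleRange are hypothetical helpers, not BioJava API, and only plain ranges are handled (no join, complement or remote references):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FtLineCheck {
    // "FT", a feature key, then a plain start..end range, separated by any whitespace.
    // Digits are capped at 18 so that Long.parseLong cannot overflow.
    private static final Pattern SIMPLE_FEATURE =
        Pattern.compile("^FT\\s+(\\S+)\\s+(\\d{1,18})\\.\\.(\\d{1,18})\\s*$");

    /** True if the line is "FT <key> <start>..<end>" with start <= end. */
    public static boolean isSimpleRange(String line) {
        Matcher m = SIMPLE_FEATURE.matcher(line);
        if (!m.matches()) {
            return false;
        }
        long start = Long.parseLong(m.group(2));
        long end = Long.parseLong(m.group(3));
        return start <= end;
    }

    public static void main(String[] args) {
        System.out.println(isSimpleRange("FT   exon            2001..2159"));
        System.out.println(isSimpleRange("FT   exon            .0:44020..44364"));
    }
}
```

If a line passes this check but the parser still rejects it, the spacing between key and location is a reasonable next thing to inspect, since the EMBL feature table is column-oriented (feature key from column 6, location from column 22).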
>
>
>
> Thank you in advance!
>
Stein - have you tried Mart inside Ensembl? For most people, this is a far
easier way to get bulk downloads in a very easy-to-parse format.
http://www.ensembl.org/Homo_sapiens/martview
Choose feature list and/or gene structure when you get to output.
The Ensembl bugs should be fixed of course... ;)
> Stein.
>
> --
> Stein Aerts BioI at SISTA
> K.U.Leuven ESAT-SCD Belgium
> http://www.esat.kuleuven.ac.be/~dna/BioI
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney at ebi.ac.uk>.
-----------------------------------------------------------------