[Biojava-l] [biojavax] EMBL parser : features parsing

Morgane THOMAS-CHOLLIER mthomasc at vub.ac.be
Wed Apr 12 08:34:43 UTC 2006


Hello again,

I am currently using biojavax to parse EMBL files exported from the Ensembl 
website.

Compared to the EBI files I have, they differ in the feature lines:

sometimes a feature has only one "/word" qualifier, e.g.:

EBI file :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file:

FT   gene         complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word" 
into a Note, but the Note is then attached to the immediately following 
feature (e.g. mRNA).
The current gene feature thus ends up with no annotation.

This behavior can be reproduced by removing one "/word" line from a feature 
in an EBI file.
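
To make the expected behavior concrete, here is a minimal sketch of how qualifiers could be grouped with the feature they follow. It is not the biojavax parser: the class FeatureTableSketch, its Feature type and the parse method are hypothetical names, the mRNA location in the sample input is made up for illustration, and multi-line locations or qualifier values are ignored. Each "/word" line is attached to the feature that was opened most recently, so a gene with a single qualifier keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT the biojavax EMBL parser.
class FeatureTableSketch {

    // One feature with its key, location and {name, value} qualifier pairs.
    static class Feature {
        final String key;
        final String location;
        final List<String[]> qualifiers = new ArrayList<>();

        Feature(String key, String location) {
            this.key = key;
            this.location = location;
        }
    }

    // Groups FT lines into features. A line with a non-blank character in
    // column 6 opens a new feature; a "/word" line is attached to the feature
    // that is currently open. Continuation lines are ignored in this sketch.
    static List<Feature> parse(List<String> ftLines) {
        List<Feature> features = new ArrayList<>();
        Feature current = null;
        for (String line : ftLines) {
            if (!line.startsWith("FT") || line.length() < 6) {
                continue;
            }
            String body = line.substring(2).trim();
            if (body.isEmpty()) {
                continue;
            }
            if (line.charAt(5) != ' ') {
                // Feature key, then whitespace, then the location.
                String[] parts = body.split("\\s+", 2);
                current = new Feature(parts[0], parts.length > 1 ? parts[1] : "");
                features.add(current);
            } else if (body.startsWith("/") && current != null) {
                // Split on the first '=' only; drop the double quotes.
                int eq = body.indexOf('=');
                String name = eq < 0 ? body.substring(1) : body.substring(1, eq);
                String value = eq < 0 ? "" : body.substring(eq + 1).replace("\"", "");
                current.qualifiers.add(new String[] {name, value});
            }
        }
        return features;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "FT   gene         complement(1..3218)",
            "FT                   /gene=\"ENSMUSG00000038227\"",
            // The mRNA location below is invented for the example.
            "FT   mRNA            complement(1..3218)",
            "FT                   /note=\"transcript_id=ENSMUST00000048680\"");
        for (Feature f : parse(lines)) {
            System.out.println(f.key + " " + f.location);
            for (String[] q : f.qualifiers) {
                System.out.println("  " + q[0] + " -> " + q[1]);
            }
        }
    }
}
```

With this grouping, the gene feature from the Ensembl example would carry its single /gene qualifier, and the mRNA feature would only carry its own.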

Apart from this issue, I noticed that Ensembl EMBL files use "=" inside a 
qualifier value (e.g. /note="transcript_id=ENSMUST00000048680"), which ends up 
as an incomplete Note, as the parser seems to split on "=" to separate 
the key and the value.
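
For illustration, a minimal sketch of splitting a qualifier at the first "=" only, so that any further "=" stays in the value. The class QualifierSplitter and its split method are hypothetical names, not part of the biojavax API, and I am only assuming this is where the value gets truncated.

```java
// Hypothetical helper, NOT part of the biojavax API.
class QualifierSplitter {

    // Splits a qualifier such as /note="transcript_id=ENSMUST00000048680"
    // into {key, value} at the FIRST '=' only.
    static String[] split(String qualifier) {
        String body = qualifier.startsWith("/") ? qualifier.substring(1) : qualifier;
        int eq = body.indexOf('=');
        if (eq < 0) {
            // Qualifier without a value.
            return new String[] {body, ""};
        }
        String key = body.substring(0, eq);
        String value = body.substring(eq + 1);
        // Strip one pair of surrounding double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] {key, value};
    }

    public static void main(String[] args) {
        String[] kv = split("/note=\"transcript_id=ENSMUST00000048680\"");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

The same effect could be obtained with String.split("=", 2), which limits the result to two parts.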

Thanks for your help,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium





