[Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]

Jolyon Holdstock jolyon.holdstock at ogt.co.uk
Thu Apr 13 16:42:36 UTC 2006

Hi Morgane,

I have amended the EmblFormat readSection method as below and the
parsing seems to work; please test it.

I think that the last bit of annotation is carried over into the next
feature so before adding the new feature I dump the annotation and reset
currentTag and currentVal.

if (!line.startsWith(" ")) {
//--------- new code starts ---------------------------
  // flush the pending tag/value so it stays with the current feature
  if (currentTag!=null) {
    section.add(new String[]{currentTag,currentVal.toString()});
    currentTag = null;
    currentVal = null;
  }
//--------- new code ends -----------------------------
// case 1 : word value - splits into key-value on its own
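
For anyone who wants to try the idea outside the parser, here is a minimal, self-contained sketch of the same flush-before-new-feature logic. It is not the actual EmblFormat code: the class name FeatureFlushSketch, the parse method and the simplified FT line handling are all assumptions made for illustration, and multi-line locations are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of biojavax.
class FeatureFlushSketch {

    /** Returns one list of {key, value} pairs per feature found in the FT lines. */
    static List<List<String[]>> parse(String[] lines) {
        List<List<String[]>> features = new ArrayList<>();
        List<String[]> section = null;      // pairs of the feature being read
        String currentTag = null;           // qualifier not yet stored
        StringBuilder currentVal = null;

        for (String raw : lines) {
            if (raw.length() < 5) {
                continue;                   // too short to be an FT line
            }
            String line = raw.substring(5); // drop "FT" and the 3 spaces after it
            if (!line.startsWith(" ")) {
                // A new feature key starts here: store the pending qualifier
                // first, so it stays with the feature it was read from.
                if (currentTag != null) {
                    section.add(new String[]{currentTag, currentVal.toString()});
                    currentTag = null;
                    currentVal = null;
                }
                section = new ArrayList<>();
                features.add(section);
                String[] parts = line.trim().split("\\s+", 2);
                section.add(new String[]{parts[0], parts.length > 1 ? parts[1] : ""});
            } else if (section != null) {
                String text = line.trim();
                if (text.startsWith("/")) {
                    // A new qualifier: store the previous one, then start this one.
                    if (currentTag != null) {
                        section.add(new String[]{currentTag, currentVal.toString()});
                    }
                    int eq = text.indexOf('=');
                    currentTag = eq < 0 ? text.substring(1) : text.substring(1, eq);
                    currentVal = new StringBuilder(eq < 0 ? "" : text.substring(eq + 1));
                } else if (currentTag != null) {
                    currentVal.append(text); // continuation of a wrapped value
                }
            }
        }
        // Store whatever is still pending at the end of the feature table.
        if (currentTag != null && section != null) {
            section.add(new String[]{currentTag, currentVal.toString()});
        }
        return features;
    }
}
```

With the Ensembl-style input from the original message (a gene feature carrying a single /gene line, followed by an mRNA feature), the /gene pair should end up in the first list rather than the second.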



-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane
Sent: 12 April 2006 09:35
To: biojava-l at open-bio.org
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]

Hello again,

I am currently using biojavax to parse EMBL files exported from Ensembl.
Compared to the EBI files I have, they show a difference in the feature
lines: sometimes only one "/word" is present, e.g.:

EBI file :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file :

FT   gene         complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word"
into a Note, but the Note is then attached to the immediately following
feature (e.g. mRNA). The current gene feature thus has no annotation.

This behavior can be reproduced by removing one "/word" from an EBI file.

Apart from this issue, I noted that Ensembl EMBL files use "=" inside a
feature value (e.g. /note="transcript_id=ENSMUST00000048680"), which ends
up as an incomplete Note, as the parser seems to split on "=" to separate
the Key and the Value.
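
To illustrate the "=" problem, here is a minimal sketch of splitting only on the first "=" so that the rest of the value stays intact. It is an assumption about how the split could be done, not the parser's actual code; the class QualifierSplitSketch and the method splitQualifier are hypothetical names.

```java
// Hypothetical helper; not part of biojavax.
class QualifierSplitSketch {

    /** Splits "/key=value" on the first '=' only; returns {key, value}. */
    static String[] splitQualifier(String qualifier) {
        // Drop the leading "/" if present.
        String body = qualifier.startsWith("/") ? qualifier.substring(1) : qualifier;
        int eq = body.indexOf('=');
        if (eq < 0) {
            return new String[]{body, ""};   // a "/word" without a value
        }
        String key = body.substring(0, eq);
        String value = body.substring(eq + 1); // any further '=' stays in the value
        // Strip one pair of surrounding double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[]{key, value};
    }
}
```

Called on /note="transcript_id=ENSMUST00000048680", this gives the key note and the full value transcript_id=ENSMUST00000048680.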

Thanks for your help,



Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels

Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
