[Biojava-dev] biojava EMBL parser

Lorna Morris lmorris at ebi.ac.uk
Mon Jan 26 05:29:46 EST 2004


Hi,

The EMBL group will shortly be releasing a new version of EMBL flat 
files for new genomes, known as the Genome Reviews project. These 
alternative versions of EMBL flat files contain new annotation, and I 
have been using the biojava parser to help me produce them. Last year I 
had to make some modifications to the biojava EMBL parser because there 
were some errors in parsing standard EMBL flat files, and these were 
committed into the biojava-live branch in December 2003. The Genome 
Reviews flat files are in standard EMBL flat file format but add a few 
specifications and restrictions on format, including the introduction 
of evidence tags. The changes I've introduced to biojava maintain a 
sort order for feature qualifiers within a feature and ensure that the 
evidence tags (which are in curly brackets after each feature 
qualifier) are on the outside of the quoted strings surrounding the 
feature qualifier data.
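
To illustrate the two formatting rules described above, here is a 
minimal standalone sketch. It is not the code committed to biojava and 
does not use any biojava classes. The class and method names, the 
alphabetical sort order, the single space before the curly bracket and 
the tag values "EV1" and "EV2" are all assumptions made for the 
example; the real Genome Reviews rules may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only; not part of biojava.
public class EvidenceQualifierSketch {

    // Formats a single qualifier. The evidence tag is appended after the
    // closing quote, so it never ends up inside the quoted value.
    public static String formatQualifier(String key, String value, String evidence) {
        StringBuilder sb = new StringBuilder();
        sb.append('/').append(key).append("=\"").append(value).append('"');
        if (evidence != null && !evidence.isEmpty()) {
            // Assumption: one space between the closing quote and the tag.
            sb.append(" {").append(evidence).append('}');
        }
        return sb.toString();
    }

    // Each map value is a two-element array: {qualifier data, evidence tag}.
    public static List<String> formatFeatureQualifiers(Map<String, String[]> qualifiers) {
        // Assumption: alphabetical order by qualifier name (TreeMap sorts keys).
        Map<String, String[]> sorted = new TreeMap<>(qualifiers);
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String[]> e : sorted.entrySet()) {
            lines.add(formatQualifier(e.getKey(), e.getValue()[0], e.getValue()[1]));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String[]> qualifiers = new HashMap<>();
        qualifiers.put("product", new String[] {"kinase", "EV2"}); // made-up values
        qualifiers.put("gene", new String[] {"thrA", "EV1"});
        for (String line : formatFeatureQualifiers(qualifiers)) {
            System.out.println(line);
        }
    }
}
```

Whatever order the qualifiers are added in, they are written out in a 
stable order, and the tag always follows the closing quote.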
 
We hope to publish the first release of these files in mid-February, 
and would like to announce that they are parsable by the Biojava EMBL 
parser. The current biojava parser (SeqIOTools.readEMBL, 
SeqIOTools.writeEMBL) can read and write these new files, but the files 
produced with SeqIOTools.writeEMBL have some formatting errors. I've 
introduced some new entry points, SeqIO.writeEMBLGR and 
SeqIO.readEMBLGR, and written some additional classes and subclasses in 
the org.biojava.bio.seq.io package.

I will post the files to the mailing list later today or put them up on 
a website.

Many thanks,

Lorna


