[Biojava-dev] EmblFileFormer

Lorna Morris lmorris at ebi.ac.uk
Tue Sep 2 11:56:25 EDT 2003


Hi

I'm using BioJava to parse an EMBL flat file, add extra annotation, and 
dump the new file out at the end. The parser seems to be really useful 
for this. However, the file created using SeqIOTools.writeEmbl contains 
errors: the RN, RP, RX, RA, RT and RL lines aren't correctly nested. These 
lines should occur in repeated sets, but the output has all the RN lines, 
followed by all the RP lines, etc., so they are merged rather than nested.

I've looked at the javadoc for the class EmblFileFormer and there is a 
comment that might relate to this problem:

 * <p><code>EmblFileFormer</code> performs the detailed formatting of
 * EMBL entries for writing to a <code>PrintStream</code>. Currently
 * the formatting of the header is not correct. This really needs to
 * be addressed in the parser which is merging fields which should
 * remain separate.</p>

I've tried to address the problem by modifying the SeqIOEventEmitter 
class, but have run into difficulties because I cannot untangle which 
RN, RP, RX, RA, RT and RL lines 'belong' together in a single block, as 
the annotation is just in an ArrayList. Maybe I should take note of the 
javadoc comment above and address the problem in the parser. If so, 
could you give me some pointers on which classes I should focus on in 
order to fix this, and whether you think it will be a difficult problem 
to solve?
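
To illustrate the grouping I am after, here is a minimal sketch. It is 
plain Java, not BioJava code: ReferenceBlocks, group and flatten are 
hypothetical names. It assumes the header lines are still available in 
file order (that is, before the parser merges them) and that each RN 
line starts a new reference block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BioJava: keeps the reference lines of
// an EMBL header grouped per reference instead of merged per line type.
final class ReferenceBlocks {

    // The reference line codes mentioned in this message.
    private static final List<String> REF_CODES =
            Arrays.asList("RN", "RP", "RX", "RA", "RT", "RL");

    // Walk the lines in file order; each RN line opens a new block, and
    // the reference lines that follow are added to that block.
    static List<List<String>> group(List<String> lines) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.length() < 2) {
                continue; // too short to carry a line code
            }
            String code = line.substring(0, 2);
            if (!REF_CODES.contains(code)) {
                continue; // not a reference line (e.g. an XX spacer)
            }
            if (code.equals("RN")) {
                current = new ArrayList<>();
                blocks.add(current);
            }
            // Assumption: reference lines seen before any RN are dropped.
            if (current != null) {
                current.add(line);
            }
        }
        return blocks;
    }

    // Write the blocks back out one after another, so the lines stay
    // nested per reference.
    static List<String> flatten(List<List<String>> blocks) {
        List<String> out = new ArrayList<>();
        for (List<String> block : blocks) {
            out.addAll(block);
        }
        return out;
    }
}
```

Calling ReferenceBlocks.group on the header lines gives one list per 
reference, and ReferenceBlocks.flatten writes them back in nested order. 
Once the lines have been merged by type, this information is gone, which 
is why I suspect the fix belongs in the parser.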

Hope this makes sense.

Many thanks,

Lorna


-------------------------------------------------------------------
Lorna Morris
EMBL-European Bioinformatics Institute            Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus, Hinxton           Fax: +44-(0)1223-494468 
Cambridge
CB10 1SD, UK                                                        
email:lmorris at ebi.ac.uk



