[Biojava-dev] EmblFileFormer
Lorna Morris
lmorris at ebi.ac.uk
Tue Sep 2 11:56:25 EDT 2003
Hi
I'm using biojava to parse an EMBL flat file, add extra annotation, and
dump the new file out at the end. The parser seems to be really useful
for this. However, the file created using SeqIOTools.writeEmbl contains
errors: the RN, RP, RX, RA, RT and RL lines aren't correctly nested.
These lines should occur in repeated sets, one set per reference, but
the output has all the RN lines, followed by all the RP lines etc., so
they are merged rather than nested.
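To show what I mean by nested, here is a rough sketch in plain Java. It
does not use the biojava API; the class ReferenceGrouper and its group
method are made up for illustration. It assumes each RN line opens a new
reference block and that every line code beginning with R belongs to the
references:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of biojava: illustrates the grouping only.
public class ReferenceGrouper {

    /**
     * Splits EMBL reference lines into blocks, one block per reference.
     * Assumption: an RN line always starts a new block, and all other
     * lines whose code begins with 'R' belong to the most recent RN.
     */
    public static List<List<String>> group(List<String> lines) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.startsWith("RN")) {
                // A new reference number opens a new block.
                current = new ArrayList<>();
                blocks.add(current);
            }
            // Lines seen before the first RN, and non-reference lines,
            // are skipped.
            if (current != null && line.startsWith("R")) {
                current.add(line);
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from a real entry.
        List<String> lines = Arrays.asList(
            "RN   [1]", "RP   1-100", "RA   Smith J.;",
            "RN   [2]", "RP   50-200", "RA   Jones K.;");
        for (List<String> block : group(lines)) {
            System.out.println(block);
        }
    }
}
```

The output I would expect from writeEmbl keeps each block together like
this, rather than collecting all lines with the same code.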
I've looked at the javadoc for the class EmblFileFormer and there is a
comment that might relate to this problem:
* <p><code>EmblFileFormer</code> performs the detailed formatting of
* EMBL entries for writing to a <code>PrintStream</code>. Currently
* the formatting of the header is not correct. This really needs to
* be addressed in the parser which is merging fields which should
* remain separate.</p>
I've tried to address the problem by modifying the class
SeqIOEventEmitter, but have run into difficulties: I cannot untangle
which RN, RP, RX, RA, RT and RL lines 'belong' together in a single
block, as the annotation is just in an ArrayList. Maybe I should take
note of the javadoc comment above and address the problem in the parser.
If so, could you give me some pointers on which classes I should focus
on in order to fix this, and whether you think it will be a difficult
problem to solve?
Hope this makes sense.
Many thanks,
Lorna
-------------------------------------------------------------------
Lorna Morris
EMBL-European Bioinformatics Institute Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus, Hinxton Fax: +44-(0)1223-494468
Cambridge
CB10 1SD, UK
email:lmorris at ebi.ac.uk