[Biojava-dev] EmblFileFormer

Matthew Pocock matthew_pocock at yahoo.co.uk
Tue Sep 2 12:22:57 EDT 2003


Hi Lorna,

Yes - the fault goes back to the EMBL parser, not the
writer. The parser should be keeping track of the RN, RP,
etc. lines, and whenever a complete set goes through,
it should emit a single annotation event
(perhaps called REFERENCE?) with all the data from that
single block in it. These events would then be collected
into a list, with one element for each reference
block.
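
A minimal sketch of that grouping step, in plain modern Java (8+)
rather than against the real BioJava parser classes. The class name
ReferenceBlockGrouper, the method group and the tag list are
assumptions made up for illustration; it also assumes that an RN line
opens a block and that any non-reference line (such as XX) closes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: groups EMBL reference
// lines into one map per reference block.
class ReferenceBlockGrouper {
    // Two-letter tags treated as belonging to a reference block (assumed list).
    private static final List<String> REFERENCE_TAGS =
        Arrays.asList("RN", "RC", "RP", "RX", "RG", "RA", "RT", "RL");

    static List<Map<String, List<String>>> group(List<String> lines) {
        List<Map<String, List<String>>> blocks = new ArrayList<>();
        Map<String, List<String>> current = null;
        for (String line : lines) {
            if (line.length() < 2) {
                continue; // too short to carry a tag; ignore it
            }
            String tag = line.substring(0, 2);
            if (!REFERENCE_TAGS.contains(tag)) {
                // Any other tag (for example the XX spacer) closes the open block.
                if (current != null) {
                    blocks.add(current);
                    current = null;
                }
                continue;
            }
            // An RN line always starts a fresh block.
            if (tag.equals("RN") && current != null) {
                blocks.add(current);
                current = null;
            }
            if (current == null) {
                current = new LinkedHashMap<>(); // keeps tags in the order seen
            }
            // EMBL data starts at column 6, after the tag and three spaces.
            String value = line.length() > 5 ? line.substring(5).trim() : "";
            current.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
        }
        if (current != null) {
            blocks.add(current); // flush a block that runs to the end of input
        }
        return blocks;
    }
}
```

Each map in the returned list holds the lines of exactly one reference
block, so the pairing of RN with its RP, RA, etc. is never lost.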

The file former would then need to be modified to
unpack the REFERENCE list, but this would not be a big
deal.
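
To give a rough idea of the unpacking, here is a sketch in plain
modern Java (8+). It is not the real EmblFileFormer: the class name
ReferenceBlockWriter and the method format are hypothetical, and it
assumes each reference block arrives as a map from line tag to its
values, with the map preserving the order in which the tags should be
written.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not BioJava's EmblFileFormer: writes reference
// blocks one after another so that RN, RP, ... stay nested per block.
class ReferenceBlockWriter {
    static String format(List<Map<String, List<String>>> references) {
        StringBuilder out = new StringBuilder();
        for (Map<String, List<String>> block : references) {
            // Emit every line of this block before moving on to the next one.
            for (Map.Entry<String, List<String>> entry : block.entrySet()) {
                for (String value : entry.getValue()) {
                    // Tag, three spaces, then the data from column 6.
                    out.append(entry.getKey()).append("   ").append(value).append('\n');
                }
            }
            out.append("XX\n"); // spacer line closing the block
        }
        return out.toString();
    }
}
```

The returned string could then be printed to whatever PrintStream the
file former is writing to.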

If you are keen to do this, then we can talk you
through it, either here or on chat (irc.freenode.net,
#biojava).

Matthew

 --- Lorna Morris <lmorris at ebi.ac.uk> wrote: 
> Hi
> 
> I'm using biojava to parse an EMBLFlatFile, add
> extra annotation, and 
> dump the new file out at the end. The parser seems
> to be really useful 
> for this. However, the file created using
> SeqIOTools.writeEmbl contains
> errors: the RN, RP, RX, RA, RT and RL lines aren't
> correctly nested. These
> lines should occur in repeated sets, but the output
> has all the RN lines,
> followed by all the RP lines etc., so they are merged
> rather than nested.
> 
> I've looked at the javadoc for the class
> EmblFileFormer and there is a 
> comment that might relate to this problem:
> 
>  * <p><code>EmblFileFormer</code> performs the detailed formatting of
>  * EMBL entries for writing to a <code>PrintStream</code>. Currently
>  * the formatting of the header is not correct. This really needs to
>  * be addressed in the parser which is merging fields which should
>  * remain separate.</p>
> 
> I've tried to address the problem by modifying the
> class, 
> SeqIOEventEmitter, but have run into difficulties,
> because I cannot 
> untangle which RN, RP, RX, RA, RT, RL 'belong'
> together in a single 
> block, as the annotation is just in an ArrayList.
> Maybe I should take 
> note of the javadoc comment above and address the
> problem in the parser. 
> If so, could you give me some pointers on which
> classes I should focus
> on in order to fix this, and whether you think it
> will be a difficult
> problem to solve?
> 
> Hope this makes sense.
> 
> Many thanks,
> 
> Lorna
> 
> 
> -------------------------------------------------------------------
> Lorna Morris
> EMBL-European Bioinformatics Institute    Tel: +44-(0)1223-492507
> Wellcome Trust Genome Campus, Hinxton     Fax: +44-(0)1223-494468
> Cambridge
> CB10 1SD, UK
> email:lmorris at ebi.ac.uk
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev 
