[Biojava-l] GenBank parser question

Matthew Pocock mrp@sanger.ac.uk
Fri, 20 Oct 2000 09:51:17 +0100

Hi Scott,

Thanks for your interest in BioJava. The problem with these REFERENCEs is a
flaw in how both the EMBL and GenBank parsers handle the header
information. They should be storing a list of reference objects in the
annotation bundle, rather than storing each reference directly under the
same key. This should be easy for you to fix - the code is open source, so
you can make any changes to it that you want. If you do not have time to
sort it out today, then I will try to make the fix over the weekend.
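
To illustrate the idea, here is a minimal sketch using plain java.util
collections rather than the actual BioJava Annotation class. The class name
ReferenceListSketch, the helper addReference and the reference strings are
all hypothetical, and the real fix in the parsers may look quite different.
It only shows the difference between putting each value under the same key
(the last one wins) and appending each value to a list held under that key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReferenceListSketch {

    // Hypothetical helper: append a value to the list stored under 'key',
    // creating the list the first time the key is seen.
    public static void addReference(Map<String, List<String>> annotation,
                                    String key, String value) {
        annotation.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        // Current behaviour: a plain put() replaces the earlier value,
        // so only the second REFERENCE survives.
        Map<String, String> overwritten = new HashMap<>();
        overwritten.put("REFERENCE", "first reference (made-up text)");
        overwritten.put("REFERENCE", "second reference (made-up text)");
        System.out.println(overwritten.get("REFERENCE"));

        // Suggested behaviour: every REFERENCE is kept, in input order.
        Map<String, List<String>> listed = new HashMap<>();
        addReference(listed, "REFERENCE", "first reference (made-up text)");
        addReference(listed, "REFERENCE", "second reference (made-up text)");
        System.out.println(listed.get("REFERENCE"));
    }
}
```

With this layout, code that reads the annotation would need to expect a
list under the REFERENCE key instead of a single value.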

All the best,


Scott Markel wrote:

> Here at NetGenics we've been following the BioJava effort since first
> joining the mailing list in September 1999.  We like what we see and we
> want to evaluate the BioJava code by using it on a real project.  So
> we've started a pilot project that will use BioJava.
> One of the first things we need to do is process GenBank entries.  The
> first GenBank file we tried has two REFERENCEs, as does the online
> example in section 7.1.2.  See
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
> When we printed out the newly constructed BioJava sequence we noticed
> that only the second REFERENCE printed.  We poked around the code for
> GenBankFormat and Annotation.  Since Annotation is just a wrapper for a
> Map, only the last REFERENCE added will be kept.  Clearly this isn't
> what we want.
> Obviously we can write our own GenBank parser and Annotation class
> implementing the appropriate interfaces, but, first, we'd like to know if
> we've missed something.
> Thanks.
> Scott & Jeff
> --
> Scott Markel, Ph.D.       NetGenics, Inc.
> smarkel@netgenics.com     4350 Executive Drive
> Tel: 858 455 5223         Suite 260
> FAX: 858 455 1388         San Diego, CA  92121