[Biojava-l] GenBank parser question

Scott Markel smarkel@netgenics.com
Thu, 19 Oct 2000 12:46:05 -0700


Here at NetGenics we've been following the BioJava effort since first
joining the mailing list in September 1999.  We like what we see and we
want to evaluate the BioJava code by using it on a real project.  So
we've started a pilot project that will use BioJava.

One of the first things we need to do is process GenBank entries.  The
first GenBank file we tried has two REFERENCEs, as does the online
example in section 7.1.2.  See

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

When we printed out the newly constructed BioJava sequence we noticed
that only the second REFERENCE printed.  We poked around the code for
GenBankFormat and Annotation.  Annotation is just a wrapper for a Map,
and a Map holds one value per key, so only the last REFERENCE added is
kept.  Clearly this isn't what we want.
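
To show the behavior we mean, here is a minimal sketch in plain Java.  It
does not use the BioJava classes.  The class name, the method names, and
the placeholder REFERENCE values are all made up for illustration, and we
are only assuming that the annotation stores each tag as a single Map
entry.  The first method keeps one value per key, the second collects
repeated keys into a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of BioJava.
public class ReferenceOverwriteDemo {

    // Stand-in for a Map-backed annotation: one value per key.
    public static Map<String, Object> singleValued(String[][] lines) {
        Map<String, Object> props = new LinkedHashMap<>();
        for (String[] line : lines) {
            // put() replaces any value already stored under the same key
            props.put(line[0], line[1]);
        }
        return props;
    }

    // Possible alternative: keep every value seen for a repeated key.
    public static Map<String, List<String>> multiValued(String[][] lines) {
        Map<String, List<String>> props = new LinkedHashMap<>();
        for (String[] line : lines) {
            // create the list on first sight of the key, then append
            props.computeIfAbsent(line[0], k -> new ArrayList<>()).add(line[1]);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder tag/value pairs, not taken from a real GenBank entry.
        String[][] lines = {
            {"REFERENCE", "first reference"},
            {"REFERENCE", "second reference"},
        };
        System.out.println(singleValued(lines).get("REFERENCE"));
        System.out.println(multiValued(lines).get("REFERENCE"));
    }
}
```

With the single-valued map only "second reference" survives, which
matches what we saw when printing the sequence.  The list-valued map
keeps both values in the order they were added.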

We could write our own GenBank parser and Annotation class implementing
the appropriate interfaces, but first we'd like to know whether we've
missed something.

Thanks.

Scott & Jeff

-- 
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA  92121