[Biojava-l] GenBank parser question
Matthew Pocock
mrp@sanger.ac.uk
Fri, 20 Oct 2000 09:51:17 +0100
Hi Scott,
Thanks for your interest in BioJava. The problem with these REFERENCEs is a
fatal flaw in how both the EMBL and GenBank parsers handle the header
information. They should be storing a list of reference objects in the
annotation bundle, rather than storing each reference directly under the
same key. This should be easy for you to fix - the code is open source, so
you can make any changes to it that you want. If you do not have time to
sort it out today, then I will try to make the fix over the weekend.
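
As a rough sketch of the difference, assuming the annotation bundle behaves
like a plain java.util.Map: the class, method, and key names below are
hypothetical stand-ins and are not the actual BioJava parser or Annotation
API. Putting each reference directly under one key overwrites the previous
one, whereas keeping a list under that key retains them all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain java.util collections stand in for the
// annotation bundle. None of these names are BioJava classes or methods.
public class ReferenceListSketch {
    // Hypothetical key under which the parser files each REFERENCE block.
    static final String REFERENCE_KEY = "REFERENCE";

    // Behaviour described in the thread: every put under the same key
    // replaces the previous value, so only the last reference survives.
    static void putDirect(Map<String, Object> annotation, String reference) {
        annotation.put(REFERENCE_KEY, reference);
    }

    // Suggested behaviour: keep one list per key and append each reference.
    static void addReference(Map<String, List<String>> annotation, String reference) {
        annotation.computeIfAbsent(REFERENCE_KEY, k -> new ArrayList<>()).add(reference);
    }

    public static void main(String[] args) {
        Map<String, Object> direct = new HashMap<>();
        putDirect(direct, "first reference");
        putDirect(direct, "second reference");
        System.out.println("direct put keeps: " + direct.get(REFERENCE_KEY));

        Map<String, List<String>> listed = new HashMap<>();
        addReference(listed, "first reference");
        addReference(listed, "second reference");
        System.out.println("list keeps: " + listed.get(REFERENCE_KEY));
    }
}
```

With the list version, anything that reads the key has to expect a list
rather than a single value, so code that prints the annotation would need
the matching change.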
All the best,
Matthew
Scott Markel wrote:
> Here at NetGenics we've been following the BioJava effort since first
> joining the mailing list in September 1999. We like what we see and we
> want to evaluate the BioJava code by using it on a real project. So
> we've started a pilot project that will use BioJava.
>
> One of the first things we need to do is process GenBank entries. The
> first GenBank file we tried has two REFERENCEs, as does the online
> example in section 7.1.2. See
>
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
>
> When we printed out the newly constructed BioJava sequence we noticed
> that only the second REFERENCE printed. We poked around the code for
> GenBankFormat and Annotation. Since Annotation is just a wrapper for a
> Map, only the last REFERENCE added will be kept. Clearly this isn't
> what we want.
>
> Obviously we can write our own GenBank parser and Annotation class
> implementing the appropriate interfaces, but first we'd like to know if
> we've missed something.
>
> Thanks.
>
> Scott & Jeff
>
> --
> Scott Markel, Ph.D.       NetGenics, Inc.
> smarkel@netgenics.com     4350 Executive Drive
> Tel: 858 455 5223         Suite 260
> FAX: 858 455 1388         San Diego, CA 92121