[Biojava-l] biojavax GenbankFormat and legacy genbank records

Bubba Puryear bubba.puryear at gmail.com
Mon Mar 13 17:27:11 UTC 2006


 Hello,

  I work on a webapp for a biotech company. The webapp uses BioJava to parse
plasmid and feature maps (GenBank flatfile format), which we then store in a
local database. I've wanted to update the version of BioJava we use because
the current CVS parser handles features that cross the origin on plasmid
maps much better than the parser in 1.4.

  However, we have a lot of data in various databases stored as GenBank
records formatted in some of the older incarnations of the GenBank flatfile
format. In particular, some feature maps don't have ACCESSION fields, and/or
are missing modification dates and GenBank divisions on the LOCUS line. When
I try to parse one of those maps with biojavax, I get parse errors.

  Should there perhaps be a LegacyGenbankFormat or should the GenbankFormat
class be made more tolerant? I know NCBI made several changes to their
flatfile format in part because writing parsers for the older specs was
tricky. So I'm not sure which direction the bio* folks would like to go with
this.
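
  To make the "more tolerant" option concrete, here is a rough sketch of a
LOCUS-line parser that treats the division and the modification date as
optional. It is only an illustration under my own assumptions: the class
LenientLocusLine is hypothetical and not part of BioJava or biojavax, the
division list is just the commonly seen three-letter codes, and only dates
of the form DD-MON-YYYY are recognised.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a BioJava class): parses a LOCUS line leniently. */
final class LenientLocusLine {
    // Required part: keyword, locus name, sequence length in bp.
    private static final Pattern HEAD =
        Pattern.compile("^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp\\b(.*)$");
    // Optional parts, searched for anywhere in the remainder of the line.
    private static final Pattern TOPOLOGY =
        Pattern.compile("\\b(circular|linear)\\b");
    private static final Pattern DIVISION = Pattern.compile(
        "\\b(PRI|ROD|MAM|VRT|INV|PLN|BCT|VRL|PHG|SYN|UNA|EST|PAT|STS|GSS|HTG|HTC|ENV|CON)\\b");
    private static final Pattern DATE =
        Pattern.compile("\\b(\\d{2}-[A-Z]{3}-\\d{4})\\b");

    final String name;
    final int length;
    final boolean circular;
    final String division; // null when the legacy record omits it
    final String date;     // null when the legacy record omits it

    private LenientLocusLine(String name, int length, boolean circular,
                             String division, String date) {
        this.name = name;
        this.length = length;
        this.circular = circular;
        this.division = division;
        this.date = date;
    }

    /** Parses one LOCUS line; only the name and length are mandatory. */
    static LenientLocusLine parse(String line) {
        Matcher head = HEAD.matcher(line.trim());
        if (!head.matches()) {
            throw new IllegalArgumentException("Not a LOCUS line: " + line);
        }
        String rest = head.group(3);
        return new LenientLocusLine(
            head.group(1),
            Integer.parseInt(head.group(2)),
            "circular".equals(firstMatch(TOPOLOGY, rest)),
            firstMatch(DIVISION, rest),
            firstMatch(DATE, rest));
    }

    // Returns the first captured group of the pattern, or null if absent.
    private static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

  With something like this, a legacy line lacking division and date would
still yield a name, length and topology, and the caller could decide what
defaults to fill in for the missing fields.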

  I've attached a small example map that causes parse problems. The data in
the map is completely bogus, but the structure was taken from a real map
file I have to deal with.

  The following code snippet illustrates my problems:

        BufferedReader br = new BufferedReader(new StringReader(genbankContent));
        try {
            RichSequenceIterator sequences = IOTools.readGenbankDNA(br, null);
            if (sequences.hasNext()) {
                this.sequence = sequences.nextRichSequence();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }


  where genbankContent is a String containing the contents of the attached
file.

Thanks much,
Bubba Puryear
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.gb
Type: chemical/seq-na-genbank
Size: 1091 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20060313/b56af1d0/attachment-0002.bin>

