[Biojava-dev] GenbankFormat (biojavax) and comments with leading whitespace

Bubba Puryear bubba.puryear at gmail.com
Fri Sep 29 16:22:03 UTC 2006


Hey all,

  I've been using biojava for some time now on my project to read
GenBank flat files, but until recently I hadn't been writing any.
Our client makes extensive use of VectorNTI (version 9, I think), and I
was making some edits to GenBank files (via biojavax) when I noticed
that comment values get their whitespace trimmed.

  Turns out VNTI splats a load of state that it needs into the comment
section in a fairly Lisp-ish looking syntax, and the indentation
appears to be significant: VNTI won't read the files I've edited once
their whitespace has been munged. I have some local changes to the
parser that preserve leading/trailing whitespace in the values of
top-level sections.
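
For illustration only, here is a minimal sketch of the difference, assuming the usual GenBank layout where the keyword fills columns 1-12 and the value starts at column 13. The class and method names (CommentValueSketch, valueTrimmed, valuePreserved) are hypothetical and are not the actual biojavax parser code or API.

```java
/**
 * Hypothetical sketch (not BioJava's API): extracting the value part of a
 * GenBank header line such as "COMMENT     ...".
 * Assumes the keyword occupies columns 1-12 and the value starts at column 13.
 */
public class CommentValueSketch {
    // Width of the keyword field (columns 1-12) in a GenBank flat file.
    public static final int KEYWORD_WIDTH = 12;

    // Whitespace-preserving version: keep everything after column 12,
    // including leading indentation and trailing spaces.
    public static String valuePreserved(String line) {
        if (line.length() <= KEYWORD_WIDTH) {
            return ""; // keyword only, no value
        }
        return line.substring(KEYWORD_WIDTH);
    }

    // Trimming version: this is the behaviour that loses VNTI's indentation.
    public static String valueTrimmed(String line) {
        return valuePreserved(line).trim();
    }

    public static void main(String[] args) {
        // "%-12s" pads the keyword to exactly 12 columns.
        String line = String.format("%-12s%s", "COMMENT", "  (CDBMol 0 0 1");
        System.out.println("[" + valueTrimmed(line) + "]");   // [(CDBMol 0 0 1]
        System.out.println("[" + valuePreserved(line) + "]"); // [  (CDBMol 0 0 1]
    }
}
```

Cutting at a fixed column, rather than trimming, keeps whatever indentation the writing program put there, which is what VNTI seems to depend on.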

  I've run the tests locally (and added one for testing indented
comments) and run this against ~3000 files I have on hand. I wanted
to get some feedback on this before I committed, though.

  As an example of the kind of thing that currently gets munged:

COMMENT     Vector_NTI_Display_Data_(Do_Not_Edit!)
COMMENT     (SXF
COMMENT      (CGexDoc "11460" 0 6359
COMMENT       (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0 (CObList) (CObList)
COMMENT        (CObList) (CObList) -1)
COMMENT       (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5 40 50 0 1 0
....

   The level of indentation can get quite deep.
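
A test for indented comments could be a simple read/write round trip over lines like the ones above. The sketch below is hypothetical (CommentRoundTripSketch, readComment and writeComment are illustrative names, not biojavax methods). It assumes a comment is stored as one string with one line per COMMENT line, and that the writer repeats the "COMMENT" keyword padded to 12 columns, as in the VNTI output shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical round-trip check for indented COMMENT lines (not BioJava's API). */
public class CommentRoundTripSketch {
    // Keyword field width (columns 1-12); the value starts at column 13.
    public static final int KEYWORD_WIDTH = 12;

    // Read: join the value part of each COMMENT line with '\n'.
    // If trim is true, mimic the current whitespace-trimming behaviour.
    public static String readComment(List<String> lines, boolean trim) {
        List<String> values = new ArrayList<>();
        for (String line : lines) {
            String v = line.length() > KEYWORD_WIDTH ? line.substring(KEYWORD_WIDTH) : "";
            values.add(trim ? v.trim() : v);
        }
        return String.join("\n", values);
    }

    // Write: one COMMENT line per comment line, keyword padded to 12 columns.
    public static List<String> writeComment(String comment) {
        List<String> out = new ArrayList<>();
        for (String v : comment.split("\n", -1)) { // -1 keeps trailing empty lines
            out.add(String.format("%-12s%s", "COMMENT", v));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList(
            "COMMENT     Vector_NTI_Display_Data_(Do_Not_Edit!)",
            "COMMENT     (SXF",
            "COMMENT      (CGexDoc \"11460\" 0 6359");
        boolean preservedOk = writeComment(readComment(original, false)).equals(original);
        boolean trimmedOk = writeComment(readComment(original, true)).equals(original);
        System.out.println("preserving round trip: " + preservedOk); // true
        System.out.println("trimming round trip:   " + trimmedOk);   // false
    }
}
```

With trimming, the "(CGexDoc" line loses its extra leading space and comes back one column to the left, so the written file no longer matches the original.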

Thanks,
Bubba



More information about the biojava-dev mailing list