[Biojava-dev] GenbankFormat (biojavax) and comments with leading whitespace

Bubba Puryear bubba.puryear at gmail.com
Mon Oct 2 14:44:11 UTC 2006


Seems I was a bit hasty in my assertions. Turns out I don't need to
muck with the comment blocks (at least not in biojava), but I did need
to fix the bug in SimpleRichFeature's compareTo method that wasn't
letting it play nicely with TreeSet. (I checked in tests, too)

I'll yell if I run into anything else. Thanks,
Bubba
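
For readers wondering how a compareTo method can fail to "play nicely with TreeSet": the thread does not say what the actual bug in SimpleRichFeature was, so the following is only a generic sketch of one common failure mode. TreeSet uses the ordering, not equals(), to decide whether two elements are the same, so an ordering that returns 0 for two distinct objects causes one of them to be silently dropped. The Feature class and both comparators here are hypothetical stand-ins, not biojavax code, and a Comparator is used for brevity (the same contract applies to compareTo).

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class CompareToSketch {
    // Hypothetical stand-in for a feature; NOT the real SimpleRichFeature.
    static final class Feature {
        final int start;
        final String name;
        Feature(int start, String name) { this.start = start; this.name = name; }
    }

    // Orders on start position only: two distinct features that share a
    // start compare as 0, so a TreeSet treats them as duplicates.
    static final Comparator<Feature> BY_START_ONLY =
        Comparator.comparingInt(f -> f.start);

    // Breaks ties on name, so distinct features remain distinct in the set.
    static final Comparator<Feature> BY_START_THEN_NAME =
        Comparator.<Feature>comparingInt(f -> f.start).thenComparing(f -> f.name);

    // Adds three distinct features (two sharing a start) and reports how
    // many the TreeSet actually kept under the given ordering.
    static int countKept(Comparator<Feature> order) {
        Set<Feature> set = new TreeSet<>(order);
        set.add(new Feature(10, "gene"));
        set.add(new Feature(10, "CDS"));
        set.add(new Feature(42, "promoter"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countKept(BY_START_ONLY));       // prints 2
        System.out.println(countKept(BY_START_THEN_NAME));  // prints 3
    }
}
```

A unit test along these lines (insert N distinct features, assert the set holds N) is a cheap way to catch this class of bug.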


On 9/30/06, Mark Schreiber <markjschreiber at gmail.com> wrote:
> I think this should be fine to commit as long as biojava can still
> read in the file again (and other files).
>
> You should probably also comment the code to say VNTI needs this and
> to be doubly certain put in a unit test.
>
> - Mark
>
> On 9/30/06, Bubba Puryear <bubba.puryear at gmail.com> wrote:
> > Hey all,
> >
> >   I've been using biojava for some time now on my project for reading
> > genbank flat files, but until recently I haven't been writing any.
> > Our client makes extensive use of VectorNTI (version 9, I think) and I
> > was doing some edits to genbank files (via biojavax) and noticed that
> > comment values get their whitespace trimmed.
> >
> >   Turns out VNTI splats a load of state that it needs in the comment
> > section in a fairly lispish looking syntax... but indentation appears
> > to be important. In particular, VNTI won't read the files I've edited
> > that have had their whitespace munged. I have some local changes to
> > the parser that preserve leading/trailing whitespace for section
> > values for top level sections.
> >
> >   I've run the tests locally (and added one for testing indented
> > comments) and run this against ~ 3000 files I have locally. I wanted
> > to get some feedback on this before I committed, though.
> >
> >   As an example of the kind of thing that currently gets munged:
> >
> > COMMENT     Vector_NTI_Display_Data_(Do_Not_Edit!)
> > COMMENT     (SXF
> > COMMENT      (CGexDoc "11460" 0 6359
> > COMMENT       (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0 (CObList) (CObList)
> > COMMENT        (CObList) (CObList) -1)
> > COMMENT       (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5 40 50 0 1 0
> > ....
> >
> >    The level of indentation can get quite deep.
> >
> > Thanks,
> > Bubba
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
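
The whitespace issue described in the quoted message can be sketched roughly as follows. This is not the biojavax GenbankFormat parser and not the change discussed in the thread; the class and method names are hypothetical. It only assumes the GenBank flat-file convention that the keyword occupies the first 12 columns and the value starts at column 13, and contrasts trimming the value with keeping its indentation.

```java
public class CommentIndentSketch {
    // GenBank flat-file keywords occupy the first 12 columns;
    // the value begins at column 13.
    static final int TAG_WIDTH = 12;

    // Hypothetical helper: returns everything after the keyword columns,
    // leaving leading and trailing whitespace untouched.
    static String valuePreserved(String line) {
        return line.length() > TAG_WIDTH ? line.substring(TAG_WIDTH) : "";
    }

    // Hypothetical helper: same value, but trimmed, which discards the
    // indentation that VNTI-style comment data relies on.
    static String valueTrimmed(String line) {
        return valuePreserved(line).trim();
    }

    public static void main(String[] args) {
        String line = "COMMENT      (CGexDoc \"11460\" 0 6359";
        System.out.println("[" + valuePreserved(line) + "]"); // one leading space kept
        System.out.println("[" + valueTrimmed(line) + "]");   // leading space lost
    }
}
```

Writing the preserved value back after the 12-column keyword reproduces the original line exactly, which is the round-trip property the indented comment blocks need.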


