[Biojava-l] BioJava 1.1x freeze plans

Thomas Down td2@sanger.ac.uk
Tue, 30 Jan 2001 20:47:46 +0000


On Tue, Jan 30, 2001 at 12:33:45PM -0500, Cox, Greg wrote:
> Thomas,
> 	I'm in the midst of revising GenbankFormat to handle the new IO
> style, and I'd like to see that in the 1.1 release.  Right now I'm having
> trouble with features, and hope someone can help.  FeatureTableParser's
> documentation says it is shared between EMBL and GENBANK format, but EMBL is
> hard coded as the type source.  Is there any existing GENBANK feature
> information?  I'm leery about changing FeatureTableParser myself since I
> don't know what will break.

Great -- I think a lot of people will be happy to see
that working again...  There shouldn't be any problem
taking this change before 1.1, so long as there's a
demo program to check that it all behaves as expected.

Yes, the FeatureTableParser was shared between Embl and
Genbank in BioJava 1.0.  Unless you know of some subtle
differences between Embl and Genbank feature tables (I've
always thought they were the same, but that said I've very
rarely worked with Genbank files myself), it ought to be
possible to do a new GenbankParser which also uses
FeatureTableParser.

I presume you've seen the lifecycle:

  - Create a FeatureTableParser, pointing it at the appropriate
    SeqIOListener.

  - When you see the start of a feature, call startFeature,
    passing in the feature type.

  - Pass each line of the feature table, with the start trimmed
    off, to featureData.

  - Flush the feature with endFeature().

  - You should then get a feature notified to the SeqIOListener.
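
A rough sketch of that lifecycle, in case it helps.  The classes
below are self-contained stand-ins, not the real BioJava classes:
only the method names startFeature, featureData and endFeature
come from the steps above, and the listener callback, the
constructor signature and the line() helper are made up for
illustration.  It also assumes the usual flat-file layout, with
the feature key starting in column 6 and the location and
qualifiers starting in column 22.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT the real BioJava API.
class FeatureLifecycleSketch {

    /** Stand-in for SeqIOListener: receives each completed feature. */
    interface Listener {
        void featureFinished(String type, String source, List<String> lines);
    }

    /** Stand-in for FeatureTableParser: buffers lines between start and end. */
    static class TableParser {
        private final Listener listener;
        private final String source;
        private final List<String> lines = new ArrayList<>();
        private String type; // null while no feature is open

        TableParser(Listener listener, String source) {
            this.listener = listener;
            this.source = source;
        }

        void startFeature(String type) {
            if (this.type != null) throw new IllegalStateException("feature already open");
            this.type = type;
            lines.clear();
        }

        void featureData(String line) {
            if (type == null) throw new IllegalStateException("no feature open");
            lines.add(line);
        }

        void endFeature() {
            if (type == null) throw new IllegalStateException("no feature open");
            listener.featureFinished(type, source, new ArrayList<>(lines));
            type = null;
        }
    }

    /** Builds one feature-table line: 5 blanks, key padded to column 21, then data. */
    static String line(String key, String data) {
        return String.format("     %-16s%s", key, data);
    }

    /** Drives the lifecycle over raw lines; returns one summary string per feature. */
    static List<String> parse(List<String> tableLines, String source) {
        List<String> seen = new ArrayList<>();
        TableParser parser = new TableParser(
            (type, src, lines) -> seen.add(src + ":" + type + ":" + String.join(" ", lines)),
            source);
        boolean open = false;
        for (String raw : tableLines) {
            // Columns 6-21 hold the feature key (blank on continuation lines).
            String key = raw.length() > 5
                ? raw.substring(5, Math.min(21, raw.length())).trim() : "";
            // Everything from column 22 on is the "start trimmed off" payload.
            String data = raw.length() > 21 ? raw.substring(21) : "";
            if (!key.isEmpty()) {
                if (open) parser.endFeature(); // flush the previous feature
                parser.startFeature(key);
                open = true;
            }
            if (open && !data.isEmpty()) parser.featureData(data);
        }
        if (open) parser.endFeature(); // flush the last feature
        return seen;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList(
            line("source", "1..100"),
            line("", "/note=demo"),
            line("CDS", "10..90"));
        for (String summary : parse(table, "GenBank")) {
            System.out.println(summary);
        }
    }
}
```

The point to notice is that a new feature key is what triggers
the flush of the previous feature, so the caller needs one final
endFeature() once the table runs out.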

The code isn't as elegant as it could be, mainly because I made
the minimum set of changes necessary to make it all work in
the `new IO' world.

As to the `source' issue, that being hard-coded is a mistake.
Feel free to offer a way to change this, either by adding a
setFeatureSource(String) method, or by adding an extra parameter
to the constructor.  Right now, FeatureTableParsers are only
constructed by EmblProcessor, so nothing else should break so
long as you keep that in sync with any changes you make.
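
To make the two options concrete, here is a minimal sketch.  It
is not the BioJava source: the TableParser class is a stand-in,
and getFeatureSource() and the "EMBL" default constructor are
assumptions added so the example runs on its own.

```java
import java.util.Objects;

// Hypothetical sketch, NOT the real FeatureTableParser.
class FeatureSourceSketch {

    static class TableParser {
        private String featureSource;

        // Option 1: extra constructor parameter naming the source.
        TableParser(String featureSource) {
            this.featureSource = Objects.requireNonNull(featureSource);
        }

        // Assumed convenience default, so an existing EMBL caller
        // keeps its current behaviour without being edited.
        TableParser() {
            this("EMBL");
        }

        // Option 2: setter, usable after construction.
        void setFeatureSource(String featureSource) {
            this.featureSource = Objects.requireNonNull(featureSource);
        }

        String getFeatureSource() {
            return featureSource;
        }
    }

    public static void main(String[] args) {
        TableParser embl = new TableParser();             // defaults to EMBL
        TableParser genbank = new TableParser("GenBank"); // option 1
        System.out.println(embl.getFeatureSource());
        System.out.println(genbank.getFeatureSource());

        embl.setFeatureSource("GenBank");                 // option 2
        System.out.println(embl.getFeatureSource());
    }
}
```

The constructor parameter has the advantage that the source can
never be left unset; the setter is the smaller change if you
would rather not touch the existing constructor call.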

If you do change anything else, the gff.EmblToGFFFasta demo
is a good test to make sure everything is still working.  You
might want to hack up an equivalent GenbankToGFFFasta to
test your work.

Thanks,

    Thomas.