[Biojava-l] Revised Parser

Forsch, Dan dorsch@netgenics.com
Thu, 8 Feb 2001 21:13:05 -0500

I believe Greg and Thomas may be discussing slightly different issues. Greg
was proposing changing the parsing logic to deal with unrecognized tokens
more gracefully than simply throwing a ParseException. Because that code
isn't re-entrant, an entire database entry containing one of the unsupported
forms, such as a remote location, could not be processed by the parsers.
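To make Greg's proposal concrete, here is a minimal Java sketch of the log-and-continue behaviour. All class and method names are hypothetical stand-ins, not BioJava's API, and the location grammar is deliberately toy-sized. Only `N` and `N..M` are understood, so a remote location such as `J00194.1:100..202` counts as unrecognized. In the strict version the exception would propagate and abort the whole entry. Here `readAll` catches it for each feature, logs it to `System.err`, and moves on to the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the parser's checked exception (not BioJava's ParseException).
class LocationParseException extends Exception {
    LocationParseException(String msg) { super(msg); }
}

// Hypothetical minimal location: a 1-based inclusive range.
class SimpleRange {
    final int min, max;
    SimpleRange(int min, int max) { this.min = min; this.max = max; }
    @Override public String toString() { return min + ".." + max; }
}

class LenientFeatureReader {
    // Toy grammar: accepts "N" or "N..M"; anything else (remote, join, ...) is rejected.
    static SimpleRange parseLocation(String loc) throws LocationParseException {
        String[] parts = loc.split("\\.\\.", -1);
        try {
            if (parts.length == 1) {
                int p = Integer.parseInt(parts[0]);
                return new SimpleRange(p, p);
            }
            if (parts.length == 2) {
                return new SimpleRange(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
            }
        } catch (NumberFormatException e) {
            // Non-numeric pieces (e.g. "J00194.1:100") fall through to the exception below.
        }
        throw new LocationParseException("Unrecognized location: " + loc);
    }

    // Log-and-continue: an unparseable location is reported on System.err and skipped,
    // so one unsupported feature no longer aborts the whole entry.
    static List<SimpleRange> readAll(List<String> locations) {
        List<SimpleRange> out = new ArrayList<>();
        for (String loc : locations) {
            try {
                out.add(parseLocation(loc));
            } catch (LocationParseException e) {
                System.err.println("Skipping feature: " + e.getMessage());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleRange> ranges = readAll(Arrays.asList("100..200", "J00194.1:100..202", "42"));
        System.out.println(ranges); // prints [100..200, 42..42]; the remote location goes to System.err
    }
}
```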

Thomas' suggestion to have an 'acceptRemoteFeatures' flag seems to me to
make sense once accepting them can be supported. I agree with allowing users
to choose the behaviour they want, but until these location types can be
modeled with the BioJava interfaces, I'm not sure what choice there can be.
Maybe I misunderstood. Was the thinking that a true value for
'acceptRemoteFeatures' would result in a ParseException being thrown in the
interim, but false would suppress the exception?
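Thomas's alternative could look roughly like the sketch below: a single class whose behaviour is controlled by a property, which the factory copies into every processor it constructs. The names (`FeatureProcessorSketch`, `failOnUnrecognized`) are hypothetical and do not reflect the real `GenbankProcessor` API. The flag is phrased as "fail on unrecognized" rather than `acceptRemoteFeatures` because the open question above is exactly which value should throw. Treating any location that contains `:` as remote is a toy rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parser's checked exception.
class UnrecognizedLocationException extends Exception {
    UnrecognizedLocationException(String msg) { super(msg); }
}

// One processor class whose handling of unsupported locations is chosen by a flag,
// instead of two near-identical parser classes that must both be maintained.
class FeatureProcessorSketch {
    private final boolean failOnUnrecognized;
    private final List<String> accepted = new ArrayList<>();

    FeatureProcessorSketch(boolean failOnUnrecognized) {
        this.failOnUnrecognized = failOnUnrecognized;
    }

    // Toy rule: a location containing ':' (e.g. "J00194.1:100..202") is treated as remote.
    void addLocation(String loc) throws UnrecognizedLocationException {
        if (loc.indexOf(':') >= 0) {
            if (failOnUnrecognized) {
                throw new UnrecognizedLocationException("Unsupported location: " + loc);
            }
            System.err.println("Skipping unsupported location: " + loc);
            return;
        }
        accepted.add(loc);
    }

    List<String> getAccepted() { return accepted; }

    // The factory holds the same property and passes it to every processor it builds.
    static class Factory {
        private boolean failOnUnrecognized = true; // default keeps the current strict behaviour

        void setFailOnUnrecognized(boolean b) { failOnUnrecognized = b; }
        boolean getFailOnUnrecognized() { return failOnUnrecognized; }

        FeatureProcessorSketch makeProcessor() {
            return new FeatureProcessorSketch(failOnUnrecognized);
        }
    }

    public static void main(String[] args) throws UnrecognizedLocationException {
        Factory factory = new Factory();
        factory.setFailOnUnrecognized(false);   // opt in to log-and-continue
        FeatureProcessorSketch p = factory.makeProcessor();
        p.addLocation("100..200");
        p.addLocation("J00194.1:100..202");     // logged to System.err, not thrown
        System.out.println(p.getAccepted());    // prints [100..200]
    }
}
```

Callers set a named property on the factory instead of passing a bare `true`/`false` to a constructor. That keeps call sites readable, which is one way to answer the objection about having to look up what a boolean constructor argument means.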

I see the issue as a need to reach consensus on what constitutes an
exceptional case, or at least one that results in complete failure from
the perspective of the caller. Viewed that way, my preference would be to
have only one expected behaviour rather than multiple options. However, if
the group prefers giving users more control over parsing behaviour, I agree
with Thomas that we should not introduce multiple top-level classes, since
the logic in both would then need to be maintained.

I know we're in a release freeze, so resolving this may have to wait until
after the release.

Dan Forsch, Principal Software Engineer
NetGenics, Inc.

> -----Original Message-----
> From: Cox, Greg [mailto:gcox@netgenics.com]
> Sent: Thursday, February 08, 2001 6:25 AM
> To: 'biojava-l@biojava.org'
> Subject: RE: [Biojava-l] Revised Parser
>
> I understand your approach, but my preference is to keep them separate. I
> think it keeps the code in the parser more understandable, and I think it's
> easier to understand the constructor call when you don't have to look up
> what the boolean flag is.
> 	That said, I'd like to either get this finished up or decide to sit on
> it until 1.2. If merging both parsers is the way to go, I'll do that. I
> changed GenBank, EMBL, and SwissProt to keep all the parsers consistent.
> 	Greg
>
> -----Original Message-----
> From: Thomas Down [mailto:td2@sanger.ac.uk]
> Sent: Thursday, February 08, 2001 6:36 AM
> To: Cox, Greg
> Cc: 'biojava-l@biojava.org'
> Subject: Re: [Biojava-l] Revised Parser
> On Wed, Feb 07, 2001 at 02:46:51PM -0500, Cox, Greg wrote:
> > I'm revising the parsers to log a message when they come across a location
> > they don't understand and continue processing the file, instead of throwing
> > an exception and bombing out. This will let us push off issues like the
> > remote location problem. I have a question for the list:
> >
> > 	Should these revised parsers supplement or replace the current
> > parsers? I.e., is anyone's program going to break if bad files are
> > partially processed instead of crashing the program? A log message is
> > written to System.err, so it's not a silent failure.
> >
> > 	If you'd like this to be a second parser instead of replacing the
> > original, I'd appreciate a name for it; I can't come up with a good one.
>
> Can I just suggest a third option: merge the two and add a boolean
> property (`acceptRemoteFeatures'?) to control behaviour. Since
> this is in GenbankProcessor (I presume...), you'll also need to add
> this property to GenbankProcessor.Factory and propagate it when
> you construct processors.
> This seems easier to me than having two separate classes, but
> users really ought to be able to choose the behaviour they want.
> Does this make sense?
>     Thomas.
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l