[Bioperl-l] Bio::SeqAnalysisParserI [was: Bio::Tools::HMMER refactoring]

Jason Stajich jason@chg.mc.duke.edu
Mon, 18 Dec 2000 09:37:59 -0500 (EST)


On Mon, 18 Dec 2000, Ewan Birney wrote:

> On Mon, 18 Dec 2000, Hilmar Lapp wrote:
> 
> > Ewan Birney wrote:
> > > 
> > > In addition, this interface will not go easily into a corba
> > > /time-sliced/threaded framework.
> > > 
> > > Why not have
> > > 
> > >   Bio::SeqAnalysisParserFactoryI
> > > 
> > >   $parser = $factory->create_parser(-fh => \*FILE);
> > > 
> > >   Bio::SeqAnalysisParserI
> > > 
> > >   while( $next_feature = $parser->next_feature ) {
> > > 
> > >   }
> > > 
> > > same number of functions defined. Twice the number of interfaces, but
> > > these are the interfaces I would argue we want.
> > > 
> > > An implementation could implement ParserFactoryI and ParserI in the same
> > > module if so wished.
> > > 
> > > Whaddya reckon? Too complex for your taste hilmar?
> > > 
> > 
> > Well, Jason and I had such a layout in mind first, but the question was
> > how significant the performance hit might be in a CORBA context. A
> > likely situation is that you have less than 10 methods for which you
> > need parsers, and thousands of sequences, that is, thousands of inputs
> > for each parser. We thought that in a CORBA context creating 10 objects
> > instead of 10,000 does matter (in pure Perl you probably wouldn't notice
> > a difference), and that therefore we wanted to be able to reuse a
> > once-created parser object.
> > 
> > Of course you could let the parser implement the factory, too, and abuse
> > it as a 'reset', but IMHO this is abuse.
> > 
> > So, what I wanted to say, I guess both Jason and I are in principle
> > happy with a factory. Based on my experience with CORBA, however, there
> > is a performance issue, but my experience is somewhat not up-to-date,
> > and not that extensive, so it's up to you and Jason to make a decision
> > here.
> 
> From a CORBA perspective I think both schemes would be implemented
> similarly, but with the current scheme being less clean - indeed -
> potentially dangerously assuming a single client, single thread mode.
> 
> The overhead is going to be in making the SeqFeatures in a CORBA context,
> not the analysis routines. The analysis routine creation only becomes a
> bottleneck in certain cases (though this does happen - for example, we
> have hit this bottleneck in Ensembl...)
> 
> 
> So - we are ok in splitting the interfaces into two now? Final thoughts
> from jason?

[ A final thought from Jason ... This is probably lost on those that don't
get the bastion of American culture - The Jerry Springer Show ]

I am happy with an interface split.  I did a sort-of factory in
SeqAnalysisParser recently, to simplify how to work with analysis parsers
and adding those features to sequences (which I am sure has just become
even more confusing for the onlookers).  This proposal does a good job
establishing boundaries of where functionality should come from so I like
it.
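
To make the split concrete for onlookers, here is a minimal sketch of the
two roles. The package names My::ParserFactory and My::TabParser, the
tab-delimited input format, and the plain hash features are all made up
for illustration; only create_parser(-fh => ...) and next_feature come
from Ewan's proposal, and a real implementation would presumably return
SeqFeature objects instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical factory: its only job is to hand out a fresh parser per input.
package My::ParserFactory;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub create_parser {
    my ($self, %args) = @_;
    my $fh = $args{-fh} or die "create_parser requires -fh\n";
    return My::TabParser->new($fh);
}

# Hypothetical parser: holds the per-input state (the filehandle) only.
package My::TabParser;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

# Returns the next feature as a hash reference, or nothing at end of input.
sub next_feature {
    my ($self) = @_;
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($name, $start, $end) = split /\t/, $line;
        return { name => $name, start => $start, end => $end };
    }
    return;
}

package main;

# Made-up input: name, start, end separated by tabs, read from memory.
my $input = "domainA\t10\t50\ndomainB\t60\t120\n";
open(my $fh, '<', \$input) or die "cannot open in-memory handle: $!";

my $factory = My::ParserFactory->new;
my $parser  = $factory->create_parser(-fh => $fh);

my @features;
while (my $feature = $parser->next_feature) {
    push @features, $feature;
}
close $fh;

printf "%s %d-%d\n", @{$_}{qw(name start end)} for @features;
```

Since each parser owns its own filehandle, two clients never share parsing
state, which is the single client, single thread worry Ewan raised. The
price is one parser object per input, which is the cost Hilmar asked about.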

I think the CORBA performance questions will have to be evaluated as we
get into it, but I suspect there will be other things as bottlenecks first
- let's also see what we can learn from Ensembl.  I also know that we have
very much avoided anything to do with analysis in the current BioCorba
spec to see what we could learn from the LSR idl and to delay that battle
for a little while. I think we may want to revisit that in the BioCorba idl
some time in the future though as Bioperl begins to provide this
functionality that other language projects may want to use (instead of
each of them writing a Genscan, HMMER, Blast parser).

So you have my vote (and every vote should count unless there is a 'rule
of law' preventing that from happening).

-jason

> 
> 
> 
> > 
> > 	Hilmar
> > -- 
> > -----------------------------------------------------------------
> > Hilmar Lapp                                email: hlapp@gmx.net
> > GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> > -----------------------------------------------------------------
> > 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>. 
> -----------------------------------------------------------------

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/