[Dynamite] compile status / radical idea
Ewan Birney
birney@ebi.ac.uk
Sun, 16 Apr 2000 22:21:08 +0100 (BST)
On Sun, 16 Apr 2000, Ian Holmes wrote:
> Ewan,
>
> Thanks for your considered mail. First off I think one theme that is
> emerging is that we are all slightly uncomfortable with the current
> paralysis, and that the constructive thing to do is identify the
> bottlenecks.
Great point, Ian. I am about to disconnect for the trip back to the UK, but
I thought I'd respond quickly here...
>
> Three options are on the table: (1) keep going with IDL-to-C; (2) go Perl,
> possibly mixed with C; (3) go pure C. I am continuing to argue for (2),
> for reasons described below. I am also sympathetic to (3).
>
> > I have been dreaming code last night, and I sort of realised that
> > *internally* in the telegraph package we only need "virtual
> > function"/implementation flipping in a limited number of areas
> >
> > - getting sequences out of a database (but not sequences themselves)
>
> I thought you were keen on sequences being virtual too. Do you regard this
> as less crucial now that you've implemented virtual contigs for EnsEMBL?
> (just curious really...)
>
It is more that the memento design pattern gets you out of the
problem of having virtual sequences. Once you let in mementos there is
not a great deal of point in having virtual attributes on sequences, just
virtual ways of getting sequence mementos.
This is a different argument from the one I was making 2 months ago. Culpa
mea.
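
A minimal sketch of what I mean, in C. All the names here (SeqMemento,
SeqSource, MemDb, mem_source and so on) are made up for illustration and
are not existing telegraph code. The memento is a plain struct with nothing
virtual about it; the only pointer-to-function lives on the thing that
hands mementos out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memento: a concrete snapshot of a sequence, no virtual parts. */
typedef struct {
    char *name;
    char *residues;
} SeqMemento;

/* Hypothetical source: the fetch pointer is the only "virtual" piece. */
typedef struct SeqSource {
    SeqMemento *(*fetch)(struct SeqSource *self, const char *name);
    void *impl; /* backend-specific state */
} SeqSource;

static char *copy_string(const char *s) {
    char *out = malloc(strlen(s) + 1);
    if (out) strcpy(out, s);
    return out;
}

SeqMemento *seq_memento_new(const char *name, const char *residues) {
    SeqMemento *m = malloc(sizeof *m);
    if (!m) return NULL;
    m->name = copy_string(name);
    m->residues = copy_string(residues);
    if (!m->name || !m->residues) {
        free(m->name);
        free(m->residues);
        free(m);
        return NULL;
    }
    return m;
}

void seq_memento_free(SeqMemento *m) {
    if (!m) return;
    free(m->name);
    free(m->residues);
    free(m);
}

/* One possible backend: a fixed in-memory table (assumed for the example). */
typedef struct { const char *name; const char *residues; } MemEntry;
typedef struct { const MemEntry *entries; size_t count; } MemDb;

static SeqMemento *mem_fetch(SeqSource *self, const char *name) {
    const MemDb *db = self->impl;
    for (size_t i = 0; i < db->count; i++) {
        if (strcmp(db->entries[i].name, name) == 0)
            return seq_memento_new(name, db->entries[i].residues);
    }
    return NULL; /* not found */
}

SeqSource mem_source(MemDb *db) {
    SeqSource s = { mem_fetch, db };
    return s;
}

/* Callers only ever go through this; they never see the backend. */
SeqMemento *seq_source_fetch(SeqSource *src, const char *name) {
    assert(src != NULL && src->fetch != NULL);
    return src->fetch(src, name);
}
```

A flat-file or CORBA-backed source would supply a different fetch function
and impl pointer, and everything downstream of seq_source_fetch would stay
the same.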
> > - running the algorithms either run-time or compile-time.
> >
> > - perhaps some training code.
> >
> > Everything else neither needs run-time method binding nor that much
> > inheritance.
> >
> >
> > So - rather than moving to Perl (drawbacks in my book -
> >
> > a) hard to maintain a large Perl code base - look at ensembl
>
> Actually, I don't think this *would* be a large Perl codebase. This
> project is well-contained, and our object model is already laid out.
> I think we could do it in a dozen or so smallish modules. Probably less
> code than "idlstubs.pl". (And probably quicker to write.)
Hmmmm. Maybe. I see it getting nasty.
>
> > b) execute-heavy pieces are going to run like a stuck cow
>
> Yes, but the Perl implementation is proof of concept only. We'd have two
> options to improve performance:
>
> (1) port DP routines to C
> (2) autogenerate C (c.f. original Dynamite) - VERY easy using Perl
Doing calls Perl->C->Perl, which we might have to do, sucks big time.
Kevin is in the same bind porting son-of-gaze, written in Perl, into C -
the port is triggering an almost global rewrite into C.
>
> > c) Guy won't do anything
>
> ;-)
>
> I had hoped that Guy would be interested in converting parts of the
> package from Perl to C. The DP algorithms, for example.
>
> What makes this idea so attractive to me is quick publication. Let me
> elaborate, then you can shoot me down if you disagree...
>
> I/we can write Telegraph in Perl *very quickly*. We are talking about a
> matter of days here. OK, so it runs slow, but we have proof of concept of
> everything - the whole object model, the idea of polymer HMMs, the
> parameter space translation, the training code. _Everything_.
>
> We then start to port parts of it to C, using the same object model as for
> the Perl. (The original Perl version must be so object-oriented that it
> has a halo.) We can even mix Perl & C initially, using XS. We can aim to
> eventually implement the entire library in standalone C, or just the DP
> algorithms, or whatever is feasible. There is no shame in leaving the
> training algorithms in Perl, because the training code can be decoupled
> from the DP code very easily. It is entirely feasible for the training and
> the DP code to communicate by means of XS calls, or over sockets, or even
> through temporary files: the only object that passes from the DP phase to
> the training phase is a Param::Value::Buf, which is easily serialisable.
>
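
To make the temporary-file route concrete, here is a rough sketch of the C
side. The real layout of Param::Value::Buf is not spelled out in this
thread, so the ParamBuf struct below (a count plus an array of doubles) and
the one-number-per-line text format are assumptions made for the example,
as are the function names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Param::Value::Buf: a flat array of doubles. */
typedef struct {
    size_t n;
    double *values;
} ParamBuf;

/* Write the count, then one value per line. Returns 0 on success. */
int param_buf_write(const ParamBuf *buf, FILE *out) {
    assert(buf != NULL && out != NULL);
    if (fprintf(out, "%zu\n", buf->n) < 0) return -1;
    for (size_t i = 0; i < buf->n; i++) {
        /* 17 significant digits is enough to round-trip an IEEE double. */
        if (fprintf(out, "%.17g\n", buf->values[i]) < 0) return -1;
    }
    return 0;
}

/* Read a buffer written by param_buf_write. The caller frees buf->values. */
int param_buf_read(ParamBuf *buf, FILE *in) {
    assert(buf != NULL && in != NULL);
    size_t n;
    if (fscanf(in, "%zu", &n) != 1) return -1;
    double *v = malloc((n ? n : 1) * sizeof *v);
    if (!v) return -1;
    for (size_t i = 0; i < n; i++) {
        if (fscanf(in, "%lf", &v[i]) != 1) {
            free(v);
            return -1;
        }
    }
    buf->n = n;
    buf->values = v;
    return 0;
}
```

A plain text format like this is trivial to read and write from Perl as
well, so neither side needs to know what language the other is written in.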
> We can work in parallel. No bottlenecks, and we can write a paper at any
> stage, because we have 100% proof of concept: a working Perl program. I
> hypothesise that a useful division of labour would be for me to do the
> initial Perl implementation, perhaps with Ewan. Then Ewan and Guy could
> take over the porting to C, while I could either write more Perl (e.g.
> experimental training code, XML I/O) or help with the C port.
>
> As soon as we publish, we can go all-out Open Source, i.e. publicise the
> mailing list, give away bottles of champagne, etc etc. Perhaps people will
> even help us with the C conversion.
>
> Being able to publish early, even just a poster at ISMB, is *very*
> *attractive*. It will really get the ball rolling; a collaboration with a
> publication to its name is a collaboration that has come of age.
Hmmm. This is a good argument. I *do* like the cut of your jib Ian.
Let me mull on this a bit.
>
> >
> > I suggest -
> >
> > Using "Standard" C methods, with some pointer-to-function for
> > database streaming/database access, algorithm implementation to allow
> > compile time code coming in cleanly and possibly training interface.
> >
> > I have clean sequence stuff already with pointer-to-function for
> > database streaming. I can bind these via CORBA to bioperl.
> >
> >
> > What do people think?
>
> I'm not completely sure I follow you. Are you proposing abandoning our IDL
> object model but sticking with C?
>
> If so then I guess this would certainly remove the IDL-to-C bottleneck
> that arguably has contributed to our current paralysis. We would be
> throwing out a few babies with the bathwater though...
>
> (Baby #1) Yes we are only making sparing use of inheritance and
> dynamic binding, but IMO the main advantage of
> "object-oriented C" is having a logical object model,
> making the library nicer & more logical to use.
> Our IDL-to-C mapping enforces this.
>
> (Baby #2) The formality of using an IDL-to-C mapping also provides
> for future scenarios such as interfacing to CORBA or
> Perl XS.
>
> I have no interest in pushing idlstubs if you are both uncomfortable using
> it. I have always been concerned that using an in-house compiler would
> give people the willies, especially if it is opaque to everyone except
> me.
>
> Most of my recent work on idlstubs has been aiming towards making it more
> comprehensible, by separating out the C-generating part from the IDL
> parser. With these improvements, it would be straightforward for you guys
> to edit the C without having to delve into the idlstubs Perl.
>
> I estimate the new improved idlstubs would be ready by the end of the
> month, unless we abandon IDL-to-C in which case I won't work on it.
>
> On balance, I think the bottleneck problem probably outweighs the
> advantages of IDL-to-C. But I'd like to see a little more discussion on
> this list first.
>
> I still favour Perl, because I see this being the quickest way by far of
> getting a working library. Dissuade me...
>
I don't think I can.
OK. I vote for a fast Perl implementation, with sequences coming from
bioperl, and then a rewrite of the DP in C, looking first for XS links.
Guy should put in his $0.02 before we leap.
> Ian
>
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------