[Bioperl-l] Bio::Tools::Glimmer

Mark Johnson johnsonm at gmail.com
Tue Feb 6 23:53:49 UTC 2007


Okay, I need to get something going for a project I'm working on.  Options:

1) Stick it all in one module:  This can get a bit ugly, as Glimmer, as
opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in
the prediction report.  You can pick up on some unique things in the output
file, but you don't know what you've got until you're actually parsing it.
That is, unless you require a format argument up front, in which case you
can split the parsing code up into separate functions.
2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3,
with or without an abstract dispatch front end.

I suppose at this point, after getting my hands dirty, I'd prefer 1), with
an explicit -format => Glimmer2/3/M/HMM argument required in the
constructor.  That said, I'm not opposed to 2) if that is what it takes to
get it into Bioperl.
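Here is a rough sketch of option 1), written in Python rather than Perl purely for illustration. The front end refuses to guess the flavour. It requires the format up front and then hands the filehandle to a per-flavour parsing function. The names (GlimmerParser, next_prediction, fmt, the lowercase format keys) are hypothetical and are not existing Bioperl API. The per-flavour parsers are placeholders that only tag lines; they do not decode real Glimmer reports.

```python
import io
from typing import Callable, Dict, Iterator, Optional, TextIO

# Hypothetical: each "parser" turns a filehandle into a stream of records.
ParseFn = Callable[[TextIO], Iterator[dict]]


def _placeholder_parser(flavour: str) -> ParseFn:
    """Build a stand-in parser for one Glimmer flavour.

    A real parser would decode that flavour's report layout; this one only
    tags each non-blank line so that the dispatch itself can be exercised.
    """
    def parse(fh: TextIO) -> Iterator[dict]:
        for line in fh:  # lazy, line by line: the file is never slurped
            line = line.rstrip("\n")
            if line.strip():
                yield {"format": flavour, "raw": line}
    return parse


# One entry per flavour, mirroring the proposed -format values.
_PARSERS: Dict[str, ParseFn] = {
    name: _placeholder_parser(name)
    for name in ("glimmer2", "glimmer3", "glimmerm", "glimmerhmm")
}


class GlimmerParser:
    """Hypothetical front end: the format must be stated, never sniffed."""

    def __init__(self, fh: TextIO, fmt: str) -> None:
        key = fmt.lower()
        if key not in _PARSERS:
            raise ValueError(
                f"unknown format {fmt!r}; expected one of {sorted(_PARSERS)}"
            )
        self.format = key
        self._records = _PARSERS[key](fh)  # generator; nothing read yet

    def next_prediction(self) -> Optional[dict]:
        """Return the next record, or None once the report is exhausted."""
        return next(self._records, None)


if __name__ == "__main__":
    parser = GlimmerParser(io.StringIO("first line\n\nsecond line\n"), fmt="Glimmer3")
    while (rec := parser.next_prediction()) is not None:
        print(rec)
```

The flavours live in a lookup table. Supporting another flavour therefore means adding one more entry, not another branch inside a shared parse loop.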

If we can achieve some sort of consensus without too much bloodshed, I'll
shoot y'all some patches and we can consider this issue checked off the
list.

On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     I think it's going to be at least two modules, one for the
> prokaryotic stuff and one for the eukaryotic.  And really, the
> prokaryotic stuff is different enough to warrant two modules, so that
> makes three different parsers.  We could do it in one, but it would be
> ugly and nasty.  However, this does not preclude three parsers behind
> one abstract interface, which is your excellent suggestion.
>     Oh, and excuse me, but I have a bit of a rant here, after dealing
> with parsers and pipelines for the last few months.  Parsers should
> not load the whole input file into RAM to parse it.  And pipelines
> using the parsers (Ensembl / biopipe) should not stuff the whole
> result set from the parser into a single array.  When you're trying to
> annotate assemblies, it sucks to have to split up contigs/supercontigs
> because the whole result set won't fit into RAM on a 12 GB blade.
> Sheesh.  This doesn't matter much for bacterial genomes, though, as
> they're tiny compared to vertebrate genomes.  There, sorry; I've been
> saving up that frustration for a while.  No offense meant, hope I
> didn't tick anybody off.  8)
>     Torsten:  You seem to know your way around Bioperl better than I
> do, and I don't have CVS access, so I'll defer to you.  I'd be happy
> to help out, though.
>
>
> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> >
> > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
> >
> > > I'm not sure whether to
> > >
> > > 1. parse them all under the same module, perhaps with a
> > > -format=>'glimmerXXX' parameter
> > >
> > > 2. create a single new module for Glimmer2 and Glimmer3
> > >
> > > 3. create two new modules, one for Glimmer2 and one for Glimmer3,
> > > given that their outputs differ both in syntax and in the number of
> > > output files
> > >
> > > Any advice from Bioperl 'old timers' appreciated ;-)
> > >
> >
> > If at all possible I'd favor 1); Bio::Tools::GFF is an example of
> > how this can work.
> >
> > Even if this amounted to basically 4 modules strung together into
> > one file (because the flavors can share little if any parsing code),
> > it'd still be advantageous to have a single frontend module that
> > would then dispatch.
> >
> >         -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
>
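The quoted point about not slurping input or result sets can be shown with an iterator-style parser. The caller pulls one prediction at a time, so memory use stays flat regardless of report size. Again, this is a Python sketch, not Bioperl code. The report layout assumed here is a simplified, hypothetical one: '>seq_id' header lines followed by 'gene_id start end' lines. The Prediction, iter_predictions and count_long_genes names are hypothetical as well.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class Prediction(NamedTuple):
    seq_id: Optional[str]
    gene_id: str
    start: int
    end: int


def iter_predictions(fh: TextIO) -> Iterator[Prediction]:
    """Yield predictions one at a time; memory does not grow with the file.

    Assumed (hypothetical) layout: '>seq_id' header lines, each followed by
    'gene_id start end ...' lines belonging to that sequence.
    """
    seq_id = None
    for line in fh:  # reads lazily, line by line
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith(">"):
            seq_id = fields[0][1:]
            continue
        yield Prediction(seq_id, fields[0], int(fields[1]), int(fields[2]))


def count_long_genes(fh: TextIO, min_len: int) -> int:
    """Consume the stream without ever building a list of all predictions."""
    # abs() because reverse-strand calls may report start > end (assumption).
    return sum(1 for p in iter_predictions(fh)
               if abs(p.end - p.start) + 1 >= min_len)


if __name__ == "__main__":
    demo = io.StringIO(">ctg1\ng1 1 300\ng2 900 400\n")
    for pred in iter_predictions(demo):
        print(pred)
```

A pipeline built on this would write each prediction out, or store it in a database, as it arrives. It would not collect every prediction into one array first.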


