Naming the modules; Mailing lists

Fri, 21 Feb 1997 13:02:27 +0900

Dear Georg,

  I don't have time to write a long message and address all your points,
but suffice it to say that I am quite certain that the current alignment
object can _NOT_ handle many of the features needed by protein alignments
in an efficient manner.  Altering the object to add these features would
certainly require changing the internals very significantly and might also
require changes to the interface. 

  As I said before, I think that the best idea is to call it
Bio::Seq::GFAln or Bio::Seq::NucAln, to indicate its speciality for
Nucleic Acid Sequences.  I don't see any reason for objection to this --
if it turns out that this object can handle proteins and structures
perfectly well, then we could simply do the following:

1) Leave a copy of the module "as is" with the name Bio::Seq::GFAln
2) Make a copy of the module which is called Bio::Seq::Aln

  Thus, people who have used the module from the outset can happily go
on using it as they always have.  But, those who wish to take advantage of
newer developments can alter the appropriate lines in their code to
Bio::Seq::Aln.  Thus, backwards compatibility would be ensured without
compromising future development.
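
  To make the two-name scheme concrete, here is a minimal Perl sketch.
The method names (new, num_seqs) and the internal layout are invented for
illustration and are not the real module's interface.  Inheritance is used
here in place of a literal file copy, purely to keep the sketch short.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the existing alignment module, kept under
# its original name so that old code continues to work unchanged.
package Bio::Seq::GFAln;

sub new {
    my ($class, %args) = @_;
    my $self = { seqs => $args{seqs} || [] };
    return bless $self, $class;
}

sub num_seqs {
    my ($self) = @_;
    return scalar @{ $self->{seqs} };
}

# The general name, introduced later.  In this sketch it simply inherits
# everything from GFAln; it could equally be a copy of the file.
package Bio::Seq::Aln;
our @ISA = ('Bio::Seq::GFAln');

package main;

# Old code keeps using the old name; new code switches one line.
my $old = Bio::Seq::GFAln->new(seqs => ['ACGT', 'AC-T']);
my $new = Bio::Seq::Aln->new(seqs => ['ACGT', 'AC-T']);

print ref($old), " holds ", $old->num_seqs, " sequences\n";
print ref($new), " holds ", $new->num_seqs, " sequences\n";
```

  The point is only that both names stay valid at once, so neither old nor
new callers need to change anything beyond the package name they choose.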

  By contrast, if we name the module Bio::Seq::Aln right now, and we
later decide that I am right (i.e., the module needs a major overhaul to
become general), then either:

a) the 'general' module will be forced to have some obscure name and the
obsolete module will retain the name Bio::Seq::Aln
    or
b) the 'general' module will take on the name Bio::Seq::Aln, but any old
code which used Bio::Seq::Aln will have to be altered to continue to work.
This would break backwards compatibility.

  I agree that the current module can handle protein alignments in a
limited way, and may be sufficient for some uses.  That's why I proposed
the name Bio::Seq::GFAln; I thought you preferred NucAln to GFAln.  Any
other name (which does not preempt the general 'Aln') is fine with me.

> I even consider Bio::Aln to be sufficiently general to
> process alignments of numeric data, linguistic data, etc !

  Indeed, right now it can hold any sort of data -- but that is perhaps a
weakness!  This is a bioperl object and it should have features for
supporting biological sequences.  Right now, the object lacks support
for many types of operations that people would want to do on proteins.  

> I'd suggest to have Bio::Aln and Bio::ProtAln, where ProtAln has 
> special features needed for protein data, incl. speed enhancements
> where necessary. See also my post from Monday, in particular,

As noted above, I suggest Bio::Seq::NucAln and Bio::Seq::ProtAln, and
-- if possible -- that the two be merged into Bio::Seq::Aln at some time in
the future.  I don't see why you want to preempt future improvements.

> I'm really not sure. Do you mean that storing the alignment as an
> array of array references is already problematic ?

Storing it as the raw sequence (without clear reference to the original
sequence and attachments) is the problem.  Suffice it to say that to get
things to work for many protein operations, there would need to be
relators for virtually every operation.  This would be complicated and
inefficient. 
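
As a rough illustration of the bookkeeping involved, the Perl sketch below
maps an alignment column back to a residue number in the original
(ungapped) sequence, which is what any annotation attached to that sequence
would be keyed on.  The function name column_to_residue, the '-' gap
character, and the example annotation are all assumptions made for this
sketch; they are not taken from the existing module.

```perl
use strict;
use warnings;

# Map a 1-based alignment column to the 1-based residue number in the
# ungapped sequence.  Returns undef for a gap or an out-of-range column.
sub column_to_residue {
    my ($gapped, $col) = @_;
    return undef if $col < 1 || $col > length($gapped);
    return undef if substr($gapped, $col - 1, 1) eq '-';
    my $prefix = substr($gapped, 0, $col);
    # tr with the /c flag and an empty replacement counts the
    # characters that are NOT gaps, without modifying the string.
    return ($prefix =~ tr/-//c);
}

my $gapped = 'MK--LV';
my %note   = (3 => 'active site');   # hypothetical annotation, keyed by residue

for my $col (1 .. length($gapped)) {
    my $res = column_to_residue($gapped, $col);
    if (defined $res) {
        my $label = exists $note{$res} ? " ($note{$res})" : '';
        print "column $col -> residue $res$label\n";
    } else {
        print "column $col -> gap\n";
    }
}
```

When only the gapped string is stored, a scan like this has to be repeated
(or cached and kept in sync) for every operation that touches the original
numbering.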

As I said before (and at risk of being highly repetitive), I am not
certain that it is possible to efficiently meet "your" requirements and
"my" requirements.  Each has fundamental assumptions that are quite
different.  My plan, therefore, is that NucAln (or GFAln) and ProtAln (or
AnonAln) develop independently.  Hopefully, each will contain all features
for both proteins and Nucleic Acids, but internally they will be optimized
completely differently.  Eventually, we may recognize how to efficiently
take care of all tasks and then produce the Aln module. 

--

I think that anything relying on PerlDL should be a 'special feature' and
not built into the core of the "basic" alignment module.  This is because
PerlDL is non-trivial to install, which will prevent its use by a large
fraction of potential bioperl users.  That said, I would of course
heartily endorse development of modules using PerlDL if that does prove to
be more efficient and effective.
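
One way to keep PerlDL optional is to probe for it at run time and always
ship a pure-Perl code path.  The sketch below assumes the library is
loadable as the PDL module; the function gap_counts and the $HAVE_PDL flag
are names invented here, and no PerlDL routines are actually called.

```perl
use strict;
use warnings;

# Try to load PerlDL without making it a hard requirement.  The eval
# traps the failure if the module is not installed.
my $HAVE_PDL = eval { require PDL; 1 } ? 1 : 0;

# Pure-Perl code path that works everywhere: count the gap characters
# in each column of a set of equal-length alignment rows.
sub gap_counts {
    my @rows = @_;
    my @counts;
    for my $row (@rows) {
        my @chars = split //, $row;
        for my $i (0 .. $#chars) {
            $counts[$i] ||= 0;
            $counts[$i]++ if $chars[$i] eq '-';
        }
    }
    return @counts;
}

print "PerlDL available: ", ($HAVE_PDL ? "yes" : "no"), "\n";
print "gaps per column: ", join(' ', gap_counts('AC-T', 'A--T')), "\n";
```

An add-on module could consult a flag like $HAVE_PDL and substitute a
faster routine when the library is present, while the basic module stays
installable everywhere.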

> > > > * vsns-bcd-perl-guts      like vsns-bcd-perl, incl. subscribers
> > > > * vsns-bcd-perl-talk      like vsns-bcd-perl, incl. subscribers
> > 
> > "including subscribers"  What's the difference between -guts and -perl
> 
> These lists are indeed ``for later''. E.g. -guts could be used for topics 
> that are too boring for the more general list vsns-bcd-perl, once 
> vsns-bcd-perl has many subscribers. This could e.g. be a special topic
> like using Perl Data Language stuff, or GUI development (note that my
> belief is that we don't need a GUI; WWW/CGI can do the job.) (They've just 
> split the PerlDL list in a similar way). (Let's not worry about this now.:)

Ok.  But perhaps the vsns-bcd-perl list should be renamed to
vsns-bcd-perl-dev?