UnivAln 1.004 Beta

Georg Fuellen fuellen@dali.Mathematik.Uni-Bielefeld.DE
Mon, 17 Mar 1997 15:57:49 +0000 (GMT)


Steve wrote,
> ... gave the code an extremely quick look-thru.  Not many comments beyond
> those previously noted.
> 
> 
> > Note that if you accept a reference to a hash as the parameter,
> > the key makes it possible to support a lot of possibilities in parallel,
> > like ``{i=>'...'}'' for the final solution (whatever it is), or even
> > ``{descs=>'...'}'' if you wanna support retrieval of sequences for which
> > the description matches a certain pattern. For ids which have no whitespace,
> > space-delimited strings look optimal. 
> 
> I tend to think that comma-separated is probably better.  A reference to

The problem with comma-separated strings is that, according to our current
specs, a comma is a legal component of an ID; we only carp on whitespace.
In other words, ``Mus,musculus'' is a legal ID.
Since non-whitespace characters are also legal in filenames on many systems,
I believe, I'd like to keep the convention.
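
A minimal sketch of the point, assuming a made-up ID string (the variable
names are illustrative only and are not taken from UnivAln): splitting on
whitespace keeps ``Mus,musculus'' intact, while splitting on commas cuts it
in two.

```perl
use strict;
use warnings;

# Hypothetical ID string; ``Mus,musculus'' is a legal ID under the current specs.
my $ids = "Mus,musculus Homo_sapiens  Rattus";

# split ' ' is Perl's special whitespace mode: it skips leading whitespace
# and treats any run of whitespace as a single delimiter.
my @by_space = split ' ', $ids;

# Splitting on commas breaks the first ID apart.
my @by_comma = split /,/, $ids;

print scalar(@by_space), " IDs by whitespace: @by_space\n";
print scalar(@by_comma), " pieces by comma\n";
```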

> an array of strings is probably even better still, as that's presumably
> what you use inside the routines that deal with these things.

Arrays of integers are interpreted as index lists; since names may be
integers as well, and Perl doesn't really distinguish integers from strings,
how do you want to do this?
(Of course, the system under discussion can allow {string=>\$string_of_names}
as a parameter for seqs().)
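
To make the ambiguity concrete, here is a small sketch with two hypothetical
argument lists, one meant as indices and one meant as names. Nothing here is
UnivAln code; it only shows that the callee cannot tell the two apart by
looking at the values.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Two hypothetical selectors: one meant as indices, one meant as names.
my @indices = (1, 2);
my @names   = ('1', '2');

# The elements compare equal both numerically and as strings ...
my $same = ($indices[0] == $names[0]) && ($indices[0] eq $names[0]);

# ... and both look like numbers, so the intent cannot be recovered.
my $ambiguous = looks_like_number($indices[0]) && looks_like_number($names[0]);

print "same value: ", ($same ? "yes" : "no"), "\n";
print "both numeric-looking: ", ($ambiguous ? "yes" : "no"), "\n";
```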

> The numbering in the code still seems pretty poorly documented/determined.

Please be more specific.

> I agree that a hash permits many options.  But that potentially
> just indicates lack of clear thinking and good design.  A tenet of OO
> design is that you shouldn't have redundant interfaces; they raise the
> learning curve (because there are more options to learn) and make the code
> less efficient and more error-prone.

Since ARRAY, CODE and scalar are already taken as the possible types of the
first real parameter of seq(), HASH seems ideal.
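
As a rough sketch of what such dispatching could look like: the function
selector_kind() below is hypothetical and is not the actual seq()
implementation. It only assumes that the reference type of the first
parameter decides how that parameter is interpreted, with the hash key
naming the lookup mode (``descs'' is the key mentioned earlier in this thread).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the selector logic; NOT the actual UnivAln code.
# Returns a short description of how the first parameter would be interpreted.
sub selector_kind {
    my ($sel) = @_;
    my $type = ref $sel;                 # empty string for a plain scalar
    return 'index list'      if $type eq 'ARRAY';
    return 'filter function' if $type eq 'CODE';
    if ($type eq 'HASH') {
        return 'empty hash' unless %$sel;
        # The key names the lookup mode, e.g. descs for description patterns.
        my ($mode) = sort keys %$sel;
        return "keyed lookup by '$mode'";
    }
    return 'plain scalar';
}

print selector_kind([0, 2]), "\n";
print selector_kind(sub { 1 }), "\n";
print selector_kind({ descs => 'musculus' }), "\n";
print selector_kind(3), "\n";
```

With the hash form, the key states the intent explicitly, so an integer-like
name is never confused with an index.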

> I note that you're still using %FormUnivAln and %TypeUnivAln rather than
> the arrays @UnivAlnType and @UnivAlnForm.  These should be arrays, not
> hashes.

You mean @UnivAlnType = ('Unknown','Dna','Rna','Amino','OtherSeq') and
@UnivAlnForm = ('unknown','raw','fasta','nexus')? On second thought,
I must admit I can't remember the advantages, but I can clearly see
the disadvantages: given ``fasta'', how do you find out what the corresponding
number is? My feeling is that this is a costly change on which I'd spend
hours, _or_ I just misunderstand.
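
For what it's worth, a sketch of how the reverse lookup could be done with
the array form. It assumes that the number is simply the position in the
list, starting at 0 (the real numbering is not settled here), and
%form_number is a made-up name.

```perl
use strict;
use warnings;

# Array form, as suggested: the number is the position in the list (assumed).
my @UnivAlnForm = ('unknown', 'raw', 'fasta', 'nexus');

# Build the name => number map once with a hash slice, so that ``fasta''
# can be looked up directly instead of scanning the array each time.
my %form_number;
@form_number{@UnivAlnForm} = (0 .. $#UnivAlnForm);

print "fasta is number $form_number{fasta}\n";
print "number 3 is $UnivAlnForm[3]\n";
```

Note that this still needs a hash for the name-to-number direction, so the
array buys only the number-to-name direction.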

(I've been waiting for Chris to go ahead on this since I'm happy with the 
current setup.)

best wishes,
georg