UnivAln 1.004 Beta
Georg Fuellen
fuellen@dali.Mathematik.Uni-Bielefeld.DE
Mon, 17 Mar 1997 15:57:49 +0000 (GMT)
Steve wrote,
> ... gave the code an extremely quick look-thru. Not many comments beyond
> those previously noted.
>
>
> > Note that if you accept a reference to a hash as the parameter,
> > the key makes it possible to support a lot of possibilities in parallel,
> > like ``{i=>'...'}'' for the final solution (whatever it is), or even
> > ``{descs=>'...'}'' if you wanna support retrieval of sequences for which
> > the description matches a certain pattern. For ids which have no whitespace,
> > space-delimited strings look optimal.
>
> I tend to think that comma-separated is probably better. A reference to
The problem with comma-separated is that, according to our current
specs, a comma is a legal component of an ID; we only carp on whitespace.
In other words, ``Mus,musculus'' is a legal ID.
Since, I believe, any non-whitespace character is also legal in filenames
on many systems, I'd like to keep the convention.
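
To make the point concrete, here is a minimal sketch. The two list strings
and all variable names are made up for illustration; nothing here is taken
from the UnivAln code itself.

```perl
use strict;
use warnings;

# Two hypothetical ID lists, each meant to hold the same two IDs.
# 'Mus,musculus' is a single legal ID, since only whitespace is forbidden.
my $space_list = 'Mus,musculus Homo_sapiens';
my $comma_list = 'Mus,musculus,Homo_sapiens';

# Splitting on whitespace keeps each ID intact.
my @by_space = split ' ', $space_list;

# Splitting on commas cuts the first ID in two.
my @by_comma = split /,/, $comma_list;

print scalar(@by_space), " IDs from the space-delimited list\n";
print scalar(@by_comma), " pieces from the comma-separated list\n";
```

With commas there is no way to tell the separator from a comma inside an ID,
unless some quoting scheme is added on top.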
> an array of strings is probably even better still, as that's presumably
> what you use inside the routines that deal with these things.
Arrays of integers are interpreted as index lists; since names may be
integers as well, and Perl doesn't really distinguish integers and strings,
how do you want to do this?
(Of course, the system under discussion can allow {string=>\$string_of_names}
as a parameter for seqs().)
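
A small sketch of the ambiguity, with made-up variable names: a routine that
receives an array reference cannot reliably tell whether the caller meant
indices or names.

```perl
use strict;
use warnings;

my $as_number = 3;     # written as an integer
my $as_string = '3';   # written as a string

# Both comparisons succeed: the scalar converts whichever way is asked for.
my $same_number = ($as_number == $as_string) ? 1 : 0;
my $same_string = ($as_number eq $as_string) ? 1 : 0;

# The container gives no hint either: both are plain ARRAY references.
my $kind_a = ref [1, 2, 3];
my $kind_b = ref ['1', '2', '3'];

print "numeric equal: $same_number, string equal: $same_string\n";
print "ref types: $kind_a and $kind_b\n";
```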
> The numbering in the code still seems pretty poorly documented/determined.
Please be more specific.
> I agree that a hash permits many options. But that potentially
> just indicates lack of clear thinking and good design. A tenet of OO
> design is that you shouldn't have redundant interfaces; they raise the
> learning curve (because there are more options to learn) and make the code
> less efficient and more error-prone.
Since ARRAY, CODE and scalar are already taken as the possible types of the
first real parameter of seqs(), HASH seems ideal.
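
Roughly what I have in mind, as a sketch only. select_rows() is a
hypothetical stand-in, not the actual seqs() code; the key ``ids'', the
0-based indices, and the reading of a CODE reference as a predicate on the
name are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical dispatcher: return the row indices of @$names that the
# selector $sel picks, deciding by the reference type of $sel.
sub select_rows {
    my ($names, $sel) = @_;
    my $type = ref $sel;
    if ($type eq 'ARRAY') {          # index list (0-based here, an assumption)
        return @$sel;
    }
    if ($type eq 'CODE') {           # assumed: predicate called with each name
        return grep { $sel->($names->[$_]) } 0 .. $#$names;
    }
    if ($type eq 'HASH') {           # the key says how to read the value
        if (exists $sel->{ids}) {    # made-up key: space-delimited names
            my %pos;
            $pos{ $names->[$_] } = $_ for 0 .. $#$names;
            return map { $pos{$_} } grep { exists $pos{$_} }
                   split ' ', $sel->{ids};
        }
        die "unknown selector key\n";
    }
    die "plain scalars are left out of this sketch\n";
}

my @names    = ('7', 'Mus,musculus', '0');   # some names are integers
my @by_index = select_rows(\@names, [0, 2]);
my @by_name  = select_rows(\@names, { ids => '0 7' });
my @by_code  = select_rows(\@names, sub { $_[0] =~ /,/ });
```

The array [0, 2] is read as indices, while { ids => '0 7' } is read as names,
so integer-looking names no longer collide with index lists.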
> I note that you're still using %FormUnivAln and %TypeUnivAln rather than
> the arrays @UnivAlnType and @UnivAlnForm. These should be arrays, not
> hashes.
You mean, @UnivAlnType = ('Unknown','Dna','Rna','Amino','OtherSeq') and
@UnivAlnForm = ('unknown','raw','fasta','nexus')? On second thought,
I must admit I fail to remember the advantages, but can clearly see
the disadvantages; given ``fasta'', how do you find out what the corresponding
number is? It's my feeling that this is a costly change on which I'll spend
hours, _or_ I just misunderstand.
(I've been waiting for Chris to go ahead on this since I'm happy with the
current setup.)
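
To spell out the lookup question, a sketch using the array form quoted above.
%FormNumber is a made-up name for a derived reverse table; it is not one of
the existing hashes.

```perl
use strict;
use warnings;

my @UnivAlnForm = ('unknown', 'raw', 'fasta', 'nexus');

# Number -> name is a plain array access.
my $name = $UnivAlnForm[2];

# Name -> number needs a scan over the array ...
my ($scanned) = grep { $UnivAlnForm[$_] eq 'fasta' } 0 .. $#UnivAlnForm;

# ... or a reverse hash built once from the array.
my %FormNumber;
$FormNumber{ $UnivAlnForm[$_] } = $_ for 0 .. $#UnivAlnForm;
my $looked_up = $FormNumber{fasta};

print "name for 2: $name\n";
print "number for fasta: $scanned (scan), $looked_up (hash)\n";
```

So with arrays alone the name-to-number direction costs a scan each time,
and avoiding that brings a hash back in anyway.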
>
best wishes,
georg