UnivAln 1.004 Beta
Steven E. Brenner
brenner@akamail.com
Tue, 18 Mar 1997 10:34:52 +0900 (JST)
> The problem w/ comma-separated is that according to our current
> specs, comma is a legal component of an ID; we only carp on whitespace.
> In other words, ``Mus,musculus'' is a legal ID.
> Since non-whitespace is also a legal component of filenames on many systems
> I believe, I'd like to keep the convention.
I thought IDs had to be in '\s'; if not, maybe they should be. Further,
whitespace is a legal component of filenames on most filesystems (it is
on Unix, Macintosh, and Windows, for example).
An array seems to me to be the "right" way to do this. But I thought
we were talking about numeration (rather than identifiers) anyway.
> > an array of strings is probably even better still, as that's presumably
> > what you use inside the routines that deal with these things.
>
> Arrays of integers are interpreted as index lists; since names may be
> integers as well, and Perl doesn't really distinguish integers and strings,
> how do you want to do this ?
> (Of course, the system under discussion can allow {string=>\$sting_of_names}
> as a parameter for seqs().)
I don't follow, probably because I haven't spent enough time studying
UnivAln.
> > The numbering in the code still seems pretty poorly documented/determined.
>
> Pls be more specific..
You sent an email saying that UnivAln supported arbitrary numbering
schemes. I saw no documentation (even in comments) about this anywhere.
There was lots of code passing around 'numbering' without ever
saying what it was supposed to be.
> > I agree that a hash permits many options. But that potentially
> > just indicates lack of clear thinking and good design. A tenet of OO
> > design is that you shouldn't have redundant interfaces; they raise the
> > learning curve (because there are more options to learn) and make the code
> > less efficient and more error-prone.
>
> Since ARRAY, CODE and scalar are already taken as the possible type of the
> first real parameter of seq(), HASH seems ideal.
I'm not saying that using a HASH is bad (though I would tend to argue that
this means that we should reconsider the parameters to the seq()
function). What I am saying is that allowing multiple ways of specifying
the same data via a hash is generally bad design.
> > I note that you're still using %FormUnivAln and %TypeUnivAln rather than
> > the arrays @UnivAlnType and @UnivAlnForm. These should be arrays, not
> > hashes.
>
> You mean, @UnivAlnType = ('Unknown','Dna','Rna','Amino','OtherSeq') and
> @UnivAlnForm = ('unknown','raw','fasta','nexus') ? On second thoughts,
> I must admit I fail to remember the advantages, but can clearly see
> the disadvantages; given ``fasta'', how do you find out what the corresponding
> number is ? It's my feeling that this is a costly change on which I'll spend
> hours, _or_ I just misunderstand.
The idea was that you would have

    @UnivAlnType = ('unknown','dna','rna','amino','other'); # note lower case
    foreach $i (0..$#UnivAlnType) {
        $UnivAlnType{$UnivAlnType[$i]} = $i;
    }
This way we can index from number to string with @UnivAlnType and from
string to number with %UnivAlnType.
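
A minimal sketch of that two-way lookup, written for a current Perl with
strict enabled. The helper names type_name() and type_index() are
hypothetical and only for illustration; they are not part of UnivAln. The
hash is filled with a hash slice, which is equivalent to the foreach loop.

```perl
use strict;
use warnings;

# Type names as listed in this message; their order defines the numbering.
my @UnivAlnType = ('unknown', 'dna', 'rna', 'amino', 'other');

# Reverse map, built once from the array so the two cannot disagree.
# (A hash slice; it does the same thing as the foreach loop.)
my %UnivAlnType;
@UnivAlnType{@UnivAlnType} = (0 .. $#UnivAlnType);

# Hypothetical helpers, for illustration only (not part of UnivAln).
sub type_name  { my ($num)  = @_; return $UnivAlnType[$num]; }      # number -> string
sub type_index { my ($name) = @_; return $UnivAlnType{lc $name}; }  # string -> number

print type_name(3), "\n";        # amino
print type_index('Dna'), "\n";   # 1
```

Deriving the hash from the array means a new type only has to be added in
one place.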
The problem is that you have replaced @UnivAlnType with %TypeUnivAln,
and you're using a number as the key to a hash. This is
inefficient. But worse, it can lead to problems because $foo = " 1" would
give the right result in $UnivAlnType[$foo] but not in
$TypeUnivAln{$foo}: the array subscript is converted to the number 1,
while the hash key stays the string " 1".
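
To make the " 1" case concrete, here is a small sketch. The contents of the
number-keyed hash (called %TypeUnivAln here, as in the quoted text) are
assumed for illustration, not copied from the UnivAln source.

```perl
use strict;
use warnings;

my @UnivAlnType = ('unknown', 'dna', 'rna', 'amino', 'other');

# Assumed contents: the same table, but keyed by number in a hash.
my %TypeUnivAln = (0 => 'unknown', 1 => 'dna', 2 => 'rna',
                   3 => 'amino',   4 => 'other');

my $foo = " 1";   # e.g. a number parsed with a stray leading space

my $from_array = $UnivAlnType[$foo];   # subscript is numified: " 1" becomes 1
my $from_hash  = $TypeUnivAln{$foo};   # key stays the string " 1": no such key

print "array: $from_array\n";                                       # array: dna
print "hash:  ", (defined $from_hash ? $from_hash : 'undef'), "\n"; # hash:  undef
```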
To restate:
    to go from a string to a number, use a hash;
    to go from a number to a string, use an array.
Steve