New Bio::Seq and Bio::Seq::Parse (.025 BETA)

Georg Fuellen fuellen@dali.Mathematik.Uni-Bielefeld.DE
Mon, 17 Mar 1997 15:04:40 +0000 (GMT)


Hi,

> Hi Chris,
> 
> > The location is: http://www.ayf.org/~c_raffi/bioperl/top.html
> 
>   Nifty logo!  Only, I'm confused about what the code is supposed [sic] to
> be doing.  (Also, the name of the project is 'bioperl', not 'bio::perl'.
> That would imply a bio/perl.pm module, whereas bioperl doesn't imply
> anything at all :) 

:-> For me, the logo somehow reinforces the idea that "Perl is obfuscated" -
that's not the message we'd like to get across, I think?

I'd suggest keeping the logo on Dag's page and postponing the issue;
finalizing a logo costs time that is better spent on the code right
now - in April, I hope things will be different :-)

> > I wrote a crude Parse.pm that serves as an interface to ReadSeq and made
> > the appropriate changes to Seq.pm.
> 
>   Great!
> 
>   Had a quick look at it; it seems quite reasonable and the changes in
> Seq.pm are also appropriate.  One comment is that it would be much more
> efficient to pass around references than potentially huge strings.
> 
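To illustrate the remark about references, here is a minimal sketch. It is
not taken from Seq.pm or Parse.pm; the sub names count_gc_copy and
count_gc_ref are hypothetical and exist only for this example. The first sub
receives the sequence itself and copies it into a lexical; the second
receives a reference, so only the reference is copied.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Bio::Seq: both count G and C characters.

# Takes the sequence itself; "my ($seq) = @_" copies the whole string.
sub count_gc_copy {
    my ($seq) = @_;
    return ($seq =~ tr/GCgc//);
}

# Takes a reference; only the reference is copied, not the sequence.
sub count_gc_ref {
    my ($seq_ref) = @_;
    return ($$seq_ref =~ tr/GCgc//);
}

my $seq = 'ACGT' x 250_000;    # a toy sequence of one million characters

print count_gc_copy($seq),  "\n";
print count_gc_ref(\$seq), "\n";
```

Both calls give the same count; the difference is only in how much data is
copied on each call, which matters once sequences get large.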
>   However, these modifications don't deal with the bigger issue of what
> to do about the strings v. files problem, that I mention in the 5th
> paragraph of:
> 
> http://www.hrz.uni-bielefeld.de/mailinglists/BCD/vsns-bcd-perl/9702/0003.html
> 
>   Is the parse function in Bio::Seq supposed to take 1 or 2 parameters (as
> documented) or 4 params (as coded)?  The problem arises because of some of
> my inefficient legacy design at the very outset, but I think there's a
> solution. 
> 
> -=-
> 
>   A few other nits from a _very_ cursory look-through
> 
> @SeqForm appears never to be created
> 
> I would change [@%]SeqForm to [@%]SeqFmt, or even [@%]seq_fmt (to be
> consistent with the rest of the naming). 

I think then we should have seq_ffmt.
Then again, doesn't SeqForm hint at the fact that these variables are 
very special ?

> The names of formats in SeqForm, etc., should be all lower-case for the
> reasons discussed earlier on this list.  (Because is it FastA or fasta or
> Fasta?  GenBank, Genbank, or genbank?  If it is always lowercase, there's
> no ambiguity.)
> 
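A small sketch of the lower-case convention, under stated assumptions: the
hash %seq_fmt and the sub known_format are hypothetical stand-ins, not the
real SeqForm variables, and the three format names are just examples. The
idea is to store keys in lower case only and fold whatever spelling the
caller uses with lc before the lookup.

```perl
use strict;
use warnings;

# Hypothetical table keyed by lower-case format names; not the real %SeqForm.
my %seq_fmt = (fasta => 1, genbank => 1, embl => 1);

# Fold the caller's spelling to lower case before the lookup.
sub known_format {
    my ($name) = @_;
    return exists $seq_fmt{ lc $name } ? 1 : 0;
}

for my $name (qw(FastA fasta GenBank Clustal)) {
    printf "%-8s %s\n", $name, known_format($name) ? 'known' : 'unknown';
}
```

With the keys stored in one case only, FastA, fasta and Fasta all resolve to
the same entry.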
> There's no 'valid' field to indicate whether or not the object is indeed
> valid for any operation.  For example, if setseq is used to set an invalid
> sequence.  

What if we don't allow this to happen ?
If we keep the object valid all the time ?
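One way to picture "keeping the object valid all the time" is a setter that
refuses bad input and leaves the stored sequence untouched. The sketch below
is only an illustration: the package Toy::Seq is a hypothetical stand-in for
Bio::Seq, and the accepted alphabet (letters, "*" and "-") is an assumption
made for the example.

```perl
use strict;
use warnings;

package Toy::Seq;    # hypothetical stand-in, not the real Bio::Seq

sub new {
    my ($class, $seq) = @_;
    my $self = bless { seq => '' }, $class;    # starts out empty but valid
    $self->setseq($seq) if defined $seq;
    return $self;
}

# Accept the new sequence only if it passes the check (assumed alphabet);
# otherwise keep the old value and return undef.
sub setseq {
    my ($self, $seq) = @_;
    return undef unless defined $seq && $seq =~ /^[A-Za-z*-]+$/;
    $self->{seq} = $seq;
    return 1;
}

sub seq { return $_[0]{seq} }

package main;

my $obj = Toy::Seq->new('ACGT');
my $ok  = $obj->setseq('AC GT 123');    # rejected: blanks and digits

print defined($ok) ? "accepted\n" : "rejected\n";
print $obj->seq, "\n";                  # still the last valid sequence
```

With this design no separate 'valid' field is needed, at the price of the
caller having to check the return value of setseq.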

> to-do: more validity checking, such as in setseq
> 
> A "_undef" parameter (or something like it) needs to be available to unset
> various options
> 
> Functions which can return an invalid result (such as parse_bad) should
> return undef rather

You mean, rather than 0 ? I thought zero and the null string ("") 
are interpreted as false, and returning 0 or "" seems the standard 
convention, no ?
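Both views can be illustrated in a few lines. In Perl, 0, "" and undef are
all false in a truth test, but only undef fails a defined() test, so undef
is useful when 0 or "" can be a legitimate result. The sub motif_offset
below is hypothetical and exists only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based offset of $motif in $seq, or undef if absent.
sub motif_offset {
    my ($seq, $motif) = @_;
    my $pos = index($seq, $motif);
    return $pos >= 0 ? $pos : undef;
}

my $hit  = motif_offset('ATGCCC', 'ATG');    # offset 0: a real hit, yet false
my $miss = motif_offset('ATGCCC', 'TTT');    # undef: no hit

print "a plain truth test overlooks the hit at offset 0\n" unless $hit;
print "defined() sees the hit\n"                           if defined $hit;
print "defined() sees the miss\n"                          unless defined $miss;
```

When a function can never legitimately return 0 or "", returning one of them
for failure works just as well in a simple "if" test.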

best wishes,
georg
