Seq.pm

Georg Fuellen fuellen@dali.mathematik.uni-bielefeld.de
Tue, 17 Dec 1996 12:28:55 +0000 (GMT)


Chris:

That was a big step ahead!  I'm very grateful !!

I've modified Seq.pm a little, see
http://www.techfak.uni-bielefeld.de/bcd/Perl/BioCD/Seq.pm
Please check the diff at
http://www.techfak.uni-bielefeld.de/bcd/Perl/BioCD/diff.txt
(80% of the differences are trivial; tell me if you don't agree w/ something).
I've also changed outRaw->out_raw and outFasta -> out_fasta
in testSeq.pl.

Of the following comments, those marked ``***'' may be relevant for
Steve as well, although I believe he won't have safe email access again
before January, so let's not wait for him if we're confident in our
decisions :-)

I did a s/new Bio::Seq/Bio::Seq->new/g
because the Perl module list says 
>>>>>>>>
For method calls use either

     $foo = new Foo $arg1, $arg2; # no parentheses
     $foo = Foo->new($arg1, $arg2);

but avoid the ambiguous form

     $foo = new Foo($arg1, $arg2); # Foo() looks like function call
<<<<<<<<
It also says
>>>>>>>>
Use underscores to separate words. It is generally easier to read
$var_names_like_this than $VarNamesLikeThis, especially for non-native
speakers of English. It's also a simple rule that works consistently with
VAR_NAMES_LIKE_THIS.
Package/Module names are an exception to this rule. [...]
<<<<<<<<
so I've changed a few subroutine names, ok? Sorry that I didn't
correct this earlier!

At ``=head2 Object Manipulation'', can you move the
``Accessors [...] There are a wide variety of methods [...]'' and ``Methods
[...]'' POD here, and expand it using the examples now given separately
in each method's POD? (NOTE that seq() only reads the sequence; it does not change it!
*** Changing the sequence should be done via _seq(); not sure whether we really
*** need to enable seq() to do _safe_ changes as well; what do you think?)

In OO programming, you always want to keep the user from accessing the internal
data representation directly (so that you may change it later), so we don't want to
advertise ``$seq->{[FIELD]} = $new_value;''. Users who do this will be bitten
some day :-) Along the same lines, pls move the ``=head2 Sequence Object''
stuff into a ``Bio::Seq Guts'' POD section at the end.
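
To illustrate the accessor split (a rough sketch only: My::Seq and its
single "seq" field are made up for the example and are not the actual
Seq.pm layout), seq() only reads, while _seq() is the one place that
touches the hash:

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Seq; the field name "seq" is an assumption.
package My::Seq;

sub new {
    my ($class, %args) = @_;
    my $self = { seq => defined $args{seq} ? $args{seq} : '' };
    return bless $self, $class;
}

# Public accessor: read-only, any extra arguments are ignored.
sub seq {
    my $self = shift;
    return $self->{seq};
}

# Internal accessor: returns the sequence, and replaces it first
# when a new value is passed in.
sub _seq {
    my ($self, $new) = @_;
    $self->{seq} = $new if defined $new;
    return $self->{seq};
}

package main;

my $s = My::Seq->new(seq => 'ACGT');
$s->_seq('TTGA');        # the change goes through the accessor
print $s->seq, "\n";     # prints TTGA
```

If the internal representation changes later, only seq() and _seq()
need to be touched; code that pokes at the hash directly would break.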

In ``=head2       Extended DNA / RNA alphabet'' we need to note that
we support this alphabet everywhere except in alphabet_ok and related routines.

Calling ReadSeq in _file_read is problematic: Fasta files are
parsed twice, and it may be much slower than the Perl version
since it invokes a system call... I'd strongly suggest that
ReadSeq is called just before parse_bad is called; but you may have
a good reason not to do this ?! In other words, ReadSeq should
be the last resort, not the first.
I'd just write in the documentation that for multiple sequences in one file, the
behaviour of our module is currently unspecified.
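
The "last resort" ordering could look roughly like this. It is a sketch
under assumptions: parse_fasta, parse_via_readseq and parse_any are
hypothetical names, and the ReadSeq step is only a placeholder that
always gives up, so nothing external is actually run:

```perl
use strict;
use warnings;

# Hypothetical native parser: returns the sequence string, or nothing
# (undef in scalar context) if the text does not start with a '>' header.
sub parse_fasta {
    my ($text) = @_;
    return unless $text =~ /\A>[^\n]*\n/;
    (my $body = $text) =~ s/\A>[^\n]*\n//;   # drop the header line
    $body =~ s/\s+//g;                        # join the residue lines
    return $body;
}

# Placeholder for the external ReadSeq conversion; not implemented here.
sub parse_via_readseq {
    my ($text) = @_;
    return;
}

# Try the cheap Perl parsers first, the external program last.
sub parse_any {
    my ($text) = @_;
    for my $parser (\&parse_fasta, \&parse_via_readseq) {
        my $seq = $parser->($text);
        return $seq if defined $seq;
    }
    return;   # the caller would fall through to parse_bad at this point
}
```

With this order a Fasta file is parsed once, in Perl, and the external
program is only started for input that none of the native parsers accept.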

Can we / should we do Output via ReadSeq as well, if there's no
internal "out_$ffmt" routine ?
Anyway, out_GCG is very useful, b/c A LOT of ppl won't have readseq,
and we aim to support the most important formats directly.
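
One way the "out_$ffmt" lookup with a fallback might be wired up (a
sketch only: My::Seq, out() and _out_via_readseq() are hypothetical,
the ffmt value is assumed to match the method suffix exactly, and the
fallback here just dies instead of calling anything external):

```perl
use strict;
use warnings;

package My::Seq;   # hypothetical stand-in for Bio::Seq

sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => $args{seq}, ffmt => $args{ffmt} };
    return bless $self, $class;
}

sub out_raw   { my $self = shift; return $self->{seq} . "\n"; }
sub out_fasta { my $self = shift; return ">$self->{id}\n$self->{seq}\n"; }

# Assumed hook for an external converter; this stub only reports failure.
sub _out_via_readseq {
    my ($self, $ffmt) = @_;
    die "no internal out_$ffmt routine and no external converter wired in\n";
}

# Use the internal routine if one exists, otherwise try the fallback.
sub out {
    my $self   = shift;
    my $ffmt   = $self->{ffmt};
    my $method = $self->can("out_$ffmt");
    return $method ? $self->$method() : $self->_out_via_readseq($ffmt);
}

package main;

my $s = My::Seq->new(id => 'test1', seq => 'ACGT', ffmt => 'fasta');
print $s->out;
```

Supporting a new format directly then only means adding one more
out_... method; the dispatch code stays the same.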

We should delete ``$SeqForm{IG}       = 1;'' and the like (at least the
formats that are not supported by ReadSeq); it's legacy
from Steve's Grit::Seq and just confuses ppl; let's only mention what
we have.

I'd just turn the POD for the internal functions (starting w/ "_") into
pure comments, what do you think? Anyway, can you re-order things so that
the methods are in some logical order?

$self->{ffmt} can be set by the user AFTER the parsing, if s/he
desires a different output format. Other than that, I very much liked your 
explanations in the POD for ffmt().
*** It seems that there's no parse/output system which
*** is easier, no? Having $self->{ffmt_for_parsing} and $self->{ffmt_for_output}
*** seems too complicated to me.

*** Do you have an idea how to fix dup() ?
Maybe you can fix parse_unknown... We just need to implement a stronger
recognition regexp of Fasta files.
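
As a starting point, a stricter Fasta test could look like this. The
definition is an assumption, not something we have agreed on: exactly
one header line with '>' directly followed by an ID, then at least one
line consisting only of letters, '*' and '-'. looks_like_fasta is a
hypothetical name:

```perl
use strict;
use warnings;

my $FASTA_RE = qr{
    \A > \S [^\n]* \n                 # header with a non-blank ID
    (?: [A-Za-z*-]+ [ \t]* \r?\n )*   # zero or more complete residue lines
    [A-Za-z*-]+ [ \t]* (?: \r?\n )?   # last residue line, newline optional
    \z
}x;

# Returns the standard 1 / 0.
sub looks_like_fasta {
    my ($text) = @_;
    return $text =~ $FASTA_RE ? 1 : 0;
}
```

Because the pattern is anchored at both ends, a file with a second '>'
header does not match, which fits leaving multi-sequence files
unspecified for now.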

-re- alphabet_ok, take a look at
http://www.perl.com/perl/faq/Q5.5.html
You can simply read the alphabet into a hash, and do the checking
w/o a regexp. What do you think ? Is there a size problem ?
btw, alphabet_ok should return the standard 1 / 0.
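
A hash-based check might look like the following. This is a sketch: the
alphabet shown is the IUPAC nucleotide code set, which may not be the
exact set Seq.pm uses, and the two-argument calling convention is made
up for the example:

```perl
use strict;
use warnings;

# Assumed alphabet: IUPAC nucleotide codes (DNA plus U for RNA).
my %DNA_ALPHABET = map { $_ => 1 } split //, 'ACGTURYSWKMBDHVN';

# Returns 1 if every character of $seq is in the alphabet, else 0.
# The comparison is case-insensitive; an empty sequence counts as ok.
sub alphabet_ok {
    my ($seq, $alphabet) = @_;
    for my $char (split //, uc $seq) {
        return 0 unless $alphabet->{$char};
    }
    return 1;
}

print alphabet_ok('acgtn', \%DNA_ALPHABET), "\n";   # prints 1
print alphabet_ok('ACGJ',  \%DNA_ALPHABET), "\n";   # prints 0
```

The hash itself stays tiny (one key per symbol); the cost is in the
loop over the characters of the sequence.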

-re- revcom  you still need to check the alphabet.
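
Something along these lines, for example. It is a sketch restricted to
plain A/C/G/T (the extended alphabet would need a longer tr/// table),
written as a plain function rather than the actual method:

```perl
use strict;
use warnings;

# Reverse complement of a plain DNA string; returns nothing (undef in
# scalar context) if the string contains anything other than A, C, G, T.
sub revcom {
    my ($seq) = @_;
    return if $seq =~ /[^ACGTacgt]/;           # alphabet check comes first
    my $rc = scalar reverse $seq;              # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;              # then complement each base
    return $rc;
}

print revcom('AACG'), "\n";   # prints CGTT
```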

In the test code, $myseq->{"seq"} needs to be replaced by the accessor,
$myseq->_seq(..)

If I missed anything you want feedback about, or if I am unclear, pls 
email me !

Do you think you can take care of all this before Christmas? Would be optimal :-)
I'll be reading email until Fri/Sat, around Christmas, and again Jan 2.

Happy Holidays!!
georg

> Hi all,
> 
> Seq.pm is ready for another round of examination. I've implemented Georg's
> suggestions regarding alphabets, POD and readseq.
> 
> Seq.pm is now too bulky to be sent to the list by email, so all
> associated files can be accessed via HTTP at
> http://www.ayf.org/~c_raffi/bioperl/
> 
> ayf.org is run by a friend of mine-- I'm stuck behind a firewall at work so
> I can't serve HTTP or offer FTP access to any of my local machines.
> 
> If it is difficult to access the files via the web, I can email the code
> out to individuals. I think the plan is for Georg and Steve to go over the
> code carefully before eventually releasing it to interested beta testers.
> 
> Regards,
> Chris Dagdigian
> cdagdigian@genetics.com
> 
> 
> 
>