Steve's notes on Seq.pm
Chris Dagdigian
cdagdigian@genetics.com
Tue, 4 Mar 1997 11:15:47 -0400
Enclosed is what I got out of reading Steve's written comments on Seq.pm.
I am *really* swamped at work now (even weekends) and probably will not get
much chance to do any work on Seq.pm until Sunday. Perhaps we can kick
these notes around and come up with a "MUST DO NOW" list of essentials that
need to be fixed prior to the next version?
Personally, I feel that moving all ReadSeq stuff into a ReadSeq.pm would be
worth the effort at this point.
Regards,
Chris Dagdigian
cdagdigian@genetics.com
----------------------------------------------------------------------------
Steve's notes on Seq.pm and Seq.pod-docs.html
(transcribed by chris dagdigian - errors are mine :)
----------------------------------------------
General Observations:
o Users should not be expected to edit .pm files
(we need a configuration process)
o Separate out ReadSeq functions into ReadSeq.pm
(would clean up/simplify the internals of Seq.pm)
o Might be worth looking at Sean Eddy's SQUID
library for parsing ideas
o Perhaps we need another option for sequence type
that deals with alphabet stringency levels
    1 Strict     1 Gap
    2 Ambig      2 No-Gap      (2,1) Set as default?
o Does layout() need an option that would specify
output to a file?
o translate() should be able to use different
translation tables
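
A rough sketch of how the two stringency settings above might be checked.
Nothing here is in Seq.pm today: the function name alphabet_ok(), the
%ALPHABET table, and the exact character sets are assumptions made only
for illustration, using the IUPAC nucleotide codes for the "Ambig" level
and "." and "-" as gap characters.

```perl
use strict;
use warnings;

# Hypothetical nucleotide alphabets; the real character sets are still
# an open question in the notes above.
my %ALPHABET = (
    1 => 'ACGT',                # 1 = Strict
    2 => 'ACGTURYKMSWBDHVN',    # 2 = Ambig (IUPAC nucleotide codes)
);

# Returns 1 if $seq fits the requested stringency, 0 otherwise.
#   $level: 1 = Strict, 2 = Ambig
#   $gap:   1 = Gap characters allowed, 2 = No-Gap
sub alphabet_ok {
    my ($seq, $level, $gap) = @_;
    $level = 2 unless defined $level;    # (2,1) as the proposed default
    $gap   = 1 unless defined $gap;

    my $chars = $ALPHABET{$level};
    return 0 unless defined $chars;      # unknown stringency level

    # A trailing "-" inside a character class is taken literally.
    $chars .= '.-' if $gap == 1;

    return $seq =~ /^[$chars]*$/i ? 1 : 0;
}
```

With the (2,1) default, alphabet_ok('ACGT-N') passes, while the same
string fails under (1,1) because N is not in the strict alphabet.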
General To-Do Comments
----------------------
1. Need a validity marker that is set to 'false'
when the program Carps() out
2. rename %SeqType to %SeqAlph
3. Settle the "_nofile" or "undef" question
in the constructor method
4. Constructor default ID field should be changed
from "No_Id_Given" to "No_Id" plus a unique number
5. Constructor should attempt to guess alphabet (and origin?)
before setting a default "not_given" setting
6. _file_read() is inefficient; just slurp the file in with
a giant READ
7. Resolve this conflict: names() currently will ADD the values
of a passed-in hash reference to the existing %names hash.
However, the POD docs state that the method SETS (overwrites)
the hash values. Which should it do?
8. Remove POD discussion about internal stuff
in out_GCG()
9. type() should verify that the given type is valid for
the sequence (in addition to being a supported type)
10. Clear up the POD in parse()
11. Get rid of call to `date` in out_GCG()
(Actual Steve quote--> "Yuk!" :)
12. revcom() needs to check that the sequence is really
nucleotide before applying the regular expression
**13. DNA_to_RNA() should return a Bio::Seq object, not a
string or array
**14. translate() should return a Bio::Seq object, not a
string or array
15. translate() treats ambiguity inconsistently.
16. Need to write a "getseq()" and "setseq()" so that
user calls to the "internal" _seq() can be avoided
17. rename seq_length() to seq_len()
18. rename ary() and str() to seq_ary() and seq_str()
19. write "RNA_to_DNA()" to complement existing
DNA_to_RNA()
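
To make items 12 and 19 concrete, here is a minimal sketch working on
plain strings. It is not the current Seq.pm code: looks_like_nucleotide()
and revcom_checked() are hypothetical names, the complement is done with
tr/// rather than the existing regular expression, and per items 13 and
14 a real version would presumably return a Bio::Seq object instead of
a string.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper: true if the string holds only IUPAC nucleotide
# codes or gap characters.
sub looks_like_nucleotide {
    my ($seq) = @_;
    return $seq =~ /^[ACGTURYKMSWBDHVN.-]+$/i ? 1 : 0;
}

# Reverse complement that refuses non-nucleotide input (item 12).
sub revcom_checked {
    my ($seq) = @_;
    croak "revcom: sequence is not nucleotide"
        unless looks_like_nucleotide($seq);

    my $rev = scalar reverse $seq;
    # S, W and N are their own complements, so they are left untouched.
    $rev =~ tr/ACGTURYKMBDHVacgturykmbdhv/TGCAAYRMKVHDBtgcaayrmkvhdb/;
    return $rev;
}

# Counterpart to DNA_to_RNA() (item 19): replace U with T.
sub RNA_to_DNA {
    my ($seq) = @_;
    (my $dna = $seq) =~ tr/Uu/Tt/;
    return $dna;
}
```

Note that a character check alone cannot prove a sequence is nucleotide:
a short peptide made up only of letters that are also IUPAC codes (for
example "MKV") would still get through, so an explicit type may still
be needed.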