Steve's notes on Seq.pm

Tue, 4 Mar 1997 11:15:47 -0400

Enclosed is what I got out of reading Steve's written comments on Seq.pm.

I am *really* swamped at work now (even weekends) and probably will not get
much chance to do any work on Seq.pm until Sunday. Perhaps we can kick
these notes around and come up with a "MUST DO NOW" list of essentials that
need to be fixed prior to the next version?

Personally, I feel that moving all ReadSeq stuff into a ReadSeq.pm would be
worth the effort at this point.

Regards,
Chris Dagdigian
cdagdigian@genetics.com

----------------------------------------------------------------------------

  Steve's notes on Seq.pm and Seq.pod-docs.html
(transcribed by Chris Dagdigian - errors are mine :)
----------------------------------------------

General Observations:
o Users should not be expected to edit .pm files
  (we need a configuration process)

o Separate out ReadSeq functions into ReadSeq.pm
  (would clear up/simplify the internals of Seq.pm)

o Might be worth looking at Sean Eddy's SQUID
  library for parsing ideas

o Perhaps we need another option for sequence type
  that deals with alphabet stringency levels
    1 Strict    1 Gap
    2 Ambig     2 No-Gap
  Set (2,1), i.e. Ambig with Gap, as the default?

o Does layout() need an option that would specify
  output to a file?

o translate() should be able to use different
  translation tables
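
One possible shape for the translation-table option, as a rough sketch
only. It assumes tables are picked by number following the NCBI
genetic-code numbering (1 = standard, 2 = vertebrate mitochondrial) and
works on plain strings. The names %CODE_STRING, codon_table() and
translate_dna() are made up for illustration and are not part of Seq.pm;
ambiguous or incomplete codons are simply rendered as 'X' here.

```perl
use strict;

# Amino acids for all 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...),
# one 64-character string per NCBI table number. Hypothetical storage format.
my %CODE_STRING = (
    1 => 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
    2 => 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG',
);

# Build a codon => amino acid hash for the requested table number.
sub codon_table {
    my ($id) = @_;
    my $aa = $CODE_STRING{$id} or die "Unknown translation table: $id\n";
    die "Table $id is not 64 symbols long\n" unless length($aa) == 64;
    my @bases = qw(T C A G);
    my %table;
    my $i = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $table{"$b1$b2$b3"} = substr($aa, $i++, 1);
            }
        }
    }
    return \%table;
}

# Translate a nucleotide string in frame 1; table 1 is the default.
sub translate_dna {
    my ($dna, $id) = @_;
    $id = 1 unless defined $id;
    my $table = codon_table($id);
    $dna = uc $dna;
    $dna =~ tr/U/T/;                      # accept RNA input as well
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $dna; $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        # Codons not in the table (ambiguity codes, gaps) become 'X'.
        $protein .= exists $table->{$codon} ? $table->{$codon} : 'X';
    }
    return $protein;
}

print translate_dna('ATGTGAAGA'),    "\n";   # M*R  (standard code)
print translate_dna('ATGTGAAGA', 2), "\n";   # MW*  (vertebrate mitochondrial)
```

Keeping each table as one 64-character string makes adding a further
table a one-line change; whether Seq.pm should select tables by number
or by name is still an open question.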

General To-Do Comments
----------------------

1. Need a validity marker that is set to 'false'
   when the program carp()s out

2. Rename %SeqType to %SeqAlph

3. Settle the "_nofile" or "undef" question
   in the constructor method

4. Constructor default ID field should be changed
   from "No_Id_Given" to "No_Id" plus a unique number

5. Constructor should attempt to guess alphabet (and origin?)
   before setting a default "not_given" setting

6. _file_read() is inefficient; just slurp the file in with
   a single large read()

7. Resolve this conflict: names() currently will ADD the values
   of a passed-in hash reference to the existing %names hash.
   However, the POD docs state that the method SETS (overwrites)
   the hash values. Which should it do?

8. Remove POD discussion about internal stuff
   in out_GCG()

9. type() should verify that the given type is valid for
   the sequence (in addition to being a supported type)

10. Clear up the POD in parse()

11. Get rid of call to `date` in out_GCG()
    (Actual Steve quote--> "Yuk!" :)

12. revcom() needs to check that the sequence is really
    nucleotide before applying the regular expression

**13. DNA_to_RNA() should return a Bio::Seq object, not a
      string or array

**14. translate() should return a Bio::Seq object, not a
      string or array

15. translate() treats ambiguity inconsistently.

16. Need to write a "getseq()" and "setseq()" so that
    user calls to the "internal" _seq() can be avoided

17. Rename seq_length() to seq_len()

18. Rename ary() and str() to seq_ary() and seq_str()

19. Write "RNA_to_DNA()" to complement existing
    DNA_to_RNA()
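
Items 9, 12, 13 and 19 all hinge on knowing which alphabet a sequence
really uses. Below is a rough sketch of such a check, working on plain
strings rather than Bio::Seq objects so that it stays self-contained
(wrapping the results in objects is left out). The names %ALPHABET,
is_valid_for(), revcom_checked(), dna_to_rna() and rna_to_dna() are
hypothetical, and the two switches mirror the Strict/Ambig and
Gap/No-Gap stringency idea from the general observations.

```perl
use strict;

# Allowed symbols per alphabet: the strict set plus IUPAC ambiguity codes.
my %ALPHABET = (
    dna => { strict => 'ACGT', ambig => 'RYSWKMBDHVN' },
    rna => { strict => 'ACGU', ambig => 'RYSWKMBDHVN' },
);

# Return 1 if every symbol of $seq belongs to the alphabet for $type.
# $allow_ambig and $allow_gap are the two stringency switches.
sub is_valid_for {
    my ($seq, $type, $allow_ambig, $allow_gap) = @_;
    my $alph = $ALPHABET{$type} or return 0;     # unsupported type
    my $chars = $alph->{strict};
    $chars .= $alph->{ambig} if $allow_ambig;
    $chars .= '.-'           if $allow_gap;      # gap symbols
    return uc($seq) =~ /^[\Q$chars\E]+$/ ? 1 : 0;
}

# Reverse complement, refusing anything that is not DNA (item 12).
sub revcom_checked {
    my ($seq) = @_;
    is_valid_for($seq, 'dna', 1, 1)
        or die "revcom: sequence is not DNA\n";
    my $rc = scalar reverse $seq;
    # Complement bases and ambiguity codes; S, W, N and gaps map to themselves.
    $rc =~ tr/ACGTRYKMBDHVacgtrykmbdhv/TGCAYRMKVHDBtgcayrmkvhdb/;
    return $rc;
}

# DNA -> RNA and the matching reverse conversion (item 19).
sub dna_to_rna {
    my ($seq) = @_;
    is_valid_for($seq, 'dna', 1, 1) or die "dna_to_rna: not DNA\n";
    $seq =~ tr/Tt/Uu/;
    return $seq;
}

sub rna_to_dna {
    my ($seq) = @_;
    is_valid_for($seq, 'rna', 1, 1) or die "rna_to_dna: not RNA\n";
    $seq =~ tr/Uu/Tt/;
    return $seq;
}

print revcom_checked('ATGC'), "\n";              # GCAT
print dna_to_rna('ATGC'), "\n";                  # AUGC
print rna_to_dna(dna_to_rna('ATGC')), "\n";      # ATGC
```

With ambiguity codes allowed, a short protein made up only of letters
such as M, K and V would still pass as DNA, so the constructor's
alphabet guess (item 5) probably wants the strict setting.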