Bio::Seq (was:Robert's chapter :))

Chris Dagdigian cdagdigian@genetics.com
Tue, 10 Dec 1996 12:04:50 -0400


>Do you think you could prepare Bio::Seq for a first beta release until, say,
>Dec 15 (and Steve Brenner and/or I would do some final checking) ?

>That would include:
>+ completing the POD (search for ``to be completed'' in Seq.pm)
>+ add code to testSeq.pm testing/demonstrating your new code in particular
>+ alphabet checking (before storing sequences in the object hash)
>+ integrate Gilbert's ReadSeq so that it's called as a command for
>  converting exotic formats, _if_ it's installed (before calling
>  parse_bad )

I'd be happy to do what I can with Bio::Seq in preparation for a beta release.
I will need some "guidance" from people on some questions that I have
listed below.

Regards,
Chris Dagdigian
cdagdigian@genetics.com


#########################################################################
[apology in advance if these are uninformed questions]

[This refers to the Seq.pm code as found at
http://www.techfak.uni-bielefeld.de/bcd/Tec/Bioperl/Code/Bio/Seq.pm]

alphabet checking:

 o Should the 'base' DNA alphabet be "ACTG" or should we use the IUPAC
extended genetic alphabet as the default base? It would be nice to have
recognition for nucleotide ambiguity right from the start. Is there a
downside to having the larger alphabet as the base for DNA sequences? [The
IUPAC alphabet is described at
http://www.techfak.uni-bielefeld.de/bcd/Curric/PrwAli/node7.html]
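
To make the trade-off concrete, here is a minimal sketch of an alphabet check that could run before a sequence is stored. The %alphabet layout, the names 'strict' and 'iupac', and the function invalid_chars are all made up for illustration; none of them come from Seq.pm. The 'iupac' entry lists the standard IUPAC nucleotide codes (with U included).

```perl
use strict;
use warnings;

# Hypothetical alphabets for illustration; not the %alphabet hash in Seq.pm.
my %alphabet = (
    strict => 'ACGT',
    iupac  => 'ACGTURYSWKMBDHVN',   # IUPAC nucleotide ambiguity codes
);

# Return each distinct character of $seq that is not in the named alphabet,
# in order of first appearance. An empty list means the sequence is valid.
sub invalid_chars {
    my ($seq, $name) = @_;
    my $allowed = $alphabet{$name} or die "Unknown alphabet '$name'\n";
    my %seen;
    return grep { index($allowed, $_) < 0 && !$seen{$_}++ } split //, uc $seq;
}
```

With the strict alphabet, invalid_chars('ACGTRYN', 'strict') reports R, Y and N; with 'iupac' the same sequence passes. The one cost the sketch makes visible is that a larger default alphabet will silently accept typos that happen to be valid ambiguity codes.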

alphabets in general:

 o With the %alphabet hash, there are some bits of code to add a gap "-" or
a missing "?" character to the alphabets. I can see how '1Mg' =
$SeqType{Dna}Mg and that it would contain an alphabet of ["A","C","T","G","?"].

What I'm not clear about is how a user would invoke that "alternate"
alphabet over the base alphabet of $SeqType{Dna} = ["A","C","T","G"].
Passing in '1Mg' as the "Type" field would result in an "Unknown" response
when the %SeqType hash is checked, wouldn't it? Am I missing something really
simple here, or should there be an optional constructor argument for
specifying the alphabet?

That way you could construct:
 $myseq = new Bio::Seq(-seq=>'ACTG???',-type=>'Dna',-alphabet=>'1Mg');

  -or even-
 $myseq = new Bio::Seq(-seq=>'A-C-TG?-?-?',-type=>'Dna',-alphabet=>'1MgGp');

I guess without having the alphabet defined separately, I'm not quite sure
how I would go about checking it whenever a sequence is assigned.
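
A rough sketch of how a separate -alphabet value could drive the check, under loudly stated assumptions: that a name like '1MgGp' is an optional leading type code followed by two-letter suffixes, that "Mg" adds the missing character "?" and "Gp" adds the gap character "-", and that the helper names alphabet_chars and seq_ok are free to use. None of this is taken from Seq.pm; only the Dna base alphabet comes from the discussion above.

```perl
use strict;
use warnings;

# Base alphabet per type; only the Dna entry is taken from this message.
my %base  = (Dna => [qw(A C T G)]);
# ASSUMPTION: suffix meanings guessed for illustration (Mg = missing, Gp = gap).
my %extra = (Mg => '?', Gp => '-');

# Build the list of allowed characters for a type plus optional alphabet name.
sub alphabet_chars {
    my ($type, $alphabet) = @_;
    my $chars = $base{$type} or die "Unknown type '$type'\n";
    my @chars = @$chars;
    if (defined $alphabet) {
        (my $suffixes = $alphabet) =~ s/^\d+//;   # drop the assumed type code
        for my $tag ($suffixes =~ /([A-Z][a-z])/g) {
            my $c = $extra{$tag};
            die "Unknown alphabet suffix '$tag'\n" unless defined $c;
            push @chars, $c;
        }
    }
    return @chars;
}

# Return 1 if every character of $seq is in the resolved alphabet, else 0.
sub seq_ok {
    my ($seq, $type, $alphabet) = @_;
    my $class = join '', map { quotemeta($_) } alphabet_chars($type, $alphabet);
    return $seq =~ /\A[$class]*\z/ ? 1 : 0;
}
```

Under these assumptions, seq_ok('ACTG???', 'Dna', '1Mg') and seq_ok('A-C-TG?-?-?', 'Dna', '1MgGp') both succeed, while seq_ok('ACTG???', 'Dna') fails because the base alphabet has no "?". A constructor or assignment method could call such a check before storing the sequence in the object hash.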


 o ReadSeq

   No questions now but I'm sure I'll have some later :)