[Bioperl-l] Directions for qual modules.

Chad Matsalla chad@sausage.usask.ca
Tue, 7 Aug 2001 11:54:46 -0600 (CST)


Ewan wrote:
> i think the deliberate copying of PrimarySeq to PrimaryQual is fine. I
> would put it in
> Bio::Seq::PrimaryQual
to which Malcom Cook replied:
> I would rather create a new interface class, Bio::Seq::QualI

Perhaps I will make a Bio::Seq::QualI, and later move that into
Bio::IdentifiableI?

Ewan wrote:
> We really need a Bio::IdentifiableI interface that it could inherit for
> the identifier set.
to which Heikki replied:
> I agree. Let's put in Bio::IdentifiableI. Then Chad has to neither
> inherit from Bio::PrimarySeqI nor duplicate code.

Is this something I can do? Now or later? I guess I am a bit confused as
to how Bio::IdentifiableI would look and behave.

Malcom Cook wrote:
> I would rather have Bio::Seq::Phred implement both Bio::PrimarySeqI and
> Bio::Seq::QualI.

This is along the lines of what I was considering. The value of -format
in:
my $in_qual  = Bio::SeqIO->new(-file => "<t/qualfile.qual",
                               '-format' => 'qual');
will decide whether you get back a Bio::Seq::PrimaryQual object alone or a
Bio::Seq::SeqWithQuality object with both quality and sequence objects
inside of it.

I know about the different formats that phred can write but at the moment
I really only care about phd files:
BEGIN_DNA
a 6 1
c 6 20
t 6 17
...
and fasta-style files containing quality values only. These are what can
be found in phrap'ed consed project directories.
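
As a rough sketch of what reading the DNA block of a phd file involves,
the snippet below splits each line into a base and a quality value. The
function name parse_phd_dna is hypothetical, the END_DNA terminator and
the meaning of the third column (trace peak position) are assumptions
about the phd layout, and nothing here is the actual Bio::SeqIO code.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the BEGIN_DNA ... END_DNA block of a phd file.
# Returns the sequence string and a reference to an array of qualities.
sub parse_phd_dna {
    my ($text) = @_;
    my $seq = '';
    my @qual;
    my $in_dna = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /^BEGIN_DNA/) { $in_dna = 1; next; }
        if ($line =~ /^END_DNA/)   { last; }
        next unless $in_dna;
        # Assumed line layout: base, quality, trace peak position.
        my ($base, $q, $pos) = split ' ', $line;
        next unless defined $q;    # skip blank or malformed lines
        $seq .= $base;
        push @qual, $q;
    }
    return ($seq, \@qual);
}

my $phd = "BEGIN_DNA\na 6 1\nc 6 20\nt 6 17\nEND_DNA\n";
my ($seq, $qual) = parse_phd_dna($phd);
print "$seq\n";      # sequence built from the first column
print "@$qual\n";    # quality values from the second column
```

Keeping the bases and the qualities in two parallel structures is what
makes a combined sequence-plus-quality object a natural return value for
this format.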


<thinking>
Should there be a Bio::SeqIO::phd and Bio::SeqIO::xbap and so on, one for
each type of file? This sounds good to me and is compatible with the fact
that at this time my Bio::SeqIO::qual only parses fasta-style quality
files. It also seems more intuitive to do things this way rather than to
pass in some flag when the SeqIO object is constructed.

my $in_qual  = Bio::SeqIO->new(-file => "<t/qualfile.qual",
                               '-format' => 'qual');
my $in_qual  = Bio::SeqIO->new(-file => "<t/phredfile.phd",
                               '-format' => 'phd');
my $in_qual  = Bio::SeqIO->new(-file => "<t/quality.xbap",
                               '-format' => 'xbap');
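
A minimal sketch of the one-module-per-format idea: a front-end
constructor maps the -format value to a parser package and hands the
arguments over. All Sketch::* package names are hypothetical stand-ins,
and this is only an illustration of the dispatch, not a copy of how
Bio::SeqIO is implemented.

```perl
use strict;
use warnings;

# Hypothetical shared constructor for the per-format parsers.
package Sketch::SeqIO::Base;
sub new { my ($class, %args) = @_; return bless { %args }, $class; }

# One (empty) parser package per file format.
package Sketch::SeqIO::qual; our @ISA = ('Sketch::SeqIO::Base');
package Sketch::SeqIO::phd;  our @ISA = ('Sketch::SeqIO::Base');
package Sketch::SeqIO::xbap; our @ISA = ('Sketch::SeqIO::Base');

# Front end: choose the parser from -format instead of taking extra flags.
package Sketch::SeqIO;
sub new {
    my ($class, %args) = @_;
    my $format = lc($args{'-format'} || 'qual');
    die "bad format name '$format'\n" unless $format =~ /^\w+$/;
    my $module = "Sketch::SeqIO::$format";
    die "unknown format '$format'\n" unless UNIVERSAL::can($module, 'new');
    return $module->new(%args);
}

package main;
my $in = Sketch::SeqIO->new(-file => "<t/phredfile.phd", '-format' => 'phd');
print ref($in), "\n";    # class of the parser that was selected
```

With this layout, supporting another file type means adding one more
package, and callers keep using the same constructor call.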

Heikki, how does this tie into your idea of a Bio::Seq::QualIO? How would
this account for .phd files with both quality and sequence?
Bio::Seq::QualIO might only be intuitively useful for files (like my
fasta-quality files) with quality values only. I am really interested in
phd files (like above) that have both quality and sequence.



In any case, here is what I am going to do in the next little bit:

1. Rename Bio::PrimaryQual -> Bio::Seq::PrimaryQual

2. Verify that Bio::Seq::PrimaryQual uses $obj->qual() rather than
$obj->seq() .
Note - It already did this. :)

3. Same for $obj->subqual(10,20)
Note - This returns a reference to an array. Is that OK?

4. Create a Bio::Seq::QualI and cut-and-paste ID-related things from
Bio::PrimarySeqI where I find them useful.

5. Create a Bio::Seq::SeqWithQuality that has a Bio::PrimarySeq and a
Bio::Seq::PrimaryQual .

6. "Fix" Bio::SeqIO::qual. Should this return a Bio::Seq::PrimaryQual
object or a Bio::Seq::SeqWithQuality with no sequence?

7. Create Bio::SeqIO::phd. This will return a Bio::Seq::SeqWithQuality
object.
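
The object layout in the steps above could be sketched roughly as
follows. The Sketch::* packages are hypothetical stand-ins for the planned
classes, the sequence is held as a plain string rather than a
Bio::PrimarySeq, and subqual() is assumed to take 1-based inclusive
coordinates like subseq(); the real modules may well end up different.

```perl
use strict;
use warnings;

# Hypothetical quality-only object (stand-in for the planned PrimaryQual).
package Sketch::PrimaryQual;
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{'-id'}, qual => [ @{ $args{'-qual'} || [] } ] };
    return bless $self, $class;
}
sub display_id { $_[0]{id} }
sub qual       { $_[0]{qual} }    # reference to the array of qualities

# Assumed 1-based, inclusive range; returns a reference to a new array.
sub subqual {
    my ($self, $start, $end) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end > @{ $self->{qual} } || $start > $end;
    return [ @{ $self->{qual} }[ $start - 1 .. $end - 1 ] ];
}

# Hypothetical composite holding a sequence and a quality object.
package Sketch::SeqWithQuality;
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{'-seq'}, qual => $args{'-qual'} }, $class;
}
sub seq      { $_[0]{seq} }       # may be undef for quality-only input
sub qual_obj { $_[0]{qual} }
sub subqual  { my $self = shift; return $self->{qual}->subqual(@_) }

package main;
my $q  = Sketch::PrimaryQual->new('-id' => 'read1',
                                  '-qual' => [6, 6, 6, 20, 25]);
my $sq = Sketch::SeqWithQuality->new('-seq' => 'actga', '-qual' => $q);
print join(' ', @{ $sq->subqual(2, 4) }), "\n";    # qualities 2 through 4
```

Because the composite only delegates to the quality object, a
quality-only file could be represented either by the bare quality object
or by a composite whose seq() returns undef, which is the choice raised
in step 6.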


I will go ahead and do this but I am ready to alter everything if it seems
to be incorrect. This has just been a big experiment for me anyway. :)

When should I do the first commit of this stuff? When it works, or now? I
have cvs access now.



Chad Matsalla
Agriculture & Agri-Food Canada
Saskatoon, Saskatchewan, Canada