[Bioperl-l] .qual are like fasta files

Malcolm Cook mcook@dna.com
Tue, 7 Aug 2001 06:39:45 -0700


>All this sounds great! Good stuff!

Seconded!

>
>I think the deliberate copying of PrimarySeq to PrimaryQual is fine. I
>would put it in
>
>Bio::Seq::PrimaryQual

I would rather create a new interface class, Bio::Seq::QualI

>
>so we don't mess up the Bio:: namespace (despite the fact that
>Bio::PrimarySeq is there).
>
>There is an argument that, although PrimaryQual should basically look
>like a copy and paste, it should in fact use
>
>  $obj->qual() # equivalent to ->seq, returns an array of numbers
>
>and
>
>  $obj->subqual(10,20) # equivalent to ->subseq
>
>as it is not really a sequence and we will mislead people less.
>

Seconded!  Bio::Seq::QualI would provide prototypes for $obj->qual() and
$obj->subqual.
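To make the proposal concrete, here is a minimal sketch of what such an interface might look like in plain Perl, with no BioPerl base class. The package name `Bio::Seq::QualI` and the `qual`/`subqual` method names come from this thread. The method bodies, returning array references, the 1-based inclusive coordinates (borrowed from `subseq`), and the demo class `Demo::ArrayQual` are all assumptions.

```perl
use strict;
use warnings;

# Proposed interface: every quality-score class must provide qual() and
# subqual(). The default bodies just die, so a class that forgets to
# override one fails loudly at the first call.
package Bio::Seq::QualI;

sub qual {
    my ($self) = @_;
    die ref($self) . " must implement qual()\n";
}

sub subqual {
    my ($self, $start, $end) = @_;
    die ref($self) . " must implement subqual()\n";
}

# Hypothetical implementation that keeps the scores in a plain array.
package Demo::ArrayQual;
our @ISA = ('Bio::Seq::QualI');

sub new {
    my ($class, @scores) = @_;
    return bless { scores => [@scores] }, $class;
}

# qual(): all scores, as an array reference (the counterpart of seq()).
sub qual {
    my ($self) = @_;
    return [ @{ $self->{scores} } ];
}

# subqual($start, $end): 1-based, inclusive coordinates, like subseq().
sub subqual {
    my ($self, $start, $end) = @_;
    my $n = scalar @{ $self->{scores} };
    die "subqual: bad range $start..$end\n"
        if $start < 1 || $end > $n || $start > $end;
    return [ @{ $self->{scores} }[ $start - 1 .. $end - 1 ] ];
}

package main;

my $q = Demo::ArrayQual->new(20, 25, 30, 35, 40);
print "@{ $q->subqual(2, 4) }\n";    # prints "25 30 35"
```

Naming the methods `qual`/`subqual` rather than reusing `seq`/`subseq` keeps callers from mistaking a list of scores for a residue string.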

>
>
>I suspect inheriting from Bio::PrimarySeqI is probably a bad idea. It is
>an open question what one does with trunc; translate is not possible ;)
>
>
>We really need a Bio::IdentifiableI interface that it could inherit for
>the identifier set.
>
>
>After that I would make a
>
>
>Bio::Seq::SeqWithQuality
>
>
>which would inherit from Bio::Seq (hence have a Bio::PrimarySeq and
>all the other goodies) but would in addition have a Bio::Seq::PrimaryQual,
>and give access to the quality values in many places.
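Ewan's design is composition: the sequence object holds a separate quality object and forwards quality calls to it. A minimal sketch of that shape, using plain strings instead of real BioPerl classes; `Demo::PrimaryQual` and `Demo::SeqWithQuality` are hypothetical stand-ins for the classes he names, and the `-seq`/`-qual` constructor arguments are assumptions.

```perl
use strict;
use warnings;

# Stand-in for the quality class Ewan calls Bio::Seq::PrimaryQual.
package Demo::PrimaryQual;

sub new {
    my ($class, @scores) = @_;
    return bless { scores => [@scores] }, $class;
}
sub qual { return [ @{ $_[0]{scores} } ] }
sub subqual {
    my ($self, $start, $end) = @_;    # 1-based, inclusive, like subseq
    return [ @{ $self->{scores} }[ $start - 1 .. $end - 1 ] ];
}

# Composition: the sequence object *has* a quality object and forwards
# quality calls to it, rather than *being* a quality object itself.
package Demo::SeqWithQuality;

sub new {
    my ($class, %args) = @_;
    die "sequence and quality lengths differ\n"
        unless length($args{-seq}) == @{ $args{-qual} };
    return bless {
        seq  => $args{-seq},
        qual => Demo::PrimaryQual->new(@{ $args{-qual} }),
    }, $class;
}
sub seq      { return $_[0]{seq} }
sub subseq   { my ($self, $start, $end) = @_; return substr($self->{seq}, $start - 1, $end - $start + 1) }
sub qual_obj { return $_[0]{qual} }                  # the contained object
sub qual     { return $_[0]{qual}->qual }            # forwarded
sub subqual  { my $self = shift; return $self->{qual}->subqual(@_) }

package main;

my $swq = Demo::SeqWithQuality->new(-seq => 'ACGTA', -qual => [10, 20, 30, 40, 50]);
print join(' ', $swq->subseq(2, 3), @{ $swq->subqual(2, 3) }), "\n";    # prints "CG 20 30"
```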

I would rather have Bio::Seq::Phred implement both Bio::PrimarySeqI and
Bio::Seq::QualI.

Why the interface layer?

Well, besides being good OO (and BioPerl-ish to boot), it would allow
Bio::Seq::Phred to be written to support both .phd and .qual (FASTA quality)
files.
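By contrast, the alternative proposed here is a single class that implements both interfaces. A minimal sketch of that shape; `Demo::PrimarySeqI`, `Demo::QualI` and `Demo::Phred` are hypothetical stand-ins for Bio::PrimarySeqI, the proposed Bio::Seq::QualI and Bio::Seq::Phred, and the method bodies and constructor arguments are assumptions.

```perl
use strict;
use warnings;

# Stand-ins for the two interfaces. Each method just dies, so an
# implementation that forgets to override one fails loudly.
package Demo::PrimarySeqI;
sub seq    { die ref(shift) . " must implement seq()\n" }
sub subseq { die ref(shift) . " must implement subseq()\n" }

package Demo::QualI;
sub qual    { die ref(shift) . " must implement qual()\n" }
sub subqual { die ref(shift) . " must implement subqual()\n" }

# One concrete class implements both interfaces. Whether the data came
# from a .phd file or from a FASTA file plus a .qual file is the business
# of whichever reader constructs the object.
package Demo::Phred;
our @ISA = ('Demo::PrimarySeqI', 'Demo::QualI');

sub new {
    my ($class, %args) = @_;
    die "sequence and quality lengths differ\n"
        unless length($args{-seq}) == @{ $args{-qual} };
    return bless { id => $args{-id}, seq => $args{-seq}, qual => [ @{ $args{-qual} } ] }, $class;
}
sub id      { return $_[0]{id} }
sub seq     { return $_[0]{seq} }
sub subseq  { my ($self, $start, $end) = @_; return substr($self->{seq}, $start - 1, $end - $start + 1) }
sub qual    { return [ @{ $_[0]{qual} } ] }
sub subqual { my ($self, $start, $end) = @_; return [ @{ $self->{qual} }[ $start - 1 .. $end - 1 ] ] }

package main;

my $read = Demo::Phred->new(-id => 'read1', -seq => 'ACGT', -qual => [15, 22, 31, 40]);
print join(' ', $read->subseq(2, 3), @{ $read->subqual(2, 3) }), "\n";    # prints "CG 22 31"
print 'isa PrimarySeqI: ', ($read->isa('Demo::PrimarySeqI') ? 'yes' : 'no'), "\n";
print 'isa QualI: ',       ($read->isa('Demo::QualI')       ? 'yes' : 'no'), "\n";
```

Code that only needs sequence behaviour can check `isa` against the sequence interface and never learn that quality values exist, which is the point of keeping the two interfaces separate.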

So far, I think this discussion has mainly considered the fact that phred
can be instructed to write separate FASTA-like files of quality scores:

   -qt fasta                    Set the output quality file format
                                to FASTA. Trimming options affect the
                                FASTA file; see the Notes below for
                                more information.
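For the -qt fasta case, a reader only has to collect whitespace-separated integers under each `>` header. A minimal sketch, assuming the first header token is the read name and ignoring the rest of the header line. `read_fasta_qual` is a hypothetical helper, not an existing BioPerl function, and the sample data is made up.

```perl
use strict;
use warnings;

# Read a FASTA-style quality file: a ">name ..." header line followed by
# whitespace-separated integer scores, possibly wrapped over several lines.
# Returns the read names in file order plus a hash of name => [scores].
sub read_fasta_qual {
    my ($fh) = @_;
    my (%qual, @order, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>\s*(\S+)/) {
            $id = $1;                 # first header token is the read name
            push @order, $id;
            $qual{$id} = [];
        }
        elsif (defined $id) {
            # grep drops the empty field split() yields for leading spaces
            push @{ $qual{$id} }, grep { length } split /\s+/, $line;
        }
    }
    return (\@order, \%qual);
}

# Usage with an in-memory file (made-up read names and scores).
my $text = ">read1\n10 20 30\n40\n>read2\n9 9\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my ($order, $qual) = read_fasta_qual($fh);
for my $id (@$order) {
    print "$id: @{ $qual->{$id} }\n";    # "read1: 10 20 30 40", then "read2: 9 9"
}
```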

But wait, there's more!

   -qt xbap                     Set the output quality file format
                                to XBAP.  Trimmed off base quality
                                values are omitted.

   -qt mix                      Set the output quality file format
                                to FASTA. Base quality values for
                                all bases are written (including those
                                for trimmed off bases).

So, perhaps Bio::Seq::Phred would have alternate implementations of
Bio::Seq::QualI depending on a variable (we should call it 'qt'!).
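One way to express that is a dispatch table keyed on the `qt` value. Per the phred help text, `fasta` and `mix` both produce FASTA-format quality files and differ only in whether trimmed-off bases are included, so they can share one reader. The XBAP layout is not described in this message, so that entry deliberately dies. All names here (`%qual_reader_for`, `reader_for_qt`, `read_fasta_qual`) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical dispatch on phred's -qt setting: each value maps to a
# code reference that reads quality data from a filehandle.
my %qual_reader_for = (
    fasta => \&read_fasta_qual,
    mix   => \&read_fasta_qual,    # also FASTA format, just untrimmed
    xbap  => sub { die "XBAP quality reader not implemented\n" },
);

sub reader_for_qt {
    my ($qt) = @_;
    my $reader = $qual_reader_for{$qt}
        or die "unknown -qt value '$qt'\n";
    return $reader;
}

# Minimal FASTA quality reader: returns { read name => [scores] }.
sub read_fasta_qual {
    my ($fh) = @_;
    my (%qual, $id);
    while (my $line = <$fh>) {
        if    ($line =~ /^>\s*(\S+)/) { $id = $1; $qual{$id} = [] }
        elsif (defined $id)           { push @{ $qual{$id} }, grep { length } split /\s+/, $line }
    }
    return \%qual;
}

my $text = ">read1\n30 35 40\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $qual = reader_for_qt('mix')->($fh);
print "@{ $qual->{read1} }\n";    # prints "30 35 40"
```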

But wait, there's even more: it can also create a single PHD file that
contains both base-call and quality data, using this option:

   -p                           Write a PHD file, which is used by the
                                consed editor to display bases.  A PHD
                                file contains a set of comments used by
                                consed for maintaining consistency between
                                the chromat file, the .ace file and
                                the PHD file, and it contains base data
                                as triples consisting of the base call, 
                                quality, and position.  ....
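A reader for the PHD case would walk those triples and build the sequence, quality and position arrays in one pass. A minimal sketch, assuming (this is not stated in the help text above) that the triples sit one per line between `BEGIN_DNA` and `END_DNA` markers. `read_phd_bases` is a hypothetical helper and the sample record is made up.

```perl
use strict;
use warnings;

# Extract (base, quality, trace position) triples from a PHD file.
# ASSUMPTION: triples appear one per line between BEGIN_DNA and END_DNA;
# comment blocks and everything outside that section are skipped.
sub read_phd_bases {
    my ($fh) = @_;
    my $in_dna = 0;
    my $seq    = '';
    my (@qual, @pos);
    while (my $line = <$fh>) {
        if    ($line =~ /^BEGIN_DNA/) { $in_dna = 1; next }
        elsif ($line =~ /^END_DNA/)   { last }
        next unless $in_dna;
        my ($base, $q, $p) = split ' ', $line;
        next unless defined $p;    # skip blank or malformed lines
        $seq .= $base;
        push @qual, $q;
        push @pos,  $p;
    }
    return ($seq, \@qual, \@pos);
}

# Made-up PHD fragment for illustration.
my $phd = join '', map { "$_\n" }
    'BEGIN_SEQUENCE read1', 'BEGIN_DNA', 'c 9 6', 'a 12 18', 't 25 31', 'END_DNA', 'END_SEQUENCE';
open my $fh, '<', \$phd or die "cannot open in-memory file: $!";
my ($seq, $qual, $pos) = read_phd_bases($fh);
print "$seq | @$qual | @$pos\n";    # prints "cat | 9 12 25 | 6 18 31"
```

Because the calls and scores arrive together, this source naturally yields one object carrying both, which fits a class that implements both interfaces.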

>
>
>(Ewan spots a rather annoying circular reference there and somewhat
>curses his luck. It has to do with Bio::SeqFeatureI->entire_seq giving
>back a Bio::PrimarySeq; ideally, in this case, you would want access to
>the Qual values as well, but if it pointed to the parent you would end
>up with a cycle. Hmmmmm.)
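The cycle Ewan describes can be reproduced with plain hashes. One standard Perl remedy, not something proposed in this thread, is to make the back-pointer a weak reference with `Scalar::Util::weaken`. A minimal sketch; the hash layout (`qual`, `features`, `parent`) is hypothetical and only stands in for real sequence and feature objects.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical parent: a read with bases, quality values and features.
my $seq = { id => 'read1', seq => 'ACGT', qual => [20, 25, 30, 35], features => [] };

# Hypothetical feature whose back-pointer lets it reach the parent's quality
# values, in the spirit of entire_seq. As a plain reference it would form the
# cycle $seq -> features -> $feature -> parent -> $seq, which Perl's
# reference counting never frees.
my $feature = { start => 2, end => 3, parent => $seq };
push @{ $seq->{features} }, $feature;

# Weakening the back-pointer breaks the cycle: it no longer keeps $seq alive.
weaken($feature->{parent});

# While the parent exists, the feature can still read its quality values.
my @slice = @{ $feature->{parent}{qual} }[ $feature->{start} - 1 .. $feature->{end} - 1 ];
print "@slice\n";    # prints "25 30"

# Dropping the last strong reference frees the parent; the weak ref becomes undef.
undef $seq;
print defined $feature->{parent} ? "parent still alive\n" : "parent freed\n";    # prints "parent freed"
```

The trade-off is that a feature can outlive its parent and then find `parent` undefined, so callers have to check for that.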

Ewan, would the way around this be to have Bio::PrimarySeqI extend my
posited Bio::Seq::QualI?  Not that I propose refactoring BioPerl this
way....

>
>
>Bio::SeqIO::phred could then make a Bio::Seq::SeqWithQuality object
>

Nah...

Bio::SeqIO::phred could then make a Bio::Seq::Phred object

>
>What do other people think?
>

Seeing as you asked....

Regards,

Malcolm Cook
DNA Science Labs