[Biojava-l] Alignments
Schreiber, Mark
mark.schreiber@agresearch.co.nz
Thu, 16 May 2002 10:00:43 +1200
Hi -
Right off the bat I would like to say that the Phred package was poorly
designed. I can say that because I made it ;-)
The major problem with it was that initially I couldn't think of a way
to store the infinite integer alphabet with the DNA alphabet as a
cross-product. This problem is actually now solved (see PhredTools). I
have been very slowly trying to integrate this 'great leap forward' into
the rest of the Phred package. One nice thing is that a SimpleSymbolList
can now be constructed over the PHRED alphabet.
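To make the cross-product idea concrete, here is a rough, self-contained
sketch. It deliberately does not use the BioJava API: PhredPairSketch,
PhredSymbol and symbolList are made-up names for illustration only. The
point it shows is that, because the integer alphabet is infinite, the
(base, score) pairs cannot be enumerated up front and have to be created
on demand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the BioJava PhredTools implementation.
class PhredPairSketch {

    /** One symbol of the cross-product: a DNA base paired with an integer score. */
    static final class PhredSymbol {
        final char base;
        final int quality;

        PhredSymbol(char base, int quality) {
            // Assumption: lower-case a/c/g/t/n bases and non-negative scores.
            if ("acgtn".indexOf(base) < 0) {
                throw new IllegalArgumentException("not a DNA base: " + base);
            }
            if (quality < 0) {
                throw new IllegalArgumentException("negative quality: " + quality);
            }
            this.base = base;
            this.quality = quality;
        }

        @Override
        public String toString() {
            return base + ":" + quality;
        }
    }

    /** Builds one list of paired symbols from parallel base and score data. */
    static List<PhredSymbol> symbolList(String dna, int[] qualities) {
        if (dna.length() != qualities.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        List<PhredSymbol> symbols = new ArrayList<>();
        for (int i = 0; i < dna.length(); i++) {
            // Pairs are created lazily; the infinite integer side is never enumerated.
            symbols.add(new PhredSymbol(dna.charAt(i), qualities[i]));
        }
        return symbols;
    }

    public static void main(String[] args) {
        System.out.println(symbolList("acgt", new int[] {30, 12, 45, 8}));
    }
}
```

With a single list of paired symbols, the base and its score can no longer
drift out of step, which is the main attraction over two separate lists.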
PhredSequence and GappedPhredSequence are largely made obsolete by this
improvement, although I'm always reluctant to simply break things, as it
sounds like David is actually using these classes a lot.
One possibility would be to keep the API shell of PhredSequence and
GappedPhredSequence and gut the internals so they simply become
convenience classes. I suspect that some API breakage is required, but
keeping it to a minimum would be good.
Another issue that needs solving is the tokenization and FASTA formatting
of the Phred data. There are methods that read and write the DNA and Phred
scores and merge them, but they were written for the old situation where
the DNA and Phred sequences were kept separate.
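As a rough sketch of the read/write/merge round trip described above, the
following stand-alone code writes bases and scores as two FASTA-style
records and merges them back into base:score pairs. It is not the existing
Phred package code; PhredFastaSketch, dnaRecord, qualRecord and merge are
hypothetical names, and the layout (a header line, then wrapped bases or
space-separated integer scores) is an assumption about the file format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the BioJava Phred package code.
class PhredFastaSketch {

    /** Formats bases as a FASTA-style record, wrapped at `width` bases per line. */
    static String dnaRecord(String name, String dna, int width) {
        StringBuilder sb = new StringBuilder(">").append(name).append('\n');
        for (int i = 0; i < dna.length(); i += width) {
            sb.append(dna, i, Math.min(i + width, dna.length())).append('\n');
        }
        return sb.toString();
    }

    /** Formats scores as space-separated integers, `perLine` scores per line. */
    static String qualRecord(String name, int[] quals, int perLine) {
        StringBuilder sb = new StringBuilder(">").append(name).append('\n');
        for (int i = 0; i < quals.length; i++) {
            sb.append(quals[i]);
            boolean endOfLine = (i + 1) % perLine == 0 || i + 1 == quals.length;
            sb.append(endOfLine ? '\n' : ' ');
        }
        return sb.toString();
    }

    /** Returns everything after the header line of a record. */
    static String body(String record) {
        return record.substring(record.indexOf('\n') + 1);
    }

    /** Merges a base record and a score record into "base:score" tokens. */
    static List<String> merge(String dnaFasta, String qualFasta) {
        String dna = body(dnaFasta).replaceAll("\\s+", "");
        // Assumption: the score record is non-empty.
        String[] scores = body(qualFasta).trim().split("\\s+");
        if (scores.length != dna.length()) {
            throw new IllegalArgumentException("base/score count mismatch");
        }
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < dna.length(); i++) {
            merged.add(dna.charAt(i) + ":" + Integer.parseInt(scores[i]));
        }
        return merged;
    }

    public static void main(String[] args) {
        String dna = dnaRecord("read1", "acgt", 2);
        String qual = qualRecord("read1", new int[] {30, 12, 45, 8}, 3);
        System.out.print(dna);
        System.out.print(qual);
        System.out.println(merge(dna, qual));
    }
}
```

Once the sequence itself holds paired symbols, the merge step would only be
needed at the file boundary rather than throughout the package.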
Any suggestions on making all this mess look pretty and well designed
are very welcome.
- Mark
> >> Could you explain what Qualitative means?
> >
> >
> > Qualitative is defined in the biojava.bio.program.phred package. In the
> > case of PhredSequence it represents the quality score given by Phred
> > or Phrap. There is just one method, qualityAt().
>
>
> I will check this out. Quality scores are the sort of thing that the
> integer alphabet is meant to be used for, so I will see how the phred
> API shapes up to how Thomas and I had envisioned data being
> represented.
> It is a corner of the library that I have never visited before.
>
> >
> > What methods would the gapped interface contain? I would be happy to
> > make GappedSymbolList an interface and add a SimpleGappedSymbolList.
> > Perhaps that would make people mad. The idea of GappedSymbolList was
> > that it wrapped another symbol list, adding the ability to view it
> > with gaps. GappedSequence does the same for a Sequence instance, and
> > takes care of projecting features from un-gapped to gapped
> > coordinates. Both classes have (or should have) methods to fetch the
> > underlying object being viewed. Perhaps the need for your gapped
> > interface goes away if we have a generic 'View' interface, and code
> > would walk down the decorating views until they hit one that has the
> > functionality they want. Grr. Sometimes I don't like OOP.
> >
> >
> > The gapped interface would contain the methods in GappedSymbolList. I
> > see that we now have GappedSequence, which is what I am after. But we
> > also have GappedPhredSequence, which, with the exception of the
> > capitalization of method names, could implement the gapped interface. I
> > suspect that renaming GappedSymbolList would cause a bunch of
> > headaches, so a different name for the interface might be in order.
> >
>
> Again, I will take a look at GappedPhredSequence and see if it can't be
> refactored as a gapped view of a phred sequence. Do any phred users have
> views?
>