[Biojava-l] Functions Requirement...

David Waring dwaring@u.washington.edu
Tue, 30 Apr 2002 11:21:34 -0700


We will be using a QualitativeAlignment when viewing resequencing data,
looking for SNPs and other polymorphisms. So we could be looking at several
GappedPhredSequences aligned to each other and generally to the reference
sequence. I would say that both sequence and quality are aligned.

Perhaps I am missing something here, but I don't understand when it would be
necessary to have a cross-product alphabet of IntegerSymbolList and DNA
alphabet. In fact I am not sure what the function of a cross-product
alphabet is in terms of a regular alignment. My naive assumption would be
that the alphabet of an alignment should be the same as the alphabet of the
individual sequences, so it should be DNA (or Protein). I see that this is
not the case with SimpleAlignment. Is this cross-product alphabet used in
HMMs, or just to create the consensus with ambiguity symbols? Could someone
enlighten me?

As for Assembly vs. Alignment: my idea of an assembly is that there is one
true sequence for an assembly, so a discrepancy between sequences indicates
either sequence error or heterozygosity. An alignment describes the
relationship between multiple, perhaps different, sequences. The sequence of
one does not suggest anything about the validity of another. So multiple
ESTs should be aligned.

Note there is an error in my QualitativeAssembly interface; it should be:
	public Symbol qualityAt(Object label,int column);
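
As a rough illustration of that corrected signature, here is a minimal, self-contained Java sketch. The Symbol and QualitySymbol types are simplified stand-ins defined only for this example, not the real org.biojava classes. ArrayQualitativeAlignment, its add() method, and the 1-based column numbering are likewise assumptions made for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

final class QualitativeAlignmentSketch {

    /** Hypothetical stand-in for a symbol type; NOT the real BioJava Symbol. */
    interface Symbol {
        String getName();
    }

    /** Hypothetical integer-valued quality symbol. */
    static final class QualitySymbol implements Symbol {
        final int value;

        QualitySymbol(int value) {
            this.value = value;
        }

        public String getName() {
            return Integer.toString(value);
        }
    }

    /** The corrected method: one Symbol per label/column rather than a List. */
    interface QualitativeAlignment {
        Symbol qualityAt(Object label, int column);
    }

    /** Toy implementation backed by one int array of scores per label. */
    static final class ArrayQualitativeAlignment implements QualitativeAlignment {
        private final Map<Object, int[]> qualities = new LinkedHashMap<>();

        /** Registers a defensive copy of the scores under the given label. */
        void add(Object label, int[] scores) {
            qualities.put(label, scores.clone());
        }

        /** Columns are 1-based here (an assumption of this sketch). */
        public Symbol qualityAt(Object label, int column) {
            int[] scores = qualities.get(label);
            if (scores == null) {
                throw new NoSuchElementException("No such label: " + label);
            }
            if (column < 1 || column > scores.length) {
                throw new IndexOutOfBoundsException("Column out of range: " + column);
            }
            return new QualitySymbol(scores[column - 1]);
        }
    }

    public static void main(String[] args) {
        ArrayQualitativeAlignment aln = new ArrayQualitativeAlignment();
        aln.add("read1", new int[] {10, 40, 35});
        System.out.println(aln.qualityAt("read1", 2).getName());
    }
}
```

Returning a single Symbol keeps qualityAt() parallel to symbolAt(), so a viewer can ask for the base and its quality at the same label and column.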


> -----Original Message-----
> From: biojava-l-admin@biojava.org
> [mailto:biojava-l-admin@biojava.org]On Behalf Of Schreiber, Mark
> Sent: Sunday, April 28, 2002 2:02 PM
> To: David Waring; biojava
> Subject: RE: [Biojava-l] Functions Requirement...
>
>
> I like the API.
>
> I am also intrigued by the idea of a QualitativeAlignment. I
> assume you would use it for EST assemblies. In spite of it being
> an assembly it may well be better represented as an alignment.
> Therefore, if it is an Alignment the QualitativeAlignment could
> be a sub-interface of UnequalLengthAlignment. There is also the
> question of what should be aligned. For example, the PhredSequence
> holds two symbol lists, so do you align the quality symbol list or
> the sequence or both?
>
> The problem is caused by the fact that the quality information is
> represented as an Integer alphabet, which is infinite, and a DNA
> alphabet, which is finite. The equation to calculate the phred
> score is QV = -10 * log_10(P_e), where P_e is the probability
> that the base call is an error. Hence the lower bound is 0, where
> P_e is 1, while the upper bound is infinite. However, realistically
> a sequencer could never achieve a P_e of < 0.00001, which is a
> phred score of 50 (a very conservative estimate). Thus a finite
> alphabet could be made and a cross-product alphabet used instead?
> Can anyone see a reason why this might be a bad thing?
>
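
The arithmetic in the quoted paragraph above can be checked with a small Java sketch. It uses only java.lang.Math. The class name PhredSketch, the MAX_QV cap of 50 (taken from the estimate in the quoted text), and the clamp() and crossProductSize() helpers are assumptions made for illustration, not part of any BioJava API.

```java
final class PhredSketch {

    /** Upper bound for a finite quality alphabet; an assumption based on the text. */
    static final int MAX_QV = 50;

    /** QV = -10 * log10(P_e), rounded to the nearest integer. */
    static int phredScore(double errorProbability) {
        if (!(errorProbability > 0.0 && errorProbability <= 1.0)) {
            throw new IllegalArgumentException("P_e must be in (0, 1]: " + errorProbability);
        }
        return (int) Math.round(-10.0 * Math.log10(errorProbability));
    }

    /** Inverse of the formula above: P_e = 10^(-QV / 10). */
    static double errorProbability(int qv) {
        return Math.pow(10.0, -qv / 10.0);
    }

    /** Forces a score into the finite range 0..MAX_QV. */
    static int clamp(int qv) {
        return Math.max(0, Math.min(MAX_QV, qv));
    }

    /** Size of a cross product of baseSymbols bases with quality values 0..MAX_QV. */
    static int crossProductSize(int baseSymbols) {
        return baseSymbols * (MAX_QV + 1);
    }

    public static void main(String[] args) {
        System.out.println("QV at P_e = 1:       " + phredScore(1.0));
        System.out.println("QV at P_e = 0.00001: " + phredScore(0.00001));
        System.out.println("Clamped QV of 73:    " + clamp(73));
        System.out.println("4 bases x 0..50:     " + crossProductSize(4));
    }
}
```

With a cap of 50, the four unambiguous DNA bases combined with quality values 0 to 50 give 4 x 51 = 204 symbols, so the cross-product alphabet stays small. Gap and ambiguity symbols would add to that count.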
> Do people have views on whether an EST contig assembly is best
> represented as an Alignment or an Assembly?
>
> Mark
>
>
>
> > -----Original Message-----
> > From: David Waring [mailto:dwaring@u.washington.edu]
> > Sent: Saturday, 27 April 2002 8:43 a.m.
> > To: biojava
> > Subject: RE: [Biojava-l] Functions Requirement...
> >
> >
> > Funny that this comes up now. I am currently working on some
> > new Alignment classes. I will be supporting alignments of
> > unequal length. I think this might be a good time to discuss
> > additions to the API.
> >
> > In addition to the functions Matthew mentioned, I would include
> > support for UnequalLengthAlignments, which I am working on. I see
> > at least 3 new methods:
> >
> >         /**
> >         * The location of an individual SymbolList relative
> >         * to the overall Alignment
> >         */
> >     public Location locInAlignment(Object label);
> >
> >         /**
> >         * Returns a list of the labels of all seqs that cover that column
> >         */
> >     public List labelsAt(int column);
> >
> >         /**
> >         * Returns list of all the labels that intersect that range
> >         */
> >     public List labelsInRange(Location loc);
> >
> > Another is support for QualitativeSymbolLists. That would have
> >
> >         /**
> >         * Returns a quality score for label/position
> >         */
> >         public List qualityAt(Object label,int column);
> >
> > I think that the unequal length methods should be added to
> > the Alignment interface; they would be simple to implement in
> > SimpleAlignment. One question: what should be the behavior of
> > symbolAt() when the column is in range of the total alignment
> > but not within the individual sequence? I suggest it should
> > return null rather than throwing an error. Another possibility
> > would be to have a new Symbol (NullSymbol or SpaceSymbol)
> > similar to GappedSymbol. I think this would be better than
> > having to always try to check that it is in range before
> > calling symbolAt().
> >
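
To make the unequal-length proposal above concrete, here is a minimal Java sketch of locInAlignment(), labelsAt() and the sentinel-returning symbolAt(). It is a toy under stated assumptions: rows are plain Strings rather than SymbolLists, a location is a two-element int array instead of a Location, columns are 1-based, and the SPACE sentinel stands in for the suggested NullSymbol/SpaceSymbol. None of these names are real BioJava API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

final class UnequalLengthSketch {

    /** Sentinel for columns a row does not cover (the NullSymbol/SpaceSymbol idea). */
    static final char SPACE = ' ';

    /** One aligned row: its symbols and the 1-based column where it starts. */
    private static final class Row {
        final String symbols;
        final int start;

        Row(String symbols, int start) {
            this.symbols = symbols;
            this.start = start;
        }

        int end() {
            return start + symbols.length() - 1;
        }

        boolean covers(int column) {
            return column >= start && column <= end();
        }
    }

    private final Map<Object, Row> rows = new LinkedHashMap<>();

    void add(Object label, String symbols, int start) {
        rows.put(label, new Row(symbols, start));
    }

    /** Total alignment length: the largest end column over all rows. */
    int length() {
        int max = 0;
        for (Row row : rows.values()) {
            max = Math.max(max, row.end());
        }
        return max;
    }

    /** Returns {start, end} of the row within the overall alignment. */
    int[] locInAlignment(Object label) {
        Row row = lookup(label);
        return new int[] {row.start, row.end()};
    }

    /** Labels of all rows that cover the given column, in insertion order. */
    List<Object> labelsAt(int column) {
        List<Object> labels = new ArrayList<>();
        for (Map.Entry<Object, Row> entry : rows.entrySet()) {
            if (entry.getValue().covers(column)) {
                labels.add(entry.getKey());
            }
        }
        return labels;
    }

    /** Returns SPACE, not an error, when the column is valid but outside the row. */
    char symbolAt(Object label, int column) {
        Row row = lookup(label);
        if (column < 1 || column > length()) {
            throw new IndexOutOfBoundsException("Column out of range: " + column);
        }
        return row.covers(column) ? row.symbols.charAt(column - row.start) : SPACE;
    }

    private Row lookup(Object label) {
        Row row = rows.get(label);
        if (row == null) {
            throw new NoSuchElementException("No such label: " + label);
        }
        return row;
    }
}
```

For example, after add("ref", "ACGTACGT", 1) and add("read1", "GTAC", 3), a call to symbolAt("read1", 1) returns the sentinel, while a column beyond the whole alignment still throws. A sentinel lets callers loop over every column without a range check per row, which is the convenience argued for above.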
> > Perhaps we could add new interfaces.
> >
> > QualitativeAlignment
> >
> > SequenceAlignment
> > several possibilities, including making it implement
> > FeatureHolder, and/or allowing individual sequences to be
> > Sequences, perhaps with a method featuresAt(Object label,
> > Location range);
> >
> > EditableAlignment
> > 	remove (Object label)
> > 	add (Object label, SymbolList seq, Location referenceLocation) -- and perhaps other sigs
> > 	addGap (List labels, Location range, int length)
> > 	removeGap (List labels, Location range, int length)
> > 	shiftBase (List labels, Location range, int length)
> >
> > Any other suggestions?
> >
> >     David
> >
> > Bug note: There is currently a problem with SimpleAlignment:
> > seqString() does not work, perhaps due to changes made to
> > tokenization a few months ago.
> >
> > Exception in thread "main" java.util.NoSuchElementException: There is no tokenization 'token' defined in alphabet (DNA x DNA)
> >         at org.biojava.bio.symbol.AbstractAlphabet.getTokenization(AbstractAlphabet.java:96)
> >         at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:80)
> >         at SimpleAlignmentTest.main(SimpleAlignmentTest.java:33)
> >
> >
> > Does Alignment need to use a CrossProduct alphabet?
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: biojava-l-admin@biojava.org
> > > [mailto:biojava-l-admin@biojava.org]On Behalf Of Matthew Pocock
> > > Sent: Friday, April 26, 2002 8:18 AM
> > > To: 阿俗
> > > Cc: biojava-l@biojava.org
> > > Subject: Re: [Biojava-l] Functions Requirement...
> > >
> > >
> > > 阿俗 wrote:
> > > > Dear Sir,
> > > >
> > > >      How to implement "Multiple Sequence Alignment" or
> > > > "Phylogenetic tree" in BioJava?
> > > >      I cannot find any related function in online documents....
> > > >
> > > >
> > > >
> > > >
> > > >       Jim
> > >
> > > Hi Jim,
> > >
> > > There is no direct support for phylogenetic trees currently in
> > > BioJava. It would be a great thing to see added. We do have some
> > > support for alignments, via the org.biojava.bio.symbol.Alignment
> > > class. However, there are no well-developed utilities or support
> > > code for making alignments really easy to work with. In particular,
> > > Alignment needs modifying to allow easy addition/removal of
> > > sequences from the alignment, and we need to add an easy-to-use
> > > AlignmentSequence class so that you can annotate columns of an
> > > alignment as features.
> > >
> > > You can insert gaps into a view of an underlying ungapped
> > > sequence/symbol list using the GappedSymbolList and GappedSequence
> > > classes. You can then build an alignment object from these gapped
> > > views to get gapped alignments.
> > >
> > > The org.biojava.bio.dp package is a starting point for developing
> > > alignment algorithms. So far it only has alignments of one and two
> > > sequences to a model implemented, but the APIs do support
> > > simultaneous alignment of arbitrarily many sequences to a model.
> > >
> > > This is an area that needs work and documentation. Does anybody
> > > else on the list make alignments as part of their daily work?
> > >
> > > Matthew
> > >
> > > _______________________________________________
> > > Biojava-l mailing list  -  Biojava-l@biojava.org
> > > http://biojava.org/mailman/listinfo/biojava-l
> >