[Biojava-l] Functions Requirement...

Schreiber, Mark mark.schreiber@agresearch.co.nz
Tue, 30 Apr 2002 14:32:36 +1200


Hi -

I have implemented this and checked it in. Currently the sub-alphabet implementation extends SimpleAlphabet, which constructs the whole alphabet and could get a bit heavyweight for more than 100 or so symbols. When the Alphabet is constructed via IntegerAlphabet.getSubAlphabet(int min, int max), the AlphabetManager is first checked for a pre-existing copy. If none is found, the Alphabet is constructed and registered with the AlphabetManager.
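
The check-then-register behaviour described above could be sketched roughly as follows. This is only an illustration and not the code that was checked in: IntRangeAlphabet and SubAlphabetRegistry are hypothetical stand-ins for the sub-alphabet class and for AlphabetManager, and the registry key format is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a finite integer sub-alphabet; not the BioJava class. */
final class IntRangeAlphabet {
    final int min;
    final int max;

    IntRangeAlphabet(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    /** Number of symbols in the closed range [min, max]. */
    int size() {
        return max - min + 1;
    }

    /** True if value lies inside the closed range [min, max]. */
    boolean contains(int value) {
        return value >= min && value <= max;
    }
}

/** Hypothetical registry playing the role the message assigns to AlphabetManager. */
final class SubAlphabetRegistry {
    private static final Map<String, IntRangeAlphabet> REGISTRY = new HashMap<>();

    /** Returns the registered alphabet for [min, max], building it on first request. */
    static synchronized IntRangeAlphabet getSubAlphabet(int min, int max) {
        // Assumed key format; the real registered name may differ.
        String name = "SUBINTEGER[" + min + ".." + max + "]";
        IntRangeAlphabet cached = REGISTRY.get(name);
        if (cached == null) {
            cached = new IntRangeAlphabet(min, max);
            REGISTRY.put(name, cached);
        }
        return cached;
    }
}
```

With this sketch, two calls to getSubAlphabet(0, 50) hand back the same object, so the alphabet is only built once.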

- Mark


> -----Original Message-----
> From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk] 
> Sent: Monday, 29 April 2002 10:40 p.m.
> To: Schreiber, Mark
> Cc: David Waring; biojava
> Subject: Re: [Biojava-l] Functions Requirement...
> 
> 
> Hi All,
> 
> We should look at easy ways to make finite sub-sets of the 
> common infinite alphabets play well e.g. give me the alphabet 
> Integer[1..100], ensuring that it implements FiniteAlphabet 
> and therefore behaves in cross-products efficiently.
> 
> I think for integers, it would be a fairly trivial addition 
> (just one public method on IntegerAlphabet and one private 
> static class).
> 
> Matthew
> 
> Schreiber, Mark wrote:
> > I like the API.
> > 
> > I am also intrigued by the idea of a QualitativeAlignment. I assume 
> > you would use it for EST assemblies. In spite of it being an assembly, 
> > it may well be better represented as an alignment. Therefore, if it is 
> > an Alignment, the QualitativeAlignment could be a sub-interface of 
> > UnequalLengthAlignment. There is also the question of what should be 
> > aligned. For example, the PhredSequence holds two symbol lists, so do 
> > you align the quality symbol list, the sequence, or both?
> > 
> > The problem is caused by the fact that the quality information is 
> > represented as an Integer alphabet, which is infinite, alongside a DNA 
> > alphabet, which is finite. The equation to calculate the phred score is 
> > QV = -10 * log_10(P_e), where P_e is the probability that the base 
> > call is an error. Hence the lower bound is 0, where P_e is 1, while the 
> > upper bound is infinite. However, realistically a sequencer could never 
> > achieve a P_e below 0.00001, which is a phred score of 50 (a very 
> > conservative estimate). Thus a finite alphabet could be made and a 
> > cross-product alphabet used instead? Can anyone see a reason why this 
> > might be a bad thing?
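
As a quick check of the arithmetic in the paragraph above, here is a minimal Java sketch of the conversion in both directions. The class and method names are hypothetical and are not part of BioJava.

```java
/** Hypothetical helper for the phred formula QV = -10 * log_10(P_e). */
final class PhredMath {
    /** Quality value for an error probability in (0, 1]. */
    static double qualityFromErrorProbability(double pError) {
        if (pError <= 0.0 || pError > 1.0) {
            throw new IllegalArgumentException("pError must be in (0, 1]");
        }
        return -10.0 * Math.log10(pError);
    }

    /** Error probability implied by a quality value (inverse of the above). */
    static double errorProbabilityFromQuality(double quality) {
        return Math.pow(10.0, -quality / 10.0);
    }
}
```

Under this formula an error probability of 1 gives a score of 0, and 0.00001 gives a score of 50 up to floating-point rounding. If scores are stored as integers 0 to 50 inclusive, that is 51 symbols, so a cross product with the four DNA symbols would have 204 members.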
> > 
> > Do people have views on whether an EST contig assembly is best 
> > represented as an Alignment or an Assembly?
> > 
> > Mark
> > 
> > 
> > 
> > 
> >>-----Original Message-----
> >>From: David Waring [mailto:dwaring@u.washington.edu]
> >>Sent: Saturday, 27 April 2002 8:43 a.m.
> >>To: biojava
> >>Subject: RE: [Biojava-l] Functions Requirement...
> >>
> >>
> >>Funny that this comes up now. I am currently working on some
> >>new Alignment classes. I will be supporting alignments of 
> >>unequal length. I think this might be a good time to discuss 
> >>additions to the API.
> >>
> >>In addition to the functions Matthew mentioned, include
> >>support for UnequalLengthAlignments, which I am working on. I see 
> >>at least 3 new methods:
> >>
> >>    /**
> >>     * The location of an individual SymbolList relative
> >>     * to the overall Alignment
> >>     */
> >>    public Location locInAlignment(Object label);
> >>
> >>    /**
> >>     * Returns a list of labels of all seqs that cover that column
> >>     */
> >>    public List labelsAt(int column);
> >>
> >>    /**
> >>     * Returns a list of all the labels that intersect that range
> >>     */
> >>    public List labelsInRange(Location loc);
> >>
> >>Another is support for QualitativeSymbolLists. That would have
> >>
> >>    /**
> >>     * Returns a quality score for label/position
> >>     */
> >>    public List qualityAt(Object label, int column);
> >>
> >>I think that the unequal length methods should be added to
> >>the Alignment interface; they would be simple to implement in 
> >>SimpleAlignment. One question: what should be the behavior of 
> >>symbolAt() when the column is in range of the total alignment 
> >>but not within the individual sequence? I suggest it should 
> >>return null rather than throwing an error. Another possibility 
> >>would be to have a new Symbol (NullSymbol or SpaceSymbol) 
> >>similar to GappedSymbol. I think this would be better than 
> >>having to always check that it is in range before 
> >>calling symbolAt().
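
The sentinel option in the paragraph above can be illustrated with a small sketch. OffsetRow, NOT_COVERED, and the use of plain characters instead of Symbol objects are all assumptions made for the example, and columns are taken to be 1-based; none of this is existing BioJava API.

```java
/** Hypothetical row of an unequal-length alignment; not a BioJava type. */
final class OffsetRow {
    /** Assumed sentinel for columns that this sequence does not cover. */
    static final char NOT_COVERED = ' ';

    private final String symbols;
    private final int start; // 1-based alignment column of the first symbol

    OffsetRow(String symbols, int start) {
        if (start < 1) {
            throw new IllegalArgumentException("start must be >= 1");
        }
        this.symbols = symbols;
        this.start = start;
    }

    /** True if the sequence has a symbol at this alignment column. */
    boolean covers(int column) {
        return column >= start && column < start + symbols.length();
    }

    /** Returns the symbol at the column, or the sentinel if the row does not reach it. */
    char symbolAt(int column) {
        return covers(column) ? symbols.charAt(column - start) : NOT_COVERED;
    }
}
```

Callers can then loop over every column of the alignment without a range check per row, and compare against the sentinel only where the distinction matters.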
> >>
> >>Perhaps we could add new interfaces.
> >>
> >>QualitativeAlignment
> >>
> >>SequenceAlignment
> >>several possibilities, including making it implement
> >>FeatureHolder, and/or allowing individual sequences to be 
> >>Sequences, perhaps with a method featuresAt(Object label, 
> >>Location range);
> >>
> >>EditableAlignment
> >>	remove (Object label)
> >>	add (Object label, SymbolList seq, Location referenceLocation) -- and perhaps other sigs
> >>	addGap (List labels, Location range, int length)
> >>	removeGap (List labels, Location range, int length)
> >>	shiftBase (List labels, Location range, int length)
> >>
> >>Any other suggestions?
> >>
> >>    David
> >>
> >>Bug note: There is currently a problem with SimpleAlignment:
> >>seqString() does not work, perhaps due to changes to tokenization 
> >>a few months ago.
> >>
> >>Exception in thread "main" java.util.NoSuchElementException:
> >>There is no tokenization 'token' defined in alphabet (DNA x DNA)
> >>        at org.biojava.bio.symbol.AbstractAlphabet.getTokenization(AbstractAlphabet.java:96)
> >>        at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:80)
> >>        at SimpleAlignmentTest.main(SimpleAlignmentTest.java:33)
> >>
> >>
> >>Does Alignment need to use a CrossProduct alphabet?
> >>
> >>
> >>
> >>
> >>
> >>>-----Original Message-----
> >>>From: biojava-l-admin@biojava.org
> >>>[mailto:biojava-l-admin@biojava.org]On Behalf Of Matthew Pocock
> >>>Sent: Friday, April 26, 2002 8:18 AM
> >>>To: 阿俗
> >>>Cc: biojava-l@biojava.org
> >>>Subject: Re: [Biojava-l] Functions Requirement...
> >>>
> >>>
> >>>阿俗 wrote:
> >>>
> >>>>Dear Sir,
> >>>>
> >>>>     How do I implement "Multiple Sequence Alignment" or "Phylogenetic
> >>>>tree" in BioJava?
> >>>>     I cannot find any related functions in the online documents....
> >>>>
> >>>>      Jim
> >>>
> >>>Hi Jim,
> >>>
> >>>There is no direct support for phylogenetic trees currently in
> >>>BioJava. It would be a great thing to see added. We do have some 
> >>>support for alignments, via the org.biojava.bio.symbol.Alignment 
> >>>class. However, there are no well-developed utilities or support code
> >>>for making alignments really easy to work with. In particular,
> >>>Alignment needs modifying to allow easy addition/removal of sequences
> >>>from the alignment, and we need to add an easy-to-use
> >>>AlignmentSequence class so that you can annotate columns of an 
> >>>alignment as features.
> >>>
> >>>You can insert gaps into a view of an underlying ungapped
> >>>sequence/symbol list using the GappedSymbolList and GappedSequence 
> >>>classes. You can then build an alignment object from these gapped 
> >>>views to get gapped alignments.
> >>>
> >>>The org.biojava.bio.dp package is a starting point for developing
> >>>alignment algorithms. So far it only has alignments of one and two 
> >>>sequences to a model implemented, but the APIs do support simultaneous
> >>>alignment of arbitrarily many sequences to a model.
> >>>
> >>>This is an area that needs work and documentation. Does anybody else
> >>>on the list make alignments as part of their daily work?
> >>>
> >>>Matthew
> >>>
> >>>_______________________________________________
> >>>Biojava-l mailing list  -  Biojava-l@biojava.org
> >>>http://biojava.org/mailman/listinfo/biojava-l
> >>
> >>
> > 
> 
> 
> 
> 

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================