[Biojava-l] Functions Requirement...

David Waring dwaring@u.washington.edu
Fri, 26 Apr 2002 13:42:42 -0700


Funny that this comes up now. I am currently working on some new Alignment
classes. I will be supporting alignments of unequal length. I think this
might be a good time to discuss additions to the API.

In addition to the functions Matthew mentioned, I would include support for
UnequalLengthAlignments, which I am working on. I see at least 3 new methods:

        /**
        * The location of an individual SymbolList relative to the overall
        * Alignment
        */
    public Location locInAlignment(Object label);

        /**
        * Returns a list of the labels of all sequences that cover the given
        * column
        */
    public List labelsAt(int column);

        /**
        * Returns a list of the labels of all sequences that intersect the
        * given range
        */
    public List labelsInRange(Location loc);
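
As a rough illustration of how these three methods could behave, here is a
minimal, self-contained sketch. It does not use the BioJava classes: `Range`
is a hypothetical stand-in for Location (a closed, 1-based column range),
`UnequalLengthAlignmentSketch` and its `add` method are invented for the
example, and only the placement of each sequence is tracked, not its symbols.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for Location: a closed, 1-based range [min, max].
final class Range {
    final int min;
    final int max;

    Range(int min, int max) {
        if (min > max) throw new IllegalArgumentException("min > max");
        this.min = min;
        this.max = max;
    }

    boolean contains(int column) { return column >= min && column <= max; }

    boolean overlaps(Range other) { return min <= other.max && other.min <= max; }
}

// Toy alignment that records only where each labelled sequence sits.
final class UnequalLengthAlignmentSketch {
    private final Map<Object, Range> placements = new LinkedHashMap<>();

    // Hypothetical helper: place a sequence of 'length' symbols starting at 'start'.
    void add(Object label, int start, int length) {
        placements.put(label, new Range(start, start + length - 1));
    }

    // Columns of the overall alignment occupied by one sequence.
    Range locInAlignment(Object label) {
        Range r = placements.get(label);
        if (r == null) throw new NoSuchElementException("No sequence labelled " + label);
        return r;
    }

    // Labels of all sequences that cover the given column, in insertion order.
    List<Object> labelsAt(int column) {
        List<Object> hits = new ArrayList<>();
        for (Map.Entry<Object, Range> e : placements.entrySet()) {
            if (e.getValue().contains(column)) hits.add(e.getKey());
        }
        return hits;
    }

    // Labels of all sequences that intersect the given range, in insertion order.
    List<Object> labelsInRange(Range loc) {
        List<Object> hits = new ArrayList<>();
        for (Map.Entry<Object, Range> e : placements.entrySet()) {
            if (e.getValue().overlaps(loc)) hits.add(e.getKey());
        }
        return hits;
    }
}
```

With two reads placed at columns 1..10 and 6..15, labelsAt(8) would list both
labels and labelsAt(3) only the first. A real implementation would presumably
sit inside the Alignment implementation and reuse its Location type.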

Another addition is support for QualitativeSymbolLists. That would have

        /**
        * Returns a quality score for label/position
        */
        public List qualityAt(Object label,int column);

I think that the unequal length methods should be added to the Alignment
interface; they would be simple to implement in SimpleAlignment. One
question: what should be the behavior of symbolAt() when the column is in
range of the total alignment but not within the individual sequence? I
suggest it should return null rather than throwing an exception. Another
possibility would be to have a new Symbol (NullSymbol or SpaceSymbol)
similar to GappedSymbol. I think this would be better than always having to
check that the column is in range before calling symbolAt().
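
To make the sentinel option concrete, here is a small sketch that is
independent of BioJava. Sequences are plain Strings, and `SPACE` is a
hypothetical placeholder for the proposed NullSymbol/SpaceSymbol; the class
name and the `add`, `length` and `row` helpers are invented for the example.
Columns inside the alignment but outside a sequence return the sentinel, while
columns outside the whole alignment still throw.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

final class PaddedAlignmentSketch {
    // Hypothetical sentinel standing in for a NullSymbol/SpaceSymbol.
    static final char SPACE = ' ';

    private final Map<Object, String> residues = new LinkedHashMap<>();
    private final Map<Object, Integer> starts = new LinkedHashMap<>();

    // Place 'seq' so that its first symbol sits at 1-based column 'startColumn'.
    void add(Object label, int startColumn, String seq) {
        residues.put(label, seq);
        starts.put(label, startColumn);
    }

    // Last column covered by any sequence (0 for an empty alignment).
    int length() {
        int end = 0;
        for (Object label : residues.keySet()) {
            end = Math.max(end, starts.get(label) + residues.get(label).length() - 1);
        }
        return end;
    }

    char symbolAt(Object label, int column) {
        String seq = residues.get(label);
        if (seq == null) throw new NoSuchElementException("No sequence labelled " + label);
        if (column < 1 || column > length()) {
            throw new IndexOutOfBoundsException("Column " + column + " is outside 1.." + length());
        }
        int offset = column - starts.get(label);
        // In the alignment but outside this sequence: return the sentinel, not an error.
        return (offset < 0 || offset >= seq.length()) ? SPACE : seq.charAt(offset);
    }

    // Callers can walk every column without a range check per sequence.
    String row(Object label) {
        StringBuilder sb = new StringBuilder();
        for (int col = 1; col <= length(); col++) sb.append(symbolAt(label, col));
        return sb.toString();
    }
}
```

The row() loop is the case the sentinel is meant to simplify: with a null
return the caller would need a null check per column, and with an exception a
range check before each call.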

Perhaps we could add new interfaces.

QualitativeAlignment

SequenceAlignment
Several possibilities, including making it implement FeatureHolder and/or
allowing individual sequences to be Sequences, perhaps with a method
featuresAt(Object label, Location range);

EditableAlignment
	remove (Object label)
	add (Object label, SymbolList seq, Location referenceLocation) -- and perhaps other signatures
	addGap (List labels, Location range, int length)
	removeGap (List labels, Location range, int length)
	shiftBase (List labels, Location range, int length)
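
One possible reading of part of this list, sketched without BioJava types. The
semantics here are assumptions, not a settled design: rows are plain strings,
an int column replaces the Location argument, addGap inserts gap characters
before the symbol at that column, and removeGap deletes them only if they
really are gaps. shiftBase is left out because its intended behavior is not
spelled out above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical, simplified shape of the proposed interface.
interface EditableAlignmentSketch {
    void add(Object label, String seq, int startColumn);
    void remove(Object label);
    void addGap(List<?> labels, int column, int length);
    void removeGap(List<?> labels, int column, int length);
}

final class StringEditableAlignment implements EditableAlignmentSketch {
    static final char GAP = '-';

    private final Map<Object, StringBuilder> rows = new LinkedHashMap<>();
    private final Map<Object, Integer> starts = new LinkedHashMap<>();

    public void add(Object label, String seq, int startColumn) {
        if (rows.containsKey(label)) throw new IllegalArgumentException("Duplicate label: " + label);
        rows.put(label, new StringBuilder(seq));
        starts.put(label, startColumn);
    }

    public void remove(Object label) {
        if (rows.remove(label) == null) throw new NoSuchElementException("No sequence labelled " + label);
        starts.remove(label);
    }

    // Insert 'length' gaps before the symbol currently at 'column' in each listed row.
    public void addGap(List<?> labels, int column, int length) {
        for (Object label : labels) {
            int offset = offsetOf(label, column);
            StringBuilder row = rowFor(label);
            for (int i = 0; i < length; i++) row.insert(offset, GAP);
        }
    }

    // Delete 'length' gaps starting at 'column'; refuse to delete real symbols.
    // Not atomic: rows listed before a failing row stay modified.
    public void removeGap(List<?> labels, int column, int length) {
        for (Object label : labels) {
            int offset = offsetOf(label, column);
            StringBuilder row = rowFor(label);
            if (offset + length > row.length()) throw new IllegalArgumentException("Gap runs past end of row");
            for (int i = offset; i < offset + length; i++) {
                if (row.charAt(i) != GAP) throw new IllegalArgumentException("Not a gap in row " + label);
            }
            row.delete(offset, offset + length);
        }
    }

    String row(Object label) { return rowFor(label).toString(); }

    private StringBuilder rowFor(Object label) {
        StringBuilder row = rows.get(label);
        if (row == null) throw new NoSuchElementException("No sequence labelled " + label);
        return row;
    }

    // Translate a 1-based alignment column into a 0-based offset within the row.
    private int offsetOf(Object label, int column) {
        StringBuilder row = rowFor(label);
        int offset = column - starts.get(label);
        if (offset < 0 || offset >= row.length()) throw new IndexOutOfBoundsException("Column outside row " + label);
        return offset;
    }
}
```

Taking a list of labels lets a gap be opened in some rows and not others,
which seems to be the point of the List argument in the proposed signatures.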

Any other suggestions?

    David

Bug note: There is currently a problem with SimpleAlignment. seqString()
does not work, perhaps due to the tokenization changes made a few months ago:

Exception in thread "main" java.util.NoSuchElementException: There is no tokenization 'token' defined in alphabet (DNA x DNA)
        at org.biojava.bio.symbol.AbstractAlphabet.getTokenization(AbstractAlphabet.java:96)
        at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:80)
        at SimpleAlignmentTest.main(SimpleAlignmentTest.java:33)


Does Alignment need to use a CrossProduct alphabet?




> -----Original Message-----
> From: biojava-l-admin@biojava.org
> [mailto:biojava-l-admin@biojava.org]On Behalf Of Matthew Pocock
> Sent: Friday, April 26, 2002 8:18 AM
> To: ŞüĞU
> Cc: biojava-l@biojava.org
> Subject: Re: [Biojava-l] Functions Requirement...
>
>
> ŞüĞU wrote:
> > Dear Sir,
> >
> >      How to implement "Multiple Sequence Alignment" or "Phylogenetic
> > tree" in BioJava?
> >      I cannot find any related function in online documents....
> >
> >
> >
> >                                                                 Jim
>
> Hi Jim,
>
> There is no direct support for phylogenetic trees currently in BioJava.
> It would be a great thing to see added. We do have some support for
> alignments, via the org.biojava.bio.symbol.Alignment class. However,
> there are no well developed utilities or support code for making
> alignments really easy to work with. In particular, Alignment needs
> modifying to allow easy addition/removal of sequences from the
> alignment, and we need to add an easy to use AlignmentSequence class so
> that you can annotate columns of an alignment as features.
>
> You can insert gaps into a view of an underlying ungapped
> sequence/symbol list using the GappedSymbolList and GappedSequence
> classes. You can then build an alignment object from these gapped views
> to get gapped alignments.
>
> The org.biojava.bio.dp package is a starting point for developing
> alignment algorithms. So far it only has alignments of one and two
> sequences to a model implemented, but the APIs do support simultaneous
> alignment of arbitrarily many sequences to a model.
>
> This is an area that needs work and documentation. Does anybody else on
> the list make alignments as part of their daily work?
>
> Matthew