[Biojava-l] adding toSequenceIterator method for Alignment

Kalle Näslund kalle.naslund@genpat.uu.se
Tue, 30 Jul 2002 16:36:28 +0200


Singh, Nimesh wrote:

>Given any implementation of an alignment, there is no guarantee that the alignment will have links to the original underlying sequences.
>

I know this might be a bit finicky, and that you are most likely already 
aware of this. But, An alignment
doesnt necesarily hold Sequences,  so there might be no underlying 
sequences. An Alignment is built up
from SymbolLists and is itself a SymbolList.

Basicly, the thing i am no comfortable with is the addition of the 
SequenceIterator sequenceIterator() method
to the Alignment Interface. If you take a look at som other interfaces, 
say  SymbolList, Sequence etc you will see
that they only contain a small sett of methods and properties. The 
methods and properties  in the interface is
what makes an object into an object of that particular type. It could be 
seen as the smallest common
set of functionality needed for being a Sequences, a SymbolLists etc. 
The trick is to keep these "core" intefaces
as simple and as generic as possible, so no one gets restricted by them, 
and can write their own implementations
that then nicely interacts with the rest of biojava.


The functionalily of the SequenceInterator sequenceIterator() method can 
be broken down into two parts as i
see it.

1)     Allow access to SymbolLists in the alignment via an Iterator object.
2)    Construct SimpleSequence objects from SymbolLists.

The first point is a nice addition imho. You can of course first get all 
the labels, and then iterate over those
and call SymbolList getSymbolListForLabel( Object label ) but this is 
simpler, and in many situations you
just want an Iterator anyway. This doesnt realy add anything to what an 
Alignment is, it just adds a method
to let you access the properties of an alignment, in a slightly 
different way.

The second part is the thing i dont like. To me, the core properties 
that defines what an Alignment is, doesnt
contain the functionality of being able to make SimpleSequences out of 
SymbolLists. And should therefore
not be included in the Alignment interface ( as that should only define 
the most basal basic things that
all Alignments have in common ).


Biojava contains many types of SymbolLists, that all can be used for 
alignments, things that i remember out
of my head are  SimpleSymbolList, GappedSymbolList, SimpleSequence, 
GappedSequence, SimpleAlignment
and the list will grow more.
So then when people that dont work with SimpleSequences but with Say 
GappedSeqeunces
or SimpleAlignments need the same type of conversion functionality, you 
are in for a problem, either you add
yet more to the "core" Alignment interface, forcing every alignment 
implementation to contain lots of
conversion code between different types of  SymbolLists. This is also 
nasty because you need to use a
different API for different types of Symbollists ( you need to call a 
different Iterator function that gives you a
GappedSequenceIterator, or a SimpleAlignmentIterator ) so it becomes 
much harder to switch from one type
of SymbolList to another.
Of course you can refuse to have any more SequenceIterator code in the 
"core" Alignment interface, then
that type of code will have to go somwhere else, and in the end you have 
code basicly doing the same things, but
spread over different locations in the source tree, making it harder to 
maintain and keep track off.

To summarise the whole rant.

1)     I do think its appropriate to add a method to the Alignment 
interface that allows you to get hold of an iterator,
        that you can sure to iterate over the SymbolLists in the alignment.

2)     I dont think its appropriate to add a SequenceIterator to the 
"core" Alignment interface. As conversion/
        construction functionality doesnt realy belong in the simple 
"core" interfaces. I wouldnt think that
        adding toSequence() in the SymbolList interface would be 
appropriate either, or adding
        toGappedSequence in the Sequence interfacce.

I understand that you find SimpleSequence construction functionality 
nice, so what i would suggest would be
somthing like.

1)     change the sequenceIterator method into a symbolListIterator method
2)    then just have your implementations of the AlignmentInterface 
implement the sequenceIterator method
        or, add that sequenceIterator method to some of the other, "more 
advanced" alignment
        interfaces, just dont put it in the "core" Alignment interface.
        Or you could make something like SimpleSequenceAlignment that 
extends Alignment and contains the
        extra functionality you want.

>  I think that the AlignmentSequenceIterator can only use public functions from the Alignment interface.  This ensures that it will work for any implementation.  We can replace the sequenceIterator implementation in individual Alignment implemenations to return the underlying Sequences.  I'll look into that.
>
> 
>The reason that the new method returns a SequenceIterator is because that is an already existing commonly used iterator.  There are other methods (usually symbolListForLabel()) already in the interface if you use SymbolLists that don't readily translate into Sequences.
>
Yes, but by adding the sequenceIterator() method to the basic Alignment 
interface,
EVERYONE writing an Alignment class needs to implement this method even 
if the SymbolLists
it contains doesnt readily translate into Sequences. So they will then 
either return some strange
types of sequences, or they might just be forced to throw an Exception 
or just dont return a thing.
And to me, that signals, that the method dont realy belong in the 
 simple "core" Alignment interface.
As it should only contain the  absolute smallest set of methods that are 
needed to define the core
capabilities of an Alignment, and all those "core"methods should be able 
to return valid results from all
types of Alignments.

> 
>So, I'll look at making more specific implementations for sequenceIterator that return underlying sequences.
> 
>Nimesh
>
>  
>

>	>
>	>Here is the cod for AlignmentSequenceIterator:
>	>
>	>public class AlignmentSequenceIterator implements SequenceIterator {
>	>    private Alignment align;
>	>    private Iterator labels;
>	>    private SequenceFactory sf;
>	>    public AlignmentSequenceIterator(Alignment align) {
>	>        this.align = align;
>	>        labels = align.getLabels().iterator();
>	>        sf = new SimpleSequenceFactory();
>	>    }
>	>    public boolean hasNext() {
>	>        return labels.hasNext();
>	>    }
>	>    public Sequence nextSequence() throws NoSuchElementException, BioException {
>	>        if (!hasNext()) {
>	>            throw new NoSuchElementException("No more sequences in the alignment.");
>	>        }
>	>        else {
>	>            try {
>	>                Object label = labels.next();
>	>                SymbolList symList = align.symbolListForLabel(label);
>	>                Sequence seq = sf.createSequence(symList, label.toString(), label.toString(), null);
>	>                return seq;
>	>            } catch (Exception e) {
>	>         throw new BioException(e, "Could not read sequence");
>	>     }
>	>        }
>	>    }
>	>}
>	>_______________________________________________
>	>Biojava-l mailing list  -  Biojava-l@biojava.org
>	>http://biojava.org/mailman/listinfo/biojava-l
>	> 
>	>
>	
>  
>
>	
>	
>	
>
>  
>