[Biojava-l] adding toSequenceIterator method for Alignment
Kalle Näslund
kalle.naslund@genpat.uu.se
Tue, 30 Jul 2002 16:36:28 +0200
Singh, Nimesh wrote:
>Given any implementation of an alignment, there is no guarantee that the alignment will have links to the original underlying sequences.
>
I know this might be a bit finicky, and you are most likely already
aware of it, but an Alignment doesn't necessarily hold Sequences, so
there might be no underlying sequences. An Alignment is built up from
SymbolLists and is itself a SymbolList.
Basically, the thing I am not comfortable with is the addition of the
SequenceIterator sequenceIterator() method to the Alignment interface.
If you take a look at some other interfaces, say SymbolList or Sequence,
you will see that they contain only a small set of methods and
properties. The methods and properties in an interface are what make an
object an object of that particular type. They can be seen as the
smallest common set of functionality needed for being a Sequence, a
SymbolList, etc. The trick is to keep these "core" interfaces as simple
and as generic as possible, so that no one gets restricted by them and
everyone can write their own implementations that then interact nicely
with the rest of BioJava.
The functionality of the SequenceIterator sequenceIterator() method can
be broken down into two parts, as I see it.
1) Allow access to the SymbolLists in the alignment via an Iterator object.
2) Construct SimpleSequence objects from SymbolLists.
The first point is a nice addition IMHO. You can of course first get all
the labels, then iterate over those and call
SymbolList symbolListForLabel( Object label ), but an iterator is
simpler, and in many situations you just want an Iterator anyway. This
doesn't really add anything to what an Alignment is; it just adds a
method that lets you access the properties of an alignment in a slightly
different way.
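
The two access styles can be sketched roughly as follows. MiniAlignment
is a hypothetical stand-in, not the real BioJava Alignment interface:
getLabels() and symbolListForLabel() mirror the names used in this
thread, symbolListIterator() is the proposed addition, and symbol lists
are plain Strings for brevity. The default method is a modern Java
convenience and only an assumption about how one might write it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Alignment interface; symbol lists are Strings here.
interface MiniAlignment {
    List<Object> getLabels();
    String symbolListForLabel(Object label);

    // Proposed convenience, built purely from the two methods above,
    // so it adds nothing to what an alignment has to know or store.
    default Iterator<String> symbolListIterator() {
        final MiniAlignment self = this;
        final Iterator<Object> labels = getLabels().iterator();
        return new Iterator<String>() {
            public boolean hasNext() { return labels.hasNext(); }
            // labels.next() throws NoSuchElementException when exhausted.
            public String next() { return self.symbolListForLabel(labels.next()); }
        };
    }
}

// Minimal implementation that keeps rows in insertion order.
class MapAlignment implements MiniAlignment {
    private final Map<Object, String> rows = new LinkedHashMap<>();
    void add(Object label, String symbols) { rows.put(label, symbols); }
    public List<Object> getLabels() { return new ArrayList<>(rows.keySet()); }
    public String symbolListForLabel(Object label) { return rows.get(label); }
}
```

Looping over getLabels() and calling symbolListForLabel() for each label
visits the same rows in the same order as symbolListIterator(); the
iterator only saves the caller the bookkeeping.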
The second part is the thing I don't like. To me, the core properties
that define what an Alignment is don't include the ability to make
SimpleSequences out of SymbolLists. That ability should therefore not be
included in the Alignment interface (which should only define the most
basic things that all Alignments have in common).
BioJava contains many types of SymbolLists that can all be used for
alignments. The ones I remember off the top of my head are
SimpleSymbolList, GappedSymbolList, SimpleSequence, GappedSequence and
SimpleAlignment, and the list will grow.
So when people who don't work with SimpleSequences but with, say,
GappedSequences or SimpleAlignments need the same type of conversion
functionality, you are in for a problem. One option is to add yet more
to the "core" Alignment interface, forcing every alignment
implementation to contain lots of conversion code between different
types of SymbolLists. This is also nasty because you need to use a
different API for different types of SymbolLists (you need to call a
different iterator method that gives you a GappedSequenceIterator or a
SimpleAlignmentIterator), so it becomes much harder to switch from one
type of SymbolList to another.
Alternatively, you can refuse to have any more SequenceIterator code in
the "core" Alignment interface, but then that type of code will have to
go somewhere else, and in the end you have code doing basically the same
things spread over different locations in the source tree, making it
harder to maintain and keep track of.
To summarise the whole rant:
1) I do think it's appropriate to add a method to the Alignment
interface that gives you an iterator you can use to iterate over the
SymbolLists in the alignment.
2) I don't think it's appropriate to add a SequenceIterator to the
"core" Alignment interface, as conversion/construction functionality
doesn't really belong in the simple "core" interfaces. I wouldn't think
that adding toSequence() to the SymbolList interface would be
appropriate either, or adding toGappedSequence() to the Sequence
interface.
I understand that you find the SimpleSequence construction functionality
nice, so what I would suggest is something like this:
1) Change the sequenceIterator method into a symbolListIterator method.
2) Then just have your implementations of the Alignment interface
implement the sequenceIterator method, or add the sequenceIterator
method to some of the other, "more advanced" alignment interfaces. Just
don't put it in the "core" Alignment interface.
Or you could make something like SimpleSequenceAlignment that extends
Alignment and contains the extra functionality you want.
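
A rough sketch of that layering, to make the suggestion concrete. All
of the types below (SymList, NamedSeq, CoreAlignment, SequenceAlignment)
are hypothetical stand-ins rather than real BioJava types; only the
method names symbolListForLabel, symbolListIterator and sequenceIterator
and the class name SimpleSequenceAlignment come from this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for SymbolList.
interface SymList { String seqString(); }

// Hypothetical stand-in for a Sequence: a symbol list that also has a name.
final class NamedSeq implements SymList {
    private final String name;
    private final String symbols;
    NamedSeq(String name, String symbols) { this.name = name; this.symbols = symbols; }
    String getName() { return name; }
    public String seqString() { return symbols; }
}

// The "core" interface: only what every alignment can provide.
interface CoreAlignment {
    List<Object> getLabels();
    SymList symbolListForLabel(Object label);
    Iterator<SymList> symbolListIterator();
}

// Opt-in extension for alignments that really do hold sequences.
interface SequenceAlignment extends CoreAlignment {
    Iterator<NamedSeq> sequenceIterator();
}

class SimpleSequenceAlignment implements SequenceAlignment {
    private final List<NamedSeq> rows = new ArrayList<>();
    void add(NamedSeq seq) { rows.add(seq); }

    public List<Object> getLabels() {
        List<Object> labels = new ArrayList<>();
        for (NamedSeq s : rows) labels.add(s.getName());
        return labels;
    }
    public SymList symbolListForLabel(Object label) {
        for (NamedSeq s : rows) if (s.getName().equals(label)) return s;
        throw new NoSuchElementException("No row labelled " + label);
    }
    public Iterator<SymList> symbolListIterator() {
        // Copy so callers see the rows typed as plain symbol lists.
        return new ArrayList<SymList>(rows).iterator();
    }
    public Iterator<NamedSeq> sequenceIterator() {
        return Collections.unmodifiableList(rows).iterator();
    }
}
```

Code that only needs SymbolLists is written against CoreAlignment and
works with every implementation; code that needs sequences asks for a
SequenceAlignment, so the compiler rules out alignments that cannot
supply them.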
> I think that the AlignmentSequenceIterator can only use public functions from the Alignment interface. This ensures that it will work for any implementation. We can replace the sequenceIterator implementation in individual Alignment implementations to return the underlying Sequences. I'll look into that.
>
>
>The reason that the new method returns a SequenceIterator is because that is an already existing commonly used iterator. There are other methods (usually symbolListForLabel()) already in the interface if you use SymbolLists that don't readily translate into Sequences.
>
Yes, but by adding the sequenceIterator() method to the basic Alignment
interface, EVERYONE writing an Alignment class needs to implement this
method, even if the SymbolLists it contains don't readily translate into
Sequences. They will then either return some strange types of sequences,
or be forced to throw an Exception or return nothing at all. To me, that
signals that the method doesn't really belong in the simple "core"
Alignment interface, which should only contain the absolute smallest set
of methods needed to define the core capabilities of an Alignment. All
those "core" methods should be able to return valid results from all
types of Alignments.
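
To illustrate the failure mode, here is a small sketch under assumed
names. FatAlignment, AlignmentOfAlignments and Caller are hypothetical
and do not exist in BioJava; sequences are represented as Strings.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical "core" interface that includes sequenceIterator().
interface FatAlignment {
    List<String> getLabels();
    Iterator<String> sequenceIterator(); // every implementation must provide this
}

// An alignment whose rows are themselves alignments: there is no sensible
// sequence to hand out for each row.
class AlignmentOfAlignments implements FatAlignment {
    public List<String> getLabels() { return Arrays.asList("block1", "block2"); }
    public Iterator<String> sequenceIterator() {
        // The interface forces a method that cannot be honoured.
        throw new UnsupportedOperationException("rows are alignments, not sequences");
    }
}

class Caller {
    // Code written against the interface only finds out at run time
    // whether the call is actually supported.
    static boolean canIterateSequences(FatAlignment aln) {
        try {
            aln.sequenceIterator();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

The class compiles and satisfies the interface, yet one of its "core"
methods can never return a valid result, which is exactly the signal
described above.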
>
>So, I'll look at making more specific implementations for sequenceIterator that return underlying sequences.
>
>Nimesh
>
>
>
> >
> >Here is the code for AlignmentSequenceIterator:
> >
> >public class AlignmentSequenceIterator implements SequenceIterator {
> >    private Alignment align;
> >    private Iterator labels;
> >    private SequenceFactory sf;
> >
> >    public AlignmentSequenceIterator(Alignment align) {
> >        this.align = align;
> >        labels = align.getLabels().iterator();
> >        sf = new SimpleSequenceFactory();
> >    }
> >
> >    public boolean hasNext() {
> >        return labels.hasNext();
> >    }
> >
> >    public Sequence nextSequence() throws NoSuchElementException, BioException {
> >        if (!hasNext()) {
> >            throw new NoSuchElementException("No more sequences in the alignment.");
> >        }
> >        else {
> >            try {
> >                Object label = labels.next();
> >                SymbolList symList = align.symbolListForLabel(label);
> >                Sequence seq = sf.createSequence(symList, label.toString(), label.toString(), null);
> >                return seq;
> >            } catch (Exception e) {
> >                throw new BioException(e, "Could not read sequence");
> >            }
> >        }
> >    }
> >}