[Biojava-l] adding toSequenceIterator method for Alignment

Keith James kdj@sanger.ac.uk
31 Jul 2002 12:22:24 +0100


>>>>> "Kalle" == =?UTF-8?B?S2FsbGUgTsOkc2x1bmQ=?=  <UTF-8> writes:

[...]

    Kalle> The second part is the thing i dont like. To me, the core
    Kalle> properties that defines what an Alignment is, doesnt
    Kalle> contain the functionality of being able to make
    Kalle> SimpleSequences out of SymbolLists. And should therefore
    Kalle> not be included in the Alignment interface ( as that should
    Kalle> only define the most basal basic things that all Alignments
    Kalle> have in common ).

I agree with Kalle here. It also violates the iterator contract
because it doesn't actually iterate over the existing elements, but
creates new objects and returns those. If you iterate twice over the
same Alignment you would expect to get the same objects back the
second time that you did the first (barring addition/removal). This
doesn't happen here. You can no longer compare their references with
== between iterations, which you should be able to do. I think this
counts as a bug.

It also creates a nasty effect of continually wrapping every
SymbolList in another object each time it is iterated over and then
added to another Alignment.

[...]

   Kalle> To summarise the whole rant.

    Kalle> 1) I do think its appropriate to add a method to the
    Kalle> Alignment interface that allows you to get hold of an
    Kalle> iterator, that you can sure to iterate over the SymbolLists
    Kalle> in the alignment.

Yes, that would be a valuable addition.

    Kalle> 2) I dont think its appropriate to add a SequenceIterator
    Kalle> to the "core" Alignment interface. As conversion/
    Kalle> construction functionality doesnt realy belong in the
    Kalle> simple "core" interfaces. I wouldnt think that adding
    Kalle> toSequence() in the SymbolList interface would be
    Kalle> appropriate either, or adding toGappedSequence in the
    Kalle> Sequence interfacce.

    Kalle> I understand that you find SimpleSequence construction
    Kalle> functionality nice, so what i would suggest would be
    Kalle> somthing like.

    Kalle> 1) change the sequenceIterator method into a
    Kalle> symbolListIterator method 2) then just have your
    Kalle> implementations of the AlignmentInterface implement the
    Kalle> sequenceIterator method or, add that sequenceIterator
    Kalle> method to some of the other, "more advanced" alignment
    Kalle> interfaces, just dont put it in the "core" Alignment
    Kalle> interface.  Or you could make something like
    Kalle> SimpleSequenceAlignment that extends Alignment and contains
    Kalle> the extra functionality you want.

I like 1) and the suggestion of a SequenceAlignment interface.

    >> I think that the AlignmentSequenceIterator can only use public
    >> functions from the Alignment interface.  This ensures that it
    >> will work for any implementation.  We can replace the
    >> sequenceIterator implementation in individual Alignment
    >> implemenations to return the underlying Sequences.  I'll look
    >> into that.
    >> 
    >> The reason that the new method returns a SequenceIterator is
    >> because that is an already existing commonly used iterator.
    >> There are other methods (usually symbolListForLabel()) already
    >> in the interface if you use SymbolLists that don't readily
    >> translate into Sequences.

It's commonly used, but not appropriate for this case for the reasons
Kalle has given. I think the suggested compromise is a pretty good
one.

Keith

-- 

- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -