[Biojava-l] adding toSequenceIterator method for Alignment
Keith James
kdj@sanger.ac.uk
31 Jul 2002 12:22:24 +0100
>>>>> "Kalle" == =?UTF-8?B?S2FsbGUgTsOkc2x1bmQ=?= <UTF-8> writes:
[...]
Kalle> The second part is the thing i dont like. To me, the core
Kalle> properties that defines what an Alignment is, doesnt
Kalle> contain the functionality of being able to make
Kalle> SimpleSequences out of SymbolLists. And should therefore
Kalle> not be included in the Alignment interface ( as that should
Kalle> only define the most basal basic things that all Alignments
Kalle> have in common ).
I agree with Kalle here. It also violates the iterator contract
because it doesn't actually iterate over the existing elements, but
creates new objects and returns those. If you iterate twice over the
same Alignment you would expect to get the same objects back the
second time that you did the first (barring addition/removal). This
doesn't happen here. You can no longer compare their references with
== between iterations, which you should be able to do. I think this
counts as a bug.
It also creates a nasty effect of continually wrapping every
SymbolList in another object each time it is iterated over and then
added to another Alignment.
[...]
Kalle> To summarise the whole rant.
Kalle> 1) I do think its appropriate to add a method to the
Kalle> Alignment interface that allows you to get hold of an
Kalle> iterator, that you can sure to iterate over the SymbolLists
Kalle> in the alignment.
Yes, that would be a valuable addition.
Kalle> 2) I dont think its appropriate to add a SequenceIterator
Kalle> to the "core" Alignment interface. As conversion/
Kalle> construction functionality doesnt realy belong in the
Kalle> simple "core" interfaces. I wouldnt think that adding
Kalle> toSequence() in the SymbolList interface would be
Kalle> appropriate either, or adding toGappedSequence in the
Kalle> Sequence interfacce.
Kalle> I understand that you find SimpleSequence construction
Kalle> functionality nice, so what i would suggest would be
Kalle> somthing like.
Kalle> 1) change the sequenceIterator method into a
Kalle> symbolListIterator method 2) then just have your
Kalle> implementations of the AlignmentInterface implement the
Kalle> sequenceIterator method or, add that sequenceIterator
Kalle> method to some of the other, "more advanced" alignment
Kalle> interfaces, just dont put it in the "core" Alignment
Kalle> interface. Or you could make something like
Kalle> SimpleSequenceAlignment that extends Alignment and contains
Kalle> the extra functionality you want.
I like 1) and the suggestion of a SequenceAlignment interface.
>> I think that the AlignmentSequenceIterator can only use public
>> functions from the Alignment interface. This ensures that it
>> will work for any implementation. We can replace the
>> sequenceIterator implementation in individual Alignment
>> implemenations to return the underlying Sequences. I'll look
>> into that.
>>
>> The reason that the new method returns a SequenceIterator is
>> because that is an already existing commonly used iterator.
>> There are other methods (usually symbolListForLabel()) already
>> in the interface if you use SymbolLists that don't readily
>> translate into Sequences.
It's commonly used, but not appropriate for this case for the reasons
Kalle has given. I think the suggested compromise is a pretty good
one.
Keith
--
- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -