[Biojava-l] Alignments

Matthew Pocock matthew_pocock@yahoo.co.uk
Wed, 15 May 2002 12:29:29 +0100


David Waring wrote:
> I presume that this is api-short-hand for inserting gaps before or after
> a particular sequence in the alignment? This would be functionally
> equivalent to using a gapped sequence to view the alignment component
> and adding gaps at 0 or at length.
> 
> This is not exactly the same as gaps before and after. Take, for instance, an
> alignment of a small clone and a larger reference sequence. The clone has a
> finite length shorter than the chromosome, so at the ends there is not a
> gap; it is just null. We could come up with a new Symbol to represent this,
> but I think null works.

Ah, I see your point. The algebraically correct thing to do (since the
whole symbol framework has algebraically correct ways to do things) is to
ensure that internal gaps are represented by alpha.getGapSymbol(), but
that leading/trailing gaps are represented by
Alphabet.EMPTY_ALPHABET.getGapSymbol(). The former represents a gap that
has the same shape as the symbols in the alphabet. For example, if the
sequence was codons, then the gap would be a basis symbol (-, -, -). For
the case of DNA, it would be (-). The gap obtained from EMPTY_ALPHABET
represents a dimensionless empty set of symbols; we could represent it
as () in our hokey notation. So, when you see alignments that use '~'
for leading/trailing gaps and '-' for internal gaps, they should resolve
to () and (-) respectively.
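To make the shape distinction concrete, here is a minimal sketch in plain Java. It deliberately avoids the real BioJava classes: Sym, GAP_DNA, GAP_CODON, EMPTY_GAP and resolve are hypothetical names, and a symbol is modelled simply as a tuple of components, so the internal DNA gap (-) has one component and the empty-alphabet gap () has none.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model (not the BioJava API): a symbol is a tuple of components,
// so the number of components is the "shape" of the symbol.
final class GapShapes {
    static final class Sym {
        final List<String> parts;

        Sym(List<String> parts) {
            this.parts = Collections.unmodifiableList(new ArrayList<>(parts));
        }

        int arity() { return parts.size(); }

        @Override
        public String toString() { return "(" + String.join(", ", parts) + ")"; }
    }

    // Internal gap for DNA: same shape as a DNA symbol, one component: (-).
    static final Sym GAP_DNA = new Sym(List.of("-"));
    // Internal gap for codons: same shape as a codon, three components: (-, -, -).
    static final Sym GAP_CODON = new Sym(List.of("-", "-", "-"));
    // Leading/trailing gap from an empty alphabet: no components at all: ().
    static final Sym EMPTY_GAP = new Sym(List.of());

    // Resolve one row of a DNA alignment: '~' -> (), '-' -> (-), base x -> (x).
    static List<Sym> resolve(String row) {
        List<Sym> out = new ArrayList<>();
        for (char c : row.toCharArray()) {
            if (c == '~') {
                out.add(EMPTY_GAP);
            } else if (c == '-') {
                out.add(GAP_DNA);
            } else {
                out.add(new Sym(List.of(String.valueOf(c))));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [(), (), (A), (C), (-), (G), (T), ()]
        System.out.println(resolve("~~AC-GT~"));
    }
}
```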

Returning null here will cause things like DP to fall over. Returning 
the empty gap will allow everything to work like a dream. Trust me, my 
PhD examiners did in the end ;-)
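The practical difference shows up in any inner loop that compares symbols column by column. The sketch below is a simplified column scorer, not BioJava's DP code; the sentinels EMPTY_GAP and INTERNAL_GAP and the scores (free end gaps, -2 for internal gaps, +1/-1 for match/mismatch) are illustrative assumptions. With a sentinel every lookup is defined and can even get its own score; with null the loop throws a NullPointerException.

```java
import java.util.Objects;

// Hypothetical scoring sketch (not BioJava's DP code). Columns outside the clone
// hold the EMPTY_GAP sentinel instead of null, so every lookup is well defined.
final class EndGapScoring {
    static final String EMPTY_GAP = "~";    // stands for the dimensionless gap ()
    static final String INTERNAL_GAP = "-"; // stands for the alphabet-shaped gap (-)

    // Illustrative scores: free end gaps, penalised internal gaps, +1/-1 for bases.
    static int score(String a, String b) {
        Objects.requireNonNull(a, "null symbol; use EMPTY_GAP for missing ends");
        Objects.requireNonNull(b, "null symbol; use EMPTY_GAP for missing ends");
        if (a.equals(EMPTY_GAP) || b.equals(EMPTY_GAP)) return 0;
        if (a.equals(INTERNAL_GAP) || b.equals(INTERNAL_GAP)) return -2;
        return a.equals(b) ? 1 : -1;
    }

    // Sum the column scores of two equal-length aligned rows.
    static int scoreColumns(String[] ref, String[] clone) {
        if (ref.length != clone.length) throw new IllegalArgumentException("rows differ in length");
        int total = 0;
        for (int i = 0; i < ref.length; i++) total += score(ref[i], clone[i]);
        return total;
    }

    public static void main(String[] args) {
        String[] ref = {"A", "C", "G", "T", "A"};
        String[] clone = {EMPTY_GAP, "C", "-", "T", EMPTY_GAP};
        System.out.println(scoreColumns(ref, clone)); // 0 + 1 - 2 + 1 + 0 = 0
        String[] broken = {null, "C", "-", "T", null};
        try {
            scoreColumns(ref, broken);
        } catch (NullPointerException e) {
            System.out.println("null ends: " + e.getMessage());
        }
    }
}
```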


> 
>> Could you explain what Qualitative means?
> 
> 
> Qualitative is defined in the biojava.bio.program.phred package. In the case of
> PhredSequence it represents the quality score given by Phred or Phrap. There
> is just one method, qualityAt().


I will check this out. Quality scores are the sort of thing that the
integer alphabet is meant to be used for, so I will see how the phred
API measures up against how Thomas and I had envisioned data being represented.
It is a corner of the library that I have never visited before.
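As a rough illustration of quality scores living in a bounded integer alphabet, here is a sketch. QualityScored is a hypothetical stand-in for the phred package's Qualitative interface: only the method name qualityAt() comes from this thread, and the 1-based int signature is an assumption. The error-probability helper uses the standard Phred definition Q = -10 log10(P).

```java
// Hypothetical stand-in for the phred package's Qualitative interface: the
// method name comes from the thread, but this signature is an assumption.
interface QualityScored {
    int qualityAt(int index); // 1-based position, as in BioJava symbol lists
}

// Quality scores treated as members of a bounded integer alphabet [min, max].
final class QualityList implements QualityScored {
    private final int[] scores;

    QualityList(int min, int max, int... scores) {
        for (int q : scores) {
            if (q < min || q > max) {
                throw new IllegalArgumentException(
                    "quality " + q + " is not in the alphabet [" + min + ", " + max + "]");
            }
        }
        this.scores = scores.clone();
    }

    @Override
    public int qualityAt(int index) {
        if (index < 1 || index > scores.length) {
            throw new IndexOutOfBoundsException("index " + index + " outside 1.." + scores.length);
        }
        return scores[index - 1];
    }

    int length() { return scores.length; }

    // Phred's definition Q = -10 log10(P), inverted: P = 10^(-Q/10).
    static double errorProbability(int q) {
        return Math.pow(10.0, -q / 10.0);
    }

    public static void main(String[] args) {
        QualityList q = new QualityList(0, 60, 20, 30, 40);
        for (int i = 1; i <= q.length(); i++) {
            System.out.println(i + ": Q" + q.qualityAt(i) + " P(error)=" + errorProbability(q.qualityAt(i)));
        }
    }
}
```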

> 
> What methods would the gapped interface contain? I would be happy to
> make GappedSymbolList an interface and add a SimpleGappedSymbolList.
> Perhaps that would make people mad. The idea of GappedSymbolList was
> that it wrapped another symbol list, adding the ability to view it with
> gaps. GappedSequence does the same for a Sequence instance, and takes
> care of projecting features from un-gapped to gapped coordinates. Both
> classes have (or should have) methods to fetch the underlying object
> being viewed. Perhaps the need for your gapped interface goes away if we
> have a generic 'View' interface, and code would walk down the decorating
> views until they hit one that has the functionality they want. Grr.
> Sometimes I don't like OOP.
> 
> 
> The gapped interface would contain the methods in GappedSymbolList. I see
> that we now have GappedSequence, which is what I am after. But we also have
> GappedPhredSequence, which, except for the capitalization of
> method names, could implement the gapped interface. I suspect that renaming
> GappedSymbolList would cause a bunch of headaches, so a different name for
> the interface might be in order.
> 

Again, I will take a look at GappedPhredSequence and see if it can't be 
refactored as a gapped view of a phred sequence. Do any phred users have 
views?
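A minimal sketch of the generic 'View' idea from earlier in the thread, under the assumption that every decorator exposes the object it wraps. View, Gapped, SimpleGappedView, ReadOnlyView and Views.find are hypothetical names; Gapped only borrows the flavour of GappedSymbolList's view/source coordinate mapping and is not its real API. Client code asks for a capability and the walker descends the chain until some layer provides it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "View" interface from the thread: each decorator exposes what it wraps.
interface View {
    Object getViewed();
}

// Hypothetical gapped capability, loosely modelled on GappedSymbolList's
// view/source coordinate mapping; the method names are assumptions.
interface Gapped {
    void addGapInView(int pos);     // insert a gap before view position pos (1-based)
    int locationInSource(int view); // source position, or -1 if that column is a gap
}

// A gapped view over an ungapped string (standing in for a SymbolList).
final class SimpleGappedView implements View, Gapped {
    private final String source;
    private final List<Integer> columns = new ArrayList<>(); // view column -> source pos or -1

    SimpleGappedView(String source) {
        this.source = source;
        for (int i = 1; i <= source.length(); i++) columns.add(i);
    }

    @Override public Object getViewed() { return source; }
    @Override public void addGapInView(int pos) { columns.add(pos - 1, -1); }
    @Override public int locationInSource(int view) { return columns.get(view - 1); }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int s : columns) sb.append(s < 0 ? '-' : source.charAt(s - 1));
        return sb.toString();
    }
}

// A decorator with no gapped behaviour of its own (hypothetical).
final class ReadOnlyView implements View {
    private final Object inner;
    ReadOnlyView(Object inner) { this.inner = inner; }
    @Override public Object getViewed() { return inner; }
}

final class Views {
    // Walk down the chain of decorating views until one has the wanted capability.
    static <T> T find(Object o, Class<T> wanted) {
        while (o != null) {
            if (wanted.isInstance(o)) return wanted.cast(o);
            o = (o instanceof View) ? ((View) o).getViewed() : null;
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleGappedView gapped = new SimpleGappedView("ACGT");
        gapped.addGapInView(3);
        Object outer = new ReadOnlyView(gapped);
        Gapped g = find(outer, Gapped.class);
        System.out.println(g + " " + g.locationInSource(4)); // AC-GT 3
    }
}
```

Under this arrangement a refactored GappedPhredSequence could be a gapped view whose getViewed() returns the phred sequence, and code that needs quality values would walk down to it rather than relying on a separate gapped interface.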

> I understand the forwarder stuff, but my real question is whether a listener
> should be given one changeEvent that encompasses all the predicted changes
> or let them all come as they will. In the latter case, one change to the
> alignment could give many changes to the listeners so it could trigger
> multiple redraws of a window, instead of a single redraw when all the
> changes were finished.


In Swing, repainting is batched so that multiple requests to repaint
a component will cause one repaint if they happen within a short space
of time. Could you give me an example of an event cascade, or of a case
where one method causes many changes?


> I guess I also have a question about the ChangeEvent itself. Is it supposed
> to hold all the information necessary for the listener to replicate the
> changes? In the simplest case the ChangeEvent can describe the changes
> that are going to be made, so a Listener could reflect those changes. But in
> the case of more complicated changes, it is impractical to try to explain all
> the changes to the listener. Rather, it seems to me, the listeners should
> be alerted that changes are being made; they can veto them if they wish, but
> after they receive the postChange, they should look back at the event source
> to see what has happened.



You can leave the change and previous properties null if you wish. This
means that some unknown modification will take place. In the case of
cascading changes, the new change event will refer back to the
underlying cause, so it doesn't need to have change or previous set. If you
do provide change or previous property values, then they should reflect
the change taking place.
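The sketch below models that contract with self-contained stand-ins loosely patterned on BioJava's ChangeEvent and ChangeListener; the classes, the "VALUE" type string and the "forbidden" veto rule are hypothetical. A simple edit fills in change and previous; a complicated edit leaves them null, and the listener then looks back at the event source. The chain field links a cascaded event to its underlying cause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Self-contained stand-ins loosely modelled on BioJava's ChangeEvent and
// ChangeListener; these are not the real classes.
final class ChangeSketch {
    static final class Event {
        final Object source;
        final String type;
        final Object change;   // null means "some unknown modification"
        final Object previous; // may also be null
        final Event chain;     // the underlying cause in a cascade, or null

        Event(Object source, String type, Object change, Object previous, Event chain) {
            this.source = source;
            this.type = type;
            this.change = change;
            this.previous = previous;
            this.chain = chain;
        }

        // Follow the cascade back to the event that started it.
        Event rootCause() {
            Event e = this;
            while (e.chain != null) e = e.chain;
            return e;
        }
    }

    static final class VetoException extends Exception {
        VetoException(String msg) { super(msg); }
    }

    interface Listener {
        void preChange(Event e) throws VetoException; // may veto the change
        void postChange(Event e);                     // the change has been made
    }

    static final class Model {
        private String value = "";
        private final List<Listener> listeners = new ArrayList<>();

        void addListener(Listener l) { listeners.add(l); }
        String getValue() { return value; }

        // Simple edit: the event describes the change fully.
        void setValue(String v, Event cause) throws VetoException {
            apply(new Event(this, "VALUE", v, value, cause), () -> value = v);
        }

        // Complicated edit: change and previous are left null.
        void transform(UnaryOperator<String> f, Event cause) throws VetoException {
            apply(new Event(this, "VALUE", null, null, cause), () -> value = f.apply(value));
        }

        private void apply(Event e, Runnable edit) throws VetoException {
            for (Listener l : listeners) l.preChange(e); // a veto aborts before the edit
            edit.run();
            for (Listener l : listeners) l.postChange(e);
        }
    }

    // Mirrors the model; when the event has no change, it looks back at the source.
    static final class Mirror implements Listener {
        String copy = "";

        @Override public void preChange(Event e) throws VetoException {
            if ("forbidden".equals(e.change)) throw new VetoException("vetoed");
        }

        @Override public void postChange(Event e) {
            copy = (e.change != null) ? (String) e.change : ((Model) e.source).getValue();
        }
    }

    public static void main(String[] args) throws VetoException {
        Model m = new Model();
        Mirror mirror = new Mirror();
        m.addListener(mirror);
        m.setValue("acgt", null);
        m.transform(String::toUpperCase, new Event("upstream", "EDIT", null, null, null));
        System.out.println(mirror.copy); // ACGT
    }
}
```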

The new changeable stuff actually formalises this further so that the
combination of a ChangeEvent and an instance of PropertyChanger is all
that is needed to make the modification. This, combined with explicit
publication of listeners and forwarders, allows a utility method to
compute what Thomas called a Markov-Blanket: a finite state machine
graph of the event cascade and listeners that would be generated. This
could be used to ensure that listeners are notified of updates the minimum
necessary number of times, as well as to work out dynamic units for
transactions.
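Part of that computation is graph reachability over the published forwarding edges. Here is a hypothetical sketch (Cascade, forward, listen and the string object names are all invented; it is not the branch code and does not model the full finite state machine): it finds every object a change can reach and reports each listener once, even when several paths lead to it.

```java
import java.util.*;

// Hypothetical sketch: if every object publishes whom it forwards events to and
// who listens to it, the cascade from one change is a graph reachability problem.
final class Cascade {
    private final Map<String, List<String>> forwardsTo = new HashMap<>();
    private final Map<String, List<String>> listenersOf = new HashMap<>();

    void forward(String from, String to) {
        forwardsTo.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    void listen(String source, String listener) {
        listenersOf.computeIfAbsent(source, k -> new ArrayList<>()).add(listener);
    }

    // Objects reachable from origin via forwarders, in breadth-first order.
    // The visited set also makes cycles in the forwarding graph harmless.
    Set<String> affected(String origin) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(origin);
        queue.add(origin);
        while (!queue.isEmpty()) {
            String s = queue.poll();
            for (String t : forwardsTo.getOrDefault(s, List.of())) {
                if (seen.add(t)) queue.add(t);
            }
        }
        return seen;
    }

    // Every listener reached by the cascade, each reported exactly once.
    Set<String> listenersToNotify(String origin) {
        Set<String> out = new LinkedHashSet<>();
        for (String s : affected(origin)) {
            out.addAll(listenersOf.getOrDefault(s, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        Cascade c = new Cascade();
        c.forward("seq1", "alignment");
        c.forward("seq2", "alignment");
        c.listen("seq1", "panel");
        c.listen("alignment", "panel");
        c.listen("alignment", "logger");
        System.out.println(c.listenersToNotify("seq1")); // [panel, logger]
    }
}
```

The affected set is one plausible starting point for the dynamic transaction units mentioned above, since it names every object whose state a single change could touch.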

But until I get a shiny new PC to design this on, it will sit on a
branch gathering dust (my laptop screen is so small you can't see a
sensible amount of code at any one time).

Matthew