[Biojava-dev] Count.java, Distribution.java and Alignment.java
Matthew Pocock
matthew_pocock at yahoo.co.uk
Fri Feb 14 21:01:54 EST 2003
Lachlan Coin wrote:
> I just had a few comments about these interfaces, which would make them
> easier/more efficient for me to use.
Great
>
> It would be great if both these interfaces enforced a nonZeroSymbols()
> method which returned the set of symbols that have a non-zero count /
> probability respectively. Particularly if you are working with sparse
> counts over high dimensional cross-product alphabets, it seems pretty
> inefficient to iterate through all the members of a cross-product
> alphabet when only a small fraction of these have counts. This also
> relates to storage - it would be good to have a DistributionFactory that
> could create sparse distributions.
Sparsity would be a good thing. Feel free to write an implementation
that does this and come up with a method name, signature and return
type for nonZeroSymbols(). We can fold the implementation in behind the
factory interface so that people don't need to know about it.
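As a starting point, here is a minimal sketch of what a sparse store with a nonZeroSymbols() method could look like. It assumes a hash map keyed by symbol; the class name SparseCounts, the generic parameter and the method names are made up for illustration and are not the existing Count interface.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the BioJava Count interface: a sparse table of
// double-valued counts keyed by an arbitrary symbol type S.
class SparseCounts<S> {
    // Only symbols with a non-zero count are stored.
    private final Map<S, Double> counts = new HashMap<>();

    // Returns the stored count, or 0.0 for a symbol that has no entry.
    double getCount(S symbol) {
        Double c = counts.get(symbol);
        return c == null ? 0.0 : c;
    }

    // Adds delta to the count; drops the entry if the total returns to zero.
    void increaseCount(S symbol, double delta) {
        double updated = getCount(symbol) + delta;
        if (updated == 0.0) {
            counts.remove(symbol);
        } else {
            counts.put(symbol, updated);
        }
    }

    // Read-only view of the symbols that currently have a non-zero count.
    Set<S> nonZeroSymbols() {
        return Collections.unmodifiableSet(counts.keySet());
    }
}

class SparseCountsDemo {
    public static void main(String[] args) {
        SparseCounts<String> counts = new SparseCounts<>();
        counts.increaseCount("(a c)", 1.5);
        counts.increaseCount("(g t)", 0.25);
        // Visits two entries, however large the cross-product alphabet is.
        for (String s : counts.nonZeroSymbols()) {
            System.out.println(s + " " + counts.getCount(s));
        }
    }
}
```

Memory and iteration cost then scale with the number of observed symbols rather than with the size of the alphabet.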
>
> Also, this is more minor, but Count uses doubles rather than integers,
> which is certainly more flexible, but would seem to take more memory. Is
> this flexibility needed - isn't Distribution supposed to be for this?
It is used in the training of HMMs. During the forwards-backwards step
of parameter estimation, counts are added in proportion to the
probability that parameters are used. It is actually quite sensitive to
these numbers being correct - if they are rounded too much either way,
the models do very strange things (like fitting the data worse and worse
each iteration).
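To make the rounding problem concrete, here is a small illustration. The weights are made-up posterior probabilities, and the class and method names are hypothetical; this is not the BioJava training code.

```java
// Illustration only: accumulate fractional (expected) counts with and
// without rounding each contribution to an integer first.
class FractionalCountDemo {
    // Sums the weights as doubles, as a double-valued count would.
    static double accumulate(double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        return total;
    }

    // Sums the same weights after rounding each one to the nearest integer.
    static long accumulateRounded(double[] weights) {
        long total = 0;
        for (double w : weights) {
            total += Math.round(w);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up probabilities that one parameter was used at each position.
        double[] weights = {0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25};
        System.out.println(accumulate(weights));        // prints 2.0
        System.out.println(accumulateRounded(weights)); // prints 0
    }
}
```

With integer counts the parameter would look as if it had never been used, even though its expected usage is 2.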
>
>
> Finally, in Alignment.java, there are two methods, which use
> inconsistent container classes for the labels of the alignment.
>
> java.util.List getLabels()
>
> Alignment subAlignment(java.util.Set labels, Location loc)
>
> so that to get a subAlignment over all labels, you have to convert a List
> to a Set.
I think it's subAlignment that should be a List. Unfortunately, we can't
fix this right now as it would change the API while we're trying to get
the 1.3 release out of the door. Once 1.3 hits the street, feel free to
change it (leave the old method in with a @deprecated for a while).
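Roughly, the change could look like the sketch below. The interface here is a simplified, hypothetical stand-in (labels are Strings and the Location argument is reduced to two ints), and it uses a default method for brevity, so treat it as an outline of the deprecation pattern rather than the actual Alignment code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the real interface, with simplified types.
interface LabelledAlignment {
    List<String> getLabels();

    // New List-based form, consistent with getLabels().
    LabelledAlignment subAlignment(List<String> labels, int start, int end);

    /**
     * Old Set-based form, kept so existing callers still compile.
     * @deprecated use {@link #subAlignment(List, int, int)} instead.
     */
    @Deprecated
    default LabelledAlignment subAlignment(Set<String> labels, int start, int end) {
        // Delegates to the List-based form; order follows the set's iteration order.
        return subAlignment(new ArrayList<>(labels), start, end);
    }
}

// Toy implementation that only records the labels and range it was given.
class SimpleAlignment implements LabelledAlignment {
    private final List<String> labels;
    final int start;
    final int end;

    SimpleAlignment(List<String> labels, int start, int end) {
        this.labels = new ArrayList<>(labels);
        this.start = start;
        this.end = end;
    }

    public List<String> getLabels() {
        return labels;
    }

    public LabelledAlignment subAlignment(List<String> labels, int start, int end) {
        return new SimpleAlignment(labels, start, end);
    }
}
```

A sub-alignment over all labels then becomes a.subAlignment(a.getLabels(), start, end), with no List-to-Set conversion.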
Matthew
>
>
> Thanks,
>
> Lachlan
>
> -------------------------------------------------------------
> Lachlan Coin
> Wellcome Trust Sanger Institute Magdalene College
> Cambridge CB10 1SA Cambridge CB30AG
> Ph: +44 1223 494 820
> Fax: +44 1223 494 919
> ------------------------------------------------------------
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
>
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk