[Biojava-l] Distributions over infinite Alphabets

Schreiber, Mark mark.schreiber at agresearch.co.nz
Thu Apr 17 23:19:15 EDT 2003

```Hi -

The Double Alphabet doesn't have the required method. It would be useful to be able to make a Distribution and specify that all Symbols in the ambiguity [0.0 to 1.0] have probability 0.3, etc. It would also be useful to be able to train the distribution.

Equally useful would be a method to construct, for example, a Gaussian distribution. This seems simpler, as you could just implement Distribution and make a factory or constructor that takes the mean and std dev and returns the appropriate value when asked to sample a Symbol. Getting the weight of a Symbol sensibly would still need the ambiguity Symbol.

I'm not too clear on how you would make such a Symbol as it technically contains an infinite number of AtomicSymbols.

- Mark

-----Original Message-----
From: Matthew Pocock [mailto:matthew_pocock at yahoo.co.uk]
Sent: Thu 17/04/2003 8:24 p.m.
To: Schreiber, Mark; biojava-l at biojava.org
Cc:
Subject: Re: [Biojava-l] Distributions over infinite Alphabets

The Distribution interface is a bit of a misnomer - I
guess what we wanted was the integral over a PDF, but
because we nearly always used discrete alphabets,
nobody cared.

So - the short answer is that you should be getting the
probability of an ambiguity symbol over [0.0 .. 1.0],
and the distribution impl should be integrating the
PDF over that range, e.g. a Gaussian or something.

Does DoubleAlphabet have methods to make these kinds
of ambiguities? If not we need to add it.

Matthew

--- "Schreiber, Mark"
<mark.schreiber at agresearch.co.nz> wrote:
> Hi -
>
> Currently you can make a Distribution over (for
> example) the Double alphabet and you can train it or
> assign a weight to a value (eg the probability of
> the 2.0 Symbol could be set to 0.5).
>
> Can anyone think of a way to represent a probability
> density as a Distribution? For example you may want
> to set the probability of seeing a value between 0
> and 1.0 to be 0.8. This is a bit tricky as there are
> an infinite number of values between 0 and 1.0 and
> the probability of getting exactly 0.237865765 would
> be infinitely small.
>
> Would this best be represented using something other
> than a Distribution?
>
> - Mark

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

```
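
The idea discussed in the thread, that the weight of an ambiguity over a range such as [0.0 .. 1.0] should be the integral of a PDF over that range, can be sketched in plain Java. This is only an illustration under stated assumptions: the class GaussianIntervalSketch and its methods density, weight and sample are hypothetical names, it does not implement or call any BioJava interface, and Simpson's rule is just one convenient way to approximate the integral.

```java
import java.util.Random;

/** Hypothetical sketch, not part of BioJava: a Gaussian whose "weight" is defined on intervals. */
final class GaussianIntervalSketch {
    private final double mean;
    private final double stdDev;
    private final Random rng = new Random();

    GaussianIntervalSketch(double mean, double stdDev) {
        if (!(stdDev > 0.0)) {
            throw new IllegalArgumentException("stdDev must be positive");
        }
        this.mean = mean;
        this.stdDev = stdDev;
    }

    /** Value of the probability density function at x (a density, not a probability). */
    double density(double x) {
        double z = (x - mean) / stdDev;
        return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2.0 * Math.PI));
    }

    /**
     * Approximate probability of seeing a value in [lo, hi], obtained by
     * integrating the density with composite Simpson's rule.
     * A fixed step count is used, so very wide intervals lose accuracy.
     */
    double weight(double lo, double hi) {
        if (hi < lo) {
            throw new IllegalArgumentException("hi must not be less than lo");
        }
        if (hi == lo) {
            return 0.0; // a single point has zero probability mass
        }
        int n = 1000; // number of sub-intervals, must be even
        double h = (hi - lo) / n;
        double sum = density(lo) + density(hi);
        for (int i = 1; i < n; i++) {
            // Simpson coefficients: 4 for odd interior points, 2 for even ones
            sum += density(lo + i * h) * ((i % 2 == 1) ? 4.0 : 2.0);
        }
        return sum * h / 3.0;
    }

    /** Draw one value from the Gaussian. */
    double sample() {
        return mean + stdDev * rng.nextGaussian();
    }

    public static void main(String[] args) {
        GaussianIntervalSketch g = new GaussianIntervalSketch(0.0, 1.0);
        System.out.println("weight of [0.0 .. 1.0]: " + g.weight(0.0, 1.0));
        System.out.println("weight of exactly 0.237865765: " + g.weight(0.237865765, 0.237865765));
        System.out.println("one sample: " + g.sample());
    }
}
```

In this sketch a single exact value always gets weight zero, which matches the observation in the original question, while any interval gets a finite weight. How such an interval would be represented as an ambiguity Symbol in DoubleAlphabet is the open question in the thread and is not addressed here.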