[Biojava-dev] bits of information

Lachlan Coin lc1 at sanger.ac.uk
Tue Jun 3 16:40:29 EDT 2003


The definitions are formal, and we all agree with the definition of
entropy.

Shannon's first coding theorem tells us that
the entropy of an information source is equal to the minimum average
number of bits per symbol that must (and can in the limit) be used to
encode source outputs.  So, if I try to communicate to you (using a binary
uniquely decipherable code) the outcome of sampling from a source X which
has entropy H(X), then I must use at least H(X) bits per symbol on average
(if I am not to lose any information), and in the limit of transmitting
N -> infinity symbols, I can achieve an average of H(X) bits per symbol.
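
For reference, the statement above can be written out as follows. This is a
sketch of the standard form of the result, assuming a memoryless source; the
symbol L_n (the length in bits of the codeword assigned to a block of n
source symbols by an optimal uniquely decipherable binary code) is my own
notation, not anything from the BioJava code.

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x),
\qquad
H(X) \;\le\; \frac{\mathbb{E}[L_n]}{n} \;<\; H(X) + \frac{1}{n}
```

The left inequality is the "must" part and the right one is the "can in the
limit" part: as n grows, the average number of bits per symbol is squeezed
down to H(X).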

Thus, H(X) - the entropy  - is a natural measure of the information
content of a distribution.  This is what the method is returning at the
moment.
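
To make the two quantities under discussion concrete, here is a minimal
standalone sketch in plain Java. It is not the BioJava implementation and
does not use the Distribution API; the class name EntropySketch and both
method names are made up for illustration. It computes the entropy H(X) in
bits and, separately, the quantity log2(alphabet size) - H(X) that Mark and
Francois call information.

```java
public class EntropySketch {

    // Shannon entropy in bits: H = -sum_i p_i * log2(p_i).
    // Zero weights are skipped, using the convention 0 * log(0) = 0.
    public static double entropyBits(double[] weights) {
        double h = 0.0;
        for (double w : weights) {
            if (w > 0.0) {
                h -= w * (Math.log(w) / Math.log(2.0));
            }
        }
        return h;
    }

    // Maximum entropy for this alphabet size minus the actual entropy,
    // i.e. log2(alphabet size) - H.
    public static double informationBits(double[] weights) {
        double maxEntropy = Math.log(weights.length) / Math.log(2.0);
        return maxEntropy - entropyBits(weights);
    }

    public static void main(String[] args) {
        // Weights from Mark's example: a = 0.97, c = g = t = 0.01.
        double[] skewed = {0.97, 0.01, 0.01, 0.01};
        System.out.println("entropy:     " + entropyBits(skewed));
        System.out.println("information: " + informationBits(skewed));
    }
}
```

For Mark's weights the first value comes out at about 0.2419 bits, matching
the figure he reported, and the second at about 1.7581 bits. A uniform
distribution over A, C, G, T gives 2 bits of entropy and 0 for the second
quantity.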

Lachlan

On Tue, 3 Jun 2003, Francois Pepin wrote:

> I disagree on that one. The definitions are pretty formal and not based
> on intuition.
>
> Your definition of information there is actually the definition of
> entropy. Information is indeed maximum entropy - current entropy.
>
> A distribution with 100% A has a 0 entropy and maximal information (you
> always know what you're going to hit). An all 25% distribution has
> maximal entropy and 0 information, as you don't know anything to help
> you decide what the next one would be.
>
> The method below does seem to be returning bits of entropy rather than
> information (although I haven't had the time to go through the code to
> be sure).
>
> Francois
>
> -----Original Message-----
> From: biojava-dev-bounces at biojava.org
> [mailto:biojava-dev-bounces at biojava.org] On Behalf Of Lachlan Coin
> Sent: 3 juin, 2003 07:44
> To: Schreiber, Mark
> Cc: biojava-dev at biojava.org
> Subject: Re: [Biojava-dev] bits of information
>
>
> Hi,
>
> I guess it all depends on your intuition about what information actually
> means, but sticking to standard definitions, the low bits of information
> reflects the fact that there is not much  uncertainty in this
> distribution.  If the distribution was 100% A, then  there would be no
> uncertainty, and bits of information should return 0.  On the other
> hand, information (or uncertainty) is maximised with 25% A,C,G,T.
>
> Lachlan
>
>
> On Sun, 1 Jun 2003, Schreiber, Mark wrote:
>
> > Hi -
> >
> > The bitsOfInformation() method from DistributionTools seems to be
> > returning only the average weighted entropy not the actual
> > information.
> >
> > Eg for a distribution made thus:
> >
> >       //set the weight of a to 0.97
> >       dist.setWeight(DNATools.a(), 0.97);
> >       //set the others to 0.01
> >       dist.setWeight(DNATools.c(), 0.01);
> >       dist.setWeight(DNATools.g(), 0.01);
> >       dist.setWeight(DNATools.t(), 0.01);
> >
> > The bits of information is calculated to be: 0.24194073285321088 bits
> >
> > This strikes me as a bit low (excuse the pun). Possibly there should
> > be a method called totalEntropy and bits of information should return
> > log2(alpha size) - totalEntropy.
> >
> > - Mark
> >
> >
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at biojava.org
> > http://biojava.org/mailman/listinfo/biojava-dev
> >
>
>

-------------------------------------------------------------
Lachlan Coin
Wellcome Trust Sanger Institute		Magdalene College
Cambridge  CB10 1SA			Cambridge CB30AG
Ph: +44 1223 494 820
Fax: +44 1223 494 919
------------------------------------------------------------


