[Biojava-dev] bits of information

Francois Pepin frpepin at attglobal.net
Tue Jun 3 12:00:02 EDT 2003


I think that the name is misleading.

It's obviously a measure of information, but it gives back the entropy.

Just saying that something returns the information content is not quite
correct in this case, since what it actually returns is the entropy.

The documentation should definitely be updated to make that clear.

I think that adding the method in question would be a good idea as well.

Francois

-----Original Message-----
From: biojava-dev-bounces at biojava.org
[mailto:biojava-dev-bounces at biojava.org] On Behalf Of Lachlan Coin
Sent: 3 juin, 2003 10:40
To: Francois Pepin
Cc: biojava-dev at biojava.org; 'Schreiber, Mark'
Subject: RE: [Biojava-dev] bits of information


The definitions are formal, and we all agree with the definition of
entropy.

Shannon's first coding theorem tells us that
the entropy of an information source is equal to the minimum average
number of bits per symbol that must (and, in the limit, can) be used to
encode source outputs.  So, if I try to communicate to you (using a
binary uniquely decipherable code) the outcome of sampling from a source
X which has entropy H(X), then I must use at least H(X) bits per symbol
(if I am not to lose any information), and in the limit of transmitting
N -> infinity symbols, I can achieve an average of H(X) bits per symbol.

Thus, H(X) - the entropy - is a natural measure of the information
content of a distribution.  This is what the method is returning at the
moment.
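
To make the discussion concrete, here is a minimal sketch (in Python,
not the BioJava API itself) of the entropy H(X) = -sum p_i * log2(p_i)
being discussed; the example distributions are illustrative:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum(p * log2(p)), in bits per symbol.

    Terms with p == 0 contribute nothing (lim p->0 of p*log2(p) is 0),
    so they are skipped to avoid a math domain error.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over 4 symbols (e.g. DNA bases) needs the
# maximum 2 bits per symbol to encode.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0

# A skewed distribution is more predictable, so on average fewer bits
# per symbol suffice - matching the coding theorem above.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```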

Lachlan




More information about the biojava-dev mailing list