[Bro-Dev] Changes in entropy computation code.

Wed Oct 5 11:09:05 PDT 2011

On 10/5/2011 10:54 AM, Rakesh Gopchandani wrote:
> Hi,
>
>
>     This is well outside of my expertise, but your change is in
>     opposition to how ENT[1] does it.
>
>     -                       ent += prob[i] * rt_log2(1 / prob[i]);
>     +                       ent += prob[i] * rt_log2(prob[i]);
>
>     I just went back and verified and it looks like the original line
>     is how it's done.
>
>
> I checked it out. I think rt_log(prob[i]) is the correct way to do 
> this. It is the sum over entire alphabat, probability multiplied by 
> log of probablity: 
> http://en.wikipedia.org/wiki/Entropy_%28information_theory%29#Definition
>

Unfortunately, e-mail is not the best medium for arguing stuff like 
this.  Forgive the terrible formatting.

Let SUM(i, f(i)) be the sum of f(i) for all values of i that exist in 
some data set (in our case, the array prob).

Base Definition: H(X) = -1 * SUM(i, prob[i] * rt_log2(prob[i]))

What we want to prove:

(1) H(X) = -1 * SUM(i, prob[i] * rt_log2(prob[i]))

is equivalent to

(2) H(X) = SUM(i, prob[i] * rt_log2(1 / prob[i]))

Assuming I remember how to do math, I believe we can get from (2) to (1) 
with a little bit of algebraic manipulation (as follows):

SUM(i, prob[i] * rt_log2(1 / prob[i]))  // we're starting at (2)

SUM(i, prob[i] * (rt_log2(1) - rt_log2(prob[i])) )       // rule of 
logarithms: log(1 / x) = log(1) - log(x)

SUM(i, prob[i] * -rt_log2(prob[i]) )          // log(1) = 0

SUM(i, -1 * prob[i] * rt_log2(prob[i]) )                         // pull 
the -1 to the front to make the next step more obvious

-1 * SUM(i, prob[i] * rt_log2(prob[i]) )                   // Pull the 
constant -1 out of the SUM

... and we should be good.

--Gilbert

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20111005/86b3d3a4/attachment-0001.html