# [Bro-Dev] Changes in entropy computation code.

Gilbert Clark gc355804 at ohio.edu
Wed Oct 5 11:09:05 PDT 2011

```On 10/5/2011 10:54 AM, Rakesh Gopchandani wrote:
> Hi,
>
>
>     This is well outside of my expertise, but your change is in
>     opposition to how ENT does it.
>
>     -                       ent += prob[i] * rt_log2(1 / prob[i]);
>     +                       ent += prob[i] * rt_log2(prob[i]);
>
>     I just went back and verified and it looks like the original line
>     is how it's done.
>
>
> I checked it out. I think rt_log2(prob[i]) is the correct way to do
> this. It is the sum, over the entire alphabet, of the probability
> multiplied by the log of the probability:
> http://en.wikipedia.org/wiki/Entropy_%28information_theory%29#Definition
>

Unfortunately, e-mail is not the best medium for arguing stuff like
this.  Forgive the terrible formatting.

Let SUM(i, f(i)) be the sum of f(i) for all values of i that exist in
some data set (in our case, the array prob).

Base Definition: H(X) = -1 * SUM(i, prob[i] * rt_log2(prob[i]))

What we want to prove:

(1) H(X) = -1 * SUM(i, prob[i] * rt_log2(prob[i]))

is equivalent to

(2) H(X) = SUM(i, prob[i] * rt_log2(1 / prob[i]))

Assuming I remember how to do math, I believe we can get from (2) to (1)
with a little bit of algebraic manipulation (as follows):

SUM(i, prob[i] * rt_log2(1 / prob[i]))             // we're starting at (2)

SUM(i, prob[i] * (rt_log2(1) - rt_log2(prob[i])))  // rule of logarithms: log(1 / x) = log(1) - log(x)

SUM(i, prob[i] * -rt_log2(prob[i]))                // log(1) = 0

SUM(i, -1 * prob[i] * rt_log2(prob[i]))            // pull the -1 to the front to make the next step more obvious

-1 * SUM(i, prob[i] * rt_log2(prob[i]))            // pull the constant -1 out of the SUM, giving (1)

... and we should be good.
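/*
 * A quick numeric sanity check of the equivalence above. This is just a
 * sketch: rt_log2 is assumed here to be plain base-2 log (the stand-in
 * below is hypothetical, not Bro's actual implementation), and the
 * distribution is an arbitrary 4-symbol example.
 */
#include <math.h>
#include <stdio.h>

/* Hypothetical stand-in for rt_log2: log base 2. */
static double rt_log2(double x)
{
	return log(x) / log(2.0);
}

int main(void)
{
	/* Example distribution over a 4-symbol alphabet (sums to 1). */
	double prob[] = { 0.5, 0.25, 0.125, 0.125 };
	double ent1 = 0.0, ent2 = 0.0;
	int i;

	for ( i = 0; i < 4; i++ )
		{
		ent1 += prob[i] * rt_log2(1.0 / prob[i]); /* form (2) */
		ent2 += prob[i] * rt_log2(prob[i]);       /* inner sum of form (1) */
		}

	ent2 *= -1.0; /* apply the leading -1 from form (1) */

	/* Both should agree (1.75 bits for this distribution). */
	printf("%f %f\n", ent1, ent2);
	return 0;
}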

--Gilbert

```