[Bro-Dev] Changes in entropy computation code.
gc355804 at ohio.edu
Thu Oct 6 13:19:16 PDT 2011
Two conclusions here (referencing the following two lines of the patch):
- ent += prob[i] * rt_log2(1 / prob[i]);
+ ent += prob[i] * rt_log2(prob[i]);
(1) That the patched code would need to be:
ent -= prob[i] * rt_log2(prob[i]);
to be technically correct (unless the sign is flipped later and I just missed it :). That said, I don't really know that the negative sign in front of the entropy matters much.
(2) If the sign were corrected, the submitted patch would not change the result of the code, since log2(1 / p) = -log2(p) (i.e. the code in there at the moment is technically correct).
The patch *might* make the code's function a little more obvious, though; it took me a minute to recognize what was going on there, since the form for entropy that I was taught matches what's described in the patch.
Related: there is an interesting effect described in 2.1 of  (called "cancellation"; see the footnote): as the individual probabilities decrease, I'd imagine that the accuracy of our calculation would fall, since the entropy is accumulated incrementally and individual contributions can be vanishingly small. It might be good to try to find a similar alternative definition here as well.
 I've since found a reference here: http://astarte.csustan.edu/~tom/SFI-CSSS/info-theory/info-lec.pdf that explicitly defines entropy with the prob[i] * rt_log2(1 / prob[i]) formula.
 Assuming I'm understanding this effect correctly...
From: Robin Sommer [robin at icir.org]
Sent: Thursday, October 06, 2011 3:08 PM
To: Clark, Gilbert
Cc: bro-dev at bro-ids.org
Subject: Re: [Bro-Dev] Changes in entropy computation code.
On Wed, Oct 05, 2011 at 14:09 -0400, you wrote:
> ... and we should be good.
Gilbert, I'm not sure what your conclusion is?
Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org
ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org