<div dir="ltr"><div>I think this is a useful feature, but I&#39;m a bit unclear on the logarithmic counts. Take, for instance, SaDtTtT. If I&#39;m reading this correctly, that means 10-99 retransmissions from orig, followed by 10-99 from resp, then more retransmissions from orig (enough to reach a total of 100-999), and similarly more from resp. However, I could also interpret it as 10-99 from orig, 10-99 from resp, another 10-99 from orig, and another 10-99 from resp.<br><br></div><div>Another question: most of these are TCP-specific. Would checksum apply to UDP as well?<br></div><div><br></div>One downside of the logarithmic approach is that it makes the history hard to search, since searching for &#39;t.*t&#39; means one thing for small conns and another for large conns. As you say, if what I care about is the overall number compared to the number of packets, that feels more like a percentage. To me, it&#39;d seem more natural to use something like &quot;0t&quot; to mean &quot;of the total number of packets from the originator, 0-9% were retransmissions,&quot; &quot;1t&quot; to mean 10-19%, etc.<br><div><div><div><div><br></div><div>What I&#39;m left debating, though, is whether adding numerical data to history is the right approach. missed_bytes is a separate field, but it feels similar. If we did something like the log approach for that, we&#39;d lose exact counts, but we&#39;d gain granularity on the direction. 
Maybe we add the new letters, but don&#39;t repeat them, and also add new fields for exact byte counts?<br></div><div><br></div><div> --Vlad<br></div><div><br><br></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 15, 2018 at 6:51 PM, Vern Paxson <span dir="ltr">&lt;<a href="mailto:vern@corelight.com" target="_blank">vern@corelight.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">&gt; Here we will now have cases where some repetitions are logarithmic, and<br>

&gt; some (like for R) are not. I guess that makes sense, but I can see it<br>

&gt; potentially being confusing.<br>

<br>

</span>Yeah, I chewed on that too, but I don&#39;t see a better solution.  The semantics<br>

of repeated R are different, too (per the recent $history thread, it entails<br>

differing sequence numbers), so I think once that&#39;s the case, then it&#39;s<br>

not all that much more confusing if the significance of a repetition has<br>

different semantics too.<br>

<span class="HOEnZb"><font color="#888888"><br>

                Vern<br>

</font></span></blockquote></div><br></div>
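<div>To make the search concern in the reply above concrete, here is a minimal Python sketch comparing the two encodings. It is only an illustration, not the proposed implementation: the function names are hypothetical, the logarithmic rule follows the first (cumulative) reading described above, and clamping 100% into the top bucket is an assumption the thread does not settle.<br></div>

```python
def log_repeats(count: int) -> int:
    """How many times the letter appears under the cumulative reading:
    one letter for 10-99 events, two for 100-999, and so on.
    (Assumption: nothing is recorded below 10 events.)"""
    repeats = 0
    threshold = 10
    while count >= threshold:
        repeats += 1
        threshold *= 10
    return repeats


def percent_token(events: int, total_packets: int, letter: str = "t") -> str:
    """Digit-prefixed token for the percentage idea: '0t' for 0-9%,
    '1t' for 10-19%, etc. Hypothetical helper; returning an empty
    string when there are no events, and clamping 100% into bucket 9,
    are both assumptions."""
    if events <= 0 or total_packets <= 0:
        return ""
    bucket = min(9, (10 * events) // total_packets)
    return f"{bucket}{letter}"


# Two connections with the same number of retransmissions but very
# different sizes: the logarithmic form is identical, the percentage
# form is not.
small = ("t" * log_repeats(50), percent_token(50, 100))
large = ("t" * log_repeats(50), percent_token(50, 100000))
print(small, large)
```

<div>Under these assumptions, a single search pattern on the percentage token means the same thing regardless of connection size, at the cost of losing the absolute count.<br></div>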