[Bro-Dev] [JIRA] (BIT-348) Reassembler integer overflow issues. Data not delivered after 2GB

Jon Siwek (JIRA) jira at bro-tracker.atlassian.net
Wed Apr 30 10:15:07 PDT 2014

    [ https://bro-tracker.atlassian.net/browse/BIT-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303#comment-16303 ] 

Jon Siwek commented on BIT-348:

I did some more testing on a large trace, and I am seeing differences in the duration of a few connections. Have you seen that as well / do you have an idea where that's coming from? 

Here's an example:

before: 1359400833.646398       CHrUQS2wXshH59NEb6    XXXX 35752 YYYY    80  tcp     http    3.497443        380     428 SF   429     ShADaFfRR       8       712     4       172     (empty)
after:  1359400833.646398       C9yL7F1JnxhPtfoLMi    XXXX 35752 YYYY    80  tcp     http    0.240429        380     428 SF   429     ShADaFfRR       8       712     4       172     (empty)

This is tricky to explain, hang in there... here's basically what the lead up to the FIN exchange looks like:

responder: Seq=1, ACK=381 (everything looks fine up to and including this packet)
originator: Seq=381, ACK=430 (ack'd an unseen segment... still "fine", Bro can deal with that)
originator: Seq=381, ACK=430, FIN (still "fine")
responder: Seq=1, ACK=382, FIN (what?  It either backed down the sequence number or the peer's last ACK was wrong)

The sequence/ack number tracking in both the old code versus the new code behaves similarly: the responder's TCP reassembler will think that everything before 430 is "old" (in this case it's been reported to analyzers as Undelivered) because of the weird ACK, and that the last sequence seen is 1 (which is true, we only know about the SYN, but have not seen a packet carrying data for this endpoint).

The difference in the old code versus the new code that matters is in TCP_Reassembler::DataPending: https://github.com/bro/bro/compare/topic;jsiwek;bit-348#diff-a06b37d4ebafffdc8f9f39cca155b753R649

There's a new check that if the the last sequence seen is less than the last sequence that the reassembler is to treat as "old", then consider that case as "no data pending".  The pcap you sent me matches this case in the new code on certain calls and ends up returning a different value than the old code.  This matters in determining the connection duration because the connection will stop updating it's "last time" if both endpoints are treated as "closed with no pending data".  In the old code, the responder endpoint is prevented from satisfying the "no pending data" requirement and so always updates the "last time" on every packet.  In the new code, as soon as the FIN exchange is complete, both endpoints meet the "closed with no pending data" requirement and "last time" of the connection no longer updates with subsequent packets.

My justification for adding the additional check in TCP_Reassembler::DataPending was/is that the intention of that method is to tell whether there's any holes in the sequence space that could possibly be filled by a later TCP packet delivered out of order.  I didn't anticipate this particular scenario, but the logic still holds: it's never possibly to "deliver" any such data because the reassembler has moved on... if it gets stuff from the sequence space it considers "old", it will be dropped.  In that sense, there's "no data pending" and I think the new code is more "correct".

And the way in which duration is measured seems a bit finicky/arbitrary to begin with, not sure the difference for cases like this are important?

Also, just to note: the same connection but w/ sane seq/ack numbers would be handled the same way between code versions, and that way would produce results that are the same as what the new code produces.

> Reassembler integer overflow issues. Data not delivered after 2GB
> -----------------------------------------------------------------
>                 Key: BIT-348
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-348
>             Project: Bro Issue Tracker
>          Issue Type: Problem
>          Components: Bro
>    Affects Versions: git/master
>            Reporter: gregor
>            Assignee: Robin Sommer
>            Priority: High
>              Labels: inttypes
>             Fix For: 2.3
> {noformat}
> #!rst
> The TCP Reassembler does not deliver any data to analyzers after the first 2GB due to signed integer overflow (Actually it will deliver again between 4--6GB, etc.) This happens silently, i.e., without content_gap events or Undelivered calls. 
> This report superseded BIT-315, BIT-137
> The TCP Reassembler (and Reassem) base class use ``int`` to keep track of sequence numbers and ``seq_delta`` to check for differences. If a connection exceeds 2GB, the relative sequence numbers (int) used by the Reassembler become negative. While many parts of the Reassembler still work (because seq_delta still reports the correct difference) some parts do not. In particular ``seq_to_skip`` is broken (and fails silently). There might well be other parts of the Reassembler that fail 
> silently as well, that I haven't found yet. 
> See Comments in TCP_Reassembler.cc for more details. 
> The Reassembler should use int64. However this will require deep changes to the Reassembler and the TCP Analyzer and TCP_Endpoint classes (since we also store sequence numbers there). Also, the analyzer framework will need tweaks as well (e.g., Undelivered uses ``int`` for sequence numbers, also has to go to 64 bit)
> As a hotfix that seems to work I disabled the ``seq_to_skip`` features. It wasn't used by any analyzer or policy script (Note, that seq_to_skip is different from skip_deliveries). Hotfix is in 
> topic/gregor/reassembler-hotfix
> {noformat}

This message was sent by Atlassian JIRA

More information about the bro-dev mailing list