From dumortie at student.fsa.ucl.ac.be Mon May 3 02:36:05 1999
From: dumortie at student.fsa.ucl.ac.be (Alexandre Dumortier)
Date: Mon, 3 May 1999 11:36:05 +0200 (MET DST)
Subject: Virtual Memory exceeded
In-Reply-To: <199904300802.BAA15285@daffy.ee.lbl.gov>
Message-ID: 

> > We've got some trouble with bro...
> > After about 2 hours running bro (mt script), bro crash with a :
> > "Virtual Memory exceeded in 'new'" Error.
>
> How large a volume traffic stream are you monitoring?  (how many hosts,
> connections/sec, raw link speed)  What filter (bro -F) are you using?

# hosts: about 60
# connections/sec: no idea. A lot of HTTP connections
# raw link speed: 10Mb/s (ethernet-shared)

Bro runs with no filter specified (bro -i eth0 mt.bro)

We have 64 MB of RAM and 64 MB of swap.

The problem is that when Bro runs, the memory used by the application
never decreases (even when the traffic decreases, during the weekend for
example). Everything is OK with the size of the log files.

Another remark we have. During our monitoring of the network, we get
entries in bro.log:

	pm_getport unknown-1073741824 (timeout)

how could such a huge port number be used ?

Alexandre Dumortier
Patrick Verstraete
Universite catholique de Louvain, Belgium

From vern at ee.lbl.gov Tue May 4 01:14:48 1999
From: vern at ee.lbl.gov (Vern Paxson)
Date: Tue, 04 May 1999 01:14:48 PDT
Subject: Virtual Memory exceeded
In-Reply-To: Your message of Mon, 03 May 1999 11:36:05 PDT.
Message-ID: <199905040814.BAA28022@daffy.ee.lbl.gov>

> > > We've got some trouble with bro...
> > > After about 2 hours running bro (mt script), bro crash with a :
> > > "Virtual Memory exceeded in 'new'" Error.
> >
> > How large a volume traffic stream are you monitoring?  (how many hosts,
> > connections/sec, raw link speed)  What filter (bro -F) are you using?
>
> # hosts: about 60
> # connections/sec: no idea. A lot of HTTP connections
> # raw link speed: 10Mb/s (ethernet-shared)

That's not much load at all.
(Does it really run out of memory in 2 hours?  Later you discuss running
it over the weekend, which sounds like you run it a lot longer than
2 hours.)  However, I wonder if:

> Bro runs with no filter specified (bro -i eth0 mt.bro)

is tickling a memory leak somewhere, since I always run it with a filter
so it only captures the traffic it's interested in.  Try running with the
following filter:

	-F "(tcp[13] & 0x7 != 0) or tcp port telnet or tcp port finger or tcp port ftp or port 111"

and let me know if that does the trick.  If not, and if you're willing to
send me a trace file (you can make one using bro -w ), then I'll see if I
can find the problem.

> Another remark we have. During our monitoring of the network, we get
> entries in bro.log:
> pm_getport unknown-1073741824 (timeout)
> how could such a huge port number be used ?

That's a 32-bit portmapper port, not a 16-bit TCP/UDP port.  See /etc/rpc
(and Bro's portmapper.bro) for mappings from numbers to ports.

Vern

From dumortie at student.fsa.ucl.ac.be Thu May 6 13:12:34 1999
From: dumortie at student.fsa.ucl.ac.be (Alexandre Dumortier)
Date: Thu, 6 May 1999 22:12:34 +0200 (MET DST)
Subject: Bro always crashes
Message-ID: 

Thanks, Vern, for your help.

As suggested, we've tried:

	bro -i eth0 -w bro.dump -f "(tcp[13] & 0x7 != 0) or tcp port telnet or tcp port finger or tcp port ftp or port 111" ../policy/mt.bro >> bro.out 2>> bro.err

But unfortunately it didn't help.  The only difference is that it takes a
bit longer before crashing.

If you want, we can send you the compressed dump file (3 MB) in another
mail.
Alexandre Dumortier
Patrick Verstraete
UCL, Belgium

This is the 'ps' output taken every 10 minutes since Bro was started:

Wed May 5 11:50:01 CEST 1999
100100   0  9775     1  12   5   7800   7256             R N p0  0:20 bro -i et
Wed May 5 12:00:00 CEST 1999
100100   0  9775     1  10   5  16188  15644             R N p0  0:50 bro -i et
Wed May 5 12:10:00 CEST 1999
100100   0  9775     1  10   5  24244  23708             R N p0  1:20 bro -i et
Wed May 5 12:20:00 CEST 1999
100100   0  9775     1  10   5  32624  32088             R N ?   1:50 bro -i et
Wed May 5 12:30:00 CEST 1999
100100   0  9775     1  12   5  40348  39812             R N ?   2:19 bro -i et
Wed May 5 12:40:00 CEST 1999
100100   0  9775     1  12   5  46248  45712             R N ?   2:45 bro -i et
Wed May 5 12:50:00 CEST 1999
100100   0  9775     1  11   5  53040  52504             R N ?   3:10 bro -i et
Wed May 5 13:00:02 CEST 1999
100100   0  9775     1  13   5  59320  58784             R N ?   3:30 bro -i et
Wed May 5 13:10:04 CEST 1999
100100   0  9775     1   9   5  66356  58788             R N ?   3:51 bro -i et
Wed May 5 13:20:03 CEST 1999
100100   0  9775     1  13   5  72412  58052             R N ?   4:12 bro -i et
Wed May 5 13:30:08 CEST 1999
100100   0  9775     1  11   5  79700  58908 wait_on_pag D N ?   4:35 bro -i et
Wed May 5 13:40:09 CEST 1999
100100   0  9775     1  11   5  86904  58608 wait_on_pag D N ?   4:59 bro -i et
Wed May 5 13:50:11 CEST 1999
100100   0  9775     1   9   5  94572  58756 wait_on_pag D N ?   5:24 bro -i et

At this point, Bro crashed.

From vern at ee.lbl.gov Thu May 6 17:14:25 1999
From: vern at ee.lbl.gov (Vern Paxson)
Date: Thu, 06 May 1999 17:14:25 PDT
Subject: Bro always crashes
In-Reply-To: Your message of Thu, 06 May 1999 22:12:34 PDT.
Message-ID: <199905070014.RAA15327@daffy.ee.lbl.gov>

Thanks for sending the trace.  The problem is that either you have split
routing, in which the monitor isn't seeing both sides of most connections,
or the packet filter is dropping a whole lot of packets, so that
effectively the monitor again doesn't see both sides.  So Bro sees
patterns like:

	A.1234 -> B.80 SYN
	...
	A.1234 -> B.80 FIN

without seeing a SYN-ack from B.80 in between.
This then leads to Bro holding state for the half-established connection
after it sees A.1234 -> B.80.  That's arguably a bug; it should just flush
the connection after it sees the half-close.  The patch below makes it do
this, and then instead of requiring 100+ MB to process the file you sent
me, it needs about 20 MB.  Give it a try and let me know how well it works.

Vern

*** TCP.cc-	Thu May 6 16:49:13 1999
--- TCP.cc	Thu May 6 16:50:26 1999
***************
*** 1711,1718 ****
  		// connection has likely terminated.
  		if ( (orig->did_close && resp->did_close) ||
  		     (orig->state == TCP_RESET ||
! 		      resp->state == TCP_RESET) )
! 			{ // Either both closed, or one RST.
  			// The Timer has Ref()'d us and won't Unref()
  			// us until we return, so it's safe to have
  			// the session remove and Unref() us here.
--- 1711,1720 ----
  		// connection has likely terminated.
  		if ( (orig->did_close && resp->did_close) ||
  		     (orig->state == TCP_RESET ||
! 		      resp->state == TCP_RESET) ||
! 		     (orig->state == TCP_INACTIVE ||
! 		      resp->state == TCP_INACTIVE) )
! 			{ // Either both closed, or one RST, or half-opened.
  			// The Timer has Ref()'d us and won't Unref()
  			// us until we return, so it's safe to have
  			// the session remove and Unref() us here.

From vern at ee.lbl.gov Fri May 7 01:33:42 1999
From: vern at ee.lbl.gov (Vern Paxson)
Date: Fri, 07 May 1999 01:33:42 PDT
Subject: Bro always crashes
In-Reply-To: Your message of Thu, 06 May 1999 17:14:25 PDT.
Message-ID: <199905070833.BAA16515@daffy.ee.lbl.gov>

> without seeing a SYN-ack from B.80 in between.  This then leads to
> Bro holding state for the half-established connection after it sees
> A.1234 -> B.80.

I should add that I diagnosed this because the connection summaries Bro
generated on stdout looked like:

	925897359.600000 0.26 http ? 1775 199.108.25.84 130.104.28.234 SHR X

"SHR" indicates a half-established connection that was closed by the
responder.
(It's the responder in this case because the only packets Bro saw were the
SYN-ack [rather than the SYN] and the FIN.)  This is a highly unusual state
for normal traffic, i.e., when Bro sees both sides of the connections.

Vern

From dumortie at student.fsa.ucl.ac.be Fri May 14 07:23:17 1999
From: dumortie at student.fsa.ucl.ac.be (Alexandre Dumortier)
Date: Fri, 14 May 1999 16:23:17 +0200 (MET DST)
Subject: hf : bug & fix
Message-ID: 

Hi Vern and other Bro users,

We had some trouble with the Bro tool hf when resolving names in
150k-line logs.  The problem is that under Linux, when names are not
found in the DNS, gethostbyaddr() hangs.

We saw in the code that you (Vern) were already aware of this problem and
had already implemented a timeout mechanism (the -t time option).  But
even the -t option doesn't help, because of a bug in the program.

Indeed, unlike on BSD systems, signals under Linux are reset to their
default behavior when raised.  Reinstalling the signal handler inside the
interrupt routine doesn't work, because the longjmp() prevents the
interrupt from finishing properly.  The consequence is that the alarm in
hf only worked once.

The solution is to save not only the environment but also the signal
mask.  That way, once we've come back from the interrupt, the saved
signal mask is restored and SIGALRM can be delivered again.

So, we suggest patching hf.l as follows:

1) replace all "setjmp(alrmenv)" with "sigsetjmp(alrmenv,1)"
2) replace "longjmp(alrmenv,1)" with "siglongjmp(alrmenv,1)"

Some other small suggestions:

1) put alarm(0) just after doingdns=0
2) move alarm(tmo) & doingdns=1 after "if (sigsetjmp(alrmenv,1)) { }"

Best regards,

Alexandre Dumortier & Patrick Verstraete
Universite catholique de Louvain, Belgium

From leres at ee.lbl.gov Sun May 23 20:46:33 1999
From: leres at ee.lbl.gov (Craig Leres)
Date: Sun, 23 May 1999 20:46:33 PDT
Subject: hf : bug & fix
In-Reply-To: Your message of Fri, 14 May 1999 16:23:17 PDT.
Message-ID: <199905240346.UAA12849@hot.ee.lbl.gov>

> Indeed, unlike on BSD systems, signals under Linux are reset to their
> default behavior when raised.

I've put a new release of hf here:

	ftp://ftp.ee.lbl.gov/hf.tar.Z

It uses the setsignal() routine I wrote for tcpdump that addresses the
various behaviors of signal().  I believe under Linux it uses sigaction()
to set up a persistent signal handler.

My only Linux machine died mysteriously a few months ago and resisted
attempts to reinstall the OS, so I can't easily test this change.  If it
tests out OK, please let us know so Vern can include this version of hf
in the next Bro distribution.

Craig