[Bro-Dev] Broker raw throughput

Matthias Vallentin vallentin at icir.org
Wed Feb 24 08:10:50 PST 2016


I'm taking a discussion about Broker performance public which we
previously had on our internal mailing list. Below, I'm pasting in
Dominik's answer to my email. Dominik, we've white-listed your email
address on bro-dev, but feel free to subscribe there to get future
responses. Apologies for the inconvenience this may have caused on your
end.

While I'm at it: there's a small but important difference:

    gperf != gperftools

The former is the GNU profiler, whereas the latter is Google's
instrumentation framework. Unfortunately, the naming jungle out there is
convoluted.

    Matthias

> (Context for Dominik: we are measuring the maximum throughput CAF can
> deliver over a TCP connection and find that it performs an order of
> magnitude below 0mq.)
>
> I've attempted to reproduce the same benchmark with pure CAF, without
> Broker, and can confirm the same order-of-magnitude difference. In fact,
> the rates I observe with CAF are in the same range as Justin's
> measurements.

We can get one level deeper by using brokers (I mean caf::broker actors)
directly to get rid of serialization entirely. I don't think that would
change the performance much, but it makes analyzing profiler output a
bit easier by removing unrelated components and functions from the
graph. I'm going to write a small test program for CAF in the next
couple of days.
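
To make this concrete, here is a rough sketch of what such a
broker-level test could look like, written against the pre-0.15 CAF API
(spawn_io_server/spawn_io_client). The port, chunk sizes, and
termination logic are my own assumptions for illustration, not the
actual test program mentioned above:

// caf_raw_throughput.cc -- sketch of a serialization-free throughput
// test using caf::io brokers directly (CAF 0.14-era API).
#include <chrono>
#include <iostream>
#include <memory>
#include <vector>

#include <caf/all.hpp>
#include <caf/io/all.hpp>

using namespace caf;
using namespace caf::io;
using namespace std;
using namespace std::chrono;

constexpr size_t chunk_size = 64 * 1024;          // bytes per write
constexpr size_t total_bytes = 256 * 1024 * 1024; // payload volume

// Sink broker: counts raw bytes; no CAF serialization on the payload path.
behavior sink(broker* self) {
  auto received = make_shared<size_t>(0);
  auto start = make_shared<high_resolution_clock::time_point>();
  return {
    [=](const new_connection_msg& msg) {
      *start = high_resolution_clock::now();
      // Deliver data in chunks of up to chunk_size bytes.
      self->configure_read(msg.handle, receive_policy::at_most(chunk_size));
    },
    [=](const new_data_msg& msg) {
      *received += msg.buf.size();
      if (*received >= total_bytes) {
        auto secs = duration_cast<duration<double>>(
          high_resolution_clock::now() - *start).count();
        cout << (*received / secs) / (1024 * 1024) << " MB/s" << endl;
        self->quit(); // closes the connection, which stops the source
      }
    },
    [=](const connection_closed_msg&) {
      self->quit();
    }
  };
}

// Source broker: enqueues the whole payload up front (fine for a test)
// and lets the multiplexer drain the output buffer asynchronously.
behavior source(broker* self, connection_handle hdl) {
  vector<char> chunk(chunk_size);
  for (size_t sent = 0; sent < total_bytes; sent += chunk_size)
    self->write(hdl, chunk.size(), chunk.data());
  self->flush(hdl);
  return {
    [=](const connection_closed_msg&) { self->quit(); }
  };
}

int main() {
  spawn_io_server(sink, 6667);
  spawn_io_client(source, "127.0.0.1", 6667);
  await_all_actors_done();
  shutdown();
}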

> From looking at the plots, I suspect that the client is the bottleneck,
> which spends a third of its cycles in the function _sendto via
> caf::io::network::stream::handle_event. The multiplexer and BASP broker
> share the rest of CPU time.
>
> Dominik, does this make sense to you? Have you performed similar
> measurements in the past?

So far we have tested primarily in comparison to other messaging
frameworks for specific use cases, such as distributed computations with
CAF vs. MPI/Scala/Erlang. However, I ran a small benchmark comparing raw
throughput measured via iperf to a CAF setup a while ago and noticed
that receive performance was OK, but send performance was lacking. This
corresponds to your findings, but I don't remember it being a factor of
5-6 worse.

Thanks for sending me the gperf graphs. I will come back to you after
running a test series under Linux and digging through the code a bit.

    Dominik

----- Forwarded message from Matthias Vallentin <vallentin at icir.org> -----

From: Matthias Vallentin <vallentin at icir.org>
Sender: Matthias Vallentin <matthias at vallentin.net>
To: Dominik Charousset <dominik.charousset at haw-hamburg.de>
Cc: "Azoff, Justin S" <jazoff at illinois.edu>, bro-blue <bro-blue at bro.org>
Subject: Re: [Bro-Blue] Broker raw throughput
Date: Tue, 23 Feb 2016 17:23:09 -0800
User-Agent: Mutt/1.5.24 (2015-08-30)

(Context for Dominik: we are measuring the maximum throughput CAF can
deliver over a TCP connection and find that it performs an order of
magnitude below 0mq.)

I've attempted to reproduce the same benchmark with pure CAF, without
Broker, and can confirm the same order-of-magnitude difference. In fact,
the rates I observe with CAF are in the same range as Justin's
measurements. 

Then I tried to figure out two things: (i) whether the CAF scheduler
throughput setting makes a difference, and (ii) where the bottleneck is.
Regarding (i), I tested the client with values 10, 100, 1K, 10K, 100K,
and 1M. Up to 10K, there was no difference in the rate. Starting at 100K
the benchmark occasionally no longer terminated, and at 1M it never did.
I suspect that I simply overloaded CAF's I/O subsystem.

I've tried to figure out where CAF spends its time (because it's clearly
not I/O-bound), and produced two gperftools plots:

    http://www.icir.org/matthias/tmp/caf-client-freebsd.pdf
    http://www.icir.org/matthias/tmp/caf-server-freebsd.pdf
    http://www.icir.org/matthias/tmp/caf-client-mac.pdf
    http://www.icir.org/matthias/tmp/caf-server-mac.pdf

(There's an issue with the C++ standard library on Macs, which boils down
to a pthreads condition variable issue, so only the FreeBSD plots are
valuable. Still, I'm showing the Mac plots for completeness. I have run
into this issue many times before.)
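
For completeness, here is roughly how such a profile can be collected
with gperftools; this is standard gperftools usage, not necessarily the
exact invocation I used, and the file names are placeholders:

// profile_harness.cc -- minimal gperftools CPU-profiling harness (sketch).
// Build with:   c++ -O2 profile_harness.cc -lprofiler
// Render with:  pprof --pdf ./a.out caf-client.prof > caf-client.pdf
// Alternatively, link the benchmark itself against -lprofiler and set the
// CPUPROFILE environment variable to profile the whole run without any
// code changes.
#include <gperftools/profiler.h>

void run_benchmark() {
  // ... the send loop from the attached CAF client would go here ...
}

int main() {
  ProfilerStart("caf-client.prof"); // start sampling, write to this file
  run_benchmark();
  ProfilerStop();                   // flush samples and close the profile
}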

From looking at the plots, I suspect that the client is the bottleneck,
which spends a third of its cycles in the function _sendto via
caf::io::network::stream::handle_event. The multiplexer and BASP broker
share the rest of CPU time. 

Dominik, does this make sense to you? Have you performed similar
measurements in the past?

    Matthias

On Wed, Feb 03, 2016 at 06:56:22PM +0000, Azoff, Justin S wrote:
> Hi,
> 
> I've tried writing some simple programs using broker to see what kind of raw message rate it can do.
> 
> I have them in a fork of the broker repo:
> 
> https://github.com/JustinAzoff/broker/blob/master/tests/test_listen.cc
> 
> https://github.com/JustinAzoff/broker/blob/master/tests/test_send.cc
> 
> Starting test_listen and then test_send gives this output:
> 
> Received [150000] duration: 0.832057 rate: 60092/sec
> Received [200000] duration: 0.981312 rate: 50952.2/sec
> Received [250000] duration: 0.984688 rate: 50777.5/sec
> Received [300000] duration: 0.888988 rate: 56243.7/sec
> Received [350000] duration: 0.99591 rate: 50205.3/sec
> Received [400000] duration: 0.824133 rate: 60669.8/sec
> 
> I can't get it to go any faster. If I remove or decrease the second sleep in the sender, something breaks and nothing is received. If I start up 6+ senders, the listener seems to grind to a halt as well.
> 
> I ran a quick test using pyzmq (basically the example on https://learning-0mq-with-pyzmq.readthedocs.org/en/latest/pyzmq/patterns/pubsub.html) with a counter added, and it achieves:
> 
> 13200000 Received 'message 1771511' duration 1.3138718605 rate: 304443.69198
> 13600000 Received 'message 2172740' duration 1.26631116867 rate: 315878.126875
> 14000000 Received 'message 2575818' duration 1.24697780609 rate: 320775.556747
> 14400000 Received 'message 2979093' duration 1.32109212875 rate: 302779.792033
> 14800000 Received 'message 3423906' duration 1.69283390045 rate: 236290.164022
> 
> I know it's not apples-to-apples since broker is doing a bunch of serialization stuff, but I wouldn't expect the serialization code to add that much overhead.
> 
> 
> # Receiver: SUB socket that binds and counts incoming messages.
> import zmq
> import time
> 
> port = "5556"
> 
> context = zmq.Context()
> socket = context.socket(zmq.SUB)
> 
> socket.bind("tcp://*:%s" % port)
> 
> socket.setsockopt(zmq.SUBSCRIBE, "message")
> s = time.time()
> i = 0
> while True:
>     string = socket.recv()
>     i += 1
>     if i % 400000 == 0:
>         e = time.time()
>         print i, "Received", repr(string), "duration", e-s, "rate:", (400000/(e-s))
>         s = time.time()
> 
> # Sender: PUB socket that connects and publishes as fast as possible.
> import zmq
> 
> port = "5556"
> context = zmq.Context()
> socket = context.socket(zmq.PUB)
> socket.connect("tcp://localhost:%s" % port)
> 
> i = 0
> while True:
>     socket.send("message %d" % (i))
>     i += 1
> 
> -- 
> - Justin Azoff
> 
> 

#include <iostream>
#include <sstream>

#include <caf/all.hpp>
#include <caf/io/all.hpp>

using namespace caf;
using namespace caf::io;
using namespace std;

int main(int argc, char** argv) {
  // Set scheduler throughput from the command line; -1 wraps around to
  // the maximum value, i.e., effectively unlimited messages per run.
  size_t throughput = -1;
  if (argc > 1) {
    stringstream ss;
    ss << argv[1];
    ss >> throughput;
  }
  cerr << "using scheduler with throughput " << throughput << endl;
  set_scheduler(2, throughput);
  // Connect to the server and blast out 10M integer messages.
  auto server = remote_actor("127.0.0.1", 6666);
  cerr << "connected to 127.0.0.1:6666, blasting out data" << endl;
  scoped_actor self;
  self->monitor(server);
  for (auto i = 0; i < 10000000; ++i)
    self->send(server, i);
  // Wait until the server terminates, then shut down.
  self->receive(
    [&](down_msg const&) {
      cerr << "server terminated" << endl;
    }
  );
  self->await_all_other_actors_done();
}

#include <iostream>
#include <memory>
#include <sstream>

#include <caf/all.hpp>
#include <caf/io/all.hpp>

using namespace caf;
using namespace caf::io;
using namespace std;
using namespace std::chrono;

behavior server(event_based_actor* self, size_t n = 10) {
  auto counter = make_shared<int>();
  auto iterations = make_shared<int>(n);
  // Kick off the periodic rate measurement.
  self->send(self, *counter, high_resolution_clock::now());
  return {
    [=](int) {
      // One message from the client; just count it.
      ++*counter;
    },
    [=](int last, high_resolution_clock::time_point prev) {
      auto now = high_resolution_clock::now();
      auto secs = duration_cast<seconds>(now - prev);
      // Guard against a zero-length interval on the very first tick.
      auto rate = secs.count() > 0
                    ? (*counter - last) / static_cast<double>(secs.count())
                    : 0.0;
      cout << rate << endl;
      if (rate > 0 && --*iterations == 0) // Count only when we have data.
        self->quit();
      else
        self->delayed_send(self, seconds(1), *counter, now);
    }
  };
}

int main(int argc, char** argv) {
  // Set scheduler.
  auto iterations = size_t{0};
  if (argc > 1) {
    stringstream ss;
    ss << argv[1];
    ss >> iterations;
  }
  cerr << "spawning server for " << iterations << " iterations" << endl;
  // Run test code.
  scoped_actor self;
  auto s = self->spawn(server, iterations);
  publish(s, 6666, "127.0.0.1");
  self->await_all_other_actors_done();
}


----- End forwarded message -----