[Bro-Dev] Broker raw throughput

Matthias Vallentin vallentin at icir.org
Mon Mar 7 17:10:09 PST 2016


> I have created a ticket for further progress tracking / discussion [1]
> as this is clearly not a Bro/Broker problem. Thank you all for
> reporting this and all the input you have provided.

It's good to see that the new commit improves performance. But let me
take the Broker perspective again, where we measure throughput in
messages per second. Before the changes, we could blast around 80K
messages/sec through two remotely connected CAF nodes. After your
changes, I am measuring a peak rate of up to 190K messages/sec on my
FreeBSD box. That's more than double. Really cool! But the benchmark no
longer terminates and the server quickly stops receiving data, and I
would like to know why. Here is the modified actor-system code:

    // Client
    #include <iostream>

    #include "caf/all.hpp"
    #include "caf/io/all.hpp"

    using namespace caf;
    using namespace caf::io;
    using namespace std;

    int main(int argc, char** argv) {
      actor_system_config cfg{argc, argv};
      cfg.load<io::middleman>();
      actor_system system{cfg};
      auto server = system.middleman().remote_actor("127.0.0.1", 6666);
      cerr << "connected to 127.0.0.1:6666, blasting out data" << endl;
      scoped_actor self{system};
      self->monitor(server);
      // Blast out one million messages as fast as possible.
      for (auto i = 0; i < 1000000; ++i)
        self->send(server, i);
      // Wait for the server to terminate after it has printed its rates.
      self->receive(
        [&](down_msg const& msg) {
          cerr << "server terminated" << endl;
        }
      );
      self->await_all_other_actors_done();
    }

    // Server
    #include <chrono>
    #include <iostream>
    #include <memory>

    #include "caf/all.hpp"
    #include "caf/io/all.hpp"

    using namespace caf;
    using namespace caf::io;
    using namespace std;
    using namespace std::chrono;

    CAF_ALLOW_UNSAFE_MESSAGE_TYPE(high_resolution_clock::time_point)

    behavior server(event_based_actor* self, int n = 10) {
      auto counter = make_shared<int>();      // messages received so far
      auto iterations = make_shared<int>(n);  // one-second windows left to report
      // Kick off the periodic rate measurement.
      self->send(self, *counter, high_resolution_clock::now());
      return {
        [=](int i) {
          ++*counter;
        },
        [=](int last, high_resolution_clock::time_point prev) {
          auto now = high_resolution_clock::now();
          auto secs = duration_cast<seconds>(now - prev);
          auto rate = (*counter - last) / static_cast<double>(secs.count());
          cout << rate << endl;
          if (rate > 0 && --*iterations == 0) // Count only when we have data.
            self->quit();
          else
            self->delayed_send(self, seconds(1), *counter, now);
        }
      };
    }
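
The server's main is omitted above; a minimal sketch that matches the
invocation below just spawns the behavior and publishes it via the
middleman on port 6666 (spawn argument and publish call are written here
as I would expect them, not copied verbatim):

    // Sketch of the omitted server main: spawn the measurement actor and
    // publish it on port 6666 so remote_actor() on the client can connect.
    int main(int argc, char** argv) {
      actor_system_config cfg{argc, argv};
      cfg.load<io::middleman>();
      actor_system system{cfg};
      auto srv = system.spawn(server, 10); // report ten one-second windows
      system.middleman().publish(srv, 6666);
    }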

I invoke the server as follows:

  CPUPROFILE=caf-server.prof ./caf-server --caf#scheduler.scheduler-max-threads=4

And the client like this:

  CPUPROFILE=caf-client.prof ./caf-client --caf#scheduler.scheduler-max-threads=4 --caf#scheduler.max-throughput=10000

I've tried various parameters for the scheduler throughput, but they do
not seem to make a difference. Would you mind taking a look at what's
going on here? It looks like the "sender overload protection" you
mentioned is not working as expected.
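
As a sanity check on what I would expect the overload protection to
achieve, a crude manual form of back-pressure on the client side would
look roughly like the sketch below (just an illustration; the ack_atom
and the batch size are made up, and I have not benchmarked this): after
every batch, the client waits for an acknowledgment from the server
before it continues sending.

    // Sketch only: manual back-pressure via per-batch acknowledgments.
    using ack_atom = atom_constant<atom("ack")>;

    // Server side would need one extra handler that bounces the ack back:
    //   [=](ack_atom) { return ack_atom::value; }

    // Client side, replacing the plain send loop:
    constexpr auto batch_size = 10000;
    for (auto i = 0; i < 1000000; ++i) {
      self->send(server, i);
      if (i % batch_size == batch_size - 1) {
        self->send(server, ack_atom::value);
        self->receive([](ack_atom) { /* server caught up; keep sending */ });
      }
    }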

I'm also attaching new gperftools profiler output from the client and
the server. The server profile is not too telling, because the server
sat idle for a while until I started the client, hence the high CPU
share in nanosleep. Looking at the client, only 67.3% of the time is
spent in local_actor::resume, which would suggest that the runtime adds
the remaining 32.7% as overhead. That figure isn't accurate, though,
because gperftools cannot link the second tree on the right properly.
(When compiling with -O0 instead of -O3, it looks even worse.) Still,
why is intrusive_ptr::get consuming 27.9%?

Looking at the left tree, this workload appears to stress the
allocator heavily:

    - 20.4% tc_malloc_skip_new_handler
    - 7% std::vector::insert in the BASP broker
    - 13.3% CAF serialization (adding the two out-edges from
            basp::instance::write: 5.8 + 7.5)

Perhaps this helps you to see some more optimization opportunities.

Switching gears to your own performance measurements: it sounded like
you got gains on the order of 400% when comparing raw byte throughput
(as opposed to message throughput). Can you give us an intuition for
how that relates to the throughput measurements we have been doing?
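
For intuition (purely illustrative; I don't know the wire size of a
single serialized int message, so B below is an assumption):

    190,000 msg/sec * B bytes/msg = 190,000 * B bytes/sec
    e.g., B = 40 bytes  ->  ~7.6 MB/sec

With tiny payloads like ours, per-message overhead presumably dominates,
so I would not expect a 400% gain in raw bytes to show up as 400% more
messages per second.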

    Matthias
-------------- next part --------------
A non-text attachment was scrubbed...
Name: caf-client-freebsd.pdf
Type: application/pdf
Size: 16369 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160307/dc017798/attachment-0002.pdf 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: caf-server-freebsd.pdf
Type: application/pdf
Size: 18170 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160307/dc017798/attachment-0003.pdf 

