[Bro] time machine filesize issue

Martin Holste mcholste at gmail.com
Thu Oct 28 15:40:06 PDT 2010


My performance issues were noticed when making a query over a large
time range with many packets involved.  Since there is no way to
specify a limit on the number of packets returned, the query takes
forever.  I was looking to improve that performance.  I will continue
to play around with this to see whether any speedup is worth the large
hit at file rollover.

With filesize set to exactly 280g (279g does not produce the problem)
and a disk setting of 1000g, tm creates one disk fifo file per evicted
packet in the workdir.  I am using only the single default class,
"all".
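
For concreteness, the relevant part of my config looks roughly like
this (reconstructed from memory, so the exact option names and
placement may not match a real tm.conf):

    class "all" {
        filter "";          # the single default class; captures everything
        disk 1000g;         # disk budget
        filesize 280g;      # per-file rollover size; 279g works, 280g breaks
    }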

On Thu, Oct 28, 2010 at 4:42 PM, Gregor Maier <gregor at icir.org> wrote:
> On 10/28/10 14:13, Martin Holste wrote:
>> I wanted to make my disk-bound queries faster, so I wanted the
>> fewest files for tm to search through, because it appears that every
>> separate file makes the interval searches in pcapnav slower if you're
>> requesting many packets.  I found that when setting filesize > 289g,
>> tm creates a file per connection and trashes its working directory.
>> So, two questions: am I right in thinking it is faster to search
>> through as few files as possible when using pcapnav?  And secondly,
>> does anyone know why tm breaks when trying to create files larger
>> than 289g?
>
> I don't think that pcapnav speed is significantly influenced by
> filesize. AFAIK pcapnav jumps to a random file offset, then reads
> sequentially until it finds something that looks like a pcap header.
> Then it checks the timestamp and reads sequentially or jumps somewhere
> else until it finds the requested timestamp.
> If you have multiple files, then this is repeated for each file.
> However, the TM knows which files cover which time periods, so it will
> only access the files that it knows are candidates. So I would assume
> that the lookup speed should be similar. I think that the specifics of
> the query result influence speed much more (e.g., is it only a single,
> narrow time interval to search, or multiple small ones, or a few large
> ones that cover almost the whole dataset?).
> Long story short: the number of files to search should not influence the
> speed much.
> If the number of files is huge, then the only thing I could imagine is
> weird filesystem behavior when there are 1000s of files in one
> directory...
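
The lookup strategy described above sounds like a binary search over
file offsets with a resync step after each jump.  For anyone following
along, here is my rough understanding of it as a self-contained Python
sketch -- this is not pcapnav's actual code or API, and the header
plausibility checks are my own assumptions:

    import struct

    PCAP_FILE_HDR = 24                 # standard libpcap global header size
    REC_HDR = struct.Struct("<IIII")   # ts_sec, ts_usec, incl_len, orig_len
    MAX_SNAPLEN = 262144

    def looks_like_record(buf, pos, ts_hint):
        # Heuristic check only; a real implementation would validate a
        # chain of consecutive headers, not just one.
        if pos + REC_HDR.size > len(buf):
            return False
        ts_sec, _, incl_len, orig_len = REC_HDR.unpack_from(buf, pos)
        sane_len = 0 < incl_len <= min(orig_len, MAX_SNAPLEN)
        sane_ts = abs(ts_sec - ts_hint) < 10 * 365 * 86400
        return sane_len and sane_ts

    def resync(f, offset, ts_hint, window=1 << 16):
        # Scan forward from an arbitrary offset until something that
        # looks like a pcap record header turns up.
        f.seek(offset)
        buf = f.read(window)
        for pos in range(len(buf) - REC_HDR.size):
            if looks_like_record(buf, pos, ts_hint):
                ts_sec, ts_usec, _, _ = REC_HDR.unpack_from(buf, pos)
                return offset + pos, ts_sec + ts_usec / 1e6
        return None

    def seek_to(f, target_ts, file_size, ts_hint):
        # Timestamps in a capture file are (near-)monotone, so binary
        # search over offsets works; stop once the remaining region is
        # small enough to read sequentially up to the target.
        lo, hi = PCAP_FILE_HDR, file_size
        while hi - lo > 1 << 20:
            hit = resync(f, (lo + hi) // 2, ts_hint)
            if hit is None:
                break
            off, ts = hit
            if ts < target_ts:
                lo = off
            else:
                hi = off
        return lo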
>
> OTOH, if the filesize is too large relative to the configured disk
> space, the TM will run into trouble. It will delete old files when
> writing more data (or when creating a new data file; I can't recall
> which of the two). So if the data files are huge, this will introduce
> quite some variance in disk space usage.
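
To put numbers on that variance with my settings:

    # Rough arithmetic, assuming the TM evicts whole data files (my
    # reading of the description above -- not verified in the source):
    disk_budget_gb = 1000
    filesize_gb = 280

    complete_files = disk_budget_gb // filesize_gb   # 3 full files = 840g
    after_eviction = disk_budget_gb - filesize_gb    # ~720g after one file is dropped
    # Usage can swing by 280g -- over a quarter of the budget -- per eviction.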
>
> That said: the TM definitely should not trash its working directory...
> Do I understand you correctly that you get a myriad of files in the
> working directory? Do the files contain only a single packet (or a
> handful), possibly from different connections? How many packets per
> file? Also, how does your filesize relate to the configured disk space?
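
For the record, this is roughly how I counted packets per file (the
workdir path is just a placeholder; assumes standard little-endian
pcap):

    import glob
    import struct

    for path in sorted(glob.glob("/var/tm/workdir/*")):
        with open(path, "rb") as f:
            f.seek(24)                # skip the pcap global header
            count = 0
            while True:
                hdr = f.read(16)      # per-record header
                if len(hdr) < 16:
                    break
                incl_len = struct.unpack("<IIII", hdr)[2]
                f.seek(incl_len, 1)   # skip the captured bytes
                count += 1
        print(path, count)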
>
>
> cu
> Gregor
> --
> Gregor Maier                                             gregor at icir.org
> Int. Computer Science Institute (ICSI)          gregor at icsi.berkeley.edu
> 1947 Center St., Ste. 600                    http://www.icir.org/gregor/
> Berkeley, CA 94704
> USA
>



