[Bro] time machine filesize issue

Gregor Maier gregor at icir.org
Thu Oct 28 14:42:05 PDT 2010


On 10/28/10 14:13 , Martin Holste wrote:
> I wanted to make my disk-bound queries faster, so I wanted the fewest
> files to search through for tm because it appears that every separate
> file makes the interval searches in pcapnav slower if you're
> requesting many packets.  I found that when setting filesize > 289g,
> tm creates a file per connection and trashes its working directory.
> So two questions: am I right in thinking it is faster to search
> through as few files as possible when using pcapnav?  And secondly,
> does anyone know why tm breaks when trying to create files larger than
> 289g?

I don't think that pcapnav speed is significantly influenced by
filesize. AFAIK pcapnav jumps to a random file offset, then reads
sequentially until it finds something that looks like a pcap
header. Then it checks the timestamp and either keeps reading
sequentially or jumps somewhere else until it finds the requested
timestamp.
If you have multiple files, then this is repeated for each file.
However, the TM knows which files cover which time periods, so it will
only access the files that it knows are candidates. So I would assume
that the lookup speed should be similar. I think that the specifics of
the query-result influence speed much more (e.g., is it only a single,
narrow time interval to search, or multiple small ones, or a few large
ones that cover almost the whole dataset).
Long story short: the number of files to search should not influence the
speed much.
If the number of files is huge, then the only thing I could imagine is
weird filesystem behavior when there are 1000s of files in one
directory.
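
To illustrate why the file count should matter little, here is a minimal
sketch of selecting candidate files by time period. This is not the TM's
actual code: the FileSpan record, the candidate_files function and the
file names are hypothetical, and I simply assume an index that stores the
first and last packet timestamp of each file.

```python
from typing import List, NamedTuple


class FileSpan(NamedTuple):
    """Hypothetical index entry: the time period one data file covers."""
    name: str
    first_ts: float  # timestamp of the first packet in the file
    last_ts: float   # timestamp of the last packet in the file


def candidate_files(index: List[FileSpan], start: float, end: float) -> List[str]:
    """Return only the files whose time period overlaps [start, end]."""
    return [f.name for f in index if f.first_ts <= end and f.last_ts >= start]


# Made-up example index with three files covering consecutive periods.
index = [
    FileSpan("file_0", 0.0, 99.9),
    FileSpan("file_1", 100.0, 199.9),
    FileSpan("file_2", 200.0, 299.9),
]

# A narrow query touches a single file, no matter how many files exist.
print(candidate_files(index, 150.0, 160.0))
```

With such an index, a narrow query opens one file whether the data is
split into three files or three hundred; only a query spanning most of
the dataset has to touch most of the files.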

OTOH, if the filesize is too large wrt the configured diskspace, the TM
will run into trouble. It deletes old files when writing more data (or
when creating a new data file, can't recall which of the two). So if the
data files are huge, this will introduce quite some variance in
diskspace usage.
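
A rough back-of-the-envelope sketch of that variance, assuming (my
assumption, not verified against the TM source) that one whole file is
deleted each time the disk budget is reached. The usage_swing function
and the numbers are made up for illustration.

```python
def usage_swing(disk_budget_gb: int, filesize_gb: int) -> tuple:
    """Approximate (low, high) disk usage if hitting the budget deletes one whole file."""
    # Deleting one file frees filesize_gb at once, so usage drops by that much.
    low = max(disk_budget_gb - filesize_gb, 0)
    return (low, disk_budget_gb)


# Hypothetical 1000 GB budget with small vs. huge data files.
print(usage_swing(1000, 10))   # small files: usage stays close to the budget
print(usage_swing(1000, 300))  # huge files: usage swings by nearly a third
```

If the filesize gets close to (or exceeds) the configured diskspace,
the low end of that range goes to zero, i.e. there is hardly room for
even one complete file.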

That said: the TM definitely should not trash its working directory.
Do I understand you correctly that you get a myriad of files in the
working directory? Do the files contain only a single packet (or a
handful of packets), possibly from different connections? How many
packets per file? Also, how does your filesize relate to the configured
disk-space?


cu
Gregor
-- 
Gregor Maier                                             gregor at icir.org
Int. Computer Science Institute (ICSI)          gregor at icsi.berkeley.edu
1947 Center St., Ste. 600                    http://www.icir.org/gregor/
Berkeley, CA 94704
USA


