From robin at corelight.com Wed Jul 1 01:59:03 2020
From: robin at corelight.com (Robin Sommer)
Date: Wed, 1 Jul 2020 08:59:03 +0000
Subject: [Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival behavior)
In-Reply-To:
References:
Message-ID: <20200701085903.GI33767@corelight.com>

On Tue, Jun 30, 2020 at 01:39 -0700, Jon Siwek wrote:

> * https://github.com/zeek/zeek/wiki/Zeek-Supervisor-Log-Handling

This overall sounds good to me. Some notes & questions:

> Log Rotation
>
> To help bridge/replace Step (4) and (5), suggest adding a new option:
> Log::default_rotation_dir. The Log::rotation_format_func() will use
> this as part of its default return value.

Seems we should then set this to "." by default, and have the cluster framework override it.

> The log_mgr will attempt to create necessary dirs just-in-time,
> failing to do so emits an error, but otherwise continues with rotation
> using working directory instead.

I'd extend this to any error case: if moving from the current location to Log::default_rotation_dir fails (e.g., because the latter is on a different file system), continue with the new name inside the current working directory (and report the error).

Once moved, I suppose we would continue to optionally run a post-processor, right? For a supervised cluster, we wouldn't use that and suggest that people go with "zeek-archiver" instead; but with ZeekControl we'd keep the current gzipping behavior so that we don't break any setups. We can implement that distinction through the post-processor function: the new default function would just do the rename according to the new scheme, and a separate legacy function for ZeekControl would spawn the "archive-log" script.

> zeek-archiver

I like making this a standard tool, but it seems like something we could postpone doing right now and prioritize getting the Zeek-side infrastructure in place.

> We can potentially have the Zeek Supervisor process configurable to
> auto-start and keep a zeek-archiver child alive.

I'd say that's a job for systemd (or whatever service manager). I know Seth disagrees. :-)

> Leftover Log Rotation
>
> The rotation for such a leftover log file uses the metadata in the
> shadow file to help it go through the exact rotation that should
> have occurred, including running the postprocessor function.

Not sure it's worth retaining the information about the post-processor function, and it could potentially lead to trouble if the function changed somehow in between (or disappeared). We could instead just run the leftovers through whatever the restarted config says to do with files.

Do we even need any other metadata at all in the new scheme? I'm wondering if we could simplify this all to: "If at open() time, X.log exists, first rotate it away through the currently configured postprocessor function". If we did that, we should probably have a global boolean that allows choosing between that and just overwriting existing files. The latter would be the default to retain current command-line behavior, and the cluster framework would enable leftover recovery.

Hmm, actually, there's a piece of metadata that we'll need: the opening timestamp, so that one can incorporate that into the name of the rotated file (assuming we want to retain that capability). Unless we parsed that out of the X.log itself ...

Robin

-- 
Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com

From robin at corelight.com Wed Jul 1 02:00:38 2020
From: robin at corelight.com (Robin Sommer)
Date: Wed, 1 Jul 2020 09:00:38 +0000
Subject: [Zeek-Dev] Supervisor client (Re: Zeek Supervisor: designing client and log archival behavior)
In-Reply-To:
References:
Message-ID: <20200701090038.GJ33767@corelight.com>

> * https://github.com/zeek/zeek/wiki/Zeek-Supervisor-Client

Some thoughts on the commands:

> $ zeekc status [all | ]
> Do we need to include any other metrics in the returned status?

That information is mostly static; it would be nice to get some dynamic information in there as well, like uptime and CPU/memory/traffic stats. No need to have that right away, but worth keeping in mind.

> # Do we need more categories to filter by (e.g. node type) ?

I'd skip that for now.

> # If there's downed nodes at this point, what do we expect users to do?
> # Check the standard services logs for stderr/stdout info? Check reporter.log ?

Yeah, it would be cool if zeekc had access to the stderr/stdout from the nodes through their supervisors. The supervisors could buffer that for a while and return it on request. More generally, the supervisor could get a "diagnostics buffer" that, over time, we could use for more stuff like storing backtraces etc. "reporter.log" is out, I'd say; that will go through the normal log rotation & archival, and be accessible that way.

> # A `zeekc diag` command could help gather information, like ask Zeek supervisor
> # to find core dumps and extract stack trace. Would it do more than that, like
> # show last N lines of downed nodes' stderr, or last N lines of reporter.log?

> $ zeekc check

I'm wondering which supervisor that would be talking to in a multi-system setup? All?

> $ zeekc terminate
> ...
> # Normally wouldn't terminate the supervisor if a service-manager is handling
> # the Zeek supervisor process itself and will just restart it, but `terminate`
> # would be helpful for anyone running a supervised Zeek cluster "manually".

Another use case: if for some reason one wants to restart the supervisor itself, "terminate" would kill it and the service manager would then restart it.

Robin

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From robin at corelight.com Wed Jul 1 02:02:08 2020
From: robin at corelight.com (Robin Sommer)
Date: Wed, 1 Jul 2020 09:02:08 +0000
Subject: [Zeek-Dev] Zeek Supervisor Command-Line Client
In-Reply-To:
References: <20200618071141.GH9200@corelight.com> <20200619083810.GE49063@corelight.com>
Message-ID: <20200701090208.GK33767@corelight.com>

On Tue, Jun 30, 2020 at 14:29 -0700, Jon Siwek wrote:

> Maybe the important observation is that the logic can be performed
> anywhere that has access to the Zeek-Supervisor process.

Agree.

> So where we put the logic at this point may not be important. If we
> can find a single-best-place for the logic to live, that's great

I believe that's what Seth is arguing for: have a Zeek-side script be the single point of that logic, rather than implement it multiple times and/or outside of Zeek. I can see doing that in Zeek, but I think there's a trade-off here: if we want to do the single-place approach with a multi-system setup, we'd need an authoritative place to run this logic and hence depend on *that* Zeek supervisor being up and running for performing the operation.
That may be a reasonable assumption (say, if we dedicated the supervisor running the manager to also be the cluster coordinator), but it's different from a world where the client can execute higher-level operations on its own.

Robin

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From jsiwek at corelight.com Wed Jul 1 14:03:52 2020
From: jsiwek at corelight.com (Jon Siwek)
Date: Wed, 1 Jul 2020 14:03:52 -0700
Subject: [Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival behavior)
In-Reply-To: <20200701085903.GI33767@corelight.com>
References: <20200701085903.GI33767@corelight.com>
Message-ID:

On Wed, Jul 1, 2020 at 1:59 AM Robin Sommer wrote:
>
> > Log::default_rotation_dir
>
> Seems we should then set this to "." by default, and have the cluster
> framework override it.

Yes, exactly.

> Once moved, I suppose we would continue to optionally run a
> post-processor, right? For a supervised cluster, we wouldn't use that
> and suggest that people go with "zeek-archiver" instead; but with
> ZeekControl we'd keep the current gzipping behavior so
> that we don't break any setups.

Yes, with the proposed changes, custom postprocessors still work the same as before and everything is backwards compatible / equivalent in non-supervised mode. Supervised mode just picks some different default settings from non-supervised mode:

* don't use a postprocessing script (archive-log)
* rotate into a `Log::default_rotation_dir` of "log-queue" instead of "."

> Not sure it's worth retaining the information about the post-processor
> function, and it could potentially lead to trouble if the function
> changed somehow in between (or disappeared). We could instead just run
> the leftovers through whatever the restarted config says to do with
> files.

* Disappeared: easy to notice the function no longer exists and fall back to the default post-processor.
* Changed: running through a function of the same name that happened to get changed between restarts is probably still going to be closer to what the user expects than running it through the default post-processor, which is completely different?

> Do we even need any other metadata at all in the new scheme? I'm
> wondering if we could simplify this all to: "If at open() time, X.log
> exists, first rotate it away through the currently configured
> postprocessor function".

What if an open() rarely or never happens again for a given log? I'm thinking the rotation of leftover logs needs to happen once at startup rather than lazily.

> Hmm, actually, there's a piece of metadata that we'll need: the opening
> timestamp, so that one can incorporate that into the name of the
> rotated file (assuming we want to retain that capability). Unless we
> parsed that out of the X.log itself ...

Don't think we'd have the opening timestamp to parse from the log when LogAscii::use_json=T. So I still think it's necessary to obtain open-time metadata from a `.shadow.X.log`, either stored explicitly in there or derived from the file's modified time (essentially its creation time). The close-time of X.log is just taken as the last-modified time of X.log.

- Jon
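To make the shadow-file idea above concrete, here is a minimal C++ sketch of recovering the two timestamps for a leftover log. It assumes a hypothetical shadow-file layout (the open timestamp on the first line) and approximates the close time with the log's mtime, as described in the thread; none of this is Zeek's actual log manager code, and all names are illustrative.

    // Sketch only: hypothetical shadow-file format, not Zeek's implementation.
    #include <sys/stat.h>
    #include <fstream>
    #include <optional>
    #include <string>
    #include <utility>

    // Returns {open_time, close_time} for a leftover X.log, or nothing if the
    // shadow file or the log itself can't be read.
    std::optional<std::pair<time_t, time_t>>
    LeftoverLogTimes(const std::string& log_path, const std::string& shadow_path) {
        std::ifstream shadow(shadow_path);
        if ( ! shadow )
            return std::nullopt;

        long long open_time = 0;
        shadow >> open_time;  // assumed format: open timestamp on the first line

        struct stat st;
        if ( stat(log_path.c_str(), &st) != 0 )
            return std::nullopt;

        // Last-modified time of the leftover log stands in for its close time.
        return std::make_pair(static_cast<time_t>(open_time), st.st_mtime);
    }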
From robin at corelight.com Thu Jul 2 00:44:08 2020
From: robin at corelight.com (Robin Sommer)
Date: Thu, 2 Jul 2020 07:44:08 +0000
Subject: [Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival behavior)
In-Reply-To:
References: <20200701085903.GI33767@corelight.com>
Message-ID: <20200702074408.GO33767@corelight.com>

On Wed, Jul 01, 2020 at 14:03 -0700, Jon Siwek wrote:

> What if an open() rarely or never happens again for a given log?

Ah, right, forgot about that case. So yeah, agree, the shadow files are useful for this and for retaining whatever information we need.

> * Changed: running through a function of the same name that happened to
> get changed between restarts is probably still going to be closer to
> what the user expects than running it through the default post-processor,
> which is completely different?

I was thinking not the default post-processor, but whatever is configured for the log file we are just opening (if we did it at open() time). But yeah, that won't work when the cleanup happens already before the new open.

Robin

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From petar.backovic.fit at gmail.com Sun Jul 5 01:19:33 2020
From: petar.backovic.fit at gmail.com (Petar Backovic)
Date: Sun, 5 Jul 2020 10:19:33 +0200
Subject: [Zeek-Dev] Email Zeek
In-Reply-To:
References:
Message-ID:

Respected devs,

I installed Zeek and configured the interface, email, private IP address, etc. I copied the script for SSH password guessing from the docs.zeek web site, and my listener on wlan0 works. When I failed to log in to SSH with PuTTY enough times, I never received an alert email. Every SSH login is in the ssh.log file, but nothing arrives by email. The Internet works on my Raspberry Pi.

Could you help me figure out where the problem is?

Thank you in advance,
Petar Backovic

From johanna at corelight.com Thu Jul 9 13:21:44 2020
From: johanna at corelight.com (Johanna Amann)
Date: Thu, 09 Jul 2020 13:21:44 -0700
Subject: [Zeek-Dev] Zeek Table Cluster distribution using broker ready for testing
Message-ID:

Hello everyone,

If you followed last year's Zeek Week, you might be aware that we have been working on a new way to more easily distribute Zeek table content in a cluster setup. We now have a working prototype - and I would be happy for feedback if someone wants to start playing with it.

We tried to make this feature as easy to use as possible. In the case that you just want to distribute a table over an entire Zeek cluster, you only have to add &backend=Broker::MEMORY to the table definition. So - for example:

global table_to_share: table[string] of count &backend=Broker::MEMORY;

This will automatically synchronize the table over the entire cluster. In the background, a Broker store (in this case a memory-backed store) is created and used for the actual data synchronization. Changes to the table are automatically sent to the Broker store and distributed over the cluster.

We also support persistent Broker stores. At the moment you need to specify the path in which the database should be stored for this feature. Example:

redef Broker::auto_store_db_directory = "[path]";
global table_to_share: table[string] of count &backend=Broker::SQLITE;

Data that is stored in the table will be persistent across restarts of Zeek.

Current limitations:

* There is no conflict resolution.
  Simultaneous inserts for the same key will probably lead to divergent state across the cluster. This is by design - if you need to be absolutely sure that you do not lose any data, or if you want conflict resolution for multiple inserts, you will still have to roll your own script-level logic using events.

* Tables can only have a single index; multi-indexed tables (like table[string, count] of X) are not yet supported.

* Tables can only have simple values. Tables that store records, tables, sets, or vectors are not supported. The reason for this is that we cannot track table changes in these cases.

* &expire_func cannot be used simultaneously. Normal expiry should work correctly.

* Documentation is basically still completely missing - I will write it over the next days.

If you want to try this you have to compile the topic/johanna/table-changes branch of the Zeek repository. To check out this branch into a new directory, use something like:

git clone https://github.com/zeek/zeek --branch topic/johanna/table-changes --recursive [target-directory]

Please let me know if you have any feedback/questions/problems :)

Johanna

From bob.murphy at corelight.com Thu Jul 9 16:57:04 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Thu, 9 Jul 2020 16:57:04 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
Message-ID:

Right now, if you try to use Zeek's debug logging facilities in DebugLogger.h concurrently from multiple threads, the contents of debug.log can get mixed up and look like "word salad".

I've been working on log writers for Zeek. Those operate in different threads, and using Zeek's current open-source debug logging implementation, trying to make sense of debug logs from those was a real headache. So in my own code, I've made debug logging thread-safe, so log text from different threads winds up on different lines in the debug.log file. I've also added more convenience macros to make logging some kinds of debug information easier.

This proposal is to integrate those debug logging changes into open-source Zeek. I'd welcome any questions, suggestions or feedback.

Bob Murphy | Corelight, Inc. | bob.murphy at corelight.com | www.corelight.com

From bob.murphy at corelight.com Thu Jul 9 18:19:43 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Thu, 9 Jul 2020 18:19:43 -0700
Subject: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting
Message-ID: <8D06AACD-8721-4EDA-95BD-DAB3D60ACD84@corelight.com>

Summary

This proposal is aimed at solving two intertwined problems in Zeek's log-writing system:

Problem: Batch writing code duplication
- Some log writers need to send multiple log records at a time in "batches". These include writers that send data to Elasticsearch, Splunk HEC, Kinesis, and various HTTP-based destinations.
- Right now, each of these log writers has similar or identical code to create and manage batches.
- This code duplication makes writing and maintaining "batching" log writers harder and more bug-prone.

Proposed Solution: Add a new optional API for writing a batch all at once, while still supporting older log writers that don't need to write batches.

Problem: Insufficient information about failures
- Different log writers can fail in a variety of ways.
- Some of these failure modes are amenable to automatic recovery within Zeek, and others could be corrected by an administrator if they knew about them.
- However, the current system for writing log records returns a boolean indicating only two log writer statuses: "true" means "Everything's fine!", and "false" means "Emergency!!! The log writer needs to be shut down!"

Proposed Solution:
a. For non-batching log writers, change the "false" status to just mean "There was an error writing a log record". The log writing system will then report those failures to other Zeek components such as plug-ins, so they can monitor a log writer's health and make more sophisticated decisions about whether a log writer can continue running or needs to be shut down.
b. Batching log writers will have a new API anyway, so that will let log writers report more detail about write failures, including suggestions about possible ways to recover.

--------------------------------------------------------------------------------

Design Details

Current Implementation

At present, log writers are C++ classes which descend from the WriterBackend pure-virtual superclass. Each log writer must override several pure virtual member functions, which include:

* DoInit: Writer-specific initialization method.
* DoWrite: Write one log record. Returns a boolean, where true means "everything's fine", and false means "things are so bad, the log writer needs to be shut down."

Log writers can also optionally override this virtual member function:

* DoWriteLogs: Possibly writer-specific output method that records zero or more log entries. The default implementation in the superclass simply calls DoWrite() in a loop.

New Implementation

This has two main goals:

* Provide a new base class for log writers that supports writing a batch of records at once, handles all the batch creation and write logic, and offers more sophisticated per-record reporting on failures.
* Provide backward compatibility so "legacy" (existing, non-batching) log writers can build and run without code changes, while changing the meaning of "false" when returned from DoWrite() to "sending this one log record failed."

These goals will be achieved using three writer backend classes:

1. BaseWriterBackend

This will be a virtual base class, and is a superclass for both legacy and batching log writers.
- It will have the same API signature as the existing WriterBackend, except it will omit DoWrite().
- It will also expose the existing DoWriteLogs() member function as a pure virtual function, so there's a standard interface for WriterBackend::Write() to call.

2. WriterBackend

This class will derive from BaseWriterBackend, and will support legacy log writers as a drop-in replacement for the existing WriterBackend class.
- It will add a pure virtual DoWrite member function to BaseWriterBackend, so its API signature will be identical to the existing WriterBackend class. That will let legacy log writers inherit from it with no code changes, and also support new log writers that don't need batching.
- The return semantics for DoWrite will change so that when it returns false, that will simply mean the argument record wasn't successfully written.
- Its specialization of DoWriteLogs will be nearly identical to Zeek's current implementation, except that when DoWrite returns false, DoWriteLogs will simply report the failure to the rest of Zeek, rather than triggering a log writer shutdown.
  Then, other Zeek components can monitor the writer's health and decide whether to shut down the log writer or let it continue.

3. BatchWriterBackend

This class will derive from BaseWriterBackend, and will write logs in batches.
- Instead of DoWrite, it will expose a DoWriteBatch pure virtual member function to accept logs in batches.
- Its specialization of DoWriteLogs will call DoWriteBatch.
- It will support configuring per-log-writer criteria that trigger flushing a batch, including:
  * Maximum age of the oldest cached log (default value TBD)
  * Maximum number of cached log records (default value TBD)
- DoWriteBatch will support rejecting logs at arbitrary indices in the batch, and will report details on which logs were rejected and why.

This is the proposed signature for DoWriteBatch:

int BatchWriterBackend::DoWriteBatch(
    int num_writes,
    threading::Value*** vals,
    BatchWriterBackend::status_vector& failures
);

where:
    num_writes = the number of log records in the batch
    vals = the values of the log records to be written
    failures = information about failed record writes

The return value is the number of log records actually written.

Compared to DoWriteLogs, DoWriteBatch omits the num_fields and fields arguments. Those aren't needed because the log writer already has those values, which were stored when they were supplied to its Init member function.

The failures argument is a reference to a std::vector of structs the log writer can fill in with details on failures to write individual records. The individual status structs will generally look like this:

struct status {
    int m_failed_record_index;
    uint32_t m_failure_reason;
    uint32_t m_recovery_suggestion;
};

where:
    m_failure_reason indicates the general reason for the failure
    m_recovery_suggestion might contain a suggestion about handling the failure

If DoWriteBatch() returns a number that's smaller than num_writes, and the failures vector is empty, the caller will assume all the failed records were at the end of the batch, and try to re-transmit them in a later batch.

--------------------------------------------------------------------------------

I'd welcome any questions, suggestions or feedback.

Bob Murphy | Corelight, Inc. | bob.murphy at corelight.com | www.corelight.com

From johanna at corelight.com Thu Jul 9 19:16:47 2020
From: johanna at corelight.com (Johanna Amann)
Date: Thu, 09 Jul 2020 19:16:47 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To:
References:
Message-ID: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com>

On 9 Jul 2020, at 16:57, Bob Murphy wrote:

> Right now, if you try to use Zeek's debug logging facilities in
> DebugLogger.h concurrently from multiple threads, the contents of
> debug.log can get mixed up and look like "word salad".

Is there a reason why you didn't just use the Debug call of the threading framework (which goes through the message queues and then ends up in debug.log)?
Johanna

From bob.murphy at corelight.com Fri Jul 10 10:53:37 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Fri, 10 Jul 2020 10:53:37 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com>
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com>
Message-ID: <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>

Hi Johanna,

I wasn't aware of that call, but it also wouldn't have done what I needed.

If I understand the code correctly, each MsgThread has a FIFO queue that it pushes messages onto. Later on, the main thread occasionally runs a loop where it handles all the queued messages from the first MsgThread, then all the queued messages from the second MsgThread, etc.

The development I was doing sometimes required me to examine the debug messages from different threads in the chronological order they were generated. But if I understand it correctly, the threading framework's logging doesn't maintain that ordering.

Also, that work sometimes generated a LOT of debug messages - thousands or millions of lines of them - when only a tiny fraction of them were interesting. To cut down on the garbage, I used the DebugLogger class's member functions to selectively enable and disable individual streams when particular conditions occurred. However, those member functions take effect immediately, and because the threading framework's Debug member function emits log lines after a delay, it seems likely I would have missed debug output I wanted to see and seen debug output I didn't want to see.

Best regards,
Bob

> On Jul 9, 2020, at 7:16 PM, Johanna Amann wrote:
>
> On 9 Jul 2020, at 16:57, Bob Murphy wrote:
>
>> Right now, if you try to use Zeek's debug logging facilities in DebugLogger.h concurrently from multiple threads, the contents of debug.log can get mixed up and look like "word salad".
>
> Is there a reason why you didn't just use the Debug call of the threading framework (which goes through the message queues and then ends up in debug.log)?
>
> Johanna

From jsiwek at corelight.com Mon Jul 13 13:42:03 2020
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 13 Jul 2020 13:42:03 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To: <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>
Message-ID:

On Fri, Jul 10, 2020 at 11:00 AM Bob Murphy wrote:

> The development I was doing sometimes required me to examine the debug messages from different threads in the chronological order they were generated. But if I understand it correctly, the threading framework's logging doesn't maintain that ordering.

Yeah, or at least the time associated with a Debug message is its time-of-processing, not time-of-generation. Can see how the latter is more useful, but want to discuss the proposed solution in a bit more detail? Does it involve a locked mutex around only the underlying fprintf() or something more? I imagine it should be "something more" if the requirement is to make debug.log a convenient way of understanding operation ordering among many threads.
- Jon

From bob.murphy at corelight.com Tue Jul 14 08:05:20 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Tue, 14 Jul 2020 08:05:20 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To:
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>
Message-ID:

> On Jul 13, 2020, at 1:42 PM, Jon Siwek wrote:
>
> On Fri, Jul 10, 2020 at 11:00 AM Bob Murphy wrote:
>
>> The development I was doing sometimes required me to examine the debug messages from different threads in the chronological order they were generated. But if I understand it correctly, the threading framework's logging doesn't maintain that ordering.
>
> Yeah, or at least the time associated with a Debug message is its
> time-of-processing, not time-of-generation. Can see how the latter is
> more useful, but want to discuss the proposed solution in a bit more
> detail? Does it involve a locked mutex around only the underlying
> fprintf() or something more? I imagine it should be "something more"
> if the requirement is to make debug.log a convenient way of
> understanding operation ordering among many threads.
>
> - Jon

My current implementation does just use a mutex to control access to the output file, and reports the time of generation.

Outside of this email thread, one person has suggested adding something to each debugging log line to identify its source thread. That could potentially be the thread ID, or the thread name, or both. Another person who runs multiple Zeek instances concurrently also suggested adding the process ID to each log line. So I was planning to add those to each debug log line before doing a pull request to merge my changes into Zeek master.

- Bob

P.S. If Zeek were to emit a lot of debugging log lines from enough threads very quickly, it's possible the mutex would add excessive overhead. Boost has a lock-free inter-thread queue that could be the nucleus of a solution for that, but that would be a lot more complicated than just using a mutex. So I don't want to look further into that unless and until we know it's really needed.

From jsiwek at corelight.com Tue Jul 14 11:35:24 2020
From: jsiwek at corelight.com (Jon Siwek)
Date: Tue, 14 Jul 2020 11:35:24 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To:
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>
Message-ID:

On Tue, Jul 14, 2020 at 8:05 AM Bob Murphy wrote:

> My current implementation does just use a mutex to control access to the output file, and reports the time of generation.

I was also trying to break down a couple distinct requirements and wondered if that actually covers the 2nd:

(1) Fix the "word salad"
(2) Ability to examine debug output from multiple threads in chronological order

Is it fine to just be able to understand the ordering of "when the fprintf() happened", or is what's really needed to understand the ordering of "when operations associated with debug messages happened"?

Thread 1:
    Foo();
    LockedDebugMsg("I did Foo.");

Thread 2:
    Bar();
    LockedDebugMsg("I did Bar.");

debug.log:
    [Timestamp 1] I did Foo.
    [Timestamp 2] I did Bar.

That debug.log doesn't really tell us whether Foo() happened before Bar(), right?

- Jon
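For readers following the thread, here is a minimal sketch of the kind of mutex-guarded write under discussion: the timestamp, process ID, and thread ID are captured at the call site (time of generation), and a single lock serializes the fprintf() so lines never interleave. The LockedDebugMsg name and the file handling here are hypothetical - this is not the code in Bob's branch or in Zeek.

    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <sstream>
    #include <thread>
    #include <unistd.h>

    static std::mutex debug_log_mutex;      // serializes all writers of the debug log
    static FILE* debug_log_file = stderr;   // stand-in for the real debug.log handle

    // Hypothetical helper: capture time-of-generation, PID, and thread ID first,
    // then take the lock only for the actual write.
    void LockedDebugMsg(const char* msg) {
        double now = std::chrono::duration<double>(
            std::chrono::system_clock::now().time_since_epoch()).count();

        std::ostringstream tid;
        tid << std::this_thread::get_id();

        std::lock_guard<std::mutex> guard(debug_log_mutex);
        fprintf(debug_log_file, "%.6f [pid %d, thread %s] %s\n",
                now, static_cast<int>(getpid()), tid.str().c_str(), msg);
        fflush(debug_log_file);
    }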
From bob.murphy at corelight.com Tue Jul 14 11:56:55 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Tue, 14 Jul 2020 11:56:55 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To:
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>
Message-ID:

> On Jul 14, 2020, at 11:35 AM, Jon Siwek wrote:
>
> On Tue, Jul 14, 2020 at 8:05 AM Bob Murphy wrote:
>
>> My current implementation does just use a mutex to control access to the output file, and reports the time of generation.
>
> I was also trying to break down a couple distinct requirements and
> wondered if that actually covers the 2nd:
>
> (1) Fix the "word salad"
> (2) Ability to examine debug output from multiple threads in chronological order
>
> Is it fine to just be able to understand the ordering of "when the
> fprintf() happened", or is what's really needed to understand the
> ordering of "when operations associated with debug messages happened"?
>
> Thread 1:
>     Foo();
>     LockedDebugMsg("I did Foo.");
>
> Thread 2:
>     Bar();
>     LockedDebugMsg("I did Bar.");
>
> debug.log:
>     [Timestamp 1] I did Foo.
>     [Timestamp 2] I did Bar.
>
> That debug.log doesn't really tell us whether Foo() happened before
> Bar(), right?
>
> - Jon

The version I have definitely fixes #1, the word salad. It also fixes #2 in the sense that the output is in the same chronological order as the calls to LockedDebugMsg.

The code you show should give correct ordering on when Foo() and Bar() finish. If you also want to know when they start, you could do:

Thread 1:
    LockedDebugMsg("About to do Foo.");
    Foo();
    LockedDebugMsg("I did Foo.");

Thread 2:
    LockedDebugMsg("About to do Bar.");
    Bar();
    LockedDebugMsg("I did Bar.");

From jsiwek at corelight.com Tue Jul 14 13:14:50 2020
From: jsiwek at corelight.com (Jon Siwek)
Date: Tue, 14 Jul 2020 13:14:50 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To:
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>
Message-ID:

On Tue, Jul 14, 2020 at 11:56 AM Bob Murphy wrote:

> The code you show should give correct ordering on when Foo() and Bar() finish.

Wondering what's meant by "correct ordering" here. Bar() can finish before Foo() and yet debug.log can report "I did Foo" before "I did Bar" for whatever thread-scheduling reasons happened to make that the case. Or Foo() and Bar() can execute together in complete concurrency and it's just the LockedDebugMsg() picking an arbitrary "winner".

- Jon

From bob.murphy at corelight.com Tue Jul 14 14:58:20 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Tue, 14 Jul 2020 14:58:20 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To:
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com>
Message-ID: <9585DDC6-82DD-4F42-935D-08B6F4100C3B@corelight.com>

> On Jul 14, 2020, at 1:14 PM, Jon Siwek wrote:
>
> On Tue, Jul 14, 2020 at 11:56 AM Bob Murphy wrote:
>
>> The code you show should give correct ordering on when Foo() and Bar() finish.
>
> Wondering what's meant by "correct ordering" here. Bar() can finish
> before Foo() and yet debug.log can report "I did Foo" before "I did
> Bar" for whatever thread-scheduling reasons happened to make that the
> case. Or Foo() and Bar() can execute together in complete concurrency
> and it's just the LockedDebugMsg() picking an arbitrary "winner".
>
> - Jon

I see your point.

For example:
a. Foo() in thread 1 finishes before Bar() in thread 2 finishes
b. The scheduler deactivates thread 1 for a while between the return from Foo() and the execution of LockedDebugMsg("I did Foo.")
c. Thread 2 proceeds from the return from Bar() without interruption

Then debug.log would contain the message "I did Bar" before "I did Foo".

So the ordering in the log file really reflects how the kernel sees the temporal order of mutex locking inside LockedDebugMsg. That's an inexact approximation of the temporal order of calls to LockedDebugMsg, and that's an even more inexact approximation of the temporal order of code executed before LockedDebugMsg.

For what I was doing, though, that proved to be good enough. :-)

I'd be very interested in ideas about how to improve that, especially if they're simple. I can think of a way to improve it, but it would be substantially more complicated than just a mutex.

From robin at corelight.com Wed Jul 15 00:52:17 2020
From: robin at corelight.com (Robin Sommer)
Date: Wed, 15 Jul 2020 07:52:17 +0000
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To: <9585DDC6-82DD-4F42-935D-08B6F4100C3B@corelight.com>
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com> <9585DDC6-82DD-4F42-935D-08B6F4100C3B@corelight.com>
Message-ID: <20200715075217.GF41059@corelight.com>

Reading through this thread, I'm wondering if we should focus on improving identification of log lines in terms of where they come from and when they were generated, while continuing to go through the existing mechanism of sending messages back to the main process for output (so that we don't need the mutex). If we sent timestamps & thread IDs along with the Debug() messages, one could later post-process debug.log to get things sorted/split as desired.

This wouldn't support the use case of "millions of lines" very well, but I'm not convinced that's what we should be designing this for. A mutex becomes potentially problematic at that volume as well, and it also seems like a rare use case to begin with. In cases where it's really needed, a local patch to get logs into files directly (as you have done already) might just do the trick, no?

Robin

On Tue, Jul 14, 2020 at 14:58 -0700, Bob Murphy wrote:
>
> > On Jul 14, 2020, at 1:14 PM, Jon Siwek wrote:
> >
> > On Tue, Jul 14, 2020 at 11:56 AM Bob Murphy wrote:
> >
> >> The code you show should give correct ordering on when Foo() and Bar() finish.
> >
> > Wondering what's meant by "correct ordering" here. Bar() can finish
> > before Foo() and yet debug.log can report "I did Foo" before "I did
> > Bar" for whatever thread-scheduling reasons happened to make that the
> > case. Or Foo() and Bar() can execute together in complete concurrency
> > and it's just the LockedDebugMsg() picking an arbitrary "winner".
> >
> > - Jon
>
> I see your point.
>
> For example:
> a. Foo() in thread 1 finishes before Bar() in thread 2 finishes
> b. The scheduler deactivates thread 1 for a while between the return from Foo() and the execution of LockedDebugMsg("I did Foo.")
> c. Thread 2 proceeds from the return from Bar() without interruption
>
> Then debug.log would contain the message "I did Bar" before "I did Foo".
>
> So the ordering in the log file really reflects how the kernel sees the temporal order of mutex locking inside LockedDebugMsg.
> That's an inexact approximation of the temporal order of calls to LockedDebugMsg, and that's an even more inexact approximation of the temporal order of code executed before LockedDebugMsg.
>
> For what I was doing, though, that proved to be good enough. :-)
>
> I'd be very interested in ideas about how to improve that, especially if they're simple. I can think of a way to improve it, but it would be substantially more complicated than just a mutex.
>
> _______________________________________________
> Zeek-Dev mailing list
> Zeek-Dev at zeek.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From robin at corelight.com Wed Jul 15 01:09:15 2020
From: robin at corelight.com (Robin Sommer)
Date: Wed, 15 Jul 2020 08:09:15 +0000
Subject: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting
In-Reply-To: <8D06AACD-8721-4EDA-95BD-DAB3D60ACD84@corelight.com>
References: <8D06AACD-8721-4EDA-95BD-DAB3D60ACD84@corelight.com>
Message-ID: <20200715080915.GG41059@corelight.com>

On Thu, Jul 09, 2020 at 18:19 -0700, Bob Murphy wrote:

> Proposed Solution: Add a new optional API for writing a batch all at once, while
> still supporting older log writers that don't need to write batches.

That sounds good to me; a PR with the proposed API would be great.

> a. For non-batching log writers, change the "false" status to just mean
> "There was an error writing a log record". The log writing system will then
> report those failures to other Zeek components such as plug-ins, so they can
> monitor a log writer's health and make more sophisticated decisions about
> whether a log writer can continue running or needs to be shut down.

Not quite sure what this would look like. Right now we just shut down the thread on error, right? Can you elaborate on how "report those failures to other Zeek components" and "make more sophisticated decisions" would look?

Could we just change the boolean result into a tri-state: (1) all good; (2) recoverable error; and (3) fatal error? Here, (2) would mean that the writer failed on an individual write, but remains prepared to receive further messages for output. We could then also implicitly treat a current "false" as (3), so that existing writers wouldn't even notice the difference (at the source code level at least).

> b. Batching log writers will have a new API anyway, so that will let log
> writers report more detail about write failures, including suggestions about
> possible ways to recover.

Similar question here: what would these "suggestions" look like?

Robin

-- 
Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com

From bob.murphy at corelight.com Wed Jul 15 14:57:36 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Wed, 15 Jul 2020 14:57:36 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To: <20200715075217.GF41059@corelight.com>
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com> <9585DDC6-82DD-4F42-935D-08B6F4100C3B@corelight.com> <20200715075217.GF41059@corelight.com>
Message-ID: <70CC6BD9-B068-4DBB-BD1F-D21208CA45CA@corelight.com>

> On Jul 15, 2020, at 12:52 AM, Robin Sommer wrote:
>
> Reading through this thread, I'm wondering if we should focus on
> improving identification of log lines in terms of where they come from
> and when they were generated, while continuing to go through the existing
> mechanism of sending messages back to the main process for output (so that
> we don't need the mutex). If we sent timestamps & thread IDs along
> with the Debug() messages, one could later post-process debug.log to
> get things sorted/split as desired.
>
> This wouldn't support the use case of "millions of lines" very well,
> but I'm not convinced that's what we should be designing this for. A
> mutex becomes potentially problematic at that volume as well, and it
> also seems like a rare use case to begin with. In cases where it's
> really needed, a local patch to get logs into files directly (as you
> have done already) might just do the trick, no?
>
> Robin

We could definitely change DebugLogger to improve the log line identification, and route it through the threading framework's Debug() call. That will avoid turning debug.log into "word salad".

However, that would also cause a delay in writing the log lines, and I've run into situations working on Zeek where that kind of delay would make debugging harder. For example, sometimes I run tail on the log file in a terminal window. Then, when the code hits a breakpoint in a debugger, I can analyze the program state by looking at log lines emitted right before the breakpoint triggers, and compare them to variable contents, the stack trace, etc. That won't work if logging is delayed.

There are multiple, conflicting use cases for logging in Zeek. Sometimes a developer might think:
- Maximized throughput is important, but a delay is okay
- No delay can be tolerated, but slower throughput is okay
- Correct temporal ordering in the log is (or isn't) important
- fflush() after every write is (or isn't) important
- Debug logging output should go to the debug.log file, or stdout, or somewhere else

This is a pretty common situation around logging, in my experience. One way to solve it, as Robin says, is for a developer with a use case Zeek doesn't support to apply a temporary local patch. Unfortunately, that doesn't help other developers who might have the same use case. Also, I personally hate to spend time writing code and getting it to work well, and then throw it away.

On other projects, I've used a different approach that's worked really well: use a single, common logging API, but let it send its output to different output mechanisms that support different use cases. Then a developer could pick the output mechanism that works best for their use case at runtime, using a command-line option or environment variable. I think it wouldn't be very complicated to add that to Zeek.

- Bob
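As a rough illustration of the "one API, multiple output mechanisms" idea, here is a hypothetical sketch in which an environment variable selects the output sink at startup. None of these class names, nor the ZEEK_DEBUG_LOG_SINK variable, exist in Zeek; this only shows the shape of the approach.

    #include <cstdio>
    #include <cstdlib>
    #include <memory>
    #include <string>

    // Hypothetical sink interface: one logging API writes through whichever sink was chosen.
    class DebugSink {
    public:
        virtual ~DebugSink() = default;
        virtual void Write(const std::string& line) = 0;
    };

    // Immediate, unbuffered output, convenient alongside a debugger or "tail -f".
    class StderrSink : public DebugSink {
    public:
        void Write(const std::string& line) override {
            fprintf(stderr, "%s\n", line.c_str());
            fflush(stderr);
        }
    };

    // Buffered file output for when throughput matters more than latency.
    class FileSink : public DebugSink {
    public:
        explicit FileSink(const char* path) : f(fopen(path, "a")) {}
        ~FileSink() override { if ( f ) fclose(f); }
        void Write(const std::string& line) override {
            if ( f )
                fprintf(f, "%s\n", line.c_str());
        }
    private:
        FILE* f = nullptr;
    };

    // Choose the sink once at startup from a made-up environment variable.
    std::unique_ptr<DebugSink> MakeDebugSink() {
        const char* choice = getenv("ZEEK_DEBUG_LOG_SINK");  // illustrative name only
        if ( choice && std::string(choice) == "stderr" )
            return std::make_unique<StderrSink>();
        return std::make_unique<FileSink>("debug.log");
    }

A command-line option could select the sink the same way; the point is just that call sites stay unchanged while the output behavior varies per use case.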
From bob.murphy at corelight.com Wed Jul 15 17:45:11 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Wed, 15 Jul 2020 17:45:11 -0700
Subject: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting
In-Reply-To: <20200715080915.GG41059@corelight.com>
References: <8D06AACD-8721-4EDA-95BD-DAB3D60ACD84@corelight.com> <20200715080915.GG41059@corelight.com>
Message-ID:

> On Jul 15, 2020, at 1:09 AM, Robin Sommer wrote:
>
> On Thu, Jul 09, 2020 at 18:19 -0700, Bob Murphy wrote:
>
>> Proposed Solution: Add a new optional API for writing a batch all at once, while
>> still supporting older log writers that don't need to write batches.
>
> That sounds good to me; a PR with the proposed API would be great.

That sounds great. I wanted to bounce the ideas around with people who know more about Zeek than I do before going into detail on a proposed API.

>> a. For non-batching log writers, change the "false" status to just mean
>> "There was an error writing a log record". The log writing system will then
>> report those failures to other Zeek components such as plug-ins, so they can
>> monitor a log writer's health and make more sophisticated decisions about
>> whether a log writer can continue running or needs to be shut down.
>
> Not quite sure what this would look like. Right now we just shut down
> the thread on error, right? Can you elaborate on how "report those
> failures to other Zeek components" and "make more sophisticated
> decisions" would look?

Yes, right now, any writer error just shuts down the entire thread.

That's a good solution for destinations like a disk, because if a write fails, something really bad has probably happened. But Seth Hall pointed out that some log destinations can recover, and it's not a good solution for those. Here are a couple of examples:

1. A writer might send log records to a network destination. If the connection is temporarily congested, it would start working again when the congestion clears.
2. The logs go to another computer that's hung, and everything would work again if somebody rebooted it.

Seth's idea was to report the failures to a plugin that could be configured by an administrator. A plugin for a writer that goes to disk could shut down the writer on the first failure, like Zeek does now. And plugins for other writers could approach the examples above with a little more intelligence:

1. The plugin for the network destination writer could decide to shut down the writer only after no records have been successfully sent for a minimum of ten minutes.
2. The plugin for the remote-computer writer could alert an administrator to reboot the other computer. After that, the writer would successfully resume sending logs.

> Could we just change the boolean result into a tri-state: (1) all good;
> (2) recoverable error; and (3) fatal error? Here, (2) would mean that
> the writer failed on an individual write, but remains prepared to
> receive further messages for output. We could then also implicitly
> treat a current "false" as (3), so that existing writers wouldn't even
> notice the difference (at the source code level at least).

I don't think that would work, because the member function in question returns a bool. To change that return value to represent more than two states, we'd have to do one of two things:

1. Change that bool to some other type. If we did that, existing writers wouldn't compile any more.
2. Use casts or a union to store and retrieve values other than 0 and 1 in that bool, and hope those values will be preserved across the function return and into the code that needs to analyze them.

We can't count on values other than 0 or 1 being preserved, because the bool type in C++ is a little weird, and some behaviors are implementation-dependent. I wrote a test program using a pointer to store 0x0F into a bool, and other than looking at it in a debugger, everything I did to read the value out of that bool turned it into 0x01, including assigning it to another bool or an int. The only thing that saw 0x0F in there was taking a pointer to the bool, casting it to a pointer to char or uint8_t, and dereferencing that pointer.

>> b. Batching log writers will have a new API anyway, so that will let log
>> writers report more detail about write failures, including suggestions about
>> possible ways to recover.
>
> Similar question here: what would these "suggestions" look like?

For batching, I was thinking of having a way to send back a std::vector of structs that would be something like this:

struct failure_info {
    uint32_t index_in_batch;
    uint16_t failure_type;
    uint16_t recovery_suggestion;
};

The values of failure_type would be an enumeration indicating things like "fatal, shut down the writer", "log record exceeds protocol limit", "unable to send packet", "unable to write to disk", etc. Using a fixed-size struct member that's larger than the enum would allow extra values to be added in the future.

recovery_suggestion would be a similar enum-in-larger-type, and would let the writer convey more information based on what it knows about the log destination. That could indicate things like "the network connection has entirely dropped and no recovery is possible", "the network connection is busy, try again later", "this log record is too large for the protocol, but re-sending it might succeed if it's truncated or split up", etc.

- Bob

From seth at corelight.com Thu Jul 16 05:46:01 2020
From: seth at corelight.com (Seth Hall)
Date: Thu, 16 Jul 2020 08:46:01 -0400
Subject: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting
In-Reply-To:
References: <8D06AACD-8721-4EDA-95BD-DAB3D60ACD84@corelight.com> <20200715080915.GG41059@corelight.com>
Message-ID:

On 15 Jul 2020, at 20:45, Bob Murphy wrote:

>> On Jul 15, 2020, at 1:09 AM, Robin Sommer wrote:
>>
>> Not quite sure what this would look like. Right now we just shut down
>> the thread on error, right? Can you elaborate on how "report those
>> failures to other Zeek components" and "make more sophisticated
>> decisions" would look?
>
> Yes, right now, any writer error just shuts down the entire thread.
>
> That's a good solution for destinations like a disk, because if a
> write fails, something really bad has probably happened. But Seth Hall
> pointed out that some log destinations can recover, and it's not a
> good solution for those.

More or less, this is the same sort of thing I'm always pushing for: move more functionality into scripts. If I got an event in scriptland, I might be able to determine what resulting action to take in the script and whether or not to shut down the writer or let it keep going.
> For batching, I was thinking of having a way to send back a
> std::vector of structs that would be something like this:
>
> struct failure_info {
>     uint32_t index_in_batch;
>     uint16_t failure_type;
>     uint16_t recovery_suggestion;
> };

This is almost starting to sound a bit more complicated than it's worth. We may need to discuss this a bit more to figure out something simpler. The immediate problem that springs to mind is that as a developer, I don't think I'd have any clue what failure_types and recovery_suggestions could be common among export destinations.

  .Seth

-- 
Seth Hall * Corelight, Inc * www.corelight.com

From bob.murphy at corelight.com Thu Jul 16 17:15:38 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Thu, 16 Jul 2020 17:15:38 -0700
Subject: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting
In-Reply-To:
References: <8D06AACD-8721-4EDA-95BD-DAB3D60ACD84@corelight.com> <20200715080915.GG41059@corelight.com>
Message-ID: <1509B814-58A1-4FF0-A334-579E5DE882AA@corelight.com>

>> For batching, I was thinking of having a way to send back a std::vector of structs that would be something like this:
>>
>> struct failure_info {
>>     uint32_t index_in_batch;
>>     uint16_t failure_type;
>>     uint16_t recovery_suggestion;
>> };
>
> This is almost starting to sound a bit more complicated than it's worth. We may need to discuss this a bit more to figure out something simpler. The immediate problem that springs to mind is that as a developer, I don't think I'd have any clue what failure_types and recovery_suggestions could be common among export destinations.

Seth and I were talking today, and came up with something like this:

struct failure_info {
    uint32_t first_index;
    uint16_t index_count;
    uint16_t failure_type;
};

Here's how it would work:

1. The batch writing function would return a std::vector of these. If the entire batch wrote successfully, the vector would be empty.

2. The failure_type value would still indicate generally what happened, with predefined values indicating things like "network failure", "protocol error", "unable to write to disk", or "unspecified failure". Seth thought we'd be likely to start out with about ten values like this. Using a 32-bit value for this provides lots of room for expansion :-) and maintains reasonable alignment within the struct.

3. first_index and index_count would specify a range. That way, if several successive log records aren't sent for the same reason, that could be represented by a single struct, instead of a different struct for each one.

This drops the recovery suggestion. The sizes of the struct fields are currently set to pack nicely into eight bytes, with no wasted space either within the struct or between structs in an array. We could make the fields different sizes, though.
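To make the range-based failure reporting concrete, here is a hypothetical sketch of a batch write loop that coalesces consecutive failures of the same type into a single failure_info entry. The failure codes and the send_record() helper are stand-ins invented for illustration; this is not the proposed Zeek API itself.

    #include <cstdint>
    #include <vector>

    struct failure_info {
        uint32_t first_index;   // index of the first failed record in the batch
        uint16_t index_count;   // number of consecutive records that failed
        uint16_t failure_type;  // e.g. 1 = network failure, 2 = protocol error (illustrative)
    };

    // Hypothetical per-record send, stubbed out here: returns 0 on success,
    // otherwise a failure_type code. A real writer would talk to its destination.
    static uint16_t send_record(int /* index */) { return 0; }

    // Returns the number of records written; `failures` receives coalesced ranges.
    int WriteBatchSketch(int num_writes, std::vector<failure_info>& failures) {
        int written = 0;

        for ( int i = 0; i < num_writes; ++i ) {
            uint16_t rc = send_record(i);

            if ( rc == 0 ) {
                ++written;
                continue;
            }

            // Extend the previous range if this failure is adjacent and of the same type.
            if ( ! failures.empty() &&
                 failures.back().failure_type == rc &&
                 failures.back().first_index + failures.back().index_count == static_cast<uint32_t>(i) )
                ++failures.back().index_count;
            else
                failures.push_back({ static_cast<uint32_t>(i), 1, rc });
        }

        return written;
    }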
From robin at corelight.com Fri Jul 17 02:54:04 2020
From: robin at corelight.com (Robin Sommer)
Date: Fri, 17 Jul 2020 09:54:04 +0000
Subject: [Zeek-Dev] Proposal: Improve Zeek's log-writing system with batch support and better status reporting
In-Reply-To: <1509B814-58A1-4FF0-A334-579E5DE882AA@corelight.com>
References: <8D06AACD-8721-4EDA-95BD-DAB3D60ACD84@corelight.com> <20200715080915.GG41059@corelight.com> <1509B814-58A1-4FF0-A334-579E5DE882AA@corelight.com>
Message-ID: <20200717095404.GC43266@corelight.com>

On Thu, Jul 16, 2020 at 17:15 -0700, Bob Murphy wrote:

> Here's how it would work:

It would be helpful to see a draft API for the full batch writing functionality to see how the pieces would work together. Could you mock that up? That said, a couple of thoughts:

> 2. The failure_type value would still indicate generally what
> happened, with predefined values indicating things like "network
> failure", "protocol error", "unable to write to disk", or
> "unspecified failure".

In my experience, such detailed numerical error codes are rarely useful in practice. Different writers will implement them to different degrees and associate different semantics with them, and callers will never quite know what to expect and how to react. Do you actually need to distinguish the semantics for all these different cases? Seems an alternative would be having a small set of possible "impact" values telling the caller what to do. To take a stab:

- temporary error: failed, but should try again with the same log data
- error: failed, and trying the same log data again won't help; but OK to continue with new log data
- fatal error: panic, shut down the writer

Depending on who's going to log failures, we could also just include a textual error message as well. Logging is where more context seems most useful, I'd say.

> 3. first_index and index_count would specify a range. That way, if
> several successive log records aren't sent for the same reason, that
> could be represented by a single struct, instead of a different struct
> for each one.

One reason I'm asking about the full API is that I'm not sure where the ownership of logs that fail to write resides. Is the writer keeping them? If so, it could handle the retry case internally. If the writer discards them after failure and the caller needs to send the data again, I'd wonder if there's a simpler return type here where we just point to the first failed entry in the batch. The writer would simply abort on the first failure (how likely is it really that the next succeeds immediately afterwards?).

And just to be clear why I'm making all these comments: I'm worried about the difficulty of using this API, on both ends. The more complex we make the things being passed around, the more difficult it gets to implement the logic correctly and efficiently.

Robin

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com
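For illustration, the "impact" idea above could be as small as an enum plus a count of what was written - a hypothetical sketch rather than a proposed Zeek API; all names here are made up.

    #include <cstddef>
    #include <string>

    // Hypothetical coarse-grained outcome of a batch write, mirroring the
    // temporary / permanent / fatal split suggested above.
    enum class WriteImpact {
        Success,          // everything was written
        TemporaryError,   // failed; retrying the same records later may work
        PermanentError,   // failed; retrying the same records won't help, new data is fine
        FatalError        // the writer is unusable and should be shut down
    };

    struct BatchWriteResult {
        WriteImpact impact = WriteImpact::Success;
        std::size_t num_written = 0;  // records successfully written before the first failure
        std::string message;          // optional human-readable context, mainly for logging
    };

A caller could then decide whether to retry, skip, or tear down the writer from the impact value alone, without interpreting writer-specific error codes.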
From robin at corelight.com Fri Jul 17 03:01:38 2020
From: robin at corelight.com (Robin Sommer)
Date: Fri, 17 Jul 2020 10:01:38 +0000
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To: <70CC6BD9-B068-4DBB-BD1F-D21208CA45CA@corelight.com>
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com> <9585DDC6-82DD-4F42-935D-08B6F4100C3B@corelight.com> <20200715075217.GF41059@corelight.com> <70CC6BD9-B068-4DBB-BD1F-D21208CA45CA@corelight.com>
Message-ID: <20200717100138.GD43266@corelight.com>

On Wed, Jul 15, 2020 at 14:57 -0700, Bob Murphy wrote:

> use a single, common logging API, but let it send its output to
> different output mechanisms that support different use cases.

I get that in general. It's just that afaik this is the first time this need has come up. Adding a full-featured, thread-safe logging framework is a trade-off against complexity and maintenance costs. Not saying it's impossible, but I'd like to hear more people thinking this is a good idea before committing to such a route.

Robin

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From bob.murphy at corelight.com Sat Jul 18 13:48:33 2020
From: bob.murphy at corelight.com (Bob Murphy)
Date: Sat, 18 Jul 2020 13:48:33 -0700
Subject: [Zeek-Dev] Proposal: Make Zeek's debug logging thread-safe
In-Reply-To: <20200717100138.GD43266@corelight.com>
References: <0E62D8DB-CD97-44D5-9173-02C7DD175320@corelight.com> <576BA5D9-0682-4619-B8FD-13D1BDAC4979@corelight.com> <9585DDC6-82DD-4F42-935D-08B6F4100C3B@corelight.com> <20200715075217.GF41059@corelight.com> <70CC6BD9-B068-4DBB-BD1F-D21208CA45CA@corelight.com> <20200717100138.GD43266@corelight.com>
Message-ID:

> On Jul 17, 2020, at 3:01 AM, Robin Sommer wrote:
>
> On Wed, Jul 15, 2020 at 14:57 -0700, Bob Murphy wrote:
>
>> use a single, common logging API, but let it send its output to
>> different output mechanisms that support different use cases.
>
> I get that in general. It's just that afaik this is the first time
> this need has come up. Adding a full-featured, thread-safe logging
> framework is a trade-off against complexity and maintenance costs.
> Not saying it's impossible, but I'd like to hear more people thinking
> this is a good idea before committing to such a route.
>
> Robin

I completely agree about that trade-off, which is why the work I've done so far is pretty simple. It doesn't change the existing DebugLogger system other than adding thread safety. Then on the side, there are a few optional features like a scoping utility class and some preprocessor macros.

That said, different developers have different debugging styles, and I'm a big fan of using feature-rich debug logging frameworks with multiple operating modes and destinations, because they let me fix bugs and write new code much faster than I could otherwise.

Writing a powerful debug logging system does take time and effort, but my experience has been that once it's finished, it usually doesn't require much ongoing maintenance. Working on open-source and commercial projects with lifetimes of more than a few years, I've always seen that time and effort pay for itself many, many times over by making it quicker and easier to diagnose bugs, write new features, and do performance enhancements. That's especially been true when I've worked on code that handled large volumes of data, like Zeek does.
If I need to track down a bug in a stream of data that doesn't manifest until megabytes have gone by, I usually find the quickest approach is to run the software and search for a diagnostic pattern in a gigantic log file, compared to other approaches like spending hours hitting the same debugger breakpoint over and over again.

- Bob

From johanna at icir.org Wed Jul 22 11:32:00 2020
From: johanna at icir.org (Johanna Amann)
Date: Wed, 22 Jul 2020 11:32:00 -0700
Subject: [Zeek-Dev] Zeek mailing list move (zeek.org -> lists.zeek.org)
Message-ID: <0DB62DEF-66AE-4553-820F-14BAED24F084@icir.org>

Hello everyone,

We are going to switch the zeek.org mailing lists to a new provider on Monday the 27th. This change means that the domain part of all zeek.org mailing lists is going to change from "zeek.org" to "lists.zeek.org".

What changes does this entail / what does this mean for you:

* All zeek.org mailing list domains will switch to lists.zeek.org. So, "zeek at zeek.org" will be "zeek at lists.zeek.org" afterwards. However, you will still be able to send messages to the old list address for the foreseeable future - they will automatically be forwarded to the new address. If you are using mailing list filters to automatically sort Zeek mailing lists into folders, you will probably have to update them.

* The mailing list archives and administrative interface will move to https://lists.zeek.org/. The old interface at http://mailman.icsi.berkeley.edu/mailman/listinfo will no longer be available; archives will also no longer be available at the old address.

* Your subscription will automatically move; you do not have to take any action.

When will this happen:

* This change will happen on Monday the 27th of July, starting at approximately 9am PDT/noon EDT/4pm GMT/5pm BST/6pm CEST. Messages sent to the Zeek mailing lists during this time will be held. We will try to make sure that any messages sent during this timeframe make it over after the migration, but your message will probably make it faster if you wait until we are done.

* The change will take a few hours; I will send another message to the individual lists once the migration is done.

Why are we moving the mailing lists:

The current setup that we are using is being retired and we have to switch to a new provider. We are switching to a new domain because this makes our setup easier to maintain.

If you have any questions or concerns, please let me know.

Johanna