[Bro-Dev] Package manager meta data

Sat Oct 29 11:28:02 PDT 2016

I'll just jump in at the end of the thread: Switching to fully
self-describing packages sounds great to me; that should solve all the
issues I noticed. I also don't quite recall the reasoning for arriving
at the current scheme, but it was probably a combination of iterating
over the design a few times, along with a desire to keep it simple.
But having something like a cronjob, or git hook, trigger a rebuild of
a central cache seems easy enough, and would be a major usability
improvement.

I believe the main thing to consider is making it really easy for
package sources (in particular external ones not maintained by the
project) to run the meta data aggregation. Maybe that additional "git
commit && push" could even be integrated into an additional
server-side bro-pkg command. One could then drive that from either
cron or git hooks (if a source operator can use hooks, that will avoid
any delays at all).
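
As a rough illustration of the aggregation step, here is a minimal
sketch. It is not bro-pkg's actual code: the names aggregate_metadata,
render_aggregate, and the fetch callback, as well as the JSON output
format, are assumptions made up for this example. The callback stands in
for whatever would clone a package and read its metadata file; returning
None models a package that is no longer reachable, so such packages drop
out of the aggregate automatically.

```python
import json
from typing import Callable, Dict, Iterable, Optional

# Hypothetical fetcher type: given a package URL, return that package's
# metadata as a dict, or None if the package can no longer be reached.
Fetcher = Callable[[str], Optional[Dict[str, str]]]


def aggregate_metadata(
    package_urls: Iterable[str], fetch: Fetcher
) -> Dict[str, Dict[str, str]]:
    """Collect metadata for every reachable package, keyed by URL."""
    aggregated: Dict[str, Dict[str, str]] = {}
    for url in package_urls:
        meta = fetch(url)
        if meta is None:
            # Unreachable packages are left out of the aggregate.
            continue
        aggregated[url] = meta
    return aggregated


def render_aggregate(aggregated: Dict[str, Dict[str, str]]) -> str:
    """Serialize the aggregate in a stable order (assumed JSON format)."""
    return json.dumps(aggregated, indent=2, sort_keys=True)
```

A cron job or git hook on the package source could run something like
this, write the rendered output to a file in the source's repository,
and then do the "git commit && push".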

Robin

On Sat, Oct 29, 2016 at 17:01 +0000, you wrote:

> 
> > On Oct 28, 2016, at 5:52 PM, Jan Grashöfer <jan.grashoefer at gmail.com> wrote:
> > 
> > Correct me if I am wrong
> > but bro-pkg.meta contains stuff like script_dir and dependencies (so
> > rather technically), whereas bro-pkg.index contains the descriptive
> > information like info text and tags (which is metadata, too, one could
> > even argue it's "more meta" than script_dir etc.).
> 
> That’s right.  The way I was thinking about how it’s split up is: if the metadata is related to how users will search for and discover new packages, then put it in bro-pkg.index.  Else it’s likely related to how the package will interoperate with bro, bro-pkg, other packages, etc., and that goes in bro-pkg.meta.
> 
> > I think the most desirable solution would be to have a
> > single file to put the meta data in, so that a package is completely
> > self-describing. This would also allow providing different descriptions
> > for different versions.
> 
> Yes, I also think each package maintaining just its own, single metadata file is better.  It also means that if the package author ever registered their package with multiple sources, they don’t have to maintain the same bro-pkg.index in multiple places.
> 
> I don’t remember if we just settled on the current implementation because it was quick/easy or because there were objections to other, more complicated technical solutions.
> 
> > Regarding the technical solution, I'll try to sum up: Using a
> > distributed structure implies that important information is distributed,
> > too. I think the first question is, where to aggregate the information?
> > One could either maintain a cache in every client or integrate it into
> > the list of packages aka the public repository
> 
> Aggregating it into the package source is a better solution than having every client do it.  The latter isn’t going to scale well:  the client will take longer and longer over time as more and more packages get registered to a source.  It also takes longer as a function of the total number of release versions a package has, because we are collecting metadata for each version.  I’d rather not ask users to just get used to developing more patience over time.
> 
> > The second question would be, whether and how to synchronize the
> > information? If the info is part of the repository this can be either
> > done manually (more or less the overriding solution of the current
> > implementation, assuming that the developers keep meta data in sync) or
> > automatically (e.g., by a script that fetches meta data of packages once
> > a day).
> 
> I’d opt for a daily cron job to aggregate metadata into package sources.
> 
> > If the cache is part of the client, this could be done based on
> > an expiration threshold or intentionally by the user (similar to dnf).
> > Finally, one could drop the requirement of synced package and repository
> > meta data, at the risk of confusing users. In that case the information
> > contained in the package should be used whenever possible (e.g., the
> > info command for a not installed package could obtain the most recent
> > information from the package's git repo).
> 
> It’s not a problem for the metadata to be out of sync for a day since only the “search” command is going to be using the aggregated data.  Other commands would have direct access to accurate metadata since they’ve already cloned the package locally.
> 
> It would also be trivial to give users access to the aggregation tool if they have a problem with potentially using day-old metadata in their searches and are prepared to wait however long the aggregation process takes.
> 
> E.g. we add this command/flag: `bro-pkg refresh --aggregate-metadata`
> 
> Then the only difference between the daily aggregation process and a user is that the daily process does a `git commit && git push` in the locally cloned package source that bro-pkg is using internally.
> 
> > Another question: Now that repositories only contain bro-pkg.index files
> > with links instead of submodules, how are deleted/unavailable packages
> > detected/removed?
> 
> At the moment, they’d have to be removed manually whenever someone notices or reports it.
> 
> If we switch to automated metadata aggregation, removal of nonexistent packages could naturally be a part of that.
> 
> - Jon

-- 
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin