[Bro-Dev] Package manager meta data

Siwek, Jon jsiwek at illinois.edu
Mon Oct 31 11:52:41 PDT 2016


> On Oct 31, 2016, at 12:41 PM, Jan Grashöfer <jan.grashoefer at gmail.com> wrote:
> 
>> Theoretically, if the package was just temporarily unavailable, the next time the aggregation process runs, it would get listed again
> 
> How, if it is completely removed?

Oh, duh, I see what you mean.  I guess the answer is related to something we haven’t yet spec’d out: how should the structure of a package source’s index files change to adapt to the new scheme of aggregating metadata?

A package source could look like:

https://github.com/bro/packages
	0xxon/
		packages.index
		bro-sumstats-counttable.meta
	sethhall/
		packages.index
		credit-card-exposure.meta
		ssn-exposure.meta
		domain-tld.meta

Contents of sethhall/packages.index:

	https://github.com/sethhall/credit-card-exposure
	https://github.com/sethhall/ssn-exposure
	https://github.com/sethhall/domain-tld

Contents of sethhall/ssn-exposure.meta:

	# Automatically generated, do not edit.
	[master]
	url = https://github.com/sethhall/ssn-exposure
	tags = file analysis, social security number, ssn, dlp, data loss
	description = Detect and log US Social Security numbers.
	script_dir = scripts

	[1.0.0]
	…

	[2.0.0]
	…
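Since the *.meta layout above is INI-like, a consumer could probably read it with an off-the-shelf parser. A minimal sketch in Python, assuming the standard configparser module handles the format as shown; the function name parse_meta and the sample text are illustrative only, and the version sections are left empty because their contents are elided above:

```python
import configparser

# Hypothetical sample modeled on the ssn-exposure.meta layout above.
# The [1.0.0] and [2.0.0] sections are empty because the original
# message elides their contents.
SAMPLE_META = """\
# Automatically generated, do not edit.
[master]
url = https://github.com/sethhall/ssn-exposure
tags = file analysis, social security number, ssn, dlp, data loss
description = Detect and log US Social Security numbers.
script_dir = scripts

[1.0.0]

[2.0.0]
"""


def parse_meta(text):
    """Return {version: {field: value}} for one *.meta file.

    The "tags" field, when present, is split on commas into a list.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    result = {}
    for version in parser.sections():
        fields = dict(parser[version])
        if "tags" in fields:
            fields["tags"] = [t.strip() for t in fields["tags"].split(",")]
        result[version] = fields
    return result


meta = parse_meta(SAMPLE_META)
```

Keeping one section per version means a client can look up metadata for whichever version it has installed without re-crawling the package.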

The packages.index files are manually modified by users during the act of package registration.  The *.meta files are automatically created by the metadata aggregation process as it crawls the URLs listed in packages.index.

If a package is in packages.index, we say that its state is “registered”.  Then, once it has a *.meta file, we say that its state is “listed”.  If a package is “listed”, then bro-pkg users can see it show up from “search” and “list” commands.  If the metadata aggregation process finds an invalid/unreachable package, it removes its *.meta file, but keeps it “registered” in packages.index, so the next crawl will still attempt to list the package in case it was just temporarily unavailable.
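To make the registered/listed distinction concrete, here is a rough sketch of one crawl pass in Python. Nothing here is an existing bro-pkg API: crawl, fetch_metadata, and fake_fetch are hypothetical names, and the fetch step is stubbed out so the example runs without network access.

```python
def crawl(registered_urls, fetch_metadata):
    """Run one aggregation pass and return {url: metadata} for listed packages.

    registered_urls: URLs read from packages.index; never modified here,
        so every package stays "registered" regardless of the outcome.
    fetch_metadata: hypothetical callable returning metadata text, or
        None when the package is invalid or unreachable.
    """
    listed = {}
    for url in registered_urls:
        metadata = fetch_metadata(url)
        if metadata is not None:
            listed[url] = metadata
        # Otherwise no *.meta entry is produced for this pass, which
        # corresponds to removing the package's *.meta file.
    return listed


# Stubbed example: the second package is unreachable on the first crawl
# and reachable again on the second.
registered = [
    "https://github.com/sethhall/ssn-exposure",
    "https://github.com/sethhall/domain-tld",
]
reachable = {registered[0]}


def fake_fetch(url):
    return "[master]\nurl = " + url if url in reachable else None


first = crawl(registered, fake_fetch)
reachable.add(registered[1])
second = crawl(registered, fake_fetch)
```

After the first pass only ssn-exposure is listed; after the second pass both are, because the registration in packages.index was never touched.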

Thoughts?  Is it useful to collect metadata for each version or just the latest?  “Latest” here would mean the latest release version tag or, if none exist, the latest master branch commit.
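One possible reading of that “latest” rule, sketched in Python. The tag format is an assumption (three dotted numbers with an optional leading "v"), the function name latest_version is made up, and returning the string "master" stands in for "the latest master branch commit":

```python
import re

# Assumed release tag shape: MAJOR.MINOR.PATCH, optionally prefixed with "v".
_RELEASE_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_version(tags):
    """Return the highest release tag, or "master" if there are none.

    Tags are compared numerically, component by component, so 1.10.0
    sorts above 1.2.0. Tags that do not look like releases are ignored.
    """
    releases = []
    for tag in tags:
        match = _RELEASE_TAG.match(tag)
        if match:
            key = tuple(int(part) for part in match.groups())
            releases.append((key, tag))
    if not releases:
        return "master"
    return max(releases)[1]
```

For example, latest_version(["1.0.0", "2.0.0"]) picks "2.0.0", while a package with no release tags falls back to "master".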

If per-version metadata collection isn’t needed, the structure outlined above still works, but the existing structure would also work: just stick the latest metadata directly into bro-pkg.index (mixing autogenerated data w/ user-entered data).

- Jon


