[Bro] Detecting software components that do strange DNS queries
C. L. Martinez
carlopmart at gmail.com
Fri Mar 22 08:06:02 PDT 2013
Vlad Grigorescu <vladg at cmu.edu> wrote:
> You can do character frequency analysis with a simple Bro script. Look at <http://www.bro.org/documentation-git/scripts/base/strings.bif.html> to see the functions you can use for strings.
>
> I think that this is asking the wrong question, however. I'd be amazed if you could reliably determine "good" domains from "bad" domains based simply on character frequency analysis. Bro can calculate entropy for you: <http://www.bro.org/documentation/scripts/base/bro.bif.html#id-find_entropy>. That being said, I don't think entropy is the right answer either.
>
> Here are the entropy results (in no particular order) for the 4 domains you listed and for 4 very common domains (google.com, twitter.com, fbcdn.net and amazon.co.uk):
>
> [entropy=2.646439, chi_square=450.8, mean=100.2, monte_carlo_pi=4.0, serial_correlation=0.096875]
> [entropy=3.085055, chi_square=400.538462, mean=104.692308, monte_carlo_pi=4.0, serial_correlation=-0.005991]
> [entropy=3.095795, chi_square=338.090909, mean=106.727273, monte_carlo_pi=4.0, serial_correlation=0.062381]
> [entropy=3.027169, chi_square=384.636364, mean=104.727273, monte_carlo_pi=4.0, serial_correlation=0.011643]
> [entropy=3.182006, chi_square=424.857143, mean=105.5, monte_carlo_pi=4.0, serial_correlation=-0.050923]
> [entropy=2.947703, chi_square=303.888889, mean=98.0, monte_carlo_pi=4.0, serial_correlation=-0.316796]
> [entropy=3.084963, chi_square=372.0, mean=97.666667, monte_carlo_pi=4.0, serial_correlation=-0.248104]
> [entropy=2.845351, chi_square=431.181818, mean=102.818182, monte_carlo_pi=4.0, serial_correlation=-0.322755]
>
> I don't know about you, but I can't tell which are good and which are bad. I suspect that DNS names are too short a sample to provide any meaningful data.
>
> I think you should focus instead on the behavior that you're trying to detect. Looking at your example below, some alerts that'd be more useful might be:
>
> - Too many NXDOMAIN queries.
> - A query that resolves to an ISC sinkhole.
> - Queries for a domain that no one else queried.
> - Repetitive queries every X seconds with little to no deviation.
> - Queries for a domain that you haven't seen before.
>
> Hope this helps,
>
> --Vlad
>
Many thanks, Vlad, for your explanation. I'll think about it this weekend.
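
For anyone following the thread, here is a minimal sketch of the entropy comparison quoted above, written in Python rather than as a Bro script. It assumes the entropy figure is plain Shannon entropy in bits per byte over the domain string; the function name shannon_entropy is made up for illustration and is not part of Bro. Under that assumption, "google.com" gives about 2.646 and "twitter.com" about 3.027, which agree with two of the quoted values.

```python
import math
from collections import Counter


def shannon_entropy(name: str) -> float:
    """Shannon entropy of a string, in bits per byte (hypothetical helper)."""
    data = name.encode("utf-8")
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # The four common domains mentioned in the quoted message.
    for domain in ("google.com", "twitter.com", "fbcdn.net", "amazon.co.uk"):
        print(f"{domain:15s} entropy={shannon_entropy(domain):.6f}")
```

With only 10 to 12 bytes per name, the possible entropy values sit in a narrow band, which is consistent with the point that the sample is too short to separate good names from bad ones.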

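Three of the behavioral checks suggested above (too many NXDOMAIN answers, domains that only one client queried, and repetitive queries at a near-fixed interval) can be roughed out offline in the same way. This is only a sketch under stated assumptions: the DnsRecord layout, the function names, and the thresholds are all hypothetical, and the records would have to be parsed from whatever DNS log is available. It is not how Bro itself would implement these alerts.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev
from typing import NamedTuple


class DnsRecord(NamedTuple):
    """Hypothetical, simplified DNS log entry."""
    ts: float      # query time in seconds
    client: str    # querying host
    query: str     # queried name
    rcode: str     # response code name, e.g. "NOERROR" or "NXDOMAIN"


def nxdomain_heavy_clients(records, threshold=3):
    """Clients with more than `threshold` NXDOMAIN answers."""
    counts = Counter(r.client for r in records if r.rcode == "NXDOMAIN")
    return {client: n for client, n in counts.items() if n > threshold}


def single_client_domains(records):
    """Domains queried by exactly one client, mapped to that client."""
    clients = defaultdict(set)
    for r in records:
        clients[r.query].add(r.client)
    return {q: next(iter(cs)) for q, cs in clients.items() if len(cs) == 1}


def periodic_queries(records, min_count=4, max_jitter=1.0):
    """(client, query) pairs repeated at a near-constant interval.

    A pair is reported when it occurs at least `min_count` times and the
    standard deviation of the gaps between queries is at most
    `max_jitter` seconds. The value is the mean gap in seconds.
    """
    times = defaultdict(list)
    for r in records:
        times[(r.client, r.query)].append(r.ts)
    hits = {}
    for key, stamps in times.items():
        if len(stamps) < min_count:
            continue
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        if pstdev(gaps) <= max_jitter:
            hits[key] = mean(gaps)
    return hits
```

The thresholds (3 NXDOMAIN answers, 4 repeats, 1 second of jitter) are arbitrary placeholders and would need tuning against real traffic.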