Johanna, thanks for your patient and detailed replies.

I think it is the latter. For example, some external connections (say,
an HTTP GET/POST to server A, carrying a binary string with signature S1)
will trigger activity in local network components B and C, and that
traffic is associated with a signature S2. We would then like to *learn*
the tuple (HTTP_Get/Post_A, S1, B, C, S2) as a pattern signature. So
we are considering "tokenizing" the byte stream to extract and cluster the
strings from the raw payload.
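
To make the "tokenize" idea concrete, here is a rough Python sketch of what
I have in mind. It is only an illustration, not what we actually run: the
function names, the minimum length of 4, and the sample payload are all
made up, and the "clustering" here is just exact-match counting as a
stand-in for a real clustering step.

```python
import re
from collections import Counter
from typing import Iterable, List


def extract_strings(payload: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of printable ASCII bytes (0x20-0x7e) of at least min_len,
    similar in spirit to the Unix `strings` tool. min_len=4 is an assumption."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(payload)


def count_strings(payloads: Iterable[bytes], min_len: int = 4) -> Counter:
    """Count how often each extracted string occurs across payloads.
    Exact-match counting is a placeholder for a real clustering method."""
    counts: Counter = Counter()
    for payload in payloads:
        counts.update(extract_strings(payload, min_len))
    return counts


# Hypothetical payload: binary bytes mixed with readable fragments.
sample = (b"\x00\x01GET /index.html HTTP/1.1\r\nHost: a.example\r\n"
          b"\xff\xfe\x02abc\x00cmd=beacon&id=42\x00")
tokens = extract_strings(sample)
```

Strings that recur across many connections (high counts) would be the
candidates for S1 or S2; the short fragment "abc" is dropped because it is
below the length threshold.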
