
For your original question, I'd try something like the following (it's pretty rough, but its heart is in the right place):
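```perl
#!/usr/bin/perl
use strict;
use warnings;

my @packets = ('abcdef', '123456', 'abc123');
my $top     = 6;    # how many of the most common substrings to print

for my $size (2 .. 4) {
    print "size=$size\n";
    my %substrs;    # substring => number of packets containing it
    foreach my $packet (@packets) {
        # Record each distinct substring of this length once per packet.
        my %data;
        for my $pos (0 .. length($packet) - $size) {
            $data{ substr($packet, $pos, $size) } = 1;
        }
        $substrs{$_}++ for keys %data;
    }
    # Most common first; ties broken alphabetically so output is repeatable.
    my @sorted = sort { $substrs{$b} <=> $substrs{$a} or $a cmp $b }
                 keys %substrs;
    # Print at most $top entries, even if fewer substrings exist.
    my @best = @sorted > $top ? @sorted[0 .. $top - 1] : @sorted;
    print "$_ $substrs{$_}\n" for @best;
}
```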
It isn't very efficient, but it will tell you which substrings of a given length occur in the most packets. Answering your actual question, "most common, longest substrings", is harder because you're trying to optimize two criteria at the same time. Which is better: a 5-character string that occurs 20 times, or a 20-character string that occurs 5 times?
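One way to collapse the two criteria into a single ranking, and it is only an assumption about what "better" means for your data, is to score each repeated substring by its length times the number of packets it appears in, roughly the number of payload bytes it would explain. Under that score the two strings in the question tie (5 × 20 = 20 × 5 = 100), which shows why the weighting is a judgment call. In this sketch the function name `score_substrings` and its `$min_len`/`$min_count` thresholds are hypothetical; it enumerates every substring of every length, so it is quadratic in packet length and only practical for short payloads or a sample of packets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical scoring: rank every substring of at least $min_len characters
# that occurs in at least $min_count packets by length * packet count.
# Each substring is counted at most once per packet.
sub score_substrings {
    my ($packets, $min_len, $min_count) = @_;
    my %count;    # substring => number of packets containing it
    for my $packet (@$packets) {
        my %seen;    # distinct substrings of this packet
        for my $len ($min_len .. length $packet) {
            for my $pos (0 .. length($packet) - $len) {
                $seen{ substr($packet, $pos, $len) } = 1;
            }
        }
        $count{$_}++ for keys %seen;
    }
    # Each result is [substring, packet count, score].
    my @ranked = map  { [ $_, $count{$_}, length($_) * $count{$_} ] }
                 grep { $count{$_} >= $min_count } keys %count;
    # Highest score first; ties go to the longer string, then alphabetical.
    return sort {
           $b->[2] <=> $a->[2]
        or length($b->[0]) <=> length($a->[0])
        or $a->[0] cmp $b->[0]
    } @ranked;
}

my @packets = ('abcdef', '123456', 'abc123');
for my $r (score_substrings(\@packets, 2, 2)) {
    my ($str, $n, $score) = @$r;
    print "$str: in $n packets, score $score\n";
}
```

With the sample packets from the original script, "123" and "abc" tie for first place with a score of 6 (3 characters × 2 packets). Swapping in a different score, for example weighting length more heavily, only means changing the third element of each result.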

More generally, though, I'd approach your problem with an analysis something like this:

  1. Do a statistical analysis of the raw data to determine whether it is encrypted and, if so, whether it is encrypted well. Well-encrypted data is statistically indistinguishable from random noise; poorly encrypted data will be somewhat distinguishable from it. I used to have a good reference to some algorithms for performing this kind of analysis but can't find it right now.
  2. Compare (by hand) the same transaction done several times from several different hosts. Can you pick anything out?
  3. Since you said these are UDP packets, can you "replay" them from a different host to cause the same event?
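As one commonly used quick check for step 1 (not necessarily what the missing reference described), you can look at the byte-frequency distribution of the payloads. For uniformly random bytes the Shannon entropy approaches 8 bits per byte and the chi-square statistic against a uniform distribution lands near 255, its degrees of freedom; values far above that mean some byte values are noticeably over- or under-represented. This sketch assumes each payload has been saved as a raw binary file (the file names and the `byte_stats` helper are hypothetical), and it needs a few kilobytes of data before the numbers mean much. Passing the test does not prove strong encryption (compressed data passes too), but failing it badly suggests weak or no encryption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Byte-frequency statistics for a buffer of raw bytes.
# Returns (Shannon entropy in bits per byte,
#          chi-square statistic against a uniform distribution).
sub byte_stats {
    my ($data) = @_;
    my $n = length $data;
    die "byte_stats: empty input\n" unless $n;

    my @freq = (0) x 256;
    $freq[$_]++ for unpack 'C*', $data;

    my $expected = $n / 256;    # count per byte value if uniform
    my ($entropy, $chi2) = (0, 0);
    for my $f (@freq) {
        $chi2 += ($f - $expected) ** 2 / $expected;
        next unless $f;         # 0 * log(0) is taken as 0
        my $p = $f / $n;
        $entropy -= $p * log($p) / log(2);
    }
    return ($entropy, $chi2);
}

# Usage: perl bytestats.pl payload1.bin payload2.bin ...
# (each file is assumed to hold one raw packet payload)
if (@ARGV) {
    my $buf = '';
    for my $file (@ARGV) {
        open my $fh, '<:raw', $file or die "Can't open $file: $!\n";
        local $/;               # slurp the whole file
        $buf .= <$fh> // '';
        close $fh;
    }
    my ($h, $chi2) = byte_stats($buf);
    printf "%d bytes: entropy %.3f bits/byte, chi-square %.1f (255 d.f.)\n",
        length $buf, $h, $chi2;
}
```

Concatenating all captured payloads into one buffer gives the test more data to work with; running it per packet is mostly noise unless the packets are large.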
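For step 2, a small helper can do the tedious part of the by-hand comparison: line up several captures of the same transaction and mark which byte offsets never change (likely fixed headers, opcodes, or identifiers) and which vary (counters, timestamps, or encrypted data). This is a sketch under stated assumptions: each capture is a raw payload file, payloads are aligned at offset 0, and only offsets up to the shortest payload are compared. The `constant_mask` helper and the file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Given several payloads of the same transaction, return one entry per byte
# offset (up to the shortest payload): the byte as two hex digits if it is
# identical in every payload, or '..' if it varies between captures.
sub constant_mask {
    my @payloads = @_;
    die "constant_mask: need at least two payloads\n" unless @payloads >= 2;
    my $len = min map { length($_) } @payloads;

    my @mask;
    for my $i (0 .. $len - 1) {
        my %seen;
        $seen{ substr($_, $i, 1) }++ for @payloads;
        push @mask, keys(%seen) == 1
            ? sprintf('%02x', ord substr($payloads[0], $i, 1))
            : '..';
    }
    return @mask;
}

# Usage: perl cmpcaptures.pl run1.bin run2.bin run3.bin
# (each file is assumed to hold one raw payload of the same transaction)
if (@ARGV >= 2) {
    my @payloads;
    for my $file (@ARGV) {
        open my $fh, '<:raw', $file or die "Can't open $file: $!\n";
        local $/;               # slurp the whole file
        push @payloads, scalar(<$fh>) // '';
        close $fh;
    }
    my @mask = constant_mask(@payloads);
    my $off  = 0;
    while (my @row = splice(@mask, 0, 16)) {    # 16 bytes per output line
        printf "%04x  %s\n", $off, join(' ', @row);
        $off += 16;
    }
}
```

Running it once on captures from a single host and again on captures from different hosts helps separate per-host values from per-transaction ones.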

In reply to Re: Finding patterns in packet data? by lhoward
in thread Finding patterns in packet data? by Guildenstern
