Guildenstern has asked for the wisdom of the Perl Monks concerning the following question:

I am testing a product, and am trying to determine how secure the remote adminstration tool is. I have captured many packets while performing normal admin type tasks. I have this information in a text file, with only the data sections of each packet present. (From the UDP checksum to the end of the packet.) What I want to do is write a script that will check outgoing and incoming packets for repetitions of patterns, since there is supposed to be a "magic number" embedded in the packet to identify it as coming from the correct application, and to see how well encrypted the password is (username is sent cleartext!). What would be a good way to go about finding the largest sequence of bytes that show up in the largest number of packets? I've hacked quite a few scripts in my day, but they were pretty simple. I have a feeling that if I go at this without a little direction the results could be particularly ugly. Any suggestions (hashes, regexp, invocation of minor deities) are welcome.

Replies are listed 'Best First'.
(jjhorner)Finding patterns in packet data?
by jjhorner (Hermit) on Aug 04, 2000 at 21:26 UTC

    Have you ever looked at snort?

    How are you going to sniff the packets? tcpdump? Or are you working on a Windows box?

    I am not sure why you are looking for a magic number in the packet. The packet should have some port information, and the tool should be working on a port. Just sniff packets to the server port and to the client port.

    Let us know more info if you have it.

    UPDATE: I'm obliged to remind all that sniffing packets across a network that is not under your command is wrong and is considered computer abuse. You can be prosecuted. Don't try to break the security and don't try to make the administrator's job any harder.

    J. J. Horner
    Linux, Perl, Apache, Stronghold, Unix
Re: Finding patterns in packet data?
by lhoward (Vicar) on Aug 04, 2000 at 22:26 UTC
    For your original question, I'd try something like the following. (it is pretty rough, but its heart is in the right place).
    #!/usr/bin/perl -w use strict; my @packets=('abcdef','123456','abc123'); my $size; for $size(2..4){ print "size=$size\n"; my %substrs; my $packet; foreach $packet(@packets){ my %data; for(0..(length($packet)-$size)){ $data{substr($packet,$_,$size)}=1; } my $k; foreach $k(keys %data){ if(defined $substrs{$k}){ $substrs{$k}++; }else{ $substrs{$k}=1; } } } foreach((sort {$substrs{$b} <=> $substrs{$a}} keys %substrs)[0..5]){ print "$_ $substrs{$_}\n"; } }
    It isn't really efficient, but it will tell you which substrings of a particular length are most common across packets. It will tell you the most common substrings of a particular length. Answering your actual question "most common, longest substrings" is harder since you're trying to optomize 2 criteria at the same time. Which is better, a 5 character string that happens 20 times or a 20 character string tha happens 5 times?

    However, in thinking about your problem in general I'd do an analysis something like this:

    1. Do a statistical analysis of the raw data to determine if it is encrypted, and if it is encrypted well. If it is encrypted well, it will be statistically indistinguisable from random noise. If it is encrypted poorly it will be somewhat distinguishable from random noise. I used to have a good reference to some algorithms for performaning this kind of analysis but can't find them right now.
    2. Compare (by hand) the same transaction done several times from several different hosts. Can you pick anything out.
    3. Since you said these are UDP packets, can you "replay" them from a different host to cause the same event?
      Thanks for the start. I can always modify it for larger strings and see what happens. As far as your analysis steps:
      1. According to the vendor, it's encrypted. No word on how strong they beleive it is. I performed actions that caused known data to be sent, so it should be fairly simple to tell how strong the encryption is. The problem here is that I have no clue how to go about it.
      2. The software is set up such that we can only use the two boxes it's installed on. I have run the same actions several different times, however, and there does appear to be similarities in the packet data.
      3. Did not think of that, but it sounds like an interesting test. Will investigate how to do it.
One possible answer
by tilly (Archbishop) on Aug 04, 2000 at 22:49 UTC
    I had an OK idea for finding the patterns of maximum length that appear in at least n of the packets. This could be sped up immensely. You can also keep an array of the top packs seen and change this to, say, return the top 100 length patterns seen.

    Coding it was not as easy as I hoped. :-(

    use strict; # Takes the minimum count you are interested in and an array of string +s, # returns the list of patterns of the maximum size that can be found i +n # at least that threshold of the strings, sorted by number of strings. # sub find_max_pat { my $min = shift; # Sanity check return () unless $min <= scalar (@_); # %active_pats are the counts of the size of remaining active patt +erns my %active_pats = ('', scalar (@_)); # @searches contain search information for all strings my @searches = map {new Search::Strings($_)} @_; my %last_pats = (); while (%active_pats) { %last_pats = %active_pats; %active_pats = (); # Get counts at the next length foreach my $search (@searches) { foreach my $pat ($search->inc(\%last_pats)) { ++$active_pats{$pat}; } } # Remove patterns below our threshold foreach my $pat (keys %active_pats) { delete $active_pats{$pat} unless ($min <= $active_pats{$pa +t}); } } return keys %last_pats; } package Search::Strings; use strict; sub inc { my $self = shift; my $is_cont = shift; my $str = $self->{str}; my $len = length($str); my $pat_len = ++$self->{pat_len}; my %cur_pat; foreach my $pat (keys %{$self->{pats}}) { next unless exists $is_cont->{$pat}; my $spots = $self->{pats}{$pat}; if ($len < $pat_len + $spots->[-1]) { # This pattern reached the end of the string pop @$spots; } foreach my $spot (@$spots) { push @{$cur_pat{ substr($str, $spot, $pat_len) }}, $spot; } } $self->{pats} = \%cur_pat; return keys %cur_pat; } sub new { my $class = shift; my $self = {}; $self->{str} = shift; $self->{pat_len} = 0; my %pats = ('', [0..(length($self->{str}) - 1)]); $self->{pats} = \%pats; return bless $self, $class; }
Re: Finding patterns in packet data?
by young perlhopper (Scribe) on Aug 04, 2000 at 23:00 UTC
    Some researches at IBM (i think) developed an algorithm called 'teresias' (named for the blind seer of greek mythology), the purpose of which was to discover all the maximal patterns present in several protein chains. It was successfully adapted by a group somewhere to recognize known network intrusion attempts. I have read most of the paper that the IBM group wrote, and it sounds to me like your problem could be solved by the same approach.

    I don't know where to find the paper or any online information off-hand, but if you email me at I will dig up the paper and let you know what journal it appeared in.

    Good luck,

(Guildenstern) Re: Finding patterns in packet data?
by Guildenstern (Deacon) on Aug 04, 2000 at 21:33 UTC
    I have already sniffed the packets using Win2K's built in network monitor. I have text files that contain the byte information of each packet sent between the boxes. The application that I am testing claims to insert a special value into each packet so that the receiving application knows it came from the correct source. So, basically what I have is a lot of hex data that I'm trying to compare to see if I can determine what the "magic number" is, if it changes between runs, etc. Basically the problem is - given several sets of hex data, how do I find the patterns that show up in the largets number of sets?

      You should be able to tell where the number will be in the packet. Once you do, you will have to either search the hex data for the number at that substr, or unpack and search the new data.

      J. J. Horner
      Linux, Perl, Apache, Stronghold, Unix
Re: Finding patterns in packet data?
by Anonymous Monk on Aug 04, 2000 at 22:37 UTC
    Ok. Maybe I'm not being clear enough about what I'm trying to do. Here's an example of what I've got:
    1C 30 00 04 E9 C9 00 06 81 00 00 5C 2C 9F 3F 84 94 81 36 B9 00 00 00 00 00 00 00 00 61 64 6D 69 6E 00 00 00 00 00 00 00 00 00 00 00

    This is the part of the UDP packet immediately after the UDP checksum with the trailing padding removed. This is, then, the meat of the packet. What I'm looking for is an unknown length value in this meat that is also in a large percentage of all the other meats. It won't be in all, because there will be a few challenge/response packet, etc.
    Do I compare packet A to B and see what matches, then A to C, A to D, etc and see what I get? This seems to be very cumbersome.
      My answer (above) works as follows:
      For every length you want to consider For each packet for each substring of length N increment count of N's occurrence by 1 Look at all the data you've amassed about which substrings occur with what frequency and spit out some data.
      Unless you specify what your "length vs frequency" preference is you can not get a generic answer. If what you are looking for is "what are the largest, most common substrings that occur in more than %90 of the packets" then you can do it.
RE: Finding patterns in packet data?
by eduardo (Curate) on Aug 04, 2000 at 22:45 UTC
    ok, i'm not thinking well enough right now to do this, but shouldn't it be possible to turn the info into a vec and then use the bitwise AND operator and get the similarities in the packet? again, i'm too tired to try to implement it... maverick why don't you give it a try?
(Guildenstern) Re: Finding patterns (prelim code)
by Guildenstern (Deacon) on Aug 07, 2000 at 18:58 UTC
    Here's what I've come up with so far. Basically I iterate over each packet and generate substrings from size 2 to the size of the packet. I then compare that against all of the other packets. It needs some tweaking in that area, but I get output that makes sense. Only problem is that on my machine, running it with large packets kills the OS after an hour or so. Apparently I need to make some memory optimizations somewhere. All comments are appreciated! Windows environ - no #!
    die "No file name" unless @ARGV; my @packets = (); while (<>) { push(@packets,$_); } chop(@packets); my $maxlen =0; my $index = 0; my $curpack; my %all; # iterate over all packets foreach $curpack (@packets) { $maxlen = length($curpack); # substring size for (my $fsize = 2; $fsize < $maxlen; $fsize ++) { # packet to compare against for (my $pnum = 0; $pnum < @packets; $pnum ++) { # start substr'ing at index 0 for (0..$maxlen-$fsize) { # don't compare against self if ($index != $pnum) { my $str = substr($curpack,$_,$fsize); my @temparr = ($packets[$pnum] =~ /$str/g); my $nmatch = @temparr; if (defined($all{$str})) { $all{$str} += $nmatch; } else { $all{$str} = $nmatch; } undef @temparr; } } } } $index ++; # print results for current packet print "Comparing packet ",$index," to all others:\n"; my $value; my @sort = sort {length($b) <=> length($a)} keys %all; foreach $value (@sort) { if ($all{$value} > 0) { print "$value: $all{$value} times\n"; } delete($all{$value}); } undef @sort; } exit;