PerlMonks  

analyzing data

by matt00perl (Novice)
on Apr 23, 2014 at 08:22 UTC

matt00perl has asked for the wisdom of the Perl Monks concerning the following question:

I have some data that I got from analysing a tcpdump file. The result is below.

The first column is the time, followed by src_mac, dest_mac, src_ip and src_port, then dest_ip and dest_port.

Traffic from one source IP to one destination IP appears in several rows that carry the same information and differ only slightly in time. Instead of displaying all of these rows, I would like to loop through the file and, where the destination IP is the same, record the start time and the end time, take the difference, and print just one row with that difference.

My result at the moment

03-23 00:37:28.174515 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49671 | 180.149.153.11 | 80
03-23 00:37:28.174536 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49671 | 180.149.153.11 | 80
03-23 00:41:36.422588 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49672 | 180.149.153.11 | 80
03-23 00:44:18.584080 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49671 | 180.149.153.11 | 80
03-23 00:44:22.588592 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 35660 | 180.149.134.61 | 80
03-23 00:45:12.636571 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 35661 | 180.149.134.61 | 80

What I am expecting instead is:

(00:44:22 - 00:37:28) | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 35661 | 180.149.134.61 | 80

Any help will be appreciated. Thank you.

Replies are listed 'Best First'.
Re: analyzing data
by salva (Canon) on Apr 23, 2014 at 08:44 UTC
    So, how would you do it by hand, using just pencil and paper?
      Second that. In your example, src_port varies, and so does dest_ip. In the result row, you are using the src_port of the 6th row, although it looks like this row is not used in the result because its dest_ip is different. Also, what about the times: are they truncated or rounded, and how? Can we assume that all entries are sorted by time? The first step is to actually specify what you want to do.

        They are not sorted by time. All I want is to show how long one src_ip spent on a particular dest_ip.

      How can I ++ this comment more than once?

      This is exactly what I recommend people do when I want to lead them to learn to program.

      I would take the end time minus the start time, which gives the difference. Is that what you mean?

        Well, the details are important!

        How do you find the start and end times? How do you know you have covered all the entries?

        When writing a program, the first thing you need to do is find a precise way to solve the problem. Then you can think about how to translate that into Perl (or any other language).

Re: analyzing data - if I understand the question
by Discipulus (Canon) on Apr 23, 2014 at 11:50 UTC
    Hello matt00perl and welcome,

    Next time, be as precise as you can about what you have and what you expect because, as discussed in the chat some hours ago, I think your expected output is misstated.

    In any case, if this can help you: for uniqueness I suggest using a hash.

    #!perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %occur;
    my $data_pos = tell DATA;    # save the position, for later use

    while (<DATA>) {
        chomp;                   # eliminates the newline
        s/\s*<p>\s*//g;          # remove tags and unnecessary whitespace
        my ($time, $src_mac, $dest_mac, $src_ip, $src_port, $dest_ip, $dest_port)
            = split /\s\|\s/, $_;

        # check lesser time
        # be AWARE of the poor time comparison implementation: it may be better
        # to transform each time into seconds from the epoch, do the comparison
        # numerically (ie: < or > instead of lt gt), then reconvert to what you want
        if (defined $occur{$dest_ip}{'mintime'}) {
            $occur{$dest_ip}{'mintime'} = $time if $time lt $occur{$dest_ip}{'mintime'};
        }
        else { $occur{$dest_ip}{'mintime'} = $time }

        # check greater time
        if (defined $occur{$dest_ip}{'maxtime'}) {
            $occur{$dest_ip}{'maxtime'} = $time if $time gt $occur{$dest_ip}{'maxtime'};
        }
        else { $occur{$dest_ip}{'maxtime'} = $time }

        # you can save in the hash entry other fields you may need..
        # $occur{$dest_ip}{'src_mac'} = $src_mac; and so on..
    }
    print Dumper(\%occur);

    # or, to be precise, we need unique connections i think
    undef %occur;
    seek DATA, $data_pos, 0;     # rewind DATA

    while (<DATA>) {
        chomp;
        s/\s*<p>\s*//g;
        my ($time, $src_mac, $dest_mac, $src_ip, $src_port, $dest_ip, $dest_port)
            = split /\s\|\s/, $_;

        # change only the hash key creation
        my $connection = 'from_' . $src_ip . '_to_' . $dest_ip . '_port_' . $dest_port;

        # all the same now
        if (defined $occur{$connection}{'mintime'}) {
            $occur{$connection}{'mintime'} = $time if $time lt $occur{$connection}{'mintime'};
        }
        else { $occur{$connection}{'mintime'} = $time }

        # check greater time
        if (defined $occur{$connection}{'maxtime'}) {
            $occur{$connection}{'maxtime'} = $time if $time gt $occur{$connection}{'maxtime'};
        }
        else { $occur{$connection}{'maxtime'} = $time }
    }
    print Dumper(\%occur);

    __DATA__
    <p> 03-23 00:37:28.174515 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49671 | 180.149.153.11 | 80 <p>
    <p> 03-23 00:37:28.174536 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49671 | 180.149.153.11 | 80 <p>
    <p> 03-23 00:41:36.422588 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49672 | 180.149.153.11 | 80 <p>
    <p> 03-23 00:44:18.584080 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 49671 | 180.149.153.11 | 80 <p>
    <p> 03-23 00:44:22.588592 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 35660 | 180.149.134.61 | 80 <p>
    <p> 03-23 00:45:12.636571 | 8ca982044d00 | c04a00332142 | 192.168.1.100 | 35661 | 180.149.134.61 | 80 <p>

    ####OUTPUT

    $VAR1 = {
        '180.149.153.11' => {
            'maxtime' => '03-23 00:44:18.584080',
            'mintime' => '03-23 00:37:28.174515'
        },
        '180.149.134.61' => {
            'maxtime' => '03-23 00:45:12.636571',
            'mintime' => '03-23 00:44:22.588592'
        }
    };
    $VAR1 = {
        'from_192.168.1.100_to_180.149.153.11_port_80' => {
            'maxtime' => '03-23 00:44:18.584080',
            'mintime' => '03-23 00:37:28.174515'
        },
        'from_192.168.1.100_to_180.149.134.61_port_80' => {
            'maxtime' => '03-23 00:45:12.636571',
            'mintime' => '03-23 00:44:22.588592'
        }
    };
    HtH
    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
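
    The code comments above warn that comparing the timestamps as strings is fragile and suggest converting each one to seconds since the epoch. A minimal sketch of that conversion might look like the following. The helper names to_epoch and hms are made up for illustration, and because the timestamps in the thread carry no year, the year used here is purely an assumption.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumption: the capture timestamps have no year, so one is supplied here.
my $ASSUMED_YEAR = 2014;

# Convert "MM-DD HH:MM:SS.ffffff" to fractional seconds since the epoch (UTC).
sub to_epoch {
    my ($stamp) = @_;
    my ($mon, $day, $h, $m, $s, $frac) =
        $stamp =~ /^(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)(?:\.(\d+))?$/
        or die "unrecognised timestamp: $stamp";
    my $secs = timegm($s, $m, $h, $day, $mon - 1, $ASSUMED_YEAR);
    return $secs + (defined $frac ? "0.$frac" : 0);
}

# Format a number of seconds as HH:MM:SS (the fraction is dropped).
sub hms {
    my ($secs) = @_;
    $secs = int $secs;
    return sprintf '%02d:%02d:%02d',
        int($secs / 3600), int($secs % 3600 / 60), $secs % 60;
}

# First and last timestamps seen for 180.149.153.11 in the sample data.
my $min = to_epoch('03-23 00:37:28.174515');
my $max = to_epoch('03-23 00:44:18.584080');
print hms($max - $min), "\n";    # prints 00:06:50
```

    Once the times are numbers, min and max can be tracked with < and > instead of lt and gt, and the difference is a plain subtraction.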
      Thank you for the piece of code, I appreciate your time. Here is what is happening: I have a foreach that steps through my raw pcap data and decodes it, and after that I print the output shown above. Instead of printing that output, I want to extend the foreach to calculate the time based on the destination IP address.
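
      One way the foreach could be extended is sketched below, assuming each decoded packet is available as a hash reference with a numeric timestamp. The field names (ts, src_ip, dest_ip) and the @packets array are hypothetical stand-ins for whatever the real decoder produces; the ts values are just the sample times from the question expressed as seconds since midnight, deliberately out of order.

```perl
use strict;
use warnings;

# Hypothetical decoded records: field names and the numeric 'ts' are assumptions.
my @packets = (
    { ts => 2658.584080, src_ip => '192.168.1.100', dest_ip => '180.149.153.11' },
    { ts => 2248.174515, src_ip => '192.168.1.100', dest_ip => '180.149.153.11' },
    { ts => 2712.636571, src_ip => '192.168.1.100', dest_ip => '180.149.134.61' },
    { ts => 2496.422588, src_ip => '192.168.1.100', dest_ip => '180.149.153.11' },
    { ts => 2662.588592, src_ip => '192.168.1.100', dest_ip => '180.149.134.61' },
    { ts => 2248.174536, src_ip => '192.168.1.100', dest_ip => '180.149.153.11' },
);

my %seen;    # $seen{src_ip}{dest_ip} = { first => ..., last => ... }

foreach my $pkt (@packets) {
    # Create the slot on first sight of this pair, then widen it as needed.
    my $slot = $seen{ $pkt->{src_ip} }{ $pkt->{dest_ip} } ||= {
        first => $pkt->{ts},
        last  => $pkt->{ts},
    };
    $slot->{first} = $pkt->{ts} if $pkt->{ts} < $slot->{first};
    $slot->{last}  = $pkt->{ts} if $pkt->{ts} > $slot->{last};
}

# One row per source/destination pair, with the time elapsed between
# the earliest and the latest packet seen for that pair.
my @rows;
for my $src (sort keys %seen) {
    for my $dest (sort keys %{ $seen{$src} }) {
        my $s = $seen{$src}{$dest};
        push @rows, sprintf '%.6f s | %s | %s',
            $s->{last} - $s->{first}, $src, $dest;
    }
}
print "$_\n" for @rows;
```

      Because only the smallest and largest timestamp per pair are kept, the input does not need to be sorted by time. Other fields such as the MAC addresses could be stored in the same slot if they are needed in the output row.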
Re: analyzing data
by Laurent_R (Canon) on Apr 23, 2014 at 09:42 UTC
    In your example, the last two lines have one of the two IP addresses which is different from the other lines. Do you nonetheless want to merge all these lines? And BTW, in your records, which is the source IP and which is the destination IP?
