PerlMonks  

Matching an IP address

by ibanix (Hermit)
on Nov 19, 2002 at 18:34 UTC ( [id://214210]=perlquestion )

ibanix has asked for the wisdom of the Perl Monks concerning the following question:

I have a snippet of code that looks like this:
foreach $line (@connections) {
    if ($line =~ /^\s*(\d+\.\d+\.\d+\.\d+):\d+
                  \s*->\s*(\d+\.\d+\.\d+\.\d+):\d+
                  \s*->\s*(\d+\.\d+\.\d+\.\d+)/x) {
        $client_ip{$1}{$iteration}++;
        $vips{$2}{$iteration}++;
        $frontend{$3}{$iteration}++;
    }
}
(This collects information on an F5 BigIP load balancer, for those of you who are curious).

Is there a better way I can do the (\d+\.\d+\.\d+\.\d+)? As you can tell, I'm just trying to match for a numeric IP address.


<-> In general, we find that those who disparage a given operating system, language, or philosophy have never had to use it in practice. <->

update (broquaint): title change (was Regex redux)

Replies are listed 'Best First'.
(z) Re: Regex redux
by zigdon (Deacon) on Nov 19, 2002 at 18:41 UTC
    Well, assuming you want to stick with this regex for ip's (which allows things like 999.012.4231.31245 - see Matching an IP address), you could just define it beforehand:
    my $ip = qr/\d+\.\d+\.\d+\.\d+/;

    foreach $line (@connections) {
        if ($line =~ /^\s*($ip):\d+
                      \s*->\s*($ip):\d+
                      \s*->\s*($ip)/x) {
            $client_ip{$1}{$iteration}++;
            $vips{$2}{$iteration}++;
            $frontend{$3}{$iteration}++;
        }
    }

    -- Dan
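
To make the caveat above concrete, here is a small sketch showing that the loose pattern accepts 999.012.4231.31245, together with one possible follow-up check on the octet values. The helper name octets_in_range is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Loose pattern from the thread: any run of digits in each position.
my $loose_ip = qr/\d+\.\d+\.\d+\.\d+/;

# Hypothetical helper (not from the original post): true only if the
# string has exactly four dot-separated fields, each a number in 0..255.
sub octets_in_range {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return 0 unless @octets == 4;
    for my $octet (@octets) {
        return 0 unless $octet =~ /^\d{1,3}$/ && $octet <= 255;
    }
    return 1;
}

for my $candidate ('192.168.1.10', '999.012.4231.31245') {
    my $shape = $candidate =~ /^$loose_ip$/ ? 'matches' : 'does not match';
    my $range = octets_in_range($candidate) ? 'in range' : 'out of range';
    print "$candidate: $shape the loose pattern, octets $range\n";
}
```

Whether the extra check is worth it depends on how much you trust the load balancer's output; for machine-generated lines the loose pattern may well be enough.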

Re: Regex redux
by dws (Chancellor) on Nov 19, 2002 at 19:30 UTC
    Is there a better way I can do the (\d+\.\d+\.\d+\.\d+)?

    What you have is fine, though you could take more advantage of /x to clean up the regex:

    if ( $line =~ m{
            ^\s*
            (                       # begin client IP
              \d+\.\d+\.\d+\.\d+
            )
            :\d+                    # client port (ignored)
            \s*->\s*
            (                       # begin vips
              \d+\.\d+\.\d+\.\d+
            )
            :\d+                    # vips port (ignored)
            \s*->\s*
            (                       # begin frontend
              \d+\.\d+\.\d+\.\d+
            )
        }x ) {
        $client_ip{$1}{$iteration}++;
        $vips{$2}{$iteration}++;
        $frontend{$3}{$iteration}++;
    }

    If you know the number of spaces around "->", use it instead of \s* (e.g., "\s->\s" instead of "\s*->\s*").

    If you've got a lot of data, you're probably not going to want to pull it all into @connections. That's gotta suck up RAM.
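
A minimal sketch of the line-at-a-time alternative, assuming the connection data can be read from a filehandle. The thread does not say where @connections comes from, so the in-memory filehandle and the sample lines below are invented stand-ins.

```perl
use strict;
use warnings;

my $ip        = qr/\d+\.\d+\.\d+\.\d+/;
my $iteration = 1;    # assumed to be set elsewhere in the real script
my (%client_ip, %vips, %frontend);

# Stand-in input: an in-memory filehandle over made-up lines.
my $sample = <<'END';
10.0.0.1:1234 -> 192.168.1.10:80 -> 172.16.0.5:8080
10.0.0.2:4321 -> 192.168.1.10:80 -> 172.16.0.6:8080
this line is not a connection
END
open my $fh, '<', \$sample or die "Cannot open sample data: $!";

# Read one line at a time, so only the current line is held in memory.
while (my $line = <$fh>) {
    next unless $line =~ /^\s*($ip):\d+\s*->\s*($ip):\d+\s*->\s*($ip)/;
    $client_ip{$1}{$iteration}++;
    $vips{$2}{$iteration}++;
    $frontend{$3}{$iteration}++;
}
close $fh;

printf "%d clients, %d vips, %d frontends\n",
    scalar keys %client_ip, scalar keys %vips, scalar keys %frontend;
```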

    Also, consider inverting the data structure you're collecting the counts in. If $iteration is relatively fixed (i.e., changes slowly, compared to the number of connections you're processing), you might save significant time by taking counts without considering $iteration, then sweep those counts into a larger data structure whenever $iteration changes. This is one to benchmark, since it could easily backfire depending on your data mix.
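
One way the "count flat, then sweep" idea could look, as a rough sketch only. The helper count_iteration and the sample lines are hypothetical, and whether this beats the nested increments is exactly what would need benchmarking.

```perl
use strict;
use warnings;

my $ip = qr/\d+\.\d+\.\d+\.\d+/;

# Long-lived totals, same shape as in the original post: {ip}{iteration} => count
my (%client_ip, %vips, %frontend);

# Hypothetical helper: tally one iteration's lines into flat hashes,
# then sweep those tallies into the nested hashes in a single pass.
sub count_iteration {
    my ($iteration, @lines) = @_;
    my (%clients, %virtuals, %backends);

    for my $line (@lines) {
        next unless $line =~ /^\s*($ip):\d+\s*->\s*($ip):\d+\s*->\s*($ip)/;
        $clients{$1}++;
        $virtuals{$2}++;
        $backends{$3}++;
    }

    $client_ip{$_}{$iteration} += $clients{$_}  for keys %clients;
    $vips{$_}{$iteration}      += $virtuals{$_} for keys %virtuals;
    $frontend{$_}{$iteration}  += $backends{$_} for keys %backends;
    return;
}

# Made-up sample lines in the format the regex expects.
my @sample = (
    '10.0.0.1:1234 -> 192.168.1.10:80 -> 172.16.0.5:8080',
    '10.0.0.2:4321 -> 192.168.1.10:80 -> 172.16.0.6:8080',
);

count_iteration(1, @sample);
count_iteration(2, $sample[0]);

print "vip 192.168.1.10 in iteration 1: $vips{'192.168.1.10'}{1}\n";
print "vip 192.168.1.10 in iteration 2: $vips{'192.168.1.10'}{2}\n";
```

The inner loop touches only one hash level per counter; the two-level structure is updated once per distinct address per iteration rather than once per line.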

      The data from @connections is thankfully small. It's the unbounded growth of $iteration that will eat RAM in the long run. I haven't figured out how to deal with that yet.

      I've posted the full script (and questions) at http://www.perlmonks.org/index.pl?node_id=214252

      Thanks!

      <-> In general, we find that those who disparage a given operating system, language, or philosophy have never had to use it in practice. <->
Re: Regex redux
by shemp (Deacon) on Nov 19, 2002 at 18:47 UTC
    not the absolute greatest reduction, but:
    ((?:\d+\.){3}\d+)
    The IP is still stored in $1: the inner parentheses use the ?: directive, so they don't capture anything, and the match for the outer parentheses goes into $1.
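
A quick sketch of the difference the ?: makes, run against a made-up sample line. Matching in list context returns the captured groups, which makes the numbering easy to inspect.

```perl
use strict;
use warnings;

# Made-up line in the format the thread's regex expects.
my $line = '10.0.0.1:1234 -> 192.168.1.10:80 -> 172.16.0.5:8080';

# With (?:...) only the outer parentheses capture.
my @noncapturing = $line =~ /^\s*((?:\d+\.){3}\d+):\d+/;

# With plain (...) the inner group becomes $2 and holds its last repetition.
my @capturing = $line =~ /^\s*((\d+\.){3}\d+):\d+/;

print "non-capturing groups: @noncapturing\n";
print "capturing groups:     @capturing\n";
```

With the plain inner group, every later capture in a longer regex shifts by one, which is why the ?: form is the safer drop-in replacement here.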
Re: Regex redux
by adrianh (Chancellor) on Nov 19, 2002 at 21:52 UTC
    Is there a better way I can do the (\d+\.\d+\.\d+\.\d+)? As you can tell, I'm just trying to match for a numeric IP address.

    Take a look at Regexp::Common::net. Your code could be written as something like (untested code):

    use Regexp::Common qw /net/;

    # match dotted decimal IP address
    # $1 = whole match, $2-5 = bytes
    my $IP = $RE{net}{IPv4}{-keep};

    foreach my $line (@connections) {
        if ($line =~ /^\s*$IP:\d+\s*->\s*$IP:\d+\s*->\s*$IP/x) {
            $client_ip{$1}{$iteration}++;
            $vips{$6}{$iteration}++;
            $frontend{$11}{$iteration}++;
        }
    }

    Note that $IP will only match on legal decimal IP addresses (so 666.666.666.666 won't match). This may, or may not, be what you want ;-)

Re: Regex redux
by petral (Curate) on Nov 20, 2002 at 02:36 UTC
    if ( @ips = $line =~ /[\d.]{7,15}/g ) {
        $client_ip{$ips[0]}{$iteration}++;
        . . .
    }
    If you need to check the validity of the line, you can wrap that in something like (using the regex from Mastering Reg Exps as cited here):
    $numrx = qr/[01]?\d\d?|2[0-4]\d|25[0-5]/;
    $iprx  = qr/(?:$numrx\.){3}$numrx/;
    if ( $line =~ /^(\s*($iprx):\d+\s*->){2}\s*$iprx/ ) {
        . . .
    }
    Or, of course, combine them:
    if ( $line =~ /^\s*($iprx):\d+\s*->\s*($iprx):\d+\s*->\s*($iprx)/ ) {
      p
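
A small sketch of the list-context /g extraction above, followed by an optional stricter filter. The sample lines are invented, and the anchored $strict pattern is just one way to write the octet check, not necessarily the form given in Mastering Regular Expressions.

```perl
use strict;
use warnings;

# Made-up lines; the second one contains an address with out-of-range octets.
my $good_line = '10.0.0.1:1234 -> 192.168.1.10:80 -> 172.16.0.5:8080';
my $odd_line  = '666.666.666.666:80 -> 10.0.0.1:80 -> 10.0.0.2';

# In list context, /g returns every match. Port numbers are shorter than
# 7 characters, so {7,15} skips them.
my @ips = $good_line =~ /[\d.]{7,15}/g;
print "found: @ips\n";

# Optional second pass: keep only candidates whose octets are 0..255.
my $octet  = qr/25[0-5]|2[0-4]\d|[01]?\d\d?/;
my $strict = qr/^(?:$octet\.){3}$octet\z/;

my @candidates = $odd_line =~ /[\d.]{7,15}/g;
my @valid      = grep { $_ =~ $strict } @candidates;
print "candidates: @candidates\n";
print "valid:      @valid\n";
```

Note that [\d.]{7,15} would also pick up any other run of seven or more digits and dots on the line, so it relies on the input containing nothing but addresses and short port numbers.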
