PerlMonks  

Regex confusion

by peschkaj (Pilgrim)
on Oct 30, 2002 at 17:04 UTC ( [id://209146]=perlquestion )

peschkaj has asked for the wisdom of the Perl Monks concerning the following question:

Regular expressions confuse me. Deeply.

I am grabbing a process name with Proc::ProcessTable, but I also need to grab out certain fields to display. I know that my current regular expression is horribly flawed, but I will provide what I am working with.

I am eventually going to use this code with SOAP so that I do not need to be on the UNIX box to view running processes. However, the process information often looks like this:

OPSms -N op01Sms1 -t SMS -T 0x70000000 -U op01/op011 -I sms1

I need to grab the OPSms, op01Sms1, and 0x70000000 fields.

My "code":

#!/usr/bin/perl -w

use strict;
use Proc::ProcessTable;

my $t = new Proc::ProcessTable;

print "PID PPID UP START CMD TYPE CMD NAME TRACE\n";

foreach my $p (@{$t->table}) {
    if ($p->{cmd} =~ /(op01)/) {
        my $seconds = $p->{time} % 60;
        my $diff = ($p->{time} - $seconds) / 60;
        my $minutes = $diff % 60;
        $seconds = "0$seconds" if $seconds =~ /^\d$/;
        my $bigTime = "$minutes:$seconds";

        ### Grab process name, type, and trace from cmd
        $p->{cmd} =~ /(\w+)\b\w+\b(\w+?)\b/;
        my $procType = $1;
        my $procName = $2;
        #not checking for trace yet, since I can't get the procname

        my $FORMAT = "%-6s %-6s %-7s %-24s %-15s\n";
        printf($FORMAT, $p->{pid}, $p->{ppid}, $bigTime,
               scalar(localtime($p->{start})), $procType, $procName);
    }
}

I'm getting a "Use of uninitialized value in printf at ./procs2.pl line 29" warning; line 29 is the printf.
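
A quick standalone check suggests that the pattern above can never match: `\b` sits directly between two `\w+` groups, so it asks for a word boundary at a spot that must have word characters on both sides. The sketch below only demonstrates the failed match against the sample command line; the variable names `$cmd` and `$matched` are made up for illustration, and it does not reproduce the full script or its warning.

```perl
use strict;
use warnings;

# Sample command line from the question.
my $cmd = 'OPSms -N op01Sms1 -t SMS -T 0x70000000 -U op01/op011 -I sms1';

# Pattern from the script. After (\w+), \b needs the next character to be a
# non-word character (or end of string), but the following \w+ needs it to
# be a word character, so the match always fails.
my $matched = ($cmd =~ /(\w+)\b\w+\b(\w+?)\b/) ? 1 : 0;

print $matched ? "matched\n" : "no match\n";   # prints: no match
```

Because the match fails, it sets no capture groups of its own, so whatever is read from `$1` and `$2` afterwards does not come from this pattern.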



If you make something idiot-proof, eventually someone will make a better idiot.
I am that better idiot.

Replies are listed 'Best First'.
Re: Regex confusion
by Enlil (Parson) on Oct 30, 2002 at 17:38 UTC
    If all the lines you are matching against are in the same format and you want to grab values from space-delimited fields, you might want to split the line into an array and then just pick out the fields you want. For example, to grab the things you want from the line above, you might use something like:
    my @line_array = split /\s+/,$p->{cmd};
    my $procType = $line_array[0];
    my $procName = $line_array[2];

    If you want to use a regular expression, you could change it to (untested): /(\w+)\s+\S+\s+(\w+)\s+\S+\s+\w+\s+\S+\s+(\w+)/

    which is vastly more complicated than just using split. Note that the + after the \s is only necessary if there is a chance of more than one whitespace character in those locations.

    -enlil
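
To compare the two approaches side by side, here is a minimal sketch run against the sample command line from the question. The index 6 for the trace value and the `\S+` form of the regex are assumptions made for this illustration, and both approaches rely on every line having the same layout.

```perl
use strict;
use warnings;

my $cmd = 'OPSms -N op01Sms1 -t SMS -T 0x70000000 -U op01/op011 -I sms1';

# Positional approach: split on runs of whitespace, pick fields by index.
my @fields = split /\s+/, $cmd;
my ($type, $name, $trace) = @fields[0, 2, 6];

# Regex approach: skip each flag with \S+ and capture the value after it.
my ($re_type, $re_name, $re_trace) =
    $cmd =~ /^(\w+)\s+\S+\s+(\w+)\s+\S+\s+\w+\s+\S+\s+(\w+)/;

print "split: $type $name $trace\n";           # split: OPSms op01Sms1 0x70000000
print "regex: $re_type $re_name $re_trace\n";  # regex: OPSms op01Sms1 0x70000000
```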

Re: Regex confusion
by sauoq (Abbot) on Oct 30, 2002 at 19:06 UTC

    I agree with both fglock and Enlil. Using split is the way to go. Of the two solutions they provided, Enlil's is probably better because it makes the fewest assumptions about the data.

    Even better than splitting on /\s+/ is splitting on a single literal space. This is a special case for split and eliminates the leading null field in the event that your input has leading space. This is almost always what people really want when they need to split on whitespace.

    Also, I'd just assign a slice of split's return list to my variables to eliminate the use of an unnecessary named temporary array.

    my ($procType, $procName, $traceLevel) = (split ' ', $p->{cmd})[0,2,6];
    -sauoq
    "My two cents aren't worth a dime.";
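
The difference between the two split forms only shows up when the input has leading whitespace. A small sketch, using a made-up input string with two leading spaces:

```perl
use strict;
use warnings;

# Hypothetical input with leading whitespace.
my $cmd = '  OPSms -N op01Sms1';

my @by_regex = split /\s+/, $cmd;   # leading empty field is kept
my @by_space = split ' ',   $cmd;   # special case: leading whitespace is skipped

printf "regex: %d fields, first is '%s'\n", scalar @by_regex, $by_regex[0];
printf "space: %d fields, first is '%s'\n", scalar @by_space, $by_space[0];
# regex: 4 fields, first is ''
# space: 3 fields, first is 'OPSms'
```

With the /\s+/ form, every index in a slice such as [0,2,6] would be off by one for this input.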
    
Re: Regex confusion
by fglock (Vicar) on Oct 30, 2002 at 17:25 UTC

    Another way to do it, using split:

    $a = "OPSms -N op01Sms1 -t SMS -T 0x70000000 -U op01/op011 -I sms1";
    @a = split( /\s+-\w\s+/, $a);
    print "@a\n";

    Output:
    OPSms op01Sms1 SMS 0x70000000 op01/op011 sms1

    You want $a[0], $a[1], $a[3]

Re: Regex confusion
by fglock (Vicar) on Oct 30, 2002 at 21:18 UTC

    I had another idea, after reading sauoq's:

    $a = "OPSms -N op01Sms1 -t SMS -T 0x70000000 -U op01/op011 -I sms1";
    %a = split( " ", "-first " . $a);
    print "$a{-first} $a{-N} $a{-T}\n";

    Output:
    OPSms op01Sms1 0x70000000
      How wonderfully cunning. ++ and kudos to you.

      If you make something idiot-proof, eventually someone will make a better idiot.
      I am that better idiot.
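
One property of the hash trick is that lookups go by flag name, so a missing flag does not shift the others. The sketch below uses a made-up command line without the "-t SMS" pair. It assumes every flag is followed by exactly one value; an odd number of tokens would trigger Perl's "Odd number of elements in hash assignment" warning. The 'N/A' fallback and the variable names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical command line with no "-t SMS" pair.
my $cmd = 'OPSms -N op01Sms1 -T 0x70000000 -U op01/op011 -I sms1';

# Prepend an artificial "-first" key so the process type becomes its value.
my %opt = split ' ', "-first $cmd";

my $type  = $opt{'-first'};
my $name  = $opt{'-N'};
my $trace = exists $opt{'-T'} ? $opt{'-T'} : 'N/A';

print "$type $name $trace\n";   # OPSms op01Sms1 0x70000000
```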
Re: Regex confusion
by peschkaj (Pilgrim) on Oct 30, 2002 at 17:46 UTC
    Thank you for the assistance. This is what I came up with in the end. I had to loop over the split fields to find the trace level, because the SMS parameter is sometimes absent, which shifts the trace level's position.
    #!/usr/bin/perl -w

    use strict;
    use Proc::ProcessTable;

    my $t = new Proc::ProcessTable;

    print "PID PPID UP START CMD TYPE CMD NAME TRACE\n";

    foreach my $p (@{$t->table}) {
        if ($p->{cmd} =~ /op01/) {
            ### Format time into a MM:SS format.
            ### accounts for single digits in seconds field
            ### pretty obvious, really.
            my $seconds = $p->{time} % 60;
            my $diff = ($p->{time} - $seconds) / 60;
            my $minutes = $diff % 60;
            $seconds = "0$seconds" if $seconds =~ /^\d$/;
            my $bigTime = "$minutes:$seconds";

            ### Grab process name, type, and trace from cmd
            my $output = $p->{cmd};
            my @array = split( /\s+-\w\s+/, $output);
            my $procType = $array[0];
            my $procName = $array[1];
            my $traceLevel;
            for (my $i = 2; $i < @array; $i++) {
                if ($array[$i] =~ /\dx(?:\d+)/) {
                    $traceLevel = $array[$i];
                    last;
                }
            }
            $traceLevel = "N/A" if !defined($traceLevel);

            my $FORMAT = "%-6s %-6s %-7s %-24s %-15s %-19s %s\n";
            printf($FORMAT, $p->{pid}, $p->{ppid}, $bigTime,
                   scalar(localtime($p->{start})), $procType, $procName, $traceLevel);
        }
    }
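
The field-extraction step can be tried out without Proc::ProcessTable by pulling it into a small function. This is only a sketch: `parse_cmd` is a hypothetical helper name, and the stricter hex pattern (0x followed by hex digits) is an assumption about what trace levels look like, not something confirmed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (type, name, trace) for one command line.
sub parse_cmd {
    my ($cmd) = @_;

    # Split on " -X " style flags, as in the script above.
    my ($type, $name, @rest) = split /\s+-\w\s+/, $cmd;

    # Take the first remaining field that looks like a 0x-prefixed hex
    # value (assumed trace format); list assignment keeps only the first hit.
    my ($trace) = grep { /^0x[0-9a-fA-F]+$/ } @rest;

    return ($type, $name, defined $trace ? $trace : 'N/A');
}

my @with    = parse_cmd('OPSms -N op01Sms1 -t SMS -T 0x70000000 -U op01/op011 -I sms1');
my @without = parse_cmd('OPSms -N op01Sms1 -U op01/op011 -I sms1');

print "@with\n";      # OPSms op01Sms1 0x70000000
print "@without\n";   # OPSms op01Sms1 N/A
```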