http://qs321.pair.com?node_id=253226

LogicalChaos has asked for the wisdom of the Perl Monks concerning the following question:

I'm hoping one (or more) of you can help me on my path to enlightenment.

I'm trying to match a string and pull out the appropriate data. The string can be in one of three forms:
'3d', '3d 2h', '6h'.

What I have now (which works) is:
$value =~ /^(?:(?:(\d+)d)?\s*(?:(\d+)h){1})|(?:(?:(\d+)d){1}\s*(?:(\d+)h)?)$/i;
my $day  = $1 || $3 || 0;
my $hour = $2 || $4 || 0;
But it's rather painful to look at. What alternatives exist? I suspect there is a way to specify each match only once, and not use '|', but...

If it makes any difference, this is running on ActiveState 5.6.0 (build 616)

Replies are listed 'Best First'.
Re: Help with a more precise regex
by Ovid (Cardinal) on Apr 25, 2003 at 19:53 UTC

    I think I might be misunderstanding the problem because it seems straightforward:

    foreach (<DATA>) {
        my ($day)  = /(\d+)d/;
        my ($hour) = /(\d+)h/;
        $_ ||= 0 foreach $day, $hour;
        print "Day: ($day) Hour: ($hour)\n";
    }
    __DATA__
    3d
    3d 2h
    6h

    That prints:

    Day: (3) Hour: (0)
    Day: (3) Hour: (2)
    Day: (0) Hour: (6)

    Cheers,
    Ovid

    New address of my CGI Course.
    Silence is Evil (feel free to copy and distribute widely - note copyright text)

      Duh, I couldn't see the trees for the regex forest I had created...

      Thanks for your clear thinking Ovid.

      LogicalChaos

      *snif* That's beautiful!

      I regret that I have only one ++ vote to give to this note...

Re: Help with a more precise regex
by jasonk (Parson) on Apr 25, 2003 at 19:55 UTC

    A little obfuscated, but I kind of like this method:

    my %t = reverse ($value =~ /(\d+)([dh])/g);
    my $day  = $t{d} || 0;
    my $hour = $t{h} || 0;
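For anyone puzzling over why the reverse is there, here is a minimal sketch. In list context a /g match returns every captured group in order, so '3d 2h' yields (3, 'd', 2, 'h'); reversing that list gives unit/number pairs that can be assigned straight to a hash. The sub name parse_units is made up for illustration and is not part of the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the reverse trick.
sub parse_units {
    my ($value) = @_;
    # List-context /g returns (number, unit, number, unit, ...).
    my @pairs = $value =~ /(\d+)([dh])/g;
    # Reversed, the list alternates unit, number: ready-made key/value pairs.
    my %t = reverse @pairs;
    return ($t{d} || 0, $t{h} || 0);
}

for my $value ('3d', '3d 2h', '6h') {
    my ($day, $hour) = parse_units($value);
    print "$value => day $day, hour $hour\n";
}
```

Note that this only extracts: any string containing a number followed by d or h is accepted, so it does not validate the input format.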

    We're not surrounded, we're in a target-rich environment!
      I Love You Folks(TM)! I've never used 'g' outside of s/// before.

      Thanks for the tip,
      LogicalChaos
Re: Help with a more precise regex
by Jenda (Abbot) on Apr 25, 2003 at 20:01 UTC

    First. Your regexp is slightly wrong. It matches even strings like 'hello there 1d'. You would need one more (?:...) around the whole regexp except the ^ and $ to fix it. Currently the first alternative matches only at the start of the string, while the second only at the end. You want them to only match the whole string.
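The anchoring problem can be seen with a short test. This is only a sketch: the variable names $loose and $strict are invented here, $loose is the pattern from the question, and $strict is the same pattern with one extra (?:...) wrapped around both alternatives.

```perl
use strict;
use warnings;

# As posted: ^ belongs to the first alternative only, $ to the second only.
my $loose = qr/^(?:(?:(\d+)d)?\s*(?:(\d+)h){1})|(?:(?:(\d+)d){1}\s*(?:(\d+)h)?)$/i;

# With one more (?:...) around the alternation, both anchors apply to both branches.
my $strict = qr/^(?:(?:(?:(\d+)d)?\s*(?:(\d+)h){1})|(?:(?:(\d+)d){1}\s*(?:(\d+)h)?))$/i;

for my $s ('3d', '3d 2h', '6h', 'hello there 1d') {
    printf "%-18s loose: %-3s strict: %s\n",
        "'$s'",
        ($s =~ $loose  ? 'yes' : 'no'),
        ($s =~ $strict ? 'yes' : 'no');
}
```

The three valid inputs match both patterns, while 'hello there 1d' matches only the loose one, because its second alternative is free to start anywhere in the string.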

    Anyway I would suggest something like this:

    while (<STDIN>) {
        chomp;
        if ($_ and /^(?:(\d+)d)?(?:\s*(\d+)h)?$/i) {
            print "Days: $1, Hours: $2\n";
        } else {
            print "Doesn't match\n";
        }
    }
    The $_ and ... is necessary to prevent an empty string from matching, the rest is taken care of by the regexp.

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne


      Hey Jenda,

      This is the one I was looking for. Pointing out the relative stupidity of my regex. I was checking for empty string, but then lost the fact that I didn't need to require the 'd' or 'h' portion, thus allowing for the deletion of the '|'. And at one point, I did correctly have the ^(?:...)$ wrapping the entire match, but lost it in one of my incantations.

      So, the final form I've chosen is:
      if ( $value and $value !~ /^(?:(\d+)d)?(?:\s*(\d+)h)?$/i ) {
          $result = "Please only enter time in one of these three formats: '3d', '4d 3h', '22h'";
      } else {
          $day  = $1 || 0;
          $hour = $2 || 0;
      }
      And it seems to work!
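A caveat with this form: when $value is empty the match never runs, so $1 and $2 in the else branch hold whatever the last successful match elsewhere in scope left behind. Below is a minimal sketch that sidesteps this by capturing in list context. The sub name parse_duration and its return convention (an empty list for bad input) are assumptions made for illustration, as is treating empty input as zero days and zero hours.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (day, hour) on success, an empty list on bad input.
sub parse_duration {
    my ($value) = @_;
    # Assumption: empty or undefined input means "no time entered".
    return (0, 0) unless defined $value && length $value;
    # List-context match hands back the captures directly;
    # an empty list (count 0) means the pattern did not match.
    my @m = $value =~ /^(?:(\d+)d)?(?:\s*(\d+)h)?$/i
        or return;
    return ($m[0] || 0, $m[1] || 0);
}

for my $value ('3d', '4d 3h', '22h', '', 'bogus') {
    if (my ($day, $hour) = parse_duration($value)) {
        print "'$value' => day $day, hour $hour\n";
    } else {
        print "'$value' => Please only enter time in one of these three formats: '3d', '4d 3h', '22h'\n";
    }
}
```

Because the captures are copied into @m at the moment of the match, nothing depends on the state of $1 and $2 afterwards.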

      Thanks,
      LogicalChaos
Re: Help with a more precise regex
by vladb (Vicar) on Apr 25, 2003 at 19:53 UTC
    Not a direct answer to your question, but as a suggestion I wanted to point you in the direction of the Date::Manip module. Often in my scripts which dealt with time/date parsing, this module saved my day. It is able to do a lot of date 'magic' but may require some digging in order to understand :)

    Update:
    Here's a rendition of Ovid's code, but using the Date::Manip module. I agree that using the module for such a simple task may look like overkill, but I have a feeling your program may have to deal with a lot more date/time manipulation going forward. Even if this is not the case, it's always good to have another way of doing a thing ;-)
    use strict;
    use Date::Manip;

    foreach (<DATA>) {
        chomp;
        next if /^$/;
        my @delta  = split(/:/, ParseDateDelta($_));
        my ($day)  = $delta[3];
        my ($hour) = $delta[4];
        print "$_\t=>\tDay: $day\tHour: $hour\n";
    }
    __DATA__
    3d
    3d 2h
    6h
    And the output the script generates is:
    3d      =>      Day: 3  Hour: 0
    3d 2h   =>      Day: 3  Hour: 2
    6h      =>      Day: 0  Hour: 6


    _____________________
    # Under Construction
      Actually, I won't be using date manipulations much. This is in a ClearQuest project I'm implementing, and all I need to know is that there are 8 hours in a day. The info gathered will be presented in days/hours.
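Given the 8-hours-per-day rule mentioned here, folding a parsed value into total hours and back is simple arithmetic. A small sketch follows; the constant and sub names are hypothetical, and normalizing overflow hours into whole days is an assumption about the desired presentation, not something stated in the post.

```perl
use strict;
use warnings;

use constant HOURS_PER_DAY => 8;    # length of a working day, per the post

# Hypothetical helper: collapse days and hours into total working hours.
sub to_hours {
    my ($day, $hour) = @_;
    return $day * HOURS_PER_DAY + $hour;
}

# Hypothetical helper: split total working hours back into (days, hours).
sub to_days_hours {
    my ($total) = @_;
    return (int($total / HOURS_PER_DAY), $total % HOURS_PER_DAY);
}

my $total = to_hours(3, 2);                 # 3 * 8 + 2 = 26
my ($day, $hour) = to_days_hours(22);       # 22 = 2 * 8 + 6
print "3d 2h is $total working hours\n";
print "22h is ${day}d ${hour}h\n";
```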

      And, unless I misremember, Date::Manip has compiled code, which would make it very difficult for me to place in cqperl (which is derived from ActiveState 5.6.0 perl). I can put in Perl-only modules, but wouldn't want to hazard putting in DLLs (or whatever is used on Windows).

      Thanks for your input,
      LogicalChaos
        Although the solution I was proposing may not be precisely the one you were looking for, I thought it was worth noting for the sake of others who might stumble on your post and wonder about alternative ways of doing this. It also appears simpler than the regexp way :)

        There should be absolutely no difficulty in getting Date::Manip to work with cqperl as the module "is written entirely in perl" (quote). Although it may increase the size of your final executable and/or affect its run-time speed, there is no other visible harm in using the module. As I had stated, it would be overkill to use the module simply to decipher date and hour values from the sample input you provided. Nonetheless, it does appear cleaner than using a raw regexp ;)

        _____________________
        # Under Construction
Re: Help with a more precise regex
by BrowserUk (Patriarch) on Apr 25, 2003 at 20:54 UTC

    A somewhat different approach. If you were using 5.8, $^N would probably be a better choice than $+, though it makes little difference in this case.

    #! perl -sw
    use strict;

    for ('6d', '3h', '4d 3h', '6D', '3H', '4D 3H') {
        our ($d, $h) = (0, 0);
        print "$d days $h hours\n"
            if m[^
                (?: (\d+)d (?{ $d = $+ || '0' }) )?
                \s*
                (?: (\d+)h (?{ $h = $+ || '0' }) )?
            $]xi;
    }

    __END__
    C:\test>test
    6 days 0 hours
    0 days 3 hours
    4 days 3 hours

    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: Help with a more precise regex
by perlplexer (Hermit) on Apr 25, 2003 at 19:55 UTC
    How about this: /^(\d+)([dh])(?:\s+(\d+)h)?/
    $1 - number, $2 defines whether it's 'd' or 'h'
    $3 - number for 'h' (if defined)
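A possible way to turn those three captures into day and hour values, as a sketch only: the sub name split_time is invented here, and the pattern is used exactly as posted, so it is case-sensitive and not anchored at the end.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (day, hour), or an empty list if nothing matches.
sub split_time {
    my ($value) = @_;
    my ($n, $unit, $extra) = $value =~ /^(\d+)([dh])(?:\s+(\d+)h)?/
        or return;
    # If the first unit is 'd', the optional second number is the hour count;
    # if it is 'h', the first number is already the hours and there are no days.
    my ($day, $hour) = $unit eq 'd' ? ($n, $extra || 0) : (0, $n);
    return ($day, $hour);
}

for my $value ('3d', '3d 2h', '6h') {
    my ($day, $hour) = split_time($value);
    print "$value => day $day, hour $hour\n";
}
```

Since the pattern has no trailing $, input such as '6h 2h' or '3d junk' is also accepted; the sketch ignores whatever follows the first unit in those cases.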

    --perlplexer
Re: Help with a more precise regex
by asdfgroup (Beadle) on Apr 26, 2003 at 00:25 UTC
    Hi, This one should work fine and detect all "format errors":
    for ("2D 3h", "2d", "3h", "3h 2d", "Some string") {
        my ($day, $hour) = map {$_+0} /^(?:(\d+)d\s*)?(?:(\d+)h)?$/i;
        $day || $hour
            ? print "Day - $day, hour - $hour\n"
            : warn "Wrong format $_"
    }
    Sincerely, Nikita Savin
Re: Help with a more precise regex
by Nkuvu (Priest) on Apr 25, 2003 at 19:58 UTC

    It could be painfully obvious (just suggested as an alternative):

    if ($value =~ /^(\d+)d$/i) {
        $day = $1;
    } elsif ($value =~ /^(\d+)d\s+(\d+)h$/i) {
        $day  = $1;
        $hour = $2;
    } elsif ($value =~ /^(\d+)h$/i) {
        $hour = $1;
    } else {
        print "Hork!\n";
    }

    What exactly do you want out of this?

    Update: Sorry, just ignore this post. It's too ugly to live. ;)