PerlMonks  

Regexp map weirdness

by Kickstart (Pilgrim)
on May 12, 2001 at 05:31 UTC ([id://79892]=perlquestion)

Kickstart has asked for the wisdom of the Perl Monks concerning the following question:

Oh Perl masters... why does this output 'tt0tt1', printing the second match twice (the 'tt' from the word 'written', each time followed by the value of $c), rather than the 'h' from 'the' and then the second match's value, without the value of $c? I thought map would be smarter than that (or I'm just a dumb Perl programmer, which is what I suspect).

open(FH, "</usr/bin/perldoc"); undef $/; $_ = <FH>;
map(($f[$c] = $1, print "$f[$c]", $c++), m/t(.*?)e/, m/wri(.*?)en/);

Thanks!
Kickstart

Re: Regexp map weirdness
by chipmunk (Parson) on May 12, 2001 at 06:21 UTC
    You're doing a map over a list of two values: 'h' and 'tt'. However, within the body of the map you ignore the elements of the list and use the value of $1 instead. Since the m/wri(.*?)en/ match was executed second, $1 holds 'tt' on both passes. Keep in mind that, as implemented, the entire argument list is evaluated before the map begins.
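A minimal sketch of the behavior chipmunk describes, assuming a short hypothetical string in place of the contents of /usr/bin/perldoc. Both matches run before map processes anything, so $1 inside the map body always holds the capture from the last match, while $_ is aliased to each list element in turn. The block form of map and the variable names are illustrative choices, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of /usr/bin/perldoc.
$_ = "the file was written";

# The whole argument list is built first: both matches run, in list
# context, before map processes a single element.
my @list = (m/t(.*?)e/, m/wri(.*?)en/);

# Reading $1 inside map: it still holds the capture from the LAST
# successful match, m/wri(.*?)en/, so every element yields 'tt'.
my @via_dollar1 = map { my $cap = $1; $cap } m/t(.*?)e/, m/wri(.*?)en/;

# Reading $_ inside map: $_ is aliased to each list element in turn.
my @via_topic = map { my $elem = $_; $elem } m/t(.*?)e/, m/wri(.*?)en/;

print "list:    @list\n";          # h tt
print "via \$1: @via_dollar1\n";   # tt tt
print "via \$_: @via_topic\n";     # h tt
```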

    If you change your code to use $_ instead, you will get the results you expected:

    open(FH, "</usr/bin/perldoc"); undef $/; $_ = <FH>;
    map(($f[$c] = $_, print "$f[$c]", $c++), m/t(.*?)e/, m/wri(.*?)en/);
    I'm not sure how this snippet is being used in a larger script, but map may not be what you want here. One alternative would be to assign to @f directly: @f = (m/t(.*?)e/, m/wri(.*?)en/);
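A sketch of chipmunk's direct-assignment suggestion, again assuming a short hypothetical string instead of reading /usr/bin/perldoc. The foreach loop is an added illustration, not part of the reply: it produces the same capture-then-index output as the corrected map without calling map only for its side effects.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for /usr/bin/perldoc.
$_ = "the file was written";

# chipmunk's alternative: assign the captures straight into @f.
my @f = (m/t(.*?)e/, m/wri(.*?)en/);

# If each capture should also be printed with its index, a foreach
# over the indices avoids using map purely for side effects.
my $out = '';
for my $c (0 .. $#f) {
    $out .= "$f[$c]$c";
}
print "$out\n";    # h0tt1
```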
Re: Regexp map weirdness
by Banky (Acolyte) on May 12, 2001 at 11:28 UTC
    Well it depends what you're reading in.

    However, both of your matches start at the beginning of the string, not where the previous one left off.

    For that kind of behavior, check out the /g modifier and the \G assertion. Or you could combine everything into one regular expression, if that would work.
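A minimal sketch of the three options Banky mentions, using a hypothetical sample string. Without /g each match restarts at offset 0; a scalar-context m//g resumes from pos(); \G additionally pins each match to that position; and a single combined regex captures both pieces at once. The strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample string: two "t...e" tokens, junk, then one more.
my $s = "tae tbe xx tce";

# 1. Without /g, every match starts over at offset 0.
my @restart;
for (1 .. 3) {
    push @restart, $1 if $s =~ m/t(.*?)e/;
}

# 2. Scalar-context m//g records where it stopped in pos($s), and the
#    next m//g on $s resumes there; pos($s) resets when a match fails.
my @resume;
while ($s =~ m/t(.*?)e/g) {
    push @resume, $1;
}

# 3. \G anchors the match at pos($s), so each token must follow the
#    previous one directly (only whitespace may sit in between here).
my @anchored;
while ($s =~ m/\G\s*t(.*?)e/g) {
    push @anchored, $1;
}

# 4. Or combine both captures into one regex (/s lets . cross newlines).
my ($first, $second) = "the file was written" =~ m/t(.*?)e.*?wri(.*?)en/s;

print "restart:  @restart\n";          # a a a
print "resume:   @resume\n";           # a b c
print "anchored: @anchored\n";         # a b
print "combined: $first $second\n";    # h tt
```

The \G loop stops at "xx" because the anchored match cannot skip ahead, which is what makes it useful for tokenizing input piece by piece.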

Re: Regexp map weirdness
by srawls (Friar) on May 13, 2001 at 04:03 UTC
    One thing you should read is the node "Death to Dot Star!". You should change the following regexes:
    m/t(.*?)e/, m/wri(.*?)en/);
    to:
    m/t([^e])e/, m/wri([^e])en/);
    The regexes above are much more efficient, and they say what you really mean.

    The 15 year old, freshman programmer,
    Stephen Rawls
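The lazy .*? capture and the negated class are not always interchangeable. A minimal sketch, with hypothetical input strings, comparing the two styles (using [^e]* with the * quantifier, as Hofmator points out below): they agree for a single-character delimiter on one line, but differ when the closing delimiter is longer than one character or when the text contains newlines.

```perl
use strict;
use warnings;

# Format a capture for display; undef means the pattern did not match.
sub show {
    my ($v) = @_;
    return '(no match)' unless defined $v;
    $v =~ s/\n/\\n/g;    # make newlines visible
    return "'$v'";
}

# 1. Single-character delimiter on one line: the two agree.
my ($lazy1) = "the file" =~ m/t(.*?)e/;       # 'h'
my ($neg1)  = "the file" =~ m/t([^e]*)e/;     # 'h'

# 2. Multi-character delimiter: [^e]* cannot step over an 'e' that is
#    not followed by 'n', but .*? can.
my ($lazy2) = "wriexen" =~ m/wri(.*?)en/;     # 'ex'
my ($neg2)  = "wriexen" =~ m/wri([^e]*)en/;   # no match

# 3. Newlines: '.' does not match "\n" without /s, but [^e] does.
my ($lazy3) = "t\nhe" =~ m/t(.*?)e/;          # no match
my ($neg3)  = "t\nhe" =~ m/t([^e]*)e/;        # "\nh"

printf "case %d: lazy=%s negated=%s\n", $_->[0], show($_->[1]), show($_->[2])
    for [1, $lazy1, $neg1], [2, $lazy2, $neg2], [3, $lazy3, $neg3];
```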

      You probably meant m/t([^e]*)e/, m/wri([^e]*)en/); didn't you? :)) And a quick benchmark shows the speed advantage:

      Benchmark: timing 300000 iterations of dotstar, negchar...
      on string q/tddddde/
         dotstar:  1 wallclock secs ( 1.37 usr +  0.00 sys =  1.37 CPU) @ 218658.89/s (n=300000)
         negchar:  0 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 315457.41/s (n=300000)
                     Rate dotstar negchar
         dotstar 218659/s      --    -31%
         negchar 315457/s     44%      --
      The longer the captured part, the better negchar performs. When the match fails, both methods take approximately the same time.

      -- Hofmator
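Hofmator did not post the benchmark code. A hypothetical reconstruction using cmpthese from the core Benchmark module, which prints timing lines and a rate table of the kind shown above. The longer and failing input strings are assumptions added to probe the two claims in the last paragraph; timings depend on the machine and Perl version, so no particular numbers are implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical inputs: the short string from the posted output, a long
# capture, and a string with no closing 'e' (the failure case).
my @strings = ('tddddde', 't' . ('d' x 100) . 'e', 't' . ('d' x 100));

for my $str (@strings) {
    print "\non string of length ", length($str), "\n";
    # Run each pattern a fixed number of times and compare rates.
    cmpthese(100_000, {
        dotstar => sub { my ($cap) = $str =~ m/t(.*?)e/ },
        negchar => sub { my ($cap) = $str =~ m/t([^e]*)e/ },
    });
}
```

On these single-line inputs with a one-character delimiter, both patterns capture the same text, so the comparison measures speed only.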
