PerlMonks  

RE: (Ovid - Common regex error) RE: A two-liner for Backtracking for substitutions

by markwild (Sexton)
on Oct 03, 2000 at 02:44 UTC [id://35004]


in reply to (Ovid - Common regex error) RE: A two-liner for Backtracking for substitutions
in thread Backtracking for substitutions

Thanks Ovid! I appreciate the friendly amendment. How does the rest of the code look? I'd also like to hear from the original anonymous poster. Did he get his problem solved? --Mark

(Ovid - Regex efficiency issues) RE(3): A two-liner for Backtracking for substitutions
by Ovid (Cardinal) on Oct 03, 2000 at 03:10 UTC
    There are a few problems with your code. However, if I write a book, I'm going to dedicate it to:
    #!/usr/bin/perl -w
    use strict;
    Those two little lines have saved me more trouble than you can possibly imagine, and I strongly recommend that you incorporate them. Admittedly, you only posted this for testing purposes, but I still have a "knee-jerk" reaction to anything without the -w switch or use strict.
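    To make that concrete, here is a small sketch of what the two lines catch. The misspelled $mydta is a made-up example, and use warnings (the lexically scoped counterpart of -w) is included alongside the #! switch.

```perl
#!/usr/bin/perl -w
use strict;
use warnings;   # lexical counterpart of the -w switch

# Hypothetical typo: $mydta instead of $mydata. Under "use strict" this is
# a compile-time error. The string eval compiles the snippet at run time
# (inheriting strict from this scope) so the error can be caught and shown.
my $snippet  = 'my $mydata = "abc"; $mydta =~ s/b/B/; 1;';
my $compiled = eval $snippet;
my $error    = $@;
print $compiled ? "compiled cleanly\n" : "strict caught: $error";

# Warnings flag suspicious run-time behaviour, such as using an undefined
# value in a string. Collect the warning instead of letting it go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };
my $undef;
my $joined = "value: " . $undef;
print "warning caught: $warnings[0]";
```

    Without strict, the typo would silently create a new global and the substitution would quietly do nothing; without warnings, the undefined value would just become an empty string.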

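    Your first regex can be made a bit more efficient (and more accurate) by eliminating the .*, anchoring the match to the beginning of the line with ^, and using the /m switch so that ^ matches at the start of every line rather than only at the start of the string: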

    $mydata =~ s/^([\w\s]+)\s([\w]+)\s(0000)/$1,$2,$3/mg;
    I haven't actually benchmarked this, but I'd bet good money that it's faster. See Death to Dot Star! for information on why .* is problematic. The accuracy issue is probably a moot point if you have relatively clean data.
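    A sketch of the anchored version running over a multi-line string, assuming $mydata holds several newline-separated records (the thread doesn't show the real data, so the sample lines below are invented). One thing worth checking when you try it: \s matches "\n" too, so [\w\s]+ is free to run past the end of a line even under /m. The second substitution swaps \s for [ \t] to keep each match on a single line.

```perl
use strict;
use warnings;

# Hypothetical data: two newline-separated records in one string.
my $mydata = "foo bar baz 0000\nqux quux corge 0000\n";

# The regex as posted. \s also matches "\n", so [\w\s]+ can cross lines.
(my $with_s = $mydata) =~ s/^([\w\s]+)\s([\w]+)\s(0000)/$1,$2,$3/mg;

# Same idea, but [ \t] (spaces and tabs only) keeps every match on one line.
(my $per_line = $mydata) =~ s/^([\w \t]+)[ \t](\w+)[ \t](0000)/$1,$2,$3/mg;

print $with_s, "---\n", $per_line;
```

    With this sample, the first substitution rewrites only the second record (its single match starts at the top of the string and swallows the first line), while the [ \t] version rewrites both.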

    The second regex has two issues. First, you forgot to put parentheses around the \s0000; those parentheses were supposed to capture that text so it could be substituted back in as $2. Rather than add them, I just wrote the replacement out literally:

    $mydata =~ s/(:\d\d)\s0000/$1, 0000/g;
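    A quick check that the literal replacement gives the same result as the capturing form, assuming the original replacement was $1,$2. The timestamp record is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical record; the thread does not show the real input.
my $line = "Oct 03 02:44 0000";

# Ovid's version: capture only ":MM" and write ", 0000" back literally.
(my $literal = $line) =~ s/(:\d\d)\s0000/$1, 0000/g;

# The variant the original was reaching for: a second group so that the
# whitespace and "0000" come back as $2, with a comma inserted before it.
(my $captured = $line) =~ s/(:\d\d)(\s0000)/$1,$2/g;

print "$literal\n$captured\n";
```

    The two differ only when the separator is something other than a single space: the captured form keeps whatever whitespace \s matched (a tab, say), while the literal form always writes a space.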
    The other problem is really just a minor efficiency issue: \d{2} is better written as \d\d (this comes from MRE, Mastering Regular Expressions, so it may be out of date for newer regex engines). Basically, when you use \d{2}, the regex engine is forced to keep track of how many times \d has matched. This slows it down just a tad (which can be significant when iterating over a large amount of data). When the engine sees \d\d, it simply matches each \d in turn, which is faster.
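    Since the MRE advice may not hold for a current perl, it's worth measuring rather than assuming. This sketch uses the core Benchmark module's cmpthese on made-up timestamp records (the real data isn't shown) and first checks that both patterns find the same matches; the numbers will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 2,000 timestamp-like records, one per line.
my $data = join "\n",
    map { sprintf "Oct 03 %02d:%02d 0000", $_ % 24, $_ % 60 } 1 .. 2_000;

# Each sub counts matches; "() =" forces list context for m//g.
my %patterns = (
    quantifier => sub { my $n = () = $data =~ /:\d{2}\s0000/g; $n },
    repeated   => sub { my $n = () = $data =~ /:\d\d\s0000/g;  $n },
);

# Sanity check: a timing comparison only means something if both agree.
my $matches = $patterns{repeated}->();
die "patterns disagree\n" unless $patterns{quantifier}->() == $matches;

# Run each for at least 1 CPU second and print a rate comparison table.
cmpthese(-1, \%patterns);
```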

    Hope this helps!

    Cheers,
    Ovid

