PerlMonks  

Seduced by a Regex

by kelan (Deacon)
on Sep 30, 2002 at 15:30 UTC

Or: How I Forgot (and Remembered) Normal String Processing

Remember the days of doing string processing in C? The old goodies like strlen, strchr, strcmp, and strcpy, along with their relatives, were always close at hand. Of course, dynamically typed languages such as Perl have usually had nicer string manipulation capabilities; indeed, that is a much-heralded advantage of these languages. It is an opinion with which I happen to agree, but I was to learn that some things can be too good.

Perl has a delicious extension to string manipulation: regular expressions.[1] When I first began learning Perl, I thought they were okay, but I didn't really understand how to build good ones. As I became more familiar with the concept and with Perl's regex syntax, I grew ever more enamored of them. These regular expressions are great! With their help, I can do anything involving strings! Matching, substitution, pulling out embedded information, and hundreds of other uses all came easily within my reach.

My shiny new toy came out on every possible occasion. 'Hey, that can be done with a regex!' could have been my mantra. But love had blinded me. I had forgotten that 'can' does not always mean 'should'.

This realization came to me recently, spurred by paying more attention to certain answers given to questions about regex problems. Many times, the asker's problem can be solved with a simple substr or even index; more and more, I have come to appreciate the replies that offer those solutions. And I have begun to catch myself when I'm about to reach for a regex and ask whether there is a simpler way with the other tools at my disposal.
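As a small illustration of the kind of replacement those answers suggest, here is a sketch that does two common jobs twice, once with a regex and once with substr or index: testing for a literal prefix, and finding where a literal substring first occurs. The sample line and variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $line = 'ERROR: disk full on /dev/sda1';

# Does the line start with a literal prefix?
my $starts_re     = $line =~ /^ERROR:/ ? 1 : 0;              # regex version
my $starts_substr = substr($line, 0, 6) eq 'ERROR:' ? 1 : 0; # substr version

# Where does a literal substring first occur? (-1 if absent)
my $pos_re    = $line =~ /disk/ ? $-[0] : -1;  # @- holds match start offsets
my $pos_index = index($line, 'disk');          # index returns it directly

print "prefix match: $starts_substr, 'disk' at offset $pos_index\n";
# prints: prefix match: 1, 'disk' at offset 7
```

Both versions give the same answers here; the substr and index forms simply say what they mean without involving the regex engine.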

So, Regex Lovers, listen up! I'm not trying to tell you to eschew these wondrous doodads. But do not forget the old-fashioned and sometimes simpler ways: substr, (r)index, and reverse can be powerful tools. Remember them, use them. And thank you to those who have helped to remind me.
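Here is a rough sketch of rindex and reverse standing in for regexes: pulling a file extension and a bare filename out of a path, and reversing a string. The path is made up, and like the regex it replaces, the rindex version naively looks at the last dot anywhere in the path.

```perl
use strict;
use warnings;

# Hypothetical sample path.
my $path = '/home/kelan/notes/regex.talk.txt';

# Regex version: capture everything after the last dot.
my ($ext_re) = $path =~ /\.([^.]*)\z/;

# rindex/substr version: find the last dot, take what follows it.
my $dot     = rindex($path, '.');
my $ext_idx = $dot >= 0 ? substr($path, $dot + 1) : undef;

# Filename without directories; rindex returns -1 if there is no '/',
# so the +1 makes substr start at 0 and return the whole string.
my $base = substr($path, rindex($path, '/') + 1);

# reverse in scalar context reverses the characters of a string.
my $backwards = reverse 'regex';

print "$ext_idx $base $backwards\n";   # prints: txt regex.talk.txt xeger
```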

kelan

1. Yes, I know Perl is not the only language with regular expressions. But most of the others have them as a direct consequence of their popularity in Perl.[2]

2. I also don't know much language history, so please forgive me if your-favorite-language-that-has-regexes had them before or independently of Perl.


Yak it up with Fullscreen ChatterBox

Re: Seduced by a Regex
by grinder (Bishop) on Sep 30, 2002 at 16:40 UTC

    I actually gave a lightning talk on "How not to abuse regular expressions" at YAPC::Europe 2000. A couple of months later, japhy posted Code Smarter, which also touches on some of the ideas I mentioned. I had a list of common regexp idioms that are far more efficient when recast as substr, index, and so on; I should see if I still have my notes lying around.
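grinder's actual list isn't reproduced in the thread. As a rough illustration of the idea only, here are a few common regex idioms next to plain string-function equivalents, checked against each other on some made-up inputs. Note that /^foo$/ would also accept "foo\n", so the regexes below use \z to make the pairs truly equivalent.

```perl
use strict;
use warnings;

# Each pair should agree on every input; the second form avoids the regex engine.
my @inputs = ('foo', 'foobar', 'barfoo', 'FOO', 'fo', '');

for my $s (@inputs) {
    # Exact equality: /^foo\z/ versus eq.
    my $eq_re  = $s =~ /^foo\z/ ? 1 : 0;
    my $eq_str = $s eq 'foo'    ? 1 : 0;

    # Contains a literal: /foo/ versus index.
    my $has_re  = $s =~ /foo/           ? 1 : 0;
    my $has_idx = index($s, 'foo') >= 0 ? 1 : 0;

    # Ends with a literal: /foo\z/ versus a negative-offset substr.
    my $end_re  = $s =~ /foo\z/                              ? 1 : 0;
    my $end_sub = length($s) >= 3 && substr($s, -3) eq 'foo' ? 1 : 0;

    # Case-insensitive equality (ASCII input): /^foo\z/i versus lc.
    my $ci_re  = $s =~ /^foo\z/i ? 1 : 0;
    my $ci_str = lc($s) eq 'foo' ? 1 : 0;

    die "disagreement on '$s'\n"
        if $eq_re != $eq_str || $has_re != $has_idx
        || $end_re != $end_sub || $ci_re != $ci_str;
}
print "all idiom pairs agree\n";
```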

    There are also a couple of paradoxical things to be aware of with regexps. It may seem that s/^\s+|\s+$//g should be faster than s/^\s+//; s/\s+$//. But it almost never is. The reason is that in the former case, the string has to be scanned in its entirety, while in the latter case, only the beginning and end of the string need to be examined. The longer the string, the bigger the win.
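The trimming comparison above can be checked on your own perl with the core Benchmark module. This sketch doesn't assert who wins, since timings depend on the perl version and the input; it confirms that both forms produce the same result and then prints Benchmark's comparison table. The sub names and the sample string are made up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One substitution with an alternation.
sub trim_alternation {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Two separately anchored substitutions.
sub trim_two_passes {
    my ($s) = @_;
    $s =~ s/^\s+//;    # leading whitespace
    $s =~ s/\s+$//;    # trailing whitespace
    return $s;
}

# A long string with whitespace at both ends and single spaces between words.
my $text = '   ' . join(' ', ('word') x 10_000) . "   \n";

die "results differ\n" unless trim_alternation($text) eq trim_two_passes($text);

# Run each for at least one CPU second and print a rate comparison.
cmpthese(-1, {
    alternation => sub { trim_alternation($text) },
    two_passes  => sub { trim_two_passes($text) },
});
```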


    print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
      I would be quite interested in seeing some of those notes, if you can find them.

      kelan


      Yak it up with Fullscreen ChatterBox

      I'd also love to see those notes...
Re: Seduced by a Regex
by diotalevi (Canon) on Sep 30, 2002 at 16:35 UTC

    There are three things I keep in mind with regard to regular expressions and efficiency. Number one: when looking for a static string, index() is faster than a regex (or so says Mastering Algorithms with Perl). The regex engine has an optimization that probably lets it work much like index(), but it has some additional overhead to deal with, since it's a more complex beastie. Number two: Ovid's "Death to Dot Star!". Number three: there's a wonderful article on perl.com about 'sexegers', which amounts to reversing the data and applying a regex written backwards to it. This comes up occasionally, so it's a neato trick. From there, just get into things like unnecessary use of substitution, capturing, and other nastiness.
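The sexeger trick can be sketched for one common case: finding the last number in a string. The sample string is made up. In general a sexeger also needs the pattern itself written backwards; \d+ happens to read the same in both directions, which keeps this sketch simple.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $log = 'job 17 retried 3 times, finished at step 42 of 50';

# Forward approach: a greedy .* pushes the capture toward the last number,
# but the engine has to run to the end of the string and backtrack.
my ($last_forward) = $log =~ /.*\b(\d+)/;

# Sexeger approach: reverse the string, match the FIRST number,
# then reverse the captured text back.
my $reversed     = reverse $log;          # scalar context: reverses characters
my ($rev_digits) = $reversed =~ /(\d+)/;
my $last_sexeger = reverse $rev_digits;

print "$last_forward $last_sexeger\n";    # prints: 50 50
```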

Node Type: perlmeditation [id://201743]
Approved by rattusillegitimus
Front-paged by hsmyers