PerlMonks  

Regular Expression Tweaking

by dooberwah (Pilgrim)
on Feb 20, 2002 at 23:25 UTC (id://146659)

dooberwah has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a small script to format a text file. My text file isn't just a normal document, it's a poem. I'm dumping the end result into an HTML table to make it look pretty.

That part isn't actually very interesting, because it's so simple. The more interesting (?) tidbit is this:

$line =~ /^$/ or chomp ( $line );

What is it? Well, it removes the newline from the end of the line unless the only thing on the line is a newline character. That way, normal lines are chomped and the blank lines that separate the stanzas aren't.
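A minimal sketch of what the idiom does to a few lines. The `tweak_line` helper and the sample poem lines are hypothetical, added only to make the behavior visible; the key point is that `/^$/` matches both the empty string and a lone `"\n"`, because `$` may match just before a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the original idiom to a single line and return the result.
# /^$/ matches the empty string and the string "\n", because $ may
# match just before a trailing newline; everything else gets chomped.
sub tweak_line {
    my ($line) = @_;
    $line =~ /^$/ or chomp($line);
    return $line;
}

# Hypothetical sample: two verse lines around a stanza break.
my @sample = ("Roses are red,\n", "\n", "Violets are blue.\n");

for my $line (@sample) {
    # Make newlines visible so the effect is easy to see.
    (my $shown = tweak_line($line)) =~ s/\n/\\n/g;
    print "[$shown]\n";
}
```

This prints `[Roses are red,]`, `[\n]` and `[Violets are blue.]`: only the stanza-break line keeps its newline.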

Is this the best way to do it? It runs fine ... and hey, TMTOWTDI, but is it the best (fastest, cleanest, prettiest, safest)? Comments from those wiser than I would be greatly appreciated.

-Ben Jacobs (dooberwah)
http://dooberwah.perlmonk.org
"one thing i can tell you is you got to be free"

(Ovid) Re: Regular Expression Tweaking
by Ovid (Cardinal) on Feb 21, 2002 at 00:04 UTC

    dooberwah asked:

    Is this the best way to do it? It runs fine ... and hey, TMTOWTDI, but is it the best (fastest, cleanest, prettiest, safest)? Comments from those wiser than I would be greatly appreciated.

    I would hesitate to say that I am wiser, but my knee-jerk reaction is usually to take "best" to mean "easiest to read". Of course, "easiest" is subjective. As for your example, the "best" approach will vary depending on things that you haven't shown. For example, you might just want something in the following form:

    while (<FILE>) {
        next if ! /\S/;
        chomp;
        process_stuff( $_ );
    }

    From what you've given us, though, I suspect that I would prefer the following:

    while (<FILE>) {
        chomp if /\S/;
        process_stuff( $_ );
    }

    I used $_, but there's no problem with using $line, of course; I was just being lazy :) A more direct comparison shows the difference in readability between the two:

    Yours: $line =~ /^$/ or chomp ( $line );
    Mine:  chomp $line if $line =~ /\S/;

    While I could be mistaken, I think the latter is easier to read and that is what is important. Don't worry about fastest or prettiest. Focusing on those will only get you in trouble.

    Of course, it's only fair to point out that my snippet isn't the same as yours. I assumed that you also wouldn't care about chomping a line that contains only whitespace, but I could be mistaken.
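A small sketch of where the two snippets part ways. The subroutine names `original` and `ovid` and the sample strings are hypothetical; the subs simply wrap the two one-liners from the comparison above.

```perl
use strict;
use warnings;

# The original idiom: chomp unless the line is empty or just "\n".
sub original {
    my ($line) = @_;
    $line =~ /^$/ or chomp($line);
    return $line;
}

# Ovid's version: chomp only if the line has a non-whitespace character.
sub ovid {
    my ($line) = @_;
    chomp $line if $line =~ /\S/;
    return $line;
}

# A line holding only spaces (plus the newline) is where they differ.
for my $line ("text\n", "\n", "   \n") {
    (my $shown = $line) =~ s/\n/\\n/g;
    my $verdict = original($line) eq ovid($line) ? "same" : "differs";
    print "'$shown': $verdict\n";
}
```

Only the whitespace-only line differs: the original chomps it to `"   "`, while the `/\S/` version leaves its newline in place.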

    Cheers,
    Ovid


      Why use a regex that changes the behavior for whitespace-only lines? Matching "something other than newline" can be written as /[^\n]/ or, more succinctly, as /./:
      chomp $line if $line =~ /./;
      Note: this won't behave the same as the original for strings consisting entirely of multiple newlines. That is, the original chomps "\n\n\n\n"; this will not.
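A sketch checking both claims: `/./` keeps whitespace-only lines chomped like the original, and only a string of several newlines behaves differently. The sub names and test strings are hypothetical. With the default `$/` and a line-by-line read, each record holds at most one newline, so the difference only shows up for strings that come from somewhere else.

```perl
use strict;
use warnings;

# The original idiom: leave "" and "\n" alone, chomp everything else.
sub original {
    my ($s) = @_;
    $s =~ /^$/ or chomp($s);
    return $s;
}

# Blake's version: chomp only if there is at least one non-newline
# character (. does not match "\n" without the /s modifier).
sub blake {
    my ($s) = @_;
    chomp $s if $s =~ /./;
    return $s;
}

for my $s ("   \n", "\n", "\n\n\n\n") {
    (my $shown = $s) =~ s/\n/\\n/g;
    my $verdict = original($s) eq blake($s) ? "same" : "differs";
    print "'$shown': $verdict\n";
}
```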

      -Blake

Re: Regular Expression Tweaking
by japhy (Canon) on Feb 21, 2002 at 03:45 UTC
    Why not read the poem stanza-by-stanza instead of line by line?
    {
        local $/ = "";      # two or more newlines in a row
        @stanzas = <POEM>;
    }
    # or
    {
        local $/ = "\n\n";  # two newlines in a row
        @chunks = <POEM>;   # you might not get the same results
    }
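A sketch of why the two settings of `$/` can disagree. It reads a hypothetical poem from an in-memory filehandle (available since Perl 5.8) instead of `POEM`; the `read_records` helper is made up for illustration. The poem has three blank lines between its stanzas: paragraph mode (`$/ = ""`) treats them as a single separator, while the literal `"\n\n"` separator produces an extra record made only of newlines.

```perl
use strict;
use warnings;

# Hypothetical poem with three blank lines between the two stanzas.
my $poem = "Roses are red,\nViolets are blue.\n\n\n\n"
         . "Sugar is sweet,\nAnd so are you.\n";

# Read every record from the poem using the given value of $/.
sub read_records {
    my ($sep) = @_;
    open my $fh, '<', \$poem or die "Cannot open in-memory file: $!";
    local $/ = $sep;
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @stanzas = read_records("");      # paragraph mode
my @chunks  = read_records("\n\n");  # literal two-newline separator

print "stanzas: ", scalar(@stanzas), "\n";
print "chunks:  ", scalar(@chunks), "\n";
```

Here paragraph mode yields two stanzas, while `"\n\n"` yields three chunks, the middle one being just `"\n\n"`.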

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a (from-home) job
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Regular Expression Tweaking
by YuckFoo (Abbot) on Feb 20, 2002 at 23:51 UTC
    Why not go ahead and chomp every line? Handle the special case ($line eq '') when generating the output.
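A minimal sketch of that approach, assuming the output is one HTML table row per poem line. The `poem_to_rows` helper, the spacer-row markup, and the sample lines are hypothetical; the escaping covers only `&`, `<` and `>`.

```perl
use strict;
use warnings;

# Hypothetical sketch: chomp every line unconditionally, then decide
# what an empty line means while building the HTML table rows.
sub poem_to_rows {
    my @lines = @_;    # copy, so the caller's data is untouched
    my @rows;
    for my $line (@lines) {
        chomp $line;
        if ($line eq '') {
            # Stanza break: emit a spacer row.
            push @rows, '<tr><td>&nbsp;</td></tr>';
            next;
        }
        # Minimal escaping so characters like & and < survive in HTML.
        (my $text = $line) =~ s/&/&amp;/g;
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        push @rows, "<tr><td>$text</td></tr>";
    }
    return @rows;
}

my @poem = ("Roses are red,\n", "\n", "Violets are blue.\n");
print "<table>\n";
print "  $_\n" for poem_to_rows(@poem);
print "</table>\n";
```

Keeping the "is this a stanza break?" decision in the output step means the reading loop stays a plain `chomp`.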

    YuckFoo

Re: Regular Expression Tweaking
by Anonymous Monk on Feb 21, 2002 at 00:44 UTC
    I'll probably regret saying this, but I think this is both the fastest and the prettiest. I'm too lazy to do any benchmarks myself, but I encourage the curious to do so. :-)

    s/(?!^)\n$//;
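A sketch checking that the substitution matches the original idiom on single-line records (strings with at most one newline, at the end). The sub names and sample strings are hypothetical.

```perl
use strict;
use warnings;

sub original {
    my ($line) = @_;
    $line =~ /^$/ or chomp($line);
    return $line;
}

# Anomo's substitution: remove a trailing newline unless it sits at the
# very start of the string (i.e. the line is nothing but "\n").
sub anomo {
    my ($line) = @_;
    $line =~ s/(?!^)\n$//;
    return $line;
}

for my $line ("Roses are red,\n", "\n", "   \n", "no newline") {
    (my $shown = $line) =~ s/\n/\\n/g;
    my $verdict = original($line) eq anomo($line) ? "same" : "differs";
    printf "%-18s %s\n", "'$shown'", $verdict;
}
```

For each of these samples both versions agree: the lone `"\n"` is left alone because `(?!^)` fails at position 0.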

    -Anomo
      I don't think that that solution gives the same results.

      From perlre:

      (?!pattern)
      A zero-width negative lookahead assertion. For example /foo(?!bar)/ matches any occurrence of ``foo'' that isn't followed by ``bar''. Note however that lookahead and lookbehind are NOT the same thing. You cannot use this for lookbehind.

      Perhaps you mean to use the (?<!pattern) look behind assertion?

      More From perlre:

      (?<!pattern)
      A zero-width negative lookbehind assertion. For example /(?<!bar)foo/ matches any occurrence of ``foo'' that isn't following ``bar''. Works only for fixed-width lookbehind.
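A small sketch running the two perlre examples quoted above on a few hypothetical strings, to show that lookahead inspects what follows "foo" and lookbehind what precedes it. The `assertions` helper is made up for illustration.

```perl
use strict;
use warnings;

# Return (lookahead_matched, lookbehind_matched) for perlre's examples:
#   /foo(?!bar)/  - "foo" not followed by "bar"
#   /(?<!bar)foo/ - "foo" not preceded by "bar"
sub assertions {
    my ($str) = @_;
    my $ahead  = $str =~ /foo(?!bar)/  ? 1 : 0;
    my $behind = $str =~ /(?<!bar)foo/ ? 1 : 0;
    return ($ahead, $behind);
}

for my $str (qw(foobar foobaz barfoo bazfoo)) {
    my ($ahead, $behind) = assertions($str);
    print "$str: lookahead $ahead, lookbehind $behind\n";
}
```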

      -Ben Jacobs (dooberwah)
      http://dooberwah.perlmonk.org
      "one thing i can tell you is you got to be free"

        Actually, lookbehind and lookahead seem to be the same when the assertion itself is zero-width. In the following code, looking ahead for the anchor produces the same results as looking behind for it.

        #!/usr/bin/perl -wT
        use strict;

        # Replace 'See' anywhere in the string
        $_ = "See spot run. See spot jump.";
        s/See/MATCH/g;
        print "'$_'\n";    # 'MATCH spot run. MATCH spot jump.'

        # Replace 'See' unless lookahead finds the anchor
        $_ = "See spot run. See spot jump.";
        s/(?!^)See/MATCH/g;
        print "'$_'\n";    # 'See spot run. MATCH spot jump.'

        # Replace 'See' unless lookbehind finds the anchor
        $_ = "See spot run. See spot jump.";
        s/(?<!^)See/MATCH/g;
        print "'$_'\n";    # 'See spot run. MATCH spot jump.'

        -Blake

        When the pattern inside the assertion is zero-width, it doesn't matter which direction you look. It's zero width. It's like saying that you jump 0 meters up into the air, or 0 meters down into the ground. You're still not moving. Sure, it might make more sense to say you jumped 0 meters up from the ground than 0 meters down into it, but effectively it's still the same thing.

        I used lookahead because, if I recall correctly, it's faster, and it was supposed to be fast. Plus it's one byte shorter, and he wanted prettiness. I consider (?!) prettier than (?<!).
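The speed question can be measured rather than recalled. A sketch using `cmpthese` from Perl's core Benchmark module; the sample lines and variant names are hypothetical, and no timings are claimed here since they depend on your perl build and machine. It first checks that the four variants agree on the sample, so the comparison is like-for-like.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample: poem lines as they come from a line-by-line read.
my @lines = ("Roses are red,\n", "Violets are blue.\n", "\n",
             "Sugar is sweet,\n", "And so are you.\n");

# Each variant from the thread, wrapped to take and return one line.
my %variant = (
    original   => sub { my $s = shift; $s =~ /^$/ or chomp $s; return $s },
    ovid       => sub { my $s = shift; chomp $s if $s =~ /\S/; return $s },
    lookahead  => sub { my $s = shift; $s =~ s/(?!^)\n$//; return $s },
    lookbehind => sub { my $s = shift; $s =~ s/(?<!^)\n$//; return $s },
);

# Sanity check: every variant must give the same result on this sample.
my $want = join '|', map { $variant{original}->($_) } @lines;
for my $name (sort keys %variant) {
    my $got = join '|', map { $variant{$name}->($_) } @lines;
    die "$name disagrees with original\n" unless $got eq $want;
}

# One benchmark entry per variant, each processing the whole sample.
my %bench;
for my $name (keys %variant) {
    my $code = $variant{$name};
    $bench{$name} = sub { $code->($_) for @lines };
}

# -1 means: run each entry for at least one CPU second.
cmpthese(-1, \%bench);
```

Note that the `ovid` variant only agrees here because the sample has no whitespace-only lines.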

        Cheers,
        -Anomo
