
I'm looking for a one-liner filter 'twixt two regexes

by msemtd (Scribe)
on Sep 05, 2002 at 11:12 UTC ( #195333=perlquestion )

msemtd has asked for the wisdom of the Perl Monks concerning the following question:

Hi people,
I require a one-liner to extract all the lines between two regexes in all the html files I have in a directory -- I know it's going to go along the lines of...
perl -pi.bak -e "...mumble mumble something something..." *.html
...but I'm not sure how to say I want all lines between the lines that match start-regex & end-regex.
BTW: the end-regex wants to be non-greedy so it stops at the first occurrence of end-regex.
Love, msemtd

Replies are listed 'Best First'.
Re: I'm looking for a one-liner filter 'twixt two regexes
by zigdon (Deacon) on Sep 05, 2002 at 12:03 UTC
    like everyone else, I'm not 100% sure I understand what you want. But if it's just to print out the lines between a start and a stop regexp, wouldn't this do the trick?
    perl -ne 'print if /start/ .. /end/' *.html
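    Note that with .., /end/ is tested on the same line that matched /start/, so a line matching both opens and closes the range at once and only that line is printed. If you want the end test to wait until the following line, you need to replace the .. operator with ... .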

    See man perlop.

    -- Dan
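A quick way to see how the two range operators differ is to feed both the same lines. This is only a minimal sketch: the sample data and the sub names `two_dot` and `three_dot` are made up for illustration, and `start`/`end` stand in for the real patterns.

```perl
use strict;
use warnings;

# Hypothetical sample input: the second line matches both patterns.
my @lines = ("header\n", "start and end\n", "body\n", "end\n", "footer\n");

# Two dots: /end/ is tested on the same line that matched /start/.
sub two_dot {
    my @out;
    for (@_) { push @out, $_ if /start/ .. /end/ }
    return @out;
}

# Three dots: /end/ is first tested on the line after the /start/ match.
sub three_dot {
    my @out;
    for (@_) { push @out, $_ if /start/ ... /end/ }
    return @out;
}

print "two dots:\n",   two_dot(@lines);    # only "start and end"
print "three dots:\n", three_dot(@lines);  # "start and end", "body", "end"
```

If the start and end markers never share a line, both forms should give the same result.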

Re: I'm looking for a one-liner filter 'twixt two regexes
by RMGir (Prior) on Sep 05, 2002 at 12:13 UTC
    Like everyone else, I'm not sure I understand the question.

    But I think what you mean is "I want to start printing when I see regex 1, and stop printing when I see regex 2"... (Edit: Aha, looking at your 2nd post, that does seem to be what you want)

    If so, here goes:

    perl -i.bak -ne'if(/whateverStartRegexIs/){$print{$ARGV}=1} if(/whateverEndRegexIs/){$print{$ARGV}=0;} print if $print{$ARGV};' *.html
    Now, if start regex and end regex match on the same line, this won't print at all. You have to figure out if that's what you want or not.

    Also, if you have start...end...start...end in the same file, this will print two sets of lines for a given file. If that's not what you want, you could add a %done hash keyed on $ARGV as well...

    Hope this helps!
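The %done idea could look roughly like the sketch below. It is written as a small sub rather than a one-liner so the logic is easier to follow; `first_block`, the file name and the sample data are hypothetical, and the hashes are keyed on a name the same way the one-liner keys them on $ARGV.

```perl
use strict;
use warnings;

# Sketch only: return the first start..end block of one file's lines.
# As in the one-liner, the start line is kept and the end line is not.
sub first_block {
    my ($name, $start, $end, @lines) = @_;
    my (%print, %done, @out);
    for my $line (@lines) {
        # Open the range only if this file has not already finished one.
        $print{$name} = 1 if !$done{$name} && $line =~ $start;
        # Close the range and remember that this file is finished.
        if ($print{$name} && $line =~ $end) {
            $print{$name} = 0;
            $done{$name}  = 1;
        }
        push @out, $line if $print{$name};
    }
    return @out;
}

# Hypothetical input with two start/end pairs; only the first is kept.
my @sample = ("a\n", "BEGIN\n", "b\n", "END\n", "c\n", "BEGIN\n", "d\n", "END\n");
print first_block('page.html', qr/BEGIN/, qr/END/, @sample);
```

With the sample above this prints the BEGIN line and "b", and the second BEGIN/END pair is skipped.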

Re: I'm looking for a one-liner filter 'twixt two regexes
by msemtd (Scribe) on Sep 05, 2002 at 11:31 UTC
    Um, I have tried this fella...
    perl -pi.bak -e "/end_reg/ && print && last; /start_reg/ && $inside++; next unless inside;" *.html
    but it doesn't output anything! Any ideas?
Re: I'm looking for a one-liner filter 'twixt two regexes
by thpfft (Chaplain) on Sep 05, 2002 at 11:25 UTC

    perhaps I'm reading this too simply, and you mean something more complex by 'between two regexes', but it sounds like you just want:

    perl -pi.bak -e "s/start(.*?)stop/$1/s" *.html

    where 'start' and 'stop' are the opening and closing matches, and with a g modifier as well if you want to replace more than once in each file.

    update: hold on, that's stupid. sorry. you want to replace the whole file with the match, don't you, or capture that text somewhere else? d'uh. please ignore.
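For what it's worth, a non-greedy capture can pull out the text between two markers if the whole file is read as one string. With -p or -n the file arrives a line at a time, so /s never gets to span lines; the -0777 switch slurps each file whole instead. A rough sketch, where `between` is a hypothetical helper and the patterns are placeholders:

```perl
use strict;
use warnings;

# Sketch only: return the text between the first $start match and the
# first $stop match after it, or undef if there is no such pair.
sub between {
    my ($text, $start, $stop) = @_;
    # /s lets . match newlines; .*? stops at the first occurrence of $stop.
    return $text =~ /$start(.*?)$stop/s ? $1 : undef;
}

# Hypothetical file contents with two stop markers.
my $html = "head\nstart\nkeep me\nstop\ntail\nstop\n";
print between($html, qr/start\n/, qr/stop/);   # prints "keep me"
```

The one-liner form would be something along the lines of perl -0777 -ne 'print $1 if /start(.*?)stop/s' file.html, which prints the captured text instead of rewriting the file.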

Re: I'm looking for a one-liner filter 'twixt two regexes
by msemtd (Scribe) on Sep 05, 2002 at 13:42 UTC
    Thank you all,
    This fella is great and exactly what I'm after since the start and end regexes appear only once per file as a nice set and never appear on the same line...
    perl -ne 'print if /start/ .. /end/' *.html
    My problem is now due to my WinNT shell (CMD.exe) not being able to read the quotes and deal with the glob!

      cmd.exe quoting rules are almost simple. There is only one kind of quote, the double quote. So

      perl -ne "print if /start/ .. /end/" *.html
      works. I'm not sure how you escape a double quote on the command line of cmd.exe, so I use qq(\n) whenever I need to embed a newline...

      perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
      $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
      ($c = $d->accept())->get_request(); $c->send_response( new #in the
      HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
        Aw! I get...

        C:\temp\>perl -n -e "print if /start/ .. /end/" *.html
        Can't open *.html: Invalid argument.

        When run under CMD.exe -- I'm not sure if it's perl.exe or CMD.exe complaining here. :(
      Wow. It boggles my mind that that works. But you're right, it does. (Edit: And looking at the perlop perldocs, now I see why. "bi-stable, like a flipflop", cool!)

      If you can install cygwin, use that. It will give you a "sane" shell to run perl scripts with.
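The "Can't open *.html" message comes from perl itself: cmd.exe hands the wildcard over unexpanded, so perl tries to open a file literally named *.html. One workaround that needs no other shell is to expand the arguments inside the script. A minimal sketch, assuming the file names contain no spaces (glob splits its argument on whitespace); `expand_args` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Sketch only: expand any argument containing a wildcard, leave the rest alone.
sub expand_args {
    return map { /[*?]/ ? glob($_) : $_ } @_;
}

# Only read input when file names were actually given on the command line.
if (@ARGV) {
    @ARGV = expand_args(@ARGV);
    while (my $line = <>) {
        print $line if $line =~ /start/ .. $line =~ /end/;
    }
}
```

The same trick should fit in a one-liner by doing the expansion in a BEGIN block, for example perl -ne "BEGIN { @ARGV = map { glob($_) } @ARGV } print if /start/ .. /end/" *.html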

Node Type: perlquestion [id://195333]
Approved by krisahoch