Re^2: Perl RE; how to capture, and replace based on a block?
by taint (Chaplain) on Dec 18, 2013 at 05:51 UTC
|
cat ./FILE.html | perl {...}
in an open xterm. After several failures and no more ideas, I closed the xterm and asked for help. I didn't think the command would be of any use in the request.
I've since read every single reference in the Perl documentation, and while I think I've got the RE part down, I'm quite sure I don't know how to feed Perl the file properly so that it does more than eat a single line at a time.
So let me have another go at it. The following
#!/usr/bin/perl -w
#retest.pl
# my feeble attempt at a multi-line RE in Perl
$regexp = shift;
while (<>) {
print if /$regexp/;
}
won't work as
# ./retest.pl \</\div\>\n\<\/body\> ./FILE.html
because the while (<>) loop only hands the regex one line of input at a time, so a pattern that spans a newline never sees both lines. Attempts to figure out how to make use of psed and s2p have failed miserably.
Apologies for the previous noise, and thank you for the thoughtful responses.
--Chris
Yes. What say about me, is true.
|
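Hi Chris, specifying a regex on the command line is tricky, because an unquoted pattern is processed by the shell (which strips the backslashes) before Perl ever sees it. At the very least, you should print your $regexp to see what it actually contains.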
In any case, this code seems to work:
my $str = "
</div>
</body>
";
print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;
which suggests that if you slurp in your whole file as a single string (e.g. by unsetting $/), your regex should do its job.
local $/;
my $str = <>;
print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;
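Putting the two suggestions above together, here is a minimal sketch of a slurping version of retest.pl. To keep it self-contained, the pattern and the file contents are hypothetical stand-ins (an in-memory string plays the role of FILE.html); in real use the pattern would come from shift and the data from <>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: this is what the script would receive if the pattern were
# single-quoted on the shell command line, so the backslash survives intact.
my $regexp = '</div>\n</body>';

# Assumption: hypothetical stand-in for the contents of FILE.html.
my $html = "<body>\n<div>text\n</div>\n</body>\n";

# Print the pattern first, to see exactly what Perl was given.
print "pattern is: $regexp\n";

# Open a filehandle on the in-memory string and slurp it in one read
# by locally undefining the input record separator $/.
open my $fh, '<', \$html or die "cannot open in-memory file: $!";
my $content = do { local $/; <$fh> };
close $fh;

# The interpolated \n is interpreted by the regex engine as a newline.
my $matched = $content =~ /$regexp/ ? 1 : 0;
print $matched ? "Success\n" : "No match\n";
```

On a real command line, single quotes keep the shell from touching the pattern, e.g. ./retest.pl '</div>\n</body>' ./FILE.html, with $regexp taken from shift and the data read from <> as in the original script.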
|
Perl said; Success.
Thanks a million, hdb! Your suggestion has helped me greatly in putting the last piece in my current "puzzle".
Thanks again. I'd like to buy a round of +'s, for the house.
--Chris
UPDATE: I forgot to mention that the reason I was feeding the file to Perl is:
1) That's what worked best for me with sed.
2) It seemed the easiest way to experiment getting a correct match with Perl.
Yes. What say about me, is true.
|
For fiddling with little bits of code, just use the debugger straight away:
swedish_chef> perl -demo
Loading DB routines from perl5db.pl version 1.32
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1): mo
DB<1> $string = "one two three four"
DB<2> x $string =~ m/(\w+)/g
0 'one'
1 'two'
2 'three'
3 'four'
Note that "my" variables don't work as expected: I think they get created in the debugger's scope, not in the interpreted scope, so they are gone by the next command. But otherwise, have fun in the sandbox.
-QM
--
Quantum Mechanics: The dreams stuff is made of
Re^2: Perl RE; how to capture, and replace based on a block?
by taint (Chaplain) on Dec 18, 2013 at 07:11 UTC
|
Thank you educated_foo.
"Keep adding "VAR," between the parens in #2 as long as Perl complains about 'Global symbol "VAR"...'
Someone should probably write an Acme:: module to do this automatically."
I'll be glad to. Just as soon as I figure this all out. :)
My biggest hangup, I think, is that I'm quite comfortable with sed. But sed is "greedy" by default, and while Perl RE can be, it's not by default, and that's what I need here (not greedy).
s/\<\/div\>/,/\<\/body\>/
will match my pattern in sed, but it will match from the first </div> through the first </body>, which is too much.
Thanks again for the response, educated_foo
--Chris
Yes. What say about me, is true.
|
By default, Perl RE quantifiers are greedy; you have to append ? (as in .*?) to make one non-greedy. Have you considered the possibility that the end of line might be more than \n (if the file is coming from Windows, for example)?
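Both points can be checked with a small sketch. The sample strings below are made up for illustration and are not taken from the original file.

```perl
use strict;
use warnings;

# Hypothetical sample text with two closing tags to choose between.
my $text = "<b>one</b> and <b>two</b>";

# Greedy (the default): .* runs on to the LAST </b> it can reach.
my ($greedy) = $text =~ m{(<b>.*</b>)};

# Non-greedy: .*? stops at the FIRST </b>.
my ($lazy) = $text =~ m{(<b>.*?</b>)};

print "greedy: $greedy\n";   # greedy: <b>one</b> and <b>two</b>
print "lazy:   $lazy\n";     # lazy:   <b>one</b>

# Line endings: \r?\n accepts both Unix (\n) and Windows (\r\n) files.
my $unix = "</div>\n</body>";
my $dos  = "</div>\r\n</body>";
my $hits = grep { m{</div>\r?\n</body>} } ($unix, $dos);
print "line-ending matches: $hits\n";   # line-ending matches: 2
```

Writing \r?\n costs nothing on a *NIX-only file and keeps the pattern working if a file ever arrives with Windows line endings.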
|
Greetings, Laurent_R, and thanks for the reply.
Oh yes. I'm keen on the \n vs. \r vs. \r\n thing, and you're absolutely correct. Except, in my case, I'm on a *NIX box, and I've written the files myself. So I know they're UTF-8 (no BOM), with newlines, no "hard" returns. :)
Maybe it's just the examples I was reading (perlrequick, perlretut, and perlfaq6), but I got the impression that Perl RE wasn't greedy. More Perl RE reading, I guess.
Thanks again for the response, Laurent_R.
--Chris
Yes. What say about me, is true.
|
... sed ...
Here is my test program:
use re 'debug';
$_ = q{</div>
</body>};
print 'does it match ', int m{\<\/div\>\n\<\/body\>};