PerlMonks  

Re: How can use Perl to strip away some nested HTML markup code, like <SCRIPT> ?

by Pedro Picasso (Sexton)
on Oct 15, 2003 at 14:10 UTC


in reply to How can use Perl to strip away some nested HTML markup code, like <SCRIPT> ?

Let's say you have some HTML like this:
<b>I like</b> <i>squirrels!</i>.
You could use this:
$html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
To turn it into this:
I like squirrels!.
{QandAEditors note: merlyn points out in a followup that the above regexp only works for simple HTML and cannot be relied on for real-life HTML. See the followup for details.}
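
For anyone who wants to try the substitution, here is a minimal sketch that wraps it in a sub and runs it on the sample line. The sub name strip_simple_tags is made up for illustration and is not from the post; the regex itself is copied unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the post) wrapping the substitution above.
sub strip_simple_tags {
    my ($html) = @_;
    # Drop an opening tag, keep the text up to the next '<', drop the closing tag.
    # The /s flag is kept from the post; it has no effect here because the
    # pattern contains no '.'.
    $html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
    return $html;
}

print strip_simple_tags('<b>I like</b> <i>squirrels!</i>.'), "\n";

# Nested markup: only the innermost pair is removed in a single pass.
print strip_simple_tags('<b><i>nested</i></b>'), "\n";
```

On the nested input a single pass removes only the inner pair and leaves `<b>nested</b>`, so the substitution would have to be repeated until nothing changes, and even then it only handles tags that come in matching open/close pairs.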

Replies are listed 'Best First'.
•Re: Answer: How can use Perl to strip away some nested HTML markup code, like <SCRIPT> ?
by merlyn (Sage) on Oct 15, 2003 at 15:02 UTC
    Sure, that works for simple HTML, but such a simple regex can fail on real-life HTML. For example:
    <!-- > this is still the comment --> and some more text
    In that case, "this is still the comment" would be left within the output, when it shouldn't be.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
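
The comment example can be checked with a short script. This is only a sketch: the second and third substitutions are assumptions added for comparison and do not appear anywhere in the thread. The last one strips comments before tags, which handles this one input but is still not a real HTML parser.

```perl
use strict;
use warnings;

my $input = '<!-- > this is still the comment --> and some more text';

# The substitution from the original post. It needs a closing tag to match,
# so this input passes through unchanged, comment text included.
(my $post_regex = $input) =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;

# A naive tag stripper (an assumption, not from the thread). It treats the
# first '>' as the end of the tag, so the rest of the comment survives.
(my $naive = $input) =~ s/<[^>]*>//g;

# Hypothetical pre-pass: remove whole comments first, then strip tags.
(my $comment_aware = $input) =~ s/<!--.*?-->//gs;
$comment_aware =~ s/<[^>]*>//g;

print "post regex:    [$post_regex]\n";
print "naive strip:   [$naive]\n";
print "comment-aware: [$comment_aware]\n";
```

Only the third variant drops the comment text here. Other constructs, such as a '>' inside an attribute value or the contents of a SCRIPT element, would need their own special cases, which is the point of the reply above.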
