PerlMonks  

Re^2: Removing text between HTML tags

by perll (Novice)
on Sep 23, 2014 at 10:10 UTC (#1101619)


in reply to Re: Removing text between HTML tags
in thread Removing text between HTML tags

Thanks, s/<.+?>//g; is awesome, it removes all HTML tags. But I agree that I should use an HTML parser: I have to parse thousands of URLs, and using a regex for that is a big risk. Also, I found that the website generates XML pages, so we can parse those :) Can anyone suggest an XML parser? I found XML::Parser and will try that.
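To make the risk concrete, here is a minimal core-Perl sketch that applies the substitution from the post (wrapped in a hypothetical helper, strip_tags_regex) to three inputs. It handles simple markup, but a '>' inside a quoted attribute value ends the match early, and because '.' does not match a newline without the /s modifier, a tag split across lines is left in place.

```perl
use strict;
use warnings;

# Hypothetical helper: the substitution from the post, deleting anything
# that looks like <...> (non-greedy, so it stops at the first '>').
sub strip_tags_regex {
    my ($html) = @_;
    (my $text = $html) =~ s/<.+?>//g;
    return $text;
}

# Simple, well-behaved markup works.
print strip_tags_regex('<p>Hello <b>world</b></p>'), "\n";    # prints "Hello world"

# A '>' inside an attribute value ends the "tag" too early,
# leaving ' y">link' behind.
print strip_tags_regex('<a title="x > y">link</a>'), "\n";

# '.' does not match "\n", so the opening <a ...> tag split across
# two lines is not removed at all.
print strip_tags_regex(qq{<a\nhref="x">text</a>}), "\n";
```

A real HTML or XML parser tokenizes quoted attributes, comments and line breaks properly, which is why it is the safer choice when the input comes from thousands of different pages.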

Re^3: Removing text between HTML tags
by choroba (Archbishop) on Sep 23, 2014 at 21:19 UTC
    I prefer XML::LibXML, which can handle HTML as well. XML::Twig is also quite popular. They are both a bit higher-level than XML::Parser.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
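As a rough illustration of XML::LibXML handling HTML, the sketch below (assuming XML::LibXML is installed; html_text is a hypothetical helper name) parses a string with load_html and returns the text between the tags. The recover => 2 option asks libxml2 to recover from broken markup without printing errors.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse (possibly messy) HTML with libxml2's HTML
# parser and return the text content, i.e. everything between the tags.
sub html_text {
    my ($html) = @_;
    my $dom = XML::LibXML->load_html(
        string  => $html,
        recover => 2,    # recover from broken markup, quietly
    );
    return $dom->documentElement->textContent;
}

print html_text('<p>Hello <b>world</b></p>'), "\n";    # prints "Hello world"

# Unlike the regex, the parser knows that '>' inside a quoted
# attribute value is not the end of the tag.
print html_text('<a title="x > y">link</a>'), "\n";    # prints "link"
```

For the XML pages mentioned in the original post, XML::LibXML->load_xml(string => $xml) returns the same kind of DOM, and findnodes() can then select specific elements with XPath instead of taking all of the text.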
Re^3: Removing text between HTML tags
by Laurent_R (Canon) on Sep 23, 2014 at 17:53 UTC
    That's the XML parser that I would have recommended for a start, but I do not use XML very much, and when I do it is usually simple, well-formed XML, so I haven't needed anything fancier and have not really tried the others.
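For simple, well-formed XML, a minimal XML::Parser sketch (assuming the module is installed; xml_text is a hypothetical helper name) can collect the text by registering only a Char handler. The handler receives the expat object and a chunk of character data, with entities such as &amp; already decoded.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: return all character data in a well-formed
# XML document, i.e. the text with the tags removed.
sub xml_text {
    my ($xml) = @_;
    my $text = '';
    my $parser = XML::Parser->new(
        Handlers => {
            # One text node may arrive in several chunks, so append.
            Char => sub { $text .= $_[1] },
        },
    );
    $parser->parse($xml);    # dies if the XML is not well-formed
    return $text;
}

print xml_text('<page><title>Hello</title> <body>world &amp; more</body></page>'), "\n";
# prints "Hello world & more"
```

Because parse() dies on malformed input, wrapping the call in eval { } is one way to skip bad pages when processing many URLs.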
