
Re: Problem with <> and regex

by choroba (Cardinal)
on Mar 11, 2014 at 15:30 UTC


in reply to *fixed*Problem with <> and regex

It seems you are trying to handle HTML with regexes, which is a painful way to do it. Instead, take a look at a real parser such as HTML::TreeBuilder or XML::LibXML.

For example, in XML::XSH2, a wrapper around XML::LibXML, you can write just

open :F html file.html ; my $words = //span[@itemprop="author"]/text() ;
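For comparison, roughly the same query can be run from plain Perl with XML::LibXML directly. This is only a sketch: the sample markup and the author_names helper are made up for illustration, and it assumes XML::LibXML is installed. To read the file from the example above instead of a string, load_html should also accept location => 'file.html'.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; the one-liner above reads file.html instead.
my $html = <<'HTML';
<html><body>
<p>Posted by <span itemprop="author">Alice</span> and
<span class='x' itemprop='author'>Bob</span></p>
</body></html>
HTML

# Return the text of every <span itemprop="author"> found in an HTML string.
# author_names is an illustrative helper, not part of any module.
sub author_names {
    my ($markup) = @_;

    # recover lets libxml2 cope with real-world tag soup;
    # suppress_errors keeps its parser warnings off STDERR.
    my $dom = XML::LibXML->load_html(
        string          => $markup,
        recover         => 1,
        suppress_errors => 1,
    );

    # Same XPath expression as in the XSH2 example.
    return map { $_->nodeValue }
        $dom->findnodes('//span[@itemprop="author"]/text()');
}

print "$_\n" for author_names($html);
```

Note that the second span uses single quotes and carries an extra attribute. A regex written against the literal text <span itemprop="author"> would miss it, while the XPath query does not care about quoting or attribute order.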
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Replies are listed 'Best First'.
Re^2: Problem with <> and regex
by AnomalousMonk (Archbishop) on Mar 11, 2014 at 22:57 UTC

    People often object that using a full-blown HTML/XML parser on "just a simple string" is overkill: it's "too much code". The reply to this is that a "simple string" all too often becomes complicated (*ML is, after all, a complicated spec), and then the overhead of maintaining a regex-based solution can explode. Do you know of a tutorial or discussion on this or any site along the lines of Dominus's Why it's stupid to `use a variable as a variable name' that addresses "Why It's Stupid to Parse HTML/XML With Regexes"?

      I usually link to this question on StackOverflow. Its top answer is quite funny, but some of the other answers are more informative.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
