Re: Problem with <> and regex




by choroba (Cardinal)

on Mar 11, 2014 at 15:30 UTC


in reply to *fixed*Problem with <> and regex

It seems you are trying to handle HTML with regexes, which is a painful way to do it. Instead, take a look at a real parser to help you: HTML::TreeBuilder or XML::LibXML.

For example, in XML::XSH2, a wrapper around XML::LibXML, you can just write:

open :F html file.html ;
my $words = //span[@itemprop="author"]/text() ;
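For comparison, here is a rough plain-Perl sketch of the same XPath query using XML::LibXML directly. The inline HTML and the name "Jane Doe" are made up for illustration and merely stand in for the contents of file.html; the real document will differ.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; stands in for the contents of file.html.
my $html = <<'HTML';
<html><body>
<p>Written by <span itemprop="author">Jane Doe</span></p>
</body></html>
HTML

# recover => 1 asks libxml2 to tolerate the sloppy markup common in real HTML.
my $dom = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,
    suppress_errors => 1,
);

# Same XPath expression as in the XSH2 snippet above.
my @authors = map { $_->textContent }
    $dom->findnodes('//span[@itemprop="author"]/text()');

print "$_\n" for @authors;
```

To read from disk instead, load_html also accepts location => 'file.html' in place of the string argument.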

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Re^2: Problem with <> and regex by AnomalousMonk (Archbishop) on Mar 11, 2014 at 22:57 UTC
People often object that using a full-blown HTML/XML parser on "just a simple string" is overkill: it's "too much code". The reply to this is that a "simple string" all too often becomes complicated (*ML is, after all, a complicated spec), and then the overhead of maintaining a regex-based solution can explode. Do you know of a tutorial or discussion on this or any site along the lines of Dominus's Why it's stupid to `use a variable as a variable name' that addresses "Why It's Stupid to Parse HTML/XML With Regexes"?
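A small sketch of how that maintenance overhead creeps in, using only core Perl. The regex and the three markup variants are hypothetical; the variants all describe the same author span as far as an HTML parser is concerned, but the regex was written against the first form only.

```perl
use strict;
use warnings;

# A regex written against the first form of the markup only.
my $re = qr{<span itemprop="author">([^<]*)</span>};

# Hypothetical variants: extra attribute and single quotes, then
# an upper-case tag with whitespace around the equals sign.
my @variants = (
    '<span itemprop="author">Jane Doe</span>',
    q{<span class="by" itemprop='author'>Jane Doe</span>},
    '<SPAN itemprop = "author" >Jane Doe</SPAN>',
);

my @results;
for my $html (@variants) {
    my ($name) = $html =~ $re;
    push @results, defined $name ? "match: $name" : 'no match';
    print "$results[-1]\t$html\n";
}
```

Only the first variant matches. Each variant can be accommodated by patching the regex, but every patch makes it longer and harder to read, whereas a parser-based query on the itemprop attribute should not need to change.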
Re^3: Problem with <> and regex by choroba (Cardinal) on Mar 11, 2014 at 23:09 UTC
I usually link to this question on StackOverflow. Its top answer is quite funny, but some of the other answers are more informative.
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
