hsmyers
<p>In almost any situation, there are those who prefer maxims to thinking; ignore them. With specific regard to the odd web scrape, a couple of things can cause problems. <ul><li>The need to handle nesting.</li><li>The need to survive arbitrary changes in the source.</li></ul> The first can be handled by tightly bounding your expressions or by using something like <a href="http://search.cpan.org/author/NWALSH/DelimMatch-1.06a/DelimMatch.pm">Text::DelimMatch</a> or <a href="http://search.cpan.org/author/DCONWAY/Text-Balanced-1.95/lib/Text/Balanced.pm">Text::Balanced</a>. You manage the second by vigilance: patch the scraper when the source changes.</p>
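<p>A minimal sketch of the nesting problem and one way around it. It is written in Python so it stands alone, and it does not use Text::Balanced or Text::DelimMatch; it only illustrates the idea of tracking nesting depth that those modules handle for you. The function name <code>extract_div</code> and the sample markup are hypothetical. A single non-greedy regex stops at the first closing tag, while a depth counter finds the matching one. Raising an error when the expected markup is missing, rather than silently returning nothing, is one way to notice that the source has changed and needs a patch.</p>

```python
import re

# Matches an opening <div ...> or a closing </div>; group 1 is "/" for closers.
TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def extract_div(html, start=0):
    """Return the first complete <div>...</div> element at or after `start`.

    Counts nesting depth instead of trusting one regex to find the right
    closing tag. Hypothetical helper for illustration only.
    """
    depth = 0
    begin = None
    for m in TAG.finditer(html, start):
        if m.group(1) == "":  # opening tag
            if depth == 0:
                begin = m.start()
            depth += 1
        elif depth > 0:  # closing tag that belongs to an open element
            depth -= 1
            if depth == 0:
                return html[begin:m.end()]
    # Fail loudly so a layout change in the source gets noticed and patched.
    raise ValueError("no balanced <div> found; has the page layout changed?")


page = '<div id="a">x<div>y</div>z</div><div>w</div>'

# The naive non-greedy regex stops at the first </div>, cutting the element short.
naive = re.search(r"<div.*?</div>", page).group(0)

# The depth-counting version returns the whole outer element.
balanced = extract_div(page)
```

The depth counter is the part that matters; the regex is still doing the tokenizing, which keeps this an ad hoc parser that ignores everything except the tags it cares about.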
<p>
As for those who react with a knee jerk instead of thinking: since many of them have no experience writing parsers (formal or ad hoc), they fail to see that the regex approach is a form of parsing, just without the overhead of handling things that don't matter.
</p>
<p>--hsm</p>
<em>"Never try to teach a pig to sing...it wastes your time and it annoys the pig."</em>
272213