Re: Being a heretic and going against the party line.

by Chmrr (Vicar)
on Oct 03, 2002 at 13:23 UTC

in reply to Being a heretic and going against the party line.

I can certainly see your point, and personally I would never downvote anyone for giving an example which worked and which fit the requirements.

Personally, I find that the HTML::* modules make the code cleaner and more compact, and make it easier to see what's going on. For example, here's how I'd write the code in question using HTML::TreeBuilder (my personal favorite for this kind of manipulation):

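```perl
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use HTML::TreeBuilder;

# Fetch the page; LWP::Simple's get() returns undef on failure (it doesn't set $!)
my $html = get("") or die "Couldn't fetch the page";

# Parse the HTML into a tree of HTML::Element objects
my $tree = HTML::TreeBuilder->new_from_content($html)
    or die "Couldn't build the HTML tree";

# Find the link with the given href, then climb to its enclosing table row
my $row = $tree->look_down("_tag" => "a", "href" => "")
    or die "Link not found";
$row = $row->look_up("_tag", "tr") or die "Enclosing row not found";

# Turn the row's bare text segments into "~text" pseudo-elements...
$row->objectify_text();

# ...so they can be found with look_down and printed, space-separated
print join(' ', map { $_->attr("text") } $row->look_down("_tag", "~text")), "\n";

$tree->delete;    # free the tree explicitly
```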
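If you want to try the same look_down / look_up / objectify_text approach without hitting the network, here is a minimal sketch that runs it against an inline HTML string. The sample table, the `/books/42` href and the `row_text` helper are all hypothetical, made up for illustration; the helper returns undef when the link or its row isn't found.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the text of the table row containing the
# link whose href equals $href, text segments joined by single spaces.
sub row_text {
    my ($html, $href) = @_;
    my $root = HTML::TreeBuilder->new_from_content($html);

    # Find the link, then climb to the enclosing <tr>
    my $link = $root->look_down(_tag => 'a', href => $href);
    my $row  = $link ? $link->look_up(_tag => 'tr') : undef;

    my $text;
    if ($row) {
        # Convert bare text segments into ~text pseudo-elements
        $row->objectify_text();
        $text = join ' ', map { $_->attr('text') } $row->look_down(_tag => '~text');
    }
    $root->delete;    # free the tree
    return $text;
}

# Made-up sample page
my $sample = <<'HTML';
<table>
  <tr><td>Title</td><td>Author</td></tr>
  <tr><td><a href="/books/42">Some Book</a></td><td>A. Writer</td></tr>
</table>
HTML

print row_text($sample, '/books/42'), "\n";
```

Note that nothing in the lookup depends on which column the link sits in or how many rows precede it.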

I must admit that whenever I see a huge regex, my eyes glaze over slightly. Even regexes like yours, which aren't actually all that complex, tend to look rather intimidating.
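To make the comparison concrete, here is a hypothetical sketch that pulls the href and text out of every link, once with a regex and once with HTML::TreeBuilder. Neither version comes from this thread; the regex is a made-up example of the typical kind, and it assumes href is the first attribute and is double-quoted, so it silently misses the second link.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = q{<p><a href="/a">First</a> and <a class="x" href='/b'>Second</a></p>};

# Regex approach: assumes <a href="..."> exactly, so it only finds the first link
my @by_regex;
while ($html =~ m{<a\s+href="([^"]*)"\s*>(.*?)</a>}gis) {
    push @by_regex, "$1=$2";
}

# Tree approach: ask for <a> elements with a non-empty href,
# whatever the attribute order or quoting style
my $root = HTML::TreeBuilder->new_from_content($html);
my @by_tree = map { $_->attr('href') . '=' . $_->as_text }
              $root->look_down(_tag => 'a', href => qr/./);
$root->delete;

print "regex: @by_regex\n";
print "tree:  @by_tree\n";
```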

A personal anecdote on using HTML::* modules for parsing: a while back, I wrote a program which, given an ISBN, would look up basic information from Amazon, such as the title, the author, the series (if any), and so on. That was nearly a year ago. Just last week, someone asked me if I still had the code around. I dug it up and ran it, and, lo and behold, it spat back the information. Even though Amazon had rearranged the page significantly in the meantime, the extractor still worked.
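The post doesn't say how that extractor located its data, so the following is only an illustration of one reason a tree-based lookup can survive a redesign: if you search by something semantic (here a class name) rather than by position, moving the element from a table into a list doesn't matter. The `text_by_class` helper, the `author` class and both page versions are hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: text of the first element whose class list
# contains $class, wherever it sits in the page layout.
sub text_by_class {
    my ($html, $class) = @_;
    my $root = HTML::TreeBuilder->new_from_content($html);
    # look_down accepts a qr// as an attribute criterion
    my $el   = $root->look_down(class => qr/(?:^|\s)\Q$class\E(?:\s|$)/);
    my $text = $el ? $el->as_trimmed_text : undef;
    $root->delete;
    return $text;
}

# Two made-up versions of the same page: table layout, then div/list layout
my $old_page = <<'HTML';
<table><tr><td>By</td><td class="author">A. Writer</td></tr></table>
HTML
my $new_page = <<'HTML';
<div id="details"><ul><li><span class="byline author">A. Writer</span></li></ul></div>
HTML

print text_by_class($_, 'author'), "\n" for $old_page, $new_page;
```

A positional regex tuned to the first layout would need rewriting for the second; the class-based lookup works on both unchanged.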

I shan't just tell you to drink the Kool-Aid, but whenever I have the choice, most of my solutions use HTML::* modules. Why? I've placed my trust in them many a time, and they have yet to let me down. I will suggest that others do the same, but if they choose not to, well, that is their choice, and down the road they may turn out to be right or wrong. It's their Kool-Aid. ;>

