note
valdez
<p>For the parsing part I would use [cpan://HTML::TokeParser::Simple], written by our brother [Ovid]; here is an example from its documentation:<br />
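<code>
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Name of the HTML file to parse, taken from the command line
my $somefile = shift or die "usage: $0 file.html\n";
my $p = HTML::TokeParser::Simple->new( $somefile );
while ( my $token = $p->get_token ) {
    # This prints all text in an HTML doc (i.e., it strips the HTML)
    next unless $token->is_text;
    print $token->as_is;
}
</code>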
Nice, isn't it? HTML parsing is not as easy as it may seem, and relying on a well-written module is not a sin :)
</p>
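<p>If you need the tags rather than the text, the same token loop works. Below is a small sketch of my own (not taken from the module's documentation) that collects the <tt>href</tt> of every link. The sub name <tt>extract_links</tt> and the sample HTML are just for illustration, and I'm assuming a version of the module whose <tt>new()</tt> accepts the <tt>string =&gt; $html</tt> form; with an older one you can pass a reference instead, <tt>new(\$html)</tt>.</p>
<code>
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: return the href of every <a> start tag in an HTML string
sub extract_links {
    my ($html) = @_;
    # string => ... parses HTML held in memory instead of reading a file
    my $p = HTML::TokeParser::Simple->new( string => $html );
    my @links;
    while ( my $token = $p->get_token ) {
        # is_start_tag('a') is true only for opening <a ...> tags
        next unless $token->is_start_tag('a');
        # get_attr returns undef when the attribute is absent (e.g. <a name="top">)
        my $href = $token->get_attr('href');
        push @links, $href if defined $href;
    }
    return @links;
}

my $html = '<p>See <a href="http://example.com/">this</a> and <a name="top">that</a>.</p>';
print "$_\n" for extract_links($html);
</code>
<p>Only the first anchor has an <tt>href</tt>, so only <tt>http://example.com/</tt> is printed; the named anchor is skipped instead of producing an undefined value.</p>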
<div class="pmsig"><div class="pmsig-166227">
<p>Ciao, Valerio</p>
</div></div>
607885
607885