
Re: Re: Dealing with Word Compact HTML

by matija (Priest)
on Apr 14, 2004 at 15:07 UTC ( #345082=note )

in reply to Re: Dealing with Word Compact HTML
in thread Dealing with Word Compact HTML

Don't forget he could use HTML::TokeParser or even HTML::TokeParser::Simple.
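
For anyone who wants to see what that looks like, here is a minimal sketch with HTML::TokeParser. It assumes the goal is simply to pull out the text of each <b> element; the helper name bold_text and the sample HTML are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the text of every <b>...</b> element
# found in an HTML string.
sub bold_text {
    my ($html) = @_;

    # Passing a scalar reference tells HTML::TokeParser to parse the
    # string itself rather than treat it as a file name.
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @found;
    # get_tag('b') skips ahead to the next <b> start tag, or returns
    # undef when the document is exhausted.
    while ($p->get_tag('b')) {
        # Read text up to the matching </b>, with surrounding
        # whitespace trimmed.
        push @found, $p->get_trimmed_text('/b');
    }
    return @found;
}

# Illustrative input only.
my $html = <<'HTML';
<p class=para><a name="watch dog"></a><b>watch dog -</b> A big dog.</p>
<p class=para><a name="WR"></a><b>wooden round </b> A big piece of round wood.</p>
HTML

print "$_\n---\n" for bold_text($html);
```

The pull-style interface means there are no handlers to register and unregister: the loop asks for the next <b> and then for the text up to </b>. HTML::TokeParser::Simple wraps the same tokens in objects, which may read more clearly for larger scripts.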

Replies are listed 'Best First'.
Re: Re: Re: Dealing with Word Compact HTML
by format_c (Initiate) on Apr 14, 2004 at 22:45 UTC
    I tried HTML::Parser for a bit and I hate it, because I think it's complicated to use. But parsing HTML with regexes quickly becomes more complicated than parsing with HTML::Parser. So here's my snippet; I hope it helps you:
    # This script extracts the text that is enclosed in <b> tags.
    use strict;
    use warnings;
    use HTML::Parser;

    local $/;                 # slurp mode: read all of DATA at once
    my $html = <DATA>;

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ \&b_start_handler, "tagname,self" ],
    );
    $p->parse($html);
    $p->eof;                  # signal end of input to the parser

    sub b_start_handler {
        my ($tagname, $self) = @_;
        return unless $tagname eq 'b';
        # Collect decoded text into an array and watch for the closing tag.
        $self->handler(text => [], '@{dtext}');
        $self->handler(end  => \&b_end_handler, "tagname,self");
    }

    sub b_end_handler {
        my ($tagname, $self) = @_;
        return unless $tagname eq 'b';   # ignore end tags nested inside <b>
        # handler("text") returns the array that has been collecting text.
        my $text = join("", @{ $self->handler("text") });
        print "$text\n---\n";
        # Stop collecting until the next <b> starts.
        $self->handler(text  => undef);
        $self->handler(start => \&b_start_handler, "tagname,self");
        $self->handler(end   => undef);
    }

    __DATA__
    <P class=para><a name="watch dog"></a><b>watch dog -</b> A big dog that makes sure that you don't do anything that you're not supposed to).</p>
    <p class=para><a name="WR"></a><b>wooden round </b> A big piece of round wood.</p>
    Greets Alex
