Re: How to parse not closed HTML tags that don't have any attributes?

by jcb (Parson)

on Mar 06, 2021 at 21:41 UTC

in reply to How to parse not closed HTML tags that don't have any attributes?

I suggest HTML::Parser and a state machine.

You want three states:

idle
find telephone number item
extract telephone number

Start in the idle state. When you get a start event for a P tag with class="title", transition to "find telephone number item". From there, the next text event decides where to go: if it contains "Telephone", transition to "extract telephone number"; otherwise, return to the idle state. In the "extract telephone number" state, store the first text event matching m/[[:digit:]]/ as the phone number and return to the idle state. If you only have one telephone number per page, you can also abort the parse at that point.
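The three states above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for your actual pages: the sub name extract_phone, the state labels, and the sample markup are made up for illustration, and skipping whitespace-only text and trimming the stored number are my own assumptions rather than part of the description above.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns the first telephone number found, or undef.
sub extract_phone {
    my ($html) = @_;
    my $state = 'idle';    # idle -> find_item -> extract -> idle
    my $phone;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ( $tagname, $attr ) = @_;
                # Only leave idle on <p class="title">
                if (   $state eq 'idle'
                    && $tagname eq 'p'
                    && ( $attr->{class} // '' ) eq 'title' )
                {
                    $state = 'find_item';
                }
            },
            'tagname, attr'
        ],
        text_h => [
            sub {
                my ( $self, $text ) = @_;
                # Assumption: whitespace-only text between tags is ignored
                return if $text !~ /\S/;

                if ( $state eq 'find_item' ) {
                    # The first text after the title tag decides the next state
                    $state = $text =~ /Telephone/ ? 'extract' : 'idle';
                }
                elsif ( $state eq 'extract' && $text =~ m/[[:digit:]]/ ) {
                    # Assumption: store the text with surrounding whitespace trimmed
                    ( $phone = $text ) =~ s/^\s+|\s+$//g;
                    $state = 'idle';
                    $self->eof;    # one number per page: abort the parse
                }
            },
            'self, dtext'
        ],
    );
    $parser->unbroken_text(1);    # deliver each text run as a single event

    # parse() returns false if a handler aborted; otherwise signal end of input
    $parser->eof if $parser->parse($html);
    return $phone;
}

# Made-up sample with unclosed P tags
my $sample = <<'HTML';
<p class="title">Address
<p>1 Example Street
<p class="title">Telephone
<p>555-0100
HTML

print extract_phone($sample) // 'not found', "\n";
```

Keeping the state in a plain lexical shared by both handlers is enough here; if the page can contain several numbers, drop the $self->eof call and push each match onto an array instead.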

See the documentation for HTML::Parser for more details about that module and any good computer science text for more details about using finite state machines as parsers.
