PerlMonks
Re: How to parse not closed HTML tags that don't have any attributes? by jcb (Parson)
on Mar 06, 2021 at 21:41 UTC (id://11129215)
I suggest HTML::Parser and a state machine. You want three states: idle, "find telephone number item", and "extract telephone number".
Start in the idle state and move to "find telephone number item" when you get a start event for a P tag with class="title". From there, move to "extract telephone number" if the next text event contains "Telephone"; if it does not, return to the idle state. In the "extract telephone number" state, store the phone number from the first text event matching m/[[:digit:]]/ and return to the idle state. If each page has only one telephone number, you can also abort the parse at that point. See the HTML::Parser documentation for more details about that module, and any good computer science text for more on using finite state machines as parsers.
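A minimal sketch of that state machine, using HTML::Parser's version 3 handler API. The function name `find_telephone` and the sample markup are hypothetical, assuming pages shaped like `<p class="title">Telephone<p>NUMBER` with an exact `class="title"` match. Two details are assumptions not in the description above: whitespace-only text events are skipped (otherwise a newline between tags would send the machine back to idle), and `unbroken_text` is enabled so each contiguous run of text arrives as a single event. Calling `eof` from inside a handler is how HTML::Parser aborts a parse.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns the first telephone number found, or undef.
# Assumes markup like <p class="title">Telephone<p>NUMBER (unclosed P tags).
sub find_telephone {
    my ($html) = @_;
    my $state = 'idle';    # one of: idle, find, extract
    my $phone;

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;
                # idle -> find on <p class="title"> (exact class match assumed)
                if (   $state eq 'idle'
                    && $tagname eq 'p'
                    && defined $attr->{class}
                    && $attr->{class} eq 'title')
                {
                    $state = 'find';
                }
            },
            'tagname, attr',
        ],
        text_h => [
            sub {
                my ($self, $text) = @_;
                # Assumption: ignore whitespace-only text (e.g. newlines
                # between tags) so it does not reset the state.
                return unless $text =~ /\S/;
                if ($state eq 'find') {
                    # find -> extract on "Telephone", otherwise back to idle
                    $state = $text =~ /Telephone/ ? 'extract' : 'idle';
                }
                elsif ($state eq 'extract' && $text =~ m/[[:digit:]]/) {
                    ($phone = $text) =~ s/^\s+|\s+$//g;    # trim whitespace
                    $state = 'idle';
                    $self->eof;    # one number per page: abort the parse
                }
            },
            'self, dtext',
        ],
    );
    $p->unbroken_text(1);    # deliver each contiguous text block as one event

    # parse() returns false if a handler called eof(); otherwise flush
    # any text still buffered at the end of the document.
    $p->eof if $p->parse($html);
    return $phone;
}

my $html = '<p class="title">Address<p>1 Main St'
         . '<p class="title">Telephone<p>+1 555 0100';
my $phone = find_telephone($html);
print defined $phone ? "$phone\n" : "not found\n";    # prints +1 555 0100
```

Both handlers close over a single `$state` scalar, so each transition is one assignment. In the usage example the number is the last text in the document, so it is delivered by the final `$p->eof` flush rather than during `parse()`.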
In section: Seekers of Perl Wisdom