PerlMonks
in reply to How to parse not closed HTML tags that don't have any attributes?
The HTML is indeed inconsistent, and you've only shown one sample, so any example code will be correspondingly brittle. Like marto, I would suggest Mojo::DOM, as it has an IMHO nice interface, and it is still able to parse that HTML.
    use warnings;
    use strict;
    use Mojo::DOM;
    use Mojo::Util qw/trim/;
    use Data::Dump;

    my $dom = Mojo::DOM->new(<<'HTML');
    <div class="phone">
    <div class="icon"></div>
    <p class="title">Telephone</p>
    <p>0123-4 56 78 90
    <p class="title">Telefax</p>
    <p>
    </div>
    HTML

    my %hash = @{ $dom->find('p.title')->map(sub {
        return ( trim($_->text), trim($_->next->text) ) }) };
    dd \%hash;

    __END__

    { Telefax => "", Telephone => "0123-4 56 78 90" }
Update: Assuming you've got a lot of other <div>s in your HTML, you may want to change the expression in ->find() to '.phone p.title'.
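A minimal sketch of that scoped variant, in case it helps. The extra `<div class="other">` block is made up purely to show why the scoping matters, and the `defined` check is my own addition, on the assumption that a title might turn up without a following sibling element; adjust both to your real data.

```perl
use warnings;
use strict;
use Mojo::DOM;
use Mojo::Util qw/trim/;

# Hypothetical input: the "other" div is invented for illustration only.
my $dom = Mojo::DOM->new(<<'HTML');
<div class="other">
<p class="title">Unrelated</p>
<p>ignored
</div>
<div class="phone">
<div class="icon"></div>
<p class="title">Telephone</p>
<p>0123-4 56 78 90
<p class="title">Telefax</p>
<p>
</div>
HTML

my %hash;
# Only look at titles inside the element with class "phone".
for my $title ( $dom->find('.phone p.title')->each ) {
    my $value = $title->next;    # next sibling element, or undef if none
    $hash{ trim( $title->text ) }
        = defined $value ? trim( $value->text ) : '';
}

printf "%s => '%s'\n", $_, $hash{$_} for sort keys %hash;
```

With the plain 'p.title' selector, the "Unrelated" entry would end up in the hash as well. This is just as brittle as the original with respect to the markup itself: it still assumes each value is the element immediately following its title.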
Replies are listed 'Best First'.
Re^2: How to parse not closed HTML tags that don't have any attributes? (updated)
by tobyink (Canon) on Mar 08, 2021 at 13:24 UTC
by haukex (Archbishop) on Mar 08, 2021 at 15:30 UTC
Re^2: How to parse not closed HTML tags that don't have any attributes? (updated)
by Rantanplan (Novice) on Mar 07, 2021 at 13:52 UTC
by haukex (Archbishop) on Mar 07, 2021 at 14:03 UTC
by Rantanplan (Novice) on Mar 07, 2021 at 16:40 UTC
by haukex (Archbishop) on Mar 07, 2021 at 17:47 UTC
by Rantanplan (Novice) on Mar 08, 2021 at 14:39 UTC