http://qs321.pair.com?node_id=11129209


in reply to How to parse not closed HTML tags that don't have any attributes?

Consider this Mojo::DOM example. I've made some assumptions, as your source data does not seem complete:

cat dragnet.pl

#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
use feature 'say';

my $html = '<div class="phone">
  <div class="icon"></div>
  <p class="title">Telephone</p>
  <p>0123-4 56 78 90</p>
  <p class="title">Telefax</p>
  <p>just the fax ma\'am</p>
</div>';

my $dom = Mojo::DOM->new( $html );

my $phone = $dom->at('div.phone > p:nth-of-type(2)')->text;
say $phone;

my $fax = $dom->at('div.phone > p:nth-of-type(4)')->text;
say $fax;

Prints:

0123-4 56 78 90
just the fax ma'am

Let us know if you have any problems or your input data is somehow weirder.

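Update: Sorry, late in the day on a Saturday here. Since the HTML isn't valid, and I'm guessing you can't change that, try the following. The unclosed paragraphs leave trailing whitespace in the text, so it is run through trim: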

#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::Util qw(trim);
use feature 'say';

my $html = '<div class="phone">
  <div class="icon"></div>
  <p class="title">Telephone</p>
  <p>0123-4 56 78 90
  <p class="title">Telefax</p>
  <p>just the fax ma\'am
</div>';

my $dom = Mojo::DOM->new( $html );

my $phone = trim( $dom->at('div.phone > p:nth-of-type(2)')->text );
say $phone;

my $fax = trim( $dom->at('div.phone > p:nth-of-type(4)')->text );
say $fax;

Which still prints:

0123-4 56 78 90
just the fax ma'am
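
The positional selectors above depend on the paragraphs always appearing in the same order. If that can't be relied on, one possible variation is to key each value on the label that precedes it. This is only a sketch under the same assumptions about the markup as the examples above (a p.title label followed directly by its value); the %value_for hash is a name made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;
use Mojo::Util qw(trim);

# Same assumed input as above, with the unclosed <p> tags.
my $html = '<div class="phone">
  <div class="icon"></div>
  <p class="title">Telephone</p>
  <p>0123-4 56 78 90
  <p class="title">Telefax</p>
  <p>just the fax ma\'am
</div>';

my $dom = Mojo::DOM->new( $html );

# Map each label to the text of the sibling element that follows it.
my %value_for;
for my $title ( $dom->find('div.phone > p.title')->each ) {
    my $value = $title->next or next;    # skip a label with nothing after it
    $value_for{ trim( $title->text ) } = trim( $value->text );
}

say $value_for{Telephone};
say $value_for{Telefax};
```

With the sample input this should give the same two lines as before, and it keeps working if the Telephone and Telefax pairs swap places.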