PerlMonks  


in reply to How to parse not closed HTML tags that don't have any attributes?

Consider this Mojo::DOM example. I've made some assumptions, as your source data does not seem complete:

cat dragnet.pl

#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
use feature 'say';

my $html = '<div class="phone">
<div class="icon"></div>
<p class="title">Telephone</p>
<p>0123-4 56 78 90</p>
<p class="title">Telefax</p>
<p>just the fax ma\'am</p>
</div>';

my $dom = Mojo::DOM->new( $html );

my $phone = $dom->at('div.phone > p:nth-of-type(2)')->text;
say $phone;

my $fax = $dom->at('div.phone > p:nth-of-type(4)')->text;
say $fax;

Prints:

0123-4 56 78 90
just the fax ma'am

Let us know if you have any problems or your input data is somehow weirder.

Update: Sorry, it's late in the day on a Saturday here. Since the HTML isn't valid, and I'm guessing you can't change that, try this (trim strips the whitespace left at the end of each unclosed <p>):

#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::Util qw(trim);
use feature 'say';

my $html = '<div class="phone">
<div class="icon"></div>
<p class="title">Telephone</p>
<p>0123-4 56 78 90
<p class="title">Telefax</p>
<p>just the fax ma\'am
</div>';

my $dom = Mojo::DOM->new( $html );

my $phone = trim( $dom->at('div.phone > p:nth-of-type(2)')->text );
say $phone;

my $fax = trim( $dom->at('div.phone > p:nth-of-type(4)')->text );
say $fax;

Which still prints:

0123-4 56 78 90
just the fax ma'am
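
If the order of the paragraphs might vary, the positional nth-of-type selectors above could pick the wrong element. A minimal sketch of an alternative, assuming every value sits in the element right after its p.title label (my assumption about your data, not something your post confirms), is to look values up by label. The %contact hash is just a name I made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;
use Mojo::Util qw(trim);

# Same invalid HTML as above: the value paragraphs are never closed.
my $html = '<div class="phone">
<div class="icon"></div>
<p class="title">Telephone</p>
<p>0123-4 56 78 90
<p class="title">Telefax</p>
<p>just the fax ma\'am
</div>';

my $dom = Mojo::DOM->new($html);

# Map each label to the text of the sibling element that follows it.
# Assumption: the value is always the next element after its label.
my %contact;
for my $title ( $dom->find('div.phone > p.title')->each ) {
    my $value = $title->next or next;    # skip a label with nothing after it
    $contact{ trim( $title->text ) } = trim( $value->text );
}

say "$_: $contact{$_}" for sort keys %contact;
```

This should list both label/value pairs, and $contact{Telefax} keeps working even if another paragraph gets added before it.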