http://qs321.pair.com?node_id=274230

svsingh has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to pull the title and h1 out of a local HTML file. I figured this was too simple a job to need any of the HTML parsing modules, so I'm using a plain regex match. The HTML file is guaranteed to have only one h1.

Here's what I'd like to do ...

$/ = '</h1>';
my $chunk = <HTMFILE>;
$chunk =~ m%<title>(.+)</title>.*<h1>(.+)</h1>%i;

... which leaves $1 and $2 undefined. If I split it into two separate matches, however, everything works out just fine. Here's what's working:

$/ = '</h1>';
my $chunk = <HTMFILE>;
$chunk =~ m%<title>(.+)</title>%i;
my $title = $1;
$chunk =~ m%<h1>(.+)</h1>%i;
my $heading = $1;

The best explanation I can think of is that .* only matches up to a certain number of characters. My test file has 3750 characters between </title> and <h1>. Is that what's happening here?
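One other thing that may be worth ruling out: by default "." does not match a newline, so ".*" cannot cross a line break unless the /s modifier is used. If there are newlines among the characters between </title> and <h1>, that alone would make the combined match fail while the two separate matches still succeed. Here is a minimal sketch of that idea. The sample string is a made-up stand-in for the real file, and the non-greedy quantifiers are my own addition, not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the chunk read from the file; note the
# newlines between </title> and <h1>.
my $chunk = "<html><head><title>My Title</title>\n</head>\n<body>\n<h1>My Heading</h1>";

# Without /s, "." never matches "\n", so ".*" cannot span the line breaks
# and the match fails (an empty list in list context).
my @no_s = $chunk =~ m%<title>(.+)</title>.*<h1>(.+)</h1>%i;

# With /s, "." also matches "\n". The non-greedy quantifiers keep each
# capture as short as possible.
my ($title, $heading) = $chunk =~ m%<title>(.+?)</title>.*?<h1>(.+?)</h1>%is;

print "without /s: ", scalar(@no_s), " captures\n";
print "with /s: title='$title', heading='$heading'\n";
```

If the combined match starts working once /s is added, the newlines are the likely culprit rather than any limit on how much .* can match.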

Thanks for your help.