
Hi,

It is generally not recommended to use regex matches to parse HTML files.
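To illustrate why, here is a minimal sketch of one common failure. The HTML string and the pattern are made up for this example and are not taken from the original question: a non-greedy match stops at the first closing tag it finds, so a nested div cuts the capture short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: a div with another div nested inside it
my $html = '<div class="lastUnit"><div>inner</div> tail</div>';

# The non-greedy .*? ends at the first </div>, which belongs to the inner div
my ($got) = $html =~ m{<div class="lastUnit">(.*?)</div>}s;

print "$got\n";    # prints: <div>inner
```

The text " tail" and the real closing tag are lost. A greedy match has the opposite problem as soon as the file contains a second div after this one.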

Instead, as swkronenfeld pointed out, it's better to use the CPAN module HTML::Parser.

Below is an example of its usage.

#!/usr/bin/perl

use Modern::Perl;
use autodie;
use HTML::Parser ();

my $p = HTML::Parser->new(
        start_h => [\&start, 'tagname, attr'],
);

open my $fh, '<', shift;
$p->parse_file($fh);
$fh->close;

sub start {
        my ($tag_name, $attrs) = @_;
        # Only <div> start tags are of interest
        return unless $tag_name eq 'div';
        # Print when the class attribute starts with "lastUnit"
        say 'sample Text' if $attrs->{class}
                and $attrs->{class} =~ /^lastUnit/;
}

-Kiel

In reply to Re: How to grab a portion of file with regex by kielstirling
in thread How to grab a portion of file with regex by romy_mathew
