http://qs321.pair.com?node_id=589089


in reply to Process a HTML file to get information from it.

Depending on the surrounding HTML and how static your source is, you might be able to get by without HTML::Parser.

Perhaps you could just use a regular expression in quick-and-dirty fashion like this:

/pdf.+?>(.+?)<.+span>(\d{9})<\/span>/;
If it matches, the link text will be in $1 and the nine-digit number in $2. What have you tried?
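Here is a minimal, self-contained sketch of that regex in use. The sample markup and the sub name `extract_pdf_and_number` are made up for illustration, because the real page isn't shown in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the quick-and-dirty regex to one string and
# returns (link text, nine-digit number), or an empty list on no match.
sub extract_pdf_and_number {
    my ($html) = @_;
    if ($html =~ /pdf.+?>(.+?)<.+span>(\d{9})<\/span>/) {
        return ($1, $2);
    }
    return;
}

# Made-up sample line: a PDF link followed later by a <span> with 9 digits.
my $sample = '<td><a href="docs/report.pdf">Annual Report</a></td>'
           . '<td><span>123456789</span></td>';

my ($title, $number) = extract_pdf_and_number($sample);
print "$title => $number\n" if defined $number;   # prints: Annual Report => 123456789
```

Without the `/s` modifier, `.` does not match a newline, so this only works when the link and the span sit on the same line. If they are split across lines you would need `/s`, and then the greedy `.+` can run much further through the file than you expect, which is where a real parser starts to pay off.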

non-Perl: Andy Ford

Re^2: Process a HTML file to get information from it.
by Griffler (Novice) on Dec 11, 2006 at 17:22 UTC
    I was using the code sample from the HTML::Parser module, and it parsed out all the hrefs, but I could not figure out how to get the 9-digit number that follows. Here is the code I was using:
    use strict;
    use warnings;
    use HTML::Parser;

    # Report only <a> and <img> tags; start tags go to a_start_handler.
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [\&a_start_handler, "self,tagname,attr"],
        report_tags => [qw(a img)],
    );
    $p->parse_file(shift || die) || die $!;

    # On <a href=...>: print the href, then start collecting the link text.
    sub a_start_handler {
        my ($self, $tag, $attr) = @_;
        return unless $tag eq "a";
        return unless exists $attr->{href};
        print "A $attr->{href}\n";

        $self->handler(text  => [], '@{dtext}');   # accumulate decoded text
        $self->handler(start => \&img_handler);    # images inside the link
        $self->handler(end   => \&a_end_handler, "self,tagname");
    }

    # Inside a link, use an image's alt text (or "[IMG]") as link text.
    sub img_handler {
        my ($self, $tag, $attr) = @_;
        return unless $tag eq "img";
        push(@{$self->handler("text")}, $attr->{alt} || "[IMG]");
    }

    # On </a>: print the whitespace-normalised link text, restore handlers.
    sub a_end_handler {
        my ($self, $tag) = @_;
        my $text = join("", @{$self->handler("text")});
        $text =~ s/^\s+//;
        $text =~ s/\s+$//;
        $text =~ s/\s+/ /g;
        print "T $text\n";

        $self->handler("text", undef);
        $self->handler("start", \&a_start_handler);
        $self->handler("end", undef);
    }
    The file has a ton of other stuff in it, but what I posted is the main guts.
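One way to get at the number with HTML::Parser itself is to track whether the parser is currently inside a link or a span, and to test the span's text when it closes. This is a sketch, not a drop-in answer: it assumes HTML::Parser is installed and a hypothetical layout where each number sits alone in a `<span>` somewhere after a link whose href contains `.pdf`. The sub name `extract_links_and_numbers` and the sample markup are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns a list of [href, link text, nine-digit number].
# Assumes each number sits alone in a <span> after its <a href="....pdf"> link.
sub extract_links_and_numbers {
    my ($html) = @_;
    my @records;                     # completed triples
    my $pending;                     # PDF link still waiting for its number
    my ($in_a, $in_span) = (0, 0);   # currently inside <a> / <span>?
    my ($a_text, $span_text) = ('', '');

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'a' && defined $attr->{href} && $attr->{href} =~ /\.pdf\b/i) {
                ($in_a, $a_text) = (1, '');
                $pending = { href => $attr->{href}, text => '' };
            }
            elsif ($tag eq 'span') {
                ($in_span, $span_text) = (1, '');
            }
        }, 'tagname,attr' ],
        text_h => [ sub {
            my ($text) = @_;
            $a_text    .= $text if $in_a;      # text may arrive in pieces
            $span_text .= $text if $in_span;
        }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'a' && $in_a) {
                $in_a = 0;
                # Trim leading/trailing whitespace from the link text.
                ($pending->{text} = $a_text) =~ s/^\s+|\s+$//g if $pending;
            }
            elsif ($tag eq 'span' && $in_span) {
                $in_span = 0;
                # Keep the span only if it holds exactly nine digits.
                if ($pending && $span_text =~ /^\s*(\d{9})\s*$/) {
                    push @records, [ $pending->{href}, $pending->{text}, $1 ];
                    undef $pending;
                }
            }
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;                         # flush any buffered text
    return @records;
}

# Made-up markup for demonstration only.
my $html = '<a href="docs/report.pdf">Annual Report</a>'
         . '<span>123456789</span>';
for my $r (extract_links_and_numbers($html)) {
    print join("\t", @$r), "\n";
}
```

Keeping the state in closed-over lexicals, instead of swapping handlers in and out as the posted example does, keeps everything about "where am I in the document" in one place. To run it over a file, slurp the file into `$html` first or use `$p->parse_file` inside the sub.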