PerlMonks  

Re: Process a HTML file to get information from it.

by andyford (Curate)
on Dec 11, 2006 at 17:13 UTC ( [id://589089]=note )


in reply to Process a HTML file to get information from it.

Depending on the surrounding HTML and how static your source is, you might be able to get by without the Parser.

Perhaps you could just use a quick-and-dirty regular expression like this:

/pdf.+?>(.+?)<.+span>(\d{9})<\/span>/;
Then your data might be in $1 and $2. What have you tried?
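
As a rough illustration of how that pattern could be applied, here is a minimal sketch. The sample line and the helper name extract_title_and_id are made up for the example; the real markup was not shown here, so treat the input as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (link text, 9-digit number) on a match,
# or an empty list when the pattern does not match.
sub extract_title_and_id {
    my ($html) = @_;
    if ( $html =~ /pdf.+?>(.+?)<.+span>(\d{9})<\/span>/ ) {
        return ( $1, $2 );
    }
    return;
}

# Hypothetical sample line; the actual HTML may look different.
my $sample = '<a href="docs/report.pdf">Annual Report</a> <span>123456789</span>';

my ( $title, $id ) = extract_title_and_id($sample);
print defined $id ? "$title => $id\n" : "no match\n";
```

Since "." does not match a newline by default, this only works when the link and the span sit on the same line (or when the /s modifier is added), which is one reason the approach is fragile if the source HTML changes.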

non-Perl: Andy Ford

Replies are listed 'Best First'.
Re^2: Process a HTML file to get information from it.
by Griffler (Novice) on Dec 11, 2006 at 17:22 UTC
    I was using the code sample from the HTML::Parser module, and it parsed out all the hrefs, but I could not figure out how to get the 9-digit number that follows. Here is the code I was using:
    use HTML::Parser;

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [\&a_start_handler, "self,tagname,attr"],
        report_tags => [qw(a img)],
    );
    $p->parse_file(shift || die) || die $!;

    sub a_start_handler {
        my($self, $tag, $attr) = @_;
        return unless $tag eq "a";
        return unless exists $attr->{href};
        print "A $attr->{href}\n";

        $self->handler(text  => [], '@{dtext}' );
        $self->handler(start => \&img_handler);
        $self->handler(end   => \&a_end_handler, "self,tagname");
    }

    sub img_handler {
        my($self, $tag, $attr) = @_;
        return unless $tag eq "img";
        push(@{$self->handler("text")}, $attr->{alt} || "[IMG]");
    }

    sub a_end_handler {
        my($self, $tag) = @_;
        my $text = join("", @{$self->handler("text")});
        $text =~ s/^\s+//;
        $text =~ s/\s+$//;
        $text =~ s/\s+/ /g;
        print "T $text\n";

        $self->handler("text", undef);
        $self->handler("start", \&a_start_handler);
        $self->handler("end", undef);
    }
    The file has a ton of other stuff in it, but what I posted is the main guts.
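
    One possible way to pick up the number with HTML::Parser itself is sketched below. It assumes the 9-digit number is the text of a span element, as the regex in the parent reply suggests; the sub name span_numbers and the sample markup are hypothetical.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: collects every 9-digit number that appears as the
# text of a <span> element (assumption: that is where the number lives).
sub span_numbers {
    my ($html) = @_;
    my $in_span = 0;
    my @numbers;

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $in_span++ if $_[0] eq 'span' }, 'tagname' ],
        end_h   => [ sub { $in_span-- if $_[0] eq 'span' && $in_span }, 'tagname' ],
        text_h  => [
            sub {
                my ($text) = @_;
                # Only accept text that is exactly nine digits, ignoring whitespace.
                push @numbers, $1 if $in_span && $text =~ /^\s*(\d{9})\s*$/;
            },
            'dtext',
        ],
    );
    $p->unbroken_text(1);    # deliver each text run in one piece
    $p->parse($html);
    $p->eof;
    return @numbers;
}

# Hypothetical sample markup.
print "$_\n" for span_numbers('<a href="x.pdf">Doc</a> <span>123456789</span>');
```

    The same start, end, and text handlers could be folded into the anchor handlers above if the number needs to be paired with the href that precedes it.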
