PerlMonks  

Re: HTML::TableExtract - ugly - is there better way?

by poj (Abbot)
on Apr 09, 2017 at 14:52 UTC


in reply to HTML::TableExtract - ugly - is there better way?

If you know the characters you want, just eliminate everything else.

#!perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::UserAgent ();

my $url = 'http://www.nasdaq.com/extended-trading/premarket-mostactive.aspx';
my $headers = ['Symbol', 'Last Sale*', 'Change Net / %', 'Share Volume'];

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

my $response = $ua->get($url);
if ( !$response->is_success) {
  die $response->status_line;
}
my $htm = $response->decoded_content;

# table4
my $table_extract = HTML::TableExtract->new(
  count => 4, headers => $headers);
my $tbl = $table_extract->parse($htm);
my $data = cleanup($tbl);
report('Advances',$data);

# table5
$table_extract = HTML::TableExtract->new(
  count => 5, headers => $headers);
$tbl = $table_extract->parse($htm);
$data = cleanup($tbl);
report('Decliners',$data);

sub cleanup {
  my $table = shift;
  my @data = ();
  for my $row ($table->rows) {
    my @clean = map{
      s/[^A-Z0-9%,+-\.]/ /g; # allowable
      s/^ +| +$//g;          # trim spaces
      $_
    } @$row;
    push @data,\@clean;
  }
  return \@data;
}

sub report {
  my ($title,$data) = @_;
  print "$title\n";
  for (@$data){
    my ($stock,$openpr,$tmp,$vol) = @$_;
    my ($change,$pct) = split / +/,$tmp;
    my $closepr = $openpr - $change;
    print join "\t",($stock,'$'.$closepr,'$'.$openpr,$pct,$vol);
    print "\n";
  }
}
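To try the clean-up step without fetching the page, here is a minimal offline sketch. The sample rows and the stray characters in them (a non-breaking space, an arrow glyph, a dollar sign) are invented for illustration and are not taken from the real NASDAQ table; clean_cell and clean_rows are hypothetical helper names. The character class lists the same characters as the one above, with the hyphen escaped so it reads as a literal rather than as part of a range.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample rows standing in for what $table->rows might return.
# The junk characters are assumptions made up for this sketch.
my @rows = (
    [ "AAPL\x{a0}", ' $ 143.17 ', "0.51\x{a0}\x{25b2}\x{a0}0.36%", '1,234,567' ],
    [ "MSFT\x{a0}", ' $ 65.68 ',  "0.12\x{a0}\x{25b2}\x{a0}0.18%", '987,654'   ],
);

# Keep only the characters we expect in a cell, then trim the ends.
sub clean_cell {
    my ($cell) = @_;
    $cell = '' unless defined $cell;    # empty cells may come back undefined
    $cell =~ s/[^A-Z0-9%,+.\-]/ /g;     # anything not on the whitelist becomes a space
    $cell =~ s/^ +| +$//g;              # drop leading and trailing spaces
    return $cell;
}

# Apply clean_cell to every cell of every row; returns a reference to the cleaned rows.
sub clean_rows {
    my ($rows) = @_;
    return [ map { [ map { clean_cell($_) } @$_ ] } @$rows ];
}

my $clean = clean_rows(\@rows);
print join("\t", @$_), "\n" for @$clean;
```

Because clean_cell works on a copy of each cell, the original rows are left unchanged. Note that the whitelist only allows upper-case letters, so any lower-case text in a cell is blanked out as well.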
poj

Re^2: HTML::TableExtract - ugly - is there better way?
by rtwolfe (Initiate) on Apr 10, 2017 at 03:25 UTC
    poj, wow! I'm impressed by how quickly one gets a response here. Thank you for adding code to fetch the web page from within the script. I also need to learn to be 'smarter' about parsing: it never occurred to me to simply replace the special characters with spaces. Using the 'pipe' (|) in the regex to drop leading and trailing spaces is cool too. I hadn't run into map before, but I read a little about it today, and I still need to understand what @$row is and does. Using subroutines was helpful too. This beginner is very appreciative.
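For anyone with the same questions, here is a small sketch of the three pieces mentioned above. The data and the helper name trim_all are made up for illustration. In short: $row holds a reference to an array, and @$row dereferences it to get the array itself; map runs its block once per element with $_ set to that element and collects the block's last value; and the | in the regex is alternation, so the pattern matches spaces at the start or spaces at the end.

```perl
use strict;
use warnings;

# $row is a reference to an array (here, one table row with two cells).
my $row = [ '  AAPL ', ' 143.17' ];

# @$row dereferences it, giving the underlying array.
my @cells = @$row;           # a copy of the two elements
my $count = scalar @$row;    # number of elements in the row

# map calls the block once per element, with $_ set to the current element,
# and returns a list made of the block's last value for each element.
sub trim_all {
    my ($aref) = @_;
    return map {
        my $s = $_;              # copy, so the original element is untouched
        $s =~ s/^ +| +$//g;      # "^ +" OR " +$": leading or trailing spaces
        $s;                      # last value in the block is what map collects
    } @$aref;
}

my @trimmed = trim_all($row);
print join('|', @trimmed), "\n";
```

One difference from the code in the parent post: there the substitutions act directly on $_, which inside map is an alias for the original element, so the row itself is modified in place. Copying into $s first, as in this sketch, leaves the original data alone.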
