I'm a beginner, so be 'gentle'. I have built a program to pull data from a NASDAQ page for certain stocks. HTML::TableExtract looked like a slick way to go. Nasdaq tweaks this page occasionally, and I thought that by adjusting the headers and the count I could easily keep the script working. And yes, the current script does accomplish my goal, but I thought this was a 'teachable moment' to learn more about Perl.
I put this line in a Windows batch file to dump the page data to a file for processing.
perl -e "use LWP::Simple; getprint('http://www.nasdaq.com/extended-trading/premarket-mostactive.aspx')" >> nasdaq-stocks.txt
My biggest issue is that when I pull the HTML file data through HTML::TableExtract, there is a lot of cleanup work to get the tab-delimited format I want. To get there, I fell back to writing to files and parsing/substituting rows and lines until I got the final format.
Here's what the final (good) output looks like.
AMRI $13.53 $14.75 9.02% 6,984
AUPH $6.59 $7.07 7.28% 632,035
ATEC $2.19 $2.30 5.02% 3,880
SBLK $12.13 $12.71 4.78% 10,123
OCLR $8.95 $9.29 3.80% 147,875
FRSH $5.79 $6 3.63% 6,100
KTOS $7.88 $8.16 3.55% 6,901
INCY $135.5 $139.75 3.14% 6,734
TVIX $35.4 $36.45 2.97% 234,847
OSUR $12.3 $12.65 2.85% 4,500
Here's my code. I tried to add comments to explain what I was thinking. I am hoping there is a cleaner way to use HTML::TableExtract to get really close to the final tab-delimited file.
I assume pulling apart the characters between the open price and change percent is pretty tricky, but can't the rest of the fields get dropped directly to a tab-delimited file without the extraneous junk?
use strict;
use warnings;
use HTML::TableExtract;
#Get HTML file and set up headers for HTML::TableExtract
my $doc = 'nasdaq-stocks.txt';
my $headers = ['Symbol', 'Last Sale*', 'Change Net / %', 'Share Volume'];
#table 4 is advances. Need to do again for 5 decliners
my $table_extract = HTML::TableExtract->new(count => 4, headers => $headers);
#parse the nasdaq-stocks.txt file and print to outup-temp.txt file
#?? found this code.
#Is the code below taking HTML loaded in string $table and
#breaking into rows to print to a file???
$table_extract->parse_file($doc);
my ($table) = $table_extract->tables;
open (UPFILE, '>outup-temp.txt') or die "Cannot write 'outup-temp.txt': $!\n";
for my $row ($table->rows) {
    print UPFILE @$row, "\n";
}
close(UPFILE);
#tried to add the Substitutes below to the loop above
#but failed miserably
#.. am taking outup-temp.txt
#and load the array @lines for removing junk in the loop below
my $filename = 'outup-temp.txt' ;
open my $fh , '<' , $filename or die "Cannot read '$filename': $!\n" ;
my @lines = <$fh> ;
close $fh ;
# process the array @lines and remove some of the junk
for ( @lines ) {
    s/^\s+// ; # No need for global substitution
    s/[\x0A\x0D]{3,}/\t/g; # 3 CR LF become a tab
    #double tab-change to one tab - never got this to work??
    # s/[\x09]{2,}/\t/g;
    s/\$//g; # Substitute all dollar signs with nothing
    s/\x20/\t/g; # space becomes a tab
    # Change chars between open and change pct to tab
    s/\xC2\xA0\xE2\x96\xB2\xC2\xA0/\t/;
}
#write cleaned lines to outup-temp.txt
open $fh , '>' , $filename or die "Cannot write '$filename': $!\n" ;
print $fh @lines ;
close $fh ;
# now that we have some tab delimiters, use split to break out the
# fields and calculate the closing price, then write to file
my $stock;
my $filler1;
my $openpr;
my $change;
my $pct;
my $vol;
my $filler2;
my $closepr;
open (FILE, 'outup-temp.txt') or die "Cannot read 'outup-temp.txt': $!\n";
open STDOUT, '>', "outup.txt" or die "Cannot write 'outup.txt': $!\n";
while (<FILE>) {
    chomp;
    ($stock,$filler1,$openpr,$change,$pct,$vol,$filler2) = split("\t");
    #calculate closing price from prior day for advancers
    $closepr = $openpr-$change;
    #add back $ signs - print tab delimited fields to file
    print "$stock\t\$$closepr\t\$$openpr\t$pct\t$vol\n";
}
close(FILE);
close (STDOUT);
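For comparison, here is a minimal sketch of what cleaning each row in memory might look like, instead of going through the temp files. It rests on assumptions I have not confirmed against the live page: that each extracted row has exactly four cells (symbol, last sale, "net change, arrow, percent", volume) and that the arrow is surrounded by non-breaking spaces, as the substitution above suggests. `clean_row` is a hypothetical helper name, not part of HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one row of cell strings into a tab-delimited line.
# Assumes four cells: symbol, last sale, "change <arrow> pct", share volume.
sub clean_row {
    my @cells = map { defined $_ ? $_ : '' } @_;
    for (@cells) {
        s/\xC2?\xA0/ /g;    # non-breaking space (raw UTF-8 bytes or decoded) becomes a space
        s/^\s+|\s+$//g;     # trim leading/trailing whitespace, including CR/LF
    }
    my ($symbol, $last, $change_pct, $volume) = @cells;
    $last =~ s/\$//g;       # drop the dollar sign so the price can be used as a number

    # First token is the net change, last token is the percent;
    # the arrow in the middle is simply ignored.
    my @parts = split /\s+/, $change_pct;
    my ($change, $pct) = @parts[0, -1];

    # Prior close for an advancer; a decliner would need $last + $change.
    my $close = $last - $change;
    return join("\t", $symbol, "\$$close", "\$$last", $pct, $volume);
}

# Example with made-up cell contents shaped like the assumed layout.
print clean_row("\r\n  AMRI \r\n", '$14.75',
                "1.22\xC2\xA0\xE2\x96\xB2\xC2\xA09.02%", '6,984'), "\n";
# prints: AMRI, $13.53, $14.75, 9.02%, 6,984 separated by tabs
```

If the cell layout matches, the row loop could then print `clean_row(@$row), "\n"` straight to the output file, and the intermediate read/substitute/split passes would no longer be needed.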
Thanks in advance. I have googled many things from this website that helped get my kludgy code working.