PerlMonks  

printing to filehandles

by technofrog (Initiate)
on Sep 02, 2005 at 06:58 UTC ( [id://488562]=perlquestion )

technofrog has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have written the following code to look at a table of foreign exchange currency data on a website, extract the bid and ask prices at 5-minute intervals, and store them in some CSV files with differing timestamps. My script seems to open the filehandles fine (it creates the files that need to be written to), but the files themselves turn out blank upon inspection. I have run through this script with the debugger for hours today, and it is matching my regular expressions and everything; my print commands just seem to be broken. Any help would be much appreciated. Thanks, David.
#!/usr/bin/perl -w
use LWP::Simple;
use strict;

my $times = 1;
my $oldday = 0;
my $oldmin = 0;
my $oldmon = 0;
my $pagesource;
my @pagesource;
my $index;

open(OUT, ">/Documents/Applesauce/Scripts/cumulative.csv") || die;

while ($times != 0){
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime time;
    if ($min >= $oldmin+5 || $oldmin-$min >= 54){
        if ($mday != $oldday){
            print "Day over!\n";
            close DAY;
            open(DAY, ">/Documents/Applesauce/Scripts/$mday.csv") || die;
            $oldday = $mday;
        }
        if ($mon != $oldmon){
            print "Month over!\n";
            close MONTH;
            open(MONTH, ">/Documents/Applesauce/Scripts/$mon.csv") || die;
            $oldmon = $mon;
        }
        $pagesource = get('http://fxtrade.oanda.com') || die;
        print "Got website!\n";
        @pagesource = split(/\n/, $pagesource);
        $index=0;
        while ($index < @pagesource){
            if ($pagesource[$index] =~ /EUR\/USD/){
                print OUT "EUR\/USD," || die;
                print DAY "EUR\/USD," || die;
                print MONTH "EUR\/USD," || die;
                print OUT "$mon:$mday:$year:$hour:min," || die;
                print DAY "$mday:$hour:$min," || die;
                print MONTH "$mday:$hour:$min," || die;
                $pagesource[$index+1] =~ /color=\#666666>(.+)<\/font>/;
                print "$1\n" || die;
                print OUT "$1," || die;
                print DAY "$1," || die;
                print MONTH "$1," || die;
                $pagesource[$index+2] =~ /color=\#666666>(.+)<\/font>/;
                print "$1\n" || die;
                print OUT "$1\n" || die;
                print DAY "$1\n" || die;
                print MONTH "$1\n" || die;
            }
            $index++;
        }
        print "$hour:$min\n";
        $oldmin = $min;
    }
}

Replies are listed 'Best First'.
Re: printing to filehandles
by davido (Cardinal) on Sep 02, 2005 at 07:18 UTC

    Your code indenting is very difficult to follow. Try to use indenting-whitespace to make code more legible. perlstyle is a good guide to start with.

    As for your problem, my recommendation is to check the return values of your various print statements like this:

    print OUT "$1\n" or die "Couldn't print to OUT at line " . __LINE__ . ":\n$!\n";

    You may be surprised to see:

    Couldn't print to OUT at line 23: Bad file descriptor

    With warnings turned on, are you seeing any warnings like...

    print() on closed filehandle OUT at line 23.

    I'm just making up 'line 23'... could be anything. The point is, I think you may be attempting to print to closed filehandles.
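One detail in the check above matters: it uses `or`, while the original script uses `||`. Because `||` binds more tightly than the list operator `print`, a statement like `print OUT "x" || die` tests the string rather than the result of `print`, so the `die` can never fire. The sketch below illustrates the difference under an assumed setup: a temporary file opened read-only so that every print to it fails. The `failed_print_checks` name and the temp file are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Returns the list of operators whose check actually noticed the failed print.
sub failed_print_checks {
    my @noticed;

    # Assumed setup: an existing file opened for input only, so print fails.
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;
    open(my $fh, '<', $path) or die "can't open $path: $!";

    no warnings 'io';    # silence "Filehandle opened only for input"

    # Parsed as: print {$fh} ("x" || die ...). The string is true,
    # so die is never reached even though print fails.
    eval { print {$fh} "x" || die "print failed\n"; 1 }
        or push @noticed, '||';

    # Parsed as: (print {$fh} "x") or die ... . The return value of
    # print is what gets tested.
    eval { print {$fh} "x" or die "print failed: $!\n"; 1 }
        or push @noticed, 'or';

    close $fh;
    return @noticed;
}

print "Checks that caught the failure: ", join(', ', failed_print_checks()), "\n";
```

The same applies to `open(...) || die` only when the parentheses are omitted; with parentheses around the arguments, as in the original script, `||` works for `open`.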

    It's also possible that you're never satisfying the conditions of the if() block that actually prints to the files. Why not put in a little check with a print statement that prints "Printing to files...\n". That way at least you'll know whether or not you're entering the if block that does all the output.


    Dave

      Thanks for your quick reply! I went back through the code and attempted to redo the indentation - hope that helped. I added the "die"s to the end of the print commands, but nothing died...it seems to think that it's actually printing. (No warnings either.) Also, I have some prints to STDOUT in the print chunk of the code, inside the if() block, and those happen just fine. Could this have something to do with factors other than the code itself, such as file permissions or something? I'm running Mac OSX (Darwin). I learned perl/UNIX on a pre-configured HPUX workstation at work this summer, and I have not done any configuration to this machine except add an alias to my .bashrc file and load some perl modules. Thanks again! David
        It's quite possible that this is related to something outside of perl, hence the suggestion to check the $! error string:
        open(FH, '>', $outfile) or die "can't open $outfile: $!";
        This should tell you more about what's going on.
Re: printing to filehandles
by Roger (Parson) on Sep 02, 2005 at 10:54 UTC
    You have to flush your file handle after printing to write to the disk immediately, otherwise it gets conveniently buffered in memory.
    use FileHandle;
    ...
    open OUT, ...;
    autoflush OUT 1;
    print OUT ...;

    The alternative is to close the file after you have finished writing, and open it again next time before writing to it. Keep the time that the file is open as short as possible.
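A minimal sketch of the buffering effect described in this reply, assuming a lexical filehandle and a temporary file (both are illustration choices, as is the `bytes_visible_before_close` name and the sample row). It writes one short row and checks the file's size on disk while the handle is still open, with and without autoflush.

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush method on handles
use File::Temp qw(tempfile);

# Writes one row and returns the file size seen on disk BEFORE the close.
sub bytes_visible_before_close {
    my ($use_autoflush) = @_;

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open(my $out, '>', $path) or die "can't open $path: $!";
    $out->autoflush(1) if $use_autoflush;

    # Hypothetical sample row, 22 bytes long.
    print {$out} "EUR/USD,1.2345,1.2348\n" or die "can't write to $path: $!";

    my $size = (stat $path)[7];  # size on disk while the handle is still open
    close($out) or die "can't close $path: $!";
    return $size;
}

printf "without autoflush: %d bytes, with autoflush: %d bytes\n",
    bytes_visible_before_close(0), bytes_visible_before_close(1);
```

On a handle that stays open for hours, small writes can sit in the buffer until it fills or the handle is closed, which would match the symptom of files that exist but look empty.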

Re: printing to filehandles
by graff (Chancellor) on Sep 02, 2005 at 20:07 UTC
    It looks like you want this to run indefinitely ($times is never set to zero anywhere in the while loop), so you may want to reduce the load it places on your system (and simplify the code and the extent of indentation) by removing this:
    if ($min >= $oldmin+5 || $oldmin-$min >= 54){
    (and its matching close-bracket) and putting "sleep 60 * 5;" just before the close bracket of the while loop.

    Since you'll only be writing a small amount of data to each output file once every 5 minutes, you'll want to make sure that autoflushing is turned on for each file handle, or else (better yet) always  open( HANDLE, ">>$filename" ) or die "$filename: $!"; on each output before writing, and always close the outputs after writing, on every iteration. As for other minor details:

    You initialize "$oldmon" to zero, then you set "$mon" via "localtime time", and then you say:

    if ($mon != $oldmon){ ...
    You should read "perldoc -f localtime", and work out what your code would do if you started it up in January.
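As a hint for that exercise, here is a small sketch of what `localtime` returns for a date in January. The `month_and_year_fields` name and the chosen date are assumptions for illustration only.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Returns the raw month and year fields from localtime for a given epoch time.
sub month_and_year_fields {
    my ($epoch) = @_;
    my ($mon, $year) = (localtime $epoch)[4, 5];
    return ($mon, $year);
}

# Noon on 15 January 2005, local time (timelocal also takes a 0-based month).
my $jan_epoch = timelocal(0, 0, 12, 15, 0, 2005);
my ($mon, $year) = month_and_year_fields($jan_epoch);

printf "raw fields: mon=%d year=%d\n", $mon, $year;
printf "human-readable: %02d/%d\n", $mon + 1, $year + 1900;
```

Since the month field is zero-based, a script started in January sees `$mon == 0`, which equals the initial `$oldmon`. The `if ($mon != $oldmon)` block is then skipped, MONTH is never opened, and every print to MONTH goes to a closed filehandle. The year field is an offset from 1900, so it also needs adjusting before it goes into a timestamp.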

    Your regexes for pulling the target data out of the web page will need to be fixed when that web site decides to change its appearance or other irrelevant details. (For example, when I checked the page source for that url just now, I saw that the third cell in each table row has quotes on the 'color=' attribute value of the "td" tag, which won't get matched in your code.)
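To illustrate the point about quoted attribute values, here is a hedged sketch of a more tolerant match. The `extract_price` helper and the sample HTML lines are hypothetical; the real page markup may differ. It also returns nothing when the line does not match, because an unchecked `$1` keeps its value from the last successful match and can silently write stale data.

```perl
use strict;
use warnings;

# Captures the text of a <font color=#666666> cell, with or without quotes
# around the attribute value. Returns undef when the line does not match.
sub extract_price {
    my ($line) = @_;
    return unless defined $line;
    if ( $line =~ /color=["']?\#666666["']?>(.+?)<\/font>/ ) {
        return $1;
    }
    return;
}

# Hypothetical sample lines, not taken from the real page.
for my $html ( '<font color=#666666>1.2345</font>',
               '<font color="#666666">1.2348</font>',
               '<td>no price here</td>' ) {
    my $price = extract_price($html);
    print defined $price ? "$price\n" : "no match\n";
}
```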

    Some monks might suggest using an HTML parser, but here's a slightly modified version of your approach that might make things easier and somewhat more robust:

    my @pagesource = split( /<\/tr>/, $pagesource );
    my $found = 0;
    for ( @pagesource ) {
        if ( /(EUR\/USD).*?(\d+\.\d+).*?(\d+\.\d+)/s ) {
            # open output files here...
            print OUT "$1,$mon:$mday:$year:$hour:$min,$2,$3\n";
            print DAY "$1,$mday:$hour:$min,$2,$3\n";
            print MONTH "$1,$mday:$hour:$min,$2,$3\n";
            $found++;
            # close output files here...
            last;
        }
    }
    print "No EUR/USD data found.\n" unless ( $found );
    Note that this will not print anything from the page if the regex doesn't match, but at least you'll get a notification on STDOUT that the match is failing.
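The two structural suggestions in this reply (sleep between polls, and open in append mode, write, then close on every pass) could be sketched roughly as follows. The `append_row` and `poll` names, the `$fetch` callback standing in for the page download and regex match, and the temporary output file are all assumptions made for illustration; none of them come from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Appends one CSV row, then closes the file so the row reaches the disk at once.
sub append_row {
    my ($path, @fields) = @_;
    open(my $fh, '>>', $path) or die "can't open $path: $!";
    print {$fh} join(',', @fields), "\n" or die "can't write to $path: $!";
    close($fh) or die "can't close $path: $!";
    return;
}

# Calls $fetch once per round, appends the result, and sleeps between rounds.
sub poll {
    my ($fetch, $path, $rounds, $delay) = @_;
    for my $round ( 1 .. $rounds ) {
        my ($bid, $ask) = $fetch->();
        my ($min, $hour, $mday) = (localtime)[1, 2, 3];
        append_row($path, 'EUR/USD', "$mday:$hour:$min", $bid, $ask);
        sleep $delay if $delay && $round < $rounds;
    }
    return;
}

# Demo with a stub fetcher and no delay; a real run might use 60 * 5 seconds.
my ($tmp_fh, $demo_path) = tempfile(UNLINK => 1);
close $tmp_fh;
poll(sub { return ('1.2345', '1.2348') }, $demo_path, 2, 0);
```

Opening and closing on every pass costs almost nothing at one write per five minutes, and it means the files are complete on disk whenever someone looks at them.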
Re: printing to filehandles
by monkey_boy (Priest) on Sep 02, 2005 at 08:48 UTC
    have you run out of disk space?



    This is not a Signature...
