PerlMonks  

Re: Confirming what we already knew

by AssFace (Pilgrim)
on Mar 05, 2003 at 21:10 UTC ( [id://240717]=note )


in reply to Confirming what we already knew

Okay - the code length difference is better explained now: there were two other functions in there that were used for manipulating files after this was done executing, and the C version doesn't have those.
So to be fair, this is the code with all the extra stuff stripped out and no real comments.
I also removed the output to a file at the end. It has benchmark code in there.
I renamed things, but it should still get across the idea of what is being done.
    use Benchmark;

    $tg = 3000;
    $fip = 0.03;
    $lfv = 41;
    $ccc = 20.00;
    $pond = 100;
    @rddt = ();
    $mbvr = 0;

    sub analyzeThis {
        my ($tickerName) = @_;
        open(TICKERDATA, $tickerName) or die "Can't open TICKERDATA: $!\n";
        my @allTickerData = <TICKERDATA>;
        close(TICKERDATA) or die "Can't close TICKERDATA: $!\n";
        shift @allTickerData;#strip out the first row of descriptive text
        @rddt = ();
        #this also reorders the data from what was newest to oldest in the array to now being oldest to newest (Better for our needs)
        for(my $i = 0; $i < scalar(@allTickerData); $i++){
            my @tempArray = split(',' , $allTickerData[$i]);
            unshift @rddt , $tempArray[3];
        }
        if(scalar(@rddt) > 1200){
            my $loopCount = scalar(@rddt) - 1200;
            for(my $i = 0; $i < $loopCount; $i++){
                shift @rddt;
            }
        }
        ####################################################################
        #up to here is only run once and takes less than 1 wallclock second#
        ####################################################################
        $bahh = "";
        $bvr = 0;
        my $cond = int rand 6;
        my $oper = int rand 3;
        my $pon = int rand 2;
        my $amt = rand 20;
        $amt = sprintf("%0.2f", $amt);
        if($pon == 0){
            $amt = $amt * -1;
        }
        my $ifPos = 0;#currently will stay this "always on" state for now
        $bahh = $cond . "," . $oper . "," . $amt . "," . $ifPos;
        for(my $mg = 0; $mg < $tg; $mg++){
            my $shortLoopCount = (int rand 5) + 1;#loop at least one time, max 5 times
            for(my $aa = 0; $aa < $shortLoopCount; $aa++){
                my $modMe = int rand 7;
                if($modMe == 0){
                    $cond = int rand 6;
                }
                elsif($modMe == 1){
                    my $oper = int rand 3;
                }
                elsif($modMe == 2){
                    $pon = int rand 2;
                    $amt = rand 20;
                    $amt = sprintf("%0.2f", $amt);
                    if($pon == 0){
                        $amt = $amt * -1;
                    }
                }
                elsif($modMe == 3){
                    $ifPos = 0;#switched this to "always on"
                }
                elsif($modMe == 4){
                    my @tempArray = split(',',$bahh);
                    my $tempcond = int rand 6;
                    $cond = int ($tempArray[0]/2 + $tempcond/2);
                }
                elsif($modMe == 5){
                    my @tempArray = split(',',$bahh);
                    my $tempoper = int rand 3;
                    $oper = int ($tempArray[1]/2 + $tempoper/2);
                }
                elsif($modMe == 6){
                    my @tempArray = split(',',$bahh);
                    my $tempamt = rand 20;
                    my $tempPON = rand 2;
                    $amt = $tempArray[2]/2 + $tempamt/2;
                    $amt = sprintf("%0.2f", $amt);
                    if($tempPON == 0){
                        $amt = $amt * -1;
                    }
                }
            }
            my $car= $cond . "," . $oper . "," . $amt . "," . $ifPos;
            my $csc = 0;
            my $cpp = 0;
            my $cnp = 0;
            my $tp = 0;
            for(my $i = 64; $i < (scalar(@rddt) - (2 * $pond)); $i++){
                $cv = ccond($cond, $i);
                my $tv = mvsrt($cv, $oper, $amt);
                $checkValue = 0;
                if($tv == 0){
                    if($ifPos == 0){
                        for(my $ii = $i; $ii < ($i + $lfv + 1); $ii++){
                            $checkValue = (($rddt[$ii]) - ($rddt[$i]))/$rddt[$i];
                            if($checkValue >= $fip){
                                $cpp++;
                                last;
                            }
                        }
                    }
                    else{
                        if($rddt[$i + $lfv] < $rddt[$i]){
                            $cnp++;
                        }
                    }
                }
                else{
                    my $otherPos = 0;
                    if($ifPos == 0){
                        $otherPos = 1;
                    }
                    if($otherPos == 0){
                        for(my $ii = $i; $ii < $i + $lfv + 1; $ii++){
                            $checkValue = (($rddt[$ii]) - ($rddt[$i]))/$rddt[$i];
                            if($checkValue >= $fip){
                                $cpp++;
                                last;
                            }
                        }
                    }
                    else{
                        if($rddt[$i + $lfv] < $rddt[$i]){
                            $cnp++;
                        }
                    }
                }
                $tp++;
            }
            $csc = ($cpp + $cnp)/$tp;
            if($csc > $bvr){
                $bvr = $csc;
                $bahh = $car;
            }
        }
        $mbvr = $bvr;
        return $bahh;
    }

    sub mvsrt{
        my (@params) = @_;
        my $returnValue = 0;
        if($params[1] == 0){
            if($params[0] < $params[2]){
                $returnValue = 0;
            }
            else{
                $returnValue = 1;
            }
        }
        elsif($params[1] == 1){
            if($params[0] > $params[2]){
                $returnValue = 0;
            }
            else{
                $returnValue = 1;
            }
        }
        elsif($params[1] == 2){
            if($params[0] == $params[2]){
                $returnValue = 0;
            }
            else{
                $returnValue = 1;
            }
        }
        return $returnValue;
    }

    sub ccond{
        my (@params) = @_;
        my $returnValue = 0;
        if($params[0] == 0){
            my $tt = 0;
            my $average = 0;
            for(my $i = $params[1] - 12; $i < $params[1]; $i++){
                $tt = $tt + $rddt[$i];
            }
            $average = $tt / 12;
            $returnValue = ($rddt[$params[1]] - $average);
        }
        elsif($params[0] == 1){
            my $tt = 0;
            my $average = 0;
            for(my $i = $params[1] - 50; $i < $params[1]; $i++){
                $tt = $tt + $rddt[$i];
            }
            $average = $tt / 50;
            $returnValue = ($rddt[$params[1]] - $average);
        }
        elsif($params[0] == 2){
            my $absMin = $rddt[$params[1]-5];
            for(my $i = $params[1] - 5; $i < $params[1]; $i++){
                if($rddt[$i] < $absMin){
                    $absMin = $rddt[$i];
                }
            }
            $returnValue = ($rddt[$params[1]] - $absMin);
        }
        elsif($params[0] == 3){
            my $absMin = $rddt[$params[1]-63];
            for(my $i = $params[1] - 63; $i < $params[1]; $i++){
                if($rddt[$i] < $absMin){
                    $absMin = $rddt[$i];
                }
            }
            $returnValue = $rddt[$params[1]] - $absMin;
        }
        elsif($params[0] == 4){
            my $absMax = 0;
            for(my $i = $params[1] - 5; $i < $params[1]; $i++){
                if($rddt[$i] > $absMax){
                    $absMax = $rddt[$i];
                }
            }
            $returnValue = $rddt[$params[1]] - $absMax;
        }
        elsif($params[0] == 5){
            my $absMax = 0;
            for(my $i = $params[1] - 50; $i < $params[1]; $i++){
                if($rddt[$i] > $absMax){
                    $absMax = $rddt[$i];
                }
            }
            $returnValue = $rddt[$params[1]] - $absMax;
        }
        #no err check
        $returnValue = sprintf("%0.2f", $returnValue);
        return $returnValue;
    }

    #################
    #main()
    #################
    opendir(DATA_DIR,"data");
    my @tickers = grep { $_ ne "." and $_ ne ".." and $_ ne "returns" } readdir DATA_DIR;
    closedir(DATA_DIR);
    foreach(@tickers){
        #check if the file exists and has data >= 1200 rows before it passes the ticker into the analysis program.
        my $lines = 0;
        open(FILE, "data/$_") or die "Can't open $_: $!";
        while (sysread FILE, $buffer, 4096) {
            $lines += ($buffer =~ tr/\n//);
        }
        close FILE;
        #got the line count, now do the check
        if($lines >= 1200){
            my $ttTime0 = new Benchmark;
            my $bestA = analyzeThis("data/$_");
            my $ttTime1 = new Benchmark;
            my $ttDifference = timediff($ttTime1, $ttTime0);
            print "\n ttTime: " . timestr($ttDifference) . "\n";
            print '************************' , "\n";
        }
    }

Replies are listed 'Best First'.
Re: Re: Confirming what we already knew
by genecutl (Beadle) on Mar 06, 2003 at 00:18 UTC
    While I don't know how much of the difference between the Perl and C code this makes up, after a cursory scan I do see a number of inefficiencies in the Perl code. The first thing that jumps out at me is all the my declarations inside the loops. If you are going to be declaring these variables so many times, it's more efficient to do so outside the loops. Here's a quick benchmark:

    Benchmark::cmpthese(5000, {
        'outside' => sub {
            my $x; my $y; my @z; my $i;
            for ($i = 0; $i < 1000; $i++) { $x = $i; $y = $x; @z = ($x, $y) }
        },
        'inside' => sub {
            for (my $i=0; $i < 1000; $i++) { my $x = $i; my $y = $x; my @z = ($x, $y)}
        }
    });

              Rate  inside outside
    inside   120/s      --    -13%
    outside  138/s     15%      --

    So in this example, declaring the my variables outside of the loop gives a 15% speed up. Another problem I found was calculating the loop limiting condition in the for loop. e.g.,
    for(my $i = 64; $i < (scalar(@rddt) - (2 * $pond)); $i++){
    Since it doesn't look like the size of @rddt or the value of $pond is changing, you should do that calculation outside of the loop. Here are the benchmarks:
    @rddt = (1) x 1000;
    $pond = 32;
    Benchmark::cmpthese(500, {
        'inside' => sub {
            for ( my $i = 64 ; $i < ( scalar(@rddt) - ( 2 * $pond ) ) ; $i++ ) {
                $x++;
            }
        },
        'outside' => sub {
            $limit = scalar(@rddt) - ( 2 * $pond );
            for ( my $i = 64 ; $i < $limit ; $i++ ) {
                $x++;
            }
        },
    });

    inside   273/s      --    -39%
    outside  450/s     65%      --

    A 65% speed up here. There are probably lots of other optimizations that you could do in this perl code. Those two were the most obvious.
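
As a rough illustration of the loop-limit point above, here is a minimal sketch that computes the limit once before the loop. The sub names count_inline and count_hoisted and the array contents are made up for the example; only @rddt, $pond and the loop header come from the posted code. Hoisting is only safe on the assumption that neither @rddt nor $pond changes inside the loop.

```perl
use strict;
use warnings;

# Stand-ins for the thread's globals (contents are made up for illustration).
my @rddt = (1) x 1000;
my $pond = 100;

# Limit expression re-evaluated every time the loop condition is tested.
sub count_inline {
    my $n = 0;
    for (my $i = 64; $i < (scalar(@rddt) - (2 * $pond)); $i++) {
        $n++;
    }
    return $n;
}

# Limit computed once; valid only while @rddt and $pond stay unchanged in the loop.
sub count_hoisted {
    my $limit = scalar(@rddt) - (2 * $pond);
    my $n = 0;
    for (my $i = 64; $i < $limit; $i++) {
        $n++;
    }
    return $n;
}

print count_inline(), " ", count_hoisted(), "\n";   # both visit the same 736 indices
```

Both subs walk exactly the same indices, so the change affects only how often the limit is calculated, not the result.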
      *smacks forehead*
      I got sloppy with the "my"s for sure. Ugh. Normally I'm fairly careful about such things - but I think I slipped up here for sure.
      I went through and took out all of the "my" declarations and did put them outside of loops.

      The result of that benchmarked at 268 seconds - *but* this is on my laptop that normally does it in 305 seconds (Athlon M 1G on WinXP with Active State Perl and half a gig of RAM). So a good speed up so far.

      Then changing the calculation to be just before its for loop instead of up top.
      That ended up one second slower than the above code.
      So then moving it up above the highest loop made it go from 268 to 286.
      Then just putting in a number there and having no calculation at all in there brings it down to 264 seconds.
      Doesn't seem like the improvement that I would expect to see after looking at your benchmark... but still an improvement - largely from that stupid "my" thing that I just screwed up on.
Re: Re: Confirming what we already knew
by perrin (Chancellor) on Mar 05, 2003 at 23:47 UTC
    This is the sort of code that I would expect to be faster in C. Lots of numerical comparisons, and not much else. However, there are a couple of things that jump out at me. One is your mvsrt() routine. I think you could remove that entirely by changing this line:
    my $tv = mvsrt($cv, $oper, $amt);
    to this:
    my $tv = ( ($cv <=> $amt) == ($oper - 1) );
    (Well, that changes the actual operation each value of $oper performs, but you get the idea.)
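
For anyone who wants the inlined comparison without changing what each $oper value means, here is one possible sketch. It is not from the thread: the lookup array @WANT and the sub mvsrt_inline are hypothetical names, and the reference mvsrt below is a condensed restatement of the posted one (0 means the selected comparison held, 1 means it did not).

```perl
use strict;
use warnings;

# Condensed restatement of the posted mvsrt(): returns 0 when the comparison
# selected by $oper holds, 1 otherwise.
sub mvsrt {
    my ($value, $oper, $amt) = @_;
    if    ($oper == 0) { return $value <  $amt ? 0 : 1 }
    elsif ($oper == 1) { return $value >  $amt ? 0 : 1 }
    elsif ($oper == 2) { return $value == $amt ? 0 : 1 }
    return 0;
}

# What <=> must return for each $oper to count as a hit:
# -1 means "less than", 1 means "greater than", 0 means "equal".
my @WANT = (-1, 1, 0);

# Hypothetical replacement: same 0/1 convention as mvsrt(), no if/elsif chain.
sub mvsrt_inline {
    my ($value, $oper, $amt) = @_;
    return (($value <=> $amt) == $WANT[$oper]) ? 0 : 1;
}

# Compare the two over a small grid of values, including string inputs
# like the sprintf("%0.2f") results the posted code passes around.
my $mismatches = 0;
for my $oper (0 .. 2) {
    for my $value ("-1.50", "0.00", "2.25") {
        for my $amt ("-1.50", "0.00", "2.25") {
            $mismatches++
                if mvsrt($value, $oper, $amt) != mvsrt_inline($value, $oper, $amt);
        }
    }
}
print "mismatches: $mismatches\n";
```

In the scoring loop the body of mvsrt_inline could be written in place of the call, which is where any saving from avoiding the sub call would come from.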

    Also, you have lots of C-style for loops which could be rewritten to use foreach. For example, change this:

    my $absMax = 0;
    for(my $i = $params[1] - 5; $i < $params[1]; $i++){
        if($rddt[$i] > $absMax){
            $absMax = $rddt[$i];
        }
    }
    $returnValue = $rddt[$params[1]] - $absMax;

    to this:
    my $absMax = 0;
    # loop over array slice
    foreach my $rddt_value (@rddt[($params[1] - 5) .. $params[1]]) {
        if($rddt_value > $absMax){
            $absMax = $rddt_value;
        }
    }
    $returnValue = $rddt[$params[1]] - $absMax;

    Foreach loops do tend to run quite a bit faster, and you have many of these.
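
One detail worth checking when converting these loops: the C-style loop stops before $params[1], while a range ending in $params[1] includes it. A minimal sketch of a slice with matching bounds follows; the prices and the sub names max_cstyle and max_slice are made up for the example.

```perl
use strict;
use warnings;

# Made-up prices; index 6 plays the role of $params[1].
my @rddt = (21.50, 22.00, 25.25, 23.00, 24.00, 22.50, 26.00, 23.50);

# The C-style loop from the posted code: scans indices $idx-5 .. $idx-1.
sub max_cstyle {
    my ($idx) = @_;
    my $absMax = 0;
    for (my $i = $idx - 5; $i < $idx; $i++) {
        $absMax = $rddt[$i] if $rddt[$i] > $absMax;
    }
    return $absMax;
}

# foreach over a slice with the same bounds (note the "- 1" on the upper end).
sub max_slice {
    my ($idx) = @_;
    my $absMax = 0;
    foreach my $value (@rddt[$idx - 5 .. $idx - 1]) {
        $absMax = $value if $value > $absMax;
    }
    return $absMax;
}

print max_cstyle(6), " ", max_slice(6), "\n";
```

With this data both subs return 25.25. A slice running through index 6 itself would pick up 26.00 instead, so the upper bound matters for the result as well as for the timing.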
      I read another suggestion and have taken out the "my" declarations within the loops, and I moved the limit declaration outside of the for loops.

      That brought me down from 305 to 264 seconds per ticker. (Note that I am on a different machine than before - this is a laptop with an Athlon M 1G processor, half a gig of RAM, WinXP, and Active State Perl.)

      Next I tried your suggestion to use my $tv = ( ($cv <=> $amt) == ($oper - 1) ); instead of the sub call.
      That, with the previous changes mentioned above, resulted in a new time of 289 seconds... so slower (and from what I can tell it corrupts the algorithm decision - so I'm going to scrap that one).

      So back to the way it was prior, and then replacing the for loops with foreach like you suggest gives me a new time of 237 seconds.

      So in the end I had a drop of nearly 70 seconds from what this code did prior to the optimizations. Over 2000 stocks that would save me over a day and a half of processing... but it is still not a huge difference (in comparison to what I saw in C that is).
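
The arithmetic behind that estimate can be spelled out in a few lines. This is only a sketch using the figures quoted in this thread (305 and 237 seconds per ticker, 2000 stocks); the sub name savings is made up.

```perl
use strict;
use warnings;

# Figures quoted in the thread; the sub name is hypothetical.
sub savings {
    my ($before, $after, $stocks) = @_;
    my $total_seconds = ($before - $after) * $stocks;
    return (
        $total_seconds / 3600,                  # hours saved over the whole run
        $total_seconds / 86400,                 # the same saving in days
        100 * ($before - $after) / $before,     # percent less time per ticker
    );
}

my ($hours, $days, $percent) = savings(305, 237, 2000);
printf "%.1f hours (%.2f days), %.1f%% less time per ticker\n", $hours, $days, $percent;
```

That is 68 seconds per ticker, or roughly 37.8 hours over 2000 tickers.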

      Had correcting my mistakes brought the speed down to 20-30 seconds per stock, I would have been very impressed - but for now, I still think I will use my method of coding it in Perl (perhaps sloppily) and then seeing from there what speed improvements are needed (if any) for it to be useful.

      UPDATE:
      Now that I'm back in on the P4 2G, I ran the updated code on that and it is now at 179 secs - previously at 196.
      I guess the slight variations in the speed changes come from the random loop variations each time it is run. Also I'm not sure which version of ActiveState Perl is on my laptop compared to the one on this machine.
      For this code I've noticed that the Athlon tends to improve more easily than Intel - why that is, I don't know - perhaps cache sizes? No clue.
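
If the run-to-run variation from rand is a concern when comparing timings, one option (not something used in the thread) is to seed the generator explicitly so that each timed run draws the same sequence. A minimal sketch, with a made-up sub name and an arbitrary seed of 42:

```perl
use strict;
use warnings;

# Hypothetical helper: reseed, then draw the kind of values the posted code
# uses for $modMe. The seed value is arbitrary.
sub draw_sequence {
    my ($seed, $count) = @_;
    srand($seed);
    return map { int rand 7 } 1 .. $count;
}

my @first  = draw_sequence(42, 10);
my @second = draw_sequence(42, 10);

print "@first" eq "@second" ? "repeatable\n" : "different\n";
```

Calling srand with a fixed value once at the top of the script would make two runs on the same perl build follow the same random path, so timing differences reflect the code changes. The sequence is not guaranteed to match across different perl versions or platforms.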
        That's funny, I just did a little benchmark and got a speedup of 6% by removing mvsrt. Not a huge difference though. Maybe my fake data isn't right.

        I'm not surprised that the C code is still much faster, and that is clearly the way to go with this, but it's cool to see a 22% speedup (305 secs down to 237) just from simple syntax-level changes.

Re^2: Confirming what we already knew
by Aristotle (Chancellor) on Mar 09, 2003 at 05:51 UTC

    Tons of numerics and lots of array lookups - the kind of job that lends itself well to C. That said, I see quite a bit of room in your Perl.

    There's loads of pointless temporary variables and intermediate assignments. Why do @params = @_? Just use @_ directly, there's nothing special about it. List::Util is also likely to hugely speed up parts of your job. Your if/elsif chains are not helping either. The ccond() function f.ex should be written along these lines, using aforementioned module:

    use List::Util qw(sum min);

    my @ccond = (
        sub { $rddt[$_[0]] - sum(@rddt[$_[0]-12 .. $_[0]-1]) / 12; },
        sub { $rddt[$_[0]] - sum(@rddt[$_[0]-50 .. $_[0]-1]) / 50; },
        sub { $rddt[$_[0]] - min(@rddt[$_[0]-5 .. $_[0]-1]); },
        # ...
    );

    and the call becomes
    $cv = sprintf "%0.2f", $ccond[$cond]->($i);

    That way, rather than rippling through the entire if/elsif cascade every time, the correct code block is selected in constant time. An analogous change applies to the other function.
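
For reference, here is one way the whole table might look with all six conditions filled in. This is a sketch under assumptions, not code from the thread: the helper window, the wrapper ccond_at and the generated prices are made up, and conditions 4 and 5 pass 0 to max because the posted loops start $absMax at 0.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Made-up prices cycling 20 .. 29, just so the example runs.
my @rddt = map { 20 + ($_ % 10) } 0 .. 199;

# Hypothetical helper: the $n values immediately before index $i.
sub window {
    my ($i, $n) = @_;
    return @rddt[$i - $n .. $i - 1];
}

# One anonymous sub per condition number, in the order ccond() tests them.
my @ccond = (
    sub { $rddt[$_[0]] - sum(window($_[0], 12)) / 12 },   # 0: vs 12-day average
    sub { $rddt[$_[0]] - sum(window($_[0], 50)) / 50 },   # 1: vs 50-day average
    sub { $rddt[$_[0]] - min(window($_[0], 5)) },         # 2: vs 5-day low
    sub { $rddt[$_[0]] - min(window($_[0], 63)) },        # 3: vs 63-day low
    sub { $rddt[$_[0]] - max(0, window($_[0], 5)) },      # 4: vs 5-day high
    sub { $rddt[$_[0]] - max(0, window($_[0], 50)) },     # 5: vs 50-day high
);

# Hypothetical wrapper keeping the two-decimal formatting of the posted ccond().
sub ccond_at {
    my ($cond, $i) = @_;
    return sprintf "%0.2f", $ccond[$cond]->($i);
}

print join(" ", map { ccond_at($_, 100) } 0 .. 5), "\n";
```

The extra call to window costs a little compared with writing each slice inline, so for speed the slices could be written out as in the shorter version above.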

    Obviously, this approach will be much harder to translate into C. As you can see, properly Perlish code would also have been drastically shorter than your offering.

    Will those practices let Perl beat the C version? Not likely. However, I'm fairly confident that given a capable Perl programmer, resorting to C will only be required very rarely. (And note that the min and sum functions from List::Util I used here are written in C. So in a way, you have outsourced your C rewriting to CPAN authors - not a bad deal IMO.)

    Makeshifts last the longest.

Re: Re: Confirming what we already knew
by revdiablo (Prior) on Mar 06, 2003 at 00:03 UTC

    Now that we have your code, is there any chance we can get some sample data? :) I'd really like to play around with this, just for fun more than anything. Not trying to be pushy, just curious.

    PS: just a few lines of example data will suffice. I'd just like to see what form the data is in, then I can synthesize some on my own. It wouldn't exactly be representative of your data, but it'd be something to play with.

      in the end it is all stock data - so the @rddt array is populated with dollar amounts that range in a way that stock prices do. say 20.00 to 30.00 over some arbitrary range - 1200 trading days in this one.
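
For anyone wanting to synthesize input along those lines, here is a rough sketch. The file layout is inferred from the posted code only (one header row that gets discarded, comma-separated fields, the price read from the fourth field, newest row first); the column names, the filler values, the random-walk step and the sub name synth_rows are all made up.

```perl
use strict;
use warnings;

# Hypothetical generator: a header line plus $rows data lines whose fourth
# comma-separated field is a price wandering between 20.00 and 30.00.
sub synth_rows {
    my ($rows, $seed) = @_;
    srand($seed);
    my $price = 25.00;
    my @lines = ("day,a,b,price,c\n");        # column names are placeholders
    for my $day (reverse 1 .. $rows) {        # day numbers descend: newest first
        $price += rand(1) - 0.5;              # small random step up or down
        $price = 20 if $price < 20;
        $price = 30 if $price > 30;
        push @lines, sprintf("%d,0,0,%.2f,0\n", $day, $price);
    }
    return @lines;
}

my @lines = synth_rows(1200, 1);
print scalar(@lines), " lines, first data row: $lines[1]";
```

Printing @lines to a file under data/ would give the posted script something to read. With 1200 data rows plus the header, the file clears the script's line-count check.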
