Re: looping efficiency

Replies are listed 'Best First'.
Re^2: looping efficiency by Anonymous Monk on Dec 30, 2020 at 01:35 UTC
Thanks (to everyone) for the replies. I actually work with sequentially numbered files (so I do stringize the number). I had seen claims that formatting strings reperesented a loss of efficiency (and the particular loops in my snippet appear relatively error-resistant), but of course, everything remains relative to possible alternatives.	[reply]
Re^3: looping efficiency (Benchmark Example) by eyepopslikeamosquito (Archbishop) on Dec 30, 2020 at 02:48 UTC
Don’t Optimize Code -- Benchmark It -- from Ten Essential Development Practices by Damian Conway The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming -- Donald Knuth Without good design, good algorithms, and complete understanding of the program's operation, your carefully optimized code will amount to one of mankind's least fruitful creations -- a fast slow program. -- Michael Abrash Wow, I assumed they were complaining about clarity rather than efficiency! :) For cheap thrills, assuming all you need are the file names 0000 .. 9999 in order, I benchmarked the three offered solutions as shown below. `use strict; use warnings; use Benchmark qw(timethese); sub orig { for my $i ( 0 .. 9 ) { for my $j ( 0 .. 9 ) { for my $k ( 0 .. 9 ) { for my $l ( 0 .. 9 ) { my $filename = "$i$j$k$l"; } } } } } sub yourmum { for ( 0 .. 9999 ) { my $filename = sprintf "%04d", $_; } } sub marshall { for ('0000' ... '9999') { my $filename = $_; } } orig(); yourmum(); marshall(); timethese 50000, { Orig => sub { orig() }, YourMum => sub { yourmum() }, Marshall => sub { marshall() }, };` [download] On my machine, the results were: `Benchmark: timing 50000 iterations of Marshall, Orig, YourMum... Marshall: 25 wallclock secs (25.16 usr + 0.00 sys = 25.16 CPU) @ 19 +87.52/s (n=50000) Orig: 39 wallclock secs (38.83 usr + 0.00 sys = 38.83 CPU) @ 12 +87.73/s (n=50000) YourMum: 40 wallclock secs (40.08 usr + 0.00 sys = 40.08 CPU) @ 12 +47.57/s (n=50000)` [download] If all you need are the file names (not the individual digits), it's no surprise that Marshall's suggestion was the fastest. I also think it is the simplest and clearest if all you need are the file names. Update: see also perlperf - Perl Performance and Optimization Techniques. Added this node to Performance References.	[reply] [d/l] [select]
Re^4: looping efficiency by LanX (Saint) on Dec 30, 2020 at 03:03 UTC
None of the methods shown will have more than a negligible impact compared to creating/accessing those files. Harddisks are extremely slow compared to CPU and RAM. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re^5: looping efficiency by eyepopslikeamosquito (Archbishop) on Dec 30, 2020 at 03:36 UTC
Re^4: looping efficiency by afoken (Chancellor) on Dec 31, 2020 at 13:45 UTC
Don’t Optimize Code -- Benchmark It -- from Ten Essential Development Practices by Damian Conway The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming -- Donald Knuth No one has yet linked to Devel::NYTProf. It's a great tool to find out where your code really wastes CPU and time. Just run your script like this ... `perl -d NYTProf yourscript.pl foo bar baz` [download] ... followed by ... `nytprofhtml --open` [download] ... and look for the code that takes the longest time to run. That's where you want to start optimizing. Commit your current version to git/SVN/CVS/whatever, modify, run through NYTProf again until speed improves. Revert and retry if speed goes down. Commit, re-profile, opimize the next top problem from NYTProf. Stop when the code is sufficiently fast. Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)	[reply] [d/l] [select]
Re^5: looping efficiency (B::Terse OP codes) by eyepopslikeamosquito (Archbishop) on Dec 31, 2020 at 20:46 UTC
Re^4: looping efficiency by Bod (Parson) on Dec 30, 2020 at 12:01 UTC
For cheap thrills, assuming all you need are the file names 0000 .. 9999 in order, I benchmarked the three offered solutions as shown below. WoW!!! Isn't it amazing what random things can be learnt in the Monastery? I had no idea it is so easy to benchmark different ways of doing the same thing...although, thinking about it, it's pretty certain that someone would have thought about doing it and created a module to make it easy. That is one of the beauties of Perl - we are standing on the shoulders of giants (or at least, following the CPAN road).	[reply]
Re^4: looping efficiency by Anonymous Monk on Dec 30, 2020 at 03:51 UTC
Thanks especially for that. I really appreciated the benchmarking (and seeing how to benchmark). Thanks to Marshall, too. I have taken on all of the comments about efficiency, and agree that the exact circumstances deserve most of the weight when considering various alternatives (but I also don't want to "learn" bad techniques in the first place. :) ).	[reply]
Re^3: looping efficiency by LanX (Saint) on Dec 30, 2020 at 02:46 UTC
> I actually work with sequentially numbered files (so I do stringize the number) That's why Perl allows incrementing strings (not only numeric ones) `main::(-e:1): 0 DB<1> $_='0000' DB<2> p $_++ 0000 DB<3> p $_++ 0001 DB<4> p $_++ 0002 DB<5> $_='abcd' DB<6> p $_++ abcd DB<7> p $_++ abce DB<8> p $_++ abcf` [download] Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l]
Re^4: looping efficiency by Anonymous Monk on Dec 30, 2020 at 03:42 UTC
> That's why Perl allows incrementing strings ( not only numeric ones) I didn't know that (I obviously still fall into the neophyte category), but that appears as the best solution for this particular case. Thank you.	[reply]
Re^5: looping efficiency by Your Mother (Archbishop) on Dec 30, 2020 at 04:46 UTC
Re^6: looping efficiency by Anonymous Monk on Dec 30, 2020 at 13:30 UTC
Re^3: looping efficiency by Marshall (Canon) on Dec 30, 2020 at 05:22 UTC
I had no idea that you were going to generate 10,000 separate files! A single SQLite DB file is likely to be far, far more efficient for whatever you are doing.	[reply]
Re^4: looping efficiency by Anonymous Monk on Dec 30, 2020 at 06:36 UTC
I will have to check that out, but (superficially) I wouldn't think so. Each running of the entire script typically involves 2000-7000 files (not the entire 10,000), but the files range from 3 MB to over 5 Mb, and that would make a fairly massive database. The secondary processing of the files after writing takes advantage of the sequential numbering, and the operations involved don't really lend themselves to database lookups (and involve other external programs). I typically (in the second stage) process subsets 400 to 1000 of the files at a time just to keep the final output files (which involve combining the data in the original files) to a reasonable size.	[reply]
Re^5: looping efficiency by Marshall (Canon) on Dec 30, 2020 at 08:24 UTC
Re^6: looping efficiency by karlgoethebier (Abbot) on Dec 30, 2020 at 13:42 UTC
Some notes below your chosen depth have not been shown here
Re^6: looping efficiency by Anonymous Monk on Dec 30, 2020 at 13:13 UTC
Some notes below your chosen depth have not been shown here


Perl-Sensitive Sunglasses
	PerlMonks