declaring lexical variables in shortest scope: performance?

bliako has asked for the wisdom of the Perl Monks concerning the following question:

Re: declaring lexical variables in shortest scope: performance? by tobyink (Canon) on Mar 31, 2020 at 10:54 UTC
As well as what others have said, remember that Perl does have optimizations in place for common idioms. For example:

`my @things = ( ... ); # some list, whatever
@things = sort @things;`

You may think that `sort` gets passed `@things`, then returns the sorted list of things, and that gets assigned back to the `@things` array as a list assignment. But you'd be wrong. Perl notices that you're sorting an array and assigning it back to itself, and uses an optimized code path that doesn't involve having to build a new list and do list assignment; it does an in-place sort. Common idioms do get optimized for when possible, so there are benefits to sticking with them. With `for my $var (...) {...}`, Perl knows that `$var` won't be leaking outside the body of the loop, so it can at least potentially optimize based on that.

toby döt ink
Re: declaring lexical variables in shortest scope: performance? by haukex (Archbishop) on Mar 31, 2020 at 10:38 UTC
At least on my 5.28, the difference is negligible:

`use warnings;
use strict;
use Benchmark 'cmpthese';
cmpthese(-2, {
    predecl => sub { my $y; my $x; for $x (1,2,3) { $y+=$x } },
    lexical => sub { my $y; for my $x (1,2,3) { $y+=$x } },
});
__END__
             Rate predecl lexical
predecl 9175035/s      --     -1%
lexical 9275893/s      1%      --`

> But wakes in me primordial fears

Yes, I know the feeling well. But my philosophy has become: first, code so that it works, avoiding only the really obvious performance mistakes (like scanning an array instead of using a hash and the like). Then, if it's fast enough for your purposes, you're done. But if you want to optimize, remember that optimization is a science: measure the performance, identify the hotspots, benchmark the alternatives, modify the code accordingly, measure the difference in performance, and repeat until the performance becomes good enough for your purposes.
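To make the "scanning an array instead of using a hash" mistake mentioned above concrete, here is a minimal sketch. The data set, the key names and the two sub names (`found_by_scan`, `found_by_hash`) are made up for illustration; they are not from anyone's post, and the timings will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical data: 10_000 string keys; we look up one near the end.
my @items  = map { "item$_" } 1 .. 10_000;
my %lookup = map { $_ => 1 } @items;    # built once, reused for every lookup
my $wanted = 'item9999';

# Linear scan: walks the array until it finds a match.
sub found_by_scan {
    my ($key) = @_;
    for my $item (@items) {
        return 1 if $item eq $key;
    }
    return 0;
}

# Hash lookup: a single exists() test.
sub found_by_hash {
    my ($key) = @_;
    return exists $lookup{$key} ? 1 : 0;
}

# Check that both agree before timing them.
found_by_scan($wanted) == found_by_hash($wanted)
    or die 'Error in implementation';

cmpthese(-1, {
    scan => sub { found_by_scan($wanted) },
    hash => sub { found_by_hash($wanted) },
});
```

The point is only the shape of the comparison: verify that the alternatives return the same answer, then let Benchmark tell you how far apart they are.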
Re^2: declaring lexical variables in shortest scope: performance? by bliako (Monsignor) on Mar 31, 2020 at 11:05 UTC
Thanks for the compare script. Indeed the lexical is a bit faster (probably because of what tobyink said about internal optimisations), and choroba confirms it. But when I declare a lexical variable inside the loop, `for my $x (1,2,3) {my $z=12; $y+=$x }`, in predecl, it's 50% slower :( I'll keep in mind what you said about optimisation being a science.
Re^3: declaring lexical variables in shortest scope: performance? by choroba (Cardinal) on Mar 31, 2020 at 11:27 UTC
> it's 50% slower

Do you mean you added the `my $z=12;` to both the subroutines and the lexical one became 50% slower? I can't reproduce that behaviour. Adding it to only one of the subs slows it (35% in my case), but then we are comparing apples and oranges.

`map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`
Re^4: declaring lexical variables in shortest scope: performance? by bliako (Monsignor) on Mar 31, 2020 at 11:58 UTC
Re^5: declaring lexical variables in shortest scope: performance? by LanX (Saint) on Mar 31, 2020 at 12:00 UTC
Some notes below your chosen depth have not been shown here
Re^2: declaring lexical variables in shortest scope: performance? by Anonymous Monk on Mar 31, 2020 at 11:52 UTC
Maybe it is negligible, but that is a Benchmark pitfall: the sub-call overhead drowns out what you're measuring. Inline the sub contents as strings instead of passing sub calls.
Re^3: declaring lexical variables in shortest scope: performance? by haukex (Archbishop) on Mar 31, 2020 at 12:08 UTC
> overhead drowns out what you're measuring, in line sub contents as strings not sub calls

Can you show the code that demonstrates this?
Re: declaring lexical variables in shortest scope: performance? by choroba (Cardinal) on Mar 31, 2020 at 10:40 UTC
As usual, when you are interested in performance, benchmark or profile.

`#! /usr/bin/perl
use warnings;
use strict;
use Benchmark qw{ cmpthese };
sub outer {
    my $s = 0;
    my $x;
    for $x (1 .. 5) {
        $s += $x;
    }
    $s
}
sub inner {
    my $s = 0;
    for my $x (1 .. 5) {
        $s += $x;
    }
    $s
}
outer() == inner() or die 'Error in implementation';
cmpthese(-2, {
    outer => 'outer()',
    inner => 'inner()',
});`

The result on my machine shows inner is about 6% faster. Such a small difference is insignificant and usually has no real impact on real performance. Why is that? Remember that `for` localises its variable when the variable is not declared with `my` in the loop itself, i.e. it has to store its previous value before entering the loop and restore it at the loop's end. It seems to take a bit more time than just creating a fresh new lexical variable.

`map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`
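The save-and-restore behaviour described above is easy to see for yourself. This is a small sketch, not part of the benchmark; the variable names and the `'before'` value are arbitrary.

```perl
use strict;
use warnings;

my $x = 'before';
my @seen;

# $x is a pre-declared lexical reused as the loop variable.
for $x (1 .. 3) {
    push @seen, $x;    # inside the loop, $x is aliased to each element
}

# After the loop, $x holds the value it had before the loop started.
print "seen: @seen\n";       # seen: 1 2 3
print "after loop: $x\n";    # after loop: before
```

So the pre-declared variable does not keep the last loop value, which is one more reason `for my $x (...)` is usually the less surprising spelling.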
Re^2: declaring lexical variables in shortest scope: performance? by Anonymous Monk on Mar 31, 2020 at 11:50 UTC
Benchmark pitfall: the sub-call overhead drowns out what you're measuring. Inline the sub contents as strings instead of passing sub calls.
Re^3: declaring lexical variables in shortest scope: performance? by bliako (Monsignor) on Mar 31, 2020 at 12:09 UTC
I didn't know that! Is the logic behind replacing the sub with a string expression, to fool the cache?

`use Benchmark 'cmpthese';
cmpthese(-2, {
    predecl => ' my $y; my $x; for $x (1..10000) { $y+=$x } ',
    lexical => ' my $y; for my $x (1..10000) { $y+=$x } ',
});
__END__`

`          Rate lexical predecl
lexical 3996/s      --     -0%
predecl 4001/s      0%      --`

vs

`use Benchmark 'cmpthese';
cmpthese(-2, {
    predecl => sub { my $y; my $x; for $x (1..10000) { $y+=$x } },
    lexical => sub { my $y; for my $x (1..10000) { $y+=$x } },
});`

`          Rate predecl lexical
predecl 4011/s      --     -0%
lexical 4015/s      0%      --`

Which is more or less what haukex and choroba demonstrated.
Re: declaring lexical variables in shortest scope: performance? by LanX (Saint) on Mar 31, 2020 at 11:38 UTC
In general: Declaration happens at compile time, so its cost should be negligible. Assignment should be the same (or even faster°). The only overhead could be destruction or resetting at end of scope. Your example with a `for` loop is a bit unfortunate, because aliasing complicates things considerably. Together with various optimizations, performance gains or losses should be unpredictable, especially between different versions of Perl. So benchmark it, though I don't think it's worth the effort.

°) BTW, for many years it was common wisdom that private variables are faster; not sure if accessing package variables has been optimized in the meantime. ...

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
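For anyone unsure what "aliasing" means for a `for` loop, here is a minimal sketch (array contents and names are arbitrary): the loop variable is not a copy of each element but another name for it.

```perl
use strict;
use warnings;

my @values = (1, 2, 3);

# The loop variable is an alias for each element, not a copy,
# so assigning to it changes the array itself.
for my $v (@values) {
    $v *= 2;
}
print "@values\n";    # 2 4 6

# Looping over a copy leaves the original untouched.
my @copy = @values;
for my $v (@copy) {
    $v += 1;
}
print "@values / @copy\n";    # 2 4 6 / 3 5 7
```

This is part of why a loop variable behaves differently from an ordinary `my` variable assigned inside the block, and why comparing the two in a benchmark is not quite like for like.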
Re: declaring lexical variables in shortest scope: performance? by GrandFather (Saint) on Mar 31, 2020 at 20:51 UTC
In C/C++ the equivalent outer/inner code would perform identically. The compiler allocates space for all the local variables that might be needed on the stack on entry to the sub, usually by effectively incrementing the stack pointer. The "overhead" to create space for the local variables is typically the execution time of one processor instruction on each entry to the sub.

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: declaring lexical variables in shortest scope: performance? (on Code Optimization and Performance References) by eyepopslikeamosquito (Archbishop) on Apr 01, 2020 at 07:56 UTC
Don't diddle code to make it faster -- find a better algorithm
-- The Elements of Programming Style

Don’t Optimize Code -- Benchmark It
-- from Ten Essential Development Practices by Damian Conway

It's important to be realistic: most people don't care about program performance most of the time
-- The Computer Language Benchmarks Game

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming
-- Donald Knuth

Don’t pessimize prematurely. All other things being equal, notably code complexity and readability, certain efficient design patterns and coding idioms should just flow naturally from your fingertips and are no harder to write than the pessimized alternatives. This is not premature optimization; it is avoiding gratuitous pessimization.
-- Andrei Alexandrescu and Herb Sutter

Rule 1: Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.
Rule 2: Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)
Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
Note: Pike's rules 1 and 2 restate Tony Hoare's "Premature optimization is the root of all evil". Ken Thompson rephrased Pike's rules 3 and 4 as "When in doubt, use brute force". Rules 3 and 4 are instances of KISS. Rule 5 was stated by Fred Brooks in The Mythical Man-Month and is often shortened to "write stupid code that uses smart objects" (see also data structures vs code).
-- Rob Pike

Without good design, good algorithms, and complete understanding of the program's operation, your carefully optimized code will amount to one of mankind's least fruitful creations -- a fast slow program.
-- Michael Abrash

A couple of related general guidelines from On Coding Standards and Code Reviews:
- Correctness, simplicity and clarity come first. Avoid unnecessary cleverness. If you must rely on cleverness, encapsulate and comment it.
- Don't optimize prematurely. Benchmark before you optimize. Comment why you are optimizing.

On Interfaces and APIs cautions that library interfaces are very difficult to change once they become widely used - a fundamentally inefficient interface cannot be easily fixed later by optimizing. So it is not "premature optimization" to consider efficiency when designing public library interfaces.

See Also
- Data Structures vs Code : Some quotes from famous programmers
- Re: Threads or no Threads (Threading, Forking, Signals, Event Loop and Concurrency References)
- The 10**21 Problem (Part I) : complex problem where the running time was reduced from 50 million years to one year via a long series of optimizations
- The 10**21 Problem (Part 2)
- The 10**21 Problem (Part 3)
- The 10**21 Problem (Part 4)
- Re^2: More Betterer Game of Life : reduced running time from 1635 seconds to 17 seconds ... where tweaking the code, via a long series of micro-optimizations, reduced the running time from 1635 secs to 450 secs (3.6 times faster), while finding a better algorithm reduced it from 450 secs to 17 secs (26.5 times faster)
- Re^2: What's Perl good at or better than Python (Game of Life, LLiL, Rosetta and Performance References) : The C++ version of the simple GoL algorithm was 450/36 = 12.5 times faster than the Perl version; for the complex algorithm C++ was 17/0.08 = 212.5 times faster; C++ memory use was 2.8 times lower than Perl for the simple algorithm, 10.1 times lower for the complex one
- Re^3: Advice on learning Perl and graphics (Static vs Dynamic Typing and JIT) : static vs dynamic typing (static typing usually results in compiled code that executes faster); Java vs C++

These experiences convinced me of don't assume measure and especially find a better algorithm!

Perl Performance References
- perlperf : Perl Performance and Optimization Techniques
- perlperf - PROFILING TOOLS
- perlfaq: How do I profile my Perl programs?
- Devel::NYTProf : Fantastic Perl code profiler, can be used as a line profiler or block/subroutine profiler or both
- Performance Profiling with Devel::NYTProf : talk by Tim Bunce (youtube)
- Re: Windows Perl with sqlite3 - kcott warns of many failures when installing `Devel::NYTProf` on MSWin
- Benchmark : Perl core benchmarking module
- Memoize : Make functions faster by trading space for time
- MCE by marioroy - Many-Core Engine for Perl providing parallel processing capabilities

High Performance and Parallel Computing References
- List of performance analysis tools (wikipedia)
- Parallel computing (wikipedia)
- Parallel programming model (wikipedia)
- Concurrency (computer science) (wikipedia)
- OpenMP (wikipedia)
- OpenMP Little Book
- Threading Building Blocks (wikipedia) - aka Intel TBB
- List of C++ multi-threading libraries (wikipedia)
- VTune - Intel VTune, part of the Intel oneAPI Base Toolkit
- Intel oneAPI - a unified API used across different computing accelerator (coprocessor) architectures, including GPUs, AI accelerators and field-programmable gate arrays
- Bank Queuing Model (Chunking) by marioroy (MCE chunking "attribute" came in a flash while standing in line at the bank)
- Bank Queuing Model (Chunk ID) (The chunk_id value increments by one ... data is read serially one chunk at a time)
- MCE Sandbox 2023-08 by marioroy - deliberately designed for necroposts (added in replies as he learns new things)
- [OT] The Long List is Long resurrected by marioroy (2024) (mentions Taichi Lang, a DSL embedded in Python)
- High Performance Parallel Runtimes: Design and Implementation (Book)
- Processing a File with OpenMP by Jim Cownie (author of High Performance Parallel Runtimes book) -- see comments there from marioroy :)
- Grep RE2 C++ OpenMP demonstration by marioroy using RE2
- google RE2 Regex Library - uses Abseil, a lot faster than `std::regex`
- parallel runtimes (github)
- High Performance and Parallel Computing (Illinois Tech)

Extra Performance/Optimization References Added Later
- Bitwise Operations: Most Significant Set Bit by coldr3ality (2024) - good replies from hippo, hv, davido ...
- Re: Most Significant Set Bit (Bit Twiddling References)
- Benchmark: Fastest way to lookup a point in a set : example using Benchmark to compare different ways of looking up a point in a set
- Re: Confused by RegEx count by choroba (2024) : example Benchmarking transliteration `tr///` vs substitution `s///`
- Re^3: looping efficiency (Benchmark Example) : example using Benchmark (replies mention Devel::NYTProf and `perl -MO=Terse` using B::Terse)

Other:
- Re^4: Fastest way to lookup a point in a set : BrowserUk use case where Perl hashes were an order of magnitude faster than an SQLite memory-based database
- Tie::Hash::DBD - CPAN module to tie a plain hash to a database table, by Tux
- Re: need help with judy array searching (Judy Array References) : references on using Judy arrays in Perl
- Memory efficient way to deal with really large arrays? by sectokia (2020, similar to my Fastest way to lookup a point in a set)
- Small Hash a Gateway to Large Hash? by lsherwood (2014, is building a small hash based on results of accessing a large hash likely to help speed up my script?)
- write hash to disk after memory limit by hailholyghost (2015, 8 GB physical memory, script uses 17 GB, wants to free memory to OS to minimize swapping)
- Big cache by Liebranca (2022, seeks general advice on speeding up local databases on local computer ... to avoid reading a million small files on startup)
- Re: Data structures in Perl. A C programmer's perspective. (vector vs linked list performance) (C++ vector vs linked list - the penalty for cache misses tends to dominate CPU usage time with modern CPUs)
- Re^3: [OT:] Is this Curriculum right? (more on C++ vector vs linked list)
- How to do popcount (aka Hamming weight) in Perl (popcount References) by me (2017)
- Re^6: Hash versus chain of elsifs by me (2021)
- Re: Perl 5 Optimizing Compiler by chromatic (2012)
- making a loop script with a remote URL call faster by brandonm78 (2022) (plus necropost reply by marioroy)
- Trading compile time for faster runtime? by melez (2022)
- Re: Trading compile time for faster runtime? by dave_the_m (2022)
- Optimization tips by sroux (2022) (asks for optimization tips to speed up some awful old code without having to rewrite it)
- Need to speed up many regex substitutions and somehow make them a here-doc list by xnous (2022) (performance of bulk regex substitutions in bash/sed vs perl)
- Perl regex speed by malaigo (2022) (uni prof asks: why isn't latest Apple Silicon implementation faster than Intel one on Perl regex? (assuming it is arm64 and not emulated x86_64))
- Windows Perl with sqlite3 by miner7777 (2023) (root cause was a corrupted Database file)
- does a hash element as a loop parameter cause significant slowdown? by misterperl (2023) (ChatGPT makes a dubious performance suggestion to speed up `for my $c (1..$n1)`)
- Optimizing with Caching vs. Parallelizing (MCE::Map) by nickt (2020) (with MCE-related replies from marioroy)
- MCE Sandbox 2023-08 by marioroy (2023) (writing fast code using: Perl MCE + Inline::C, Math::Prime::Util, C/C++ libprimesieve, Codon)
- Perl slower than java by Christian888 (2010)
- Does "preallocating hash improve performance"? Or "using a hash slice"? by vr (2017)
- HPC Computing Question by doubleqq (2014)
- Anyway to Have Strong-Like Typing by jmmitc06 (2014)
- what's faster than .= by xafwodahs (2003) - with responses from TimToady
- From BrowserUk (2012-2015):
  - Bidirectional lookup algorithm? (Updated: further info.)
  - [OT] The statistics of hashing.
  - Heap structure for lookup?
  - Re^6: Heap structure for lookup?
- How to find out, why my perl code is slow. by anonymonk (2018)
- Does perl read the entire pl file into memory? by harangzsolt33 (2019)
- Two spookily similar nodes posted late 2021 (both requesting XS C code, both by new monks who won't show us their code):
  - Can someone please write a working JSON module by cnd
  - Anyone with XS experience willing to create a high performance data type for Perl? by beautyfulman (unfortunately this node was later vandalised by beautyfulman who left in a huff)

Some old classics:
- How can I speed up my Perl program? (SO 2008)
- Nicholas Clark classic talk: When perl is not quite fast enough (has the PDF of his talk slides vanished from the web?)
- Compile perl for performance by learnedbyerror (2018) - replies could not find Nicholas Clark talk slides either :-(
- Optimize Perl by anon (2004)
- Optimizing existing Perl code (in practise) by JaWi (2002)
- STOP Trading Memory for Speed by PetaMem (2002)
- Wasting time thinking about wasted time by brian_d_foy (2004)

Other:
- can we call c++ in perl to process PDL arrays? by toothedsword (2019)
- Performant Path Iteration by learnedbyerror (2018)

On CPAN:
- PerlBench by brian_d_foy
- Dumbbench by brian_d_foy

Some external references:
- Why not Translate Perl to C? by MJD
- booking.com blog about making perlguts faster by Eric Herman

Mathematical:
- Computing pi to multiple precision by ambrus (2012) - see also much later reply by marioroy (2022)

Memory:
- Re: Perl Memory problem ... (Memory Tools References)

Sorting:
- Re^5: Create sort function from a text file (Sorting References: Schwartzian, GRT, Orcish, External, Parallel)

Multi-threading:
- Anything by marioroy
- Rosetta Code: Long List is Long (long thread: check especially the many replies from marioroy)
- Re: Rosetta Test: Long List is Long - Abseil
- Re: Rosetta Code: Long List is Long - JudySL code by marioroy (2023) - JudySL array implementation
- Solving the Long List is Long challenge, finally? by marioroy (2023) - Perl versions in replies use Sort::Packed, Tie::Hash::DBD, DB_File, IPC::MMA, Crypt::xxHash, Tokyo Cabinet, Kyoto Cabinet, Tkrzw sharding, Tkrzw llil4tkh, Tkrzw llil4tkh2

Compiler switches/flags:
- Re: Windows precompiled binaries or DIY compile by NERDVANA (2023) - gcc compiler flags for building fast Perl, e.g. `CFLAGS="-O2 -march=native -mtune=native"`

I/O:
- The 10**21 Problem (Part 3)
- Risque Romantic Rosetta Roman Race

PDL and Array Processing References
See: Re^2: Organizational Culture (Part II): Meta Process (BioPerl/PDL/AI/Embedded/Data Science References)

RPerl References
- rperl.org
- RPerl (Facebook)
- RPerl
- About using rperl
- How to compile RPerl successfully?
- Perl 5 Optimizing Compiler
- Trading compile time for faster runtime?
- rurban
- Will_the_Chill

See Also
- Re: Threads or no Threads (Threading, Forking, Signals, Event Loop and Concurrency References)

Updated: Added Donald Knuth premature optimization quote and Alexandrescu/Sutter premature pessimization quotes and Rob Pike quotes. Mentioned efficiency of interfaces. Added more references.
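As a small illustration of "find a better algorithm" using one of the core modules listed above, here is a sketch with Memoize. The `fib` sub is just a stock example chosen for this note, not something from the referenced nodes; the only Memoize call used is `memoize('name')`.

```perl
use strict;
use warnings;
use Memoize;

# Naive recursive Fibonacci: an exponential number of calls without caching.
sub fib {
    my ($n) = @_;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

# Replace fib() with a caching wrapper; the recursive calls hit the cache too.
memoize('fib');

print fib(50), "\n";    # 12586269025
```

Without the `memoize` line, `fib(50)` would take far too long to be practical; with it, each value is computed once. No amount of shuffling `my` declarations gets you that kind of change.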
Re^2: declaring lexical variables in shortest scope: performance? by bliako (Monsignor) on Apr 01, 2020 at 08:13 UTC
I'm fine with all these, but if there is a long loop, then every little flop counts.
Re: declaring lexical variables in shortest scope: performance? by vr (Curate) on Mar 31, 2020 at 17:53 UTC
Isn't it the case that a lexical loop iterator doesn't create a "lexical pad"? If a block, which creates a scope, can be written so that it doesn't have to create a "pad", then it's surely faster, no?

`use strict;
use warnings;
use Benchmark 'cmpthese';
cmpthese -2, {
    1 => sub {
        my ( $s, $x, $y ) = 0;
        # for ( 1 .. 1e6 ) {    # doesn't matter, if written so
        for my $i ( 1 .. 1e6 ) {
            $x = rand;          # suppose intermediate
            $y = rand;          # variables are required
            $s += $x + $y;
        }
        $s
    },
    2 => sub {
        my ( $s, $i ) = 0;
        # for $i ( 1 .. 1e6 ) { # doesn't matter, neither
        for ( 1 .. 1e6 ) {
            my $x = rand;
            my $y = rand;
            $s += $x + $y;
        }
        $s
    },
};
__END__
    Rate    2    1
2 6.69/s   -- -42%
1 11.6/s  74%   --`
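One caveat worth keeping in mind before hoisting `my` out of a loop for speed: it is not always a pure performance change. The sketch below (names and values are arbitrary, and it is an addition for illustration rather than part of the benchmark above) shows the two placements behaving differently once closures are involved.

```perl
use strict;
use warnings;

# Variable declared inside the loop: each iteration gets a fresh $v,
# so each closure captures its own value.
my @inner_subs;
for my $i (1 .. 3) {
    my $v = $i * 10;
    push @inner_subs, sub { $v };
}

# Variable hoisted out of the loop: all closures share the one $v
# and see whatever was assigned last.
my @outer_subs;
my $v;
for my $i (1 .. 3) {
    $v = $i * 10;
    push @outer_subs, sub { $v };
}

my @inner = map { $_->() } @inner_subs;
my @outer = map { $_->() } @outer_subs;
print "inner: @inner\n";    # inner: 10 20 30
print "outer: @outer\n";    # outer: 30 30 30
```

If nothing in the loop body takes a reference to the variable or closes over it, the two forms compute the same result and only the timing differs.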
Re: declaring lexical variables in shortest scope: performance? by dsheroh (Monsignor) on Apr 01, 2020 at 07:46 UTC
Perl is not, never has been, and almost certainly never will be a "high performance" language. If you're really that concerned about saving every flop, your first move should not be to question where to put `my`, your first move should be to switch to a different language. As long as you avoid the big mistakes like using inefficient algorithms, optimizing for correct operation and readability is generally going to be more than sufficient. Micro-optimizing things like the placement of variable declarations is almost never worth the effort, regardless of language, because the time you spend doing the optimization will generally be many, many orders of magnitude larger than the handful of microseconds that will actually be saved by the code speedup.


	PerlMonks