in reply to Syntax Perl Version support $c = () = $a =~ /\./g
$c = () = $a =~ /\./g
That counts the number of '.' characters in the string $a.
A more efficient way to do that is:
$c = $a =~ tr/.//
But note that while the first version will work with strings of any length, the second version will only work with characters.
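A minimal sketch of the two counting idioms side by side. The sample string is an assumption chosen for illustration; it is not from the original question.

```perl
use strict;
use warnings;

my $str = '192.168.0.1';   # sample data, assumed for illustration

# A list assignment in scalar context returns the number of elements
# on its right-hand side, here the number of matches of /\./g.
my $by_match = () = $str =~ /\./g;

# tr/// with an empty replacement list and no modifiers leaves the
# string unchanged and returns the number of characters it matched.
my $by_tr = $str =~ tr/.//;

print "match: $by_match, tr: $by_tr\n";   # prints "match: 3, tr: 3"
```

Both forms leave $str untouched; they differ only in how the count is obtained.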
Re^2: Syntax Perl Version support $c = () = $a =~ /\./g
by eyepopslikeamosquito (Archbishop) on Jul 17, 2018 at 22:40 UTC
|
Re^2: Syntax Perl Version support $c = () = $a =~ /\./g
by h2 (Beadle) on Jul 17, 2018 at 22:02 UTC
|
jwkrahn, I had to test this, and lo, as you said, tr/.// is about 80% or so faster. I mean, of course, I'd have to run it thousands of times, but there it is, much faster. Thanks for that tip. I'm not clear on what you meant by this:
But note that while the first version will work with strings of any length the second version will only work with characters.
The data is a string of varying lengths, and usually is a decimal number.
|
tr/// (sometimes spelled y///, especially in code golf) works on characters specifically. m// (sometimes spelled //) works on regular expression matches, which may concern one or more characters (or in special cases zero, such as split //, $foo;).
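The distinction can be sketched roughly as follows. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'a..b...c..';   # hypothetical sample string

# tr/// counts characters from a set; ranges are allowed, patterns are not.
my $dots    = $text =~ tr/.//;     # every single '.' character
my $letters = $text =~ tr/a-z//;   # every lowercase ASCII letter
my $same    = $text =~ y/.//;      # y/// is a synonym for tr///

# m//g matches patterns, which may span several characters.
my $pairs = () = $text =~ /\.\./g; # non-overlapping '..' pairs

# An empty pattern matches between characters, so split // yields
# one element per character.
my @chars = split //, $text;

print "$dots $letters $same $pairs ", scalar(@chars), "\n";   # prints "7 3 7 3 10"
```

Counting a multi-character sequence such as '..' is something tr/// cannot express, which is the limitation the original note was pointing at.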
Perl takes text very seriously. There is a load to know about processing text in Perl, but the basics are pretty quick to grasp. The full story is not complete without at least these manual pages, although for this specific topic the first few should suffice.
You might hope you never need to read perlebcdic, but there's that too.
|
tr/// (sometimes spelled y///
Hm, I have been using perl for something like 20 years and even though I have never (as far as I can remember) used tr at all I always knew it existed and what it does.
"tr" must be short for "translate", but today I learn there is even another version of it - you can trade "tr" for "y" - well who would have guessed.
As I am (a bit) interested in perl-arcana:
Does anyone know where this "y" comes from?
Is it a reference to another language whose users Larry wanted to feel at home?
Can anyone please enlighten me?
Many thanks!
|
4x faster now!
by h2 (Beadle) on Jul 18, 2018 at 19:04 UTC
|
While testing this again with tr/ in mind, I decided to see how much faster it would be if I got rid of all the regex in the tests, and replaced them with tr/ with the numeric values and count, and ended up with a 4x (!) speed improvement over the pure regex sequence of tests.
I also discovered that there is no obvious difference between
(my $c = $_[0] =~ tr/.// ) <= 1
## and
($_[0] =~ tr/.// ) <= 1
which I assume means I stumbled onto another secret operator (the wrapping parentheses). That suggests to me that I should study these more to become more aware of this area of Perl.
|
c:\@Work\Perl\monks>perl -wMstrict -le
"sub not_too_many_dots { return $_[0] =~ tr/.// <= 1; }
;;
for my $s ('', qw(. .. ... ....)) {
printf qq{'$s' %stoo many dots \n},
not_too_many_dots($s) ? 'NOT ' : ''
;
}
"
'' NOT too many dots
'.' NOT too many dots
'..' too many dots
'...' too many dots
'....' too many dots
See perlop (update: in particular Operator Precedence and Associativity). Of course, parenthetic grouping disambiguation never hurts, and many recommend it as a general BP to support readability/maintainability.
Update 1: I suspect the speedup you're seeing is due to operating directly upon an element of the aliased @_ subroutine argument array rather than burning the computrons needed to create lexical variables. See perlsub. (This would be in addition to using tr/// rather than m// for counting individual characters.)
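A small sketch of the @_ aliasing described in Update 1. The sub names are hypothetical, and the sketch only shows the aliasing behaviour documented in perlsub; it does not measure whether aliasing accounts for the speedup.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Works directly on $_[0], which is an alias for the caller's variable;
# no copy of the argument is made.
sub count_dots_aliased { return $_[0] =~ tr/.//; }

# Copies the argument into a lexical variable first, then counts.
sub count_dots_copied { my ($s) = @_; return $s =~ tr/.//; }

# Because @_ aliases the arguments, modifying $_[0] changes the
# caller's variable (the /d modifier deletes the matched characters).
sub strip_dots { $_[0] =~ tr/.//d; return; }

my $value = '1.2.3';
print count_dots_aliased($value), ' ', count_dots_copied($value), "\n";   # prints "2 2"

strip_dots($value);
print "$value\n";   # prints "123"
```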
Update 2: If you want to know what Perl thinks about the precedence and associativity of the operators you're using, use the O and B::Deparse modules. The -p flag produces full, explicit parenthetic grouping. (The useless assignments just produce some more grouping examples.)
c:\@Work\Perl\monks>perl -wMstrict -MO=Deparse,-p -le
"sub not_too_many_dots { return $_[0] =~ tr/.// <= 1; }
;;
for my $s ('', qw(. .. ... ....)) {
my $g = my $f = not_too_many_dots($s);
printf qq{'$s' %stoo many dots \n}, $f ? 'NOT ' : '';
print $g;
}
"
BEGIN { $^W = 1; }
BEGIN { $/ = "\n"; $\ = "\n"; }
sub not_too_many_dots {
use strict 'refs';
return((($_[0] =~ tr/.//) <= 1));
}
use strict 'refs';
foreach my($s) (('', ('.', '..', '...', '....'))) {
(my $g = (my $f = not_too_many_dots($s)));
printf("'${s}' %stoo many dots \n", ($f ? 'NOT ' : ''));
print($g);
}
-e syntax OK
Give a man a fish: <%-{-{-{-<
|
AnomalousMonk, re the speed up, I test using a loop, and to make it apples to apples, one version uses only regex to determine the numeric quality, and the other uses only tr/ + length.
At 10k iterations, for a full valid number, the regex-only version takes about 0.02 seconds, and the tr + length version takes about 0.008 seconds. So I have to take back the 4x faster for tr + length; it's about 2.5x faster, because I had to add a few more conditions to handle the length tests that failed before Perl 5.012. Neither version assigns any values to a variable. Since most things being tested are valid, this speedup is significant, albeit trivial in the larger sense.
That's for a full successful match; the times are of course lower if it fails, though the order of the sequence of tests matters. I have a bunch of variants of the tests below, but these are roughly apples to apples. Note that if I could have used only length($_[0]) instead of defined and length, that would knock a noticeable percentage off the total, but as I learned, that test only became possible in Perl 5.012.
return 1 if (defined $_[0] && $_[0] =~ /^[\d\.]+$/ && $_[0] =~ /\d/ && ( () = $_[0] =~ /\./g ) <= 1);
return 1 if ( defined $_[0] && length($_[0]) == ($_[0] =~ tr/0123456789.//) && ( $_[0] =~ tr/0123456789//) >= 1 && ($_[0] =~ tr/.//) <= 1);
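One way to compare the two tests is with the core Benchmark module, sketched below. The sub names and the sample value are assumptions, not the code that was actually timed, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical wrappers around the two tests quoted above.
sub is_numeric_regex {
    return 1 if defined $_[0] && $_[0] =~ /^[\d\.]+$/ && $_[0] =~ /\d/
        && ( () = $_[0] =~ /\./g ) <= 1;
    return 0;
}

sub is_numeric_tr {
    return 1 if defined $_[0]
        && length($_[0]) == ($_[0] =~ tr/0123456789.//)
        && ($_[0] =~ tr/0123456789//) >= 1
        && ($_[0] =~ tr/.//) <= 1;
    return 0;
}

my $sample = '1234.5678';   # assumed stand-in for a "full valid number"

# A negative count asks Benchmark to run each sub for at least that
# many CPU seconds, then print a comparison table.
cmpthese(-1, {
    'regex'     => sub { is_numeric_regex($sample) },
    'tr+length' => sub { is_numeric_tr($sample) },
});
```

Passing the value through to $_[0] without copying it keeps the wrappers close to the original one-liners.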
Generally the things I spend time testing and optimizing are core methods and tools that might actually knock milliseconds off execution time, which is something not super visible on new hardware, but quite noticeable on very old systems, low powered ARM devices, and so on. But I also like finding ways that are simply faster and more efficient in general, since the little things add up.
In both cases, however, the tests are much better than what I was using, since that single regex (/^[\d\.]+$/) allowed non-numeric strings like ..4.3. So both are improvements, but the tr version is roughly as fast as the original single but inaccurate regex, and is accurate. Thanks for the pointer to tr/; I would not have thought that it returned how many things it had found.
It was not so much the precedence that I discovered as the fact that these tests also return a count: for tr, how many characters were matched or replaced; for the list assignment, how many matches were returned to fill the list. That was something I was only faintly aware of as something Perl does. I had used this feature without thinking much about it in things like if ($a = returns_something_or_nothing())... but I hadn't thought of it as a general principle that can be used for other things.
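A short sketch of that general principle: an assignment is itself an expression with a value. The helper body and the sample value are invented for illustration.

```perl
use strict;
use warnings;

my $input = '3.14';   # sample value, assumed for illustration

# Hypothetical helper: returns a true value or undef.
sub returns_something_or_nothing { return $_[0] =~ /\./ ? 'decimal' : undef }

# A scalar assignment returns the assigned value, so it can be tested.
if (my $kind = returns_something_or_nothing($input)) {
    print "kind: $kind\n";   # prints "kind: decimal"
}

# A list assignment in scalar context returns the number of elements
# on its right-hand side; here, the two captures of a successful match.
if ((my ($int, $frac) = $input =~ /^(\d+)\.(\d+)$/) == 2) {
    print "int: $int, frac: $frac\n";   # prints "int: 3, frac: 14"
}
```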
|
|
G'day h2,
"... and ended up with a 4x (!) speed improvement ..."
You might like to take a look at
"perlperf - Perl Performance and Optimization Techniques"
which discusses this, amongst other things, and includes benchmarks.
While y/// and s/// are not always interchangeable,
when they can provide the same functionality,
I've generally found y/// to be measurably faster than s///.
|
I was lucky and discovered NYTProf quite early, so along with loop testing, I was able to get rid of major bottlenecks during development, which was and is kind of excellent. There are some optimization tricks in that perlperf page I was not aware of, so those should help too. I've been using microtimers in loops, which achieve the same result, but I'll check out some of the other optimization tools, thanks. As noted above, sadly, once I made both test versions fully apples to apples, the improvement turned out to be roughly 'only' 2.5x for tr and length vs regex. But equally obviously, anything that results in that big a difference is worth understanding better, since usually you hope for 5 or 10% improvements, not 250%.