That's not a working regex. The best ones work something like this:
$number=1234567; # with commas, should be "1,234,567"
$number =~ s/(\d)(?=(\d{3})+(\D|$))/$1\,/g;
Notice the missing inner repetition in the previous post? And I'm doing this with lookahead, so I can
scan from left to right. The "1 while ..." solutions scan effectively from right
to left, so they may be slower. Actually, I'd be interested in the various benchmarks
on these. {grin}
-- Randal L. Schwartz, Perl hacker | [reply] [d/l] |
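To unpack the lookahead: the substitution matches one digit at a time and inserts a comma only when what follows is a whole number of three-digit groups ending at a non-digit or at the end of the string. Here is a minimal sketch of the same pattern spread out under /x. The sub name commify_lookahead is made up for illustration, and the inner groups are written as non-capturing, which the one-liner above does not do.

```perl
use strict;
use warnings;

# commify_lookahead is a hypothetical wrapper name, not from the post.
sub commify_lookahead {
    my ($text) = @_;
    $text =~ s/
        (\d)              # one digit (capture group 1)
        (?=               # followed by, without consuming:
            (?:\d{3})+    #   one or more full groups of three digits
            (?:\D|$)      #   then a non-digit or the end of the string
        )
    /$1,/gx;
    return $text;
}

print commify_lookahead("1234567"), "\n";              # 1,234,567
print commify_lookahead("cost 1234 and 56789"), "\n";  # cost 1,234 and 56,789
```

Because nothing is consumed by the lookahead, /g can keep walking left to right and find every insertion point in a single pass.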
great regex! i had forgotten about the look-ahead assertion too :(
my function that does number beautifying has no less than 8 lines :((
still.. there's a little problem when handling floats:
the digits after the dot shouldn't be 'beautified' :(
.. i tried to enhance your line a little, but the problem still remains because the look-behind has to be fixed-width:
$number =~ s/(?<!\.\d)(\d)(?=(\d{3})+(\D|$))/$1,/g;
(this example works for numbers with 5 digits after the dot... variations may be made by changing the number of \d in the look-behind assertion)
couldn't think of anything better for now.. maybe you have another bright idea for this too o=)
so, after inserting your wizcraft, my lame tool looks something like this:
$number =~ s/(\d+)(\.\d+)?/bea_int($1).(defined $2 ? $2 : '')/eg;
sub bea_int {
my $kk = $_[0];
$kk =~ s/(\d)(?=(\d{3})+(\D|$))/$1\,/g;
return $kk;
}
--
AltBlue... w8ing 4 a better solution o=Q | [reply] [d/l] [select] |
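To make the fixed-width limitation concrete, here is a small sketch that runs the look-behind variant from the post against 4, 5 and 6 decimal places. The helper name commify_lookbehind is an assumption for illustration. The substitution itself is copied from the post.

```perl
use strict;
use warnings;

# The substitution from the post, wrapped in a hypothetical helper.
sub commify_lookbehind {
    my ($number) = @_;
    $number =~ s/(?<!\.\d)(\d)(?=(\d{3})+(\D|$))/$1,/g;
    return $number;
}

print commify_lookbehind("1234567.12345"), "\n";   # 1,234,567.12345   (5 decimals: fine)
print commify_lookbehind("1234567.1234"), "\n";    # 1,234,567.1,234   (4 decimals: comma leaks in)
print commify_lookbehind("1234567.123456"), "\n";  # 1,234,567.123,456 (6 decimals: comma leaks in)
```

The look-behind only inspects the two characters just before the candidate digit, so it can only rule out an insertion point at one particular distance from the dot.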
The easy way to deal with open-ended floating-point numbers is to simply remove the fractional part until the commas are added, then reattach it.
# detach the fractional part, if there is one (empty string otherwise)
$float = $integer =~ s/(\.\d+)$// ? $1 : '';
# do stuff
$integer .= $float;
| [reply] [d/l] |
This regex continues adding commas after a decimal point if there are at least 4 decimal places. It also does repeated lookahead checks for multiples of 3 digits. I came up with a better regex using a single lookahead for the multiple of 3 digits and using \G to stay in sync thereafter, and not continuing past a decimal point. Here's what I ended up with:
$number =~ s/(^\D*\d{1,3}(?=(?:\d{3})+(?!\d))|\G\d{3}(?=\d))/$1,/g;
For example, "12345678.9012345" becomes "12,345,678.9012345" instead of "12,345,678.9,012,345". Of course, after I came up with this, I saw the current perlfaq5 and saw an extremely similar example there from Benjamin Goldberg:
s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
Oh well, I guess it wasn't a new idea after all! His version adds the (?>...) non-backtracking construct, which is definitely an improvement over mine. He only matched a sign at the beginning, which is reasonable. I had [+-]? at one point, but it occurred to me that a monetary value might already have a dollar sign prefixed to the number, so I used \D* instead. I think it's a wash to use \d+? instead of \d{1,3} as I had -- it prefers checking for 1 digit first instead of 3 digits, but there's no way to know which of the 3 will match. If I changed mine to \d{1,3}? then it would be functionally equivalent. | [reply] [d/l] [select] |
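A short sketch comparing the two behaviours described above. The sub names are hypothetical. The \G pattern is written with the (?!\d) nested inside the positive lookahead, the same shape as the perlfaq5 version.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two approaches discussed above.
sub commify_every_run {          # lookahead-only version from earlier in the thread
    my ($s) = @_;
    $s =~ s/(\d)(?=(\d{3})+(\D|$))/$1,/g;
    return $s;
}

sub commify_leading_number {     # \G version: one lookahead, then stay in sync
    my ($s) = @_;
    $s =~ s/(^\D*\d{1,3}(?=(?:\d{3})+(?!\d))|\G\d{3}(?=\d))/$1,/g;
    return $s;
}

print commify_every_run("12345678.9012345"), "\n";       # 12,345,678.9,012,345
print commify_leading_number("12345678.9012345"), "\n";  # 12,345,678.9012345
print commify_leading_number('$1234567.891234'), "\n";   # $1,234,567.891234
```

Once the first alternative has found the leading group, \G only allows further matches directly after the previous one, so the substitution stops at the decimal point.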
Thanks for the insight. I'm ignorant of the secrets of lookahead,
lookbehind, inner repetitions, and all the other associated
voodoo with things like this, so while this regex certainly
works, the breakdown of what it all means is a mystery to
me. Would "Mastering Regular Expressions" be a good
teacher for this sort of thing?
Thanks, merlyn (++), and everybody else for their help and insight.
| [reply] |
The currently available Mastering Regular Expressions doesn't cover any of the really cool Perl 5 regex stuff. Jeffrey Friedl is in the process of rewriting
the book for a second edition, and has been working with the Perl developers to
uncover inconsistencies in the implementation (what normal people would
call "bugs" {grin}) and gaps in the documentation.
Don't hold your breath though. I know this effort will probably take one to two
years of nights and weekends. "Been there Done that" x $n
-- Randal L. Schwartz, Perl hacker
| [reply] |
sub commify {
local($_) = shift;
1 while s/^(-?\d+)(\d{3})/$1,$2/;
return $_;
}
Cheers,
KM | [reply] [d/l] |
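To see why the "1 while" form works from right to left, here is a sketch that records the string after each pass. commify_trace is a hypothetical helper. The substitution is the same one used in commify above.

```perl
use strict;
use warnings;

# commify_trace is a hypothetical helper: same substitution as commify,
# but it records the string after every successful pass.
sub commify_trace {
    my ($n) = @_;
    my @steps;
    push @steps, $n while $n =~ s/^(-?\d+)(\d{3})/$1,$2/;
    return @steps;
}

print "$_\n" for commify_trace("1234567890");
# 1234567,890
# 1234,567,890
# 1,234,567,890
```

Each pass restarts at the beginning of the string, lets the greedy \d+ run up to the first comma, and splits off the last three digits of that run, so an n-comma result costs n successful passes plus one failing one.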
use strict;
use Benchmark;
for ( 1..5 ) # Do five tests.
{
$_ = int( rand(10_000) ) ** int( rand(3) + 2 );
print $_, "\n";
timethese( 1_000_000, {
'KM' => sub { 1 while s/^(-?\d+)(\d{3})/$1,$2/ },
'Merlyn' => sub { s/(\d)(?=(\d{3})+(\D|$))/$1\,/g }
});
print "\n", "- " x 39, "-\n";
}
Results:
82755409
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 958772.77/s
(n=1000000)
Merlyn: 1 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 2036659.88/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
293198635825936
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 1 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 949667.62/s
(n=1000000)
Merlyn: 0 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 2036659.88/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
602425897921
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 0 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 949667.62/s
(n=1000000)
Merlyn: 0 wallclock secs ( 0.47 usr + 0.00 sys = 0.47 CPU) @ 2123142.25/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.80935247108226e+015
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 958772.77/s
(n=1000000)
Merlyn: 1 wallclock secs ( 0.46 usr + 0.00 sys = 0.46 CPU) @ 2169197.40/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
106294343553
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 958772.77/s
(n=1000000)
Merlyn: 1 wallclock secs ( 0.48 usr + 0.00 sys = 0.48 CPU) @ 2083333.33/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| [reply] [d/l] |
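One thing to keep in mind when reading these numbers: both subs modify $_ in place, so after the very first call the string already contains commas, and later iterations mostly measure a failed match. Below is a sketch of a variant in which each candidate works on a fresh copy. The iteration count and the choice of sample value are arbitrary assumptions, and no timings are claimed for it.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my $number = "293198635825936";   # one of the sample values printed above

# Each sub works on its own copy so every iteration starts from the raw digits.
my %candidates = (
    'KM' => sub {
        my $s = $number;
        1 while $s =~ s/^(-?\d+)(\d{3})/$1,$2/;
        return $s;
    },
    'Merlyn' => sub {
        my $s = $number;
        $s =~ s/(\d)(?=(\d{3})+(\D|$))/$1,/g;
        return $s;
    },
);

# Sanity check: both should produce the same string before timing them.
print "$_: ", $candidates{$_}->(), "\n" for sort keys %candidates;

timethese( 200_000, \%candidates );
```

The copy adds the same small overhead to both candidates, so the comparison between them stays fair.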
Nice, but the real benchmark is testing them inside much larger strings:
"12123123 sadljaskjdfl skadj flkasjdf lksadjf klsadjfasdk 12718237192378"
since both regexes were designed to work on long strings with runs of digits at various places within them.
-- Randal L. Schwartz, Perl hacker | [reply] [d/l] |
For you merlyn, I made it run on a string. Plus, I ran it on a faster machine since mine is busy.
use strict;
use Benchmark;
for ( 1..5 ) # Do five tests.
{
$_ = int( rand(10_000) ) ** int( rand(3) + 2 );
$_ = "For $_ Merlyn " . reverse($_) . " plus the constants ".
"8634641234541275032000523 and 8,634,641,234,541,275,032,000,523";
print $_, "\n";
timethese( 1_000_000, {
'KM' => sub { 1 while s/^(-?\d+)(\d{3})/$1,$2/ },
'Merlyn' => sub {s/(\d)(?=(\d{3})+(\D|$))/$1\,/g}
});
print "\n", "- " x 39, "-\n";
}
Output:
For 2699449 Merlyn 9449962 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 960614.79/s
(n=1000000)
Merlyn: 0 wallclock secs ( 0.51 usr + 0.00 sys = 0.51 CPU) @ 1956947.16/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For 78836641 Merlyn 14663887 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 942507.07/s
(n=1000000)
Merlyn: 0 wallclock secs ( 0.58 usr + 0.00 sys = 0.58 CPU) @ 1721170.40/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For 126128378375 Merlyn 573873821621 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 2 wallclock secs ( 1.02 usr + 0.00 sys = 1.02 CPU) @ 979431.93/s
(n=1000000)
Merlyn: 1 wallclock secs ( 0.53 usr + 0.00 sys = 0.53 CPU) @ 1883239.17/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For 8665653464 Merlyn 4643565668 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 2 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 891265.60/s
(n=1000000)
Merlyn: 0 wallclock secs ( 0.53 usr + 0.00 sys = 0.53 CPU) @ 1886792.45/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For 36300625 Merlyn 52600363 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
Benchmark: timing 1000000 iterations of KM, Merlyn...
KM: 2 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 1018329.94/s
(n=1000000)
Merlyn: 0 wallclock secs ( 0.54 usr + 0.00 sys = 0.54 CPU) @ 1848428.84/s
(n=1000000)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| [reply] [d/l] |
Out of Andrew Johnson's Elements of Perl Programming:
s/(\d{1,3})
(?=
(?:\d\d\d)+
(?!\d)
)
/$1,/gx;
| [reply] [d/l] |
A more recent addition to perlfaq5:
This regex from Benjamin Goldberg will add commas to numbers:
s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
It is easier to see with comments:
s/(
^[-+]? # beginning of number.
\d+? # first digits before first comma
(?= # followed by, (but not included in the match) :
(?>(?:\d{3})+) # some positive multiple of three digits.
(?!\d) # an *exact* multiple, not x * 3 + 1 or whatever.
)
| # or:
\G\d{3} # after the last group, get three digits
(?=\d) # but they have to have more digits after them.
)/$1,/xg;
| [reply] [d/l] [select] |
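A brief usage sketch for the perlfaq5 substitution. commify_faq is a hypothetical wrapper name. The regex is the one-line form quoted above.

```perl
use strict;
use warnings;

# commify_faq is a hypothetical name; the substitution is the perlfaq5 one quoted above.
sub commify_faq {
    my ($s) = @_;
    $s =~ s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
    return $s;
}

print commify_faq("-1234567.891234"), "\n";   # -1,234,567.891234
print commify_faq("999"), "\n";               # 999
print commify_faq("total 1234567"), "\n";     # total 1234567 (number is not at the start)
```

Because the first alternative is anchored with ^ and the second with \G, this form only commifies a number that begins the string (after an optional sign). A number embedded in other text would have to be extracted first.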