http://qs321.pair.com?node_id=28331

Kozz has asked for the wisdom of the Perl Monks concerning the following question:

[disclaimer: this post is sure to demonstrate my ignorance of regular expressions & substitutions]

I'd already seen vroom's Q&A node about adding commas to a number, but it seemed there ought to be an easier way. I thought that a regexp like this one would do the trick:
$number=1234567; # with commas, should be "1,234,567"
$number=~s/(\d)(\d{3})\b/$1\,$2/g;
However, this code actually changes $number to 1234,567. Despite the "g" on the end of the regexp, the match still runs from beginning to end, so it only inserts the last comma.
Then I thought, "well, I could do this once for each comma" which would work like this:
while($number=~s/(\d)(\d{3})\b/$1\,$2/g){
    # nothing here -- how silly is this?
}
So then, obviously, the while loop continues as long as the substitution was successful. But is this terribly silly? Could my same code be modified slightly to work correctly with ONE simple substitution? Or am I simply better off using vroom's sub at the aforementioned node?
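For anyone following along, here is a minimal sketch of the two behaviours described above. The sub names one_pass and repeat_until_done are made up for illustration. The single pass only finds one match because \b needs a non-word character (or the end of the string) after the three digits, and the earlier groups have no comma after them until a previous pass has put one there.

```perl
use strict;
use warnings;

# Single pass with /g: matching runs over the original text, where only
# the final three digits are followed by a word boundary.
sub one_pass {
    my ($number) = @_;
    $number =~ s/(\d)(\d{3})\b/$1,$2/g;
    return $number;
}

# Repeating the substitution until it fails: each new comma creates the
# word boundary that the next pass needs, so commas appear right to left.
sub repeat_until_done {
    my ($number) = @_;
    1 while $number =~ s/(\d)(\d{3})\b/$1,$2/;
    return $number;
}

print one_pass(1234567), "\n";           # 1234,567
print repeat_until_done(1234567), "\n";  # 1,234,567
```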

Replies are listed 'Best First'.
RE: regexp for adding commas to a number
by merlyn (Sage) on Aug 17, 2000 at 20:42 UTC
    That's not a working regex. The best ones work something like this:
    $number=1234567; # with commas, should be "1,234,567"
    $number =~ s/(\d)(?=(\d{3})+(\D|$))/$1\,/g;
    Notice the missing inner repetition in the previous post? And I'm doing this with lookahead, so I can scan from left to right. The "1 while ..." solutions scan effectively from right to left, so they may be slower. Actually, I'd be interested in the various benchmarks on these. {grin}

    -- Randal L. Schwartz, Perl hacker
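A small sketch of the lookahead approach, wrapped in a helper whose name (commify_lookahead) is only for illustration. Because the lookahead consumes nothing, each digit can be tested against everything to its right, which is what lets a single left-to-right pass find every comma position, even inside a longer string.

```perl
use strict;
use warnings;

# merlyn's substitution in a hypothetical helper sub.
sub commify_lookahead {
    my ($text) = @_;
    # Match a digit only when it is followed by one or more complete
    # groups of three digits and then a non-digit or the end of string.
    $text =~ s/(\d)(?=(\d{3})+(\D|$))/$1,/g;
    return $text;
}

print commify_lookahead('1234567'), "\n";                   # 1,234,567
print commify_lookahead('12 apples, 1234567 pears'), "\n";  # 12 apples, 1,234,567 pears
```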

      great regex! i too had forgotten about the look-ahead assertion :( my function that does number beautifying has no less than 8 lines :((
      still.. there's a little problem when handling floats: the digits after the dot shouldn't be 'beautified' :(
      .. i tried to enhance your line a little, but the problem still remains because look-behind has to be fixed-width:

      $number =~ s/(?<!\.\d)(\d)(?=(\d{3})+(\D|$))/$1,/g;
      (this example works for numbers with 5 digits after the dot... variations may be made by changing the number of \d in the look-behind assertion)

      couldn't think of anything better now.. maybe you have another bright idea for this too o=)

      so, after inserting your wizcraft, my lame tool looks something like this:

      $number =~ s/(\d+)(\.\d+)?/bea_int($1).$2/eg;
      sub bea_int {
          my $kk = $_[0];
          $kk =~ s/(\d)(?=(\d{3})+(\D|$))/$1\,/g;
          return $kk;
      }

      --
      AltBlue... w8ing 4 a better solution o=Q

        The easy way to deal with open-ended floating-point numbers is to simply remove the fractional part until the commas are added, then reattach it.
        $integer =~ s/(.*)(\.\d\d+)$/$1/;
        $float = $2;
        # do stuff
        $integer .= $float;
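The split-and-reattach idea above could be sketched roughly like this. The helper name commify_float is hypothetical, and the sketch assumes the input is a single number with at most one decimal point; it also treats a missing fraction as an empty string rather than relying on a stale $2.

```perl
use strict;
use warnings;

# Hypothetical helper: split off the fractional part, add commas to the
# integer part only, then glue the pieces back together.
sub commify_float {
    my ($number) = @_;
    # Assumption: one number, at most one decimal point. Anything that
    # does not fit this shape is returned unchanged.
    my ($integer, $fraction) = $number =~ /^([^.]*)(\.\d*)?$/
        or return $number;
    $fraction = '' unless defined $fraction;
    $integer =~ s/(\d)(?=(\d{3})+(\D|$))/$1,/g;
    return $integer . $fraction;
}

print commify_float('12345678.9012345'), "\n";  # 12,345,678.9012345
print commify_float('1234567'), "\n";           # 1,234,567
```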
      This regex continues adding commas after a decimal point if there are at least 4 decimal places. It also does repeated lookahead checks for multiples of 3 digits. I came up with a better regex using a single lookahead for the multiple of 3 digits and using \G to stay in sync thereafter, and not continuing past a decimal point. Here's what I ended up with:
      $number =~ s/(^\D*\d{1,3}(?=(?:\d{3})+(?!\d))|\G\d{3}(?=\d))/$1,/g;
      For example, "12345678.9012345" becomes "12,345,678.9012345" instead of "12,345,678.9,012,345". Of course, after I came up with this, I saw the current perlfaq5 and saw an extremely similar example there from Benjamin Goldberg:
      s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
      Oh well, I guess it wasn't a new idea after all! His version adds the (?>...) non-backtracking construct, which is definitely an improvement over mine. He only matched a sign at the beginning, which is reasonable. I had [+-]? at one point, but it occurred to me that a monetary value might already have a dollar sign prefixed to the number, so I used \D* instead. I think it's a wash to use \d+? instead of \d{1,3} as I had -- it prefers checking for 1 digit first instead of 3 digits, but there's no way to know which of the 3 will match. If I changed mine to \d{1,3}? then it would be functionally equivalent.
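      To make the difference between the two prefixes concrete, here is a rough side-by-side sketch. The sub names are invented for this example, and the \D* pattern is written with balanced parentheses (that is, the perlfaq5 pattern minus the (?>...) group and with the different prefix).

```perl
use strict;
use warnings;

# \D* prefix variant: any non-digit prefix (for example "$") is allowed.
sub commify_any_prefix {
    my ($s) = @_;
    $s =~ s/(^\D*\d{1,3}(?=(?:\d{3})+(?!\d))|\G\d{3}(?=\d))/$1,/g;
    return $s;
}

# perlfaq5 variant: only an optional sign may precede the digits.
sub commify_signed {
    my ($s) = @_;
    $s =~ s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
    return $s;
}

# Print each input next to the result of both variants.
for my $input ('12345678.9012345', '-1234567', '$1234567') {
    printf "%-18s %-20s %s\n",
        $input, commify_any_prefix($input), commify_signed($input);
}
```

      Both leave the digits after the decimal point alone, because \G only continues from where the previous match ended. As far as I can tell, the sign-only version leaves '$1234567' untouched, while the \D* version turns it into '$1,234,567'.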
      Thanks for the insight. I'm ignorant of the secrets of lookahead, lookbehind, inner repetitions, and all the other associated voodoo with things like this, so while this regex certainly works, the breakdown of what it all means is a mystery to me. Would "Mastering Regular Expressions" be a good teacher for this sort of thing?

      Thanks, merlyn (++), and everybody else for their help and insight.
        The currently available Mastering Regular Expressions doesn't cover any of the really cool Perl 5 regex stuff. Jeffrey Friedl is in the process of rewriting the book for a second edition, and has been working with the Perl developers to uncover inconsistencies in the implementation (what normal people would call "bugs" {grin}) and gaps in the documentation.

        Don't hold your breath though. I know this effort will probably take one to two years of nights and weekends. "Been there Done that" x $n

        -- Randal L. Schwartz, Perl hacker

RE: regexp for adding commas to a number
by KM (Priest) on Aug 17, 2000 at 20:40 UTC
    sub commify {
        local($_) = shift;
        1 while s/^(-?\d+)(\d{3})/$1,$2/;
        return $_;
    }

    Cheers,
    KM
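A quick usage sketch for the sub above (same logic, with comments added). Because the pattern is anchored with ^ and \d+ cannot cross a decimal point, the fractional digits are never touched. The flip side is that it only handles a number at the very start of the string.

```perl
use strict;
use warnings;

sub commify {
    local $_ = shift;
    # Each pass peels the last three digits off the leading run of
    # digits; the loop stops once fewer than four digits remain in it.
    1 while s/^(-?\d+)(\d{3})/$1,$2/;
    return $_;
}

print commify(1234567), "\n";       # 1,234,567
print commify('-1234.5678'), "\n";  # -1,234.5678
```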

RE: regexp for adding commas to a number
by Adam (Vicar) on Aug 17, 2000 at 20:56 UTC
    One Benchmark:
    use strict;
    use Benchmark;

    for ( 1..5 ) # Do five tests.
    {
        $_ = int( rand(10_000) ) ** int( rand(3) + 2 );
        print $_, "\n";
        timethese( 1_000_000, {
            'KM'     => sub { 1 while s/^(-?\d+)(\d{3})/$1,$2/ },
            'Merlyn' => sub { s/(\d)(?=(\d{3})+(\D|$))/$1\,/g }
        });
        print "\n", "- " x 39, "-\n";
    }
    Results:
    82755409
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 958772.77/s
    (n=1000000)
        Merlyn:  1 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2036659.88/s
     (n=1000000)
    
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    293198635825936
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 949667.62/s
    (n=1000000)
        Merlyn:  0 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2036659.88/s
     (n=1000000)
    
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    602425897921
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  0 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 949667.62/s
    (n=1000000)
        Merlyn:  0 wallclock secs ( 0.47 usr +  0.00 sys =  0.47 CPU) @ 2123142.25/s
     (n=1000000)
    
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    1.80935247108226e+015
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 958772.77/s
    (n=1000000)
        Merlyn:  1 wallclock secs ( 0.46 usr +  0.00 sys =  0.46 CPU) @ 2169197.40/s
     (n=1000000)
    
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    106294343553
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 958772.77/s
    (n=1000000)
        Merlyn:  1 wallclock secs ( 0.48 usr +  0.00 sys =  0.48 CPU) @ 2083333.33/s
     (n=1000000)
    
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    
      Nice, but the real benchmark is testing them inside much larger strings:
      "12123123 sadljaskjdfl skadj flkasjdf lksadjf klsadjfasdk 12718237192378"
      since both regexes were designed to work on long strings with runs of digits in various places.

      -- Randal L. Schwartz, Perl hacker

        For you merlyn, I made it run on a string. Plus, I ran it on a faster machine since mine is busy.
        use strict;
        use Benchmark;

        for ( 1..5 ) # Do five tests.
        {
            $_ = int( rand(10_000) ) ** int( rand(3) + 2 );
            $_ = "For $_ Merlyn " . reverse($_) . " plus the constants ".
                 "8634641234541275032000523 and 8,634,641,234,541,275,032,000,523";
            print $_, "\n";
            timethese( 1_000_000, {
                'KM'     => sub { 1 while s/^(-?\d+)(\d{3})/$1,$2/ },
                'Merlyn' => sub {s/(\d)(?=(\d{3})+(\D|$))/$1\,/g}
            });
            print "\n", "- " x 39, "-\n";
        }
        Output:
        For 2699449 Merlyn 9449962 plus the constants 8634641234541275032000523 and 8,63
        4,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 960614.79/s
        (n=1000000)
            Merlyn:  0 wallclock secs ( 0.51 usr +  0.00 sys =  0.51 CPU) @ 1956947.16/s
         (n=1000000)
        
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 78836641 Merlyn 14663887 plus the constants 8634641234541275032000523 and 8,
        634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  1 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 942507.07/s
        (n=1000000)
            Merlyn:  0 wallclock secs ( 0.58 usr +  0.00 sys =  0.58 CPU) @ 1721170.40/s
         (n=1000000)
        
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 126128378375 Merlyn 573873821621 plus the constants 863464123454127503200052
        3 and 8,634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  2 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 979431.93/s
        (n=1000000)
            Merlyn:  1 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU) @ 1883239.17/s
         (n=1000000)
        
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 8665653464 Merlyn 4643565668 plus the constants 8634641234541275032000523 an
        d 8,634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  2 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 891265.60/s
        (n=1000000)
            Merlyn:  0 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU) @ 1886792.45/s
         (n=1000000)
        
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 36300625 Merlyn 52600363 plus the constants 8634641234541275032000523 and 8,
        634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  2 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 1018329.94/s
         (n=1000000)
            Merlyn:  0 wallclock secs ( 0.54 usr +  0.00 sys =  0.54 CPU) @ 1848428.84/s
         (n=1000000)
        
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        

        Um, I guess you didn't notice the ^ in KM's. (:

                - tye (but my friends call me "Tye")
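To spell out the point about the ^, here is a small sketch using one of the strings from the output above (shortened). The variable names are just for illustration. With the anchor, KM's pattern can only match a number at the very start of the string, so on these test strings it never substitutes anything and the two subs are not doing comparable work.

```perl
use strict;
use warnings;

my $text = 'For 2699449 Merlyn 9449962';

# KM's pattern is anchored with ^; this string starts with "For",
# so the substitution does not match and the copy stays unchanged.
(my $km = $text) =~ s/^(-?\d+)(\d{3})/$1,$2/;

# merlyn's pattern is unanchored and rewrites every digit run it finds.
(my $merlyn = $text) =~ s/(\d)(?=(\d{3})+(\D|$))/$1,/g;

print "KM:     $km\n";      # KM:     For 2699449 Merlyn 9449962
print "merlyn: $merlyn\n";  # merlyn: For 2,699,449 Merlyn 9,449,962
```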
Re: regexp for adding commas to a number
by tenatious (Beadle) on Aug 18, 2000 at 07:06 UTC
    Out of Andrew Johnson's Elements of Perl Programming:
    s/(\d{1,3}) (?= (?:\d\d\d)+ (?!\d) ) /$1,/gx;
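A short sketch of how that pattern behaves, with a hypothetical wrapper name (commify_x). The /x flag only makes the whitespace in the pattern insignificant; the replacement is unaffected. Note that, like the plain lookahead version, it does not know about decimal points, so a long fractional part also gets commas.

```perl
use strict;
use warnings;

# The pattern quoted above, inside a helper sub for easy testing.
sub commify_x {
    my ($s) = @_;
    $s =~ s/(\d{1,3}) (?= (?:\d\d\d)+ (?!\d) ) /$1,/gx;
    return $s;
}

print commify_x('1234567'), "\n";     # 1,234,567
print commify_x('1234.56789'), "\n";  # 1,234.56,789
```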
Re: regexp for adding commas to a number
by Deven (Novice) on Feb 25, 2022 at 04:07 UTC
    A more recent addition to perlfaq5:
    This regex from Benjamin Goldberg will add commas to numbers:
    s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
    It is easier to see with comments:
    s/(
        ^[-+]?             # beginning of number.
        \d+?               # first digits before first comma
        (?=                # followed by, (but not included in the match) :
            (?>(?:\d{3})+) # some positive multiple of three digits.
            (?!\d)         # an *exact* multiple, not x * 3 + 1 or whatever.
        )
        |                  # or:
        \G\d{3}            # after the last group, get three digits
        (?=\d)             # but they have to have more digits after them.
    )/$1,/xg;