PerlMonks  

Printing large numbers with commas to show thousands groups.

by jnorden (Novice)
on Dec 27, 2019 at 01:13 UTC ( #11110642=perlmeditation )

Here is yet another answer to this ancient and oft-asked question (for linux at least).

The standard advice is found in perlfaq5, on perlmonks, and elsewhere. Using Number::Format or a commify() sub works well, but isn't very convenient for modifying existing code that was written with printf. It seems unlikely that perl's printf will ever support %'d and friends (see printf(3), scroll to "Flag Characters").
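For comparison, the commify() approach looks roughly like the following. This is a sketch in the spirit of the perlfaq5 answer, not a verbatim copy, and it shows the two changes each call site needs: wrap the argument and switch the conversion to %s.

```perl
use strict;
use warnings;

# Insert a comma before each trailing group of three digits in the
# integer part; an optional sign and any fractional part are left alone.
sub commify {
    my ($n) = @_;
    1 while $n =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $n;
}

# The format needs %s instead of %d, and the argument needs wrapping.
printf "There are at least %s ways to do it!\n", commify(42e6);
```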

However, almost any linux system has a /usr/bin/printf command with %' support. So, the simple definition

sub cprintf {system('/usr/bin/printf', @_)}
will let you use
cprintf("There are at least %'d ways to do it!\n", 42e6)
to print: There are at least 42,000,000 ways to do it!

For this to work, your LC_NUMERIC environment variable should be set to a locale that has a "thousands separator", such as en_US.UTF-8. It seems to work quite well, at least for simple cases. To modify existing code, just change your printf's to cprintf's and add apostrophes to the formats as needed. That's less work than wrapping each argument with a subroutine call and changing each format to %s. It also helps keep things clear and readable.
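If you would rather not depend on the caller's environment, the locale can be pinned inside the sub. A minimal sketch, assuming the en_US.UTF-8 locale is installed; cprintf_locale is a made-up name for this example. LC_ALL takes precedence over LC_NUMERIC, so it is blanked for the duration of the call.

```perl
use strict;
use warnings;

# Hypothetical variant of cprintf that takes the locale as its first argument.
sub cprintf_locale {
    my ($locale, @args) = @_;
    local $ENV{LC_ALL}     = '';        # an empty LC_ALL is ignored, so it cannot override
    local $ENV{LC_NUMERIC} = $locale;   # both settings are restored when the sub returns
    return system('/usr/bin/printf', @args);
}

# Assumes en_US.UTF-8 exists on this machine; without it, printf falls
# back to the C locale and prints the number ungrouped.
cprintf_locale('en_US.UTF-8', "There are at least %'d ways to do it!\n", 42e6);
```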

It's tempting to override the builtin printf, but that's not easy to do. The CORE documentation lists printf as a special keyword. (Playing with tied file-handles or other tricks might work, but doesn't seem worth it to me.)

On the other hand, if you want to auto-magically commify *all* your %d's, you could use:

sub cprintf { my($fmt)= shift; $fmt=~s/%(-?\d*)d/%'$1d/g; system('/usr/bin/printf', $fmt, @_) }
And then  cprintf("%d", 42e6) will print 42,000,000.
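One thing to watch with a blanket rewrite of the format: a literal %% followed by a "d" (as in "100%%done") also matches %d. Here is a sketch of the same idea with a guard for that case; add_thousands_flag is a name invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: add the ' flag to every %d conversion,
# leaving a literal %% untouched.
sub add_thousands_flag {
    my ($fmt) = @_;
    $fmt =~ s/(%%)|%(-?\d*)d/defined $1 ? $1 : "%'${2}d"/ge;
    return $fmt;
}

sub cprintf {
    my $fmt = add_thousands_flag(shift);
    return system('/usr/bin/printf', $fmt, @_);
}

# Show the rewritten format itself (no external command involved here).
print add_thousands_flag("%d of %-8d (100%%done)\n");
# prints: %'d of %'-8d (100%%done)
```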

Of course, there are plenty of potential pitfalls to this simple approach, including:
  1. Every cprintf execs a new process. You probably don't want to call cprintf() 42,000,000 times.
  2. You can't use perl's %v format, and perl can't check for missing or extra args to cprintf like it does for printf.
  3. Runtime errors from /usr/bin/printf may be hard to catch or diagnose.
  4. To print to a file, you have to re-direct stdout. Commified printing to a string (sprintf) in this way would be tricky. Backticks invoke the shell, which would be fraught with danger. You'd need IPC::System::Simple or something similar.
  5. /usr/bin/printf has some unique features (some might be useful). See the gnu coreutils docs, or try 'info coreutils printf' ('man printf' isn't very complete).
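For pitfalls 3 and 4, one option that stays in core Perl is the list form of a piped open, which (like list-form system) hands the arguments to the command without a shell. A rough sketch; capture_printf is a hypothetical name, and it simply dies if the command can't be started or exits non-zero.

```perl
use strict;
use warnings;

# Hypothetical helper: run /usr/bin/printf without a shell and return
# whatever it wrote to stdout.
sub capture_printf {
    my @args = @_;
    open(my $pipe, '-|', '/usr/bin/printf', @args)
        or die "cannot run /usr/bin/printf: $!";
    my $out = do { local $/; <$pipe> };    # slurp all of the child's output
    close($pipe)
        or die "/usr/bin/printf failed (wait status $?)";
    return defined $out ? $out : '';
}

# Once the text is in a string it can go to any filehandle.
if (-x '/usr/bin/printf') {
    my $line = capture_printf("%'d bytes\n", 4658798592);
    print STDERR $line;
}
```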
PS: To emphasize the caution above about backticks, do not use  sub {system("/usr/bin/printf @_")} as your definition, since then system() might invoke a shell as well. For example,
system('/usr/bin/printf', 'Bash uses backticks for cmd substitution, eg: %s', '`ls *`')
will print
Bash uses backticks for cmd substitution, eg: `ls *`
This works because perl puts the strings directly into **argv and then execs /usr/bin/printf without involving a shell. But,
system( '/usr/bin/printf "Bash uses backticks for cmd substitution, eg: %s" `ls *`' )
will do something quite different, since perl will pass the single string to a shell, which will then execute ls. Replace 'ls' with 'rm' (or, horrors, 'rm -rf') and you've got a disaster on your hands!

Happy Holidays, and best wishes for the new year!
-Jeff

Replies are listed 'Best First'.
Re: Printing large numbers with commas to show thousands groups.
by shmem (Chancellor) on Dec 27, 2019 at 13:48 UTC
    Of course, there are plenty of potential pitfalls to this simple approach

    Here's another approach to that without invocation of a printf binary, so no shell danger. Simply interpolate the "babycart operator" @{[]} into the format for any %'d conversion, call sprintf, then eval the result as a string. So something like

    cprintf "There are at least %'d ways to do it!\n", 42e6;

    will effectively result in

    printf "There are at least @{[commify('%d')]} ways to do it!\n", 42e6;

    with the difference that the @{[]} expansion is deferred: sprintf fills in the values first, and only then is the resulting string evaluated, so commify sees the formatted number.
    (In the plain printf line above, Perl expands @{[]} while the format string is being built, i.e. it calls commify with a literal '%d', and only then interpolates the value. Bug?)

    All other perl sprintf conversions and flags can be used within the format.

    This does the trick:

    sub commify {
        local $_ = shift;
        my $spc = '';
        s/^(\s+)// and $spc = $1;    # trim and save leading space
        my $adj = 0;
        $adj++ while s/^([-+]?\d+)(\d{3})/$1,$2/;
        $spc =~ s/.{$adj}//;         # adjust space for commas added
        s/\s{0,$adj}$// if /\s$/;    # adjust right padding
        return $spc . $_;
    }

    sub cprintf {
        (my $format = shift) =~ s{
            \%(['+0-9.-]+)?([df])    # capture all valid %d and %f flags and modifiers
        }{
            my $p = $1;
            my $c = $2;
            $p =~ s/'// ? "\@{[commify('%$p$c')]}" : "%$p$c"
        }gex;
        my $str = sprintf $format, @_;
        print eval "\"$str\"";
    }

    cprintf "%+'012d\n", 1e6;
    cprintf "<%-'12.6d>\n", 1e6;
    cprintf "<%-+12.6'd>\n", 1e6;
    cprintf "<%+12.6'd>\n", -1e6;
    cprintf "<%+12.6d>\n", -1e6;
    cprintf "<%+12.2'f>\n", 1234.5;
    cprintf "There are at least %'d ways to do it!\n", 42e6;
    __END__
    +00,001,000,000
    <1,000,000   >
    <+1,000,000  >
    <  +1,000,000>
    <  -1,000,000>
    <    -1000000>
    <   +1,234.50>
    There are at least 42,000,000 ways to do it!

    Of course, neither your nor my approach works for the printf FILEHANDLE FORMAT, LIST form of printf.
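    For the filehandle case, one possibility is to build the string first and then print it to whatever handle you like. Below is a rough eval-free sketch of that idea. A string eval also parses the formatted arguments as Perl, so an argument containing a double quote or @{[ ]} could cause trouble. The names csprintf, cfprintf and format_field are invented for this example, and it only understands the flags - + 0 space and ', a numeric width and a precision. Zero padding is dropped for grouped fields, and * widths and %v are not handled.

```perl
use strict;
use warnings;

sub commify {
    my ($s) = @_;
    1 while $s =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $s;
}

# Format a single conversion; $args refers to the remaining arguments.
sub format_field {
    my ($flags, $width, $prec, $conv, $args) = @_;
    $prec = '' unless defined $prec;
    my $grouped = ($flags =~ s/'//g) && $conv =~ /^[df]$/;
    return sprintf("%$flags$width$prec$conv", shift @$args) unless $grouped;

    my $left = ($flags =~ s/-//g);   # remember left-justification
    $flags =~ s/0//g;                # zero padding is not supported in this sketch
    my $num = commify(sprintf "%$flags$prec$conv", shift @$args);
    return sprintf($left ? '%-*s' : '%*s', $width || 0, $num);   # pad after grouping
}

# Like sprintf, but a ' flag on %d or %f adds thousands separators.
sub csprintf {
    my ($fmt, @args) = @_;
    $fmt =~ s{(%%)|%([-+ 0']*)(\d*)(\.\d+)?([a-zA-Z])}
             {defined $1 ? '%' : format_field($2, $3, $4, $5, \@args)}ge;
    return $fmt;
}

# Filehandle first, then format and arguments.
sub cfprintf {
    my $fh = shift;
    return print {$fh} csprintf(@_);
}

cfprintf(\*STDOUT, "There are at least %'d ways to do it!\n", 42e6);
cfprintf(\*STDERR, "<%-'12d> <%+'12.2f>\n", 1e6, 1234.5);
```

    The padding is applied after the commas go in, so the field width refers to the final, grouped text.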

    update: fixed format substitution

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
        Hell, yes. There's always somebody who did it better than I could ever do. But! From the OP:
        To modify existing code, just change your printf's to cprintf's and add apostrophes to the formats as needed. Less work than wrapping each argument with a subroutine call and changing each format to %s. It also helps keep things clear and readable.

        So, my attempt was to provide a solution without shelling out to printf. And done, without having to pull in another package or introduce object syntax.

        perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Printing large numbers with commas to show thousands groups. (updated)
by haukex (Chancellor) on Dec 28, 2019 at 22:10 UTC

    Accessing C's printf is also possible with the latest FFI::Platypus without calling an external binary:

    use warnings;
    use strict;
    use POSIX qw/LC_NUMERIC/;
    use FFI::Platypus 1.07;

    my $ffi = FFI::Platypus->new( api => 1 );
    $ffi->lib(undef);  # search current process for symbols (libc etc.)

    my $loc = $ffi->function( setlocale => ['int','string'] => 'string' )
        ->call(LC_NUMERIC, "");  # set according to environment vars
    print "Locale: $loc\n";

    $ffi->attach( [ printf => 'cprintf_i' ] =>
        ['string'] => ['int'] => 'int' );  # variadic function

    cprintf_i("%'d\n", 42e6);

    # ##### UPDATE - Wrapped in a sub #####
    $ffi->attach( [ snprintf => 'c_snprintf_i' ] =>
        ['opaque','size_t','string'] => ['int'] => 'int' );

    sub commify {
        my $i = shift;
        die "Bad integer $i" unless $i=~/\A-?[0-9]+\z/;
        my $in = "$i$i";
        my $ptr = $ffi->cast('string' => 'opaque', $in);
        c_snprintf_i($ptr, length($in), "%'d", 0+$i);
        my $out = $ffi->cast('opaque' => 'string', $ptr);
        return $out;
    }

    print commify(42e6), "\n";

    This will print "42,000,000" for an en_US locale and "42.000.000" for a de_DE locale (this is the same behavior as /usr/bin/printf).
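    If all that is needed from the C library is the locale's separator, the core POSIX module can supply it via localeconv(). A small sketch under that assumption; group_digits is a name made up here, and it falls back to a comma when the locale (e.g. "C") defines no thousands separator. It always groups in threes and ignores the locale's own grouping rules.

```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv LC_NUMERIC);

# Group the integer part of $n in threes using $sep (defaults to a comma).
sub group_digits {
    my ($n, $sep) = @_;
    $sep = ',' unless defined $sep && length $sep;
    1 while $n =~ s/^([-+]?\d+)(\d{3})/$1$sep$2/;
    return $n;
}

setlocale(LC_NUMERIC, '');                  # adopt the locale from the environment
my $sep = localeconv()->{thousands_sep};    # may be missing, e.g. in the C locale
print group_digits(42e6, $sep), "\n";
```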

Re: Printing large numbers with commas to show thousands groups.
by jnorden (Novice) on Dec 31, 2019 at 18:14 UTC

    Thanks for the feedback! I forgot to mention String::Sprintf in my post, thanks for pointing it out as another possibility. I was unaware of FFI::Platypus, which looks really interesting for a lot of possible things. Too bad it isn't included in the core, but I guess portability to windows, etc, is a priority these days. Thanks for making me aware of it.

    After a bit more rumination, I thought of another approach. One could post-process output to add commas, instead of modifying the code. This can be applied to the output of any program, or to data or log files. It might seem that deciding which numbers to commify would be an impossible task. It is, of course. But the following simple script seems to work surprisingly well.

    -Jeff

    #! /usr/bin/perl -w
    # commify.pl: commify *all* of the numbers in a file (or stdin).
    # Well, almost all.  Add commas to strings of at least 5 digits, that
    # don't follow a decimal pt, comma, or word char, and don't start with
    # a zero.

    while (<>) {
        s/(?<! [.,\w]) ([1-9] \d{4,})/commify_digits($1)/egx;
        print;
    }

    sub commify_digits {
        local($_)= shift;
        s/(\d)(?=(\d{3})+$)/$1,/g;
        return $_;
    }

    =pod

    A few notes:

    Not commifying 4-digit numbers reduces the occurrence of unwanted
    commas.  For example, we don't want "Dec 31, 2,019".  A comma doesn't
    really add to the readability of a small number anyway.

    A string of digits beginning with a zero might be an octal number.
    Even if it isn't, 004,200,000 looks wrong, and 00,245 is confusing.

    If there is a word char before the digits, it might be part of a name
    of some sort: var1234567, or file_424242.txt.  Or, it might be hex:
    0x123456.

    We don't examine the character that follows the digits, since it may
    start a suffix: 1459798sec becomes 1,459,798sec.

    This is far from perfect, but seems to be useful.  You can use it for
    data or log files which contain large numbers.  You can also try to
    commify the output of ls, du, df, etc.  For example, here are three
    views of a large iso image:

      $ ls -l devuan_dvd.iso
      -rw-rw-r-- 1 jeff jeff 4658798592 Oct 5 2018 devuan_dvd.iso

      $ ls -lh devuan_dvd.iso
      -rw-rw-r-- 1 jeff jeff 4.4G Oct 5 2018 devuan_dvd.iso

      $ ls -l devuan_dvd.iso | commify
      -rw-rw-r-- 1 jeff jeff 4,658,798,592 Oct 5 2018 devuan_dvd.iso

    The longer output of 'ls -l|commify' won't be as nice, since the
    columns become misaligned.  It's not too bad, though, considering that
    it is produced by just 7 lines of code.

    The first s/// uses a negative-lookbehind to search for strings of
    digits, and then passes them to commify_digits.  You can modify
    '[.,\w]' to adjust which digit strings are commified, or change
    '\d{4,}' to set the minimum length.

    The s/// in commify_digits is simple, since it gets an entire string
    of digits.  It just adds a comma after each digit that is followed by
    a multiple of three (to the end of the string).

    Jeff Norden, Dec 2019.  This code is in the public domain.

    =cut
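    To see which digit strings the rules above pick up, here is the same pair of substitutions wrapped in a sub and run on a few sample lines. This is only a demonstration harness; commify_line is a name made up for it, and the regexes are the ones from the script with the /x whitespace removed.

```perl
use strict;
use warnings;

# Same substitution as in the script: every third digit from the right
# gets a comma in front of it.
sub commify_digits {
    my ($digits) = @_;
    $digits =~ s/(\d)(?=(\d{3})+$)/$1,/g;
    return $digits;
}

# Hypothetical wrapper around the script's main substitution.
sub commify_line {
    my ($line) = @_;
    $line =~ s/(?<![.,\w])([1-9]\d{4,})/commify_digits($1)/eg;
    return $line;
}

print commify_line($_), "\n" for
    'size 4658798592 on Dec 31, 2019',
    'var1234567 file_424242.txt 0x123456',
    '3.14159265 0012345 1459798sec';
```

    Only 4658798592 and 1459798 pick up commas; the date, the names, the hex and decimal strings, and the zero-led number pass through unchanged.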

Node Type: perlmeditation [id://11110642]
Front-paged by Corion