Here is yet another answer to this ancient and oft-asked question (for linux at least).
The standard advice is found in perlfaq5, on perlmonks, and elsewhere.
Using Number::Format or a commify() sub works well, but isn't very convenient
for modifying existing code that was written with printf. It seems unlikely
that perl's printf will ever support %'d and friends
(see
printf(3), scroll to "Flag Characters").
However, almost any linux system has a /usr/bin/printf command with %'
support. So, the simple definition
sub cprintf {system('/usr/bin/printf', @_)}
will let you use
cprintf("There are at least %'d ways to do it!\n", 42e6)
to print:
There are at least 42,000,000 ways to do it!
For this to work, your LC_NUMERIC environment variable should be set to a
locale that has a "thousands separator", such as en_US.UTF-8. It seems to
work quite well, at least for simple cases. To modify existing code, just
change your printfs to cprintfs and add apostrophes to the formats as
needed. That is less work than wrapping each argument with a subroutine call
and changing each format to %s, and it helps keep things clear and readable.
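For comparison, the "wrap each argument" route mentioned above looks roughly like this. It is a minimal sketch along the lines of the commify() answer in perlfaq5, not a replacement for Number::Format:

```perl
use strict;
use warnings;

# Insert a comma before each trailing group of three digits,
# repeating until no more groups are left to split off.
sub commify {
    local $_ = shift;
    1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $_;
}

# Every %d becomes %s, and every argument gets wrapped in a call.
printf("There are at least %s ways to do it!\n", commify(42e6));
```

This runs entirely inside perl, but each format and each argument in existing code has to be edited by hand.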
It's tempting to override the builtin printf, but that's not easy to
do. The CORE documentation lists printf as a special keyword. (Playing with
tied file-handles or other tricks might work, but doesn't seem worth it to me.)
On the other hand, if you want to auto-magically commify *all* your %d's, you
could use:
sub cprintf {
my($fmt)= shift;
$fmt=~s/%(-?\d*)d/%'$1d/g;
system('/usr/bin/printf', $fmt, @_)
}
And then cprintf("%d", 42e6) will print 42,000,000.
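To see what that substitution actually does to a format string, here is a small sketch that applies the same regex without calling /usr/bin/printf. The helper name commify_format is made up for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the same rewrite as the auto-commifying
# cprintf and return the new format instead of running it.
sub commify_format {
    my ($fmt) = @_;
    $fmt =~ s/%(-?\d*)d/%'$1d/g;
    return $fmt;
}

print commify_format("%d items, %-8d|, %5.2f, %s"), "\n";
# prints: %'d items, %'-8d|, %5.2f, %s

print commify_format("100%%d"), "\n";
# prints: 100%%'d
```

The second call shows one limit of the simple regex: a literal %% followed by d is rewritten as well, and conversions with other flags such as %+d are left alone.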
Of course, there are plenty of potential pitfalls to this simple approach,
including:
- Every cprintf execs a new process. You probably don't want to call cprintf()
  42,000,000 times.
- You can't use perl's %v format, and perl can't check for missing or extra args
  to cprintf like it does for printf.
- Runtime errors from /usr/bin/printf may be hard to catch or diagnose.
- To print to a file, you have to re-direct stdout. Commified printing to a
  string (sprintf) in this way would be tricky. Backticks invoke the shell,
  which would be fraught with danger. You'd need
  IPC::System::Simple or something similar.
- /usr/bin/printf has some unique features (some might be useful). See the
  gnu coreutils docs, or try 'info coreutils printf' ('man printf' isn't very
  complete).
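One possible way to get the sprintf-like variant without a shell is perl's list-form pipe open, which execs the program directly, much like the list form of system(). The sketch below assumes a Unix-like system with /usr/bin/printf; the name csprintf is invented for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: like cprintf, but returns the text instead of
# printing it. Each call still starts a new process.
sub csprintf {
    my ($fmt, @args) = @_;
    # List-form pipe open: no shell is involved, so the arguments are
    # passed to /usr/bin/printf exactly as given.
    open(my $fh, '-|', '/usr/bin/printf', $fmt, @args)
        or die "cannot run /usr/bin/printf: $!";
    local $/;                      # slurp all of the output at once
    my $out = <$fh>;
    close($fh)
        or die "/usr/bin/printf failed: " . ($! || "wait status $?");
    return $out;
}

my $line = csprintf("%'d bytes", 4658798592);
print "$line\n";    # grouping depends on the LC_NUMERIC locale
```

The returned string can then be printed to any filehandle, for example with print {$log} csprintf(...), which also sidesteps the need to re-direct stdout.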
PS: To emphasize the caution above about backticks, do not use
sub {system("/usr/bin/printf @_")} as your definition, since then
system() might invoke a shell as well. For example,
system('/usr/bin/printf',
'Bash uses backtics for cmd substitution, eg: %s',
'`ls *`')
will print
Bash uses backtics for cmd substitution, eg: `ls *`
This works because perl puts two strings into **argv and then directly execs
/usr/bin/printf. But,
system(
'/usr/bin/printf "Bash uses backtics for cmd substitution, eg: %s" `ls *`'
)
will do something quite different, since perl will pass the single string to a
shell, which will then execute ls. Replace 'ls' with 'rm' (or, horrors,
'rm -rf') and you've got a disaster on your hands!
Happy Holidays, and best wishes for the new year!
-Jeff
Re: Printing large numbers with commas to show thousands groups.
by shmem (Chancellor) on Dec 27, 2019 at 13:48 UTC
Of course, there are plenty of potential pitfalls to this simple approach
Here's another approach to that without invocation of a printf binary, so no shell danger. Simply interpolate the "babycart operator" @{[]} into the format for any %'d conversion, call sprintf, then eval the result as a string. So something like
cprintf "There are at least %'d ways to do it!\n", 42e6;
will effectively result in
printf "There are at least @{[commify('%d')]} ways to do it!\n", 42e6;
with one difference: in cprintf, the value is substituted for %d before the @{[]} is expanded, because the @{[]} is not evaluated at the time the format is assembled.
(In the plain printf line above, perl expands the @{[]} first, i.e. it calls commify with a literal '%d', and only then interpolates the value. Bug?)
All other perl sprintf conversions and flags can be used within the format.
This does the trick:
sub commify {
local $_ = shift;
my $spc = ''; s/^(\s+)// and $spc = $1; # trim and save leading space
my $adj = 0;
$adj++ while s/^([-+]?\d+)(\d{3})/$1,$2/;
$spc =~ s/.{$adj}//; # adjust space for commas added
s/\s{0,$adj}$// if /\s$/; # adjust right padding
return $spc . $_;
}
sub cprintf {
(my $format = shift) =~
s{
\%(['+0-9.-]+)?([df]) # capture all valid %d and %f flags and modifiers
}{
my $p = $1;
my $c = $2;
$p =~ s/'// ? "\@{[commify('%$p$c')]}" : "%$p$c"
}gex;
my $str = sprintf $format, @_;
print eval "\"$str\"";
}
cprintf "%+'012d\n", 1e6;
cprintf "<%-'12.6d>\n", 1e6;
cprintf "<%-+12.6'd>\n", 1e6;
cprintf "<%+12.6'd>\n", -1e6;
cprintf "<%+12.6d>\n", -1e6;
cprintf "<%+12.2'f>\n", 1234.5;
cprintf "There are at least %'d ways to do it!\n", 42e6;
__END__
+00,001,000,000
<1,000,000 >
<+1,000,000 >
< +1,000,000>
< -1,000,000>
< -1000000>
< +1,234.50>
There are at least 42,000,000 ways to do it!
Of course, neither your approach nor mine works for the printf FILEHANDLE FORMAT, LIST form of printf.
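The rewrite, sprintf, eval sequence described above can be traced one stage at a time. This is only an illustrative sketch: it handles a bare %'d, and it uses a short perlfaq5-style commify as a stand-in for the fuller one in the post.

```perl
use strict;
use warnings;

# Stand-in helper (perlfaq5 style); it does not adjust padding.
sub commify {
    local $_ = shift;
    1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $_;
}

my $format = "Total: %'d\n";

# Stage 1: replace each %'d with a wrapper that still contains a plain %d.
my $wrapper = q(@{[commify('%d')]});
(my $rewritten = $format) =~ s/%'d/$wrapper/g;

# Stage 2: sprintf fills in the value inside the wrapper.
my $filled = sprintf $rewritten, 42e6;

# Stage 3: evaluating the result as a double-quoted string calls commify.
my $final = eval qq("$filled");

print $rewritten, $filled, $final;
# Total: @{[commify('%d')]}
# Total: @{[commify('42000000')]}
# Total: 42,000,000
```

Because stage 3 is a string eval, any ", $ or @ in the format or in the formatted arguments is also interpreted as perl code, so this is best kept to formats and data you control.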
update: fixed format substitution
perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Printing large numbers with commas to show thousands groups. (updated)
by haukex (Archbishop) on Dec 28, 2019 at 22:10 UTC
Accessing C's printf is also possible with the latest FFI::Platypus without calling an external binary:
use warnings;
use strict;
use POSIX qw/LC_NUMERIC/;
use FFI::Platypus 1.07;
my $ffi = FFI::Platypus->new( api => 1 );
$ffi->lib(undef); # search current process for symbols (libc etc.)
my $loc = $ffi->function( setlocale => ['int','string'] => 'string' )
->call(LC_NUMERIC, ""); # set according to environment vars
print "Locale: $loc\n";
$ffi->attach( [ printf => 'cprintf_i' ] => ['string']
=> ['int'] => 'int' ); # variadic function
cprintf_i("%'d\n", 42e6);
# ##### UPDATE - Wrapped in a sub #####
$ffi->attach( [ snprintf => 'c_snprintf_i' ]
=> ['opaque','size_t','string'] => ['int'] => 'int' );
sub commify {
my $i = shift;
die "Bad integer $i" unless $i=~/\A-?[0-9]+\z/;
my $in = "$i$i";
my $ptr = $ffi->cast('string' => 'opaque', $in);
c_snprintf_i($ptr, length($in), "%'d", 0+$i);
my $out = $ffi->cast('opaque' => 'string', $ptr);
return $out;
}
print commify(42e6), "\n";
This will print "42,000,000" for an en_US locale and "42.000.000" for a de_DE locale (this is the same behavior as /usr/bin/printf).
Re: Printing large numbers with commas to show thousands groups.
by jnorden (Novice) on Dec 31, 2019 at 18:14 UTC
Thanks for the feedback! I forgot to mention String::Sprintf in my post, thanks for pointing it out as another possibility. I was unaware of FFI::Platypus, which looks really interesting for a lot of possible things. Too bad it isn't included in the core, but I guess portability to Windows, etc, is a priority these days. Thanks for making me aware of it.
After a bit more rumination, I thought of another approach. One could post-process output to add commas, instead of modifying the code. This can be applied to the output of any program, or to data or log files. It might seem that deciding which numbers to commify would be an impossible task. It is, of course. But the following simple script seems to work surprisingly well.
-Jeff
#! /usr/bin/perl -w
# commify.pl: commify *all* of the numbers in a file (or stdin).
# Well, almost all. Add commas to strings of at least 5 digits, that
# don't follow a decimal pt, comma, or word char, and don't start with
# a zero.
while (<>) {
s/(?<! [.,\w]) ([1-9] \d{4,})/commify_digits($1)/egx;
print;
}
sub commify_digits {
local($_)= shift;
s/(\d)(?=(\d{3})+$)/$1,/g;
return $_;
}
=pod
A few notes:
Not commifying 4-digit numbers reduces the occurrence of unwanted
commas. For example, we don't want "Dec 31, 2,019". A comma doesn't
really add to the readability of a small number anyway. A string of
digits beginning with a zero might be an octal number. Even if it
isn't, 004,200,000 looks wrong, and 00,245 is confusing. If there is
a word char before the digits, it might be part of a name of some
sort: var1234567, or file_424242.txt. Or, it might be hex: 0x123456.
We don't examine the character that follows the digits, since it may
start a suffix: 1459798sec becomes 1,459,798sec.
This is far from perfect, but seems to be useful. You can use it for
data or log files which contain large numbers. You can also try to
commify the output of ls, du, df, etc. For example, here are three
views of a large iso image:
$ ls -l devuan_dvd.iso
-rw-rw-r-- 1 jeff jeff 4658798592 Oct 5 2018 devuan_dvd.iso
$ ls -lh devuan_dvd.iso
-rw-rw-r-- 1 jeff jeff 4.4G Oct 5 2018 devuan_dvd.iso
$ ls -l devuan_dvd.iso | commify
-rw-rw-r-- 1 jeff jeff 4,658,798,592 Oct 5 2018 devuan_dvd.iso
The longer output of 'ls -l|commify' won't be as nice, since the
columns become misaligned. It's not too bad, though, considering that
it is produced by just 7 lines of code.
The first s/// uses a negative-lookbehind to search for strings of
digits, and then passes them to commify_digits. You can modify
'[.,\w]' to adjust which digit strings are commified, or change
'\d{4,}' to set the minimum length. The s/// in commify_digits is
simple, since it gets an entire string of digits. It just adds a
comma after each digit that is followed by a multiple of three (to the
end of the string).
Jeff Norden, Dec 2019. This code is in the public domain.
=cut
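To check the rules described in the notes, the same two substitutions can be wrapped in a function and run on a few sample strings. This is a small sketch; commify_line and the sample strings are made up for the test:

```perl
use strict;
use warnings;

# Same substitution as in commify.pl: add a comma after each digit that
# is followed by a multiple of three digits up to the end of the string.
sub commify_digits {
    my ($digits) = @_;
    $digits =~ s/(\d)(?=(\d{3})+$)/$1,/g;
    return $digits;
}

# Hypothetical wrapper around the main s/// so it can be called per line.
sub commify_line {
    my ($line) = @_;
    $line =~ s/(?<![.,\w])([1-9]\d{4,})/commify_digits($1)/eg;
    return $line;
}

for my $sample (
    'Dec 31, 2019',
    'size 4658798592',
    'uptime 1459798sec',
    'file_424242.txt at 0x123456',
    'pi is 3.1415926, id 0042000',
) {
    print "$sample => ", commify_line($sample), "\n";
}
```

Only the second and third samples should change (to 4,658,798,592 and 1,459,798sec). The others are protected by the length limit, the lookbehind, or the leading-zero rule.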