http://qs321.pair.com?node_id=11122413


in reply to Re^5: Influencing the Gconvert macro
in thread Influencing the Gconvert macro

OK, I think the main calculation in the perl source is this statement from sv.c:

/* Determine the buffer size needed for the various
 * floating-point formats.
 *
 * The basic possibilities are:
 *
 *               <---P--->
 *    %f 1111111.123456789
 *    %e       1.111111123e+06
 *    %a     0x1.0f4471f9bp+20
 *    %g        1111111.12
 *    %g        1.11111112e+15
 *
 * where P is the value of the precision in the format, or 6
 * if not specified. Note the two possible output formats of
 * %g; in both cases the number of significant digits is <=
 * precision.
 *
 * For most of the format types the maximum buffer size needed
 * is precision, plus: any leading 1 or 0x1, the radix
 * point, and an exponent. The difficult one is %f: for a
 * large positive exponent it can have many leading digits,
 * which needs to be calculated specially. Also %a is slightly
 * different in that in the absence of a specified precision,
 * it uses as many digits as necessary to distinguish
 * different values.
 *
 * First, here are the constant bits. For ease of calculation
 * we over-estimate the needed buffer size, for example by
 * assuming all formats have an exponent and a leading 0x1.
 *
 * Also for production use, add a little extra overhead for
 * safety's sake. Under debugging don't, as it means we're
 * more likely to quickly spot issues during development.
 */

float_need =     1  /* possible unary minus */
              +  4  /* "0x1" plus very unlikely carry */
              +  1  /* default radix point '.' */
              +  2  /* "e-", "p+" etc */
              +  6  /* exponent: up to 16383 (quad fp) */
#ifndef DEBUGGING
              + 20  /* safety net */
#endif
              +  1; /* \0 */

After that, if we are subject to locale, it checks the actual length of the UTF-8 representation of the radix point and adjusts the "+ 1" that was allowed for the default. The above adds up to 35, which is pretty close to the difference between 91 and 127 (that is, 36).

The origin of the gcc warning looks like it might be gimple-ssa-sprintf.c or a close relative, in which case the "#define target_mb_len_max() 6" may well explain the difference between 127 and 133.

So this looks pretty safe to me - and you'd certainly need a debugging perl to get close to exercising the limits.

That just leaves the question of whether we can give the compiler enough hints for it to come to the same conclusion, or whether we'd only be able to shut it up with a sledgehammer preprocessor directive.

Hugo

Re^7: Influencing the Gconvert macro
by syphilis (Archbishop) on Oct 02, 2020 at 03:55 UTC
    That just leaves the question of whether we can give the compiler enough hints for it to come to the same conclusion, or whether we'd only be able to shut it up with a sledgehammer preprocessor directive.

    I'm puzzled as to how/why this check is even being run.

    I've just built perl-5.33.2 with the usual configure args, making no attempt to influence the setting of Gconvert.
    But I've applied this patch to sv.c:
    --- sv.c        2020-09-29 22:29:16.781395700 +1000
    +++ sv.c_mod    2020-10-02 11:35:20.728840400 +1000
    @@ -13115,7 +13115,7 @@
                      && intsize != 'q'
                 ) {
                     WITH_LC_NUMERIC_SET_TO_NEEDED_IN(in_lc_numeric,
    -                    SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis)
    +                    PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                     );
                     elen = strlen(ebuf);
                     eptr = ebuf;
    That works fine but I'm not happy about the double-rounding that takes place when nvtype is 'double' 'long double'.
    We really want fv to be an NV, not a long double.
    And then we would need the sprintf() formatting to accommodate the nvtype - "g" versus "Lg".

    UPDATE: Duh ... there is no double-rounding ... but I think I still need to attend to the issue of "g" or "Lg" formatting.

    And it still produces that awful noise (see below my sig).
    The command that produces that noise is:
    cc -c -DPERL_CORE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -std=c89 -O2 -Wall -Werror=pointer-arith -Wextra -Wc++-compat -Wwrite-strings -Werror=declaration-after-statement sv.c
    So I've tried (unsuccessfully) to reproduce those warnings by compiling the following C program:
    #include <stdio.h>

    int main(void) {
        char ebuf[127];
        long double fv = 0.3L;
        int precis = 54;

        sprintf(ebuf, "%.*g", precis, (double) fv);
        printf("%s\n", ebuf);
        return 0;
    }
    I compiled it by running the same command (minus the perl-specific "-D..." switches) and it compiles noiselessly.
    So I guess that the noise must be introduced by something in those perl-specific switches.

    Do you know how to reproduce the warnings when compiling that C script ?

    Incidentally, AFAICS, that patch effectively removes Gconvert from the perl source entirely - except for Win32API-File, where the Gconvert call in cpan\Win32API-File\const2perl.h could be replaced with sprintf(), anyway.
    For Windows, Gconvert is already hard coded to sprintf().

    Cheers,
    Rob
    In file included from sv.c:32:0:
    sv.c: In function ‘Perl_sv_vcatpvfn_flags’:
    sv.c:13118:54: warning: ‘%.*g’ directive writing between 1 and 133 bytes into a region of size 127 [-Wformat-overflow=]
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                          ^
    perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                 block;                                                          \
                 ^~~~~
    sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                         ^
    sv.c:13118:54: note: assuming directive output of 132 bytes
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                          ^
    perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                 block;                                                          \
                 ^~~~~
    sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                         ^
    In file included from /usr/include/stdio.h:862:0,
                     from perlio.h:41,
                     from iperlsys.h:50,
                     from perl.h:3934,
                     from sv.c:32:
    /usr/include/x86_64-linux-gnu/bits/stdio2.h:33:10: note: ‘__builtin___sprintf_chk’ output between 2 and 134 bytes into a destination of size 127
       return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           __bos (__s), __fmt, __va_arg_pack ());
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from sv.c:32:0:
    sv.c:13118:54: warning: ‘%.*g’ directive writing between 1 and 133 bytes into a region of size 127 [-Wformat-overflow=]
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                          ^
    perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                 block;                                                          \
                 ^~~~~
    sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                         ^
    sv.c:13118:54: note: assuming directive output of 132 bytes
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                          ^
    perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                 block;                                                          \
                 ^~~~~
    sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                         PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                         ^
    In file included from /usr/include/stdio.h:862:0,
                     from perlio.h:41,
                     from iperlsys.h:50,
                     from perl.h:3934,
                     from sv.c:32:
    /usr/include/x86_64-linux-gnu/bits/stdio2.h:33:10: note: ‘__builtin___sprintf_chk’ output between 2 and 134 bytes into a destination of size 127
       return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           __bos (__s), __fmt, __va_arg_pack ());
      Do you know how to reproduce the warnings when compiling that C script ?

      I didn't want to die wondering (even if it killed me), so I eventually came up with this C program that reproduces the warnings:
      /* try.c */
      #include <stdio.h>
      #include <stdlib.h>

      void foo(double);

      int main(void) {
       /* The value assigned to 'd' has no *
        * effect on the warning message.   */
       double d = 0.;
       foo(d);
      }

      void foo(double d) {
       char buf[127];

       sprintf (buf, "%.*g", 126, d);
       printf("%s\n", buf);
      }
      Build with: gcc -o try.exe try.c -Wformat-overflow
      That compilation produces the following noise:
      try.c: In function ‘foo’:
      try.c:17:17: warning: ‘%.*g’ directive writing between 1 and 133 bytes into a region of size 127 [-Wformat-overflow=]
        sprintf (buf, "%.*g", 126, d);
                       ^~~~
      try.c:17:16: note: assuming directive output of 132 bytes
        sprintf (buf, "%.*g", 126, d);
                      ^~~~~~
      try.c:17:2: note: ‘sprintf’ output between 2 and 134 bytes into a destination of size 127
        sprintf (buf, "%.*g", 126, d);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      which is essentially the same as the warnings I received when compiling perl.

      Apparently, the perl compilation process (with -O2 optimization) determines that the number of digits being requested is 126.
      It is then correctly calculated that the number of bytes written will be between 1 and 133: the 126 digits, plus the decimal point, a possible leading '-', a possible 'e', and a possible exponent of at most 4 characters (126 + 1 + 1 + 1 + 4 = 133).
      In those warnings, you'll see that the "1 and 133" changes to "2 and 134" when the terminating NUL byte is included in the count.

      I haven't investigated just how the perl source compilation process makes the determination that the digit count has to be 126. It might just be a bug in the -O2 optimization - certainly, no warnings are emitted if the optimization level is reduced to less than -O2.

      As I mentioned previously, if the number of digits specified in the "%g" formatting is higher than 91, then the processing switches to a different block of code, so the buffer size of 127 is certainly large enough.
      I haven't looked into how or why that change occurs when digits > 91. It's not often that people will request more digits than 91 - so I'm not presently inclined to wade through the whys and wherefores of that processing path. It seems to be working correctly, and IMO that's good enough for now, at least.

      Update: Now that I understand how the warning is being created, I think it should be fairly simple to amend the perl source so that this warning is eliminated.

      Cheers,
      Rob
        I think it should be fairly simple to amend the perl source so that this warning is eliminated

        For anyone interested, here's the patch to sv.c that fixes the problem in a way that doesn't produce any warnings:

        Update: The patch meets the objective, but there's still scope for improvement:
        1) "precis + 7" is overkill. All that's needed to avoid the compilation warning is "precis + 1";
        2) the condition "&& float_need < sizeof(ebuf)" is not needed as it's implied by the next condition;
        3) the variable float_need is always set at 35 which feels wrong. AFAICS, "sizeof(ebuf) - float_need" always evaluates to 92 (ie 127 - 35).

        --- /cygdrive/c/comp/perl-5.33.2/sv.c   2020-10-04 18:15:27.028350200 +1100
        +++ sv.c        2020-10-04 18:13:55.508841200 +1100
        @@ -13109,13 +13109,17 @@
                          /* check, in manner not involving wrapping, that it will
                           * fit in ebuf  */
                          && float_need < sizeof(ebuf)
        -                 && sizeof(ebuf) - float_need > precis
        +                 && sizeof(ebuf) - float_need > precis + 7
                          && !(width || left || plus || alt)
                          && !fill
                          && intsize != 'q'
                     ) {
                         WITH_LC_NUMERIC_SET_TO_NEEDED_IN(in_lc_numeric,
        -                    SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis)
        +#if defined(USE_LONG_DOUBLE)
        +                    PERL_UNUSED_RESULT(sprintf(ebuf, "%.*Lg", (int)precis, (NV) fv))
        +#else
        +                    PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
        +#endif
                         );
                         elen = strlen(ebuf);
                         eptr = ebuf;
        AFAICT changing precis to precis + 7 does not change functionality, and it's enough to eradicate the warning that the other changes would otherwise introduce.
        Here's the patch to t/op/sprintf2.t that would have detected the original issue:
        --- sprintf2.t_orig     2020-10-04 18:24:56 +1100
        +++ sprintf2.t  2020-10-04 18:24:37 +1100
        @@ -1178,4 +1178,25 @@
             is($off2, 1, "offset after 0");
         }
        +# %g formatting was broken on Ubuntu, Debian and perhaps other systems
        +# for a long time. Here we verify that no such breakage still exists.
        +
        +if($Config{nvsize} == 8) {
        +    cmp_ok(sprintf("%.54g", 0.3), 'eq', '0.299999999999999988897769753748434595763683319091796875',
        +           "sprintf( \"%.54g\", 0.3 ) renders correctly");
        +}
        +elsif($Config{nvtype} eq 'long double' && ($Config{longdblkind} == 3 || $Config{longdblkind} == 4)) {
        +    cmp_ok(sprintf("%.64g", 0.3), 'eq', '0.3000000000000000000108420217248550443400745280086994171142578125',
        +           "sprintf( \"%.64g\", 0.3 ) renders correctly");
        +}
        +elsif($Config{nvtype} eq 'long double' && $Config{longdblkind} >= 5 && $Config{longdblkind} <= 8) {
        +    # oops ... TODO (for double-double)
        +}
        +else {
        +    cmp_ok(sprintf("%.115g", 0.3), 'eq',
        +           '0.299999999999999999999999999999999990370350278063820734720110287075363407309491758923059023800306022167205810546875',
        +           "sprintf( \"%.115g\", 0.3 ) renders correctly");
        +}
        +
        +

         done_testing();
        Update: I should probably point out that the "original issue" afflicts only perls whose nvsize is 8, AFAICT.
        Therefore, the tests that I've added to sprintf2.t for other types of NV are, in fact, unrelated to that "original issue".

        Cheers,
        Rob