tachyon-II has asked for the wisdom of the Perl Monks concerning the following question:
Last time I struggled with 64 bit perls it was in the context of getting uint32 behaviour from pure perl in order to get Math::Random::MT::Perl working. With a little help from a generous monk this was duly solved with an & 0xffffffff to constrain the 64 bits to 32.
In the course of updating Digest::JHash (a fast 32 bit hashing algorithm) I have now struck the same problem, but this time from the C/XS side. The issue is that once you start hashing using left bitshifts on unexpectedly 64 bit wide integers you blow out from 32 bits into the upper 32 bits, which results in the behaviour commonly called "does not work (TM)". The problem per se is that I need a *reliable* *portable* uint32_t. Now <stdint.h>, <inttypes.h> and <sys/types.h> all define your basic uint32_t BUT using inconsistent names and with no guarantee of only 32 bits. I spent some time working with the author of Math::Random::MT trying to find a portable way of declaring a uint32_t that you can then be assured is exactly 32 bits - no more, no less. We were unable to find a really portable solution. <stdint.h> is not even included with VCC on MSWin32 - but it's only been part of the standard since C99 ;-)
It struck me that I might be missing something obvious. The common thread across perls on many systems is of course perl. If for some reason there was a uint32 typedef lodged in the guts of perl I could just use that. The U32 type seems to fit the bill but will this be portable to 64 bit perls? From my reading the only guarantee is that it will be at least 32 bits wide. I really want exactly 32 bits.
Does anyone have a good solution to this problem?
If there is not a solid 32 bit type then I was contemplating a macro that is a noop on 32 bit systems, or does & 0xffffffff on 64 bit systems. What is the best way of detecting a 64 bit perl, or more particularly an accidentally 64 bit wide uint32, in the context of a header #if/#else?
Re: Perl XS portable uint32_t
by brian_d_foy (Abbot) on Jun 06, 2008 at 15:02 UTC
Most of the pain of taking over the maintenance of Crypt::Rijndael was solving this same problem. In that distro, see the rijndael.h file to see what I've done. Basically, I test for each operating system, architecture, or compiler and do the special thing for it to define UINT8 and UINT32. A lot of people have helped by sending in the magical #defines and right header files for their system. Also google my Crypt::Rijndael posts on use.perl for threads asking the same question.
I now have several virtual machines set up so I can test on the problem systems, mostly Solaris and various Windows environments. That helped a lot since I didn't have to guess if something might work and send it off. I could play with the system. I also used HP's Test Drive stuff to check VMS. Too bad Sourceforge turned off their compile farm, which is the only reason I was still using them. ;(
Good Luck :)
Re: Perl XS portable uint32_t
by salva (Canon) on Jun 06, 2008 at 14:13 UTC
#define INTSIZE 4 /**/
#define LONGSIZE 4 /**/
#define SHORTSIZE 2 /**/
BTW, the same information is available on the Perl side using the Config module.
You could also explicitly discard the upper bits from a possible 64 bit word using...
v &= 0xffffffff;
and hope that the optimizer removes the superfluous instruction from the final code on 32 bit architectures.
Re: Perl XS portable uint32_t
by almut (Canon) on Jun 06, 2008 at 19:45 UTC
If there is not a solid 32 bit type then I was contemplating a macro
that is a noop on 32 bit systems, or does & 0xffffffff on 64 bit systems.
I think simply doing & 0xffffffff on 64-bit systems (whenever
a value larger than 0xffffffff could be the result of an individual
operation) is not such a bad idea... in case unsigned long happens to
be wider than 32-bit (which can be determined easily, either at build- or
at runtime). At least I would guess it's less of a headache getting that to work portably
than trying to make provisions to always use the appropriate int type,
with the compiler making sure that exactly 32 bits are being used.
The additional and-operations (53 of which are needed here)
are typically very fast, so the associated performance penalty on
64-bit platforms is likely going to be acceptable.
In fact, I just benchmarked it. With the test routines computing the
jhash for 100 random strings of length 100000, I get on average:
Rate u32 u64
u32 54.9/s -- -13%
u64 63.3/s 15% --
where 'u32' is the version with the additional & 0xffffffff instructions,
producing correct results, while 'u64' is the original version producing incorrect results with 64-bit ints.
(Perl v5.8.8 built for x86_64-linux-thread-multi; gcc 4.1.0)
With a string length of only 100 chars, the difference reduces to
about 4%, because the function calling overhead becomes larger, relatively...
The ~14% is close to the 'true' slowdown attributable to the additional and-operations —
in other words, it's roughly the asymptotic limit, no longer increasing
significantly with greater string lengths.
DEFINE => ($Config{use64bitint} ? '-DUSING_64_BIT_INT' : ''),
Then in the C you can use this to provide two versions of the MIX macro. I may well be wrong (can't test) but it seems to me that none of the & MASK32 operations are required in the main jhash code, provided you use a macro setup as shown below. The a, b, c ints will probably overflow into the low bits of the upper 32 bit half, but because all the operations there are addition, the low order 32 bits will still be the valid representation in 32 bit space. Could you give this a try:
/* Need to constrain U32 to only 32 bits on 64 bit systems
* For efficiency we only use the & 0xffffffff if required
*/
#define USING_64_BIT_INT /* save messing with Makefile.PL define */
#if defined(USING_64_BIT_INT)
#define MIX(a,b,c) \
{ \
a &= 0xffffffff; b &= 0xffffffff; c &= 0xffffffff; \
a -= b; a -= c; a ^= (c>>13); a &= 0xffffffff; \
b -= c; b -= a; b ^= (a<<8); b &= 0xffffffff; \
c -= a; c -= b; c ^= (b>>13); c &= 0xffffffff; \
a -= b; a -= c; a ^= (c>>12); a &= 0xffffffff; \
b -= c; b -= a; b ^= (a<<16); b &= 0xffffffff; \
c -= a; c -= b; c ^= (b>>5); c &= 0xffffffff; \
a -= b; a -= c; a ^= (c>>3); a &= 0xffffffff; \
b -= c; b -= a; b ^= (a<<10); b &= 0xffffffff; \
c -= a; c -= b; c ^= (b>>15); c &= 0xffffffff; \
}
#else
#define MIX(a,b,c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
#endif
/* rest of code unchanged with no masking */
I would rather not just compile in the & MASK32 if it is not needed.
That's not what I was trying to say :) Rather, I meant to
imply two things: (a) it might be easier to just do the 32-bit masking
yourself, in case you can't easily figure out what the respective magic
non-standard __I32_ulong__t incantation is to get the proper 32-bit
type on that yet unknown weird 64-bit platform/compiler combo, (b) the performance
penalty of doing so is not as big as one might expect.
But you're right, the number of masking statements actually
required can be reduced (I must admit, I hadn't put too much thought
into the details here). In particular, masking a, b, c centrally
at the beginning of the macro saves you from having to do so at the
end of the various other code paths you come along...
Interestingly though, there's not much gain in performance from the reduced
masking — I now get on average:
Rate u32 u64
u32 56.2/s -- -12%
u64 63.7/s 13% --
(under otherwise identical conditions)
Anyhow, I tested your simplified code suggestion, and it's working fine (at
least on x86_64-linux).
Thanks for adding another useful module to CPAN!
It is easy enough to set a define in the Makefile.PL if use64bitint is set
Just be a little cautious with $Config{use64bitint}. It doesn't always tell you what you want/need to know. On 32-bit systems where perl is built with -Duse64bitint, the 'long' and 'int' sizes can (and generally do, I believe) remain at 4 bytes.
I think I've also seen perls built with -Dusemorebits (the equivalent of building with -Duse64bitint && -Duselongdouble) that have neither use64bitint nor uselongdouble defined.
And finally, it would be possible to have 64-bit longs and ints in play without having built with use64bitint support (ie when 64 bits is the size of the long/int on the particular compiler being used).
There are probably other aspects to consider as well. (See the INSTALL file that ships with the perl source for a more authoritative account.)
Cheers, Rob
Re: Perl XS portable uint32_t
by syphilis (Archbishop) on Jun 06, 2008 at 13:10 UTC
As regards Digest::JHash, isn't it just a matter of determining the size of unsigned long and unsigned int (preferably during pre-processing), and then proceeding accordingly ?
That is, I'm suggesting that the questions you asked (which I don't feel competent to answer, btw) are irrelevant to Digest::JHash. All it really needs to know are the sizes of unsigned longs and unsigned ints - and sizeof() can provide that answer.
Cheers, Rob

Update: Aaah ... but sizeof() can't provide that info during pre-processing - which therefore adds a layer of complexity.
1111 << 1 = 1110 (4 bit)
1111 << 1 = 00011110 (8 bit)
Now consider what happens if we then go on to perform a rightshift
1110 >> 2 = 0011 (4 bit)
00011110 >> 2 = 00000111 (8 bit)
^
Oops
So after two identical operations the results on the 4 vs 8 bit architecture now differ. Essentially, by having the spare high order bits we do not lose those bits to the big bit bucket in the sky, so when we right shift they reappear. As a result any algorithm that uses much bitshifting will not work as desired if the int being used is not exactly the design width.
Unfortunately you can't use sizeof in a preprocessor directive to do the setup one way on a 32 bit machine and another way on a 64 bit one.
Cheers
tachyon
The example you provided doesn't match what I'm finding with my 64-bit Microsoft Platform SDK for Windows Server 2003 R2 compiler on Vista 64. This compiler has 32-bit longs and ints - yet those high order bits are, I think, being lost to the "big bit bucket in the sky":
use warnings;
use Inline C => Config =>
BUILD_NOISY => 1;
use Inline C => <<'EOC';
void foo() {
    unsigned long x = 0xffffffff;
    printf("long: %d\nint: %d\n", (int)sizeof(long), (int)sizeof(int));
    printf("%lx\n", x);
    x <<= 2;
    printf("%lx\n", x);
    x >>= 2;
    printf("%lx\n", x);
}
EOC
foo();
__END__
Outputs:
long: 4
int: 4
ffffffff
fffffffc
3fffffff
Maybe this behaviour is not reliable across the full range of compilers/systems/architectures. (I honestly wouldn't know.)
Unfortunately you can't use sizeof in a preprocessor directive to do the setup one way on a 32 bit machine and another way on a 64 bit one
Yes - I was finding that out for myself (probably as you were writing your reply :-)
However, you can have the Makefile.PL query $Config{intsize} and $Config{longsize}. And the Makefile.PL can then define symbols (based on those config values) that the pre-processor can make use of.
Cheers, Rob
Re: Perl XS portable uint32_t
by chrstphrchvz (Scribe) on Jun 24, 2018 at 09:17 UTC
|
This is extremely old, but I don't believe anyone set the record straight regarding the false premise that uint32_t does not guarantee exactly 32 bits.
A uint32_t is guaranteed to be exactly 32 bits wide, with no padding bits. If an architecture has no way of meeting that requirement, then a compiler must not make uint32_t available; it would be in violation of C99 if it did.
Just thought this needed to be said. Carry on…