tachyon-II has asked for the wisdom of the Perl Monks concerning the following question:
Last time I struggled with 64 bit perls it was in the context of getting uint32 behaviour from pure perl in order to get Math::Random::MT::Perl working. With a little help from a generous monk this was duly solved with an & 0xffffffff to constrain the 64 bits to 32.
In the course of updating Digest::JHash (a fast 32 bit hashing algorithm) I have now struck the same problem, but this time from the C/XS side. The issue is that once you start hashing using left bitshifts on unexpectedly 64 bit wide integers you blow out from 32 bits into the upper 32 bits, which results in the behaviour commonly called "does not work (TM)". The problem per se is that I need a *reliable* *portable* uint32_t. Now <stdint.h>, <inttypes.h> and <sys/types.h> all define your basic uint32_t BUT using inconsistent names and with no guarantee of only 32 bits. I spent some time working with the author of Math::Random::MT trying to find a portable way of declaring a uint32_t that you can then be assured is exactly 32 bits - no more, no less. We were unable to find a really portable solution. <stdint.h> is not even included with VCC on MSWin32 - but it's only been part of the standard since C99 ;-)
It struck me that I might be missing something obvious. The common thread across perls on many systems is of course perl. If for some reason there was a uint32 typedef lodged in the guts of perl I could just use that. The U32 type seems to fit the bill but will this be portable to 64 bit perls? From my reading the only guarantee is that it will be at least 32 bits wide. I really want exactly 32 bits.
Does anyone have a good solution to this problem?
If there is not a solid 32 bit type then I was contemplating a macro that is a noop on 32 bit systems, or does & 0xffffffff on 64 bit systems. What is the best way of detecting a 64 bit perl, or more particularly an accidentally 64 bit wide uint32, in the context of a header #if/#else?
Re: Perl XS portable uint32_t
by brian_d_foy (Abbot) on Jun 06, 2008 at 15:02 UTC
Most of the pain of taking over the maintenance of Crypt::Rijndael was solving this same problem. In that distro, see the rijndael.h file to see what I've done. Basically, I test for each operating system, architecture, or compiler and do the special thing for it to define UINT8 and UINT32. A lot of people have helped by sending in the magical #defines and right header files for their system. Also google my Crypt::Rijndael posts on use.perl for threads asking the same question.
I now have several virtual machines set up so I can test on the problem systems, mostly Solaris and various Windows environments. That helped a lot since I didn't have to guess if something might work and send it off. I could play with the system. I also used HP's Test Drive stuff to check VMS. Too bad Sourceforge turned off their compile farm, which is the only reason I was still using them. ;(
Good Luck :)
Re: Perl XS portable uint32_t
by salva (Canon) on Jun 06, 2008 at 14:13 UTC
#define INTSIZE 4 /**/
#define LONGSIZE 4 /**/
#define SHORTSIZE 2 /**/
BTW, the same information is available on the Perl side using the Config module.
You could also explicitly discard the upper bits from a possible 64 bit word using...
v &= 0xffffffff;
and hope that the optimizer removes the superfluous instruction from the final code on 32 bit architectures.
Re: Perl XS portable uint32_t
by almut (Canon) on Jun 06, 2008 at 19:45 UTC
If there is not a solid 32 bit type then I was contemplating a macro
that is a noop on 32 bit systems, or does & 0xffffffff on 64 bit systems.
I think simply doing & 0xffffffff on 64-bit systems (whenever
a value larger than 0xffffffff could be the result of an individual
operation) is not such a bad idea... in case unsigned long happens to
be wider than 32-bit (which can be determined easily, either at build- or
at runtime). At least I would guess it's less of a headache getting that to work portably
than trying to make provisions to always use the appropriate int type,
with the compiler making sure that exactly 32 bits are being used.
The additional and-operations (53 of which are needed here)
are typically very fast, so the associated performance penalty on
64-bit platforms is likely going to be acceptable.
In fact, I just benchmarked it. With the test routines computing the
jhash for 100 random strings of length 100000, I get on average:
Rate u32 u64
u32 54.9/s -- -13%
u64 63.3/s 15% --
where 'u32' is the version with the additional & 0xffffffff instructions,
producing correct results, while 'u64' is the original version producing incorrect results with 64-bit ints.
(Perl v5.8.8 built for x86_64-linux-thread-multi; gcc 4.1.0)
With a string length of only 100 chars, the difference reduces to
about 4%, because the function calling overhead becomes larger, relatively...
The ~14% is close to the 'true' slowdown attributable to the additional and-operations —
in other words, it's roughly the asymptotic limit, no longer increasing
significantly with greater string lengths.
DEFINE => ($Config{use64bitint} ? '-DUSING_64_BIT_INT' : ''),
Then in the C you can use this to provide two versions of the MIX macro. I may well be wrong (can't test) but it seems to me that none of the & MASK32 operations are required in the main jhash code, provided you use a macro setup as shown below. The a, b, c ints will probably overflow into the low bits of the upper 32 bit half, but because all the operations there are addition, the low order 32 bits will still be the valid representation in 32 bit space. Could you give this a try:
/* Need to constrain U32 to only 32 bits on 64 bit systems
* For efficiency we only use the & 0xffffffff if required
*/
#define USING_64_BIT_INT /* save messing with Makefile.PL define */
#if defined(USING_64_BIT_INT)
#define MIX(a,b,c) \
{ \
a &= 0xffffffff; b &= 0xffffffff; c &= 0xffffffff; \
a -= b; a -= c; a ^= (c>>13); a &= 0xffffffff; \
b -= c; b -= a; b ^= (a<<8); b &= 0xffffffff; \
c -= a; c -= b; c ^= (b>>13); c &= 0xffffffff; \
a -= b; a -= c; a ^= (c>>12); a &= 0xffffffff; \
b -= c; b -= a; b ^= (a<<16); b &= 0xffffffff; \
c -= a; c -= b; c ^= (b>>5); c &= 0xffffffff; \
a -= b; a -= c; a ^= (c>>3); a &= 0xffffffff; \
b -= c; b -= a; b ^= (a<<10); b &= 0xffffffff; \
c -= a; c -= b; c ^= (b>>15); c &= 0xffffffff; \
}
#else
#define MIX(a,b,c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
#endif
/* rest of code unchanged with no masking */
I would rather not just compile in the & MASK32 if it is not needed.
That's not what I was trying to say :) Rather, I meant to
imply two things: (a) it might be easier to just do the 32-bit masking
yourself, in case you can't easily figure out what the respective magic
non-standard __I32_ulong__t incantation is to get the proper 32-bit
type on that yet unknown weird 64-bit platform/compiler combo, (b) the performance
penalty of doing so is not as big as one might expect.
But you're right, the number of masking statements actually
required can be reduced (I must admit, I hadn't put too much thought
into the details here). In particular, masking a, b, c centrally
at the beginning of the macro saves you from having to do so at the
end of the various other code paths you come along...
Interestingly though, there's not much gain in performance from the reduced
masking — I now get on average:
Rate u32 u64
u32 56.2/s -- -12%
u64 63.7/s 13% --
(under otherwise identical conditions)
Anyhow, I tested your simplified code suggestion, and it's working fine (at
least on x86_64-linux).
Thanks for adding another useful module to CPAN!
It is easy enough to set a define in the Makefile.PL if use64bitint is set
Just be a little cautious with $Config{use64bitint}. It doesn't always tell you what you want/need to know. On 32-bit systems where perl is built with -Duse64bitint, the 'long' and 'int' sizes can (and generally do, I believe) remain at 4 bytes.
I think I've also seen perls built with -Dusemorebits (the equivalent of building with -Duse64bitint && -Duselongdouble) that have neither use64bitint nor uselongdouble defined.
And finally, it would be possible to have 64-bit longs and ints in play without having built with use64bitint support (ie when 64 bits is the size of the long/int on the particular compiler being used).
There are probably other aspects to consider as well. (See the INSTALL file that ships with the perl source for a more authoritative account.)
Cheers, Rob
Re: Perl XS portable uint32_t
by syphilis (Archbishop) on Jun 06, 2008 at 13:10 UTC
As regards Digest::JHash, isn't it just a matter of determining the size of unsigned long and unsigned int (preferably during pre-processing), and then proceeding accordingly ?
That is, I'm suggesting that the questions you asked (which I don't feel competent to answer, btw) are irrelevant to Digest::JHash. All it really needs to know are the sizes of unsigned longs and unsigned ints - and sizeof() can provide that answer.
Cheers, Rob

Update: Aaah ... but sizeof() can't provide that info during pre-processing - which therefore adds a layer of complexity.
1111 << 1 = 1110 (4 bit)
1111 << 1 = 00011110 (8 bit)
Now consider what happens if we then go on to perform a rightshift
1110 >> 2 = 0011 (4 bit)
00011110 >> 2 = 00000111 (8 bit)
^
Oops
So after two identical operations the results on the 4 vs 8 bit architecture now differ. Essentially, by having the spare high order bits we do not lose those bits to the big bit bucket in the sky, so when we right shift they reappear. As a result any algorithm that uses much bitshifting will not work as desired if the int being used is not exactly the design width.
Unfortunately you can't use sizeof in a preprocessor directive to do the setup one way on a 32 bit machine and another way on a 64 bit one.
Cheers
tachyon
The example you provided doesn't match what I'm finding with my 64-bit Microsoft Platform SDK for Windows Server 2003 R2 compiler on Vista 64. This compiler has 32-bit longs and ints - yet those high order bits are, I think, being lost to the "big bit bucket in the sky":
use warnings;
use Inline C => Config =>
BUILD_NOISY => 1;
use Inline C => <<'EOC';
void foo() {
    unsigned long x = 0xffffffff;
    printf("long: %d\nint: %d\n", (int)sizeof(long), (int)sizeof(int));
    printf("%lx\n", x);
    x <<= 2;
    printf("%lx\n", x);
    x >>= 2;
    printf("%lx\n", x);
}
EOC
foo();
__END__
Outputs:
long: 4
int: 4
ffffffff
fffffffc
3fffffff
Maybe this behaviour is not reliable across the full range of compilers/systems/architectures. (I honestly wouldn't know.)
Unfortunately you can't use sizeof in a preprocessor directive to do the setup one way on a 32 bit machine and another way on a 64 bit one
Yes - I was finding that out for myself (probably as you were writing your reply :-)
However, you can have the Makefile.PL query $Config{intsize} and $Config{longsize}. And the Makefile.PL can then define symbols (based on those config values) that the pre-processor can make use of.
Cheers, Rob
Re: Perl XS portable uint32_t
by chrstphrchvz (Scribe) on Jun 24, 2018 at 09:17 UTC
|
This is extremely old, but I don't believe anyone set the record straight regarding the false premise that uint32_t does not guarantee exactly 32 bits.
A uint32_t is guaranteed to be exactly 32 bits wide, with no padding bits. If an architecture has no way of meeting that requirement, then a compiler must not make uint32_t available; it would be in violation of C99 if it did.
Just thought this needed to be said. Carry on…