http://qs321.pair.com?node_id=959598

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Questions arising from the console log below:

  1. Why does reading bits from already allocated memory cause a memory allocation?

    Yes, I am indirecting through a substr ref, but I'm only reading.

  2. Why does unpack '%32b*', $v; fail?

    It works just fine here:

    [0] Perl> print unpack '%32b*', chr( 0b10101010 );; 4

Any relevant info gratefully received. Other workarounds for the vec > 2**31 offsets bug?

The following is an annotated console log from my perl REPL.

    C:\test>p1
    ## Start new session 2.5 MB allocated to process

    ## Create a 2**32 bit bitvector
    [0] Perl> $v = chr(0); $v x= 512*1024*1024;;
    ## Just over 512 MB allocated to process

    ## As vec cannot address offsets > 2**31,
    ## create a reference to the second half of the bitvector
    [0] Perl> $r = \substr $v, 256*1024*1024;;
    ## No further memory allocated; still just over 512 MB allocated to process

    ## Set some bits in the first byte of the second half of the vector
    ## As we can't use offsets that size, address the bits as a byte (offset/8)
    [0] Perl> vec( $v, 256*1024*1024, 8 ) = 0b10101010;;
    ## No further memory allocated; still just over 512 MB allocated to process

    ## Now check that we can see the bits we just set
    ## via the reference we created (offsets == actual offset - 2**31 avoiding the bug)
    [0] Perl> print vec( $$r, $_, 1 ) for 0..7;;
    0 1 0 1 0 1 0 1
    ## Yes! we can see the bits. (They are in the reverse order to intuition, but that's okay!)

    ######## BUT ... the memory usage jumped from 512MB to somewhat over 768 MB! ########
    ## Q1 ## Why? Why does reading bits from already allocated memory cause a memory allocation? ## Q1 ##

    ## Check the length of the vector
    [0] Perl> print length $v;;
    536870912
    ## Still 512 MB

    ## Try counting the bits
    [0] Perl> print unpack "%32b*", $v;;
    0
    ## Q2a ## ZERO! WTF? ## Q2a ##

    ## Try counting them via the reference
    [0] Perl> print unpack "%32b*", $$r;;
    0
    ## Q2b ## ZERO! WTF? ## Q2b ##

    ## Check the byte we set is non-zero
    [0] Perl> print vec( $v, 256*1024*1024, 8 );;
    170
    ## Still set. 170 == 0b10101010

    ## Set a bit in the bottom half of the vector without reference trickery
    [0] Perl> vec( $v, 7, 1 ) = 1;;

    ## Now count the bits again
    [0] Perl> print unpack "%32b*", $v;;
    0
    ## Q2c ## WTF?

    ## Check the bit was actually set
    [0] Perl> print vec( $v, 7, 1 );;
    1
    ## Yup!

    ## Check the bits either side are unset
    [0] Perl> print vec( $v, 6, 1 );;
    0
    [0] Perl> print vec( $v, 8, 1 );;
    0
    ## Yup!

    ## Try doing the same thing again to see if anything changed.
    ## (Maybe Perl was busy taking a telemarketer's phone call)
    [0] Perl> print unpack "%32b*", $v;;
    0
    ## Q2d ## NOPE! Just doesn't see the bits? ## Q2d ##

    ## Check via the reference (again).
    [0] Perl> print unpack "%32b*", $$r;;
    0
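
One possible way to sidestep the zero count, sketched here rather than verified against a 512 MB vector: checksum the string in slices and add up the partial counts. This assumes the failure is tied to the size of the string handed to unpack, which the one-byte test above suggests but does not prove. The name count_bits_chunked and the $chunk_bytes parameter are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Count set bits by checksumming fixed-size slices of the vector.
# ASSUMPTION: unpack '%32b*' behaves correctly on small strings, as the
# chr( 0b10101010 ) example indicates; whether this avoids the problem
# seen on the full-size vector is untested here.
sub count_bits_chunked {
    my ( $vec_ref, $chunk_bytes ) = @_;
    $chunk_bytes ||= 1024 * 1024;    # hypothetical default slice size

    my $total = 0;
    my $len   = length $$vec_ref;
    for ( my $pos = 0; $pos < $len; $pos += $chunk_bytes ) {
        # rvalue substr copies only one slice at a time
        $total += unpack '%32b*', substr( $$vec_ref, $pos, $chunk_bytes );
    }
    return $total;
}

# Small-scale check: 16 zero bytes, then set five bits.
my $v = chr(0) x 16;
vec( $v, 8, 8 ) = 0b10101010;    # four bits in byte 8
vec( $v, 7, 1 ) = 1;             # one bit in byte 0
print count_bits_chunked( \$v, 3 ), "\n";
```

Passing the vector by reference avoids copying it into the sub; the only copies made are the individual slices.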

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re: More 64-bit Perl bugs?
by Eliya (Vicar) on Mar 14, 2012 at 15:53 UTC

    As for the additional memory, when you Devel::Peek::Dump() the reference (with a shorter string), you can see that a PV has been created after having accessed the string (via "magic").  My first guess would be this is some caching behavior to increase performance with repeated accesses.

    use Devel::Peek;

    $v = chr(0);
    $v x= 50;
    $r = \substr $v, 25;
    Dump $r;

    vec( $v, 25, 8 ) = 0b10101010;
    print vec( $$r, $_, 1 ) for 0..7;
    Dump $r;

    __END__
    SV = IV(0x7991e0) at 0x7991f0
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x771998
      SV = PVLV(0x7a1ce0) at 0x771998
        REFCNT = 1
        FLAGS = (GMG,SMG)
        IV = 0
        NV = 0
        PV = 0
        MAGIC = 0x7914f0
          MG_VIRTUAL = &PL_vtbl_substr
          MG_TYPE = PERL_MAGIC_substr(x)
        TYPE = x
        TARGOFF = 25
        TARGLEN = 25
        TARG = 0x7991c0
        SV = PV(0x76fc20) at 0x7991c0
          REFCNT = 2
          FLAGS = (POK,pPOK)
          PV = 0x7a4f50 "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"\0
          CUR = 50
          LEN = 56
    0 1 0 1 0 1 0 1
    SV = IV(0x7991e0) at 0x7991f0
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x771998
      SV = PVLV(0x7a1ce0) at 0x771998
        REFCNT = 1
        FLAGS = (GMG,SMG,pPOK)
        IV = 0
        NV = 0
        PV = 0x7917e0 "\252\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"\0   # <---
        CUR = 25
        LEN = 32
        MAGIC = 0x7914f0
          MG_VIRTUAL = &PL_vtbl_substr
          MG_TYPE = PERL_MAGIC_substr(x)
        TYPE = x
        TARGOFF = 25
        TARGLEN = 25
        TARG = 0x7991c0
        SV = PV(0x76fc20) at 0x7991c0
          REFCNT = 2
          FLAGS = (POK,pPOK)
          PV = 0x7a4f50 "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\252\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"\0
          CUR = 50
          LEN = 56

      You're right. And it seems to have been that way since (at least) 5.8.9.

      Damn, that's dumb :(



Re: More 64-bit Perl bugs?
by Eliya (Vicar) on Mar 14, 2012 at 17:52 UTC
    Other workarounds for the vec > 2**31 offsets bug?

    My crude approach to extending the addressable bit range to 2**32-1 would be to divide the offset by 2 and operate in units of 2 bits. Something like this:

    sub setbit {   # args: var, offset, value (0/1)
        if ($_[1] % 2) {   # odd offset
            if ($_[2]) {
                vec($_[0], $_[1]/2, 2) |= 0b10;   # set
            } else {
                vec($_[0], $_[1]/2, 2) &= 1;      # clear
            }
        } else {           # even offset
            if ($_[2]) {
                vec($_[0], $_[1]/2, 2) |= 1;      # set
            } else {
                vec($_[0], $_[1]/2, 2) &= 0b10;   # clear
            }
        }
    }

    my $v = "";
    setbit($v, 2**32-1, 1);   # set bit
    setbit($v, 2**32-1, 0);   # clear bit

    (and likewise for reading a bit)
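
    A reader along the same lines might look like the sketch below. The name getbit is my own, not something from the post; it assumes the same layout as setbit (odd offsets live in the high bit of each 2-bit unit, even offsets in the low bit), and the demo uses small offsets so that it does not need half a gigabyte to run.

```perl
use strict;
use warnings;

# Hypothetical counterpart to setbit: read one bit via 2-bit units.
sub getbit {    # args: var, offset; returns 0 or 1
    my $pair = vec( $_[0], int( $_[1] / 2 ), 2 );    # the 2-bit unit holding the bit
    return ( $_[1] % 2 )
        ? ( $pair >> 1 ) & 1    # odd offset: high bit of the pair
        : $pair & 1;            # even offset: low bit of the pair
}

# Cross-check against plain 1-bit vec at small offsets.
my $v = "";
vec( $v, 5, 1 ) = 1;
vec( $v, 8, 1 ) = 1;
print join( " ", map { getbit( $v, $_ ) } 0 .. 9 ), "\n";
```

    Because vec stores sub-byte units low-order first, the 2-bit view and the 1-bit view agree on which physical bit an offset refers to.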

    The idea could in principle be extended to other unit sizes (4, 8, ...), though the bit fiddling would then be somewhat more involved.
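
    As a rough illustration of the 8-bit case, here is a sketch that splits the bit offset into a byte index and a mask. The names setbit8 and getbit8 are made up for this example, and it has only been reasoned through at small offsets, not exercised on a multi-gigabyte vector.

```perl
use strict;
use warnings;

# Hypothetical 8-bit-unit variant: address bytes with vec(..., 8)
# and pick the bit inside the byte with a mask.
sub setbit8 {    # args: var, bit offset, value (0/1)
    my $byte = int( $_[1] / 8 );
    my $mask = 1 << ( $_[1] % 8 );
    if ( $_[2] ) {
        vec( $_[0], $byte, 8 ) |= $mask;            # set
    }
    else {
        vec( $_[0], $byte, 8 ) &= ~$mask & 0xFF;    # clear, keeping the value in 8 bits
    }
}

sub getbit8 {    # args: var, bit offset; returns 0 or 1
    return ( vec( $_[0], int( $_[1] / 8 ), 8 ) >> ( $_[1] % 8 ) ) & 1;
}

my $v = "";
setbit8( $v, $_, 1 ) for 1, 3, 5, 7;
print vec( $v, 0, 8 ), "\n";    # the byte built from bits 1, 3, 5 and 7
setbit8( $v, 3, 0 );
print join( " ", map { getbit8( $v, $_ ) } 0 .. 7 ), "\n";
```

    The mask uses the same low-order-first numbering as vec with a width of 1, so bits written this way can still be read with the built-in wherever the offset is small enough.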

      That's a neat, 'sideways-look' solution to the problem.

      Except that I need to allocate > 4GB which means it'd just move the goalposts.

      I agree that moving to (say) an 8-bit unit size would get me to 16 GB, but my target is 64 GB, which would require 10 bits, which means moving to a 16-bit unit size. At that point, I'd probably be better off treating the vector as an array of 64-bit units.

      The big loss is the convenience of the lvalue sub.

