PerlMonks  

Re: Perl Hashes in C?

by flexvault (Monsignor)
on Aug 13, 2015 at 11:33 UTC (id://1138415)


in reply to Perl Hashes in C?

BrianP

I downloaded the file, but it was in '.jpg' format and not a raw file. I used it in the following script, and then concatenated copies of it until the file size was larger than 217MB. The results of the 2 runs are below. I may be wrong, but I don't think you need to convert to an integer and then use pack. The result will be the same as using the raw 6 bytes (48 bits) of each pixel directly as the hash key.

use strict;
use warnings;

my %Image = ();
keys %Image = 4096 * 128;     # pre-size the hash buckets
my $rdamt = 4096 * 6;         # read size: a multiple of 6, so no pixel is split
my $buffer;

# my $file = '249465.pprgb-srgb.absv.4000.cs.jpg';   # Original file downloaded
my $file = '249465.pprgb-srgb.absv.4000.cs.jpg2';    # New file (multiple copies)
my $compraw = -s $file;
my $pixels  = int( $compraw / 6 );

open( my $in, "<:raw", "./$file" ) or die "$!\n";
while (1) {
    my $size = sysread( $in, $buffer, $rdamt );    # No buffering
    die "sysread: $!\n" unless defined $size;
    last if $size == 0;
    while ( length($buffer) >= 6 ) {               # Throw away a trailing partial pixel
        my $key = substr( $buffer, 0, 6, '' );     # 6 raw bytes = one 48-bit pixel
        $Image{$key}++;
    }
}
close $in;

my $uniq   = keys %Image;
my $factor = sprintf( "%.2f", $uniq / $pixels );   # ratio unique/total, not a percentage
print "Found $uniq colors in $pixels pixels $factor\%\n";

__END__

These are the results of run 1: almost all pixels are unique. About 0.1 second.

> time pyrperl uniqcolors.plx
Found 153601 colors in 153606 pixels 1.00%    ( File size: 921,638 )

real    0m0.098s
user    0m0.092s
sys     0m0.008s

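This is using the larger file, about 15 seconds. We got more unique keys because the '.jpg' file (921,638 bytes) is not a multiple of 6 bytes: each appended copy starts at a different offset within a 6-byte group, so the same data gets sliced into different "pixels".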

> time pyrperl uniqcolors.plx
Found 460775 colors in 41473710 pixels 0.01%    ( File size: 248,842,260 )

real    0m15.015s
user    0m14.937s
sys     0m0.080s
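The jump from 153,601 to 460,775 unique keys can be checked with a little arithmetic. A minimal sketch using only the two file sizes reported above: 921,638 leaves a remainder of 2 when divided by 6, so each appended copy starts 2 bytes later within a 6-byte group, and the 270 copies cycle through three alignments. Three alignments times roughly 153,601 keys is close to the 460,775 observed. The sub name `copy_alignments` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for a file made of $copies back-to-back copies of a
# $size-byte file, return the distinct offsets (mod 6) at which copies start.
sub copy_alignments {
    my ( $size, $copies ) = @_;
    my %seen;
    $seen{ ( $_ * $size ) % 6 } = 1 for 0 .. $copies - 1;
    return sort { $a <=> $b } keys %seen;
}

my $one_copy = 921_638;        # size of the downloaded .jpg (run 1)
my $total    = 248_842_260;    # size of the concatenated file (run 2)
my $copies   = $total / $one_copy;

printf "copies: %d, remainder per copy: %d\n", $copies, $one_copy % 6;
print "alignments: @{[ copy_alignments( $one_copy, $copies ) ]}\n";
# prints:
# copies: 270, remainder per copy: 2
# alignments: 0 2 4
```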

Try it on your real data and let us know if it helps.

Regards...Ed

"Well done is better than well said." - Benjamin Franklin

Re^2: Perl Hashes in C?
by Anonymous Monk on Aug 13, 2015 at 17:32 UTC

    Ed,

    Your hard-core, direct, laser-focused code got the answer spot on the first go in the delightfully brief time of < 37 seconds:

    Running C:\bin\bb.pl Thu Aug 13 11:06:26 2015
    Found 27645898 colors in 36152320 pixels 0.76%
    Elapsed time = 36.82 sec

    Franklin would have marveled at the design efficiency.

    I have been attempting to arrive at the same 6 byte Quanta size for hours using (*!_))#% pack/unpack. UINT16s work perfectly as do UINT64s. How is it possible that nobody has ever thought of a UINT48?

    32 bits is too wimpy; 4.3GB is not enough. But 4.3G*4.3G BYTES is clearly OverTheTop! 18446744073709551616 ????

    Wouldn't 65536 * 4294967296 Bytes be just about right? Surely 281474976710656 B "is more memory than anybody will ever need"?? 281 TER! Has a ring to it.

    A 24 bit processor would be ideally suited for GRAPHICS PROCESSING.

    I hate to be a pest (but it does not usually stop me). While I still have a residual amount of hair left, might I ask you if you could point out my obvious comprehensional deficit with UNPACK?

    I have all 217MB in memory. 8-byte quanta are too large and 4-byte ones are too small, so I am stuck with the 2-byte type "S", uint16_t. The inane method is all I can get to work, BIT shifting and ORing:

    @ushort = unpack("S*", $buf);                 # Ushort[108456960] == RIGHT Number!
    for ($ii = 0; $ii < scalar @ushort; $ii += 3) {
        ($rr, $gg, $bb) = @ushort[$ii .. $ii+2];  # Array slice  MORONIC>
        $bs = $rr | ($gg << 16) | ($bb << 32);    # << WORKS ;(
        $rgb2c{$bs}++;                            # Increment count for this color
    }

    This works, but as another Monk pointed out, finely slicing and dicing then bit shifting the diminutive chunks and ORing them back together is hardly as satisfying as using 6 byte, native Quanta.

    I usually need the individual colors so I need this type of code, just not here. This is a case of wanting to find my error rather than fixing a blocking bug.

    How hard can it be to unpack 3 of the units from my array and smash them into the $RGB key I need with NO MONKEY BUSINESS? I tried every permutation of type S I could think of. Type Q worked fine except that it gave 1 1/3 pixel at a time. Is there a way to Unpack 3 UINT16s at a time with UNPACK()??

    WORKS!> @q = unpack("Q*", $buf);          $sq  = scalar(@q)      || -1;
    FAIL!   @uint48 = unpack("S3", $buf);     $s48 = scalar(@uint48) || -1;
    FAIL!   @uint48 = unpack("S@3", $buf);    $s48 = scalar(@uint48) || -1;
    FAIL!   @uint48 = unpack("S[3]", $buf);   $s48 = scalar(@uint48) || -1;
    FAIL!   @uint48 = unpack("(SSS)*", $buf); $s48 = scalar(@uint48) || -1;
    And other quixotic attempts at 48-bit-ness!
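On the `unpack` attempts above: `S3` and `S[3]` both mean "three shorts, once", and a repeated group like `(SSS)*` does repeat but Perl flattens the results, so it returns the same list as `S*`. To get one item per pixel, `unpack` can instead return raw 6-byte strings with the `a6` template inside a repeated group. A minimal sketch with a made-up 12-byte buffer holding two pixels; the values are illustrative only.

```perl
use strict;
use warnings;

# Two made-up 48-bit pixels, each three native-order uint16 values.
my $buf = pack( "S*", 1, 2, 3, 4, 5, 6 );

my @shorts = unpack( "(SSS)*", $buf );   # groups are flattened: 6 values
my @pixels = unpack( "(a6)*",  $buf );   # one 6-byte string per pixel

printf "%d shorts, %d pixels\n", scalar @shorts, scalar @pixels;

# Each 6-byte string can be used directly as a hash key ...
my %count;
$count{$_}++ for @pixels;

# ... and split back into R, G, B when the channels are needed.
my ( $red, $green, $blue ) = unpack( "S3", $pixels[1] );
print "second pixel: $red $green $blue\n";
# prints:
# 6 shorts, 2 pixels
# second pixel: 4 5 6
```

If the buffer length is not a multiple of 6, the last `a6` item is simply shorter than 6 bytes, so it may be worth checking `length` before counting it.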

    If you can't UNPACK 'em, PACK THEM!

    I tried packing 3 shorts, a quad with a pair of NULLs chasing and many other schemes:

    #$quad = pack('Q',    $rr, $gg, $bb, 0x0000);
    #$q2   = pack('Q1',   $rr, $gg, $bb, 0x0000);  # Q2=0x0000000000000000
    #$q4   = pack('S4',   $rr, $gg, $bb, 0x0000);  #
    #$q5   = pack("SSSS", $rr, $gg, $bb, 0x0000);  #
    #$q3   = pack('Q*',   $rr, $gg, $bb, 0x0000);  # Q3=0x0000000000000000
    #$q4   = pack("Q",    $rr, $gg, $bb, 0x0000);  # Q4=0x0000000000000000
    #$q5   = pack("S*",   $rr, $gg, $bb, 0x0000);  # Q5=0x0000000000000000
    #$q5   = pack("Q*",   @ushort[$ii .. $ii+2]);

    I always got zero or some error or something unprintable.
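A note on the `pack` attempts: each `Q` in a template consumes exactly one value, so `pack('Q', $rr, $gg, $bb, 0)` uses only `$rr`. `pack('S3', ...)` and `pack('S4', ...)` do work, but the result is a 6- or 8-byte binary string rather than a number, which would explain the "unprintable" output; printing such a string with a numeric format such as `%x` would most likely show 0. A minimal sketch, assuming a 64-bit Perl (for `Q`) and sample channel values chosen for illustration:

```perl
use strict;
use warnings;

my ( $rr, $gg, $bb ) = ( 0x1111, 0x2222, 0x3333 );   # sample channel values

# pack returns a byte string, not a number.
my $key6 = pack( "S3", $rr, $gg, $bb );      # 6 bytes: usable as a hash key
my $key8 = pack( "S4", $rr, $gg, $bb, 0 );   # 8 bytes: same bytes plus 2 NULs

printf "key6: %d bytes, key8: %d bytes, hex %s\n",
    length($key6), length($key8), unpack( "H*", $key6 );

# Reading 8 bytes back as one 64-bit integer (needs a 64-bit Perl).
# "S<4" / "Q<" force little-endian, so the result matches the bit-shift
# formula $rr | ($gg << 16) | ($bb << 32) on any platform.
my $as_int  = unpack( "Q<", pack( "S<4", $rr, $gg, $bb, 0 ) );
my $shifted = $rr | ( $gg << 16 ) | ( $bb << 32 );
printf "as_int %#x, shifted %#x\n", $as_int, $shifted;
# prints:
# key6: 6 bytes, key8: 8 bytes, hex 111122223333
# as_int 0x333322221111, shifted 0x333322221111
```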

    Obviously reading a buffer-full and carving 6 byte slices works. And, reading 3 uint16s and clumsily bit-stitching them together gets the job done. But reading the whole file and unpacking an entire array of finished products in 1 line would be the most elegant and likely the fastest.

    Where is DATATYPE "G"?

    @UINT48=unpack("G*", $buf); # NATIVE, 48BIT UNSIGNED GRAPHIC INTS!
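Perl's `pack` has no 48-bit integer format, but a close stand-in for the wished-for `G*` can be built from two templates that do exist: split the buffer into 6-byte strings with `(a6)*`, pad each with two NUL bytes, and read it as a little-endian 64-bit integer with `Q<`. A minimal sketch, assuming a 64-bit Perl and little-endian 16-bit channels (which native `S` gives on x86); `unpack_uint48` is a hypothetical name. Whether this beats keying the hash on the raw 6-byte strings has not been measured here; the raw strings are already valid hash keys, so the conversion is only needed when a numeric value is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper emulating a 48-bit unsigned "G*" unpack.
# Each 6-byte group is padded to 8 bytes and read as a little-endian uint64,
# so the value equals R | (G << 16) | (B << 32) for little-endian channels.
# Assumes length($buf) is a multiple of 6.
sub unpack_uint48 {
    my ($buf) = @_;
    return map { unpack( "Q<", $_ . "\0\0" ) } unpack( "(a6)*", $buf );
}

my $buf = pack( "v*", 1, 2, 3, 0xFFFF, 0xFFFF, 0xFFFF );   # two sample pixels
my @uint48 = unpack_uint48($buf);
printf "%#x\n", $_ for @uint48;
# prints:
# 0x300020001
# 0xffffffffffff
```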

    It is unlikely that either K or R had digital cameras offering RAW file output so they can be forgiven for overlooking the obvious utility of UINT48.

    Perhaps what K&R missed, the Wall Gang can substantiate?

    Thank you, Brian

      BrianP,

        32 bits is too wimpy; 4.3GB is not enough. But 4.3G*4.3G BYTES is clearly OverTheTop! 18446744073709551616 ????
      You have to go back to the math. 8-bit or 128-bit machines can get the same answer; it's knowing how the bits need to be put together :-)

        Wouldn't 65536 * 4294967296 Bytes be just about right?
      For you: yes. For me, 32 bits are fine for 98% of my work. All of my servers have at least 16GB, and many have many times that amount. But I can use 32-bit Perl for 98% of the work (smaller footprint), and 64-bit Perl for the rest. I also have 32-bit Perl with 64-bit integers.

      You are used to working with decimal numbers, but pack/unpack can be used to convert between binary, octal, decimal and hexadecimal. To use 48-bit RGB, just think of the 6 octets as three 16-bit numbers. Then this works:

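      my $myNum = 65000 * (2**32);
      # unpack works on a byte string, so pack the number into 8 big-endian bytes
      # first (64-bit Perl), skip the 2 high bytes, then read three 16-bit values.
      my ( $R,$G,$B ) = unpack("x2 nnn", pack("Q>", $myNum) );
      print "\$myNum: $myNum\n(\$R,\$G,\$B): $R\t$G\t$B\n";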
      A lot of monks here are better at the math than I, but I can hold my own most of the time!

      For the future, ask specific questions that show your problem and, when possible, show the code that's demonstrating the problem. For your initial problem, you didn't have to worry about endianness, but you may have to consider it if you're working with different architectures.
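On endianness: `S` reads 16-bit values in the machine's native byte order, `v` always reads little-endian and `n` always reads big-endian ("network" order). On a little-endian PC, `S` and `v` agree, while `n` returns byte-swapped values for the same bytes. A small sketch with one sample pixel; the channel values are arbitrary.

```perl
use strict;
use warnings;

my $pixel = pack( "v3", 0x0102, 0x0304, 0x0506 );   # bytes 02 01 04 03 06 05

my @native = unpack( "S3", $pixel );   # machine order
my @little = unpack( "v3", $pixel );   # always little-endian
my @big    = unpack( "n3", $pixel );   # always big-endian ("network")

for my $row ( [ S => \@native ], [ v => \@little ], [ n => \@big ] ) {
    my ( $fmt, $vals ) = @$row;
    print "$fmt: ", join( " ", map { sprintf "%#06x", $_ } @$vals ), "\n";
}
# On x86 this prints:
# S: 0x0102 0x0304 0x0506
# v: 0x0102 0x0304 0x0506
# n: 0x0201 0x0403 0x0605
```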

      Good Luck...Ed


        Ed,

        This is the part I had working. Extracting uint16 is easy:

        >> ( $R,$G,$B ) = unpack("nnn", $myNum );
        My existing code:
        @ushort=unpack("S*", $buf); # Extract oodles of UINT16s
        What I can't figure out is:
        @UINT48 = unpack("???????", $BUF)
        where each UINT48 is 48 bits (6 bytes) in 1 contiguous chunk: 75% as large as a long long, 150% as large as a 4-byte, 32-bit long, and the same size as the quanta I need.

        I need 1 contiguous 6byte REDGREENBLUE

        You are calling them "n"

        >> n  An unsigned short (16-bit) in "network" (big-endian) order.
        I use 'S':
        >> S  An unsigned short value.
        I have verified that the S values agree with Photoshop color picker.

        And I broke your masterpiece tinkering with the buffer size. I was trying various sizes from 4k to 32M to find the optimal size for my large 4TB hard drives and RAID. Interesting timing results, but some wrong answers without a 4096-pixel buffer! Dang!

        With a 500+ MB/sec SSD, there is no need to buffer. With spinning drives, I usually like to grab a cache-full then process while the drive does another read-ahead. I usually work on my SSD anyway.

        Sysread must always return exactly the same byte count for the same file or the sky is falling. The size of the chunks has nothing to do with it. I screwed up something else. I may just leave it at 4096 * 6 and be done with it. It's already darn fast.
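On the wrong answers with other buffer sizes: with the loop as written, any bytes left over after carving 6-byte keys from a chunk are dropped (the `length( $buffer ) < 6` test), and the next `sysread` overwrites `$buffer`. `4096 * 6` is a multiple of 6, so nothing is lost; sizes such as 4096 or 32M are not, so bytes get dropped and later keys are misaligned. A minimal sketch of a loop that instead appends each read to the leftover bytes (using `sysread`'s OFFSET argument), so any read size gives the same count; `count_pixels` is a hypothetical name, and the demo writes sample pixels to a temporary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count unique 6-byte pixels using any read size.
sub count_pixels {
    my ( $file, $rdamt ) = @_;
    my %image;
    my $buffer = '';
    open( my $in, '<:raw', $file ) or die "$file: $!\n";
    while (1) {
        # Append to whatever partial pixel is left from the previous read.
        my $size = sysread( $in, $buffer, $rdamt, length $buffer );
        die "sysread: $!\n" unless defined $size;
        last if $size == 0;
        my $whole = length($buffer) - length($buffer) % 6;   # bytes in whole pixels
        $image{$_}++ for unpack( "(a6)*", substr( $buffer, 0, $whole, '' ) );
    }
    close $in;
    return scalar keys %image;   # a trailing partial pixel (< 6 bytes) is ignored
}

# Demo: 1000 distinct pixels written to a temporary file.
my ( $fh, $name ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} pack( "S3", $_, $_, $_ ) for 1 .. 1000;
close $fh;

for my $rdamt ( 4096, 4096 * 6, 7 ) {
    printf "read size %5d: %d unique\n", $rdamt, count_pixels( $name, $rdamt );
}
# Each line should report 1000 unique.
```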

        I think I like it as-is!

        This UINT48 is going to bug me until I figure it out. I may have to hack it into Perl myself!

        Thank you, Brian
