PerlMonks  

Just In! The progression of time vs keys follows a power curve with an exponent of 2.07!
https://picasaweb.google.com/lh/photo/nQNCVJCdTQzY5gL8_SPJFsXYSnR2ZXi9H2xA2v_qL9I?feat=directlink

In a recent development here, Monk Ed suggested a very quick and slick, low-level sysread() approach to hashing 48-bit RGB data from large 217MB .RAW files.

When passing the hash back to the calling function, the process stalled indefinitely, but there was a great deal of memory usage in the background and one CPU was saturated.

Instrumenting the code shows that reading the file and hashing the distinct RGB values takes 23 seconds. Immediately after creating the hash of 27,645,898 unique 48-bit colors, extracting the keys to an array without sorting takes 28+ minutes. It takes 73 times longer to extract the keys from the hash than it takes to read the disk and build the entire hash.

Here is the relationship between the number of keys in the hash and the time to unload them to an array without sorting (@ara = keys %hash):

Keys  -> minutes
27M   -> 28.4
15M   -> 8.7
9.6M  -> 3.5
7.7M  -> 1.3
667k  -> .018

http://makepp.sourceforge.net/2.0/perl_performance.html says:

    Avoid hashes
    Perl becomes quite slow with many small hashes.

This does NOT seem to apply here. This is one large hash, not many small ones, and 27 million items is too long for a lookup table.

But then there is:
http://www.developer.com/lang/perl/article.php/1554501/Perl-Debunking-the-Speed-Myth.htm

    Tip 1: Hashes are fastest
    Use arrays instead of lists of individual variables when you can, and use hashes instead of arrays for lists when you can. Lists are better handled by hashes than arrays in Perl because the hash algorithms have already been optimized.
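As a rough cross-check of the power-curve claim, the exponent can be estimated with a least-squares fit of log(minutes) against log(keys). The sketch below uses only the five rounded values from the table above. The helper name power_fit is made up for illustration, and it is not known how the 2.07 figure was fitted, so a somewhat different exponent (still close to 2) should be expected from this method.

```perl
use strict;
use warnings;

# Fit y = coef * x**exponent by linear least squares on (ln x, ln y).
# power_fit is a hypothetical helper, not part of the original program.
sub power_fit {
    my @points = @_;                      # list of [x, y] pairs
    my $n = scalar @points;
    my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
    for my $p (@points) {
        my $lx = log($p->[0]);
        my $ly = log($p->[1]);
        $sx  += $lx;
        $sy  += $ly;
        $sxx += $lx * $lx;
        $sxy += $lx * $ly;
    }
    my $exponent = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx);
    my $coef     = exp(($sy - $exponent * $sx) / $n);
    return ($coef, $exponent);
}

# Keys -> minutes, copied from the table in the post (rounded values).
my @timings = (
    [27e6,  28.4],
    [15e6,  8.7],
    [9.6e6, 3.5],
    [7.7e6, 1.3],
    [667e3, 0.018],
);

my ($coef, $exponent) = power_fit(@timings);
printf "minutes ~ %.3g * keys ** %.2f\n", $coef, $exponent;
```

An exponent near 2 means that doubling the number of keys roughly quadruples the extraction time, which is what the table shows between 15M and 27M keys.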
GRISLY DETAILS:
========================================
CCS: Fsize 216913920, pix 36152320
ETime=28.648 min, =97.60%, Event 'CCS: Extract RGB2C keys'
ETime=0.380 min, = 1.30%, Event 'CCS: Read_and_hash'
ETime=0.325 min, = 1.11%, Event 'CCS: Write_RGBC'
97.60% -> CCS: Extract RGB2C keys
 1.30% -> CCS: Read_and_hash
 1.11% -> CCS: Write_RGBC
Elapsed time = 29.58 min
-------------------------
With an 18 MPixel file, file size 108,456,960 B, the key extract time goes to ~9 minutes

CCS: Fsize 108456960, pix 18076160
Event 'CCS: Extract RGB2C keys' elapsed time = 8.737 min = 96.14%.
Event 'CCS: Read_and_hash' elapsed time = 0.177 min = 1.95%.
Event 'CCS: Write_RGBC' elapsed time = 0.174 min = 1.91%.
96.14% -> CCS: Extract RGB2C keys
 1.95% -> CCS: Read_and_hash
 1.91% -> CCS: Write_RGBC
Elapsed time = 9.20 min
--------------------------
For 12 M Pixels, time is 3.7 minutes after a hash time of 7 seconds

CCS: Fsize 72,304,640, pix 12 MPixel
Event 'CCS: Extract RGB2C keys' elapsed time = 3.477 min = 94.10%.
Event 'CCS: Read_and_hash' elapsed time = 0.115 min = 3.12%.
Event 'CCS: Write_RGBC' elapsed time = 0.103 min = 2.79%.
94.10% -> CCS: Extract RGB2C keys
 3.12% -> CCS: Read_and_hash
 2.79% -> CCS: Write_RGBC
Elapsed time = 3.76 min
--------------------------------------------
CCS: Fsize 60466176, pix 10077696
Event 'CCS: Extract RGB2C keys' elapsed time = 2.251 min = 92.94%.
Event 'CCS: Read_and_hash' elapsed time = 0.089 min = 3.65%.
Event 'CCS: Write_RGBC' elapsed time = 0.082 min = 3.40%.
92.94% -> CCS: Extract RGB2C keys
 3.65% -> CCS: Read_and_hash
 3.40% -> CCS: Write_RGBC
Elapsed time = 2.47 min
-----------------------------------------
At a file size of 1.7 M Pixels, and a 2.4 second run time, extracting the keys takes 40% longer than reading the file and generating the keys.

CCS: Fsize 10077696, pix 1679616
Event 'CCS: Extract RGB2C keys' elapsed time = 0.018 min = 47.85%.
Event 'CCS: Read_and_hash' elapsed time = 0.013 min = 34.38%.
Event 'CCS: Write_RGBC' elapsed time = 0.007 min = 17.73%.
47.85% -> CCS: Extract RGB2C keys
34.38% -> CCS: Read_and_hash
17.73% -> CCS: Write_RGBC
Elapsed time = 2.41 sec
===================================================
Tiny, active code segment
...
$sr_len = sysread(IN, $buf, $bsize);   # SysRead Length
last if $sr_len == 0;
while($buf) {
    $rgb=substr($buf, 0, 6, '');       # Nibble 6 bytes
    $rgb2c{$rgb}++;
}
&time_event('CCS: Counting RGB hash keys', \%e2at...
$cc=scalar keys %rgb2c;                # Unique colors
printf("CCS: $cc unique colors -> %4.3f%%\n", 100.0*$cc/$num_pix);
&time_event('CCS: Extract RGB2C keys', \%e2at,...
@rgb = keys %rgb2c;                    # << this 1 line takes 28.648 min
&time_event('CCS: >open Output file', \%e2at,
open(OUT, ">$ofile") or die...
binmode OUT;
&time_event('CCS: Write_RGBC', \%e2at,...
$ii = -1;
foreach $rgb (@rgb) {
    $ii++;
    $count = $rgb2c{$rgb};
    print(OUT $rgb, ($c16 = pack('S', $count)));
}
&time_event('CCS: Close_RGBC', \%e2at, $debug*0);
close OUT;
printf("CCS: %d bytes written to fn '$ofile'\n", -s $ofile);
-------------------------------------
I noticed memory usage of over 6GB for the Perl.exe process.

The C program I wrote earlier allocates 2 arrays: one of 217MB for the RGB RAW file and another, 33% larger, to hold the RGBC data, with C being just a 4th uint16_t to hold the count. It shows ~600MB in windows explorer (Win 7/64).

There are 32GB of RAM, with over half free at the time. Reading from and writing to a 500+ MB/s SSD with 33 GB free. 513 GB free on one data drive.
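The code segment above builds the full key list with keys before writing anything. One alternative worth timing is to walk the hash with each() and write every record as it is visited, so the 27-million-element array is never created. The following is a minimal, self-contained sketch under assumptions: count_colors and write_rgbc are made-up names, the input is a tiny in-memory string instead of a .RAW file, and the output goes to an in-memory filehandle. It has not been measured against the real data, so whether it avoids the slowdown on this system is an open question.

```perl
use strict;
use warnings;

# Count distinct 6-byte (48-bit RGB) pixels in a raw byte string.
# count_colors and write_rgbc are hypothetical helpers for illustration.
sub count_colors {
    my ($raw) = @_;
    my %rgb2c;
    my $whole = length($raw) - length($raw) % 6;   # skip a trailing partial pixel
    for (my $pos = 0; $pos < $whole; $pos += 6) {
        $rgb2c{ substr($raw, $pos, 6) }++;
    }
    return \%rgb2c;
}

# Write 8-byte RGBC records (6 bytes RGB + 16-bit count) while iterating
# with each(), so no list of all keys is built.
sub write_rgbc {
    my ($rgb2c, $out) = @_;
    my $records = 0;
    while (my ($rgb, $count) = each %$rgb2c) {
        # As in the original code, pack 'S' holds 16 bits only;
        # counts above 65535 will not fit.
        print {$out} $rgb, pack('S', $count);
        $records++;
    }
    return $records;
}

# Made-up test data: three pixels, two of them identical.
my $raw   = "\x00\x01\x02\x03\x04\x05" x 2 . "\xFF" x 6;
my $rgb2c = count_colors($raw);

open(my $out, '>', \my $rgbc) or die "cannot open in-memory file: $!";
binmode $out;
my $records = write_rgbc($rgb2c, $out);
close $out;

printf "%d unique colors, %d bytes written\n", $records, length $rgbc;
```

Each record is 8 bytes for every 6 bytes of RGB input, which is where the "33% larger" RGBC array in the C program comes from.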
sysinternals pslist shows:

Name   Pid  Pri Thd Hnd    Priv     CPU Time   Elapsed Time
perl  7216    8   1 145 4194303  0:27:19.632    0:27:23.062

pslist -m perl
Name   Pid      VM      WS    Priv  Priv Pk     Faults  NonP Page
perl  7216 4194303 4194303 4194303  4194303 1227036269    88  152

Perl binary:

I:\br3\pf.249465>perl -V
Summary of my perl5 (revision 5 version 20 subversion 2) configuration:

  Platform:
    osname=MSWin32, osvers=6.3, archname=MSWin32-x64-multi-thread
    uname='Win32 strawberry-perl 5.20.2.1 #1 Sat Feb 21 18:04:11 2015 x64'
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags =' -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields',
    optimize='-s -O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='4.8.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-s -L"E:\STRAWB~1.PER\perl\lib\CORE" -L"E:\STRAWB~1.PER\c\lib"'
    libpth=E:\STRAWB~1.PER\c\lib E:\STRAWB~1.PER\c\x86_64-w64-mingw32\lib E:\STRAWB~1.PER\c\lib\gcc\x86_64-w64-mingw32\4.8.3
    libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=, so=dll, useshrplib=true, libperl=libperl520.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=xs.dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"E:\STRAWB~1.PER\perl\lib\CORE" -L"E:\STRAWB~1.PER\c\lib"'

Characteristics of this binary (from libperl):
  Compile-time options: HAS_TIMES HAVE_INTERP_INTERN MULTIPLICITY
                        PERLIO_LAYERS PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
                        PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
                        PERL_MALLOC_WRAP PERL_NEW_COPY_ON_WRITE
                        PERL_PRESERVE_IVUV USE_64_BIT_INT USE_ITHREADS
                        USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
                        USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO
                        USE_PERL_ATOF
  Built under MSWin32
  Compiled at Feb 21 2015 18:08:23
  @INC:
    e:/strawberry.perl/perl/site/lib
    e:/strawberry.perl/perl/vendor/lib
    e:/strawberry.perl/perl/lib
    .

In reply to Perl Hash Performance Hits Brick Wall! by BrianP
