Re^3: using bits to print part of a string (Bitops win by an order of magnitude)

by BrowserUk (Patriarch)
on Mar 16, 2013 at 20:52 UTC


in reply to Re^2: using bits to print part of a string
in thread using bits to print part of a string

I finally got around to benchmarking. (Removing the nulls left behind by the bitwise &, in place using tr, is the saving grace!):

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my @benches = (
    sub {
        printf 'unpack: ';
        my $mask = shift;
        my $templ;
        while( $mask =~ /((.)\2*)/g ) {
            $templ .= (qw(x a))[$2] . length $1;
        }
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            $out = join'', unpack( $templ, $_ ), ++$count while <$fh>;
            $count;
        }
    },
    sub {
        printf 'substr: ';
        my $mask = shift;
        my $templ;
        my @mask;
        while ( $mask =~ /0+/g ) {
            push @mask, [ $-[0], ( $+[0] - $-[0] ) ];
        }
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            while( defined( $out = <$fh> ) ) {
                substr( $out, $mask[-$_][0], $mask[-$_][1],'' ) for 1 .. @mask;
                ++$count;
            }
            $count;
        }
    },
    sub {
        printf 'substrref: ';
        my $mask = shift;
        my $templ;
        my $buf = chr(0); $buf x= 400_000;
        my @refs;
        push @refs, \substr( $buf, $-[0], $+[0] - $-[0] ) while $mask =~ /0+/g;
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            while( <$fh> ) {
                substr( $buf, 0 ) = $_;
                $out = join'', map $$_, @refs;
                ++$count;
            }
            $count;
        }
    },
    sub {
        printf "bitops: ";
        my $mask = shift;
        $mask =~ tr[01][\x00\xff];
        return sub {
            my $fh = shift;
            my $count = 0;
            $_ &= $mask, tr[\x00][]d, ++$count while <$fh>;
            $count;
        }
    },
);

$|++;
our $OPT //= 0;
our $FLUSHFILE //= '10gb.csv';
our $TESTFILE //= '1023727.dat';
our $S //= 1;
srand $S;
my $mask = join '', map int( rand 2 ), 1 .. 400_000;

open I, '<', $FLUSHFILE or die $!;
1 while <I>;
close I;

my $start = time;
my $run = $benches[ $OPT ]->( $mask );
open I, '<', $TESTFILE or die $!;
my $records = $run->( \*I );
close I;
my $stop = time;

printf "Took %f seconds for %u records (%f recs/second)\n",
    $stop - $start, $records, $records / ($stop - $start);

__END__
C:\test>for /l %n in (0,1,3) do @1023727 -OPT=%n
unpack: Took 164.702357 seconds for 2606 records (15.822482 recs/second)
substr: Took 2971.481218 seconds for 2606 records (0.877004 recs/second)
substrref: Took 154.501948 seconds for 2606 records (16.867101 recs/second)
bitops: Took 12.534998 seconds for 2606 records (207.897916 recs/second)
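
For readers who want to see the bitops idea in isolation, here is a minimal sketch on a short string. The helper name `apply_mask` and the sample data are made up for illustration and are not part of the benchmark. It assumes the input contains no genuine NUL bytes (they would be deleted along with the masked positions) and that both operands are plain strings, so that `&` performs a string-wise AND.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the characters of $line whose mask position is '1'.
sub apply_mask {
    my( $line, $mask ) = @_;

    # Turn the textual mask into bytes: '0' => "\x00", '1' => "\xff".
    ( my $bits = $mask ) =~ tr[01][\x00\xff];

    # String-wise AND: kept positions survive, dropped positions become NUL.
    # The result is as long as the shorter operand.
    my $out = $line & $bits;

    # Delete the NULs in place.
    $out =~ tr[\x00][]d;
    return $out;
}

print apply_mask( 'abcdefgh', '10110001' ), "\n";    # prints "acdh"
```

Note that code running under `use feature 'bitwise'` would need the `&.` operator for the string form instead of `&`.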

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: using bits to print part of a string (Bitops win by an order of magnitude)
by McA (Priest) on Mar 17, 2013 at 00:17 UTC

    Hi,

    Why did you throw my regex solution out of the race? :) I would have been interested to see it in your benchmark.

    McA

      sub {
          printf 'regex: ';
          my $mask = shift;
          my $re = '^' . join('', map $_ ? '(.)' : '.', split '', $mask ) . '$';
          return sub {
              my $fh = shift;
              my $out;
              my $count = 0;
              $out = join( '', m[$re]o ), ++$count while <$fh>;
              $count;
          }
      },

      __END__
      C:\test>1023727 -OPT=4
      regex: Took 274.157906 seconds for 2606 records (9.505471 recs/second)


        Ok, ok, ok, now I know why you kicked it out. ;-)

        Anyway, thank you for giving it a try. The only improvement I can see is to build the regex the way some of the other solutions build their templates, so that each run of consecutive kept characters becomes a single capture group instead of one group per character.
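
The run-grouping idea could be sketched roughly as follows. The helper name `mask_to_regex` is hypothetical, and this variant has not been timed against the benchmark above, so whether it closes the gap is an open question. It reuses the `/((.)\2*)/g` run-splitting pattern from the unpack solution and emits one counted group per run.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex piece per run of identical mask
# characters. Runs of '1' become capture groups, runs of '0' are skipped.
sub mask_to_regex {
    my $mask = shift;
    my $re = '';
    while( $mask =~ /((.)\2*)/g ) {
        my $len = length $1;
        $re .= $2 ? "(.{$len})" : ".{$len}";
    }
    return qr/^$re/;
}

my $re = mask_to_regex( '10110001' );    # equivalent to /^(.{1}).{1}(.{2}).{3}(.{1})/
print join( '', 'abcdefgh' =~ $re ), "\n";    # prints "acdh"
```

One caveat: Perl caps counted quantifiers at a build-dependent limit (32766 in older perls), so a mask with a very long run, such as '1' x 400_000, would need its runs split into smaller pieces.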

        Investigating this raises a point of criticism of your benchmark: you don't use one mask for all the tests, and because some solutions depend on the pattern of the mask, that influences their performance. Think, for example, of the substrref solution with the pattern '1' x 400_000.
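
One way to gauge how much a given mask favours the run-based solutions is to count its runs before benchmarking. The sketch below is illustrative only; the helper name `run_counts` is made up. It counts the runs of 0s and 1s, which is roughly the number of template items, substr calls or references those solutions process per record.

```perl
use strict;
use warnings;

# Hypothetical helper: count the runs of '0' and '1' in a mask string.
sub run_counts {
    my $mask = shift;
    my %runs = ( 0 => 0, 1 => 0 );
    # $2 holds the character that the current run is made of.
    $runs{ $2 }++ while $mask =~ /((.)\2*)/g;
    return \%runs;
}

my $degenerate = run_counts( '1' x 400_000 );
print "zero-runs: $degenerate->{0}, one-runs: $degenerate->{1}\n";    # zero-runs: 0, one-runs: 1
```

A mask with no zero-runs leaves the substr and substrref solutions with nothing to do per record, whereas a random mask gives them a very large number of short runs.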

        Have a nice Sunday
        McA
