Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: using bits to print part of a string

by BrowserUk (Patriarch)
on Mar 15, 2013 at 16:09 UTC ( [id://1023731]=note: print w/replies, xml ) Need Help??


in reply to using bits to print part of a string

Using bitwise operators won't work because te masked bytes will be null; which are still characters.

Faster that split would be to use unpack:

#! perl -slw use strict; print unpack 'x3a7x2a4x2a7', $_ while <DATA>; __DATA__ 0121012102??????????12121212???????? 0111011102??????????12111112???????? 0111011102??????????12111112????????

Produces:

C:\test>junk55 1012102??????12121 1011102??????12111 1011102??????12111

Possibly faster still would be to set up an array of substr refs into a single buffer:

#! perl -slw use strict; my $buf = chr(0) x 400_000; my @refs = map { \substr $buf, $_->[0], $_->[1] } [3,7],[12,4],[18,7]; while( <DATA> ) { substr( $buf, 0 ) = $_; print map $$_, @refs; } __DATA__ 0121012102??????????12121212???????? 0111011102??????????12111112???????? 0111011102??????????12111112????????

Produces:

C:\test>junk55 1012102??????12121 1011102??????12111 1011102??????12111

You'll have to benchmark to see if whether the latter which was once faster on some earlier version of perl still is on yours.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: using bits to print part of a string
by choroba (Cardinal) on Mar 15, 2013 at 16:53 UTC
    Anyone willing to improve the benchmark?

    Update: Added regexwise.

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Anyone willing to improve the benchmark?

      They couldn't make it much worse :)

      Incorporating one-off set-ups -- or even a test for one-off setups -- into the benchmark subs is like incorporating the build-time of a car in its race time.

      If the substr code is meant to reflect my second option, you've completely misunderstood the purpose of the substr refs and assigning through a fixed scalar buffer.

      Its also traditional to post the results of a typical run.

      I may have a go at producing a more realistic benchmark later tonight. Key ingredients are that you must not exclude the IO, buffer and memory handling when benchmarking IO processing of large files. Yours excludes all of these.

      Hint: You cannot do IO filter bechmarks using the Benchmark module. The only realistic test is to time processing actual files that are big enough that they do not fit into the filecache. And you must ensure that the cache is flushed between runs.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      On one of my machines:

      1..4 ok 1 - array ok 2 - substr ok 3 - pack ok 4 - regex Rate arraywise regexwise packwise bitwise substr +wise arraywise 38.9/s -- -75% -86% -98% - +100% regexwise 153/s 294% -- -43% -93% - +100% packwise 270/s 593% 76% -- -87% - +100% bitwise 2077/s 5239% 1255% 671% -- +-99% substrwise 187563/s 481985% 122274% 69480% 8930% + --
      I guessed that my solution is as slow as I am... ;-)

      McA

      BTW: There is a problem with your substrwise test.

      The first time through, $mask is undefined, so you set up @mask and set smask.

      But on the second and subsequent times through, $mask is defined, so the non-state variable: @mask is left empty, so you don't do any actual work.

      That probably explains the surprising apparent efficiency of substrwise in the figures McA posted.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Aaaarrggghh, and I just started to change every regex substitution in my code to a combination of index and substr to get cutting edge performance... ;-)

        McA

        Thanks for catching the problem.

        BTW, I had to change the condition in substrref to

        while $mask =~ /1+/g
        to get the same results as from the other methods.
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      I finally got around to benchmarking. (Removing the nulls, left by the bitwise&, in-place using tr is the saving grace!):

      #! perl -slw use strict; use Time::HiRes qw[ time ]; my @benches = ( sub { printf 'unpack: '; my $mask = shift; my $templ; while( $mask =~ /((.)\2*)/g ) { $templ .= (qw(x a))[$2] . length $1; } return sub { my $fh = shift; my $count = 0; my $out; $out = join'', unpack( $templ, $_ ), ++$count while <$fh>; $count; } }, sub { printf 'substr: '; my $mask = shift; my $templ; my @mask; while ( $mask =~ /0+/g ) { push @mask, [ $-[0], ( $+[0] - $-[0] ) ]; } return sub { my $fh = shift; my $count = 0; my $out; while( defined( $out = <$fh> ) ) { substr( $out, $mask[-$_][0], $mask[-$_][1],'' ) for 1 +.. @mask; ++$count; } $count; } }, sub { printf 'substrref: '; my $mask = shift; my $templ; my $buf = chr(0); $buf x= 400_000; my @refs; push @refs, \substr( $buf, $-[0], $+[0] - $-[0] ) wh +ile $mask =~ /0+/g; return sub { my $fh = shift; my $count = 0; my $out; while( <$fh> ) { substr( $buf, 0 ) = $_; $out = join'', map $$_, @refs; ++$count; } $count; } }, sub { printf "bitops: "; my $mask = shift; $mask =~ tr[01][\x00\xff]; return sub { my $fh = shift; my $count = 0; $_ &= $mask, tr[\x00][]d, ++$count while <$fh>; $count; } }, ); $|++; our $OPT //= 0; our $FLUSHFILE //= '10gb.csv'; our $TESTFILE //= '1023727.dat'; our $S //= 1; srand $S; my $mask = join '', map int( rand 2 ), 1 .. 400_000; open I, '<', $FLUSHFILE or die $!; 1 while <I>; close I; my $start = time; my $run = $benches[ $OPT ]->( $mask ); open I, '<', $TESTFILE or die $!; my $records = $run->( \*I ); close I; my $stop = time; printf "Took %f seconds for %u records (%f recs/second)\n", $stop - $start, $records, $records / ($stop - $start); __END__ C:\test>for /l %n in (0,1,3) do @1023727 -OPT=%n unpack: Took 164.702357 seconds for 2606 records (15.822482 recs/secon +d) substr: Took 2971.481218 seconds for 2606 records (0.877004 recs/secon +d) substrref: Took 154.501948 seconds for 2606 records (16.867101 recs/se +cond) bitops: Took 12.534998 seconds for 2606 records (207.897916 recs/secon +d)

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      div class=

        Hi,

        why have you thrown my regex solution out of the race? :) I've been interested to see it in your benchmark.

        McA

Re^2: using bits to print part of a string
by Anonymous Monk on Mar 15, 2013 at 16:49 UTC
    Thanks! So, could you unpack (har, har) the part where you do "x3a7x2a4x2a7", I am still not getting these templates.
      "x3a7x2a4x2a7"

      Skip 3 bytes; grab seven bytes, skip 2 bytes; ...


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Nice! I get it!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1023731]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (6)
As of 2024-04-19 08:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found