Re: using bits to print part of a string

Using bitwise operators won't work because the masked bytes will be nulls, which are still characters.
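A minimal sketch of that problem, assuming a 0/1 mask laid out to match the sample records in this post (`mask_bytes` and the mask string are hypothetical, made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: AND a record with a 0/1 mask, byte for byte.
# A '1' in the mask keeps the byte; a '0' turns it into "\0".
sub mask_bytes {
    my( $record, $bits ) = @_;
    ( my $mask = $bits ) =~ tr[01][\x00\xff];
    return $record & $mask;    # string bitwise AND
}

my $record = '0121012102??????????12121212????????';
my $bits   = '000111111100111100111111100000000000';   # keep 3..9, 12..15, 18..24
my $masked = mask_bytes( $record, $bits );

# The unwanted bytes are still present as NULs, so the length is unchanged.
printf "length before: %d, after: %d, NULs: %d\n",
    length $record, length $masked, ( $masked =~ tr/\0// );
```

Printing `$masked` as-is would therefore emit the NUL bytes along with the wanted characters.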

Faster than split would be to use unpack:

#! perl -slw
use strict;

print unpack 'x3a7x2a4x2a7', $_ while <DATA>;

__DATA__
0121012102??????????12121212????????
0111011102??????????12111112????????
0111011102??????????12111112????????

Produces:

C:\test>junk55
1012102??????12121
1011102??????12111
1011102??????12111

Possibly faster still would be to set up an array of substr refs into a single buffer:

#! perl -slw
use strict;

my $buf = chr(0) x 400_000;
my @refs = map {
    \substr $buf, $_->[0], $_->[1]
} [3,7],[12,4],[18,7];

while( <DATA> ) {
    substr( $buf, 0 ) = $_;
    print map $$_, @refs;
}
__DATA__
0121012102??????????12121212????????
0111011102??????????12111112????????
0111011102??????????12111112????????

Produces:

C:\test>junk55
1012102??????12121
1011102??????12111
1011102??????12111

You'll have to benchmark to see whether the latter, which was once faster on some earlier version of perl, still is on yours.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.


Replies are listed 'Best First'.
Re^2: using bits to print part of a string by choroba (Cardinal) on Mar 15, 2013 at 16:53 UTC
Anyone willing to improve the benchmark? Read more... (3 kB) Update: Added `regexwise`. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^3: using bits to print part of a string by BrowserUk (Patriarch) on Mar 15, 2013 at 17:16 UTC
"Anyone willing to improve the benchmark?"

They couldn't make it much worse :)

Incorporating one-off set-ups -- or even a test for one-off set-ups -- into the benchmark subs is like incorporating the build time of a car in its race time.

If the substr code is meant to reflect my second option, you've completely misunderstood the purpose of the substr refs and of assigning through a fixed scalar buffer.

It's also traditional to post the results of a typical run.

I may have a go at producing a more realistic benchmark later tonight. The key ingredient is that you must not exclude the IO, buffer and memory handling when benchmarking IO processing of large files. Yours excludes all of these.

Hint: You cannot do IO filter benchmarks using the Benchmark module. The only realistic test is to time the processing of actual files that are big enough that they do not fit into the file cache. And you must ensure that the cache is flushed between runs.
Re^3: using bits to print part of a string by McA (Priest) on Mar 15, 2013 at 17:14 UTC
On one of my machines:

1..4
ok 1 - array
ok 2 - substr
ok 3 - pack
ok 4 - regex
               Rate arraywise regexwise packwise bitwise substrwise
arraywise    38.9/s        --      -75%     -86%    -98%      -100%
regexwise     153/s      294%        --     -43%    -93%      -100%
packwise      270/s      593%       76%       --    -87%      -100%
bitwise      2077/s     5239%     1255%     671%      --       -99%
substrwise 187563/s   481985%   122274%   69480%   8930%         --

I guessed that my solution is as slow as I am... ;-)

McA
Re^3: using bits to print part of a string.(bug) by BrowserUk (Patriarch) on Mar 16, 2013 at 04:19 UTC
BTW: There is a problem with your substrwise test.

The first time through, `$mask` is undefined, so you set up `@mask` and set `$mask`. But on the second and subsequent times through, `$mask` is defined, so the non-state variable `@mask` is left empty, and you don't do any actual work.

That probably explains the surprising apparent efficiency of substrwise in the figures McA posted.
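The pattern described above can be reproduced in a few lines. This is a hypothetical reconstruction, not the benchmark code under discussion (which is not shown here); the sub names `buggy` and `fixed`, the arrayref `$runs`, and the short mask are all assumptions made for illustration:

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical reconstruction of the bug: $mask persists, @mask does not.
sub buggy {
    my( $line, $bits ) = @_;
    state $mask;                 # survives between calls
    my @mask;                    # re-created empty on every call
    unless( defined $mask ) {    # true on the first call only
        $mask = $bits;
        push @mask, [ $-[0], $+[0] - $-[0] ] while $mask =~ /0+/g;
    }
    # Delete the masked-out runs, last run first so earlier offsets stay valid.
    substr( $line, $_->[0], $_->[1], '' ) for reverse @mask;
    return $line;
}

# One possible fix: keep the derived offsets in a state variable as well.
sub fixed {
    my( $line, $bits ) = @_;
    state $runs;
    unless( defined $runs ) {
        $runs = [];
        push @$runs, [ $-[0], $+[0] - $-[0] ] while $bits =~ /0+/g;
    }
    substr( $line, $_->[0], $_->[1], '' ) for reverse @$runs;
    return $line;
}

print buggy( 'abcd', '0110' ), "\n";    # first call does the work
print buggy( 'abcd', '0110' ), "\n";    # later calls return the line untouched
print fixed( 'abcd', '0110' ), "\n";
print fixed( 'abcd', '0110' ), "\n";    # still does the work
```

A sub that silently skips its work on every call but the first will look very fast in a benchmark, which is why checking each candidate's output against the others matters before comparing rates.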
Re^4: using bits to print part of a string.(bug) by McA (Priest) on Mar 16, 2013 at 17:12 UTC
Aaaarrggghh, and I just started to change every regex substitution in my code to a combination of `index` and `substr` to get cutting edge performance... ;-) McA
Re^5: using bits to print part of a string.(bug) by BrowserUk (Patriarch) on Mar 16, 2013 at 20:56 UTC
Re^4: using bits to print part of a string.(bug) by choroba (Cardinal) on Mar 17, 2013 at 08:05 UTC
Thanks for catching the problem. BTW, I had to change the condition in substrref to `while $mask =~ /1+/g` to get the same results as from the other methods. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^3: using bits to print part of a string (Bitops win by an order of magnitude) by BrowserUk (Patriarch) on Mar 16, 2013 at 20:52 UTC
I finally got around to benchmarking. (Removing the nulls left by the bitwise &, in place using tr, is the saving grace!):

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

# Each entry does its one-off set-up, then returns the sub that is timed.
my @benches = (
    sub {
        printf 'unpack: ';
        my $mask = shift;
        my $templ;
        # One template item per run of identical mask characters
        while( $mask =~ /((.)\2*)/g ) {
            $templ .= (qw(x a))[$2] . length $1;
        }
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            $out = join'', unpack( $templ, $_ ), ++$count while <$fh>;
            $count;
        }
    },
    sub {
        printf 'substr: ';
        my $mask = shift;
        my $templ;
        my @mask;
        while ( $mask =~ /0+/g ) {
            push @mask, [ $-[0], ( $+[0] - $-[0] ) ];
        }
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            while( defined( $out = <$fh> ) ) {
                # Delete the runs last-first so earlier offsets stay valid
                substr( $out, $mask[-$_][0], $mask[-$_][1],'' ) for 1 .. @mask;
                ++$count;
            }
            $count;
        }
    },
    sub {
        printf 'substrref: ';
        my $mask = shift;
        my $templ;
        my $buf = chr(0); $buf x= 400_000;
        my @refs;
        push @refs, \substr( $buf, $-[0], $+[0] - $-[0] ) while $mask =~ /0+/g;
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            while( <$fh> ) {
                substr( $buf, 0 ) = $_;
                $out = join'', map $$_, @refs;
                ++$count;
            }
            $count;
        }
    },
    sub {
        printf "bitops: ";
        my $mask = shift;
        $mask =~ tr[01][\x00\xff];
        return sub {
            my $fh = shift;
            my $count = 0;
            $_ &= $mask, tr[\x00][]d, ++$count while <$fh>;
            $count;
        }
    },
);

$|++;
our $OPT //= 0;
our $FLUSHFILE //= '10gb.csv';
our $TESTFILE //= '1023727.dat';
our $S //= 1;
srand $S;

my $mask = join '', map int( rand 2 ), 1 .. 400_000;

# Read a large unrelated file first to flush the file cache
open I, '<', $FLUSHFILE or die $!;
1 while <I>;
close I;

my $start = time;
my $run = $benches[ $OPT ]->( $mask );
open I, '<', $TESTFILE or die $!;
my $records = $run->( \*I );
close I;
my $stop = time;

printf "Took %f seconds for %u records (%f recs/second)\n",
    $stop - $start, $records, $records / ($stop - $start);

__END__
C:\test>for /l %n in (0,1,3) do @1023727 -OPT=%n
unpack: Took 164.702357 seconds for 2606 records (15.822482 recs/second)
substr: Took 2971.481218 seconds for 2606 records (0.877004 recs/second)
substrref: Took 154.501948 seconds for 2606 records (16.867101 recs/second)
bitops: Took 12.534998 seconds for 2606 records (207.897916 recs/second)
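Stripped of the timing harness, the bitops approach comes down to three steps: translate the 0/1 mask into \x00/\xff bytes once, AND each record with it, then delete the resulting NULs with tr. A minimal sketch on the sample records from the top of the thread, assuming a hand-written mask that keeps the same columns (`make_bitops_filter` is a hypothetical name, not from the post):

```perl
use strict;
use warnings;

# Hypothetical helper: do the one-off mask set-up, return the per-record filter.
sub make_bitops_filter {
    my( $bits ) = @_;
    ( my $mask = $bits ) =~ tr[01][\x00\xff];
    return sub {
        my( $record ) = @_;
        my $out = $record & $mask;    # masked-out bytes become "\0"
        $out =~ tr[\x00][]d;          # delete the NULs in place
        return $out;
    };
}

# Mask keeping offsets 3..9, 12..15 and 18..24 (an assumption for this sample).
my $filter = make_bitops_filter( '000111111100111100111111100000000000' );

print $filter->( $_ ), "\n" for
    '0121012102??????????12121212????????',
    '0111011102??????????12111112????????';
```

Two caveats apply to this sketch: the string `&` truncates its result to the shorter operand, so the mask must be at least as long as the record, and any NUL bytes that are legitimately part of the kept data would be deleted along with the masked ones.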
Re^4: using bits to print part of a string (Bitops win by an order of magnitude) by McA (Priest) on Mar 17, 2013 at 00:17 UTC
Hi, why have you thrown my regex solution out of the race? :) I would have been interested to see it in your benchmark. McA
Re^5: using bits to print part of a string (Bitops win by an order of magnitude) by BrowserUk (Patriarch) on Mar 17, 2013 at 04:01 UTC
Re^6: using bits to print part of a string (Bitops win by an order of magnitude) by McA (Priest) on Mar 17, 2013 at 09:57 UTC
Some notes below your chosen depth have not been shown here
Re^2: using bits to print part of a string by Anonymous Monk on Mar 15, 2013 at 16:49 UTC
Thanks! So, could you unpack (har, har) the part where you do "x3a7x2a4x2a7"? I am still not getting these templates.
Re^3: using bits to print part of a string by BrowserUk (Patriarch) on Mar 15, 2013 at 16:52 UTC
"x3a7x2a4x2a7" Skip 3 bytes; grab 7 bytes; skip 2 bytes; ...
Re^4: using bits to print part of a string by Anonymous Monk on Mar 15, 2013 at 16:57 UTC
Nice! I get it!
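The skip/grab reading of the template can also be seen by generating it from a 0/1 mask: each run of 0s becomes xN (skip N bytes) and each run of 1s becomes aN (grab N bytes). A minimal sketch, assuming such a mask is available (`mask_to_template` is a hypothetical helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: turn a 0/1 mask into an unpack template.
sub mask_to_template {
    my( $bits ) = @_;
    my $templ = '';
    # $1 is a whole run of identical characters, $2 is the character itself.
    while( $bits =~ /((.)\2*)/g ) {
        $templ .= ( $2 ? 'a' : 'x' ) . length $1;
    }
    return $templ;
}

my $templ  = mask_to_template( '0001111111001111001111111' );
my $record = '0121012102??????????12121212????????';

print "$templ\n";
print join( '', unpack( $templ, $record ) ), "\n";
```

For this mask the generated template is the same `x3a7x2a4x2a7` used in the original reply, and anything in the record beyond the end of the template is simply ignored.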

