
If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.

This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)

Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").

User Meditations
lexical vs. local file handles
3 direct replies — Read more / Contribute
by jo37
on Apr 18, 2020 at 14:26

    I used to use bareword filehandles. In the course of this discussion I was convinced not to do so.

    I've seen many Perl programmers advocating the usage of lexical scalars instead. Though there are good reasons for this, it has a drawback. Filehandles in Perl are special on a syntactical level. The compiler is capable of catching errors like this:

    open FH, '<', 'myfile';
    print FH, "something\n";

    by emitting an error:

    No comma allowed after filehandle at ...

    Whereas strange things happen at runtime if you use:

    open my $fh, '<', 'myfile';
    print $fh, "something\n";
    NB: Someone else pointed to this subtle difference but sadly I don't remember who and where it was. She/he should be credited here.
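    To make the pitfall concrete, here is a minimal sketch (using an in-memory filehandle so it is self-contained; the variable names are my own). The comma form compiles cleanly because print treats $fh as just another list element, and the output goes to the currently selected handle (usually STDOUT) instead of $fh:

```perl
use strict;
use warnings;

my $out = '';
open my $fh, '>', \$out or die "open: $!";

# Compiles without complaint, but would print the stringified glob ref
# and the text to STDOUT, not to $fh:
# print $fh, "something\n";

# Either drop the comma, or use the block form to make the handle explicit:
print $fh "something\n";
print {$fh} "more\n";

close $fh;
print $out;
```

    The block form `print {$fh} LIST` is the closest a lexical handle gets to the syntactic distinctness of a bareword.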

    So I stay with barewords and try to avoid the problems that come with global variables by using this idiom:

    open local *FH, '<', 'myfile';
    while (<FH>) {
        # do something
    }
    close FH;

    Some features:

    • equally named filehandles in different scopes do not affect each other
    • usage in a sub does no harm if the sub is in the same package and uses the same idiom
    • usage in a sub of a different package does no harm at all.
    So, as long as I follow this pattern in my own package, I feel kind of safe. Side effects of localizing a glob are easily circumvented.

    The question remains: Do I miss something here? Do you see any pitfalls using this approach?

    Here is an example demonstrating the issues I'm aware of.

    #!/usr/bin/perl
    use Test2::V0;

    package Foobar;

    our $foobar = "foobar\n";
    our @foobar;

    sub use_fh {
        # unlocalized use of FH in separate package
        open FH, '<', \$foobar;
        @foobar = <FH>;
        close FH;
    }

    package main;

    my $foo = "foo\n";
    my $bar = "bar\n";
    my $baz = "baz\n";
    my @baz;

    sub use_localized_fh {
        # protect caller's FH
        open local *FH, '<', \$baz;
        @baz = <FH>;
        close FH;
    }

    sub close_fh {
        # unlocalized use of FH
        close FH;
    }

    # open now, use later
    open FH, '<', \$bar;
    my @foo;

    # create new scope
    {
        # use localized FH, protecting handle opened on \$bar
        open local *FH, '<', \$foo;

        # call sub that uses localized FH
        use_localized_fh;

        # call sub in other package that uses FH
        Foobar::use_fh;

        # FH still intact
        @foo = <FH>;
        close FH;
    }

    is \@baz, [$baz], 'got $baz in sub';
    is \@Foobar::foobar, [$Foobar::foobar], 'got $Foobar::foobar in foreign sub';
    is \@foo, [$foo], 'good: got $foo';

    {
        open local *FH, '<', \$foo;

        # call sub that closes FH
        close_fh;

        @foo = <FH>;
        close FH;
    }

    is \@foo, [], 'bad: FH was closed in sub';

    # FH at this scope is still untouched
    my @bar = <FH>;
    close FH;
    is \@bar, [$bar], 'good: got $bar';

    done_testing;


MCE::Flow + Caching via File::Map
1 direct reply — Read more / Contribute
by marioroy
on Apr 14, 2020 at 21:00

    Dearest Monks of the Monastery,

    Recently, I tried computing the longest Collatz progression here. I was pleasantly surprised by File::Map's performance. Our fellow monk Laurent_R posted an update to his original code for computing the Collatz sequences. And what a speedup it is.

    Here, I want to take Laurent's code and run it in parallel. Yes, with caching. The first thing I typically do is apply optimizations to the serial implementation, because any inefficiency there is multiplied when running in parallel. See my update to Laurent's code. That went well, so I will take it from there and use File::Map here. This is exciting for me because running the algorithm in parallel is a great use case for File::Map. But as with all things, a serial version using File::Map is needed for comparison.

    Update 1: Map using map_anonymous, previously map_file.
    Update 2: Use 16-bit signed integer for pack/unpack.

    Note: The OS must have ~ 3.8 GiB of available memory to compute size 1e9.


    #!/usr/bin/env perl
    use strict;
    use warnings;

    use File::Map qw/map_anonymous unmap/;

    my $size = shift || 1e6;
    $size = 1e6 if $size < 1e6;  # minimum
    $size = 1e9 if $size > 1e9;  # maximum

    map_anonymous my $cache, $size * 2, 'shared';

    # init cache with -1's, then set 0, 1, 2
    substr($cache, 0, $size * 2, ( my $neg1 = pack('s', -1) ) x $size);
    substr($cache, $_ * 2, 2, pack('s', $_)) for 0..2;

    my @seqs;

    sub collatz_seq {
        my ( $seq_beg, $seq_end ) = @_;
        my ( $n, $steps, $tmp );

        for my $input ( $seq_beg .. $seq_end ) {
            $n = $input, $steps = 0;

            while ( $n != 1 ) {
                $steps += unpack('s', $tmp), last
                    if ($n < $size && ($tmp = substr($cache, $n * 2, 2)) ne $neg1);

                $n % 2
                    ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
                    : ( $steps += 1, $n = $n >> 1 );
            }

            substr($cache, $input * 2, 2, pack('s', $steps)) if $input < $size;
            push @seqs, [ $input, $steps ] if $steps > 400;
        }
    }

    collatz_seq(2, $size);
    unmap $cache;

    @seqs = ( sort { $b->[1] <=> $a->[1] } @seqs )[ 0..19 ];
    printf "Collatz(%5d) has sequence length of %3d steps\n", @$_ for @seqs;


    This is the serial implementation converted to run parallel. The collatz_seq function is identical, no changes there.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use File::Map qw/map_anonymous unmap/;
    use MCE::Flow;
    use MCE::Candy;

    my $size = shift || 1e6;
    $size = 1e6 if $size < 1e6;  # minimum
    $size = 1e9 if $size > 1e9;  # maximum

    map_anonymous my $cache, $size * 2, 'shared';

    # init cache with -1's, then set 0, 1, 2
    substr($cache, 0, $size * 2, ( my $neg1 = pack('s', -1) ) x $size);
    substr($cache, $_ * 2, 2, pack('s', $_)) for 0..2;

    # local to workers and the manager process
    my @seqs;

    sub collatz_seq {
        my ( $seq_beg, $seq_end ) = @_;
        my ( $n, $steps, $tmp );

        for my $input ( $seq_beg .. $seq_end ) {
            $n = $input, $steps = 0;

            while ( $n != 1 ) {
                $steps += unpack('s', $tmp), last
                    if ($n < $size && ($tmp = substr($cache, $n * 2, 2)) ne $neg1);

                $n % 2
                    ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
                    : ( $steps += 1, $n = $n >> 1 );
            }

            substr($cache, $input * 2, 2, pack('s', $steps)) if $input < $size;
            push @seqs, [ $input, $steps ] if $steps > 400;
        }
    }

    my $chunk_size;
    $chunk_size  = int( $size / MCE::Util::get_ncpu() / 80 + 1 );
    $chunk_size += 1 if $chunk_size % 2;

    mce_flow_s {
        max_workers => MCE::Util::get_ncpu(),
        chunk_size  => $chunk_size,
        bounds_only => 1,
        gather      => MCE::Candy::out_iter_array(\@seqs),
    },
    sub {
        my ($mce, $chunk_ref, $chunk_id) = @_;
        collatz_seq(@{ $chunk_ref });

        @seqs > 20
            ? MCE->gather($chunk_id, ( sort { $b->[1] <=> $a->[1] } @seqs )[ 0..19 ])
            : MCE->gather($chunk_id, @seqs);

        @seqs = ();
    }, 2, $size;

    MCE::Flow->finish;
    unmap $cache;

    @seqs = ( sort { $b->[1] <=> $a->[1] } @seqs )[ 0..19 ];
    printf "Collatz(%5d) has sequence length of %3d steps\n", @$_ for @seqs;


    Caching via File::Map obviously adds overhead, including serializing and deserializing each slot with pack/unpack.
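    The caching scheme itself is small. Here is a minimal core-Perl sketch of the same fixed-width layout (toy size and helper names of my own choosing): each slot is a 16-bit signed integer read and written through substr and pack/unpack, exactly as the File::Map string is accessed above:

```perl
use strict;
use warnings;

my $slots = 10;
my $cache = pack('s', -1) x $slots;   # every slot starts at -1 ("empty")

# Each slot occupies 2 bytes at offset index * 2.
sub cache_set { substr($cache, $_[0] * 2, 2, pack('s', $_[1])) }
sub cache_get { unpack('s', substr($cache, $_[0] * 2, 2)) }

cache_set(3, 686);
print cache_get(3), "\n";   # 686
print cache_get(4), "\n";   # -1 (still empty)
```

    With File::Map the string is simply backed by a shared anonymous mapping, so the same substr writes become visible to all workers.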

    $ time perl 1e7

    AMD 3970x, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

    1e7 13.130s  (a) original, accepts argument
    1e7 12.394s  (b) a + replaced division with >> 1
    1e7 12.261s  (c) b + removed 1 level of branching
    1e7  9.170s  (d) c + reduced loop iterations
    1e7  7.681s  (e) d + less caching

    collatz3_filemap   8.889s   1 core
    collatz3_parallel  8.982s   1 core
    collatz3_parallel  4.548s   2 cores
    collatz3_parallel  2.378s   4 cores
    collatz3_parallel  1.233s   8 cores
    collatz3_parallel  0.661s  16 cores
    collatz3_parallel  0.408s  32 cores

    Collatz(8400511) has sequence length of 686 steps
    Collatz(8865705) has sequence length of 668 steps
    Collatz(6649279) has sequence length of 665 steps
    Collatz(9973919) has sequence length of 663 steps
    Collatz(6674175) has sequence length of 621 steps
    Collatz(7332399) has sequence length of 616 steps
    Collatz(7532665) has sequence length of 616 steps
    Collatz(5649499) has sequence length of 613 steps
    Collatz(8474249) has sequence length of 611 steps
    Collatz(6355687) has sequence length of 608 steps
    Collatz(8847225) has sequence length of 606 steps
    Collatz(9533531) has sequence length of 606 steps
    Collatz(6635419) has sequence length of 603 steps
    Collatz(9953129) has sequence length of 601 steps
    Collatz(7464846) has sequence length of 598 steps
    Collatz(7464847) has sequence length of 598 steps
    Collatz(3732423) has sequence length of 597 steps
    Collatz(5598635) has sequence length of 595 steps
    Collatz(8397953) has sequence length of 593 steps
    Collatz(6298465) has sequence length of 590 steps

    Some will say, let's add cores. Some will say, let's improve the algorithm. Few might say, let's try both. It turns out that caching and parallel are both possible, which is unbelievable, TBH. Processors now come with many CPU cores. I made the time to try and retry, mainly for future Monks of the Monastery, long after I'm gone. Years ago the saying was that "IO and parallel" isn't possible. Input IO in MCE is sequential, not random.

    What I have witnessed tonight is that MetaCPAN is a treasure box: in other words, a big, gigantic box of Legos. I opened the box, picked out File::Map and MCE::Flow, and then went over to this wonderful Monastery, where I looked for Laurent_R's code.

    I tried this not knowing what to expect. This is my first time using File::Map with many workers.

    Regards, Mario

Shared DBI handle supporting threads and processes
5 direct replies — Read more / Contribute
by marioroy
on Apr 12, 2020 at 21:58

    Dearest Monks,

    Have you ever wanted to share a DBI handle?

    Let's imagine an environment that consists of 200 compute blades. Each blade has 10 CPU cores, or 20 logical cores with hyperthreading/SMT enabled. That is 4,000 (200 x 20) DB connections, which IMHO is not graceful at all. Better yet, imagine an environment with 400 compute blades of the new generation having 32 cores (64 logical cores with SMT enabled). Certainly 25,600 (400 x 64) connections may be too much for the DB to handle.

    Fortunately, there is a way. One DB connection per blade, no matter the number of CPU cores, is possible with Perl and MCE::Shared. The code that follows is based on my reply to 1nickt's elegant MCE demonstration.

    First attempt

    Creating a shared DBI handle is not a problem. Unfortunately, it does not work once statement handles enter the picture: the statement object looks at the proxy and says, wait a minute, this is not a DBI object. Ah...

    my $dbh = MCE::Shared->share(
        { module => 'DBI', new => 'connect' },
        $dsn, $user, $password, $params
    );

    Second attempt

    Another way is writing a wrapper class with the things you need. Think of MCE::Shared as a proxy server. It does nothing more than passing the method name you want to call and arguments over to the shared-manager process. Likewise, returning data on the way back.
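    The proxy idea in miniature, with a trivial counter class standing in for DBI (the class, counts, and worker layout here are made up for illustration; assumes MCE::Shared and MCE::Hobo are installed):

```perl
use strict;
use warnings;

use MCE::Hobo;
use MCE::Shared;

package My::Counter {
    sub new  { bless { n => 0 }, shift }
    sub incr { $_[0]{n}++ }
    sub get  { $_[0]{n} }
}

package main;

# One real object lives in the shared-manager process; callers hold a
# proxy that forwards method names and arguments, and returns the result.
my $counter = MCE::Shared->share( { module => 'My::Counter' } );

MCE::Hobo->create( sub { $counter->incr for 1 .. 100 } ) for 1 .. 4;
MCE::Hobo->waitall;

print $counter->get, "\n";   # 400
```

    Swap My::Counter for a class wrapping a DBI handle and you have one DB connection per machine, shared by every worker.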

    Update: Added missing 'do' method to the shared class. Also, updating a record.


    create table mytable (
        field1 integer,
        field2 varchar(24),
        field3 varchar(24),
        field4 varchar(24),
        field5 varchar(24)
    );


    use strict;
    use warnings;

    use Data::GUID;
    use DBD::Pg;
    use SQL::Abstract;
    use Tie::Cycle;

    use MCE::Loop max_workers => 4;
    use MCE::Shared;

    my $sqla = SQL::Abstract->new;
    my @cols = map { "field$_" } 1..5;

    my $ins_sql = $sqla->insert('mytable', { map { $_ => '' } @cols });
    my $sel_sql = $sqla->select('mytable', 'count(*)', { field2 => '' });
    my $upd_sql = $sqla->update('mytable', { field2 => '' }, { field2 => '' });

    #--------------------------------------------------------------------#

    package My::DBI {
        use DBI;

        sub new {
            my ( $class, $dsn, $user, $password, $params ) = @_;
            my $self = {};

            # MCE::Shared will emit the error and exit if fail to connect
            $self->{DBH}  = DBI->connect($dsn, $user, $password, $params);
            $self->{STMT} = {};

            bless $self, $class;
        }

        sub prepare_cached {
            my ( $self, $key, $sql ) = @_;
            $self->{STMT}{$key} = $self->{DBH}->prepare_cached($sql);
            1;
        }

        sub do {
            my $self = shift;
            $self->{DBH}->do(@_);
        }

        sub execute {
            my ( $self, $key ) = ( shift, shift );
            if ( my $stmt = $self->{STMT}{$key} ) {
                $stmt->execute(@_);
            }
        }

        sub fetchrow_array {
            my ( $self, $key ) = ( shift, shift );
            if ( my $stmt = $self->{STMT}{$key} ) {
                $stmt->execute(@_);
                $stmt->fetchrow_array;
            }
        }

        sub finish {
            my ( $self, $key ) = @_;
            if ( $key ) {
                $self->{STMT}{$key}->finish if $self->{STMT}{$key};
            }
            else {
                $self->{STMT}{$_}->finish for keys %{ $self->{STMT} };
            }
            return 1;
        }

        sub disconnect {
            my ( $self ) = @_;
            $self->finish;
            $self->{DBH}->disconnect;
            1;
        }
    };

    #--------------------------------------------------------------------#

    my $dsn = 'DBI:Pg:dbname=test_db;host=localhost;port=5432';

    my $sdb = MCE::Shared->share(
        { module => 'My::DBI' },
        $dsn, $ENV{USER}, undef,
        { AutoCommit => 1, RaiseError => 1, PrintError => 1 },
    );

    $sdb->prepare_cached('ins_sth', $ins_sql);
    $sdb->prepare_cached('sel_sth', $sel_sql);
    $sdb->prepare_cached('upd_sth', $upd_sql);

    mce_loop {
        my ($mce, $chunk, $chunk_id) = @_;

        for my $record ( @{$chunk} ) {
            $sdb->execute('ins_sth', @{$record});

            my $field2_old  = $record->[1];
            my $field2_new1 = Data::GUID->new->as_base64;
            my $field2_new2 = Data::GUID->new->as_base64;

            # update using a prepared statement
            $sdb->execute('upd_sth', $field2_new1, $field2_old);

            # update using the dbh handle inside the shared class
            my ($query, @bind) = $sqla->update(
                'mytable',
                { field2 => $field2_new2 },
                { field2 => $field2_new1 },
            );
            $sdb->do($query, undef, @bind);

            # pass any arguments for execute inside the shared class
            my ($count) = $sdb->fetchrow_array('sel_sth', $field2_new2);

            # count is 1 due to selecting field2 = $field2_new2
            my $msg = sprintf 'wid %s; chnk %s; ins %s; cnt %s',
                $mce->wid, $chunk_id, $record->[0], $count;

            MCE->say($msg);
        }
    } get_sample_data();

    # ^^ do not pass @{ get_sample_data() } to mce_loop
    #    it will not work if @{ [ has 1 element ] }
    #    pass the array ref instead, MCE accepts it

    MCE::Loop->finish;
    $sdb->disconnect;

    #--------------------------------------------------------------------#

    sub get_sample_data {
        tie my $value1, 'Tie::Cycle', [ 40 .. 49 ];
        return [
            map { [ $value1, map { Data::GUID->new->as_base64 } 0..3 ] } 1..1000
        ];
    }

    Add to the My::DBI class any DBI/STMT methods that your application uses. The code is straightforward, I hope. A fetchrow_array is typically preceded by an execute, so the method in the shared class handles both the execute and the fetchrow_array in one call. This is important. Likewise, be sure to pass the execute arguments when calling fetchrow_array in the application.

    Well, the wrapper class works very well. The number of CPU cores keeps increasing every couple of years, meaning that new problems emerge and so do new possibilities.

    Kind regards, Mario

Longest Collatz using MCE::Flow, Inline::C, and GCC compiler intrinsics
1 direct reply — Read more / Contribute
by marioroy
on Apr 11, 2020 at 22:52

    Dearest Monks,

    My mind has been amused with the Collatz conjecture. See 1nickt's post about obtaining the top 20 sequences. Below is code for obtaining the longest progression. Here I try a GCC compiler intrinsic to further increase performance. That went well, so I updated my prior post, adding collatz_count_c2 there.

    use strict;
    use warnings;
    use feature 'say';
    use MCE::Flow;

    use Inline C => Config =>
        CCFLAGSEX => '-O2 -fomit-frame-pointer',
        clean_after_build => 0;

    use Inline C => <<'END_OF_C_CODE';

    #include <stdlib.h>
    #include <stdint.h>

    #if defined(_WIN32)
    #define strtoull _strtoui64
    #endif

    void collatz_longest_c1( SV* _beg_n, SV* _end_n )
    {
        uint64_t beg_n, end_n, i, n, steps;
        uint64_t number = 0, highest = 0;

        Inline_Stack_Vars;

    #ifdef __LP64__
        beg_n = SvUV( _beg_n );
        end_n = SvUV( _end_n );
    #else
        beg_n = strtoull( SvPV_nolen( _beg_n ), NULL, 10 );
        end_n = strtoull( SvPV_nolen( _end_n ), NULL, 10 );
    #endif

        for ( i = end_n; i >= beg_n; i-- ) {
            n = i, steps = 0;

            /* count using the T(x) notation */
            do {
                n % 2 ? ( steps += 2, n = (3 * n + 1) >> 1 )
                      : ( steps += 1, n = n >> 1 );
            } while ( n != 1 );

            if ( steps >= highest ) {
                number = i, highest = steps;
            }
        }

        Inline_Stack_Reset;
        Inline_Stack_Push( sv_2mortal( newSVuv(number ) ) );
        Inline_Stack_Push( sv_2mortal( newSVuv(highest) ) );
        Inline_Stack_Done;
    }

    void collatz_longest_c2( SV* _beg_n, SV* _end_n )
    {
        uint64_t beg_n, end_n, i, n, steps;
        uint64_t number = 0, highest = 0;

        Inline_Stack_Vars;

    #ifdef __LP64__
        beg_n = SvUV( _beg_n );
        end_n = SvUV( _end_n );
    #else
        beg_n = strtoull( SvPV_nolen( _beg_n ), NULL, 10 );
        end_n = strtoull( SvPV_nolen( _end_n ), NULL, 10 );
    #endif

        /* based on GCC compiler intrinsics demonstration by Alex Shirley */

        for ( i = beg_n; i <= end_n; i++ ) {
            n = i, steps = 0;

            if ( n % 2 == 0 ) {
                steps += __builtin_ctz(n);   /* account for all evens */
                n     >>= __builtin_ctz(n);  /* always returns an odd */
            }

            /* when we enter we're always working on an odd number */
            do {
                n = 3 * n + 1;
                steps += __builtin_ctz(n) + 1; /* account for odd and even */
                n     >>= __builtin_ctz(n);    /* always returns an odd    */
            } while ( n != 1 );

            if ( steps > highest ) {
                number = i, highest = steps;
            }
        }

        Inline_Stack_Reset;
        Inline_Stack_Push( sv_2mortal( newSVuv(number ) ) );
        Inline_Stack_Push( sv_2mortal( newSVuv(highest) ) );
        Inline_Stack_Done;
    }

    END_OF_C_CODE

    sub collatz_longest {
        my ( $beg_n, $end_n ) = @_;
        my ( $number, $highest ) = ( 0, 0 );
        my ( $i, $n, $steps );

        for ( $i = $end_n; $i >= $beg_n; $i-- ) {
            $n = $i, $steps = 0;

            # count using the T(x) notation
            $n % 2
                ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
                : ( $steps += 1, $n = $n >> 1 )
                while $n != 1;

            $number = $i, $highest = $steps if ( $steps >= $highest );
        }

        return ( $number, $highest );
    }

    no warnings 'once';

    #*collatz = \&collatz_longest;     # choose collatz here
    #*collatz = \&collatz_longest_c1;  # using T(x) notation
    *collatz  = \&collatz_longest_c2;  # using compiler intrinsics

    my $m = shift || '1e7';
    my ( @sizes, $chunk_size );

    $chunk_size  = int( $m / MCE::Util::get_ncpu() / 80 + 1 );
    $chunk_size += 1 if $chunk_size % 2;

    mce_flow_s {
        max_workers => MCE::Util::get_ncpu(),
        chunk_size  => $chunk_size,
        gather      => \@sizes,
        bounds_only => 1,
    },
    sub {
        MCE->gather([ collatz( @{ $_[1] } ) ]);
    }, 1, $m;

    MCE::Flow->finish;

    # Output the longest progression for the initial starting number.

    my $highest = ( sort { $b->[1] <=> $a->[1] } @sizes )[ 0 ]->[ 1 ];

    say "Longest Collatz (index and value)";
    say "@$_" for (
        sort { $a->[0] <=> $b->[0] }
        grep { $_->[1] == $highest } @sizes
    )[ 0..0 ];


    The times include launching Perl, loading modules, spawning workers, reaping workers, and output (~ 0.100 seconds).

    This outputs the longest progression for the number of steps to reach 1.
    These numbers are the lowest ones with the indicated step count.
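    The T(x) counting used in the code, shown in isolation: an odd n jumps straight to (3n+1)/2 and counts as two steps in a single iteration, which halves the loop work while producing the same step total as the plain 3n+1 counting:

```perl
use strict;
use warnings;

# Step count for one number, using the T(x) shortcut:
# odd n -> (3n+1)/2, counted as two steps in one iteration.
sub collatz_steps {
    my ($n) = @_;
    my $steps = 0;
    while ( $n != 1 ) {
        $n % 2
            ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
            : ( $steps += 1, $n = $n >> 1 );
    }
    return $steps;
}

print collatz_steps(27), "\n";   # 111, the well-known count for 27
```

    The __builtin_ctz version in collatz_longest_c2 goes one better, collapsing every run of consecutive halvings into a single shift.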

    1e7 : 8400511 685

     1 core   collatz_longest     1m16.034s
              collatz_longest_c1  0m01.868s
              collatz_longest_c2  0m00.778s
     2 cores  collatz_longest     0m37.912s
              collatz_longest_c1  0m00.965s
              collatz_longest_c2  0m00.422s
     4 cores  collatz_longest     0m19.799s
              collatz_longest_c1  0m00.516s
              collatz_longest_c2  0m00.239s
     8 cores  collatz_longest     0m10.042s
              collatz_longest_c1  0m00.285s
              collatz_longest_c2  0m00.147s
    16 cores  collatz_longest     0m05.196s
              collatz_longest_c1  0m00.178s
              collatz_longest_c2  0m00.109s
    32 cores  collatz_longest     0m02.717s
              collatz_longest_c1  0m00.137s
              collatz_longest_c2  0m00.105s

    collatz_longest_c1 (Inline C), collatz_longest (Perl), 32 cores

    1e8  : 63728127     949   Inline C  0m00.738s   Perl  0m30.554s
    1e9  : 670617279    986   Inline C  0m07.198s   Perl  5m51.938s
    1e10 : 9780657630   1132  Inline C  1m17.059s
    1e11 : 75128138247  1228  Inline C 13m51.122s

    collatz_longest_c2 (Inline C), 32 cores

    1e8  : 63728127     949   Inline C  0m00.340s
    1e9  : 670617279    986   Inline C  0m03.023s
    1e10 : 9780657630   1132  Inline C  0m33.152s
    1e11 : 75128138247  1228  Inline C  6m10.355s

    I can now sleep knowing that MCE can handle this. Just be sure to use sequence generation in MCE (i.e. mce_flow_s) with the bounds_only option.

    Regards, Mario

RFC: Perl<->JSON<->YAML<->Dumper : roundtripping and possibly with unicode
4 direct replies — Read more / Contribute
by bliako
on Apr 09, 2020 at 10:55

    Here is a collection of subroutines for converting between:

    • Perl variables (nested data structures),
    • JSON strings,
    • YAML strings,
    • Data::Dumper output strings

    I really needed one just recently, after my older implementation broke because of unicode content and Data::Dumper's obsession with escaping unicode. Here is what I whipped up for my own use and anyone else's, after posting Convert JSON to Perl and back with unicode and getting pointers from haukex, kcott, an anonymous monk, and Corion, who solved (hopefully for eternity) how to make Data::Dumper's unicode escaping optional (see Corion's answer Re: Convert JSON to Perl and back with unicode).
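    For reference, here is a sketch of the Data::Dumper behaviour in question and the shape of the workaround: force the pure-Perl dumper (Useperl) and override its internal qquote string quoter. The override below is deliberately naive (it does not escape embedded quotes or backslashes), so treat it as an illustration, not a drop-in replacement:

```perl
use strict;
use warnings;
use utf8;
use Data::Dumper;
binmode STDOUT, ':encoding(UTF-8)';

my $data = { "abc-αβγ" => "χψζ-xyz" };

# Default behaviour: wide characters come out as \x{...} escapes.
my $escaped = Dumper($data);
print $escaped;

# Workaround: pure-Perl dumper + a qquote that passes non-ASCII through.
my $plain = do {
    no warnings 'redefine';
    local $Data::Dumper::Useperl = 1;
    local *Data::Dumper::qquote  = sub { "'" . $_[0] . "'" };  # naive!
    Dumper($data);
};
print $plain;
```

    The override is localized so the default escaping behaviour is restored afterwards, which is roughly what an optional 'dont-bloody-escape-unicode' flag has to do internally.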

    Because I don't leave a challenge unchallenged, here is the code, which I hope to release as a module once I have your comments and suggestions.

    And here is a test script which demonstrates usage and tests unicoded content:

    #!perl -T
    use 5.006;
    use strict;
    use warnings;
    use utf8;
    binmode STDERR, ':encoding(UTF-8)';
    binmode STDOUT, ':encoding(UTF-8)';
    binmode STDIN,  ':encoding(UTF-8)';
    # to avoid wide character in TAP output
    # do this before loading Test* modules
    use open ':std', ':encoding(utf8)';
    use Test::More;
    #use Test::Deep;
    my $num_tests = 0;
    use Data::Roundtrip;
    use Data::Dumper qw/Dumper/;
    my $abc = "abc-αβγ";
    my $xyz = "χψζ-xyz";
    my $json_string = <<EOS;
    {"$abc":"$xyz"}
    EOS
    $json_string =~ s/\s*$//;
    my $yaml_string = <<EOS;
    $abc: $xyz
    EOS
    #$yaml_string =~ s/\s*$//;
    my $perl_var = {$abc => $xyz};
    # perl2json
    my $result = Data::Roundtrip::perl2json($perl_var);
    ok(defined $result, "perl2json() called."); $num_tests++;
    ok($result eq $json_string, "perl2json() checked (got: '$result', expected: '$json_string')."); $num_tests++;
    # json2perl
    $result = Data::Roundtrip::json2perl($json_string);
    ok(defined $result, "json2perl() called."); $num_tests++;
    for (keys %$result){
    	ok(exists $perl_var->{$_}, "json2perl() key exists."); $num_tests++;
    	ok($perl_var->{$_} eq $result->{$_}, "json2perl() values are the same."); $num_tests++;
    }
    for (keys %$perl_var){
    	ok(exists $result->{$_}, "json2perl() key exists (other way round)."); $num_tests++;
    }
    # this fails:
    #cmp_deeply($perl_var, $result, "json2perl() checked (got: '".Dumper($result)."', expected: ".Dumper($perl_var).")."); $num_tests++;
    # perl2yaml
    $result = Data::Roundtrip::perl2yaml($perl_var);
    ok(defined $result, "perl2yaml() called."); $num_tests++;
    ok($result eq $yaml_string, "perl2yaml() checked (got: '$result', expected: '$yaml_string')."); $num_tests++;
    # yaml2perl
    $result = Data::Roundtrip::yaml2perl($yaml_string);
    ok(defined $result, "yaml2perl() called."); $num_tests++;
    for (keys %$result){
    	ok(exists $perl_var->{$_}, "yaml2perl() key exists."); $num_tests++;
    	ok($perl_var->{$_} eq $result->{$_}, "yaml2perl() values are the same."); $num_tests++;
    }
    for (keys %$perl_var){
    	ok(exists $result->{$_}, "yaml2perl() key exists (other way round)."); $num_tests++;
    }
    # yaml2json
    $result = Data::Roundtrip::yaml2json($yaml_string);
    ok(defined $result, "yaml2json() called."); $num_tests++;
    ok($result eq $json_string, "perl2yaml() checked (got: '$result', expected: '$json_string')."); $num_tests++;
    # json2yaml
    $result = Data::Roundtrip::json2yaml($json_string);
    ok(defined $result, "json2yaml() called."); $num_tests++;
    ok($result eq $yaml_string, "perl2yaml() checked (got: '$result', expected: '$yaml_string')."); $num_tests++;
    # perl2dump and dump2perl with unicode quoting (default Data::Dumper behaviour)
    my $adump = Data::Roundtrip::perl2dump($perl_var, {'terse'=>1});
    ok(defined $adump, "perl2dump() called."); $num_tests++;
    ok($adump=~/\\x\{3b1\}/, "perl2dump() unicode quoted."); $num_tests++;
    # dump2perl
    $result = Data::Roundtrip::dump2perl($adump);
    ok(defined $result, "dump2perl() called."); $num_tests++;
    for (keys %$result){
    	ok(exists $perl_var->{$_}, "perl2dump() and dump2perl() key exists."); $num_tests++;
    	ok($perl_var->{$_} eq $result->{$_}, "perl2dump() and dump2perl() values are the same."); $num_tests++;
    }
    for (keys %$perl_var){
    	ok(exists $result->{$_}, "perl2dump() and dump2perl() key exists (other way round)."); $num_tests++;
    }
    # perl2dump and dump2perl WITHOUT unicode quoting
    $adump = Data::Roundtrip::perl2dump($perl_var, {'terse'=>1, 'dont-bloody-escape-unicode'=>1});
    ok(defined $adump, "perl2dump() called."); $num_tests++;
    ok($adump!~/\\x\{3b1\}/, "perl2dump() unicode not quoted."); $num_tests++;
    # dump2perl
    $result = Data::Roundtrip::dump2perl($adump);
    ok(defined $result, "dump2perl() called."); $num_tests++;
    for (keys %$result){
    	ok(exists $perl_var->{$_}, "perl2dump() and dump2perl() key exists."); $num_tests++;
    	ok($perl_var->{$_} eq $result->{$_}, "perl2dump() and dump2perl() values are the same."); $num_tests++;
    }
    for (keys %$perl_var){
    	ok(exists $result->{$_}, "perl2dump() and dump2perl() key exists (other way round)."); $num_tests++;
    }
    done_testing($num_tests);

    bw, bliako

Optimizing with Caching vs. Parallelizing (MCE::Map)
9 direct replies — Read more / Contribute
by 1nickt
on Apr 05, 2020 at 11:17

    Mon cher ami Laurent_R recently blogged about his solution to the "extra credit" problem in the Perl Weekly Challenge #54. He showed a solution using memoizing, or caching, to reduce the number of repeated calculations made by a program.

    I wondered about the strategy. Obviously calculating the sequences for numbers up to 1,000,000 without some optimization would be painfully or maybe unworkably slow. But the task looks computation-intensive, so I wanted to see whether more cycles would be more beneficial than caching.

    Here is the solution presented by Laurent:

    This runs on my system pretty quickly:

    real 0m22.596s
    user 0m21.530s
    sys  0m1.045s

    Next I ran the following version using mce_map_s from MCE::Map. mce_map_s is an implementation of the parallelized map functionality provided by MCE::Map, optimized for sequences. Each worker is handed only the beginning and end of the chunk of the sequence it will process, and workers communicate amongst themselves to keep track of the overall task. When using mce_map_s, pass only the beginning and end of the sequence to process (also, optionally, the step interval and format).

    use strict;
    use warnings;
    use feature 'say';
    use Data::Dumper;
    use MCE::Map;

    my @output = mce_map_s {
        my $input  = $_;
        my $n      = $input;
        my @result = $input;

        while ( $n != 1 ) {
            $n = $n % 2 ? 3 * $n + 1 : $n / 2;
            push @result, $n;
        }

        return [ $input, scalar @result ];
    } 1, 1000000;

    MCE::Map->finish;

    @output = sort { $b->[1] <=> $a->[1] } @output;
    say sprintf('%s : length %s', $_->[0], $_->[1]) for @output[0..19];

    This program, with no caching, runs on my system about five times faster (I have a total of 12 cores):

    real 0m4.322s user 0m27.992s sys 0m0.170s

    Notably, reducing the number of workers to just two still ran the program in less than half the time of Laurent's single-process memoized version. Even running with one process, with no cache, was faster. This is no doubt due to the fact that MCE uses chunking by default. Even with one worker, the list of one million numbers was split by MCE into chunks of 8,000.
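    Chunking in miniature, in plain core Perl rather than MCE's actual implementation: the point is that each worker receives begin/end bounds, so there is one dispatch per chunk instead of one per number:

```perl
use strict;
use warnings;

# Split the range 1..100 into chunks of 8 (toy numbers for illustration).
my ( $beg, $end, $chunk ) = ( 1, 100, 8 );

my @bounds;
for ( my $lo = $beg; $lo <= $end; $lo += $chunk ) {
    my $hi = $lo + $chunk - 1;
    $hi = $end if $hi > $end;
    push @bounds, [ $lo, $hi ];   # one message per pair, not per number
}

print scalar @bounds, " chunks\n";   # 13 chunks
```

    With MCE's default chunk size of 8,000, one million inputs become only 125 dispatches, which is why even a single worker beats the naive per-item loop.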

    Next, I implemented Laurent's cache strategy, but using MCE::Shared::Hash. I wasn't really surprised that the program then ran much slower than either previous version. The reason, of course, is that this task pretty much only uses the CPU, so while throwing more cycles at it is a huge boost, sharing data among the workers (precisely because the task is almost 100% CPU-bound) only slows them down. Modern CPUs are very fast at crunching numbers.

    I was curious about how busy the cache was in this case, so I wrapped the calls to assign to and read from the hash in Laurent's program in a sub so I could count them. The wrappers look like:

    my %cache;
    my $sets = my $gets = 0;

    sub cache_has { $gets++; exists $cache{$_[0]} }
    sub cache_set { $sets++; $cache{$_[0]} = $_[1] }
    sub cache_get { $gets++; $cache{$_[0]} }

    The result:

    Sets:    659,948
    Gets: 16,261,635
    That's a lot of back and forth.

    So the moral of the story is that while caching is often useful when you are going to make the same calculations over and over, sometimes the cost of the caching exceeds the cost of just making the calculations repeatedly.

    Hope this is of interest!

Perl joke heard on television
3 direct replies — Read more / Contribute
by Anonymous Monk
on Mar 28, 2020 at 07:41
    Stacy Herbert: You know the flu is way more complicated than this corona virus. I think it's like four strands of RNA. It's so simple apparently the code for it fits on one single page, and this simple little tiny virus is taking down our hyper-complex globalized just-in-time system.

    Max Keiser: Yeah I think the COVID-19 is written in Perl, and the flu is written in C++.

    Keiser Report E1520 Gold: Problems with Exchange for Physical
Perl Automateaching -- part 1: brainstorming
2 direct replies — Read more / Contribute
by Discipulus
on Mar 15, 2020 at 13:20
    Hello monks!

    I've been away from active Perl coding since last June, but I have this idea haunting me and, being forced to stay at home like many other Europeans, maybe it's a good occasion to start coding again on this.

    The main goal is to produce a module able to ask Perl questions and evaluate the answers given by the user. The code provided by the user will not arrive on STDIN; instead I would opt for a document-based approach: the user is given a file to edit and the possibility to submit it for review, multiple times if needed.

    The module name will be something like Perl::Tutor or Camel::Tamer or Camel::Trainer or Perl::Teacher or I can use the automateaching word.. but for the moment let's assume the main object will be $tutor for brevity.

    The module will provide some general methods to build up the configuration (path to the perl executable, folder to save work..), others to ask questions and to read the user's input, but these are trivial and I don't want to bother you with such details (for the moment ;).

    The part where I want to hear from you is the judging process of the provided perl document. I imagine something like this:

    $tutor->assignement(
        question => "Create an array with 5 elements and fill it with first 5 letters of the English alphabet.\n".
                    "Then remove the first and last one elements using two perl list operator.\n".
                    "Join these two removed elements and assign the result to a scalar named \$result\n".
                    "Both the array and the scalar have to be lexically scoped.\n",
        file            => '',
        initial_content => '#nothing atm. strict and warnings and other content more on',
        hints => [
            'declare variables using "my"',
            'see shift and pop documentation'
        ],
        documentation => [
            'ions/my.html',
            'l#Perl-Functions-by-Category',
            'ions/shift.html',
            'ions/pop.html',
        ],
        solution => 'some text to provide when the task is succesfully completed and to add as comment to the resulting script',
        tests    => \@basics_tests, \@tests,
    );

    And hic sunt leones: in fact the hard part is how @tests is constructed and how the tests are run. This first meditation is a general brainstorming about which tools to use and how to build up the judging process. My ideas:
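    To make the brainstorming concrete, here is one possible shape for @tests - a minimal sketch entirely of my own invention, not from the post: each test is a hash with a name and a check coderef that receives the pupil's document as a string and returns true or false. Only core Perl is used.

    ```perl
    use strict;
    use warnings;

    # Hypothetical test-spec structure: each entry has a name and a
    # check coderef that is handed the pupil's script as plain text.
    my @basics_tests = (
        {
            name  => 'declares a lexical array',
            check => sub { my ($code) = @_; $code =~ /\bmy\s+\@\w+/ },
        },
        {
            name  => 'assigns to a lexical $result',
            check => sub { my ($code) = @_; $code =~ /\bmy\s+\$result\b/ },
        },
    );

    # Running the specs against a submitted document:
    sub run_tests {
        my ($code, @tests) = @_;
        my @failed = grep { !$_->{check}->($code) } @tests;
        return @failed;    # empty list means all tests passed
    }

    my $submission = 'my @letters = ("a" .. "e"); my $result = shift(@letters) . pop(@letters);';
    my @failed = run_tests($submission, @basics_tests);
    print "all passed\n" unless @failed;
    ```

    Plain regex checks like these are fragile, of course; the PPI-based inspection discussed below would be the robust way to implement the check coderefs.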

    1) PPI will be very useful and is the main reason for $tutor being document-oriented. This also has the plus that the user will end up with a lot of recipes to review. PPI is able to find every kind of statement inside a Perl program, for example variable declarations via PPI::Statement::Variable.
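    As an illustration of point 1, a sketch of what such an inspection could look like, assuming PPI is installed (the sample code string is made up for the example):

    ```perl
    use strict;
    use warnings;
    use PPI;

    # Parse the pupil's submission from a string (a filename works too).
    my $code = 'my @letters = ("a" .. "e"); my $result = shift @letters;';
    my $doc  = PPI::Document->new(\$code) or die "could not parse submission";

    # Every "my"/"our"/"local" declaration in the document:
    my $decls = $doc->find('PPI::Statement::Variable') || [];
    for my $decl (@$decls) {
        printf "%s declaration of: %s\n", $decl->type, join(', ', $decl->variables);
    }
    ```

    From there a check could assert, say, that exactly one array and one scalar were declared with "my".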

    2) Perl::Critic, which I must confess I didn't love, probably because I don't know it well, can be handy because, if I understand it correctly, $tutor can apply standard and custom policies to the user-provided Perl program.

    3) Simple scripts can be judged just by inspecting their output and general functionality: in this light Test::Script can be the right tool.
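    Even without Test::Script, a first cut of output-based judging can be done with core Perl alone - a sketch of my own, with a made-up one-line submission standing in for the pupil's file:

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Write the pupil's submission to a temporary file and run it,
    # capturing its output and exit status.
    my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print $fh 'print "hello\n";';
    close $fh;

    my $output = `"$^X" "$script"`;    # run it with the same perl
    my $status = $? >> 8;              # exit status of the child

    print "script ran cleanly\n"   if $status == 0;
    print "expected output seen\n" if $output eq "hello\n";
    ```

    Test::Script essentially packages this pattern (plus compile checks) into ready-made test functions.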

    4) Testing... I'd love to use the powerful Perl testing framework in its entirety, but that will be problematic since these are standalone scripts and not modules. This is a problem that has haunted me for years. More complex tasks given to the pupil can possibly be modules, but not at the start. The provided code could be copied into a temporary file and modified to be a modulino, but this seems a complicated and fragile solution. Well, PPI can be used to extract all the subs of a script and wrap the rest into a main one, but that is maybe too artificial.

    I want to hear from you about the above idea and its possible pitfalls. I still don't know how to implement it: any suggestion will be welcome! Help on Perl::Critic, PPI, and the testing implementation in the module will be very welcome!


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Code challenge: Route planning on a 2D grid
2 direct replies — Read more / Contribute
by bliako
on Mar 13, 2020 at 13:52

    There are few things more exciting than a code challenge. So here is a problem similar to the Highest total sum path problem. I say similar because the initial problem's constraints were a bit fuzzy.

    The problem is to plan an orthogonal route on a 2D rectangular grid in order to maximise point collection from the cells of the grid and minimise the distance travelled from a specified starting cell to a finishing cell. Points can be positive or negative integers (or zero). The added twist is that moving from A to B can be done in a "normal" mode where points are collected from the intermediate cells (A+1, ..., B-1). Alternatively, moving can be done in a "sliding" mode where the points on said cells are not collected (perhaps to avoid negative points which would reduce the total score). But the distance still counts.

    If people want to modify these initial rules either because they can make the problem more generic, more useful or aid the original poster in his/her quest, please feel free to make a suggestion.

    I am not sure how to post a grid here, so I will assume that all our Perls' native RNGs will produce the same sequence given the seed 42. So, here is a grid and a path (again, if people know a better way then suggest it):

    srand 42;
    my $W = 1024;
    my $H = 1024;
    my $maxscore = 10;
    my $Grid = [];
    for (my $i = 0; $i < $W; $i++) {
        $Grid->[$i] = [(0) x $H];
        for (my $j = 0; $j < $H; $j++) {
            $Grid->[$i]->[$j] = $maxscore - int(rand(2 * $maxscore + 1));
        }
    }
    # now add a highscore to stand out for just 1 cell in each column
    my $highscore = 21;
    for (my $i = 0; $i < $W; $i++) {
        $Grid->[$i]->[int(rand($H))] = $highscore;
    }
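    To make the scoring rules concrete, here is one way a candidate path could be represented and scored - a sketch under my own assumptions, not part of the challenge spec: a path is a list of [row, column, mode] steps, where mode 'normal' collects the cell's points and 'slide' skips them, and distance is simply the number of steps.

    ```perl
    use strict;
    use warnings;

    # Sketch: score a path over a grid. Each step is [$i, $j, $mode];
    # 'normal' collects the cell's points, 'slide' does not. Distance
    # counts the step either way.
    sub score_path {
        my ($grid, $path) = @_;
        my ($points, $distance) = (0, 0);
        for my $step (@$path) {
            my ($i, $j, $mode) = @$step;
            $distance++;
            $points += $grid->[$i][$j] unless $mode eq 'slide';
        }
        return ($points, $distance);
    }

    my $grid = [ [ 5, -3 ], [ 2, 21 ] ];
    my ($points, $distance) = score_path(
        $grid,
        [ [ 0, 0, 'normal' ], [ 0, 1, 'slide' ], [ 1, 1, 'normal' ] ],
    );
    print "points=$points distance=$distance\n";    # points=26 distance=3
    ```

    How the two objectives (points versus distance) are traded off against each other is exactly the part left open for discussion.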

    bw, bliako

Is Perl dead ? YAIPD Thread
2 direct replies — Read more / Contribute
by ait
on Mar 07, 2020 at 21:17

    Of course not! But for those who insist on it:

    It made my day seeing Perl so high on that chart!!

2010-2019 From Perl to ?
3 direct replies — Read more / Contribute
by jeffa
on Feb 18, 2020 at 17:29

    Greetings. It has been a long long while. I mostly stopped professional software development with Perl around 2013. Since then i mostly have worked in the DevOps field, using a number of dynamic languages to create pipelines for various teams. I did manage to accept a Perl gig in 2018 but that turned out to be quite possibly THE worst job i have ever worked. The environment was incredibly oppressive and my cow-orkers were either evil, incompetent or lethargic. Before i took that gig however, i re-discovered my love for synthesizers and electronic music and i had already acquired a large number of desktop modules, effects pedals and nice 24 track mixer. After that awful job in 2018 however, i had it in my mind to stop programming and do something else. I bought a soldering iron and a few kits. A few kits led to a few more kits including a ring modulator and delay pedal. Those kits led to errors, and to correct those errors i needed education. I bought some components and breadboards. I bought a dual power supply kit and successfully put it together without getting shocked and/or getting deathed. I read Art of Electronics and the Forrest M Mimms III field notebooks. I watched tons of Colin's Lab and EEVBlog videos on Youtube. I built my own mixer, power amps, MIDI thru boxes, a MIDI sequencer (with Arduino) and a MIDI synth, a dual distortion+delay pedal and DC to DC distribution box. All in all, this previous decade was a long slow ride to the bottom -- and once there i used it as an opportunity to learn new skills. Not sure how much Perl programming in 2020 and beyond will offer me ... but here we still are. I am going to use this opportunity to relearn computers and programming from the ground up this time. :)


    (the triplet paradiddle with high-hat)
Perl in data science: could a grant from Perl foundation be useful?
2 direct replies — Read more / Contribute
by zubenel0
on Feb 18, 2020 at 14:15

    Recently I was thinking about whether it is possible to make Perl a more attractive option for data science. I know that some great initiatives exist, like RFC: 101 Perl PDL Exercises for Data Analysis or RFC: 100 PDL Exercises (ported from numpy). On my part, I will try to write a blog post about a particular machine learning task I have chosen. Nevertheless, as Ovid wrote, falling short in the data science field is a significant drawback of Perl. How to fix this?

    What I thought of as a way to proceed could be a grant from the Perl Foundation. It could work only if it were possible to find someone interested in a project related to Perl and data science and capable of doing it. IMO one of the solutions that could help would be to write a book on how to use Perl in data science. Again, this idea is not mine, as it was mentioned on as a desire to have a new PDL book. Maybe with help from the Perl Foundation such a project could encompass even more than PDL and include several other modules suited for data science.

    Another interesting idea that I have encountered was to create a Perl/XS graphics backend, as there is a need for a graphics library which can create 2D/3D charts easily - see the comments on Unfortunately, I know very little about this, but I guess that it might be a very hard task... So these are just a couple of examples, but actually the main issue is whether it is feasible in general - to have a grant for data science using Perl? What do you think? Do you know someone who could be interested in it? Or do you think that this approach is flawed, and have some other suggestions?

Let's finish Imager::GIF
1 direct reply — Read more / Contribute
by Anonymous Monk
on Feb 06, 2020 at 16:42
    Imager::GIF - a handy module for animated GIF processing - is a nice thought, with one semi-working method and problematic documentation (Re^2: Imager::GIF seems broken). It needs some help, as the docs say:


      Implement the rest of the transformations (cropping, rotating etc).

    I needed to non-proportionally scale animated GIFs and implemented type => 'nonprop' in the scale method. Other desirable features include cropping, watermarking, and sharpening. Please share your mods and methods here.



    Your local file:

    perl -MImager::GIF -le 'for (keys %INC) { print $INC{$_} if /GIF\.pm/ }'
    My scale method:
    sub scale {
        my ($self, %args) = @_;
        my $ratio = $args{scalefactor} // 1;
        my $qtype = $args{qtype} // 'mixing';    # add qtype support
        $self->_mangle(sub {
            my $img = shift;
            my $ret = $img->scale(%args, qtype => $qtype);
            my $h = $img->tags(name => 'gif_screen_height');
            my $w = $img->tags(name => 'gif_screen_width');
            # add non-proportional scaling
            if (    $args{xpixels}
                and $args{ypixels}
                and $args{type}
                and $args{type} eq 'nonprop')
            {
                my $xratio = defined $args{xpixels} ? $args{xpixels} / $w : $ratio;
                my $yratio = defined $args{ypixels} ? $args{ypixels} / $h : $ratio;    # was / $w - the y ratio needs the height
                $ret->settag(name => 'gif_left',
                             value => int($xratio * $img->tags(name => 'gif_left')));
                $ret->settag(name => 'gif_top',
                             value => int($yratio * $img->tags(name => 'gif_top')));
                $ret->settag(name => 'gif_screen_width',  value => int($xratio * $w));
                $ret->settag(name => 'gif_screen_height', value => int($yratio * $h));
            }
            else {    # proportional scaling, from the original
                unless ($ratio) {
                    if (defined $args{xpixels}) { $ratio = $args{xpixels} / $w }
                    if (defined $args{ypixels}) { $ratio = $args{ypixels} / $h }
                }
                $ret->settag(name => 'gif_left',
                             value => int($ratio * $img->tags(name => 'gif_left')));
                $ret->settag(name => 'gif_top',
                             value => int($ratio * $img->tags(name => 'gif_top')));
                $ret->settag(name => 'gif_screen_width',  value => int($ratio * $w));
                $ret->settag(name => 'gif_screen_height', value => int($ratio * $h));
            }
            return $ret;
        });
    }
    Thank you!
Looking for testers who use Microsoft compilers
2 direct replies — Read more / Contribute
by syphilis
on Feb 03, 2020 at 06:10

    As this is essentially a request for some testing to be done, I thought "Meditations" was probably the best place for it.
    I'm not actively seeking wisdom with this post but, of course, receiving wisdom is always fine, even if it hasn't been requested ;-)

    If you have a perl that you've built using a Microsoft Compiler, I'd be most interested to learn of any problems or failures involved in running:
    cpan -i List::Uniqnum
    In fact, feel free to provide feedback for any build of perl that you have.
    The cpantesters smokers have been happily chewing on List-Uniqnum-0.04 for a couple of days, but there are very few Windows smokers out there.
    And, AFAIK, none of those smokers employ Microsoft compilers.

    Of course, Darwin and Solaris are probably also missing from those cpantesters systems - so join in with them, too .... or anything else that takes your fancy.

    I released List::Uniqnum to test changes that I want to make to the dual-life module List::Util's uniqnum() function - in order to improve that function's portability.
    A new release of List::Util (Scalar-List-Utils-1.54) hit cpan over the weekend. It still doesn't utilize the changes I was hoping would be included.
    If you run cpan -i List::Util you'll probably find that it passes all tests and installs cleanly.

    List-Util-1.54's uniqnum function actually works correctly on Linux, unless perl was built with -Duselongdouble or -Dusequadmath - in which case the test suite still passes, but only because it doesn't run tests that will reveal the problem.
    1.54 works fine on Windows, too, but again only if perl's nvtype is double.
    For it to work correctly on Windows when perl's ivtype is long long, it also requires that perl was built with __USE_MINGW_ANSI_STDIO, which only started happening with the release of 5.26.0.
    Thankfully, Strawberry Perl 5.26 onwards is built with __USE_MINGW_ANSI_STDIO defined.
    Try cpan -i List::Util on a 64-bit-integer build of Strawberry perl-5.24.0 or earlier, and you'll see a test failure.
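    For the curious, the portability problem boils down to how many significant digits are needed to round-trip an NV through a string. This little demonstration is my own, not from the module; it is meaningful on perls whose nvtype is an 8-byte double, where 17 digits are required and a printf capped at 15 digits (as pre-5.26 MinGW builds were without __USE_MINGW_ANSI_STDIO) loses information:

    ```perl
    use strict;
    use warnings;
    use Config;

    # A 64-bit double needs 17 significant decimal digits to round-trip
    # exactly; a runtime whose "%g" is capped at 15 digits can make
    # distinct doubles stringify identically.
    my $x      = 1 / 3;
    my $back17 = 0 + sprintf '%.17g', $x;    # enough digits for a double
    my $back15 = 0 + sprintf '%.15g', $x;    # what a capped printf gives you

    if ($Config{nvsize} == 8) {
        print "17 digits round-trip: ", ($back17 == $x ? "yes" : "no"), "\n";
        print "15 digits round-trip: ", ($back15 == $x ? "yes" : "no"), "\n";
    }
    ```

    On -Duselongdouble or -Dusequadmath builds even more digits are needed, which is one reason the test suite behaves differently there.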

    If you want to know what's failing with your particular installation of List::Util's uniqnum function, here is something you can run:
    use Config; # for test 5
    use strict;
    use warnings;
    use List::Util qw(uniqnum);
    #use List::Uniqnum qw(uniqnum);

    my $count;

    # test 1
    if (1.4142135623730951 != 1.4142135623730954) {
        $count = uniqnum(1.4142135623730951, 1.4142135623730954);
        print "test 1 failed (returned $count)\n" unless $count == 2;
    }

    # test 2
    if (10.770329614269008063 != 10.7703296142690080625) {
        $count = uniqnum(10.770329614269008063, 10.7703296142690080625);
        print "test 2 failed (returned $count)\n" unless $count == 2;
    }

    # test 3
    if (1005.1022829201930645202916159776901 !=
        1005.10228292019306452029161597769015) {
        $count = uniqnum(1005.1022829201930645202916159776901,
                         1005.10228292019306452029161597769015);
        print "test 3 failed (returned $count)\n" unless $count == 2;
    }

    # test 4
    $count = uniqnum(0, -0.0);
    print "test 4 failed (returned $count)\n" unless $count == 1;

    # test 5
    if ($Config{ivsize} == 8) {
        # These 2 (the first is an IV, the second is an NV)
        # both exactly represent the value 762939453127 * (2 ** 21)
        $count = uniqnum(100000000000262144, 1.00000000000262144e+17);
        print "test 5 failed (returned $count)\n" unless $count == 1;
    }
    It only announces failures. If there's no output, then everything is good.
    If you install List::Uniqnum, you can then modify the script to test List::Uniqnum.
    If you do that, and it produces some output, please let me know.
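    For context, here is my own illustration (not syphilis's code) of where the subtlety lives: a naive uniqnum keyed on numeric stringification works for easy cases, but its correctness depends entirely on how precisely the runtime stringifies floating point values - which is exactly what varies across the builds described above.

    ```perl
    use strict;
    use warnings;

    # Naive uniqnum: dedupe by the string form of the number. Fine for
    # simple cases, but values that stringify identically while differing
    # in their low-order bits will wrongly collapse - the very problem
    # List::Util's uniqnum has to solve portably.
    sub naive_uniqnum {
        my %seen;
        return grep { !$seen{ 0 + $_ }++ } @_;
    }

    my @u = naive_uniqnum(1, 1.0, 2, 3, 3);
    print scalar(@u), "\n";    # 3
    ```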

Artificial Intelligence experiment
4 direct replies — Read more / Contribute
by PerlGuy(Tom)
on Feb 03, 2020 at 00:05
    I'm not really sure why, life experience I guess, but while studying and practicing Perl programming, an idea for artificial intelligence flashed into my mind.
    Bot 2
