Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Perl slower than java (perl 12x faster than Java)

by BrowserUk (Patriarch)
on Dec 09, 2010 at 02:50 UTC ( [id://876153]=note: print w/replies, xml ) Need Help??


in reply to Perl slower than java

This produces the same results as your original, but 66 times faster. Which should make it about 12 times faster than your Java code.

Of course, if you fold many of the optimisations--which mostly come down to not doing the same thing multiple times--back into the Java code, that would run more quickly too. Iffy algorithms are iffy in any language.

#! perl -slw use strict; use List::Util qw[ sum ]; use Time::HiRes qw[ time ]; srand( 100 ); my $target = 100; my $population_size = 50; my $chromosome_size = 100; my $mutation_rate = 0.1; my $crossover_rate = 0.7; my @population = (); my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' + => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $gene_length = gene_length(); my $start = time; initialise_population(); my $generation_counter = 1; my $winner = check_for_winner(); until ($winner) { @population = regenrate_population(); $winner = check_for_winner(); $generation_counter++; } print("Solution reached in generation: $generation_counter"); print("The chromosome: @$winner"); print( get_phenotype($winner, $gene_length) . " = " . get_result($winn +er) ); printf "Took %.3f seconds\n", time()-$start; exit; sub regenrate_population { my @new_population = (); for ( 0 .. ( $population_size - 1 ) ) { $new_population[$_] = get_child( get_nonrandom_chromosome(), get_nonrandom_chromosome() ); } return (@new_population); } sub get_child { my( $chromosome1, $chromosome2 ) = @_; my $new_chromosome; if ( rand(1) < $crossover_rate ) { my $crossover_point = int( rand($chromosome_size) ); $new_chromosome = [ @{$chromosome1}[0..$crossover_point-1], @{$chromosome2}[$crossover_point..$chromosome_size-1], ]; } else { $new_chromosome = rand(1) > 0.5 ? $chromosome1 : $chromosome2 +; } $$new_chromosome[ rand($chromosome_size) ] ^= 0 if rand(1) < $muta +tion_rate; return $new_chromosome; } sub check_for_winner { get_result($_) == $target and return $_ for @population } sub get_fitness_score { my $chromosome = shift(@_); my $result = get_result($chromosome); my $fitness_score = ( $target / ( $target + abs( $target - $result ) ) ) * 3; } sub get_nonrandom_chromosome { my @scores = map { get_fitness_score($_) } @population; my $rulet_position = rand( sum @scores ); my $temp_score = 0; foreach my $i ( 0 .. $#scores ) { $temp_score += $scores[ $i ]; return $population[ $i ] if $temp_score > $rulet_position; } } my %memo; sub get_result { return $memo{ $_[0] } //= evalExpr( get_phenotype( $_[0], $gene_le +ngth ) ); } sub initialise_population { for my $chromosome ( 0 .. $population_size ) { for my $nucleotide ( 0 .. $chromosome_size ) { $population[$chromosome]->[$nucleotide] = int( rand(1) + 0 +.5 ); } } } sub get_phenotype { my( $chromosome, $len ) = @_; my $ep = join'', map { $genome{ $_ } // ''; } unpack "(a$len)*", join'', @$chromosome; $ep =~ s[^\D+][]; $ep =~ s[(\d)(\d+)][$1]g; $ep =~ s[(\D)(\D+)][$1]g; $ep =~ s[\D+$][]; return $ep; } sub gene_length { my $len = length( each %genome ); while( my $key = each %genome ) { die "Invalid genotype" unless length( $key ) == $len; } return $len; } sub evalExpr { local $_ = shift; s[(?<=[^*/+-])([*/+-])][<$1>]g; 1 while s[([^>]+)<([*/])>([^<]+)][$2 eq '*' ? $1 * $3 : $1 / $3]e +; 1 while s[([^>]+)<([+-])>([^<]+)][$2 eq '+' ? $1 + $3 : $1 - $3]e +; return $_; } __END__ Solution reached in generation: 5 The chromosome: 0 0 0 0 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 0 1 1 1 1 1 0 +0 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 + 1 1 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0 +1 0 1 0 9*4*3-8 = 100 Took 7.711 seconds Solution reached in generation: 5 The chromosome: 0 0 0 0 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 0 1 1 1 1 1 0 +0 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 + 1 1 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0 +1 0 1 0 9*4*3-8 = 100 Took 0.115 seconds

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Perl slower than java (perl 12x faster than Java)
by Christian888 (Acolyte) on Dec 10, 2010 at 22:45 UTC

    I could not follow all of your code, but it seems it always produces the same solution, which may mean that it does not do what it is supposed to.

    Anyhow, taking on all the advice I got here (I have to admit I'm impressed with the answers I got - Thanks guys) I optimised the code and the current version runs over 100 times faster than the first one!!! I am very pleased with it actually :-)

    Having said that, I just cannot see why would one generation in the first version take about 2.5 seconds (on my machine) and the current one only 25 milliseconds. It doesn't seem that the new version does 100 times less calculation.

    It would be nice if someone opened my eyes. Anyway... Here is the latest optimised code. Thanks

    Christian

    use strict; use warnings; use List::Util qw[ sum ]; use Time::HiRes qw[ time ]; #use Math::Complex; #use Win32; my $total_time = time(); my $target = 100; my $population_size = 100; my $chromosome_size = 100; my $mutation_rate = 0.1; my $crossover_rate = 0.7; my @population = (); my @phenotypes = (); my @results = (); my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $gene_length = get_gene_length( \%genome ); my $duration; my $generation_counter = 1; my $winner; do { my $generation_time = time(); @population = @{ regenrate_population( \@population ) }; $winner = check_for_winner( \@population ); printf( "Generation $generation_counter took %.3f seconds to compl +ete\n", time() - $generation_time ); $generation_counter++; } until ($winner); print("Solution reached in generation: $generation_counter \n"); print("The chromosome: @{$winner->{chromosome}} \n"); print( $winner->{phenotype} . " = " . $winner->{result} . " \n" ); printf( "Total time to reach solution %.3f seconds \n", time() +- $total_time ); sub regenrate_population { my $old_population = shift(@_); my $new_population = []; if ( $old_population->[0] ) { my $population_fitness_score=get_population_fitness_score($old +_population); for my $individual ( 0 .. ($population_size) ) { my $chromosome1 = get_nonrandom_chromosome($old_population +,$population_fitness_score); my $chromosome2 = get_nonrandom_chromosome($old_population +,$population_fitness_score); $new_population->[$individual]->{chromosome} = get_child( $chromosome1, $chromosome2 ); $new_population->[$individual]->{phenotype} = get_phenotype( $new_population->[$individual]->{chromoso +me} ); $new_population->[$individual]->{result} = get_result( $new_population->[$individual]->{phenotype} +); $new_population->[$individual]->{fitness_score} = get_fitness_score( $new_population->[$individual]->{resu +lt} ); } } else { for my $individual ( 0 .. $population_size ) { for my $nucleotide ( 0 .. $chromosome_size ) { $new_population->[$individual]->{chromosome}->[$nucleo +tide] = int( rand(1) + 0.5 ); } $new_population->[$individual]->{phenotype} = get_phenotype( $new_population->[$individual]->{chromoso +me} ); $new_population->[$individual]->{result} = get_result( $new_population->[$individual]->{phenotype} +); $new_population->[$individual]->{fitness_score} = get_fitness_score( $new_population->[$individual]->{resu +lt} ); } } return ($new_population); } sub check_for_winner { my $winner; my $population = shift(@_); foreach my $individual (@$population) { if ( $individual->{result} == $target ) { $winner = $individual; last; } } return ($winner); } sub get_nonrandom_chromosome { my $population=shift(@_); my $population_fitness_score = shift(@_); my $rulet_position = rand($population_fitness_score); my $temp_score = 0; my $nonrandom_chromosome; foreach my $individual (@$population) { $temp_score += $individual->{fitness_score}; if ( $temp_score > $rulet_position ) { $nonrandom_chromosome = $individual->{chromosome}; last; } } return ($nonrandom_chromosome); } sub get_child { my $chromosome1 = shift(@_); my $chromosome2 = shift(@_); my $new_chromosome; if ( rand(1) < $crossover_rate ) { my $crossover_point = int( rand($chromosome_size) ); $new_chromosome = [ @$chromosome1[ 0 .. $crossover_point ], @$chromosome2[ ( $crossover_point + 1 ) .. ($chromosome_si +ze) ] ]; } else { if ( rand(1) > 0.5 ) { $new_chromosome = $chromosome1; } else { $new_chromosome = $chromosome2; } } if ( rand(1) < $mutation_rate ) { my $nucleotide_pos = int( rand($chromosome_size) ); if ( $$new_chromosome[$nucleotide_pos] ) { $$new_chromosome[$nucleotide_pos] = 0; } else { $$new_chromosome[$nucleotide_pos] = 1; } } return ($new_chromosome); } sub get_population_fitness_score { my $population = shift(@_); my $population_fitness_score = sum( map { $_->{fitness_score} } (@$population) ); return ($population_fitness_score); } sub get_result { my $phenotype = shift(@_); my $result = eval($phenotype) ; return ($result); } sub get_fitness_score { my $result = shift(@_); my $fitness_score = ( $target / ( $target + abs( $target - $result ) ) ) * 3; return ($fitness_score); } sub get_phenotype { my $chromosome = shift(@_); my @phenotype = (); my @expressed_phenotype = (); my @nucleotides = @$chromosome; my @gene = (); my $pattern = q(\d); foreach my $nucleotide (@nucleotides) { if ( scalar(@gene) >= $gene_length ) { my $gene = join( "", @gene ); @gene = ($nucleotide); if ( defined( $genome{$gene} ) ) { push( @phenotype, $genome{$gene} ); } } else { push( @gene, $nucleotide ); } } foreach my $item (@phenotype) { if ( $item =~ m/$pattern/ ) { push( @expressed_phenotype, $item ); if ( $pattern eq q(\d) ) { $pattern = q(\D); } else { $pattern = q(\d); } } } if ( $expressed_phenotype[$#expressed_phenotype] =~ m/\D/ ) { pop(@expressed_phenotype); } return ( join( "", @expressed_phenotype ) ); } sub get_gene_length { my $genome = shift(@_); my @gls = (); foreach my $key ( keys(%$genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }
      but it seems it always produces the same solution, which may mean that it does not do what it is supposed to.

      Hm. At the top of the code I posted you'll see a line srand(100);, this seeds the random number generator so that it will always produce the same sequence of numbers. It's purpose is to cause the program to follow the same sequence of steps each run so that modifications to code can be compared for accuracy and timing.

      The very essence of optimisation process is that you must ensure a) that you don't break anything; b) that you are comparing like with like. Artificially forcing the program to go through the same steps each run does that.

      Once tested and demonstrated, simply delete that line and it will generate truly random solutions each time per your original. I left it in so you could test the accuracy and performance for yourself.

      Having said that, I just cannot see why would one generation in the first version take about 2.5 seconds (on my machine) and the current one only 25 milliseconds. It doesn't seem that the new version does 100 times less calculation.

      For the most part, that is exactly what my optimisations do. They simple avoid doing the same calculation more than once.

      As an example, in your get_phenotype() sub you have:

      my $gene_length = gene_length();

      which means that you call this subroutine over and over:

      sub gene_length { my @gls = (); foreach my $key ( keys(%genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }

      Which iterates %genome pushing the lengths of its keys to an array; then sorts those lengths; then checks that the first and last are the same.

      In my version of gen_length(), I avoid this O(n)+O(n log n) process by comparing the length of the first key to each of the others, which is just O(n):

      sub gene_length { my $len = length( each %genome ); while( my $key = each %genome ) { die "Invalid genotype" unless length( $key ) == $len; } return $len; }

      But far more significantly, as the lengths of the keys in %genome never vary during the life of the program, I call it only once at the start of the program:

      my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' + => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $gene_length = gene_length();

      As get_phenotype() is called many thousands of time, you are repeating the get_length() processing many thousands of times for no purpose. And your original code is rife with similar things.

      Another example:

      sub get_nonrandom_chromosome { ### get_population_fitness_score() ### ### calculates the score for every chromosome ### my $population_fitness_score = get_population_fitness_score(); my $rulet_position = rand($population_fitness_score); my $temp_score = 0; my $nonrandom_chromosome; foreach my $chromosome (@population) { ### you then REcalculate them individually here ### $temp_score += get_fitness_score($chromosome); if ( $temp_score > $rulet_position ) { $nonrandom_chromosome = $chromosome; last; } } return ($nonrandom_chromosome); }

      In my version, I only calculate the scores once and reuse them:

      sub get_nonrandom_chromosome { my @scores = map { get_fitness_score($_) } @population; my $rulet_position = rand( sum @scores ); my $temp_score = 0; foreach my $i ( 0 .. $#scores ) { $temp_score += $scores[ $i ]; return $population[ $i ] if $temp_score > $rulet_position; } }

      But the single biggest saving comes in get_result():

      my %memo; sub get_result { return $memo{ $_[0] } //= evalExpr( get_phenotype( $_[0], $gene_le +ngth ) ); }

      For each iteration of the main until( $winner ) loop, most of the chromosomes in the population will not have changed, but your algorithm calls for their results to be recalculated each time.

      The use of %memo above avoids the costly process of reconstructing and evaling the phenotype over and over by storing it the first time it is calculated and returning the stored value whenever it is called for again. I also dodged the notoriously slow eval code by using my own simple four function expression evaluator:

      sub evalExpr { local $_ = shift; s[(?<=[^*/+-])([*/+-])][<$1>]g; 1 while s[([^>]+)<([*/])>([^<]+)][$2 eq '*' ? $1 * $3 : $1 / $3]e +; 1 while s[([^>]+)<([+-])>([^<]+)][$2 eq '+' ? $1 + $3 : $1 - $3]e +; return $_; }

      This is less efficient than it could be because it was necessary to follow the same rules of precedence that Perl's eval does in order to ensure my code produced the same results as your original.

      If however, per the web page linked from your OP, the operations were evaluated in strict left to right order, that could be rather more efficient than it currently is. But, from experience, if an optimised solution doesn't produce exactly the same results as the original, even if the original is in error according to the spec., then the optimisations will be dismissed.

      And that hides another limitation I imposed on myself. In order to ensure that the output of my version matched yours, it was necessary to ensure that rand was called exactly the same number of times in the same sequence. This prevented me employing many further optimisations.

      For example. You've opted to store and manipulate your chromosomes as strings of ascii 0s and 1s. This is easy to work with, but uses a full byte per bit. Moving to using bit strings opens up a whole gamut of further optimisations--both space and time--that add up to another order of magnitude, perhaps two, of performance. But as you seem happy with what you now have, perhaps I'll get around to posting that as a meditation some day.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://876153]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (4)
As of 2024-04-19 11:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found