Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
but it seems it always produces the same solution, which may mean that it does not do what it is supposed to.

Hm. At the top of the code I posted you'll see a line srand(100);, this seeds the random number generator so that it will always produce the same sequence of numbers. It's purpose is to cause the program to follow the same sequence of steps each run so that modifications to code can be compared for accuracy and timing.

The very essence of optimisation process is that you must ensure a) that you don't break anything; b) that you are comparing like with like. Artificially forcing the program to go through the same steps each run does that.

Once tested and demonstrated, simply delete that line and it will generate truly random solutions each time per your original. I left it in so you could test the accuracy and performance for yourself.

Having said that, I just cannot see why would one generation in the first version take about 2.5 seconds (on my machine) and the current one only 25 milliseconds. It doesn't seem that the new version does 100 times less calculation.

For the most part, that is exactly what my optimisations do. They simple avoid doing the same calculation more than once.

As an example, in your get_phenotype() sub you have:

my $gene_length = gene_length();

which means that you call this subroutine over and over:

sub gene_length { my @gls = (); foreach my $key ( keys(%genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }

Which iterates %genome pushing the lengths of its keys to an array; then sorts those lengths; then checks that the first and last are the same.

In my version of gen_length(), I avoid this O(n)+O(n log n) process by comparing the length of the first key to each of the others, which is just O(n):

sub gene_length { my $len = length( each %genome ); while( my $key = each %genome ) { die "Invalid genotype" unless length( $key ) == $len; } return $len; }

But far more significantly, as the lengths of the keys in %genome never vary during the life of the program, I call it only once at the start of the program:

my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' + => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $gene_length = gene_length();

As get_phenotype() is called many thousands of time, you are repeating the get_length() processing many thousands of times for no purpose. And your original code is rife with similar things.

Another example:

sub get_nonrandom_chromosome { ### get_population_fitness_score() ### ### calculates the score for every chromosome ### my $population_fitness_score = get_population_fitness_score(); my $rulet_position = rand($population_fitness_score); my $temp_score = 0; my $nonrandom_chromosome; foreach my $chromosome (@population) { ### you then REcalculate them individually here ### $temp_score += get_fitness_score($chromosome); if ( $temp_score > $rulet_position ) { $nonrandom_chromosome = $chromosome; last; } } return ($nonrandom_chromosome); }

In my version, I only calculate the scores once and reuse them:

sub get_nonrandom_chromosome { my @scores = map { get_fitness_score($_) } @population; my $rulet_position = rand( sum @scores ); my $temp_score = 0; foreach my $i ( 0 .. $#scores ) { $temp_score += $scores[ $i ]; return $population[ $i ] if $temp_score > $rulet_position; } }

But the single biggest saving comes in get_result():

my %memo; sub get_result { return $memo{ $_[0] } //= evalExpr( get_phenotype( $_[0], $gene_le +ngth ) ); }

For each iteration of the main until( $winner ) loop, most of the chromosomes in the population will not have changed, but your algorithm calls for their results to be recalculated each time.

The use of %memo above avoids the costly process of reconstructing and evaling the phenotype over and over by storing it the first time it is calculated and returning the stored value whenever it is called for again. I also dodged the notoriously slow eval code by using my own simple four function expression evaluator:

sub evalExpr { local $_ = shift; s[(?<=[^*/+-])([*/+-])][<$1>]g; 1 while s[([^>]+)<([*/])>([^<]+)][$2 eq '*' ? $1 * $3 : $1 / $3]e +; 1 while s[([^>]+)<([+-])>([^<]+)][$2 eq '+' ? $1 + $3 : $1 - $3]e +; return $_; }

This is less efficient than it could be because it was necessary to follow the same rules of precedence that Perl's eval does in order to ensure my code produced the same results as your original.

If however, per the web page linked from your OP, the operations were evaluated in strict left to right order, that could be rather more efficient than it currently is. But, from experience, if an optimised solution doesn't produce exactly the same results as the original, even if the original is in error according to the spec., then the optimisations will be dismissed.

And that hides another limitation I imposed on myself. In order to ensure that the output of my version matched yours, it was necessary to ensure that rand was called exactly the same number of times in the same sequence. This prevented me employing many further optimisations.

For example. You've opted to store and manipulate your chromosomes as strings of ascii 0s and 1s. This is easy to work with, but uses a full byte per bit. Moving to using bit strings opens up a whole gamut of further optimisations--both space and time--that add up to another order of magnitude, perhaps two, of performance. But as you seem happy with what you now have, perhaps I'll get around to posting that as a meditation some day.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^3: Perl slower than java (perl 12x faster than Java) by BrowserUk
in thread Perl slower than java by Christian888

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (8)
As of 2024-03-28 10:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found