Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: Longest repeated string...

by glasswalk3r (Friar)
on Feb 03, 2006 at 16:25 UTC ( [id://527713]=note: print w/replies, xml ) Need Help??


in reply to Longest repeated string...

I had taken a chance of reading the weblink that you wrote but after 30 seconds I gave up... so I will try to give only general advices here.

First of all, don't do this:

$ ./test < textfile | sort | uniq

Perl has more then enough tools to do the job that is done with sort and uniq programs. System calls are expensive and sometimes the speed of those programs doesn't pay for the cost of invoking them.

To read the file, use open function. If the file is not that big, you can read it entirely and put into memory like this:

open(IN,"<$file") or die "Cannot read $file: $!\n"; my @content = <IN>; close(IN);

This will speed up things than using while block.

Use an array to keep the results from the string:

$results[0]++ if ( $digit == 0 );

You can even avoid using if statement to do that. For the next string to process, do a @results = () to start again with zeros.

Try as much as you can to avoid using next loops with for. Look for the Schwartzian Transform to see how to improve your code. Try using @sequence = split( //, $sequence ) instead of a other loop.

And last but not least, check the Benchmark module to test all those things.

Alceu Rodrigues de Freitas Junior
---------------------------------
"You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill

Replies are listed 'Best First'.
Re^2: Longest repeated string...
by Limbic~Region (Chancellor) on Feb 03, 2006 at 17:33 UTC
    glasswalk3r,
    First of all, don't do this:
    $ ./test < textfile | sort | uniq
    Perl has more then enough tools to do the job that is done with sort and uniq programs. System calls are expensive and sometimes the speed of those programs doesn't pay for the cost of invoking them.

    I think you are making the mistake of repeating what you have heard others say without really understanding it yourself. In this particular case, sort and uniq are likely compiled C programs optimized for a single task and are far superior to Perl. While system calls can be expensive - it is just not the case here.

    open(IN,"<$file") or die "Cannot read $file: $!\n"; my @content = <IN> +; close(IN); close(IN);
    This will speed up things than using while block.

    Well it may speed things up at the expense of memory. I do not know how many lines are in the file but if individual strings are 9 million characters this may definately be the wrong way to go. You still need to loop through the array so it is not going to avoid the need to loop. The speed savings come in from disk I/O.

    Try as much as you can to avoid using next loops with for. Look for the Schwartzian Transform to see how to improve your code. Try using @sequence = split( //, $sequence ) instead of a other loop.

    I am not exactly sure why you think using next inside a for loop is a bad thing. If it is possible to eliminate those loops prior to entering the loop then it is advantageous because you don't have a conditional every loop. That is seldom the case. The ST is used to speed up sorting routines when the comparison of 2 elements is expensive. This looks out of place in the context of the rest of what you said so you should probably be sure to explain why what you are saying has relavence.

    Finally, the real problem here is the numbers involved. Using a brute force algorithm, no matter how well it is tuned, to find the longest common substring of a 9 million digit number is going to be extremely slow. If you are interested in the math I will be happy to provide it.

    Cheers - L~R

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://527713]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (5)
As of 2024-04-25 06:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found