Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^2: Longest repeated string...

by Limbic~Region (Chancellor)
on Feb 03, 2006 at 17:33 UTC ( [id://527738]=note: print w/replies, xml ) Need Help??


in reply to Re: Longest repeated string...
in thread Longest repeated string...

glasswalk3r,
First of all, don't do this:
$ ./test < textfile | sort | uniq
Perl has more then enough tools to do the job that is done with sort and uniq programs. System calls are expensive and sometimes the speed of those programs doesn't pay for the cost of invoking them.

I think you are making the mistake of repeating what you have heard others say without really understanding it yourself. In this particular case, sort and uniq are likely compiled C programs optimized for a single task and are far superior to Perl. While system calls can be expensive - it is just not the case here.

open(IN,"<$file") or die "Cannot read $file: $!\n"; my @content = <IN> +; close(IN); close(IN);
This will speed up things than using while block.

Well it may speed things up at the expense of memory. I do not know how many lines are in the file but if individual strings are 9 million characters this may definately be the wrong way to go. You still need to loop through the array so it is not going to avoid the need to loop. The speed savings come in from disk I/O.

Try as much as you can to avoid using next loops with for. Look for the Schwartzian Transform to see how to improve your code. Try using @sequence = split( //, $sequence ) instead of a other loop.

I am not exactly sure why you think using next inside a for loop is a bad thing. If it is possible to eliminate those loops prior to entering the loop then it is advantageous because you don't have a conditional every loop. That is seldom the case. The ST is used to speed up sorting routines when the comparison of 2 elements is expensive. This looks out of place in the context of the rest of what you said so you should probably be sure to explain why what you are saying has relavence.

Finally, the real problem here is the numbers involved. Using a brute force algorithm, no matter how well it is tuned, to find the longest common substring of a 9 million digit number is going to be extremely slow. If you are interested in the math I will be happy to provide it.

Cheers - L~R

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://527738]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (7)
As of 2024-04-24 10:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found