PerlMonks  

RE: Substring Finding/Counting

by nuance (Hermit)
on Aug 30, 2000 at 20:49 UTC ( [id://30334]=note )


in reply to Substring Finding/Counting

When you first mentioned this in the chatterbox, you asked if anyone had any suggestions to improve it. So here goes, FWIW.

You are writing your matches to a temporary file and then reading them in again to construct a hash. That means that any substring occurring more than once gets written to your data file again and again, once for nearly every line it appears in. For instance, say that token you mentioned is in every line of your (let's say) 500-record file. Then its first entry in the text file says 500 occurrences, the next says 499, and so on down to 2 occurrences. You don't get an entry that says one, but you will have checked for it.

If, instead of writing to that file, you built the hash as you process the file, then right at the top of the loop you could just check whether the substring already exists in it. If it does, don't bother checking any further: you've already found all of its matches. For the example I gave, this equates to leaving out 124750 checks (499 + 498 + ... + 1), and that's just for one pattern.

The modified loop looks like this:

    do {
        my $p = $packets[0];
        foreach my $l ($level .. length($p)) {
            foreach my $pos (0 .. length($p) - $l) {
                my $str = substr($p, $pos, $l);

                # If we've already found this string somewhere else,
                # skip this iteration.
                next if exists $all{$str};

                my $num = 0;
                for (0 .. $#packets) {
                    if ($l <= length($packets[$_])) {
                        pos($packets[$_]) = 0;
                        # \Q...\E so that any regex metacharacters
                        # in $str are matched literally
                        while ($packets[$_] =~ /\Q$str\E/g) {
                            $num++;
                        }
                    }
                }
                # No second exists() test is needed here; the
                # 'next' above already skipped known strings.
                $all{$str} = $num unless $num < $threshold;
            }
        }
        shift(@packets);
    } while ($#packets >= 0);
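If you want to see the saving for yourself, here is a rough, self-contained sketch of the same idea on a tiny made-up data set. The sample lines, the `count_substrings` helper, its `$use_skip` flag and the `$scans` counter are all my own inventions for illustration, and I am assuming `$level` is the minimum substring length and `$threshold` the minimum count worth keeping; your real script will differ.

```perl
use strict;
use warnings;

# Hypothetical sample data and settings; the real @packets, $level and
# $threshold come from the original script.
my @lines     = ('abcTOKxy', 'qTOKrs', 'TOKzzz', 'mnTOK');
my $level     = 3;    # assumed: minimum substring length
my $threshold = 2;    # assumed: minimum count worth keeping

# Count every substring of the first packet across the packets still in
# the list, then drop that packet and repeat. With $use_skip set,
# substrings already recorded in %all are not counted again.
sub count_substrings {
    my ($use_skip, @packets) = @_;
    my %all;
    my $scans = 0;    # how many packet scans we actually performed

    while (@packets) {
        my $p = $packets[0];
        for my $l ($level .. length($p)) {
            for my $pos (0 .. length($p) - $l) {
                my $str = substr($p, $pos, $l);
                next if $use_skip && exists $all{$str};

                my $num = 0;
                for my $packet (@packets) {
                    next if $l > length($packet);
                    $scans++;
                    # \Q...\E matches $str literally
                    $num++ while $packet =~ /\Q$str\E/g;
                }
                # Keep only the first (complete) count for a string.
                $all{$str} = $num
                    if $num >= $threshold && !exists $all{$str};
            }
        }
        shift @packets;
    }
    return (\%all, $scans);
}

my ($with_skip, $scans_skip) = count_substrings(1, @lines);
my ($no_skip,   $scans_full) = count_substrings(0, @lines);

printf "TOK found %d times\n", $with_skip->{TOK};
printf "packet scans: %d with the skip, %d without\n",
    $scans_skip, $scans_full;
```

Both runs end up with the same hash, but with four lines that all contain TOK the skip saves 3 + 2 + 1 = 6 scans for that one pattern, which is the same triangle of wasted work as the 499 + 498 + ... + 1 in the 500-line example.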

Nuance
