Re: Comparing pattern

by bv (Friar)
on Sep 18, 2009 at 14:44 UTC (#796130)

in reply to Comparing pattern

Look at study, especially if you have a lot of patterns you are matching against. Also, use the 3-argument form of open whenever possible (though that won't make your program any faster). If you plan to extend this to multiple files, you should precompile your regexps with qr//. Also, take a look at setting local $/ for file slurping, which will be faster than reading lines and joining them.
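A minimal sketch of how the slurping, 3-argument open, and qr// suggestions might fit together. The helper names slurp_file and compile_patterns, the sample patterns, and the temporary file are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Slurp a whole file in one read by locally undefining the input
# record separator $/ (slurp_file is a hypothetical helper name).
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "$path: $!\n";   # 3-argument open, lexical handle
    local $/;                                        # undef: <$fh> returns the whole file
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Join the patterns and compile the alternation once with qr//,
# so the same compiled regex can be reused for every file.
sub compile_patterns {
    my (@patterns) = @_;
    my $alternation = join '|', @patterns;
    return qr/($alternation)/is;
}

# Demo on a temporary file, so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\nYou have WON a prize\nlast line\n";
close($tmp_fh);

my $regex = compile_patterns('won a prize', 'wire transfer');
my $text  = slurp_file($tmp_name);
print "$tmp_name\n$1\n" if $text =~ $regex;
```

Because local $/ is scoped to the sub, the record separator goes back to its previous value as soon as slurp_file returns, so line-by-line reads elsewhere are not affected.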

print pack("A25",pack("V*",map{1919242272+$_}(34481450,-49737472,6228,0,-285028276,6979,-1380265972)))

Re^2: Comparing pattern
by mrc (Sexton) on Sep 18, 2009 at 16:11 UTC
    Thanks for the reply! I want to use this subroutine to scan many files. With only a few patterns, the speed increases considerably. I need to find a way to load all the patterns in one shot, without that loop. I'm still learning.
Re^2: Comparing pattern
by mrc (Sexton) on Sep 21, 2009 at 09:08 UTC
    Look at study, especially if you have a lot of patterns you are matching against.
    I have added study; in both places, but I can't see any difference when scanning multiple files using this subroutine. Please advise.
    #!/usr/bin/perl -w
    use strict;

    my $patterns = "/path/to/patterns.txt";
    my $arg1 = shift;

    open (PAT, '<', $patterns) or die "$patterns: $!\n";
    my @patterns = <PAT>;

    study;
    close(PAT);
    chomp @patterns;
    my $regex_string = join '|', @patterns;

    open( FILE, "<", "$arg1") or die "$arg1: $!\n";
    $_ = do { local $/; <FILE> };
    study;
    close(FILE);

    if ( /($regex_string)/is ) {print "\n$arg1\n$1\n";}

      Did you read the documentation on study?

      study attempts to make matches against a string more efficient, but incurs a one-time penalty for the time spent studying the string. It is most beneficial when you are doing many matches against a single string. You should benchmark to determine if you are getting any benefit from study. The first study in your code (line 10) is unnecessary, since you don't have a string in $_ to match against.
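One way to run that benchmark is with the core Benchmark module. This is only a sketch: the sample text, the patterns, and the count_matches helper are invented for illustration, and the iteration count is kept small so it finishes quickly. Note that perldoc documents study as a no-op since Perl 5.16, so on a recent perl the two timings should come out about the same.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Made-up sample data: one long string and a handful of precompiled patterns.
my $text     = join ' ', ('lorem ipsum dolor sit amet') x 2000, 'wire transfer';
my @patterns = map { qr/$_/i } ('won a prize', 'wire transfer', 'lottery', 'urgent reply');

# Count how many patterns match the text, optionally studying the string first.
sub count_matches {
    my ($use_study) = @_;
    my $copy = $text;
    study $copy if $use_study;
    my $hits = 0;
    for my $re (@patterns) {
        $hits++ if $copy =~ $re;
    }
    return $hits;
}

# Run each variant a fixed number of times; timethese prints the timings.
my $results = timethese(200, {
    with_study    => sub { count_matches(1) },
    without_study => sub { count_matches(0) },
});
```

For real numbers, swap in one of your own files and your real pattern list, and raise the iteration count until Benchmark stops warning about too few iterations.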

      You keep saying "subroutine." Is this really in a sub? If so, are you reading in your patterns every time the sub is run? There's a major inefficiency. And once you solve that one, you can look at precompiling your expressions like I originally suggested.

      print pack("A25",pack("V*",map{1919242272+$_}(34481450,-49737472,6228,0,-285028276,6979,-1380265972)))
        Yes, it is a subroutine. I need this script to scan all files for scams or other abuses. I'm using File::Find to search all files under a directory tree and then call the subroutine for each file. The patterns are outside the sub.
        Based on your suggestion, I will precompile this way:
        ...
        my $list_regex = join '|', @patterns;
        my $regex_string = qr/$list_regex/is;
        ...
        if (/($regex_string)/) {print "\n$arg1\n$1\n";}
        As for study, I noticed it made things slightly slower. Maybe it's not efficient in my case.
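For reference, here is a rough sketch of how the pieces described so far could fit together: the patterns are joined and compiled once, and File::Find calls the scanning sub for each file. The hard-coded patterns, the scan_file name, the @hits array, and the temporary directory are assumptions made so the example runs on its own; the real script reads its patterns from a file.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Patterns are joined and compiled a single time, outside the sub.
# (Hard-coded here for illustration only.)
my @patterns   = ('won a prize', 'wire transfer');
my $list_regex = join '|', @patterns;
my $regex      = qr/($list_regex)/is;

my @hits;   # [file name, matched text] pairs collected by the callback

# Called by File::Find once per directory entry.
sub scan_file {
    my $path = $File::Find::name;
    return unless -f $path;
    open(my $fh, '<', $path) or do { warn "$path: $!\n"; return };
    my $content = do { local $/; <$fh> };   # slurp the whole file
    close($fh);
    push @hits, [$path, $1] if defined $content && $content =~ $regex;
}

# Build a tiny directory tree so the sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);
my %samples = (
    'clean.txt' => "nothing to see here\n",
    'scam.txt'  => "Please send a Wire Transfer today\n",
);
for my $name (sort keys %samples) {
    open(my $out, '>', "$dir/$name") or die "$dir/$name: $!\n";
    print {$out} $samples{$name};
    close($out);
}

# no_chdir keeps $File::Find::name usable as a path from the current directory.
find({ wanted => \&scan_file, no_chdir => 1 }, $dir);
print "\n$_->[0]\n$_->[1]\n" for @hits;
```

The point of the layout is that nothing pattern-related happens inside scan_file: it only opens, slurps, and matches against the already compiled regex.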

        I still have a big problem. Graff helped me with file slurping, and the scanner now runs a few times faster than my original script, but I don't have experience with $/ or $_. If you check my last example, the global $1 contains the entire text from the first pattern to the second pattern:
        pattern1 some text pattern2
        instead of this match:
        pattern1.*pattern2
        Can you please give me some advice? Thank you!
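A possible approach, assuming the goal is to print the pattern that matched rather than the text it matched: a capture group such as $1 always holds the matched text, and with the /s flag the dot also matches newlines, so pattern1.*pattern2 can span many lines. To report the pattern itself, one option is to keep each pattern's source string next to its compiled form and test them one at a time. The names @sources, @compiled, and first_match below are hypothetical, and the sample text is invented.

```perl
use strict;
use warnings;

# Hypothetical pattern list, as it might appear in the patterns file.
my @sources = ('lottery', 'pattern1.*pattern2');

# Keep each source string next to its compiled form.
my @compiled = map { +{ source => $_, regex => qr/$_/is } } @sources;

# Return the source of the first pattern that matches, plus the text it matched.
sub first_match {
    my ($content) = @_;
    for my $p (@compiled) {
        if ($content =~ $p->{regex}) {
            # @- and @+ hold the start and end offsets of the last successful match.
            my $matched = substr($content, $-[0], $+[0] - $-[0]);
            return ($p->{source}, $matched);
        }
    }
    return;
}

my $text = "header\npattern1 some text\nmore text pattern2\nfooter\n";
my ($source, $matched) = first_match($text);
if (defined $source) {
    print "pattern: $source\n";
    print "matched text: $matched\n";
}
```

The trade-off is one match attempt per pattern instead of a single alternation, so with a very long pattern list it may be worth running the combined regex first and only looping over the individual patterns for files that hit.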
