PerlMonks  

Pattern match location finding

by feloniousMonk (Pilgrim)
on Feb 18, 2004 at 19:59 UTC ( [id://330034]=perlquestion )

feloniousMonk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all

Can anyone think of a fast way to do exact pattern matching and report back the actual match location within a string?

Ex., if I have a string "AABAABBBB", and my query string is "B", I would like to return a list of "B" occurrences in the target string (3, 6, 7, 8, 9).

Currently I am walking through the target string with substr() calls, but I know there's got to be a faster way to do this.

Oh yeah, I didn't post code because I don't want code critique, etc., just looking for concepts.

Thanks,
felonious --

Replies are listed 'Best First'.
Re: Pattern match location finding
by diotalevi (Canon) on Feb 18, 2004 at 20:01 UTC

    Examine $-[0] for the start location of your match. This will not help you if you want to use the /g switch in list context, so there you have to fudge with a loop in scalar context.

    my @offsets;
    while ( /B/g ) {
        push @offsets, $-[0];
    }
      I'm seeing something I should have expected - for the given string "AABAABAA" and target "AABAA" my @offsets will have one element whose value = 0. I need it to contain 0 and 3...

      I know what it's doing: when AABAA is matched, it moves on to the end of the match. What I really need is a sliding window type of match. It didn't dawn on me to explain this until I remembered how this works...

      What it's doing:
      *AABAA*BAA - matches here
      AABAA*BAA regex continues at the asterisk,
      but I want it to continue here:
      A*ABAABAA
      i.e., I guess you could say I need a single-width assertion...
      --
        Reset pos() to 1 + $-[0] so you're ready for the next character then.
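A minimal sketch of the pos() reset suggested above, using the example string and query from this sub-thread. The sub name overlapping_offsets is hypothetical, and the \Q...\E quoting is an added assumption so that the query is matched literally rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based start offset of every match of
# $query in $target, including matches that overlap one another.
sub overlapping_offsets {
    my ($target, $query) = @_;
    return unless length $query;    # assumption: ignore empty queries
    my @offsets;
    while ( $target =~ /\Q$query\E/g ) {
        my $start = $-[0];
        push @offsets, $start;
        # Resume one character past the START of this match,
        # rather than at its end, so overlapping matches are found.
        pos($target) = $start + 1;
    }
    return @offsets;
}

print join(',', overlapping_offsets('AABAABAA', 'AABAA')), "\n";   # prints 0,3
```

A zero-width lookahead such as /(?=\Q$query\E)/g in the same while loop should give the same offsets without touching pos(), since the match itself consumes no characters.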
Re: Pattern match location finding
by borisz (Canon) on Feb 18, 2004 at 20:13 UTC
    perl -e '$p = -1; while ( ( $p = index( "AABAABBBB", "B", $p + 1 ) ) >= 0 ) { push @pos, $p + 1 } print "@pos";'
    Boris
Re: Pattern match location finding
by ysth (Canon) on Feb 18, 2004 at 20:09 UTC
    or something like (untested):
    my $ind = -1; while (($ind = index($string, $substring, $ind+1)) >= 0) { push @offsets, $ind }
    Update: add missing (
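For reference, a self-contained sketch of the index() loop above wrapped in a sub. The name index_offsets and the guard against an empty query are assumptions added for illustration. Because each search restarts at $ind + 1 rather than after the end of the previous hit, this form also reports overlapping occurrences.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every 0-based offset at which $substring
# occurs in $string, using index() instead of the regex engine.
sub index_offsets {
    my ($string, $substring) = @_;
    # Assumption: an empty query is rejected, since index() would keep
    # reporting a hit and the loop would never end.
    return if $substring eq '';
    my @offsets;
    my $ind = -1;
    while ( ( $ind = index($string, $substring, $ind + 1) ) >= 0 ) {
        push @offsets, $ind;
    }
    return @offsets;
}

print join(',', index_offsets('AABAABAA', 'AABAA')), "\n";              # prints 0,3
print join(',', map { $_ + 1 } index_offsets('AABAABBBB', 'B')), "\n";  # prints 3,6,7,8,9
```

The second call adds 1 to each offset to reproduce the 1-based positions given in the original question.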
Re: Pattern match location finding
by hmerrill (Friar) on Feb 18, 2004 at 20:15 UTC
    I prefer an easier to understand approach than diotalevi's - if your string is not too big, you can split it into an array where each element of the array is one character of the string, and then search through the array, something like this:
    my $mystring = "AABAABBBB";
    my @myarray = split //, $mystring;   # one element per character
    my @myoccurrences;                   # 0-based positions of each "B"
    my $ct = 0;
    foreach my $element (@myarray) {
        if ($element eq "B") {
            push(@myoccurrences, $ct);
        }
        $ct++;
    }
    NOTE that I'm sure that diotalevi's solution is much more efficient than mine. But this one I can understand simply, and that one I have to sit and think about ;-)

    HTH.

      The downside of your approach is that it assumes the query string will always be one character. diotalevi's approach does not suffer that limitation. Yours definitely works with the test case feloniousMonk presented, but fM refers to 'B' as a query string, implying that it could be multiple characters in some cases.

        And in fact query string is always <= length of target string...

        Thanks for the help everyone, now I have some good suggestions to try out - they're all much better than my own earlier attempts. --
