http://qs321.pair.com?node_id=576137

spurperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

How can the following be implemented as a regex ?

\x26\x67 followed by 8 bytes in which the sequence \x26\x67 does not appear.

Thanks

Replies are listed 'Best First'.
Re: regex for negating a sequence
by davidrw (Prior) on Oct 03, 2006 at 18:10 UTC
    i think this will work for you:
    /\x26\x67(?!\x26\x67)(.(?!\x26\x67)){6}.{2}/s
    Based upon this test code to try to match ab followed by 4 bytes w/no ab sequence:
    perl -le 'print for map { /ab(?!ab)(.(?!ab)){2}.{2}/s ? $& : () } @AR +GV' abhjabdf abasdasd abdasdadsab ababblah OUTPUTS: abasda abdasd abblah
    Update: Here's a commented version:
    / \x26\x67 # Starting sequence (let's call it AB) (?!\x26\x67) # Exclude double starting seqeuence, e.g. ABAB (?: # non-capture .(?!\x26\x67) # Any character that's not followed by the sequenc +e ){6} # Need N-2 of these .{2} # And the last two characters can be anything. # note we know they're not AB because otherwise the # (N-2)th item in the previous part wouldn't have mat +ched /sx # Enable comments, and make . mean everything
    Update: Here's a generic version:
    my $marker = '&g'; my $N = 8; my $n1 = $N-2; /$marker(?!$marker)(?:.(?!$marker)){$n1}.{2}/s
    Update: (As pointed out by diotalevi) Added the /s modifier in case the data stream can have a newline in it.

      Your regexp can be simplified to the following:

      /\x26\x67(?:(?!\x26\x67).){7}./s

      If you'd rather have simplicity at the cost of an extra check, use the following:

      /\x26\x67(?:(?!\x26\x67).){8}/s

      /(?:(?!$re).){8}/
      is the equivalent of
      /[^$chars]{8}/
      but it (negatively) matches entire regexps instead of a choice of characters.

        Another alternative:
        /\x26\x67(?!.{0,6}\x26\x67).{8}/s
        I've interpreted "does not appear" differently than you when the eighth char is the \x26 of a \x26\x67 sequence.
        nice. My first attempt was trying to do it like that, but i started with something like:
        /\x26\x67(.(?!\x26\x67)){7}./s
        which doesn't work for the case of consectutive sequences.. i just didn't think of putting the look-ahead behind the dot :)

      Your . is excluding the \n character. Use the /s flag in some form if you really mean "any character."

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: regex for negating a sequence
by pizza_milkshake (Monk) on Oct 03, 2006 at 20:11 UTC
    whoops, it's been a while
    #!/usr/bin/perl -wl # ex: set ts=2 et: use strict; use warnings qw(all); $_ = "\x26\x67\x26\x67"."ABCDEFGHIJKLMN"; print $1 while /(\x26\x67(?!\x26\x67).{8})/g;
Re: regex for negating a sequence
by aquarium (Curate) on Oct 03, 2006 at 23:21 UTC
    might be faster to just loop (5 times) through the next two characters in the string and test for equality. lookahead/lookbehind assertions are rather expensive if the string being matched is long
    $matched = undef; for($index=0,$index<11;$index+=2) { if(substr($test_string;$index,2) eq "ab") { $matched = 1; last; } }
    untested code
    the hardest line to type correctly is: stty erase ^H
      lookahead/lookbehind assertions are rather expensive if the string being matched is long

      demerphq might be too shy to toot his own horn, but a couple of weeks he landed an awesome patch to the development version of Perl that teaches the optimiser to look inside those assertions and use the information to improve the matching strategy used. Which equates to some massive speedups in some cases, perhaps the cases to which you were alluding.

      Let's just hope that Perl 5.10 sees the light of day soon. There's lots of goodness in there waiting to be unleashed.

      • another intruder with the mooring in the heart of the Perl

Re: regex for negating a sequence
by monarch (Priest) on Oct 04, 2006 at 15:49 UTC
    I know this is outside the scope of the original question but this is a good opportunity to try your hand at some testing! Sometimes a regexp might look like it will do the job but how can you be SURE it does what you expect?

    Try the following code:

    You can see that I've tried to cover all the cases I can think of in your requirements. Do they look right? Do the test situations describe all the match/nomatch situations you're expecting?

    The test script produces the following output:

    C:\temp>perl -w regexp.t 1..14 ok 1 - lead bytes followed by 8 clear bytes ok 2 - lead bytes followed by only 7 clear bytes ok 3 - lead bytes followed by 9 clear bytes ok 4 - lead bytes with 2 bad bytes ok 5 - lead bytes with 2 bad bytes ok 6 - lead bytes with 2 bad bytes ok 7 - lead bytes with 2 bad bytes ok 8 - lead bytes with 1 bad byte inside the eight byte limit ok 9 - lead bytes with 1 bad bytes ok 10 - lead bytes with 2 sets of bad bytes ok 11 - lead bytes with 2 sets of bad bytes ok 12 - lead bytes with 2 sets of bad bytes ok 13 - lead bytes with 2 sets of bad bytes ok 14 - lead bytes with 2 sets of bad bytes

    You can plug in anybody's suggestion into the script and verify that it does, indeed, do what you want. If you get the hang of a little testing now you'll find it easier for other things too! Good luck!