regex for negating a sequence

http://qs321.pair.com?node_id=576137

spurperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

How can the following be implemented as a regex ?

\x26\x67 followed by 8 bytes in which the sequence \x26\x67 does not appear.

Thanks

Comment on regex for negating a sequence Download Code

Replies are listed 'Best First'.
Re: regex for negating a sequence by davidrw (Prior) on Oct 03, 2006 at 18:10 UTC
i think this will work for you: `/\x26\x67(?!\x26\x67)(.(?!\x26\x67)){6}.{2}/s` [download] Based upon this test code to try to match ab followed by 4 bytes w/no ab sequence: `perl -le 'print for map { /ab(?!ab)(.(?!ab)){2}.{2}/s ? $& : () } @AR +GV' abhjabdf abasdasd abdasdadsab ababblah OUTPUTS: abasda abdasd abblah` [download] Update: Here's a commented version: `/ \x26\x67 # Starting sequence (let's call it AB) (?!\x26\x67) # Exclude double starting seqeuence, e.g. ABAB (?: # non-capture .(?!\x26\x67) # Any character that's not followed by the sequenc +e ){6} # Need N-2 of these .{2} # And the last two characters can be anything. # note we know they're not AB because otherwise the # (N-2)th item in the previous part wouldn't have mat +ched /sx # Enable comments, and make . mean everything` [download] Update: Here's a generic version: `my $marker = '&g'; my $N = 8; my $n1 = $N-2; /$marker(?!$marker)(?:.(?!$marker)){$n1}.{2}/s` [download] Update: (As pointed out by diotalevi) Added the `/s` modifier in case the data stream can have a newline in it.	[reply] [d/l] [select]
Re^2: regex for negating a sequence by ikegami (Patriarch) on Oct 03, 2006 at 19:35 UTC
Your regexp can be simplified to the following: `/\x26\x67(?:(?!\x26\x67).){7}./s` [download] If you'd rather have simplicity at the cost of an extra check, use the following: `/\x26\x67(?:(?!\x26\x67).){8}/s` [download] `/(?:(?!$re).){8}/` is the equivalent of `/[^$chars]{8}/` but it (negatively) matches entire regexps instead of a choice of characters.	[reply] [d/l] [select]
Re^3: regex for negating a sequence by ysth (Canon) on Oct 03, 2006 at 20:25 UTC
Another alternative: `/\x26\x67(?!.{0,6}\x26\x67).{8}/s` [download] I've interpreted "does not appear" differently than you when the eighth char is the \x26 of a \x26\x67 sequence.	[reply] [d/l]
Re^3: regex for negating a sequence by davidrw (Prior) on Oct 03, 2006 at 20:40 UTC
nice. My first attempt was trying to do it like that, but i started with something like: `/\x26\x67(.(?!\x26\x67)){7}./s` [download] which doesn't work for the case of consectutive sequences.. i just didn't think of putting the look-ahead behind the dot :)	[reply] [d/l]
Re^2: regex for negating a sequence by diotalevi (Canon) on Oct 03, 2006 at 18:36 UTC
Your . is excluding the \n character. Use the /s flag in some form if you really mean "any character." ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊	[reply]
Re: regex for negating a sequence by pizza_milkshake (Monk) on Oct 03, 2006 at 20:11 UTC
whoops, it's been a while `#!/usr/bin/perl -wl # ex: set ts=2 et: use strict; use warnings qw(all); $_ = "\x26\x67\x26\x67"."ABCDEFGHIJKLMN"; print $1 while /(\x26\x67(?!\x26\x67).{8})/g;` [download]	[reply] [d/l]
Re: regex for negating a sequence by aquarium (Curate) on Oct 03, 2006 at 23:21 UTC
might be faster to just loop (5 times) through the next two characters in the string and test for equality. lookahead/lookbehind assertions are rather expensive if the string being matched is long `$matched = undef; for($index=0,$index<11;$index+=2) { if(substr($test_string;$index,2) eq "ab") { $matched = 1; last; } }` [download] untested code the hardest line to type correctly is: stty erase ^H	[reply] [d/l]
Re^2: regex for negating a sequence (lookaheads and lookbehinds are now fast) by grinder (Bishop) on Oct 04, 2006 at 14:53 UTC
lookahead/lookbehind assertions are rather expensive if the string being matched is long demerphq might be too shy to toot his own horn, but a couple of weeks he landed an awesome patch to the development version of Perl that teaches the optimiser to look inside those assertions and use the information to improve the matching strategy used. Which equates to some massive speedups in some cases, perhaps the cases to which you were alluding. Let's just hope that Perl 5.10 sees the light of day soon. There's lots of goodness in there waiting to be unleashed. • another intruder with the mooring in the heart of the Perl	[reply]
Re: regex for negating a sequence by monarch (Priest) on Oct 04, 2006 at 15:49 UTC
I know this is outside the scope of the original question but this is a good opportunity to try your hand at some testing! Sometimes a regexp might look like it will do the job but how can you be SURE it does what you expect? Try the following code: Read more... (2 kB) You can see that I've tried to cover all the cases I can think of in your requirements. Do they look right? Do the test situations describe all the match/nomatch situations you're expecting? The test script produces the following output: C:\temp>perl -w regexp.t 1..14 ok 1 - lead bytes followed by 8 clear bytes ok 2 - lead bytes followed by only 7 clear bytes ok 3 - lead bytes followed by 9 clear bytes ok 4 - lead bytes with 2 bad bytes ok 5 - lead bytes with 2 bad bytes ok 6 - lead bytes with 2 bad bytes ok 7 - lead bytes with 2 bad bytes ok 8 - lead bytes with 1 bad byte inside the eight byte limit ok 9 - lead bytes with 1 bad bytes ok 10 - lead bytes with 2 sets of bad bytes ok 11 - lead bytes with 2 sets of bad bytes ok 12 - lead bytes with 2 sets of bad bytes ok 13 - lead bytes with 2 sets of bad bytes ok 14 - lead bytes with 2 sets of bad bytes [download] You can plug in anybody's suggestion into the script and verify that it does, indeed, do what you want. If you get the hang of a little testing now you'll find it easier for other things too! Good luck!	[reply] [d/l] [select]

Back to Seekers of Perl Wisdom