regexp golf - homework

Boots111 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re (tilly) 1: regexp golf - homework by tilly (Archbishop) on Feb 02, 2002 at 22:50 UTC
particle's answering of the exact reverse of your question notwithstanding, this problem is impossible to solve without at least adding in anchors. With anchors you are able to solve it, and your second try is on the right path. What you want to do is split the problem into 252 divisions of the digits into subsets of 5 and 5, take each subset and split it into subsets of 2 and 3, then list combinations of those to get something that will match all digits in any order. Add ?s to take into account missing digits. The main point of an exercise like this is to demonstrate that there are simple problems which REs are a very bad fit for. This is important to understand because far too often people get into the frame of mind that, "Oh, I will just use an RE for this!" and then they have two problems instead of one. Incidentally this is very easy to solve with negative lookaheads, but that fact merely obscures the underlying point. Adding features to an RE engine extends what you can do with it, but doesn't address the fact that many tasks are just a bad fit for doing with REs.	[reply]
Re: Re (tilly) 1: regexp golf - homework by hagen (Friar) on Feb 03, 2002 at 00:08 UTC
Ah Hah! Thanks tilly ++. Because it was described as a `homework' problem I thought I'd be able to come up with something without looking `in the back of the book'. However after a few scribbles (and my Camel and Cookbook are at work, and I'm not) I came to a grinding halt. Looking at particle's solution made me realise I should have been able to think it through to that type of solution and I didn't actually pick it as `breaking the rules'. However your points are very well made... when you've got the RE `hammer' every string problem looks like a nail! Once I understood this, I looked for a different approach. This approach would involve using a hash, splitting the string into digits, counting the unique keys and checking for any with a count (value) more than one. Now that's a homework problem I can handle 8-). Thanks for insight. hagen	[reply]
Re: regexp golf - homework by ehdonhon (Curate) on Feb 02, 2002 at 19:35 UTC
Interesting problem, and ++ for admiting it was homework! I'm a little stumped by this one myself. If the problem were inverted ("Write a regexp that matches strings containing matching digits"), it would be a lot easier: /(0.0)\|(1.1)\|(2.2)\|(3.3) ... (you get the idea) I'm curious, do regular expressions have an equivalent of Demorgan's law in boolean expressions that allows you to invert an expression? Best of luck to you, please let us know what you find out.	[reply]
Re: Re: regexp golf - homework by no_slogan (Deacon) on Feb 02, 2002 at 22:37 UTC
Yes, it is always possible to invert a (mathematical) regular expression. I don't know of a simple method for doing it, though. You have to convert the regex to an NFA (nondeterministic finite automaton), remove the nondeterminism to produce a DFA, switch the accepting and nonaccepting states, and then convert the DFA back to a regex. It's nasty, and it's not guaranteed to give you the "best" possible regex. Also, the DFA for this problem has 2**N+1 states, where N is the number of digits. In the final step, converting the DFA to a regex, you have a choice as to which states you're going to prune off first. Boots111's first solution corresponds to pruning from left to right, while the second solution corresponds to working from the outside in. Extending the second solution to 10 digits would result in 252 branches in the top-level parenthesis group, and each of those branches would contain the solutions to two different 5-digit problems. Ick. particle has a good idea of using backreferences. Those don't exist in "mathematical" regular expressions. The person who posed the problem might think of them as extended operators, in which case they wouldn't be allowed here. Not for me to say.	[reply]
Re: (GOLF) - multiple digit finder regex - 17 chars by particle (Vicar) on Feb 02, 2002 at 21:38 UTC
it's not as difficult as you may think. the regex looks like this: `/.?([0-9]).?\1/`. including the brackets, it's 17 characters long! here's the full program: `#!/usr/local/bin/perl -w use strict; $\|++; my @numlist; push @numlist, int(rand 100000) for 1..10; for(@numlist) { next if /^$/; /.?([0-9]).?\1/ ? print "$_\tmultiple $1\n" : print "$_\tsingle\ +n"; }` [download] i generated a list of random five digit numbers to feed the regex. here's a breakdown of the regex: `.?` i match zero or more characters, non-greedily `([0..9])` match numbers 0..9, save in \1 `.?` again match zero or more characters, non-greedily `\1` find the next instance of the character in \1 * what's most interesting about this, is that $1 does not work in place of \1. instead, on my perl 5.6.1 install, it warns with: `Use of uninitialized value in concatenation (.) or string at test_matc +h1.pl line 10. Nested quantifiers before HERE mark in regex m/.?([0-9]).?+ << HERE +/ at test_match1.pl line 10.` [download] by the way, there's nothing to stop this from working with letters, as well. just change 0..9 and you're on your set. enjoy! Update:* i see the error of my ways. tilly's right (below). i solved the exact opposite of the problem, which was pretty easy! whoops! ~Particle	[reply] [d/l] [select]
Re: Re: (GOLF) - multiple digit finder regex - 17 chars by trs80 (Priest) on Feb 02, 2002 at 22:41 UTC
It is nice, but it broke the rules :^( The post says no characters other then (),?,\|,* this rules out the character class [] I believe. Some body please correct me if I am wrong. My own version that breaks the rules by using \d and not using the regex itself for success, but still a fun brain exercise is this: `S:while(<DATA>){%d=();while(/(\d)/g){if($d{$1}) {print "DUP: $1 in $_";next S}$d{$1}=1}} __DATA__ 12345 012345 894376 3885437 98374 30924 03284098325 094385098432 74362 876543 4456789` [download]	[reply] [d/l]
Re: Re: (GOLF) - multiple digit finder regex - 17 chars by chipmunk (Parson) on Feb 03, 2002 at 04:21 UTC
`/.?([0-9]).?\1/` what's most interesting about this, is that $1 does not work in place of \1. Well, of course! $1 and \1 mean different things. When $1 is used in a regex, its value is interpolated as the regex is compiled, just like any other variable. For example: `'b' =~ /(.)/; print "Yup!\n" if 'ab' =~ /(.)$1/;` [download] prints Yup! because $1 holds 'b' from the first match and the second regex is compiled as `/(.).c/`. \1, on the other hand, is special regex syntax, and matches the same thing that was matched by the first capturing group in the current regex. `'b' =~ /(.)/; print "Yup!\n" if 'ab' =~ /(.)\1/;` [download] does not print Yup!, because \1 wants to match an 'a'. (\1 on the right hand side of a substitution, with the same meaning as $1, has been deprecated for quite some time.) Update: Fixed the explanation; an earlier revision of the example used 'c' instead of 'b'.	[reply] [d/l] [select]
Re: Re: Re: (GOLF) - multiple digit finder regex - 17 chars by particle (Vicar) on Feb 03, 2002 at 04:39 UTC
(\1 on the right hand side of a substitution, with the same meaning as $1, has been deprecated for quite some time.) yeah. that's how i was used to seeing \1. from people learning perl who are used to vi, ex, etc. on the right hand side, it's a no-no. although, there's a problem with the explanation of your first example. $1 can't hold 'c', it's nowhere to be found. but i understand the difference now. docs are good. ~Particle	[reply]
Re^2: (GOLF) - multiple digit finder regex - 17 chars by Anonymous Monk on Dec 11, 2012 at 04:09 UTC
I think that solving the opposite porblem means solving the original problem. Just filter the input using /\d+/ to insure that they are composed by only digits. Then, solve the problem in the opposite way.	[reply]
Re: regexp golf - homework by Ido (Hermit) on Feb 02, 2002 at 21:28 UTC
It doesn't really answer the question, but if you could use (??{}), this would do it: `/^((??{'[^\D'.join('',@c,push @c,$1).']'}))*$/` [download]	[reply] [d/l]
Re: regexp golf - homework by LanX (Saint) on Dec 11, 2012 at 06:45 UTC
Ehm ... what am I missing here? like IMHO just searching for a repeated digit like with `/(\d).\1/` and inverting the result should do. `DB<111> print $_ !~ /(\d).\1/ ? "$_ ok\n" :"$_ has <$1>\n" for qw/0 +12 0102 01233/ 012 ok 0102 has <0> 01233 has <3>` [download] Cheers Rolf	[reply] [d/l] [select]


Your skill will accomplish what the force of many cannot
	PerlMonks