Abbreviation regex?

bsb has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Abbreviation regex? by fishbot_v2 (Chaplain) on Jan 16, 2006 at 17:44 UTC
You could just reverse your regex with the match variable: `if ( "abbreviation" =~ m/^$str/ ) { #...` [download] Still a regex solution, just not a refactoring of that regex solution.	[reply] [d/l]
Re^2: Abbreviation regex? by Tanktalus (Canon) on Jan 16, 2006 at 17:55 UTC
++ I like the thinking outside the box. That said, you can only do that if $str comes from a trusted source. If, for example, you get it from user input, try this as input: `abbr(?{system qw(format c:)})ev` [download] Of course, that's only really a problem if you have enabled `use re 'eval';` - but if you already need that for other purposes, you'll need to be very careful. In fact, your "outside the box" is almost exactly described in the documentation for the `(?{ code })` construct in perlre where it goes into a bit more detail and talks about ways to mitigate it.	[reply] [d/l] [select]
Re^3: Abbreviation regex? by ikegami (Patriarch) on Jan 16, 2006 at 18:07 UTC
The injection attack (and the bug preventing the use of special characters) is fixed by simply adding `\Q`! `if ( "abbreviation" =~ m/^\Q$str/ ) { #...` [download]	[reply] [d/l] [select]
Re^3: Abbreviation regex? by fishbot_v2 (Chaplain) on Jan 16, 2006 at 18:02 UTC
Naturally... there are a number of downsides to this solution. The most obvious to me is overhead. If we are scanning a file, we need to recompile a trivial regex for each work. The `substr` solution above is much faster and safe from exploit. I've seen it somewhere before. I don't know where, though. I think that it resides in the place for things that are idiomatic but not useful enough to be idioms.	[reply] [d/l]
Re: Abbreviation regex? by ikegami (Patriarch) on Jan 16, 2006 at 18:02 UTC
Parse::RecDescent uses regexps at the core of the grammars it creates, so this qualifies: `use Parse::RecDescent (); my $word = 'abbreviation'; my $c = 1; my $word_rule = join '', map { my $d = $c++; "L$d: '$_' L$c(?)\n" } split //, $word; my $grammar = "m: <skip:''> L1 /\\Z/ {1}\n${word_rule}L$c:\n"; my $p = Parse::RecDescent->new($grammar); if ($p->m('abb')) { print("yes\n"); } else { print("no\n"); }` [download] I'd never use this, but you did imply this was for fun :)	[reply] [d/l]
Re: Abbreviation regex? by ikegami (Patriarch) on Jan 16, 2006 at 17:31 UTC
An alternative: `/^a($\|b($\|b($\|r($\|e($\|v($\|i($\|a($\|t($\|i($\|o($\|n$)))))))))))/` [download] In both yours and mine, `(?:...)` would be faster than `(...)`. I left it as is for the sake of simplicity. Of course, the simpliest solution would be `substr($word, 0, length($str)) eq $str`, but it's non-regexp.	[reply] [d/l] [select]
Re^2: Abbreviation regex? by Tanktalus (Canon) on Jan 16, 2006 at 17:37 UTC
That's just screaming for factorisation. ;-) `/^a(\|b(\|b(\|r(\|e(\|v(\|i(\|a(\|t(\|i(\|o(\|n)))))))))))$/` [download] Or, to possibly make the abbreviation stand out a bit more at the expense of some readability: `/^a(b(b(r(e(v(i(a(t(i(o(n\|)\|)\|)\|)\|)\|)\|)\|)\|)\|)\|)$/` [download] (The non-toy solution probably involves Text::Abbrev.)	[reply] [d/l] [select]
Re: Abbreviation regex? by radiantmatrix (Parson) on Jan 17, 2006 at 19:43 UTC
fishbot_v2 has my vote for best solution within the constraints. However, if you were doing this repeatedly, using index might be an interesting approach. I've coded fishbot_v2's solution into the `regex` sub below. I did several runs and pasted a pretty typical result. use Benchmark qw/:all/; sub regex { return ($_[1] =~ m/^$_[0]/) ? 1 : 0; } sub index_func { return (index($_[1], $_[0]) == 0) ? 1 : 0; } cmpthese ( 500000, { 'regex match' => sub { regex('abbr', 'abbreviation') or die }, 'index match' => sub { index_func('abbr', 'abbreviation') or die } +, 'regex miss' => sub { regex('anna', 'abbreviation') and die }, 'index miss' => sub { index_func('anna', 'abbreviation') and die +}, }); __END__ Rate regex match regex miss index miss index match regex match 680272/s -- -13% -36% -38% regex miss 781250/s 15% -- -27% -29% index miss 1068376/s 57% 37% -- -3% index match 1101322/s 62% 41% 3% -- [download] As you can see, using index for this is much faster, regardless of whether you are more likely to match or miss. <-radiant.matrix-> A collection of thoughts and links from the minds of geeks The Code that can be seen is not the true Code I haven't found a problem yet that can't be solved by a well-placed trebuchet	[reply] [d/l] [select]


Think about Loose Coupling
	PerlMonks