Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Abbreviation regex?

by bsb (Priest)
on Jan 16, 2006 at 17:23 UTC ( [id://523550]=perlquestion: print w/replies, xml ) Need Help??

bsb has asked for the wisdom of the Perl Monks concerning the following question:

How can you match the prefix of a string with a regex?

I currently have the following, but would like to know if there's a better way.

/^a(b(b(r(e(v(i(a(t(i(o(n)?)?)?)?)?)?)?)?)?)?)?$/
This is only a toy problem so non-regex solutions need not apply :) (unless they're interesting)

Brad

Replies are listed 'Best First'.
Re: Abbreviation regex?
by fishbot_v2 (Chaplain) on Jan 16, 2006 at 17:44 UTC

    You could just reverse your regex with the match variable:

    if ( "abbreviation" =~ m/^$str/ ) { #...

    Still a regex solution, just not a refactoring of that regex solution.

      ++ I like the thinking outside the box. That said, you can only do that if $str comes from a trusted source. If, for example, you get it from user input, try this as input:

      abbr(?{system qw(format c:)})ev
      Of course, that's only really a problem if you have enabled use re 'eval'; - but if you already need that for other purposes, you'll need to be very careful.

      In fact, your "outside the box" is almost exactly described in the documentation for the (?{ code }) construct in perlre where it goes into a bit more detail and talks about ways to mitigate it.

        The injection attack (and the bug preventing the use of special characters) is fixed by simply adding \Q!
        if ( "abbreviation" =~ m/^\Q$str/ ) { #...

        Naturally... there are a number of downsides to this solution. The most obvious to me is overhead. If we are scanning a file, we need to recompile a trivial regex for each work. The substr solution above is much faster and safe from exploit.

        I've seen it somewhere before. I don't know where, though. I think that it resides in the place for things that are idiomatic but not useful enough to be idioms.

Re: Abbreviation regex?
by ikegami (Patriarch) on Jan 16, 2006 at 18:02 UTC

    Parse::RecDescent uses regexps at the core of the grammars it creates, so this qualifies:

    use Parse::RecDescent (); my $word = 'abbreviation'; my $c = 1; my $word_rule = join '', map { my $d = $c++; "L$d: '$_' L$c(?)\n" } split //, $word; my $grammar = "m: <skip:''> L1 /\\Z/ {1}\n${word_rule}L$c:\n"; my $p = Parse::RecDescent->new($grammar); if ($p->m('abb')) { print("yes\n"); } else { print("no\n"); }

    I'd never use this, but you did imply this was for fun :)

Re: Abbreviation regex?
by ikegami (Patriarch) on Jan 16, 2006 at 17:31 UTC
    An alternative:
    /^a($|b($|b($|r($|e($|v($|i($|a($|t($|i($|o($|n$)))))))))))/

    In both yours and mine, (?:...) would be faster than (...). I left it as is for the sake of simplicity.

    Of course, the simpliest solution would be substr($word, 0, length($str)) eq $str, but it's non-regexp.

      That's just screaming for factorisation. ;-)

      /^a(|b(|b(|r(|e(|v(|i(|a(|t(|i(|o(|n)))))))))))$/
      Or, to possibly make the abbreviation stand out a bit more at the expense of some readability:
      /^a(b(b(r(e(v(i(a(t(i(o(n|)|)|)|)|)|)|)|)|)|)|)$/

      (The non-toy solution probably involves Text::Abbrev.)

Re: Abbreviation regex?
by radiantmatrix (Parson) on Jan 17, 2006 at 19:43 UTC

    fishbot_v2 has my vote for best solution within the constraints. However, if you were doing this repeatedly, using index might be an interesting approach. I've coded fishbot_v2's solution into the regex sub below. I did several runs and pasted a pretty typical result.

    use Benchmark qw/:all/; sub regex { return ($_[1] =~ m/^$_[0]/) ? 1 : 0; } sub index_func { return (index($_[1], $_[0]) == 0) ? 1 : 0; } cmpthese ( 500000, { 'regex match' => sub { regex('abbr', 'abbreviation') or die }, 'index match' => sub { index_func('abbr', 'abbreviation') or die } +, 'regex miss' => sub { regex('anna', 'abbreviation') and die }, 'index miss' => sub { index_func('anna', 'abbreviation') and die +}, }); __END__ Rate regex match regex miss index miss index match regex match 680272/s -- -13% -36% -38% regex miss 781250/s 15% -- -27% -29% index miss 1068376/s 57% 37% -- -3% index match 1101322/s 62% 41% 3% --

    As you can see, using index for this is much faster, regardless of whether you are more likely to match or miss.

    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://523550]
Approved by willyyam
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (2)
As of 2024-04-26 00:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found