Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Re: regex step counting

by LanX (Saint)
on Dec 03, 2019 at 13:25 UTC ( [id://11109591]=note: print w/replies, xml ) Need Help??


in reply to regex step counting

Pragmatic approach: you could use re qw/debug/ and count the lines (or do more intelligent grepping)

see re

EDIT: But be warned that executing unfiltered regexes from user input is as dangerous as executing plain Perl code.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

UPDATE

DB<3> use re "debug"; print 'abcdc' =~ /(.).*\1/ Compiling REx "(.).*\1" Final program: 1: OPEN1 (3) 3: REG_ANY (4) 4: CLOSE1 (6) 6: STAR (8) 7: REG_ANY (0) 8: REF1 (10) 10: END (0) minlen 1 Matching REx "(.).*\1" against "abcdc" 0 <> <abcdc> | 0| 1:OPEN1(3) 0 <> <abcdc> | 0| 3:REG_ANY(4) 1 <a> <bcdc> | 0| 4:CLOSE1(6) 1 <a> <bcdc> | 0| 6:STAR(8) | 0| REG_ANY can match 4 times out of 2 +147483647... 5 <abcdc> <> | 1| 8:REF1: "a"(10) | 1| failed... 4 <abcd> <c> | 1| 8:REF1: "a"(10) | 1| failed... 3 <abc> <dc> | 1| 8:REF1: "a"(10) | 1| failed... 2 <ab> <cdc> | 1| 8:REF1: "a"(10) | 1| failed... 1 <a> <bcdc> | 1| 8:REF1: "a"(10) | 1| failed... | 0| failed... 1 <a> <bcdc> | 0| 1:OPEN1(3) 1 <a> <bcdc> | 0| 3:REG_ANY(4) 2 <ab> <cdc> | 0| 4:CLOSE1(6) 2 <ab> <cdc> | 0| 6:STAR(8) | 0| REG_ANY can match 3 times out of 2 +147483647... 5 <abcdc> <> | 1| 8:REF1: "b"(10) | 1| failed... 4 <abcd> <c> | 1| 8:REF1: "b"(10) | 1| failed... 3 <abc> <dc> | 1| 8:REF1: "b"(10) | 1| failed... 2 <ab> <cdc> | 1| 8:REF1: "b"(10) | 1| failed... | 0| failed... 2 <ab> <cdc> | 0| 1:OPEN1(3) 2 <ab> <cdc> | 0| 3:REG_ANY(4) 3 <abc> <dc> | 0| 4:CLOSE1(6) 3 <abc> <dc> | 0| 6:STAR(8) | 0| REG_ANY can match 2 times out of 2 +147483647... 5 <abcdc> <> | 1| 8:REF1: "c"(10) | 1| failed... 4 <abcd> <c> | 1| 8:REF1: "c"(10) 5 <abcdc> <> | 1| 10:END(0) Match successful! cFreeing REx: "(.).*\1" DB<4>

Replies are listed 'Best First'.
Re^2: regex step counting
by Random_Walk (Prior) on Dec 03, 2019 at 17:07 UTC

    Thanks, I like this answer.

    While talk of the halting problem further down is interesting, I have a real world situation now where users are putting out bad regex on production servers. The regex they have access ltoo is actualy not Perl, but a very limited subset of Java's regex. It has .* though and they manage to shoot themselves in the foot with abundant use of .* I don't really need 100% accurate count of how many steps they are taking, just an idea if they are in the region of up to a few hundred steps per line, or way beyond it. If they breach some arbitrary limit we will ask them to come to us for regex guidance. So counting the lines from a re debug will be close enough for my purpose.

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
      > not Perl, but a very limited subset of Java's regex

      I was afraid you'd say this, different engines do different guessing and optimizations.

      Something very fast in Perl can be a nightmare elsewhere, and vice versa.

      > I don't really need 100% accurate count of how many steps they are taking, just an idea if they are in the region of up to a few hundred steps per line

      It really depends on how limited your grammar is.

      The point why I mentioned the halting problem is, that the proof shows a way to construct a pathological case for every approach.

      In other words, if your tests are limited it's possible to construct a troublesome regex which will pass them. (Provided the grammar is rich enough)°

      Hence no guaranties whatsoever!

      Good luck! :)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      °) actually it's pretty trivially done, if the regex has no length restriction.

Re^2: regex step counting
by Anonymous Monk on Dec 04, 2019 at 01:07 UTC

    But be warned that executing unfiltered regexes from user input is as dangerous as executing plain Perl code.

    nonsense

    no re 'eval';

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11109591]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (7)
As of 2024-04-19 14:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found