PerlMonks  

[SOLVED] Evaluating user-entered captured groups during Perl substitution

by Polyglot (Pilgrim)
on Mar 16, 2020 at 03:28 UTC ( #11114321=perlquestion )

Polyglot has asked for the wisdom of the Perl Monks concerning the following question:

This question is related to one I posted earlier in which I had limited the form to certain specific operations. I now wish to expand it to allow a more savvy user to enter their own regex to make substitutions on the text.

I desire to make an application where a logged-in user (maybe just me) may enter a substitution regex in order to make edits to a body of text in a database. The substitution needs to allow captured groups, and evaluate properly in perl when it is executed.

Suppose we have the following:

    #FROM DATABASE TEXT
    $line = "Her house is on 34th Mt. Whitney St. near St. Mt. Helens St.";

    #FROM INCOMING FORM INPUTS
    $query = "(St\.\s)(Mt\.\s)(?=Helens)";
    $substitution = "Mount ${1}"; # USER MAY HAVE ENTERED "$1"

    #FOR RETURNED HTML HIGHLIGHT OF CHANGES
    $start = qq|<span class="highlight">|;
    $end = "</span>";

    return "Regex containing code disallowed." if $query =~ m[\(\?\??\{];
    return "Regex containing code disallowed." if $substitution =~ m[\(\?\??\{];

    my $replace = sub {
        my $evaluate = sub {
            return eval($1);
        };
        my $val = $substitution;
        $val =~ s/(\$\{\d+\})/$evaluate->()/eg;
        $val = "$start$val$end";
        return eval($val);
    };

    $line =~ s/$query/$replace->()/eg;

    #EXPECTED TEXT AFTER SUBSTITUTION
    $line = "Her house is on 34th Mt. Whitney St. near Mount St. Helens St.";

However, the above does not perform properly, hence this question. It seems that eval will only work on an actual variable, and not when mixed with other text. Furthermore, in order to ensure I am doing the eval on every possible capture group, I need to iterate over the substitution side looking for each one, and the nested eval seems problematic. I'm not sure if this is the problem--or what to do about it.

Yes, I have had to use some careful processing to untaint these inputs before this segment of the code, but I think that part is working--so assume there are no issues with taint at this point, and that the user's original input is unchanged--though feel free to suggest a better method for untainting that would dovetail nicely with the code above.

Note that code in the regex is explicitly disallowed for security purposes--and because it should be wholly unnecessary for my application. With no code, no other variables should be applicable--I only want the captured groups to properly evaluate via their ordinary $1, $2, etc. notations.
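A minimal sketch of one way to vet the incoming pattern, assuming it arrives as a plain string: compile it with qr// inside a block eval, so that a malformed pattern is reported instead of killing the request. As long as "use re 'eval'" is not in effect, Perl itself also refuses to compile a run-time interpolated pattern containing a (?{ ... }) or (??{ ... }) code block, which backs up the explicit check. The helper name compile_user_pattern is hypothetical and not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern string.
# Returns ($compiled_regex, undef) on success, (undef, $error) on failure.
sub compile_user_pattern {
    my ($pattern) = @_;

    # Explicit check, same idea as the m[\(\?\??\{] test in the question.
    return (undef, 'Regex containing code disallowed.')
        if $pattern =~ m[\(\?\??\{];

    # Block eval traps the die that qr// raises on a malformed pattern.
    my $re = eval { qr/$pattern/ };
    return (undef, "Invalid pattern: $@") unless defined $re;

    return ($re, undef);
}

my ($re, $error) = compile_user_pattern('(St\.\s)(Mt\.\s)(?=Helens)');
print defined $re ? "compiled OK\n" : "rejected: $error\n";
```

Compiling once up front also means the same compiled object can be reused for the substitution itself.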

EDIT: A solution to this issue was posted by "jo37" in his second post below.

Blessings,

~Polyglot~

Replies are listed 'Best First'.
Re: Evaluating user-entered captured groups during Perl substitution
by jo37 (Friar) on Mar 16, 2020 at 07:37 UTC

    Some quotes were wrong - use warnings would have told you. And an eval was in the wrong place.

    EDIT: I shouldn't have posted this in a hurry. Sorry, this does not work.

    At least for the given example this works:

        #!/usr/bin/perl
        use strict;
        use warnings;

        #FROM DATABASE TEXT
        my $line = "Her house is on 34th Mt. Whitney St. near St. Mt. Helens St.";

        #FROM INCOMING FORM INPUTS
        my $query = '(St\.\s)(Mt\.\s)(?=Helens)';
        my $substitution = 'Mount ${1}'; # USER MAY HAVE ENTERED "$1"

        #FOR RETURNED HTML HIGHLIGHT OF CHANGES
        my $start = q|<span class="highlight">|;
        my $end = "</span>";

        return "Regex containing code disallowed." if $query =~ m[\(\?\??\{];
        return "Regex containing code disallowed." if $substitution =~ m[\(\?\??\{];

        my $replace = sub {
            my $evaluate = sub {
                return eval($1);
            };
            my $val = $substitution;
            $val =~ s/(\$\{\d+\})/$evaluate->()/eg;
            $val = "$start$val$end";
            return $val;
        };

        eval "\$line =~ s/\$query/$replace->()/eg";

        #EXPECTED TEXT AFTER SUBSTITUTION
        $line = "Her house is on 34th Mt. Whitney St. near Mount St. Helens St.";

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

      Replying to myself, here is a corrected and simplified version:

          #!/usr/bin/perl
          use Test2::V0;

          #FROM DATABASE TEXT
          my $line = "Her house is on 34th Mt. Whitney St. near St. Mt. Helens St.";

          #FROM INCOMING FORM INPUTS
          my $query = qr'(St\.\s)(Mt\.\s)(?=Helens)';
          my $substitution = 'Mount ${1}'; # USER MAY HAVE ENTERED "$1"

          #FOR RETURNED HTML HIGHLIGHT OF CHANGES
          my $start = '<span class="highlight">';
          my $end = '</span>';

          my $replace = $start . $substitution . $end;

          eval "\$line =~ s{\$query}{$replace}g;";
          die $@ if $@;

          is $line, 'Her house is on 34th Mt. Whitney St. near <span class="highlight">Mount St. </span>Helens St.', 'pattern substitution';

          done_testing;

      Greetings,
      -jo

      $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

        Wow! This works!

        I actually tried "correcting" what you had there, thinking you'd made a mistake, as it was so simple, i.e., it appeared you'd forgotten the "->()" after "$replace" and the "e" at the end. (I guess I was making this all harder than it needed to be.) Of course, my corrections did not work. Then I took a closer look and realized you had eliminated those evaluated subroutines altogether. I felt like it couldn't possibly work that way, but, went ahead and tried it anyhow. I'm very surprised at how well it works, and I'm not sure why it does work so well. I guess I need to learn more about using eval as a wrapper around a substitution regex. It's highly likely this would have been useful for me in a number of my past projects.

        Thank you ever so much for taking time to offer your corrections.

        It may be worthy of mention that in my actual code, I am not using any of the quotes such as the qr'(St\.\s)(Mt\.\s)(?=Helens)'; because the variables are coming in straight from the form, with the exception of the untainting routine that the substitution side passes through. So escaped characters were never a part of my issue. The difficulty seems to have been with the complexity of the nested eval.

        Blessings,

        ~Polyglot~
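For anyone else wondering why the simplified version works, here is a small sketch that prints the code string before running it. The two escaped sigils (\$line and \$query) survive into the string as variable names, while $replace is interpolated immediately, so the user's ${1} ends up as literal source text in the replacement part of an ordinary s{}{}g. The helper name build_substitution_code is made up for illustration. One caveat: because the substitution text becomes Perl source, anything that interpolation would execute (for example an @{[ ... ]} construct), or an unbalanced brace in the user's text, also becomes part of the program, and a check for "(?{" alone does not cover that.

```perl
use strict;
use warnings;

my $line         = 'Her house is on 34th Mt. Whitney St. near St. Mt. Helens St.';
my $query        = qr'(St\.\s)(Mt\.\s)(?=Helens)';
my $substitution = 'Mount ${1}';
my $replace      = '<span class="highlight">' . $substitution . '</span>';

# Hypothetical helper: build the source text that eval will compile.
# \$line and \$query stay as variable names; $replace is pasted in verbatim.
sub build_substitution_code {
    my ($replace) = @_;
    return "\$line =~ s{\$query}{$replace}g;";
}

my $code = build_substitution_code($replace);
print "$code\n";
# prints: $line =~ s{$query}{<span class="highlight">Mount ${1}</span>}g;

# The string eval compiles that text as a normal substitution, so ${1}
# is resolved by the regex engine at match time, like any hand-written s///.
eval $code;
die $@ if $@;

print "$line\n";
# prints: Her house is on 34th Mt. Whitney St. near <span class="highlight">Mount St. </span>Helens St.
```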

      I appreciate seeing how you handled the eval for the entire regex expression, escaping two of the tokens to delay their evaluation (I presume). I don't remember having seen that kind of grammar before.

      As for the application, my usage involves UTF8 text, and I just provided the example here to simulate the functionality I wish to have. In my own code, what you suggested doesn't seem to work. I'll have to play with it some more later, when I have time again, to see if I can do something more with the grammar you introduced.

      The quotes seem to be fine at my end...at least, warnings doesn't indicate any special issue with them. Warnings just says there's an unrecognized escape in the line where I escaped the period and indicated a space with \s. As far as my eye can see, there should be no error there. Useless error messages are why I usually turn warnings off unless I am specifically watching the logs during troubleshooting. Otherwise, my logs just get fat without benefit. In the actual application, the text is coming in from a web form, and is not directly assigned in this manner.

      Blessings,

      ~Polyglot~

        The quotes seem to be fine at my end...at least, warnings doesn't indicate any special issue with them. Warnings just says there's an unrecognized escape in the line where I escaped the period and indicated a space with \s. As far as my eye can see, there should be no error there.
        Line under discussion:
        $query = "(St\.\s)(Mt\.\s)(?=Helens)";
        Unrecognized escape \s passed through at blah... line x

        Perl is saying that it figures you made a mistake with \s. Inside a double-quoted string, \s is not a recognized escape, so Perl passed it through as a single "s" character. It also translated \. into a single literal '.', but backslashing a punctuation character is always allowed in a double-quoted string, so Perl didn't complain about that.
        Consider the following:

            #FROM INCOMING FORM INPUTS
            $query = '(St\.\s)(Mt\.\s)(?=Helens)';     #right way
            print "$query\n";      ##(St\.\s)(Mt\.\s)(?=Helens)

            $query = "(St\.\s)(Mt\.\s)(?=Helens)";     #your way
            print "",$query,"\n";  ##(St.s)(Mt.s)(?=Helens)
            print "$query\n";      ## same thing (St.s)(Mt.s)(?=Helens)

            $query = "(St\\.\\s)(Mt\\.\\s)(?=Helens)"; #ok, but confusing
            print "$query\n";      ## (St\.\s)(Mt\.\s)(?=Helens)
        Fixing the quoting has real consequences in terms of what $query winds up being!
        I always "use warnings;". I very rarely ignore a warning, with the possible exception of working with old code and the "deprecated syntax" warning. However, in all cases I do strive to understand what the heck is wrong that Perl is complaining about and then try to "make Perl happy". Sometimes with deprecated syntax, the warning may be so pervasive that fixing every occurrence is not practical.

        I understand that in your production code, this string will come from elsewhere instead of an assignment statement like above. Be that as it may, I still strongly advise understanding what a Perl warning is telling you and fixing all test code so that it runs without any warnings. I have heard that Perl runs slightly slower with warnings enabled. I have never benchmarked that because this just hasn't been a significant factor in my work. I recommend leaving warnings enabled at all times.

Re: Evaluating user-entered captured groups during Perl substitution
by Anonymous Monk on Mar 16, 2020 at 09:38 UTC

    Yes, I have had to use some careful processing to untaint these inputs before this segment of the code, but I think that part is working--so assume there are no issues with taint at this point, and that the user's original input is unchanged--though feel free to suggest a better method for untainting that would dovetail nicely with the code above.

    wrong

    Interpolation doesn't require eval. Rookie move.

      Anonymous,

      Interpolation cannot perform a substitution, nor would an intended interpolation of the variable for a capture group such as $1 interpolate correctly outside of the substitution.

      I don't mind being called a rookie. I consider myself to basically be one, despite having been learning Perl for nearly 15 years. I didn't study computer programming in college, and feel much less gifted than most here, which is why I so much appreciate the advice of those here whom I look up to for their skill. But posting inaccurate information regarding the possibility of using interpolation, and providing no example for how such would solve this issue, is not helpful.

      Because my variables have come in from an HTML form, they are "tainted," whether I like this or not. Interpolation will neither untaint them, nor perform the required substitution. If I have misunderstood, or am somehow in the wrong, I will welcome your courteous correction.

      Blessings,

      ~Polyglot~
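Since no example of an eval-free approach was given in this sub-thread, here is a minimal sketch of what one could look like, offered as an assumption rather than as what the anonymous poster had in mind. It walks the matches with m//g, reads each capture group through the @- and @+ offset arrays, and expands only $N and ${N} placeholders in the user's template by hand, so the template is never compiled as Perl. The names expand_template and substitute_with_captures are hypothetical, and anything in the template other than those placeholders is treated as literal text.

```perl
use strict;
use warnings;

# Replace $N / ${N} placeholders in $template with the Nth capture.
# Unknown or unset groups expand to the empty string.
sub expand_template {
    my ($template, @captures) = @_;
    $template =~ s/\$(?:\{(\d+)\}|(\d+))/
        my $n = defined $1 ? $1 : $2;
        ($n >= 1 && defined $captures[$n - 1]) ? $captures[$n - 1] : '';
    /ge;
    return $template;
}

# Apply $pattern to $text globally, wrapping each expanded replacement
# in $start and $end. No string eval is involved.
sub substitute_with_captures {
    my ($text, $pattern, $template, $start, $end) = @_;
    my $re = qr/$pattern/;    # dies on a malformed pattern
    my ($out, $last) = ('', 0);

    while ($text =~ /$re/g) {
        # Save the match offsets before any other regex runs.
        my ($match_start, $match_end) = ($-[0], $+[0]);
        my @captures = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;

        $out .= substr($text, $last, $match_start - $last)
              . $start . expand_template($template, @captures) . $end;
        $last = $match_end;
    }
    return $out . substr($text, $last);
}

my $result = substitute_with_captures(
    'Her house is on 34th Mt. Whitney St. near St. Mt. Helens St.',
    '(St\.\s)(Mt\.\s)(?=Helens)',
    'Mount ${1}',
    '<span class="highlight">',
    '</span>',
);
print "$result\n";
# prints: Her house is on 34th Mt. Whitney St. near <span class="highlight">Mount St. </span>Helens St.
```

The trade-off is that the user loses everything interpolation would otherwise offer (\U, \L, named captures and so on) unless support is added explicitly, which may or may not matter for this application.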

Node Type: perlquestion [id://11114321]
Approved by haukex
Front-paged by haukex