http://qs321.pair.com?node_id=11114321

Polyglot has asked for the wisdom of the Perl Monks concerning the following question:

This question is related to one I posted earlier in which I had limited the form to certain specific operations. I now wish to expand it to allow a more savvy user to enter their own regex to make substitutions on the text.

I want to build an application where a logged-in user (maybe just me) may enter a substitution regex in order to make edits to a body of text in a database. The substitution needs to allow captured groups, and they must be evaluated correctly by Perl when it is executed.

Suppose we have the following:

    #FROM DATABASE TEXT
    $line = "Her house is on 34th Mt. Whitney St. near St. Mt. Helens St.";

    #FROM INCOMING FORM INPUTS
    $query = "(St\.\s)(Mt\.\s)(?=Helens)";
    $substitution = "Mount ${1}"; # USER MAY HAVE ENTERED "$1"

    #FOR RETURNED HTML HIGHLIGHT OF CHANGES
    $start = qq|<span class="highlight">|;
    $end = "</span>";

    return "Regex containing code disallowed." if $query =~ m[\(\?\??\{];
    return "Regex containing code disallowed." if $substitution =~ m[\(\?\??\{];

    my $replace = sub {
        my $evaluate = sub {
            return eval($1);
        };
        my $val = $substitution;
        $val =~ s/(\$\{\d+\})/$evaluate->()/eg;
        $val = "$start$val$end";
        return eval($val);
    };

    $line =~ s/$query/$replace->()/eg;

    #EXPECTED TEXT AFTER SUBSTITUTION
    $line = "Her house is on 34th Mt. Whitney St. near Mount St. Helens St.";

However, the above does not perform properly, hence this question. It seems that eval will only work on an actual variable, and not when mixed with other text. Furthermore, in order to ensure I am doing the eval on every possible capture group, I need to iterate over the substitution side looking for each one, and the nested eval seems problematic. I'm not sure if this is the problem--or what to do about it.

Yes, I have had to use some careful processing to untaint these inputs before this segment of the code, but I think that part is working--so assume there are no issues with taint at this point, and that the user's original input is unchanged--though feel free to suggest a better method for untainting that would dovetail nicely with the code above.

Note that code in the regex is explicitly disallowed for security purposes--and because it should be wholly unnecessary for my application. With no code, no other variables should be applicable--I only want the captured groups to properly evaluate via their ordinary $1, $2, etc. notations.
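For reference, here is a minimal sketch of one way to get the behavior described above without any string eval. It is not necessarily the solution mentioned in the EDIT below. String eval parses its argument as Perl source, so a value such as `<span class="highlight">Mount ${1}</span>` is not a valid program. That may explain why eval appears to work only on a bare variable. The sketch treats the user's substitution as plain data and fills in `$1` / `${1}` placeholders from the captured groups by hand. The names `expand_template` and `apply_user_substitution` are hypothetical, and the form inputs are simulated with single-quoted strings so that Perl does not interpolate `\.`, `\s`, or `${1}` at assignment time.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $N and ${N} in the user's template with the
# Nth captured group. Unknown or unmatched groups become the empty string.
sub expand_template {
    my ($template, @captures) = @_;
    $template =~ s/\$(?:\{(\d+)\}|(\d+))/
        my $n = defined $1 ? $1 : $2;
        ($n >= 1 && defined $captures[$n - 1]) ? $captures[$n - 1] : ''
    /ge;
    return $template;
}

# Hypothetical driver: apply the user's pattern and template to $line,
# wrapping each replacement in $start/$end. Returns nothing on rejected input.
sub apply_user_substitution {
    my ($line, $query, $template, $start, $end) = @_;

    # Same explicit check as in the question: refuse embedded code blocks.
    return if $query =~ m[\(\?\??\{];

    # Block eval (not string eval) only traps a compile error in the pattern.
    my $re = eval { qr/$query/ };
    return unless defined $re;

    my ($out, $pos) = ('', 0);
    while ($line =~ /$re/g) {
        # Save the match offsets before any other regex runs.
        my ($from, $to) = ($-[0], $+[0]);
        my @captures = map {
            defined $-[$_] ? substr($line, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;

        $out .= substr($line, $pos, $from - $pos)
              . $start . expand_template($template, @captures) . $end;
        $pos = $to;
    }
    $out .= substr($line, $pos);
    return $out;
}

# Simulated database text and form inputs (single quotes: no interpolation).
my $line     = 'Her house is on 34th Mt. Whitney St. near St. Mt. Helens St.';
my $query    = '(St\.\s)(Mt\.\s)(?=Helens)';
my $template = 'Mount ${1}';

print apply_user_substitution(
    $line, $query, $template, '<span class="highlight">', '</span>'
), "\n";
```

Because the template is never compiled as Perl, nothing the user types on the substitution side can execute as code. The only constructs it recognizes are the numbered capture references. Note also that Perl itself refuses `(?{ ... })` in a pattern interpolated at run time unless `use re 'eval'` is in effect, so the explicit check acts as a second line of defense.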

EDIT: A solution to this issue was posted by "jo37" in his second post below.

Blessings,

~Polyglot~