PerlMonks  

Re^2: Regexp substitution using variables

by MikeTaylor (Acolyte)
on Nov 25, 2020 at 22:29 UTC ( [id://11124232]=note )


in reply to Re: Regexp substitution using variables
in thread Regexp substitution using variables

I understand your scepticism; this does indeed feel like one of those "How do I do X?" questions where the answer is "Don't do X, do Y instead". (Is that what you meant by an "XY problem"?) My situation is basically that I need to run a config file that specifies regular-expression substitutions. Specifically, my program is generating USMARC-format bibliographic records, and a config file says things like "in the 245$a field, replace /foo/ with 'bar' globally". In fact, the config looks like this:
"245$a": [ { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" } ]
If you can think of a better way to do this, I am all ears — but bear in mind I do need the full power of regexp substitutions, e.g. the ability to include parenthesized sub-expressions in the "from" part and $1 back-references in the "to" part.
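For readers following along, here is a minimal sketch of how such a config entry might be loaded and walked in Perl. It assumes the line above is a fragment of a larger JSON object (the enclosing braces are added here) and uses the core JSON::PP module; the loop only prints each rule and does not apply it.

```perl
use strict;
use warnings;
use JSON::PP;

# Assumption: the config line is one member of a top-level JSON object.
my $json = <<'JSON';
{
  "245$a": [
    { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" }
  ]
}
JSON

my $config = decode_json($json);

# Walk every field and every rule attached to it, showing the
# substitution each "regsub" rule describes.
for my $field (sort keys %$config) {
    for my $rule (@{ $config->{$field} }) {
        next unless $rule->{op} eq 'regsub';
        printf "%s: s/%s/%s/%s\n", $field, @{$rule}{qw(from to flags)};
    }
}
```

How each rule is then turned into a real substitution is exactly what the rest of the thread discusses.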

Replies are listed 'Best First'.
Re^3: Regexp substitution using variables
by davido (Cardinal) on Nov 25, 2020 at 22:55 UTC

    This is interesting. Can you provide some additional examples, including more esoteric ones, and possibly a little sample text? I was just wanting to look at the challenges you're facing more pragmatically. Test cases would be fantastic.


    Dave

      I'm afraid I'm not yet far enough into the project to have solid examples, let alone test cases. I am waiting on the customer to let me know what specific transformations they need. But it would not be unlikely that we'd find, for example, a field containing call-numbers like PR.123.ABC that we needed to change to PR-ABC:123, which of course we could do with s/(.*)\.(.*)\.(.*)/$1-$3:$2/.

        Win8 Strawberry 5.8.9.5 (32) Thu 11/26/2020  5:05:35
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        my $pattern     = '(.*)\.(.*)\.(.*)';
        my $replacement = '$1-$3:$2';
        my $flags       = '';

        # $got_g is true if /g modifier present in flags.
        # ($flags, my $got_g) = sanitize_flags_detect_g($flags);
        fixup_forward_slashes($pattern, $replacement);

        my $value = 'PR.123.ABC';

        print "replacement '$replacement' \n";

        my $eval_string = "\$value =~ s/$pattern/$replacement/$flags";
        print "eval_string '$eval_string' \n";

        eval $eval_string;
        print "eval err '$@' \n";
        print "output '$value' \n";

        sub fixup_forward_slashes { s{/}'\/'g for @_; }
        ^Z
        replacement '$1-$3:$2'
        eval_string '$value =~ s/(.*)\.(.*)\.(.*)/$1-$3:$2/'
        eval err ''
        output 'PR-ABC:123'
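Because the flags string is pasted straight into the code that gets eval'ed, it seems worth checking it first: a stray "e" or "ee" in the config would turn the replacement into executable code. The sketch below is one possible check. The helper name validate_flags and the allowed set (g, i, m, s, x) are assumptions for illustration; this is not the commented-out sanitize_flags_detect_g from the post, whose body is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a small whitelist of s/// modifiers.
# Anything else (notably /e and /ee) is rejected before the string
# ever reaches eval.
sub validate_flags {
    my ($flags) = @_;
    $flags = '' unless defined $flags;
    die "unsupported flags: '$flags'\n" unless $flags =~ /\A[gimsx]*\z/;
    return $flags;
}

for my $candidate ('', 'g', 'gi', 'ge') {
    my $ok = eval { validate_flags($candidate); 1 };
    printf "flags '%s': %s\n", $candidate, $ok ? 'accepted' : 'rejected';
}
```

This only guards the flags; the pattern and replacement strings still end up inside the eval and need their own scrutiny.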


        Give a man a fish:  <%-{-{-{-<

Re^3: Regexp substitution using variables
by AnomalousMonk (Archbishop) on Nov 26, 2020 at 03:41 UTC
    "245$a": [ { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" } ]

    This seems like a good starting point. See neilwatson's article How to ask better questions using Test::More and sample data for the way forward. Once you have a few working test cases defined, the only thing left is to define about a million more, including generous edge and corner cases and exception cases! No problem. :)


    Give a man a fish:  <%-{-{-{-<

Re^3: Regexp substitution using variables
by LanX (Saint) on Nov 26, 2020 at 10:53 UTC
    > ... e.g. the ability to include parenthesized sub-expressions in the "from" part and $1 back-references in the "to" part.

    Honestly .... store the full real regexp in your config and eval it (or eval it into a sub to optimize execution time)

    "245$a": [ { "regexp": 's/(foo|bar)/He said "$1"/' } ]
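A minimal sketch of the "eval it into a sub" idea, assuming the rule string has already been read from the config: the substitution is compiled once into an anonymous sub and then called per value. The variable names are illustrative, and the eval runs whatever Perl the config contains, so the security caveats in this thread apply in full.

```perl
use strict;
use warnings;

# The rule as it might be stored in the config (taken from the example above).
my $rule = 's/(foo|bar)/He said "$1"/';

# Build the source of an anonymous sub around the rule and compile it once.
# WARNING: this executes arbitrary Perl taken from the config.
my $source = 'sub { my ($text) = @_; $text =~ ' . $rule . '; return $text }';
my $apply  = eval $source or die "cannot compile rule '$rule': $@";

# Apply the compiled rule to a few values; the inputs are left untouched.
print $apply->($_), "\n" for 'foo', 'a bar b', 'baz';
```

Compiling once and reusing the sub avoids re-parsing the rule for every record.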

    There is no way to "safely" abstract the capture variable away: it has to be compiled into the substitution, and that needs an eval or /ee, with all the security issues that come with them.

    > but bear in mind I do need the full power of regexp substitutions,

    I have the impression your JSON format is an attempt to make it language agnostic. But the "full power" means you will be stuck with Perl.

    And full power means that security becomes an illusion.

    DB<111> $_="abc"

    DB<112> s/(.)/@{[print "what? --> $1\n"]}/g
    what? --> a
    what? --> b
    what? --> c

    DB<113>

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Though ... there is one "lighter" version to build your replacement dynamically.

      eval the replacement-string into a sub, and apply just one /e at the s///

      DB<137> $rep = '<$1>'

      DB<138> eval qq( sub rep { "$rep" } )

      DB<139> p "abc" =~ s/(.)/rep()/rge
      <a><b><c>

      DB<140>
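The same debugger session written as a standalone script, as a sketch. It uses an anonymous sub instead of the named rep, and it assumes the replacement template contains no double quote, which would break the generated source.

```perl
use strict;
use warnings;

my $rep = '<$1>';   # replacement template as it might come from the config

# Compile the template once into a sub that interpolates the current $1, $2, ...
# Assumption: $rep contains no double quote; that case is not handled here.
my $rep_sub = eval qq{ sub { "$rep" } } or die "cannot compile replacement: $@";

# A single /e calls the sub for each match; the sub sees the captures
# of the match that is in progress.
(my $out = 'abc') =~ s/(.)/$rep_sub->()/ge;
print "$out\n";
```

Only the template goes through eval, and only once; the pattern itself can be supplied as an ordinary interpolated regex.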

      This will give you more control over what is happening, since you can use B::Deparse to check the replacement code before executing it.

      This way you have at least a chance to reject dubious code.

      DB<140> p B::Deparse->new('-q')->coderef2text(\&rep)
      {
          use feature 'current_sub', 'evalbytes', 'fc', 'postderef_qq', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
          '<' . $1 . '>';
      }

      DB<141>
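A rough sketch of what such a check might look like in a script. The helper names and the denylist in looks_dubious are assumptions for illustration only; a denylist like this is easy to evade and is not a security boundary. It merely shows that the compiled replacement can be inspected as text before it is ever called.

```perl
use strict;
use warnings;
use B::Deparse;

# Compile a replacement template into a sub WITHOUT calling it.
sub compile_replacement {
    my ($template) = @_;
    my $sub = eval qq{ sub { "$template" } };
    die "cannot compile replacement: $@" unless $sub;
    return $sub;
}

# Return the deparsed source of a compiled replacement.
sub deparse_replacement {
    my ($sub) = @_;
    return B::Deparse->new('-q')->coderef2text($sub);
}

# Crude, illustrative denylist (an assumption, not a real safeguard).
sub looks_dubious {
    my ($source) = @_;
    return $source =~ /\b(?:system|exec|open|unlink|print|eval|require)\b|`/ ? 1 : 0;
}

my $benign_src  = deparse_replacement(compile_replacement('<$1>'));
my $hostile_src = deparse_replacement(compile_replacement('@{[ system(q(true)) ]}'));

print "benign:\n$benign_src\n\n";
print "hostile:\n$hostile_src\n\n";
printf "hostile flagged as dubious? %s\n", looks_dubious($hostile_src) ? 'yes' : 'no';
```

Neither sub is called here, so the embedded system call is compiled and inspected but never run.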

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re^3: Regexp substitution using variables
by AnomalousMonk (Archbishop) on Nov 26, 2020 at 03:30 UTC
