PerlMonks  

Re: How to replace envs in path?

by BillKSmith (Monsignor)
on May 09, 2022 at 15:06 UTC ( [id://11143701]=note )


in reply to How to replace envs in path?

It is usually easier to treat errors and special cases with Perl rather than a regex. In this case, do one field at a time. I do not know the error processing that you need, but this should get you started.
use strict;
use warnings;
use Test::More tests=>2;

my $raw_path      = '$HOME/work_dir/${TOOL_NAME}/$VERSION';
my $required_path = 'home/work_dir/hammer/1.01';
my %env = (    # Dummy hash for testing
    HOME      => 'home',
    TOOL_NAME => 'hammer',
    #VERSION  => '1.01',    # Removed to force error
);

sub resolve_env_in_path {
    my ($path) = @_;
    # Replace one variable per iteration until none are left
    while ($path =~ m/\$\{?(\w+)\}?/) {
        my $field;
        if( exists($env{$1}) ) {
            $field = $env{$1};
        }
        else {
            warn "Invalid path";
            $field = 'UNSPECIFIED';
        }
        $path =~ s/\$\{?(\w+)\}?/$field/;
    }
    return $path;
};

ok( resolve_env_in_path($raw_path) ne $required_path, 'error expected' );
$env{VERSION} = '1.01';
ok( resolve_env_in_path($raw_path) eq $required_path, 'good' );
Bill

Replies are listed 'Best First'.
Re^2: How to replace envs in path?
by ovedpo15 (Pilgrim) on May 22, 2022 at 13:51 UTC
    Hi Bill, thank you!
    Based on your answer, I built the following sub:
    sub resolve_envs {
        my ($path) = @_;
        while ($path =~ m/[^\\]\$\{?(\w+)\}?/) {
            my $env = $1;
            if (exists($ENV{$env})) {
                $path =~ s/([^\\])\$\{?(\w+)\}?/$1$ENV{$env}/;
            } else {
                $path =~ s/([^\\])\$\{?(\w+)\}?/$1\\\$$env/;
            }
        }
        $path =~ s/\\\$/\$/g;
        return $path;
    }
    It works well in most cases. The one corner case where it fails is a path that already contains "\$" (a "$" that is already escaped, not just a bare "$"). For example:
    set playground="${HOME}/playground"
    set file1=${playground}'/fi$le1'
    set file2=${playground}'/fi\$le2'
    mkdir -p ${playground}
    touch ${file1}
    touch ${file2}
    The first one works, but the second one does not: the final step removes the backslashes that I added, and the original backslash along with them, so you get ${playground}'/fi$le2'. Actually it won't work for any escaped chars. Any ideas how I can handle this corner case?
      I really do not understand your test cases. Please post code that we can run to duplicate both your successes and your failures (and tell the difference). Note how my example used Test::More to show that my function did exactly what I expected (perhaps not what you wanted, but you can tell). You have a good start with your pass and fail test cases. Unfortunately, we cannot tell whether they are single- or double-quoted strings, or whether you intend to include newlines. The same applies to your expected results. Are backslashes literal or are they escapes? The result fragments you posted are much harder to test than complete strings.
      Bill
        Hey Bill! The input of that sub is a path so under Unix it could be both - escaped or not escaped. Consider the following example:
        set playground="${HOME}/playground"
        set file1=${playground}'/fi$le1'
        set file2=${playground}'/file2$'
        set file3=${playground}'/f$i$l$e$3$'
        set file4=${playground}'/fi\$le4'
        mkdir -p ${playground}
        touch ${file1}
        touch ${file2}
        touch ${file3}
        touch ${file4}
        It contains different use cases. If you take a look at them, you can see that:
        ls -la $HOME/playground
        total 8
        drwxr-s--- 2 root root 4096 May 22 06:26 .
        drwxr-s--- 7 root root 4096 May 22 06:26 ..
        -rw-r----- 1 root root    0 May 22 06:26 f$i$l$e$3$
        -rw-r----- 1 root root    0 May 22 06:26 fi$le1
        -rw-r----- 1 root root    0 May 22 06:26 fi\$le4
        -rw-r----- 1 root root    0 May 22 06:26 file2$
        The two most interesting ones are file1 and file4. File1 does not escape the special symbol but file4 does. I can even go further and create something like:
        > touch $HOME/playground/fi\\\\\$le5
        > ls -la $HOME/playground/
        total 8
        drwxr-s--- 2 root root 4096 May 22 22:57 .
        drwxr-s--- 7 root root 4096 May 22 06:26 ..
        -rw-r----- 1 root root    0 May 22 06:26 f$i$l$e$3$
        -rw-r----- 1 root root    0 May 22 06:26 fi$le1
        -rw-r----- 1 root root    0 May 22 06:26 fi\$le4
        -rw-r----- 1 root root    0 May 22 22:57 fi\\$le5
        -rw-r----- 1 root root    0 May 22 06:26 file2$
        All of them are considered valid paths. My utility gets those paths and should check if there is a defined env in that path and if so, replace it. So the current algorithm is:
        - While there is a substring in the path that starts with $ and does not have backslash before the $, do:
        -- If the env is defined, replace it.
        -- Otherwise, escape the $ symbol with a backslash to indicate that it does not have a defined env (otherwise you will get an infinite loop).
        - Remove all the backslashes that escaped $.

        So the problem with that algorithm is a path where the $ symbol is already escaped. In other words, the algorithm cannot distinguish between the escaping it adds itself and the original escaping. It's a really rare corner case but it could happen, and I'm wondering how to handle it.
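        For comparison, the need to tell the two kinds of escaping apart goes away if the path is scanned only once, left to right, because text that has already been matched is never looked at again. The following is a minimal sketch of that idea, not code from the thread: the names resolve_envs_single_pass and expand_token are hypothetical, the environment is passed in as a hash reference (for example \%ENV), and it assumes that a backslash escapes whatever character follows it and that original escapes should be kept verbatim.

```perl
use strict;
use warnings;

# One token is either a backslash plus the character it escapes (group 1),
# or a variable reference (group 2) written as ${NAME} (group 3) or
# $NAME (group 4).
my $token = qr/(\\.)|(\$(?:\{(\w+)\}|(\w+)))/s;

# Hypothetical helper: decide what a single matched token turns into.
sub expand_token {
    my ($env, $escaped, $whole, $name) = @_;
    return $escaped if defined $escaped;    # original escape: keep as-is
    # Defined variable: substitute its value. Undefined: leave untouched.
    return exists $env->{$name} ? $env->{$name} : $whole;
}

# Hypothetical replacement for the loop-and-mark approach.
sub resolve_envs_single_pass {
    my ($path, $env) = @_;
    # /g walks the string once; replaced text is never rescanned, so an
    # undefined variable cannot cause an infinite loop and needs no marker.
    $path =~ s/$token/expand_token($env, $1, $2, defined $3 ? $3 : $4)/ge;
    return $path;
}
```

        Because nothing is added to the path in this sketch, nothing has to be removed afterwards, so an original "\$" is simply left in place. Whether the original backslashes should stay in the result is a separate decision that the sketch does not make.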
        The one thing that came to mind to solve it is to add some rare string instead of just the backslash. For example, as you suggested, add UNSPECIFIED instead of the backslash. So you get something like:
        drwxr-s--- 2 root root 4096 May 22 22:57 .
        drwxr-s--- 7 root root 4096 May 22 06:26 ..
        -rw-r----- 1 root root    0 May 22 06:26 fUNSPECIFIED$iUNSPECIFIED$lUNSPECIFIED$eUNSPECIFIED$3UNSPECIFIED$
        -rw-r----- 1 root root    0 May 22 06:26 fiUNSPECIFIED$le1
        -rw-r----- 1 root root    0 May 22 06:26 fi\UNSPECIFIED$le4
        -rw-r----- 1 root root    0 May 22 22:57 fi\\UNSPECIFIED$le5
        -rw-r----- 1 root root    0 May 22 06:26 file2UNSPECIFIED$
        And then just remove every occurrence of UNSPECIFIED. But I see two problems with it:
        1. Of course, if a user creates a file that contains UNSPECIFIED in the path, it will break. But if there is no better solution that solves 100% of the cases, then it will do.
        2. My regex is [^\\]\$\{?(\w+)\}?. The [^\\] is to say "every char except backslash". How can I say "every substring that starts with $ and does not have UNSPECIFIED before it"? In other words, how do I change the regex to support UNSPECIFIED instead of a backslash?
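        One way to express "not preceded by" in a Perl regex is a negative lookbehind, (?<!...), which accepts a fixed-length string such as UNSPECIFIED. The sketch below is only an illustration; the variable and function names are hypothetical. A lookbehind consumes no characters, so unlike [^\\] it also matches a $ at the very start of the path.

```perl
use strict;
use warnings;

# $NAME or ${NAME}, unless a backslash sits immediately before the $.
my $unescaped_var = qr/(?<!\\)\$\{?(\w+)\}?/;

# The same pattern with the marker word in place of the backslash.
my $unmarked_var = qr/(?<!UNSPECIFIED)\$\{?(\w+)\}?/;

# Hypothetical helpers: return the first variable name found, or undef.
sub first_unescaped_var {
    my ($path) = @_;
    return $path =~ $unescaped_var ? $1 : undef;
}

sub first_unmarked_var {
    my ($path) = @_;
    return $path =~ $unmarked_var ? $1 : undef;
}
```

        Note that (?<!\\) only looks at the single character before the $, so it still treats "\\$le5" (an escaped backslash followed by a bare $) as if the $ were escaped.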
