Hey Bill! The input to that sub is a path, so under Unix it could be either escaped or unescaped. Consider the following example:
set playground="${HOME}/playground"
set file1=${playground}'/fi$le1'
set file2=${playground}'/file2$'
set file3=${playground}'/f$i$l$e$3$'
set file4=${playground}'/fi\$le4'
mkdir -p ${playground}
touch ${file1}
touch ${file2}
touch ${file3}
touch ${file4}
It covers several different cases. Listing the directory shows the resulting file names:
ls -la $HOME/playground
total 8
drwxr-s--- 2 root root 4096 May 22 06:26 .
drwxr-s--- 7 root root 4096 May 22 06:26 ..
-rw-r----- 1 root root 0 May 22 06:26 f$i$l$e$3$
-rw-r----- 1 root root 0 May 22 06:26 fi$le1
-rw-r----- 1 root root 0 May 22 06:26 fi\$le4
-rw-r----- 1 root root 0 May 22 06:26 file2$
The two most interesting ones are file1 and file4: file1 does not escape the special symbol, but file4 does. I can even go further and create something like:
> touch $HOME/playground/fi\\\\\$le5
> ls -la $HOME/playground/
total 8
drwxr-s--- 2 root root 4096 May 22 22:57 .
drwxr-s--- 7 root root 4096 May 22 06:26 ..
-rw-r----- 1 root root 0 May 22 06:26 f$i$l$e$3$
-rw-r----- 1 root root 0 May 22 06:26 fi$le1
-rw-r----- 1 root root 0 May 22 06:26 fi\$le4
-rw-r----- 1 root root 0 May 22 22:57 fi\\$le5
-rw-r----- 1 root root 0 May 22 06:26 file2$
All of them are considered valid paths. My utility gets those paths and should check whether a path contains a defined environment variable and, if so, replace it with its value. So the current algorithm is:
- While there is a substring in the path that starts with $ and does not have backslash before the $, do:
-- If the env is defined, replace it.
-- Otherwise, escape the $ symbol with a backslash to indicate that it does not have a defined env (otherwise you will get an infinite loop).
- Remove all the backslashes that escaped $.
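The steps above can be sketched roughly as follows. This is a Python illustration only, since the sub itself is not shown here; the name `expand_path` and the `env` dictionary are hypothetical, and the pattern uses a negative lookbehind `(?<!\\)` in place of `[^\\]` so that a `$` at the very start of the path is also found.

```python
import re

# $NAME or ${NAME}, where the $ is not preceded by a backslash.
# Assumption: a lookbehind stands in for the [^\\] of the original regex.
UNESCAPED_VAR = re.compile(r'(?<!\\)\$\{?(\w+)\}?')

def expand_path(path, env):
    """Expand defined variables; leave undefined $NAME sequences as they were."""
    while True:
        m = UNESCAPED_VAR.search(path)
        if m is None:
            break
        name = m.group(1)
        if name in env:
            # Defined: replace the whole $NAME / ${NAME} with its value.
            path = path[:m.start()] + env[name] + path[m.end():]
        else:
            # Undefined: escape the $ so the loop does not match it again.
            path = path[:m.start()] + '\\' + path[m.start():]
    # Final step: remove every backslash that escapes a $.
    return re.sub(r'\\\$', '$', path)

env = {'HOME': '/root'}  # hypothetical environment
print(expand_path('${HOME}/playground/fi$le1', env))
print(expand_path('${HOME}/playground/fi\\$le4', env))
```

In this sketch both calls return a name containing `fi$le...` with no backslash, so the backslash that was originally part of `fi\$le4` is lost in the final step.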
The problem with that algorithm is a path where the $ symbol is already escaped; in other words, how to distinguish between the escaping added by the algorithm and the original escaping. It's a really rare corner case, but it could happen, and I'm wondering how to handle it.
The one thing that came to mind to solve it is to insert some rare string instead of just the backslash. For example, as you suggested, insert UNSPECIFIED instead of the backslash. So you get something like:
drwxr-s--- 2 root root 4096 May 22 22:57 .
drwxr-s--- 7 root root 4096 May 22 06:26 ..
-rw-r----- 1 root root 0 May 22 06:26 fUNSPECIFIED$iUNSPECIFIED$lUNSPECIFIED$eUNSPECIFIED$3UNSPECIFIED$
-rw-r----- 1 root root 0 May 22 06:26 fiUNSPECIFIED$le1
-rw-r----- 1 root root 0 May 22 06:26 fi\UNSPECIFIED$le4
-rw-r----- 1 root root 0 May 22 22:57 fi\\UNSPECIFIED$le5
-rw-r----- 1 root root 0 May 22 06:26 file2UNSPECIFIED$
And then just remove every occurrence of UNSPECIFIED. But I see two problems with it:
1. Of course, if a user creates a file that contains UNSPECIFIED in its path, this will break it. But if there is no better solution that covers 100% of the cases, then it will do.
2. My regex is [^\\]\$\{?(\w+)\}?. The [^\\] is to say "any char except backslash". How can I say "every substring that starts with $ and does not have UNSPECIFIED before it"? In other words, how do I change the regex to support UNSPECIFIED instead of a backslash?
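One possible way to express "not preceded by UNSPECIFIED" is a negative lookbehind, `(?<!UNSPECIFIED)`, which checks the text before the `$` without consuming it (unlike `[^\\]`, it also matches at the start of the string). The sketch below is in Python, but the same lookbehind syntax exists in Perl-style regexes. The names `MARKER` and `expand_path_with_marker` are hypothetical, and stripping the marker only where it sits directly before a `$` is a design choice of this sketch, meant to reduce (not eliminate) the collision described in point 1.

```python
import re

MARKER = 'UNSPECIFIED'  # assumed rare string, as discussed above

# $NAME or ${NAME}, where the $ is not immediately preceded by MARKER.
UNMARKED_VAR = re.compile(r'(?<!' + re.escape(MARKER) + r')\$\{?(\w+)\}?')

def expand_path_with_marker(path, env):
    while True:
        m = UNMARKED_VAR.search(path)
        if m is None:
            break
        name = m.group(1)
        if name in env:
            # Defined: substitute the value.
            path = path[:m.start()] + env[name] + path[m.end():]
        else:
            # Undefined: tag the $ with the marker instead of a backslash.
            path = path[:m.start()] + MARKER + path[m.start():]
    # Drop the marker wherever it directly precedes a $.
    return path.replace(MARKER + '$', '$')

env = {'HOME': '/root'}  # hypothetical environment
print(expand_path_with_marker('${HOME}/playground/fi\\$le4', env))
```

Here the backslash in `fi\$le4` is never touched, so the original escaping survives. A path that already contains `UNSPECIFIED$` would still be altered.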