|Problems? Is your data what you think it is?
The problem of "the" default shellby afoken (Chancellor)
|on Dec 09, 2017 at 13:17 UTC
I've got a little bit tired of searching my "avoid the default shell" postings over and over again, so I wrote this meditation to sum it up.
What is wrong with the default shell?
In an ideal world, nothing. The default shell /bin/sh would have a consistent, well-defined behaviour across all platforms, including quoting and escaping rules. It would be quite easy and unproblematic to use.
But this is the real world. Different platforms have different default shells, and they change the default shell over time. Also, shell behaviour changed over time. Remember that the Unix family of operating systems has evolved since the 1970s, and of course, this includes the shells. Have a look at "Various system shells" to get a first impression. Don't even assume that operating systems keep using the same shell as default shell.
And yes, there is more than just the huge Unix family. MS-DOS copied concepts from CP/M and also a very little bit of Unix. OS/2 and the Windows NT family (including 2000, XP, Vista, 7, 10) copied from MS-DOS. Windows 1-3, 9x, ME still ran on top of DOS. From this tree of operating systems, we got command.com and cmd.exe.
By the way: Modern MacOS variants (since MacOS X) are part of the Unix family, and so is Android (after all, it's just a heavily customized Linux).
Some ugly details:
And when it comes to Windows (and DOS, OS/2), legacy becomes really ugly.
So, to sum it up, there is no thing like "the" default shell. There are a lot of default shells, all with more or less different behaviour. You can't even hope that the default shell resembles a well-known family of shells, like bourne. So there is much potential for nasty surprises.
Why and how does that affect Perl?
Perl has several ways to execute external commands, some more obvious, some less. In the very basic form, you pass a string to perl that roughly ressembles what you would type into your favorite shell:
Looks pretty innocent, doesn't it? And it is, until you want to start doing real-world things, like passing arguments containing quotes, dollar signs, or backslashes to an external program. You need to know the quoting rule of whatever shell happens to be the default shell.
For those cases, perl is expected to pass the string to /bin/sh for execution. Except that in this innocent case, and several other cases, perl does not invoke the default shell at all. Burried deep in the perl sources, there is some heuristics happening. If perl thinks that it can start the executable on its own, because the command does not contain what is documented as "shell metacharacters", perl splits the command on its own and can avoid invoking the default shell.
Why? Because perl can easily figure out what the shell would do, and do it by itself instead. This avoids a lot of overhead and so is faster and does not use as much memory as invoking the shell would.
Unfortunately, the documentation is a little bit short on details. See "Perl guessing" in Re^2: Improve pipe open? (redirect hook): From the code of Perl_do_exec3() in doio.c (perl 5.24.1), it seems that the word "exec" inside the command string triggers a different handling, and some of the logic also depends on how perl was compiled (preprocessor symbol CSH).
If you don't need support from the default shell, you can help perl by passing system(), exec(), and open() a list of arguments instead of a string. This "multi-argument" or "list form" of the commands always avoids the shell, and it completely avoids any need to quote.
(Well, at least on Unix. Windows is a completely different beast. See Re^3: Perl Rename and Re^3: Having to manually escape quote character in args to "system"?. It should be safe to pretend that you are on Unix even if you are on Windows. Perl should do the right thing with the "list form".)
So our examples now look like this:
Did you notice that qx() and its shorter alias `` don't support a list form? That sucks, but we can work around that by using open instead. Writing a small function that wraps open is quite easy. See "Safe pipe opens" in perlipc.
OK, let's assume I've convinced you to use the list forms of system, exec, and open. You want to start a program named "foo bar", and it needs an argument "baz". Yes, the program has a space in its name. This is unusual but legal in the Unix family, and quite common on Windows.
my @command=('foo bar','baz'); and one of:
All is well. Perl does what you expect, no default shell is ever involved.
Now, "foo bar" get's an update, and you no longer have to pass the "baz" argument. In fact, you must not pass the "baz" argument at all. Should be easy, right?
my @command=('foo bar'); and one of:
Wrong! system, exec, and even open in the three-argument form now see a single scalar value as the command, and start once again guessing what you want. And they will wrongly guess that you want to start "foo" with an argument of "bar".
The solution for system and exec is hidden in the documentation of exec: Pass the executable name using indirect object syntax to system or exec, and perl will treat the single-argument list as list, and not a single command string.
my @command=('foo bar'); and one of:
If the command list is not guaranteed to contain at least two elements (e.g. because arguments come from the user or the network), you should always use the indirect object notation to avoid this trap.
Did you notice that we lost another way of invoking external commands here? There is (currently) no way in perl to use pipe open with a single-element command list without triggering the default shell heuristics. That's why I wrote Improve pipe open?. Yes, you can work around by using the code shown in "Safe pipe opens" in perlipc and using exec with indirect object notation in the child process. But that takes 10 to 20 lines of code just because perl tries to be smart instead of being secure.
Avoiding external programs
Why do you want to run external programs? Perl can easily replace most of the basic Unix utilities, by using internal functions or existing modules. And as an additional extra, you don't depend on the external programs. This makes your code more portable. For example, Windows does not have ls, grep, awk, sed, test, cat, head, or tail out of the box, and find is not find, but a poor excuse for grep. If you use perl functions and modules, that does not matter at all. Likewise, not all members of the Unix family have the GNU variant of those utilities. Again, if you use perl functions and modules, it does not matter.
Note: The table above is far from being complete.
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)