Improve pipe open?

by afoken (Chancellor)
on Apr 01, 2017 at 15:17 UTC

A small meditation started by Ssh and qx


Intro

Let's face it: qx is evil as soon as you want to reliably pass arguments to a program. And it's not necessarily perl's fault. Blame the default shell (Update: see The problem of "the" default shell). Luckily, perl has had multi-argument pipe open since 5.8.0:

open(my $pipe,'-|','/usr/local/bin/foo','bar','baz','1&2>3')
    or die "Can't start foo: $!";
my @output=<$pipe>;
close $pipe or die "Broken pipe: $!";

It's so easy. Granted, it takes two more lines than qx, but we got rid of the default shell. And those two extra lines could easily be wrapped in a function:

my @output=safe_qx('/usr/local/bin/foo','bar','baz','1&2>3');
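
A minimal sketch of such a wrapper could look like this (safe_qx is just the name used above, not an existing module function):

sub safe_qx
{
    my @cmd=@_;
    # Multi-argument pipe open: the default shell stays out of the picture,
    # at least as long as @cmd holds more than one element (see below).
    open(my $pipe,'-|',@cmd) or die "Can't start $cmd[0]: $!";
    my @output=<$pipe>;
    close $pipe or die "Broken pipe: $!";
    return @output;
}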

But, of course, that would be too easy to be true. Why can't we have nice things?

Three-argument pipe open gets the nasty default shell back into play:

> perl -E 'open my $pipe,"-|","pstree --ascii --arguments --long $$ 1>&2" or die $!;'
perl -E open my $pipe,"-|","pstree --ascii --arguments --long $$ 1>&2" or die $!;
  `-sh -c pstree --ascii --arguments --long 22176 1>&2
      `-pstree --ascii --arguments --long 22176
>

Now what? We could resort to my favorite part of perlipc, "Safe pipe opens". 15 to 28 lines of code just to safely start an external program, and all of that only because perl wants to be clever instead of being safe.
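
For reference, the "Safe pipe opens" pattern boils down to forking via open and calling exec yourself in the child; roughly like this (simplified sketch, most of the error handling from perlipc trimmed):

my $pid = open(my $pipe,'-|');          # implicit fork
defined $pid or die "Can't fork: $!";
if ($pid) {                             # parent: read the child's STDOUT
    my @output=<$pipe>;
    close $pipe;
} else {                                # child: exec only returns on failure
    exec '/usr/local/bin/foo','bar','baz' or die "Can't exec foo: $!";
}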


How to fix it

Let's make multi-argument pipe open clever.

system and exec have the indirect-object-as-executable-name hack to prevent the default shell mess. Applying that to open might be possible, but still looks quite hacky:

open $list[0] my $pipe,'-|',@list or die "Can't open pipe: $!";

No! Just no!
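
For reference, this is the documented indirect-object form for system that the line above tries to mimic (assuming @args already holds the command and its arguments):

# The block names the file to execute; the list becomes the argument vector,
# so $args[0] shows up twice: once as the program, once as argv[0].
system { $args[0] } @args;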

So, do we really need to specify the executable twice? We usually don't want to lie to the target program about its name. It might be useful to make a shell think that it's a login shell, but then again, that can also be done by passing an extra argument. No, we don't want to lie to our child process. If backwards compatibility were not a problem, we could simply disable the shell logic for any pipe open with more than two arguments. But for backwards compatibility, we can't do that. What we need is a flag to disable the shell logic.

My first idea was to just double the dash in the MODE argument:

Mode    Action                      Usage of default shell
-|      Read from child's STDOUT    enabled for three-argument open,
|-      Write to child's STDIN      disabled for more than three arguments
                                    passed to open (legacy mode)
--|     Read from child's STDOUT    disabled
|--     Write to child's STDIN      disabled

But we can still do better: a single bit is sufficient for a flag. "+" is already used in MODE, but not in combination with the pipe symbol. So let's use "+" instead of "-" to disable the default shell:

Mode    Action                      Usage of default shell
-|      Read from child's STDOUT    enabled for three-argument open,
|-      Write to child's STDIN      disabled for more than three arguments
                                    passed to open (legacy mode)
+|      Read from child's STDOUT    disabled
|+      Write to child's STDIN      disabled

Yes, I'm aware that the difference between "+" and "-" in ASCII is two bits.
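
Under that proposal (not existing Perl syntax, purely illustrative), the example from the intro would become:

# Hypothetical: '+|' would behave like '-|', but would never fall back to
# the default shell, no matter how many arguments follow.
open(my $pipe,'+|','/usr/local/bin/foo','bar','baz','1&2>3')
    or die "Can't start foo: $!";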


Update:

For a better mnemonic (ls -F uses "*" to indicate an executable file), we could use "*" instead of "-" to disable the default shell and specify the executable file in the third argument:

Mode    Action                      Usage of default shell
-|      Read from child's STDOUT    enabled for three-argument open,
|-      Write to child's STDIN      disabled for more than three arguments
                                    passed to open (legacy mode)
*|      Read from child's STDOUT    disabled
|*      Write to child's STDIN      disabled

Thanks to huck and hippo for finding two missing quotes.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re: Improve pipe open? (redirect hook)
by oiskuu (Hermit) on Apr 01, 2017 at 21:07 UTC

    Perl could certainly use a hook triggering before exec(). Performing custom setup between fork() and exec() is essential for correct operation in many scenarios. For example: thread save calling an external command, Re^6: Capture::Tiny alternative.

    So the IPC modules might use constructs like

    {
        local $SIG{__EXEC__} = \&_do_redirect;
        system ...
    }
    But this would no doubt have other creative uses. Syntactic sugar to make things neat.
    {
        use redirect qw( 3>&1 1>&2 2>&3 3>&- );
        open my $fh, ...;
        `another cmd`;
    }

    ps. I'm not entirely sure if the hook ought to be post-fork or pre-exec.

      I don't see a need for a hook between fork and exec hidden in system and open, and abusing %SIG for that hook makes it even worse. That hook introduces action-at-a-distance.

      If you intend to make some stuff happen between fork and exec, write it explicitly:

      sub redirected_system
      {
          my @args=@_;
          my $pid=fork() // die "Can't fork: $!";
          if ($pid) {
              # (parent)
              waitpid($pid,0);
              # plus whatever is needed to collect data from the child
          } else {
              # (child)
              # modify file handles as needed (redirection)
              # change UID, GID if needed
              # chdir if needed
              # chroot if needed
              exec { $args[0] } @args or die "exec failed: $!";
          }
      }

      This is clean, readable, and has no action-at-a-distance. Of course, it is possible to use a generic function to allow changes to the child process without resorting to global variables:

      sub hooked_system(&@)
      {
          my ($hook,@args)=@_;
          my $pid=fork() // die "Can't fork: $!";
          if ($pid) {
              # (parent)
              waitpid($pid,0);
              # plus whatever is needed to collect data from the child
          } else {
              # (child)
              $hook->();
              exec { $args[0] } @args or die "exec failed: $!";
          }
      }
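
      An illustrative call could look like this (the redirection is just an example of per-call child setup; any code could go into the block):

      # Run foo with its STDOUT redirected to STDERR. The block runs in the
      # child, after fork and before exec.
      hooked_system { open STDOUT,'>&',\*STDERR or die "Can't redirect: $!" }
          '/usr/local/bin/foo','bar','baz';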

      I also don't see how a hook would solve the problem of perl invoking the default shell in case of three-argument pipe open or single-argument system. And please don't tell me that the hook code should start guessing how the single string should be split into a list of arguments. This is exactly what perl already does: it guesses, and if the string looks too complex to guess (see below), it delegates that to a random default shell. This is the cause of the trouble, not its solution.

      qx, ``, system and pipe open are already wrappers for fork and exec. Adding more and more parameters will give us nightmares like CreateProcessAsUser(), effectively passing more than 30 parameters plus command line arguments to a nightmare function that will finally start a child process. See also Re^3: Set env variables as part of a command.


      Perl guessing

      exec states:

      If there is only one element in LIST, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp, which is more efficient.

      So, what exactly are shell metacharacters? I don't know. My guess is that a lot of the non-alphanumeric, printable ASCII characters count as shell metacharacters. They may depend on the operating system, too. And they may have changed over time. It seems that Perl_do_exec3() in doio.c of the perl 5.24.1 sources contains at least a little bit of the guessing logic. And it seems that the word "exec" at the start of the string also forces perl to invoke the default shell, not only non-alphanumeric characters. To make things even worse, some of the logic depends on the preprocessor symbol CSH. My guess is that this happens only if the default shell is a csh variant.
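
      To illustrate the documented behaviour (metacharacters push a single string to the default shell, the list form never involves a shell):

      # One string containing shell metacharacters: perl hands the whole
      # thing to the default shell ("/bin/sh -c ..." on Unix).
      system('ls -l *.pl | wc -l');

      # List form: passed straight to execvp, no shell, no metacharacter
      # handling - the '*.pl' would reach ls literally.
      system('ls','-l','*.pl');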

      BTW: A trailing 2>&1 seems to be handled by perl as a special case, without resorting to the default shell.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Action-at-a-distance was precisely the intention in this case. One might then trivially enhance a standard capture with a certain additional effect like dropping of privileges. Callbacks like that allow for a (more) generalized routine instead of a bunch of specialized modules.

        Anyway, I was contemplating the numerous problems with piping/capturing I've witnessed on PM and elsewhere. Can you give an example where the list form open has caused mayhem, because of the one-element list?

        As far as qx{}; is concerned, I do not really see any problem. The string inside qx is not perl code, it is shell syntax. One could perhaps make a point about always requesting a shell, even when perl thinks this is redundant, like qx{}F; maybe. The opposite, to force an op to not do what it's intended to do, makes no sense.

        Edit. BTW: out of curiosity, do you sometimes use the <> operator in your code or do you always go for the safe diamond? I'd have simply plugged *that* hole, methinks...

        Edit2. Clarification regarding "shell syntax": qx is for interfacing with the system shell, so the meaning is "system shell syntax, whatever that may be". Never mind about the qx though, I just found your safe_qx() oddly named, that's all.
