PerlMonks
pipes and return data

by Anonymous Monk
on Oct 31, 2003 at 18:15 UTC [id://303640]

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, great ones.

I open a pipe from one Perl program to another like so

open(PROG, "|$HOME/server/bin/prog.pl")
    || die "Cannot open $HOME/server/bin/prog.pl: $!";

And then I print anywhere from 1 million to 2.5 million lines of data like so

foreach $line (@array) {
    print PROG "$line\n";
}

But I just made some modifications to prog.pl that require passing data back to the calling script. What can I do? Can I read from the pipe, or is there another way to get data passed back?

Thank you so much!

me

Replies are listed 'Best First'.
Re: pipes and return data
by Roy Johnson (Monsignor) on Oct 31, 2003 at 18:32 UTC
    perldoc -q pipe
    How can I open a pipe both to and from a command?
    The IPC::Open2 module (part of the standard perl distribution) is an easy-to-use approach that internally uses pipe(), fork(), and exec() to do the job. Make sure you read the deadlock warnings in its documentation, though (see IPC::Open2). See "Bidirectional Communication with Another Process" in perlipc and "Bidirectional Communication with Yourself" in perlipc

    You may also use the IPC::Open3 module (part of the standard perl distribution), but be warned that it has a different order of arguments from IPC::Open2 (see IPC::Open3).

      Hi Roy,

      First of all, thanks for your help. I **think** this is what I need.

      Now you'll see what a novice I am

      When I use perldoc as you've described, how do I view this:

      Make sure you read the deadlock warnings in its documentation, though (see L<IPC::Open2>).

      Also, how do I signal that I am done writing to the pipe and would like prog.pl to process the data and print its output to the reader?

      thanks, me

        Anytime you see "see <Module::Name::Here>" you can do perldoc Module::Name::Here to see the documentation, provided the module is installed on your host.

        In terms of finishing with the application, if you end up going with the IPC::Open[23] methodology, you would do something along the lines of

        # assuming READ and WRITE are the reader and writer FHs
        # and $pid is the pid of the process spawned
        close(READ);
        close(WRITE);
        waitpid $pid, 0;

        # or, if you used FileHandle objects or autovivified scalars
        close($read);
        close($write);
        waitpid $pid, 0;

        After rereading your post, I noticed I didn't answer the right question. How does the program presently know you are done writing? Is there some sentinel value? Do you pass a <CTRL> + d? You would simply do the same at the end of your write loop. Alternatively, if the program doesn't have that functionality at present, you could add it and then make use of it :).

        use perl;

        "When I use perldoc as you've described, how do I view this: Make sure you read the deadlock warnings in its documentation, though (see L<IPC::Open2>). "

        What they mean is to man IPC::Open2, or look at the IPC::Open2 library docs however you view library docs on your system. That whole "L<...>" is a way of linking to another data source in Perl POD documentation... but it seems to be either broken or mis-rendered by your POD viewer.

        Oh, and as for signalling that you are done writing, etc... that's what they're talking about in the whole deadlock-prevention business. The issue is, you need to make it completely clear to each process which one is writing and which one is reading at any given time. If there is any ambiguity, and both processes think they should be reading, or both think they should be writing... then you're deadlocked.

        A simple means of avoiding this (if it fits your problem) is to provide "framing data" in your streams. This is akin to saying "over" on a walkie-talkie: it means "I'm done writing, it's your turn to write". The way you do that is to pick some string that should never appear in your data, and send that string over your stream as a magical token whenever you want to give such a signal.

        If there is no way to determine a token which should never appear in your data, then you would need to encapsulate your data somehow. A simple example is to pick some magical character... like let's say a newline character ("\n") and an escape character like a backslash. Then in the program writing the data, you turn all newlines into \n's and all \'s into \\'s, and in the reading program you do the opposite. Now the actual newline character won't appear anywhere in your data, and you can use it to "frame" your data.

        # writer
        $data =~ s/\\/\\\\/g;     # escape literal backslashes first
        $data =~ s/\n/\\n/g;      # then encode real newlines as "\n"
        print PIPE $data, "\n";

        # reader
        $/ = "\n";
        $data = <PIPE>;
        chomp $data;
        $data =~ s/\\(\\|n)/$1 eq "n" ? "\n" : "\\"/ge;   # single-pass unescape
        Of course, that example assumes that you can easily establish a protocol (by that I mean simply a set of rules) for which process should be reading and which process should be writing at any given time. If you need to go beyond that, then you're really opening up a can of worms and should look into some more robust frameworks for IPC (inter-process-communication).

        ------------
        :Wq
        Not an editor command: Wq

        If you only have one batch of data to send to your child process, you can signal that you're done by just closing the write filehandle. The child process will see this as EOF in its STDIN, but will still be able to write to STDOUT just fine. Of course, once you've done this, you can't send any more data to the child process.

        Here's a short example:

        #!/usr/bin/perl -w
        # This is t6
        use strict;
        use IPC::Open2;

        open2(\*READ, \*WRITE, '/tmp/t6b')
            or die "Error running /tmp/t6b: $!\n";
        print WRITE join("\n", 5, 2, 8, 7, 9, 1, 3, 6, 4, 0), "\n";
        close(WRITE);
        while (<READ>) {
            print "$0 read $_";
        }

        #!/usr/bin/perl -w
        # This is t6b
        use strict;
        use vars qw(@a);

        while (<>) {
            push(@a, $_);
        }
        # OK, now we've gotten EOF.
        print sort @a;
Re: pipes and return data
by mcogan1966 (Monk) on Oct 31, 2003 at 18:29 UTC
Re: pipes and return data
by TomDLux (Vicar) on Oct 31, 2003 at 22:41 UTC

    Not related to your question, but my pet peeve is code that processes data element by element when it could be handled all at once.

    There's no need to process your output line by line. If you print an array on its own:

    print PROG @array

    the array elements are printed without any separator. If this is a CGI script, separators are ignored by the browser, anyway, so that should be the fastest solution. But, for some odd reason, it is actually quite slow.

    If it isn't HTML, and it isn't convenient to add newline characters when the array is generated, you'll need to introduce the newlines at the print. If you interpolate an array into a double-quoted string, the elements are separated by the value of the variable $". So temporarily redefine $" to be a newline and use the array in a string... the string will need a newline at the end.

    { local $" = "\n"; print PROG "@array\n"; }

    That's equivalent to using join to convert the array into a string, and equally fast.

    print PROG join( "\n", @array), "\n";

    That makes me think the underlying code is essentially identical. Don't forget that Perl built-ins (and natively coded module routines) are fast, while re-implementing equivalent constructs in Perl is slower.

    I timed the options, one hundred iterations of each with an array of one million elements. join and local are 6 times as fast as manual looping and 3 times as fast as printing the array outside a string.

    use Benchmark;
    for ( 1..1000000 ) { push @a, "$i"; }
    open PROG, ">/dev/nul";
    timethese( 100], {
        manual => sub { for $line ( @a ) { print PROG "$line\n"; } },
        join   => sub { print PROG join( "\n", @a ), "\n" },
        local  => sub { local $" = "\n"; print PROG "@a\n"; }
        html   => sub { print PROG @a; }
    });

    ##### output ->
    $ perl t.pl
    Benchmark: timing 100 iterations of join, local, manual...
      html: 133 wallclock secs (131.79 usr + 0.15 sys = 131.94 CPU) @ 0.76/s (n=100)
      join:  45 wallclock secs ( 44.04 usr + 0.01 sys =  44.05 CPU) @ 2.27/s (n=100)
     local:  45 wallclock secs ( 44.51 usr + 0.00 sys =  44.51 CPU) @ 2.25/s (n=100)
    manual: 275 wallclock secs (274.17 usr + 0.04 sys = 274.21 CPU) @ 0.36/s (n=100)

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

      Please, if you are going to post a benchmark and results, do so responsibly. The code you posted has (at least) two errors:

      timethese( 100], {

      Where'd that spurious ']' come from? I realize this was probably a cut-n-paste error, but still...

      open PROG, ">/dev/nul";

      This one is more serious. It implies that you tested printing your strings to a non-existent file handle. Ooops.

      Benchmark: timing 100 iterations of join, local, manual...

      Wait... what about 'html'? So, I guess you ran that later and added it into your results?

      I probably would have missed all of this had I not been suspicious of your results. I would have guessed the 'html' version to be the fastest by far... so I had to check for myself. Here are the results I got, after correcting the errors noted above:

      Frankly, I still don't know how you managed to get such poor figures for the 'html' code.

      -sauoq
      "My two cents aren't worth a dime.";
      
      That's pretty cool. I know my post is semi-meaningless (technically, at least), but I really appreciate your input. I was not aware of this performance issue (nor the local trick). Thanks TomD, me
