http://qs321.pair.com?node_id=253055


in reply to Very fast reads from an external program

If I had to point a finger at the most significant factor reducing your performance, I would be tempted to point at your use of system pipes. A system pipe has a fixed maximum buffer size, and when that buffer is full, the producer process blocks in write() and is descheduled until the consumer drains some of it. The same holds on the other end of the pipe: the Perl program (the consumer) can read at most one pipe buffer's worth of data (often 4096 or 8192 bytes on older Unix systems, 64 KiB by default on current Linux) before it blocks in turn and waits for the producer. For very large amounts of data, this constant rescheduling, combined with other processes on the machine competing for the same resources, makes system pipes inefficient for this job.

If this is the case, there are a few alternatives you might consider. The first is to have tcpdump write to a real file instead of a pipe. This would allow tcpdump to pump data into the system as fast as it can produce it. Ideally, the file would live on a memory-backed file system (such as tmpfs), or on one that defers writes, so that tcpdump's write() system calls return quickly and the data is immediately available to other processes. Your Perl script would then use a 'read behind' technique: read() from the file until EOF is encountered. At EOF, it should give up the processor to the tcpdump process, for example with sched_yield(), or with poll() or select() used with a short timeout. When the Perl script is scheduled again, it reads until EOF again. This approach gives you an effectively limitless buffer, as opposed to the fixed (and small) buffer that a system pipe provides. Of course, you may need to consider the file growing too large for the file system.
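To make the 'read behind' idea concrete, here is a rough sketch in Perl. It is only an illustration under a few assumptions: the name read_behind, its options, and the "give up after so many empty polls" stop condition are all made up for the example, and the four-argument select() is used purely as a sub-second sleep. It uses sysread() so that data appended after an EOF is picked up by the next read.

```perl
use strict;
use warnings;

# Hypothetical helper: follow a growing file by reading to EOF,
# napping briefly, then reading again. $on_data receives each chunk.
# The loop stops after $max_idle consecutive empty reads (an
# illustrative stop condition, not something tcpdump provides).
sub read_behind {
    my ($path, $on_data, %opt) = @_;
    my $chunk    = $opt{chunk}    // 65536;  # bytes requested per sysread
    my $interval = $opt{interval} // 0.05;   # seconds to sleep at EOF
    my $max_idle = $opt{max_idle} // 20;     # empty polls before giving up

    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my ($idle, $total) = (0, 0);
    while ($idle < $max_idle) {
        my $n = sysread($fh, my $buf, $chunk);
        die "read $path: $!" unless defined $n;
        if ($n > 0) {
            # New data arrived: reset the idle counter and hand it over.
            $idle = 0;
            $total += $n;
            $on_data->($buf);
        } else {
            # EOF for now: sleep so the writer gets the processor.
            $idle++;
            select(undef, undef, undef, $interval);
        }
    }
    close $fh;
    return $total;   # total bytes consumed
}
```

You would start the capture with something like tcpdump -w /tmp/capture.pcap (the path is just an example) and then call read_behind('/tmp/capture.pcap', sub { ... }) from the script. In a real setup you would likely replace the idle counter with a check on whether the tcpdump process is still running.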

Otherwise, your only solution would be to hack tcpdump, or to find an alternative to tcpdump, so that it either invokes the Perl routines inline or transfers the data more efficiently, for example through a large shared memory segment.