Re: Very fast reads from an external program

If I had to point a finger at the most significant factor reducing your performance, I would be tempted to point at your use of system pipes. System pipes have a maximum buffer size, and when this buffer fills, the producer process blocks and must be switched out. The same holds at the other end of the pipe: the Perl program (the consumer) can read only one pipe buffer's worth of data (often 4096 or 8192 bytes on Unix) before it, too, needs to be switched out. For very large amounts of data, the constant rescheduling this requires, along with the expectation that other processes on the machine are vying for the same resources, makes system pipes inefficient for this problem.
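To see how large the pipe buffer actually is on a given machine, something like the following rough sketch might help. It fills a fresh pipe that nobody reads from, using non-blocking writes, and counts the bytes accepted before a write would have blocked. The name pipe_capacity is my own invention, and the figure it reports depends entirely on the OS and its configuration.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper (not from any library): fill a fresh pipe without
# ever reading from it, and return how many bytes the kernel accepted
# before a write would have blocked.
sub pipe_capacity {
    pipe(my $reader, my $writer) or die "pipe: $!";
    $writer->blocking(0);          # a full pipe now fails instead of blocking

    my $block = 'x' x 512;         # small writes, so each one is all-or-nothing
    my $total = 0;
    while (1) {
        my $n = syswrite($writer, $block);
        if (!defined $n) {
            last if $!{EAGAIN} || $!{EWOULDBLOCK};   # the pipe is full
            die "syswrite: $!";
        }
        $total += $n;
    }
    close $writer;
    close $reader;
    return $total;
}

printf "pipe accepted %d bytes before blocking\n", pipe_capacity();
```

Whatever number this prints is the most the producer can get ahead of the consumer before it has to stop and wait.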

If this is the case, there are a few alternatives you might consider. The first is that tcpdump could write its bytes to a real file rather than a pipe. This would allow tcpdump to pump as much data as it could into the system. Ideally, the bytes would be written to a file system backed by virtual memory, or one that uses deferred writes, so that tcpdump's write() system calls succeed quickly and the data is immediately available to other processes. Your Perl script would then use a 'read behind' technique: read() from the file until EOF is encountered. At EOF, a system call such as yield(), poll(), or select() should be executed to yield the processor back to the tcpdump process. When the Perl script is scheduled for execution again, it should read until EOF again. This approach gives you an effectively limitless buffer, as opposed to the system pipe approach, which provides only a fixed (small) buffer. Of course, the file may eventually grow too large for the file system, and that needs to be taken into account.
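A minimal sketch of the 'read behind' loop, assuming tcpdump has been told to write to a file (for example with its -w option). The helper name read_behind, its option names, and the idle limit that eventually stops the loop are all my own assumptions for illustration. sysread() is used so that Perl's own buffering does not get in the way of seeing newly appended data, and the four-argument select() serves as the "yield" at EOF.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);

# Hypothetical helper: follow a file that another process is appending to,
# passing each chunk read to $on_chunk. It gives up after $max_idle
# consecutive polls that find no new data, and returns the bytes consumed.
sub read_behind {
    my ($path, $on_chunk, %opt) = @_;
    my $chunk    = $opt{chunk_size} || 65536;               # bytes per sysread
    my $nap      = defined $opt{nap} ? $opt{nap} : 0.01;    # seconds to yield
    my $max_idle = $opt{max_idle} || 100;

    sysopen(my $fh, $path, O_RDONLY) or die "open $path: $!";
    my ($idle, $total) = (0, 0);
    while ($idle < $max_idle) {
        my $n = sysread($fh, my $buf, $chunk);
        die "read $path: $!" unless defined $n;
        if ($n == 0) {
            # EOF for now: sleep briefly so the writer gets the processor,
            # then try again from the same file offset.
            select(undef, undef, undef, $nap);
            $idle++;
            next;
        }
        $idle = 0;
        $total += $n;
        $on_chunk->($buf);
    }
    close $fh;
    return $total;
}

# Example use (the file name is made up):
#   read_behind('capture.out', sub { my ($bytes) = @_; ... });
```

Note that the chunks arrive at arbitrary byte boundaries, so the callback has to reassemble records itself. A real version would also want a better stopping condition than an idle counter, such as noticing that the writer has exited.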

Otherwise, your only solution would be to hack tcpdump, or find an alternative to tcpdump, so that it invokes the Perl code inline or transfers the data more efficiently, for example through a large shared memory segment.

Re: Very fast reads from an external program by Abigail-II (Bishop) on Apr 25, 2003 at 07:31 UTC
Well, if you are going to write the output to a disk file, the program will be even slower, as disks are orders of magnitude slower than memory. If you are writing to memory, you quickly have a problem, because the large amounts of data tcpdump is writing will quickly fill up large chunks of memory. If it is indeed the buffer size that is the problem, you're better off enlarging the buffer size by tweaking the OS. But I expect that this is one of the cases where one would prefer to use C instead of Perl. Abigail
Re: Re: Very fast reads from an external program by MarkM (Curate) on Apr 26, 2003 at 03:45 UTC
You should note that I suggested the use of a file system that performs deferred writes, or one that is backed by virtual memory, and not a real disk. In any case, the pages just written are still likely to be in RAM, meaning that the write ahead is to RAM and the read behind is from RAM. The real benefit of using a true file is that the system can write the data to the file at its leisure, marking pages as clean and ready for reclamation as necessary. Using a shared memory segment, for example, requires that the processes perform their own scheduling to ensure that the producer does not overwrite pages not yet read by the consumer (during a fast network burst, for example). Using a file lets the kernel do this magic for us.


	PerlMonks