in reply to Re: slurped scalar map
in thread slurped scalar map

I am already past the "working" phase and in the "optimisation" phase. I'm curious about efficiency in terms of "best programming practice". The program (too large to post) creates a file consisting of N records, with an index at the end containing key info such as fpos markers. (The records consist of the stdout/stderr of several o/s commands and files => 30-50Mb/server for almost 100 servers.)

The program currently reads the index first, then seeks to and reads each record as it requires it while processing the data-file. I'm trying to find a faster solution, i.e. performing larger sequential reads upfront. Of course, that may have extra considerations, such as a maximum slurp size.
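The per-record pattern described above can be sketched roughly as follows. This is a hedged sketch only: the real program isn't posted, so the record contents and the index layout (key => [fpos, length]) are hypothetical stand-ins.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a toy data file: a few records, remembering each record's
# fpos marker and length in a hash that stands in for the real index.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
my %index;
for my $key (qw(alpha beta gamma)) {
    my $rec = "output of command $key\n" x 3;
    $index{$key} = [ tell($out), length $rec ];
    print {$out} $rec;
}
close $out;

# Per-record access: seek to the fpos from the index, read just that
# record, process it, move on to the next one the caller asks for.
open my $in, '<', $file or die "open: $!";
binmode $in;

sub read_record {
    my ($fh, $offset, $length) = @_;
    seek $fh, $offset, 0 or die "seek: $!";
    read($fh, my $buf, $length) == $length or die "short read";
    return $buf;
}

my $beta = read_record($in, @{ $index{beta} });
```

Each call pays for a seek plus a (possibly small) read, which is exactly the pattern being weighed against one big sequential slurp.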

This exercise will be worth it (in my mind at least) if I can understand the margin by which

<sequential slurp><process><process><process>

operations are faster than

<slurp 1 record><process><slurp next record><process> ...

Hope this makes sense.
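One way to measure that margin directly is the core Benchmark module: time a whole-file slurp against a loop of per-record seek/read calls over the same file. The file layout below (100 fixed-size records) is an assumption for illustration; results will also depend heavily on whether the file is already in the OS cache.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Temp qw(tempfile);

# Build a small test file of 100 fixed-size records, noting each fpos.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
my $recsize = 4096;
my @offsets;
for my $i (0 .. 99) {
    push @offsets, tell $out;
    print {$out} sprintf("%-${recsize}s", "record $i");
}
close $out;

# Strategy 1: one big sequential slurp, then process in memory.
sub slurp_then_process {
    open my $fh, '<', $file or die "open: $!";
    binmode $fh;
    local $/;                      # slurp mode
    my $all = <$fh>;
    return length $all;
}

# Strategy 2: seek/read each record individually.
sub read_per_record {
    open my $fh, '<', $file or die "open: $!";
    binmode $fh;
    my $total = 0;
    for my $off (@offsets) {
        seek $fh, $off, 0 or die "seek: $!";
        read $fh, my $buf, $recsize;
        $total += length $buf;
    }
    return $total;
}

# Relative timings vary with OS, disk, and cache state - measure, don't guess.
timethese(200, {
    slurp  => \&slurp_then_process,
    record => \&read_per_record,
});
```

Scaling $recsize and the record count toward the real 30-50Mb-per-server sizes would make the comparison more representative.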


Re^3: slurped scalar map
by dragonchild (Archbishop) on Jun 20, 2006 at 17:28 UTC
    The OS already does that for you. When you read from a file, you're not actually reading from the disk itself; you read from a buffer that the disk manager creates for you. So, slurp-process-slurp-process is going to be nearly as fast as (or faster than) slurp-process-process-process.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      In principle I agree, except on one important detail.

      How can the OS buffer multiple small seek/read operations as effectively as when it is doing a single seek to start-of-file and performing a large sequential read of the entire content?

      On Unix systems the readahead buffer is continuously increased, with logic something like: "OK, you're still reading sequentially - let me double the readahead buffer for the next read."
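      A reading pattern that cooperates with that readahead logic is a plain sequential loop over large chunks - no seeks between reads, so the kernel keeps seeing a sequential stream. A minimal sketch (the 1 Mb chunk size and the toy 3 Mb file are illustrative choices, not from the thread):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Toy data file standing in for the real 30-50Mb one.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
my $payload = "x" x (3 * 1024 * 1024);
print {$out} $payload;
close $out;

# Read it back sequentially in large chunks. Each read continues where
# the previous one ended, so the kernel sees a purely sequential access
# pattern and can keep growing its readahead window.
open my $in, '<', $file or die "open: $!";
binmode $in;
my $chunk = 1 << 20;      # 1 Mb per read
my $data  = '';
while (read($in, my $buf, $chunk)) {
    $data .= $buf;        # or hand each chunk off for processing
}
close $in;
```

      By contrast, each seek in the seek/read-per-record pattern can reset or shrink that window, which is the heart of the question above.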