PerlMonks

Re^3: Splitting a Blocked file in Round Robin into smaller files

by KurtSchwind (Chaplain)
on Dec 14, 2015 at 16:19 UTC ( [id://1150240] )


in reply to Re^2: Splitting a Blocked file in Round Robin into smaller files
in thread Splitting a Blocked file in Round Robin into smaller files

I'm in the habit of forcing a flush.

Would the flush happen regardless? If so, I've picked up something new.

--
“For the Present is the point at which time touches eternity.” - CS Lewis

Replies are listed 'Best First'.
Re^4: Splitting a Blocked file in Round Robin into smaller files
by BrowserUk (Patriarch) on Dec 14, 2015 at 17:39 UTC
    Would the flush happen regardless?

    Wrong question. The one you should be asking yourself is: why do you feel the need to defeat the whole purpose of buffered IO?

    Is your hardware so unreliable or your code so flaky?

    Besides which, your efforts are of limited benefit, as every modern OS also buffers file writes in its system cache anyway.

    In the very rare circumstances where you have a real reason to avoid buffered IO, why not just set autoflush on the file handle with IO::Handle::autoflush()?

    Or do it manually with IO::Handle::flush() within the loop.
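    A minimal sketch of both approaches, keeping the handle open throughout (the filename and messages are illustrative, not from the thread):

```perl
use strict;
use warnings;
use IO::Handle;   # supplies autoflush() and flush() as filehandle methods

# Hypothetical log file, used only for this illustration.
open my $fh, '>', '/tmp/autoflush_demo.log' or die "open: $!";

# Option 1: set autoflush once; every print then flushes the PerlIO buffer.
$fh->autoflush(1);
print {$fh} "flushed automatically\n";

# Option 2: leave buffering on and flush only at the points that matter.
$fh->autoflush(0);
print {$fh} "flushed on demand\n";
$fh->flush;   # push the PerlIO buffer to the OS without closing the handle

close $fh or die "close: $!";
```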

    Forcing the system to keep opening and closing the files in order to achieve flushing, all for very limited benefit and no good reason, is very silly and hugely expensive.
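    For the round-robin split this thread is about, that means opening every output file once, up front, and cycling through the handles; a sketch (the filenames, file count, and records are made up for illustration):

```perl
use strict;
use warnings;

my $n = 3;   # number of output files (hypothetical)

# Open every output handle once, up front.
my @out;
for my $i (0 .. $n - 1) {
    open $out[$i], '>', "/tmp/part$i.txt" or die "open part$i: $!";
}

# Deal records to the handles in round-robin order; buffered IO
# batches the writes, and nothing is reopened per record.
my $idx = 0;
for my $record (map { "record $_\n" } 1 .. 10) {
    print { $out[$idx] } $record;
    $idx = ($idx + 1) % $n;
}

# close() flushes each handle's remaining buffer exactly once.
close $_ or die "close: $!" for @out;
```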


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I'm not sure it's a wrong question, or even a question of flaky code or buggy hardware, per se, but I do concede your following points. I could call autoflush or call flush manually.

      While it's true that there is a degree of buffering by the OS and even by the DASD, you can still eliminate one degree of buffering where it's applicable. I'm used to writing system logs for audit and government organizations, where the accepted pattern is to flush often. I get that for this example it's probably overkill, especially given that the input file isn't destroyed/altered as it's processed.

      --
      “For the Present is the point at which time touches eternity.” - CS Lewis
Re^4: Splitting a Blocked file in Round Robin into smaller files
by Apero (Scribe) on Dec 14, 2015 at 21:10 UTC

    I'm in the habit of forcing a flush.

    Would the flush happen regardless? If so, I've picked up something new.

    To extend the earlier reply, there are two buffers at play here, and normally it's not prudent to worry about either of them, as BrowserUk suggested earlier. System calls are expensive, and when you needlessly open/close files, you make the kernel do a lot more work; doing that in tight loops is a sure recipe for performance problems on loaded systems.

    That said, there are times when you've done something "important", like adjusting a system configuration file, where a power outage at that point could have disastrous consequences. In this case, you'd really need to flush/sync both the PerlIO layer and the system cache layer (all without reopening the file, which would be far more wasteful). Again, these cases tend to be rare, and I don't suggest you do this everywhere, as it carries a performance penalty. However:

    If you must flush both the PerlIO and OS buffers you would have to do something like this:

    use IO::Handle;   # provides flush() and sync() as methods on the handle

    $fh->print("Some critical text that must be flushed right away.");
    $fh->flush;   # PerlIO buffer -> OS
    $fh->sync;    # OS cache -> storage medium

    In the code above, the flush() call asks the PerlIO layer to send its buffers to the OS, and the sync() call asks the lower-level system to flush its buffers to the storage medium. This takes at least two system calls (such as write(2) and fsync(2)), and recall that system calls are expensive.

    Doing this where it's important is one thing, but don't do it out of habit. If you want tail -f to be responsive, flush() is sufficient (smarter still would be to set $fh->autoflush(1)). If it must also reach the disk, follow it up with a sync() and take the performance hit. Either/both will degrade your code's performance, so use them wisely.
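    One way to follow that advice is an always-autoflushed log that only pays for sync() on entries flagged as critical; a sketch with invented names (the log_entry() helper and filename are hypothetical):

```perl
use strict;
use warnings;
use IO::Handle;   # autoflush() and sync() as filehandle methods

open my $log, '>', '/tmp/audit_demo.log' or die "open: $!";
$log->autoflush(1);   # keeps `tail -f` responsive: every entry reaches the OS

# Hypothetical helper: sync to the storage medium only when asked to.
sub log_entry {
    my ($fh, $text, %opt) = @_;
    print {$fh} "$text\n";
    $fh->sync if $opt{critical};   # fsync(2): the expensive part, used sparingly
}

log_entry($log, 'routine progress message');
log_entry($log, 'config rewritten', critical => 1);

close $log or die "close: $!";
```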

    And beware, because some terrible OSes or storage controllers lie to you and say the data has been written even when it hasn't. There's nothing you can do about that if you have such a device, so generally one doesn't worry about such things.
