http://qs321.pair.com?node_id=895605


in reply to Atomic Config Updater with History?

Atomic IO is as rare as rocking horse do-do.

If you write to a local text file, your blob may (and frequently will) cross a disk block boundary. The start of the blob may therefore get written to disk as part of one 4k disk block, and the end of it as part of another. If the machine goes down between the two writes, the update is not atomic.
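To make the boundary problem concrete, here is a small sketch of the arithmetic. It assumes a 4096-byte block (the "4k" above); the function names are invented for illustration, and real file systems may use other block sizes.

```python
BLOCK_SIZE = 4096  # assumption: the "4k" disk block mentioned in the post


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Return the range of block indices covered by a write of
    `length` bytes starting at byte `offset` in the file."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def crosses_boundary(offset, length, block_size=BLOCK_SIZE):
    """True if the write spans more than one block, i.e. it needs
    at least two separate block writes to reach the disk."""
    return len(blocks_touched(offset, length, block_size)) > 1


# A 200-byte blob appended at offset 4000 straddles blocks 0 and 1.
print(crosses_boundary(4000, 200))   # True
print(crosses_boundary(4096, 200))   # False
```

Any blob for which `crosses_boundary` is true can be left half-written by a crash between the two block writes.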

Compound that with the fact that all modern OSs use transparent file caching. Even once you've "written to disk", you've often only written to the cache, and if something crashes, what you think you've already written can get lost.
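As a rough illustration of the two layers of buffering, the sketch below flushes the language-level buffer and then asks the OS to flush its cache. `durable_append` is a made-up helper name, and `os.fsync` is only a request to the OS: a drive with its own write cache may still be holding the data.

```python
import os


def durable_append(path, data):
    """Append `data` (bytes) to `path` and try to push it past both
    the userspace buffer and the OS file cache before returning."""
    with open(path, "ab") as fh:
        fh.write(data)
        fh.flush()               # userspace buffer -> OS cache
        os.fsync(fh.fileno())    # OS cache -> storage device (best effort)
```

Without the `flush` and `fsync` calls, a successful `write` only means the bytes reached a buffer somewhere.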

And unless your file-system allows you to make your log file contiguous, it is quite possible that, due to write reordering, the second block in the first scenario gets written before the first. If the interruption occurs at the wrong moment, you have the end of a blob but not its start.

If you are prepared to bypass Perl & your CRT lib, then your OS might provide write-thru file handling APIs. If you use these, together with synchronous IO, and write 4K blocks every time, you can achieve something approaching atomicity. But still, disk heads do occasionally crash mid-block.
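A minimal sketch of that approach, assuming a Unix-like OS where the `O_SYNC` open flag is available and a 4096-byte block. It is written in Python for illustration, using the thin `os.open`/`os.write` wrappers around the OS calls rather than buffered file objects. `write_block_sync` is a hypothetical name, and the NUL padding is just one way to fill a record out to a whole block.

```python
import os

BLOCK_SIZE = 4096  # assumption: matches the 4K block size in the post


def write_block_sync(path, payload):
    """Append `payload` (bytes) as exactly one padded block, using
    synchronous writes where the platform offers O_SYNC."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload does not fit in one block")
    block = payload.ljust(BLOCK_SIZE, b"\0")   # pad to a whole block
    # O_SYNC is missing on some platforms; fall back to 0 there,
    # in which case the write is NOT synchronous.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        written = os.write(fd, block)
        if written != BLOCK_SIZE:
            raise OSError("short write: %d of %d bytes" % (written, BLOCK_SIZE))
    finally:
        os.close(fd)
```

As long as the file only ever receives whole blocks, each record starts on a block-sized offset within the file. Whether that lines up with physical blocks on the device depends on the file system.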

If your file is on a remote system, the transmission protocol (TCP/IP or whatever) is free to, and frequently will, aggregate and/or break up writes in order to form transmission packets optimised for the comms fabric. And that can happen multiple times if the transmission crosses fabric boundaries (e.g. cat5 to fiber and back; or 54Mb/s to 1Gb/s; etc.) in the course of its journey.

The point is that if you really need total reliability under any (well, most at least) circumstances, then you need to stop thinking "atomic" and start thinking Two Phase Commit.

Personally, I think a transactional DB is your best bet. INSERT the message saying what you are about to do; do it; then commit.
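A minimal sketch of that INSERT / do it / commit sequence, using Python's bundled `sqlite3` module purely for illustration (the post names no particular database). The `history` table and the helper names `init_history` and `run_logged` are invented for the example.

```python
import sqlite3


def init_history(conn):
    """Create the (hypothetical) history table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY, msg TEXT NOT NULL)"
    )
    conn.commit()


def run_logged(conn, msg, action):
    """INSERT the message, perform `action`, then commit.
    If anything raises, the INSERT is rolled back."""
    try:
        conn.execute("INSERT INTO history (msg) VALUES (?)", (msg,))
        action()          # the work the message describes
        conn.commit()     # the message becomes permanent only here
    except Exception:
        conn.rollback()   # undoes the INSERT only, not the action
        raise
```

Usage would look something like `run_logged(conn, "set timeout=30", apply_change)`, where `apply_change` is whatever callable performs the update.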


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Atomic Config Updater with History?
by Illuminatus (Curate) on Mar 26, 2011 at 16:56 UTC
    All of what BrowserUk says is true, although most journaled filesystems will limit your liability when using regular files. Couple this with (at least on Linux) sync-ed writes (which you *don't* want to do a lot of, as they are dreadfully slow), and you might get by. A transactional DB is better, but you do have to remember that the two-phase commit is designed to ensure that multiple operations on the DB itself are either all done, or not done (ie rolled back). When part of what you are trying to 'commit' has nothing to do with the database (ie, transition a server to a new state), then you are still not atomic. In BrowserUk's example, if you
    1. INSERT the command msg
    2. perform the command
    3. commit the INSERT
    but the system crashes before step 3 completes, the DB will roll back the INSERT, but the DB has no knowledge of the command you performed. You would have to take the additional step of looking at the DB's transaction log (which many DBs allow you to record in a readable format). Upon crash recovery, if you see a 'command rollback', you would want to check the state of the execution of that command, and try to 'roll that back' too...

    fnord

      Sync'd writes (and write-thru) can be slow due to the absence of caching, but so are most journalled file-systems.

      When part of what you are trying to 'commit' has nothing to do with the database (ie, transition a server to a new state), then you are still not atomic. In BrowserUk's example,

      Agreed. That example only works if repeating the performed-but-unlogged command is effectively a no-op.

      Mind you, breaking processing up into steps such that any given step can be repeated two or more times without affecting the overall result is something of a black art in itself. The basic steps are:

      a) Don't discard the source data for a given step until the output data for that step has been successfully processed by the next step.
      b) Discard any source data for this step that is 'incomplete'. Sentinel values are useful for this.
      c) Once the input data for this step (i.e. the output of the previous step) has been successfully processed, delete the associated input data to the previous step.

      Of course, in critical systems, 'delete' is probably spelt 'move to archive'.
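
      Point b) can be sketched roughly as follows, in Python for illustration. The sentinel value and the function names are assumptions made for the example; any marker that cannot occur inside a record would do. A record whose sentinel never made it to disk is treated as incomplete and dropped.

```python
SENTINEL = b"\n--END--\n"  # hypothetical end-of-record marker


def encode_record(payload):
    """Return `payload` (bytes) followed by the sentinel."""
    if SENTINEL in payload:
        raise ValueError("payload must not contain the sentinel")
    return payload + SENTINEL


def complete_records(raw):
    """Split `raw` (bytes) into records, keeping only those that are
    terminated by the sentinel. A trailing fragment without a
    sentinel (e.g. from an interrupted write) is discarded."""
    parts = raw.split(SENTINEL)
    # The last piece is either empty (the data ended cleanly on a
    # sentinel) or an incomplete record; drop it either way.
    return parts[:-1]
```

      Because the incomplete tail is simply ignored, the step that produced it can be run again without the partial output being mistaken for real data.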


Re^2: Atomic Config Updater with History?
by pileofrogs (Priest) on Mar 25, 2011 at 22:43 UTC

    Yes! Thank you! That's exactly the kind of thing I needed to know. ++