PerlMonks  

knowing when a file is done writing to the server?

by flieckster (Scribe)
on Apr 04, 2017 at 23:43 UTC ( [id://1187053]=perlquestion )

flieckster has asked for the wisdom of the Perl Monks concerning the following question:

Hello, i'm wondering where a good place to start this little quest would be. i have a few scripts that move large batches of files on a local server. i have no problem moving them quickly; the issue is that the user who is uploading has a much slower connection than me. how do i know when to move a file? if they are uploading 200 images, what's a good practice for knowing when any one of the files is done writing to the server? thank you for starting me on the best path.

Replies are listed 'Best First'.
Re: knowing when a file is done writing to the server?
by afoken (Chancellor) on Apr 05, 2017 at 04:39 UTC
    i have a few scripts that move large batches of files on a local server. i have no problem moving them quickly; the issue is that the user who is uploading has a much slower connection than me. how do i know when to move a file? if they are uploading 200 images, what's a good practice for knowing when any one of the files is done writing to the server?

    Start by repairing your shift key. It seems to have severe contact problems. Consider replacing your keyboard.

    Then, don't let your users upload to the final destination with the final filename. Upload to a temp file in a temp directory on the same disk, then rename to the final location and the final name. This way, the upload can take ages without affecting the scripts that move completed files.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: knowing when a file is done writing to the server?
by Your Mother (Archbishop) on Apr 05, 2017 at 00:42 UTC
Re: knowing when a file is done writing to the server?
by marinersk (Priest) on Apr 07, 2017 at 08:35 UTC

    It sounds like the upload process is a human action, such as SFTP or a browser upload. If the latter, you might be able to mask the underlying enginery.

    For example, if you use the temp upload approach and the upload goes through a CGI script or a similar process over which you have complete control, the script itself can perform the move step (see below) as soon as the upload is complete.

    This becomes a human manual step if they're uploading using something like SFTP, which may or may not be culturally palatable, and is therefore a design decision. Enjoy.

    So -- the usual solutions:

    • Flag file (noted above)
    • Temp upload location and rename/move (noted above)
    • Track file size over time (complex, prone to failure, assumptive, last resort)

    Flag File

    The flag file technique is more often used in automated systems. It goes something like this:

    1. Sender uploads file (e.g., abc.dat)
    2. Sender uploads a tiny flag file (e.g., abc.dat.ok)
    3. Mover loop scans for flag files *.ok and finds this new one
    4. Mover renames flag file abc.dat.ok to abc.dat.wip to show it's being worked on
    5. Mover moves file abc.dat
    6. Mover removes file abc.dat
    7. Mover removes flag file abc.dat.wip

    Renaming the flag file to show current status has the added advantage of permitting multiple worker threads to work on the same repository of files needing processing, since the rename operation is likely atomic (eliminates single-step race condition), and if not atomic, is at least very fast (low risk of race condition).

    Robust craftsmanship can fine-tune this process for efficiency and collision reduction, such as globbing *.ok but rechecking each flag file's existence before attempting to process it.
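
    The mover side of the flag file steps could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the function name claim_and_move and the inbox/dest directory arguments are hypothetical, and it assumes the flag file sits next to the data file with .ok appended to the name. The rename of the flag file doubles as the existence recheck, because a worker that loses the race gets an error instead of a claim.

```python
import os
import shutil
from pathlib import Path


def claim_and_move(inbox: Path, dest: Path) -> list:
    """Move every data file in `inbox` that has a `.ok` flag file into `dest`.

    Returns the names of the data files this worker moved.
    """
    moved = []
    for flag in sorted(inbox.glob("*.ok")):
        wip = flag.with_suffix(".wip")          # abc.dat.ok -> abc.dat.wip
        try:
            # Claim the job. If another worker renamed the flag first,
            # the flag no longer exists and this raises FileNotFoundError.
            os.rename(flag, wip)
        except FileNotFoundError:
            continue
        data = flag.with_suffix("")             # abc.dat.ok -> abc.dat
        shutil.move(str(data), str(dest / data.name))
        wip.unlink()                            # job finished, drop the flag
        moved.append(data.name)
    return moved
```

    Files without a flag file are never touched, so a slow upload simply stays in place until the sender writes its .ok file.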

    Temp Upload

    The temp upload technique is more often used in automated systems. It goes something like this:

    1. Sender uploads file to temporary location (e.g., tempul/abc.dat)
    2. Sender moves file to repository location (e.g., rename tempul/abc.dat ./abc.dat)
    3. Mover loop scans for files in the repository and finds this new one
    4. Mover moves file abc.dat
    5. Mover removes file abc.dat

    As described, this assumes a single worker process for the mover; the mover process can use flag files or some other technique to achieve the same multi-worker-safe environment as the Flag File technique.
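
    A minimal sketch of both halves, assuming the receiving side is a script you control (as in the CGI case) and that the temporary and repository directories are on the same filesystem. The names receive_upload and scan_repository and the directory arguments are made up for this illustration, and Python is used only as an example language.

```python
import os
import shutil
from pathlib import Path


def receive_upload(stream, temp_dir: Path, repo_dir: Path, name: str) -> Path:
    """Write an incoming byte stream to temp_dir, then rename it into repo_dir."""
    temp_path = temp_dir / name
    with open(temp_path, "wb") as out:
        shutil.copyfileobj(stream, out)     # may take as long as the upload needs
        out.flush()
        os.fsync(out.fileno())              # push the data to disk before publishing
    final_path = repo_dir / name
    # On POSIX this rename is atomic when both directories are on the same
    # filesystem, so the mover never sees a half-written file in repo_dir.
    os.replace(temp_path, final_path)
    return final_path


def scan_repository(repo_dir: Path, dest_dir: Path) -> list:
    """Single-worker mover: everything found in repo_dir is already complete."""
    moved = []
    for path in sorted(repo_dir.iterdir()):
        if path.is_file():
            shutil.move(str(path), str(dest_dir / path.name))
            moved.append(path.name)
    return moved
```

    The mover only ever looks at the repository directory, so an upload still in progress in the temporary directory is invisible to it.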

    Track File Size

    The file size tracking technique is fraught with assumptions and difficulties, but if you cannot gracefully implement another solution, it's a possible improvement over leaving everything to chance. It goes something like this:

    1. Mover process is configured to give each file a certain amount of time before it is presumed complete (e.g., 5 minutes with no change in file size)
    2. Sender uploads file (e.g., abc.dat)
    3. Mover loop scans for files in the repository and records file sizes; if the size has changed, record the current timestamp
    4. Mover eventually notices the file size hasn't changed for the previously noted configuration time (e.g., 5 minutes), and thus presumes the file upload is complete, and moves the file

    This process is dependent upon a dynamic file size being properly reported from the OS; I've seen some environments where the reported size of the file is the full size as soon as the file is opened, which obviously renders this technique impotent.

    As described, this assumes a single worker process for the mover; however, so long as a failure in the move operation does not cause the script to die, it is likely a fairly safe construct for the multiple worker process environment.
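
    The bookkeeping for the size-tracking loop might look something like this. It is a rough sketch with hypothetical names (SizeWatcher, stable_files, quiet_seconds), written in Python for illustration only, and it inherits every assumption listed above: it only works if the OS reports the growing size of a file that is still being written.

```python
import time
from pathlib import Path


class SizeWatcher:
    """Remember each file's last seen size and when that size was first seen."""

    def __init__(self, quiet_seconds=300, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.seen = {}              # path -> (size, time this size was first seen)

    def stable_files(self, repo_dir):
        """Return files whose size has not changed for quiet_seconds."""
        now = self.clock()
        current = {}
        stable = []
        for path in sorted(Path(repo_dir).iterdir()):
            try:
                if not path.is_file():
                    continue
                size = path.stat().st_size
            except FileNotFoundError:
                continue            # file vanished between listing and stat
            last_size, since = self.seen.get(path, (None, now))
            if last_size != size:
                since = now         # new or still growing: restart the timer
            elif now - since >= self.quiet_seconds:
                stable.append(path)
            current[path] = (size, since)
        self.seen = current         # forget files that are no longer present
        return stable
```

    The mover would call stable_files on every pass of its loop and move whatever comes back. A stalled upload that pauses longer than the quiet period will still be mistaken for a finished one.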

Re: knowing when a file is done writing to the server?
by Anonymous Monk on Apr 05, 2017 at 01:19 UTC
    The file goes in a temp directory until it is fully uploaded, and then the user who is uploading moves it to the final directory.
      Maybe I shouldn't have said "final". The uploader puts it in a "completed" directory when it is done. When your server process sees that something is in the "completed" directory, it can move it wherever it needs to go after that.
Re: knowing when a file is done writing to the server?
by ablanke (Prior) on Apr 07, 2017 at 07:53 UTC
    Hello flieckster,

    since it is not in your hands, i don't know if it is possible,
    but the uploading user could provide a checksum file for you
    or at least an ok-file (image-name.png.ok)
    after the uploading process is complete.
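
    If the uploader can supply a checksum file, the check on the server side could be sketched like this. The .sha256 suffix, the file layout (hex digest as the first word of the file) and the function name upload_is_complete are assumptions made for this example, not something agreed in the thread; Python is used only for illustration.

```python
import hashlib
from pathlib import Path


def upload_is_complete(data_path: Path) -> bool:
    """True if data_path matches the digest stored in its checksum file.

    Assumes the uploader writes image-name.png.sha256 next to image-name.png,
    with the hex SHA-256 digest as the first whitespace-separated field.
    """
    checksum_path = data_path.with_name(data_path.name + ".sha256")
    if not data_path.is_file() or not checksum_path.is_file():
        return False
    fields = checksum_path.read_text().split()
    if not fields:
        return False                # checksum file is still empty
    expected = fields[0].lower()
    digest = hashlib.sha256()
    with open(data_path, "rb") as handle:
        # Read in 1 MiB chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

    Unlike a bare ok-file, this still gives the right answer if the checksum file happens to arrive before the data file has finished uploading, because a partial file will not match the digest.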
