PerlMonks  

Storable - File empties itself?

by feumw (Sexton)
on Apr 13, 2023 at 08:40 UTC ( [id://11151626]=perlquestion )

feumw has asked for the wisdom of the Perl Monks concerning the following question:

We had a service provider during a project, and one of them built a feature called "Userlist and Useraction". It has two parts. "Userlist" is an overview where we can see each userid plus a timestamp (the last time the user was active); it only reads the Perl-Storage. "Useraction" is a Controller in our CMS which is called whenever a user does anything; it retrieves the Perl-Storage and stores username + timestamp.

Our issue is that sometimes (randomly) this Perl-Storage seems to be cleared. When we run stat on the file, only the "modified" time has changed, so the file is not deleted, but it seems to be losing all its content. This seems to happen about once a month. We made sure there were no updates, backups or anything else running at that time of day.

Since this is built into our CMS I can only provide parts of the code.

Userlist.PM
use Storable;
use Time::Local;

sub go {
    my $storeFile = '/db/users.store';
    my $usersFile = ();
    if ( -e $storeFile ) {
        $usersFile = retrieve( $storeFile );
    }

    # Do stuff and return me a precious overview of all my userids and their timestamps
    #
    # userid | timestamp
    # 1337   | Fri Dec 16 12:11:43 2022
    # 1234   | Fri Jan 20 09:21:36 2023
    # ...
}
Useraction.PM
use Storable;

sub go : Path {
    my $user = $common->{user};         # CMS Stuff: Get the current logged in user
    my $uid  = $user->getInfo()->{id};  # CMS Stuff: Get his userid
    my $storeFile = '/db/users.store';
    my $ts = localtime( time );
    my $users = ();
    if (-e $storeFile) {
        $users = retrieve( $storeFile );
    }
    $users->{$uid} = $ts;
    store $users, $storeFile;
}

Is there any chance that the Module has some kind of issue if only using store and retrieve? On https://perldoc.perl.org/Storable I read something about recursion limitations. This could be the reason why we're having the issue once every month. We might reach the limitation after 1 month. I'm clueless right now what could be the issue here.

I tried to reproduce the error by copying one of our files and running the following script:

use Storable;
use Time::HiRes qw(usleep);   # usleep() is not a builtin

my $storeFile = "/db/users.store.test";
my $users = ();
my $ts;
for (my $i = 0; $i < 2000; $i++){
    $users = retrieve( $storeFile );
    $ts = localtime( time );
    foreach my $KEY ( keys % { $users } ){
        print "#" . $KEY . "#" . $ts . "#\n";
        $users->{$KEY} = $ts;
        store $users, $storeFile;
    }
    usleep(250);
}

While the script is running and changing the file, I run ls -l in a second shell, and sometimes the file shows up as 0KB.

Example File can be downloaded from my BSCW (Basic Support for Cooperative Work) Cloud

Replies are listed 'Best First'.
Re: Storable - File empties itself?
by hippo (Bishop) on Apr 13, 2023 at 08:57 UTC

    I notice 2 things right away with the code you have kindly provided. Firstly it is not using locking and secondly it performs no error checking (eg. the return values from retrieve and particularly store are never tested). I would expect to see both of these in critical code and without them there is unfortunately little to go on.

    And just in case it is relevant, it would be good to know which version of Storable your environment is using.


    🦛

      >> Firstly it is not using locking
      Yeah, that's right. Might this be an issue? I've never worked with this Module before.

      >> secondly it performs no error checking (eg. the return values from retrieve and particularly store are never tested)
      On the "Userlist" we check whether the Perl-Storage file exists. Then we get all userids from the CMS and check whether they exist in the Perl-Storage. If yes, we show the stored data; otherwise we display a fixed string.
      if ( $storage->{$id} ) {
          # show userid + timestamp
      } else {
          # show userid + "never logged in"
      }
      On the "Useraction" we only check whether the Perl-Storage file exists. Since this system only adds or overwrites data and never deletes anything, we add an entry to the Perl-Storage if the file exists; otherwise we write a new file.

      I just read in the documentation "The routine returns undef for I/O problems or other internal error, a true value otherwise. Serious errors are propagated as a die exception." so I'll add proper error checking now.
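A minimal sketch of what that error checking could look like, going by the documentation quoted above (undef for I/O problems, die for serious errors). The helper names load_users and save_users are made up for illustration, and the path is passed in instead of hard-coded. It only reports failures; it does nothing about concurrent access.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helper: read the hash, or fail loudly instead of returning nothing.
sub load_users {
    my ($file) = @_;
    return {} unless -e $file;              # no file yet: start with an empty hash
    my $users = eval { retrieve($file) };   # serious errors arrive as exceptions
    die "retrieve($file) died: $@" if $@;
    die "retrieve($file) returned undef: $!" unless defined $users;
    return $users;
}

# Hypothetical helper: write the hash and check store's return value.
sub save_users {
    my ($users, $file) = @_;
    my $ok = eval { store($users, $file) };
    die "store($file) died: $@" if $@;
    die "store($file) failed: $!" unless $ok;
    return 1;
}
```

In the controller one would probably log the message rather than die, but the checks stay the same.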

      >>it would be good to know which version of Storable your environment is using
      Apache-Overview states Storable Version 2.56
        >> Firstly it is not using locking Yeah, that's right. Might this be an issue? I've never worked with this Module before.

        I rarely use Storable directly myself but yes it might well be an issue. I don't know the details of how Storable performs its I/O deep down but any time you might have two processes performing simultaneous I/O on the same file there is the possibility of catastrophic corruption. The very fact that the module offers functions like lock_store and lock_retrieve suggests that their use might be necessary on occasion.
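For reference, a small sketch of the controller body using the locking variants mentioned above. record_activity is a hypothetical stand-in for the Useraction code, and the file path is a parameter. Per the Storable documentation, lock_retrieve takes a shared lock and lock_store an exclusive one, so a reader should not see a half-written file. The two locks are taken separately, though, so two processes can still overwrite each other's update.

```perl
use strict;
use warnings;
use Storable qw(lock_store lock_retrieve);

# Hypothetical stand-in for the Useraction controller body.
sub record_activity {
    my ($file, $uid) = @_;
    my $users = {};
    if (-e $file) {
        # shared lock while reading
        $users = lock_retrieve($file)
            or die "lock_retrieve($file) failed: $!";
    }
    $users->{$uid} = localtime(time);
    # exclusive lock while writing
    lock_store($users, $file)
        or die "lock_store($file) failed: $!";
    return $users;
}
```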

        >> I just read in the documentation "The routine returns undef for I/O problems or other internal error, a true value otherwise. Serious errors are propagated as a die exception." so I'll add proper error checking now.

        That seems like a worthwhile action for sure.

        >> Apache-Overview states Storable Version 2.56

        I don't see that version listed anywhere but the 3.x releases started coming out in 2016 so you have a version which is quite old. It is at least worth bearing that in mind when reading newer documentation.


        🦛

        If more than one process at a time may decide to rewrite the file you use for Storable data, you either need locks, or you need to delegate the problem (e.g. to a relational database).

        For locking, see Re: Update config file parameters and flock. Note that locking can be messy if NFS is involved.

        Often it is easier not to re-invent the wheel. Just don't use files at all, put your data into a relational database. File locking, concurrent reading and writing are solved problems for relational databases. You just need to connect to the database. That's what DBI is for.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        >> Firstly it is not using locking > Yeah, that's right. Might this be an issue? I've never worked with this Module before.

        It's not really a question about this module. You have a file, and (on every page request?) you read the file, and write completely new data overtop the old file. Assuming this happens in a web controller, all you need is two users to simultaneously load a page and their controllers will be simultaneously overwriting the file. Only one update will win, and if the perfect timing happens of one controller clearing the file to begin writing it while the other controller has just started reading it, you end up with an empty file.

        There are dozens of database-in-a-file options to choose from here (and DB_File is a core perl module), but since it looks like all you want are a user ID and a timestamp, you could even just create a directory and "touch" an empty file per user id.
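A rough sketch of the "touch a file per user id" idea, assuming a directory reserved for these marker files. touch_user and last_seen are invented names. Each request only touches its own file, so there is no shared file to clobber; the timestamp is simply the file's mtime.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: mark $uid as active right now.
sub touch_user {
    my ($dir, $uid) = @_;
    # Refuse anything that could escape the directory.
    die "unexpected user id '$uid'" unless $uid =~ /\A\w+\z/;
    my $path = File::Spec->catfile($dir, $uid);
    # Create the file if needed (append mode never truncates), then set mtime.
    open(my $fh, '>>', $path) or die "cannot open $path: $!";
    close($fh) or die "cannot close $path: $!";
    my $now = time;
    utime($now, $now, $path) or die "cannot update mtime of $path: $!";
    return $path;
}

# Hypothetical helper: userid => epoch seconds of last activity.
sub last_seen {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot read $dir: $!";
    my %seen;
    for my $uid (grep { !/\A\./ } readdir $dh) {
        my $path  = File::Spec->catfile($dir, $uid);
        my $mtime = (stat $path)[9];
        $seen{$uid} = $mtime if defined $mtime;
    }
    closedir $dh;
    return \%seen;
}
```

The overview page can format the epoch values with localtime when rendering.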

Re: Storable - File empties itself?
by bliako (Monsignor) on Apr 13, 2023 at 10:07 UTC
    my $fsize = -s $storeFile;
    print "filesize: $fsize\n";
    die if $fsize == 0;

    the above will check on the filesize as the script is running from within the script.

    Also, store {}, $storeFile will create the file to hold that empty hash(ref), but not with zero size. So if you see zero size, perhaps it is because you are re-initialising your code and re-running the part where the file is created? And/or, as hippo suggests, store may mess up if it's called at the same time on the same file from 2 different processes without file-locking. See Re: Preventing multiple instances (References on flock and running one copy of a script at a time) (random link I recall seeing lately) for file locking.

      >> the above will check on the filesize as the script is running from within the script.
      Good idea.

      >> So if you see zero size perhaps it's coming because you are re-initialising your code and re-running the part the file is created?
      I haven't thought about this. Of course it's possible that multiple users working in the CMS might click at the same time. I'd assume that the Controller handles those but I can't make sure that there is only a single process using that file.

      I added your filesize check as well as some "dies", and when I start my test-script in 2 different shells the second one immediately stops:
      filesize: 9254
      retrieve-error: at ../bin/store.pl line 19.
      I'll adjust our function a little, and instead of die I might just let it write debug output to a logfile for now.
Re: Storable - File empties itself?
by feumw (Sexton) on Apr 14, 2023 at 07:01 UTC
    So while playing around with my test-script I opened two shells and ran my script from each. I found out that you get an error in case two processes use the same Perl-Storage. I expected that. But every time I reproduce this error the Perl-Storage is 0KB for a period of time. Depending on which script gets the access it happened that the file was cleared. So this is exactly what might happen in our system.

    I added debugging and error checking in our production environment. I'll deploy this on Monday. I'll keep you updated if we could make sure it's the "store" from different processes. If we can confirm this the second step will be to rework this script to avoid this.
      I'll keep you updated if we could make sure it's the "store" from different processes

      I very rarely gamble, but I'll bet that this is your problem.

      My experience of writing inter-process communication software with shared storage using Storable says that you are having collisions that result in broken writes.

      You need to provide adequate exclusive locking mechanisms, which isn't easy.

      You're running a CMS. That, by default, means you may have multiple entries at once, all the time. You're likely only going to grow, so patching this with locks is neither sustainable nor scalable.

      Rewrite all of it to use a proper database. Doing anything else is a temporary band-aid and will only cost your company money for nothing.

      Having a (potentially growing) multi-user input all using the same single output file is like pouring an ever increasing amount of liquid into a funnel and expecting the funnel to allow the increasing amount of liquid to flow through. It won't, no matter what you do.

      Even if you successfully manage to set up a proper locking mechanism (trust me, this is hard), there's always the single-file contention. You will forever have problems no matter what.

      Furthermore, Storable uses a Perl-specific binary format. Don't use it for the type of storage you're using it for. Serialize your data in a standard format.

        I want to expand on this, just so my point gets across...

        The general process of locking something for exclusive write is this:

        my $data = $thing->fetch;
        $thing->lock;
        $thing->write('new data');
        $thing->unlock;

        ...all well and good. However, these are ADVISORY locks, not actual disk-based locks on the file. That means a script that isn't taught to lock/unlock the file could just write all over it even if your fixed script has it locked. In other words, all of your software that deals with this file must all honour and set locks properly. Let's now not forget about the OS itself. Someone opens the file in Finder or Windows Explorer or something, that has no idea nor does it care about your software advisory locks. It'll write to your file no matter what.
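For the file-based case, a sketch of how such an advisory lock might wrap the whole read-modify-write with flock. update_user and the ".lock" sidecar file are illustrative choices, not anything from the OP's CMS. The lock is taken before the read so that two callers cannot both start from the same old data, and, as said above, it only helps if every writer goes through the same routine.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Storable qw(store retrieve);

# Hypothetical helper: update one entry under an exclusive advisory lock.
sub update_user {
    my ($file, $uid, $value) = @_;
    # Lock a sidecar file, so the data file itself can be rewritten freely.
    open(my $lock, '>>', "$file.lock") or die "cannot open $file.lock: $!";
    flock($lock, LOCK_EX)              or die "cannot lock $file.lock: $!";

    # Everything from here to close() is one critical section.
    my $users = -e $file ? retrieve($file) : {};
    die "retrieve($file) failed: $!" unless defined $users;
    $users->{$uid} = $value;
    store($users, $file) or die "store($file) failed: $!";

    close($lock) or die "cannot close $file.lock: $!";   # closing releases the lock
    return $users;
}
```

Readers would need to take a shared lock (LOCK_SH) on the same sidecar file to be sure they never see a file that is in the middle of being rewritten.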

        Database. It's your only feasible way forward.

        Update: Here is a snippet of an example of the only way I've ever found to 100% ensure that write locking would be effective... a database transaction, which rolls back if the write failed. Note the FOR UPDATE. (I'm no DB expert so I'm putting this out there for feedback on improvements while providing an example for the OP/thread):

        if ($bcast_id) {
            my $broadcast_transaction_status = eval {
                # We have to disable AutoCommit so that all of the DB tasks
                # are built into a single transaction. This locks the row.
                $self->dbh->{AutoCommit} = 0;
                $self->dbh->do('BEGIN');

                # We need to ensure the broadcast is still available. If it is,
                # we claim it. If it isn't, we do nothing
                my $bcast_status_check_query = qq~
                    SELECT BroadcastID
                    FROM $db_table
                    WHERE BroadcastID=? AND HostRunning IS NULL
                    FOR UPDATE
                ~;

                my $broadcast_unclaimed = $self->dbh->selectrow_array(
                    $bcast_status_check_query,
                    undef,
                    $bcast_id
                );

                if ($broadcast_unclaimed) {
                    # We only update the DB table with broadcast claimed status
                    # if nobody else has claimed it yet
                    $self->dbh->do(
                        qq~
                            UPDATE $db_table
                            SET HostRunning=?
                            WHERE BroadcastID=? AND HostRunning IS NULL;
                        ~,
                        undef,
                        "$hostname, Process $$",
                        $bcast_id
                    );
                }
                else {
                    # If the broadcast was claimed in between our first check and our
                    # second check inside the transaction, we set this broadcast to no
                    # longer available
                    $bcast_id = 0;
                }

                # Commit the transaction and re-enable AutoCommit so that
                # it doesn't impact other DB operations
                $self->dbh->commit;
                $self->dbh->{AutoCommit} = 1;
                1;
            };
        }
        if (! $broadcast_transaction_status) {
            $self->dbh->rollback;
            $bcast_id = 0;
            _email_sysadmins(
                $self->dbh,
                "Broadcast $bcast_id claim rolled back",
                "Broadcast $bcast_id host claim transaction rolled back: $@"
            );
        }

        Note the likelihood that the code may have come from a CMS ;) What happens there is we check a row, make sure nobody else has updated it, proceed to update it, but if the update fails, the entire transaction is rolled back with the DB not being touched at all. This is true write locking which cannot be had if using a file as the backend.

        It's called rename, an atomic operation.
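A sketch of that idea: write the new data to a temporary file in the same directory, then rename it over the real file. atomic_store is a hypothetical wrapper, not part of Storable. On a POSIX filesystem a reader then sees either the complete old file or the complete new one, never an empty one. It does not stop two writers from losing each other's updates, so it complements locking rather than replacing it.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);
use Storable qw(store retrieve);

# Hypothetical wrapper: replace $file with a complete new copy in one step.
sub atomic_store {
    my ($data, $file) = @_;
    # The temp file must be on the same filesystem as the target,
    # otherwise rename() cannot just swap the directory entry.
    # Note: tempfile() creates it with mode 0600; chmod if others must read it.
    my ($fh, $tmp) = tempfile('users.XXXXXX', DIR => dirname($file));
    close($fh) or die "cannot close $tmp: $!";

    my $ok = eval { store($data, $tmp) };
    unless ($ok) {
        my $err = $@ || $!;
        unlink $tmp;
        die "store($tmp) failed: $err";
    }
    unless (rename($tmp, $file)) {
        my $err = $!;
        unlink $tmp;
        die "rename($tmp, $file) failed: $err";
    }
    return 1;
}
```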

Node Type: perlquestion [id://11151626]
Approved by marto
Front-paged by kcott