http://qs321.pair.com?node_id=289792

monsieur_champs has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellows
I'm building a CGI script at work that allows editing a large number of fields in a record stored in a database. I'm really concerned about performance, and wish to minimize useless UPDATE statements when updating data in this database.

What is the best strategy for updating data in the database? Should I update all the fields every time, or is there something I can do to minimize the load on the database server?

I'm thinking about building the SQL statement on the fly, so that I update only the changed fields, but I don't know if this is the best strategy, or even a Good Way To Do It[tm]. Any help, comments, new issues, suggestions or code skeletons are welcome.
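
Roughly, this is what I have in mind (just a rough sketch to illustrate the idea; the table and column names are invented, and it assumes a plain DBI handle):

use strict;
use warnings;
use DBI;

# %$original holds the values currently in the database,
# %$submitted holds the values coming back from the form,
# both keyed by column name (all names here are invented).
sub build_update {
    my ($dbh, $id, $original, $submitted) = @_;

    my (@set, @bind);
    for my $col (keys %$submitted) {
        # skip columns whose submitted value is unchanged
        next if defined $original->{$col}
             && defined $submitted->{$col}
             && $original->{$col} eq $submitted->{$col};
        # $col is interpolated into the SQL, so %$submitted must be
        # built from a fixed list of column names, never raw form input
        push @set,  "$col = ?";
        push @bind, $submitted->{$col};
    }
    return unless @set;    # nothing changed: issue no UPDATE at all

    my $sql = "UPDATE users SET " . join(", ", @set) . " WHERE id = ?";
    $dbh->do($sql, undef, @bind, $id);
}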

Many thanks to all in advance.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
monsieur_champs

Replies are listed 'Best First'.
Re: Record edition management strategy needed
by Abigail-II (Bishop) on Sep 08, 2003 at 15:54 UTC
    In general, to speed up the database side, you want to
    • Minimize the times you connect to the database. So, you cache connections.
    • Minimize the times you compile SQL statements. So, you use stored procedures.
    • Minimize the search time to the appropriate row(s). So, you use indices.
    • Minimize the need for disk access. So, you let the database use as much memory as possible for caching.
    • Minimize the number of pages involved. So, you keep your rows as small as possible, possibly splitting data over several tables instead of having many columns.
    Your question is very general, and without knowing the database server and the structure of the database, there's not much one can say. A method that's faster on one server could be slower on another.

    Not that any of this has anything to do with Perl.

    Abigail

      Before anything else, thank you for your interest, Abigail.
      Here is some more detail on the same issue:

      • I'm using a legacy CGI environment and don't see how to maintain connection caching there; the best I can do seems to be the DBI-level caching sketched just after this list, and that only helps within a single request.
      • I'm using MySQL (legacy, too), so I can't use stored procedures: there is no support for them.
      • I'm already using indexes, and I'm sure the data is stored as close to third normal form as possible.
      • I'm very limited in available memory on the database server. MySQL is doing the best it can to keep data in memory.
      • My tables are legacy, too, and I can't change them much (changing the data structures isn't worth the effort).
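
      Here is that DBI-level sketch, for reference (the DSN, credentials, table and column are made up; connect_cached and prepare_cached reuse the connection and the parsed statement within one process, but a plain CGI script is a new process per request, so the benefit is limited):

      use strict;
      use warnings;
      use DBI;

      # connect_cached returns the same handle for identical arguments,
      # so repeated calls within this process reuse one connection.
      my $dbh = DBI->connect_cached(
          'dbi:mysql:database=appdb;host=localhost',   # made-up DSN
          'appuser', 'secret',                         # made-up credentials
          { RaiseError => 1, AutoCommit => 1 },
      );

      # prepare_cached keeps the statement handle around, so the same
      # statement is parsed only once per process even if run many times.
      my $sth = $dbh->prepare_cached(
          'UPDATE users SET email = ? WHERE id = ?'    # made-up table/column
      );
      $sth->execute('someone@example.com', 42);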

      The Perl-related part is this: without changing anything in the database (that could break other applications that use the same data), I need a strategy for tracking changes and building an UPDATE statement that applies only those changes, minimizing the data written. If you know of any modules that could be useful, or any strategy I could implement, please let me know.

      Thank you very much for your attention.
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      monsieur_champs

Re: Record edition management strategy needed
by simonm (Vicar) on Sep 08, 2003 at 16:25 UTC
    Don't put a lot of work into optimization unless you have a good reason to believe it's going to make a difference. Because each database server behaves differently, this probably requires some local testing to determine what makes sense in your particular context.

    Try creating a script with the Benchmark module that compares the time required for 1000 updates of all columns, and 1000 updates of just one or two columns, and see if the difference is significant enough to make it worth your while.
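
    Something along these lines, perhaps (only a sketch: the DSN, table and column names are invented, and you would run it against a scratch copy of your data rather than the live database):

    use strict;
    use warnings;
    use DBI;
    use Benchmark qw(timethese);

    my $dbh = DBI->connect('dbi:mysql:database=testdb', 'testuser', 'secret',
                           { RaiseError => 1 });

    # one statement touching every column, one touching a single column
    my $all = $dbh->prepare(
        'UPDATE users SET name = ?, email = ?, phone = ?, notes = ? WHERE id = ?'
    );
    my $one = $dbh->prepare('UPDATE users SET email = ? WHERE id = ?');

    timethese(1000, {
        all_columns => sub {
            $all->execute('A Name', 'a@example.com', '555-0000', 'x' x 200, 1)
        },
        one_column => sub { $one->execute('a@example.com', 1) },
    });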

Re: Record edition management strategy needed
by bean (Monk) on Sep 08, 2003 at 16:18 UTC
    Updating all the fields all the time (in a single update statement) should not affect database performance in a meaningful way (unless you are using extremely wide columns), and will be easier to read and maintain than building the SQL on the fly.
    Update:
    As Abigail II noted, stored procedures can speed things up, but you are unlikely to save a worthwhile amount of time on a simple update statement on a single table. However, if you are querying the database to validate the data you're about to update, you would probably get a speed boost by putting the validation and update into a stored procedure, if only because you would reduce the communication between the database and Perl (another method of speeding up Perl/SQL that Abigail II mentioned).

      I will take this into account, bean, but I can't update all the fields. I have a password field in one of the tables (it's a user record), and I can't update that field unless I know the user's current password. The administrator is not required to know the password in order to update the user.

      Other fields have similar problems: I don't want to update those fields unless they have changed (blank fields in the request aren't considered "changed").

      I'm thinking about iterating over all the fields and building an UPDATE query only from those that aren't blank. Is this an acceptable, readable, affordable strategy?
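
      In code, I picture something like this for picking out the non-blank fields (a sketch only; the column names are invented, and the resulting hash would then feed the kind of UPDATE builder sketched in my original post):

      use strict;
      use warnings;
      use CGI;

      my $q = CGI->new;

      # the editable columns; anything not on this list is ignored, so
      # form input can never introduce arbitrary column names
      my @editable = qw(name email phone department);   # invented names

      # keep only the fields the administrator actually filled in
      my %changed;
      for my $col (@editable) {
          my $value = $q->param($col);
          $changed{$col} = $value if defined $value && $value ne '';
      }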

      Thank you very much
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      monsieur_champs

        That sounds like a good strategy, monsieur_champs. I just wanted to warn you away from trying to optimize a few hundredths of a second off an update - if you were going to query the database to compare the columns so you could only update the ones that changed, that would almost certainly take more time than updating all of them. If you have valid reasons for not updating some columns, however, that's something else entirely. You could always "cache" the original values in the form and compare the cached values to the new ones - but it is unlikely to be worth the trouble.
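
        If you did want to go that route, the "cache" could be nothing more than hidden form fields (a sketch, with invented names; as said, probably not worth the trouble for a single-row update):

        use strict;
        use warnings;
        use CGI;

        my $q = CGI->new;
        my @editable = qw(name email phone);   # invented column names

        # When generating the edit form, stash each original value in a
        # hidden field next to the visible input, for example:
        #   print $q->hidden(-name => 'orig_email', -value => $original_email);

        # When the form comes back, a field counts as changed only if the
        # submitted value differs from the cached original.
        my %changed;
        for my $col (@editable) {
            my $new  = $q->param($col);
            my $orig = $q->param("orig_$col");
            next unless defined $new;
            $changed{$col} = $new if !defined $orig || $new ne $orig;
        }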

        MrCromeDome raises a good point about updating indexes and keys (with constraints) - although I would argue it probably wouldn't matter for a single update on a single table. However, in a good database design the index will generally not change when you update a record (changing the index would change its essential identity, which would be logically equivalent to deleting the record and inserting another). Also, if the column is not really changing (you are setting it to the original value), the database should be smart enough to recognize that. Another thing to take into account (when updating many rows on databases that support it) is the possibility that there is an update trigger on the table, which may do different things depending on the column updated. If you are updating millions of records, triggers, indexes, and constraints can make a big difference.

      Actually, if some of those columns are part of a primary/foreign key or an index, updating those columns can make a huge difference. Whatever DBMS you're using will have to either partially or completely reconstruct those index structures to account for the update.

      I strongly recommend updating only those fields that you need. Aside from being a potential timesaver, it's just cleaner IMHO.

      Cheers!
      MrCromeDome