Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

mirroring mySQL database

by filmo (Scribe)
on Dec 06, 2001 at 07:17 UTC ( [id://129843]=perlquestion: print w/replies, xml ) Need Help??

filmo has asked for the wisdom of the Perl Monks concerning the following question:

Okay, our principle database is hosted by our web service company. (Using Perl and mySQL).

What I'd like to do is mirror that database to our development environment without having to dump the database from the server and import it to the development environment. Instead, I'd like only that days changes somehow reflected in the development database. Here are the catches.

1) The mySQL database on the live server can only be accessed by the localhost, so I can't simply log in remotely.

2)The local development platform isn't up all the time, or even most of the time, so I can't have the remote Perl scripts simply access the development database as a seperate call on-the-fly

3)I'd like it to update in batches. i.e. have the live database generate a set of the changes as they occur (all the SQL commands) and then send that set of commands to the development server when it is available to perform the mirror.

My first inclination is to have it so that every call to the database via DBI is recorded to a file. For example

UPDATE sales SET qty = 12, price = 10 WHERE trans_id = 13 INSERT customers SET cust_name = 'Bob', address = '1234 Main Str.' ....etc. SELECT commands would not be included as they don't modify the databas +e
Later that file could be either e-mailed to the development platform or retrieved via FTP when the development platform is available for such activities. The retrieved file would be executed to the development mySQL database, and in theory, a mirror would be created. Thus, the development mySQL database should be up-to-date as of it's last retrieval.

I'd be interested in anybody's ideas or suggestions as to how to implement such a system. (or if it's already been done and I'm reinventing the wheel ;)

I'm also starting to get my fingers wet with XML, so if somebody has a nifty XML fix for this problem I'd be interested to learn it.
--
Filmo the Klown

Replies are listed 'Best First'.
Re: mirroring mySQL database
by blakem (Monsignor) on Dec 06, 2001 at 07:25 UTC
    You should take a look at the mysql replication FAQ... Without remote access, you'll probably have to work out your own binlog distribution mechanism.

    -Blake

      From my cursory reading of the replication FAQ, it seems to require that I have mysqladmin capabilities. As it is a hosted account, the only privileges are those grandted to specific databases attached to the account. I can not start or restart the server or execute any FLUSH commands (or for that matter, do anything of an administrative nature to the mySQL server.)
      --
      Filmo the Klown
Re: mirroring mySQL database
by VSarkiss (Monsignor) on Dec 06, 2001 at 09:08 UTC

    You can cook up a completely general solution for this, but it takes application and database changes. Put most simply, you need to maintain a last-modified-time column in any table that you want to mirror. Any code that updates or inserts the table needs to update that column with a time of sufficient resolution for your application (usually down to seconds).

    With that data available, you can retrieve all rows that are newer than the last time you looked at the table (presumably you're maintaining the last-lookup time in a separate table -- or even a flat file). The only trick is if that if the tables allow row deletion, you have to store at least the primary key of every deleted row in a counterpart table. You can also try to do "soft deletes", where you only mark a row as ready to be deleted, and don't actually delete it until some point in the future. But that gets even more complicated.

    One client I work for has a system like this in place for about a hundred tables in dozens of applications. It takes work, but it's effective, and generalizes to all rdbms platforms.

    HTH

Re: mirroring mySQL database
by Zaxo (Archbishop) on Dec 06, 2001 at 07:38 UTC

    MySQL 3.23.15 introduced Replication, see (gack, info!) info mysql#Replication. The constraints and hurdles from your provider suggest looking at DBI::Proxy.

    After Compline,
    Zaxo

Re: mirroring mySQL database
by gmax (Abbot) on Dec 06, 2001 at 15:22 UTC
    Well, this doesn't look like an entirely Perl related problem, and I have a non-entirely-Perl-related answer. :-)
    If your tables are not timestamped, an alternative solution could be reached through log-update files.
    In your my.cnf, you can have the following:
    [mysqld] log-update log-long-format
    It will produce a log with a copy of all the commands issued to the server. The long-format business means that before each query there is a comment with information about user, IP, date and time of the query.
    This log is a SQL script that you can present to MySQL and replicate all the changes made since the start of the log. (Whith the exception of LOAD DATA queries: for these ones you would need the original input files as well).
    Here, finally, enters Perl, which you can use to parse this log and send the lines you want to the second server.
    It would help if you saved the log on a regular basis (it grows really BIG!).
    The resulting format of the log-file is something like
    # Time: 011206 9:47:01 # User@Host: john[john] @ [10.10.10.112] INSERT INTO official_holidays (OHDate,OHName) VALUES ( "2000/01/01" , +"New Years Day");
    Parsing this format in Perl is not straightforward but not extremely difficult either.
    For example, to get all the records after a given date and time, you could use:
    #!/usr/bin/perl -w use strict; my $start_copy = 0; while (<>) { next if /^\s+$/; if ((!$start_copy) and (/^# Time: (\d+)\s+(\d+):(\d\d):(\d\d)/)) { my $date = $1; my $hour = $2; # check if date and time are what you need if ((substr($1,4,2) eq "06") and ($hour gt "09")) { $start_copy = 1; } } print if $start_copy; }
    The regex should be fortified a bit, but I am sure you got the idea.

    One more thing: within your applications, you can decide not to update the logs temporarily, by setting the SQL_LOG_OFF variable to "1". More on that in the manual.
    Cheers
    gmax
      update
      Sorry, it was SQL_LOG_UPDATE=0
      The previous one (SQL_LOG_OFF=1) will only affect the general log file. And, in addition, they have different semantics. :-(
      summary:
      my $query = qq{ SET SQL_LOG_OFF=1; # no logs to the general log SET SQL_LOG_UPDATE=0; # no logs to the update log. UPDATE $table SET bigField = $BLOB_value WHERE ID=1; SET SQL_LOG_OFF=0; # logs to the general log SET SQL_LOG_UPDATE=1; # logs to the update log. UPDATE $another SET smallfield = $smallvalue WHERE ID =3; }; $dbh->do($query);
Re: mirroring mySQL database
by gwhite (Friar) on Dec 06, 2001 at 17:38 UTC

    I think you are on the right track with your email the SQL statements idea, that way data only gets sent when there is data to send. Probably slightly more secure than FTP. The other methods suggested so far require more rights or access than you have. You could take your SQL convert to XML then back to SQL, but that would be extra work. You will need to do some sort of integrity/validity checking so someone doesn't email your program a "DELETE sales" command. PGP email may do it.

    g_White

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://129843]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2024-04-19 04:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found