PerlMonks  

Re: [OT] SQL "on duplicate key" custom action

by Corion (Patriarch)
on Oct 21, 2019 at 07:36 UTC ( [id://11107748] )


in reply to [OT] SQL "on duplicate key" custom action

I think the "best" approach (if your DBMS supports it) is ON UPDATE triggers or, if that import is a one-time thing only, creating a unique key across all columns:

create or replace function skip_insert() returns trigger language plpgsql as
    $$ begin return null; end $$;
create or replace function raise_exception() returns trigger language plpgsql as
    $$ begin RAISE EXCEPTION 'UPDATE changing row values not allowed'; end $$;
CREATE TRIGGER mytrigger ON records INSTEAD OF UPDATE
    WHEN old.PriKey = new.PriKey and old.Foo=new.Foo and old.Bar=new.Bar
    FOR EACH ROW EXECUTE PROCEDURE skip_insert();
CREATE TRIGGER mytrigger ON records INSTEAD OF UPDATE
    WHEN old.PriKey = new.PriKey and (old.Foo!=new.Foo or old.Bar!=new.Bar)
    FOR EACH ROW EXECUTE PROCEDURE raise_exception();

... but this approach does not handle NULL values.
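To see why: in SQL, a comparison like old.Foo = new.Foo evaluates to NULL (neither true nor false) when either side is NULL, so a WHEN clause written with =/!= silently never fires for NULL columns. A quick demonstration of the two comparison behaviors, here in SQLite via Python's stdlib sqlite3 (SQLite's IS plays the role of Postgres's IS NOT DISTINCT FROM):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# '=' propagates NULL: the comparison result is NULL, not true
print(conn.execute("SELECT NULL = NULL").fetchone()[0])   # -> None
# 'IS' treats two NULLs as equal, so it is safe in a trigger's WHEN clause
print(conn.execute("SELECT NULL IS NULL").fetchone()[0])  # -> 1
```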

Replies are listed 'Best First'.
Re^2: [OT] SQL "on duplicate key" custom action
by haukex (Archbishop) on Jan 28, 2020 at 15:00 UTC

    Here's the solution I've ended up with, based on your suggestion:

    CREATE FUNCTION skip_update() RETURNS trigger AS
        $$ BEGIN RETURN NULL; END; $$ LANGUAGE plpgsql;
    CREATE FUNCTION fail_update() RETURNS trigger AS
        $$ BEGIN RAISE EXCEPTION 'can''t modify row values for existing primary key'; END; $$ LANGUAGE plpgsql;
    CREATE TABLE Dummy_Test (
        Timestamp TIMESTAMP WITH TIME ZONE PRIMARY KEY,
        Foo TEXT,
        Bar TEXT );
    CREATE TRIGGER Dummy_Test_dupe BEFORE UPDATE ON Dummy_Test FOR EACH ROW
        WHEN ( OLD.Timestamp IS NOT DISTINCT FROM NEW.Timestamp
            AND OLD.Foo IS NOT DISTINCT FROM NEW.Foo
            AND OLD.Bar IS NOT DISTINCT FROM NEW.Bar )
        EXECUTE FUNCTION skip_update();
    CREATE TRIGGER Dummy_Test_modify BEFORE UPDATE ON Dummy_Test FOR EACH ROW
        WHEN ( OLD.Foo IS DISTINCT FROM NEW.Foo OR OLD.Bar IS DISTINCT FROM NEW.Bar )
        EXECUTE FUNCTION fail_update();
    INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES ('2020-01-02 12:34','Hello','World')
        ON CONFLICT ON CONSTRAINT Dummy_Test_pkey DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;
    INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES ('2020-01-02 12:34','Hello','World')
        ON CONFLICT ON CONSTRAINT Dummy_Test_pkey DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;
    INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES ('2020-01-02 12:34','Hello','abcde')
        ON CONFLICT ON CONSTRAINT Dummy_Test_pkey DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;

    The last statement will fail, which is exactly what I wanted. The function skip_update() and the TRIGGER Dummy_Test_dupe really only prevent the UPDATE from happening, so they can be omitted - I haven't yet tested whether it's more performant to have the TRIGGER suppress the UPDATE or to just let the UPDATE fire. In any case, thank you very much!

      Just for the record, here's the same thing for SQLite. As above, the TRIGGER Dummy_Test_dupe is optional and I haven't tested the performance difference yet.

      use warnings;
      use strict;
      use DBI;
      my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", undef, undef,
          { RaiseError=>1, AutoCommit=>1 } );
      $dbh->do(<<'ENDSQL');
      CREATE TABLE Dummy_Test (
          Timestamp INTEGER PRIMARY KEY,
          Foo TEXT,
          Bar TEXT );
      ENDSQL
      $dbh->do(<<'ENDSQL');
      CREATE TRIGGER Dummy_Test_dupe BEFORE UPDATE ON Dummy_Test FOR EACH ROW
          WHEN ( OLD.Timestamp IS NEW.Timestamp AND OLD.Foo IS NEW.Foo
              AND OLD.Bar IS NEW.Bar )
          BEGIN SELECT RAISE(IGNORE); END;
      ENDSQL
      $dbh->do(<<'ENDSQL');
      CREATE TRIGGER Dummy_Test_modify BEFORE UPDATE ON Dummy_Test FOR EACH ROW
          WHEN ( OLD.Foo IS NOT NEW.Foo OR OLD.Bar IS NOT NEW.Bar )
          BEGIN SELECT RAISE(ABORT, "same Timestamp but different values"); END;
      ENDSQL
      my $in = $dbh->prepare(<<'ENDSQL');
      INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES (?,?,?)
          ON CONFLICT (Timestamp) DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;
      ENDSQL
      $in->execute(12345,'Hello','World');
      $in->execute(12345,'Hello','World');
      my $e;
      eval { $in->execute(12345,'Hello','abcde'); 1 } or do { $e = $@ };
      if ( defined $e ) { print "Third insert failed as expected: $e" }
      else { die "Third insert didn't fail as expected" }
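      The same schema and triggers can also be exercised without DBI installed, e.g. from Python's stdlib sqlite3 - a sketch only, assuming a SQLite new enough for ON CONFLICT (3.24+); table and trigger names are the ones from the Perl version:

      ```python
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          CREATE TABLE Dummy_Test (
              Timestamp INTEGER PRIMARY KEY,
              Foo TEXT,
              Bar TEXT );
          CREATE TRIGGER Dummy_Test_dupe BEFORE UPDATE ON Dummy_Test FOR EACH ROW
              WHEN ( OLD.Timestamp IS NEW.Timestamp AND OLD.Foo IS NEW.Foo
                  AND OLD.Bar IS NEW.Bar )
              BEGIN SELECT RAISE(IGNORE); END;
          CREATE TRIGGER Dummy_Test_modify BEFORE UPDATE ON Dummy_Test FOR EACH ROW
              WHEN ( OLD.Foo IS NOT NEW.Foo OR OLD.Bar IS NOT NEW.Bar )
              BEGIN SELECT RAISE(ABORT, 'same Timestamp but different values'); END;
      """)
      ins = """INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES (?,?,?)
               ON CONFLICT (Timestamp) DO UPDATE SET Foo=excluded.Foo, Bar=excluded.Bar"""
      conn.execute(ins, (12345, 'Hello', 'World'))  # plain insert
      conn.execute(ins, (12345, 'Hello', 'World'))  # exact duplicate: RAISE(IGNORE) skips the UPDATE
      try:
          conn.execute(ins, (12345, 'Hello', 'abcde'))  # same key, different values
          raise SystemExit("third insert didn't fail as expected")
      except sqlite3.DatabaseError as e:
          print("Third insert failed as expected:", e)
      ```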
      The function skip_update() and the TRIGGER Dummy_Test_dupe really only prevent the UPDATE from happening, so they can be omitted - I haven't yet tested whether it's more performant to have the TRIGGER suppress the UPDATE or to just let the UPDATE fire.

      Interestingly, it made a small but consistent difference. I imported the same dataset (73613 INSERTs) three times into a previously populated database: without the TRIGGER, i.e. letting the UPDATE run, it took an average of 91.540s, and with the TRIGGER, i.e. suppressing the UPDATE, it took 99.368s (each of these only +/- about 2s) - a difference of roughly 0.1ms per record. Not a big deal here, and not a rigorous test, but still interesting.

      Update: Importing into a previously empty database revealed only a tiny difference: overall the import was roughly 10s faster, and again UPDATE beat TRIGGER, this time by about 0.17ms per record. Again, not really significant.

Re^2: [OT] SQL "on duplicate key" custom action
by erix (Prior) on Oct 21, 2019 at 16:25 UTC

    Do you perhaps have a tested version? I can't get this to run.

    The trigger functions are all right, it seems, but I don't see how to create the triggers themselves onto the table 'records' so that they react in the desired way. (For one thing, you can't do an 'INSTEAD OF UPDATE' on a table, and when you use a BEFORE INSERT or an AFTER INSERT trigger, this old. and new. business is not going to work.)

      No, sorry - I don't have a tested version, and I'm not on the machine where I did the searches.

      I worked from the Postgres 12 documentation and some Stack Overflow answer, but I can't find them now :( I think it should be a BEFORE UPDATE trigger, and the statement would be an insert-or-update, maybe. But again, I didn't test any of this, sorry.

Re^2: [OT] SQL "on duplicate key" custom action
by haukex (Archbishop) on Oct 21, 2019 at 15:00 UTC

    Thanks very much! It's not just a single import; it'll be a large import roughly once a month. So I understand you'd suggest a trigger over a unique key - may I ask why? (Performance?)

      An index across the complete table will eat up lots of disk space, and will also make each insert slower. Think of an index as (somewhat structured) hash keys, where you can also quickly look up strings starting with a specific prefix.

      If your table is largely read-only and you have the disk space to spare, the index will speed up queries maybe a bit more than an index only on the primary keys, because the DB can satisfy the query completely from the index without hitting the table storage.

      The triggers will slow down all UPDATE statements but leave SELECT untouched and also will not use additional storage.
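      For comparison, the unique-key-across-all-columns variant can be sketched like this (again SQLite via Python's stdlib sqlite3; the records table and column names are just placeholders, and ON CONFLICT needs SQLite 3.24+). With a conflict target naming the all-columns index, an exact duplicate is skipped, while a row that collides only on the primary key still errors out:

      ```python
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          CREATE TABLE records (PriKey INTEGER PRIMARY KEY, Foo TEXT, Bar TEXT);
          -- the all-columns unique index that stands in for the triggers
          CREATE UNIQUE INDEX records_all_cols ON records (PriKey, Foo, Bar);
      """)
      ins = """INSERT INTO records (PriKey, Foo, Bar) VALUES (?,?,?)
               ON CONFLICT (PriKey, Foo, Bar) DO NOTHING"""
      conn.execute(ins, (1, 'Hello', 'World'))  # plain insert
      conn.execute(ins, (1, 'Hello', 'World'))  # exact duplicate: silently skipped
      try:
          # same key, different values: conflicts on the primary key, which the
          # ON CONFLICT target does not cover, so the normal error is raised
          conn.execute(ins, (1, 'Hello', 'abcde'))
          print("modified row was accepted?!")
      except sqlite3.IntegrityError:
          print("modified row rejected by the primary key")
      ```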

        Ok, thank you! I do think that since the database will probably grow relatively large, I don't want to be too wasteful with the space, and slightly slower imports are probably fine. I'll try to get the trigger working and maybe I'll do a comparison with the unique key method too.
