Perl DBI (sybase) rapidly delete large number of records

by simm42 (Novice)
on May 07, 2007 at 10:09 UTC

simm42 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I'm doing some processing in Perl to determine which records I want to delete. I end up with a list of about 20,000 ids (in a database of around 200,000 records).

I tried doing the delete in Perl (prepare the delete query, then iterate over the list, executing with each value) and the performance is terrible.

In the end I wrote the values to a file, used the Sybase bulk copy tool (bcp) to throw them into a temporary table, and did a "delete from table where id in (select id from temp table)".

The performance is much better (4 seconds vs. about 10 minutes), but it feels like a bit of a hack.
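For reference, a minimal DBI sketch of the same temp-table idea done without shelling out to bcp (the table and column names here are made up for illustration):

    use DBI;

    my $dbh = DBI->connect( 'dbi:Sybase:server=MYSERVER', $user, $pass,
                            { RaiseError => 1 } );

    # stage the ids in a temp table with a prepared insert ...
    $dbh->do('create table #doomed (id int not null)');
    my $ins = $dbh->prepare('insert into #doomed (id) values (?)');
    $ins->execute($_) for @ids_to_delete;

    # ... then remove them all in a single statement
    $dbh->do('delete from mytable where id in (select id from #doomed)');

The per-row inserts still cost a network round trip each, which is a big part of why bcp stays faster.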

Does anyone know a way to improve the performance of this type of operation while staying within Perl?

Thanks

simm

Re: Perl DBI (sybase) rapidly delete large number of records
by kyle (Abbot) on May 07, 2007 at 11:15 UTC

    You could delete them in chunks instead of one at a time.

    my @deletable_ids    = qw( 1 2 3 );
    my $delete_at_a_time = 100;

    while ( @deletable_ids ) {
        my @delete_now;
        if ( scalar @deletable_ids > $delete_at_a_time ) {
            @delete_now = splice @deletable_ids, 0, $delete_at_a_time;
        }
        else {
            @delete_now    = @deletable_ids;
            @deletable_ids = ();
        }
        my $in_clause = join q{,}, @delete_now;
        $dbh->do( "DELETE FROM t WHERE blah IN ($in_clause)" );
    }

    You may not be able to shove all of your ids into one DELETE, but with some trial and error you can figure out how many you can delete at once.

    Update: If you want to make use of parameters (to get quoting, for instance), you can make the end of the loop look like this:

    my $qmarks = join q{,}, map { '?' } @delete_now;
    $dbh->do( "DELETE FROM t WHERE blah IN ($qmarks)", undef, @delete_now );

      You actually don't need the if/else block, as splice will return all of the remaining records in the 'else' case. I use the following:

      (I probably don't need to rebuild the SQL statement on each iteration of the loop, only for the first batch and for the final, smaller one, but it's rare that I delete more than 100 at a time.)

      while ( @delete ) {
          my @files = splice( @delete, 0, 100 );
          my $sth_delete = $db->prepare(
                'DELETE FROM SECCHI WHERE filepath IN ('
              . join( ',', ('?') x @files )
              . ')'
          );
          $sth_delete->execute( @files ) or warn "Couldn't delete files";
      }
Re: Perl DBI (sybase) rapidly delete large number of records
by grinder (Bishop) on May 07, 2007 at 13:23 UTC
    I tried doing the delete in Perl (prepare the delete query, then iterate over the list, executing with each value) and the performance is terrible.

    In the end I wrote the values to a file, used the Sybase bulk copy tool (bcp) to throw them into a temporary table, and did a "delete from table where id in (select id from temp table)"

    You will never be as fast as a bulk copy tool from Perl, regardless of the DB. There's just too much unavoidable overhead. That said, if the bcp approach takes 4 seconds, 8-10 seconds in Perl sounds like a reasonable target.

    I can't help wondering if you have automatic commits occurring, as that is the default behaviour. You must explicitly say AutoCommit => 0 to disable it.

    That would neatly explain the two orders of magnitude slowdown you're seeing, especially if the table in question has a number of indexes.
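    A minimal sketch of that with DBI (connection details are placeholders):

        my $dbh = DBI->connect( 'dbi:Sybase:server=MYSERVER', $user, $pass,
                                { RaiseError => 1, AutoCommit => 0 } );

        # with AutoCommit off, the 20,000 deletes share one transaction
        # instead of paying a commit per statement
        my $sth = $dbh->prepare('delete from mytable where id = ?');
        $sth->execute($_) for @ids;

        $dbh->commit;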

    • another intruder with the mooring in the heart of the Perl

Re: Perl DBI (sybase) rapidly delete large number of records
by dokkeldepper (Friar) on May 07, 2007 at 10:50 UTC
    With respect to the database/table: drop the indices and other constraints of the table that is the target of the deletion, and recreate them afterwards. Otherwise, indices must be updated after every delete and constraints must be checked. In most cases the constraint checks and index maintenance cost far more database time than is lost to the missing index coverage. In particular, your database is quite small, so the search for rows to delete is not really an issue. Communication time might be improved by sending a larger query batch, if possible.
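    A rough DBI sketch of that sequence (the index, table, and temp-table names are all made up; Sybase drops an index as table.index):

        # hypothetical names: drop the index, do the bulk delete, recreate it
        $dbh->do('drop index mytable.mytable_id_idx');
        $dbh->do('delete from mytable where id in (select id from #temp_ids)');
        $dbh->do('create unique index mytable_id_idx on mytable (id)');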
Re: Perl DBI (sybase) rapidly delete large number of records
by Moron (Curate) on May 07, 2007 at 13:52 UTC
    All such bulk insert/update/delete operations need to be batched. What I do (and I do this many times a year) is something like:
    use constant SYBBATCHSIZE => 1000;
    # ...
    for ( my $i = 0; $i <= $#keys; $i += SYBBATCHSIZE ) {
        my $last = $i + SYBBATCHSIZE - 1;
        $last = $#keys if $last > $#keys;
        SqlSub(   'DELETE FROM mytable where pkcol in ( "'
                . join( '", "', @keys[ $i .. $last ] )
                . '" )' );
    }
    Free your mind!

Re: Perl DBI (sybase) rapidly delete large number of records
by jettero (Monsignor) on May 07, 2007 at 10:20 UTC

    It sounds like the bottleneck is communication cost, not Perl per se, so I don't think there's any way around it. If the subselect solves the problem, I'd say you have the solution: using Perl, insert the values into a temporary table, and then (still using Perl) run the subselect.

    If that doesn't sound right, I may not understand the problem.

    -Paul

      The processing to find the right ids to delete is fairly complex and is a lot quicker to do in Perl than in the database, so I have to pull the data from the db and process it, which generates the list of ids to delete. It's getting that list back into the database that I end up having to shell out for. I'd prefer to do that in Perl, but I just can't get close to the performance of bcp.
        There are about a million variations of this problem. It would help if you could post some code and/or data showing how you're sending the temporary data to the database, which might illuminate possible solutions.

        Presently, it feels like there is no solution; but that seems to be false since you have some way to "shell out" to produce the temp table.

        UPDATE: No, no, you miss my point entirely. If you can find a way to "shell out" and do what you want, then there's almost certainly a way to do it in perl. I still don't think I know enough about the problem or what you're doing when you "shell out" to be able to provide more information.

        -Paul

Re: Perl DBI (sybase) rapidly delete large number of records
by RL (Monk) on May 07, 2007 at 15:44 UTC

    I don't know anything about Sybase, but I tried something similar with SQLite, and there was a *terrific* improvement when the whole thing was done within a single transaction, because the database then needed only one write pass instead of one per statement.
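    A minimal DBI sketch of that (same effect as the AutoCommit => 0 suggestion above, using begin_work):

        $dbh->begin_work;    # turns AutoCommit off until commit

        my $sth = $dbh->prepare('DELETE FROM t WHERE id = ?');
        $sth->execute($_) for @ids;

        $dbh->commit;        # one commit for the whole batch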

    Greetings
    RL

Re: Perl DBI (sybase) rapidly delete large number of records
by one4k4 (Hermit) on May 07, 2007 at 17:11 UTC
    I've got a table (Sybase, Transact-SQL) with a few hundred thousand records in it and want to remove records older than, say, three months. I think a good way is to let the SQL server do the work.

    If you are allowed, create a stored procedure to do this; otherwise I think it can be done via one prepare/execute statement in Perl.

    Create a temp table #t1 and select into #t1 the records you want to _keep_.

    Truncate the original table.

    Copy the rows back: insert into the original table select * from #t1.

    The delete-from-execute method will create a transaction for each record. If you have it in some form of a loop, it'll only get worse, and you may find your transaction log getting full. Truncate happens as one minimally logged operation, IIRC.
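    A rough DBI version of those steps (the table name and date filter are made up for illustration):

        # keep the rows we want, empty the table, then copy them back
        $dbh->do(q{
            select * into #t1 from mytable
            where sample_date >= dateadd(mm, -3, getdate())
        });
        $dbh->do('truncate table mytable');
        $dbh->do('insert into mytable select * from #t1');
        $dbh->do('drop table #t1');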
Re: Perl DBI (sybase) rapidly delete large number of records
by jfroebe (Parson) on May 08, 2007 at 15:25 UTC

    My guess is that you don't have a covering index for the id column of your table.

    That's easy to fix: create an index on the table (preferably a unique index) that has 'id' as one of the keys.
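    For example (the index and table names here are made up):

        $dbh->do('create unique index mytable_id_idx on mytable (id)');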

    The thing you need to worry about when putting the entire delete in a single transaction is that you may fill up your transaction log. 20,000 rows isn't that big, and neither is a 200k-row table, but let's cover it anyway:

    Say you have a query that easily fills up the transaction log of the Sybase ASE server: 

    delete from rep_queues_archive where sample_date < dateadd(wk, -1, getdate())

    How would you break up the transaction so it doesn't fill up the log?  There are several ways to do it, but I'll just cover two of them:

    1. Break up the transaction based on the actual data.
      1. delete from rep_queues_archive where sample_date < dateadd(wk, -52, getdate())
      2. delete from rep_queues_archive where sample_date < dateadd(wk, -51, getdate())
      3. etc.
    2. Break up the transaction based on the number of rows affected.
    declare @rows_affected int
    declare @dbName varchar(50)
    select @dbName = db_name()
    set rowcount 1000

    select @rows_affected = 1

    while @rows_affected > 0
    begin
        delete from rep_queues_archive where sample_date < dateadd(wk, -1, getdate())
        select @rows_affected = @@rowcount

        if @@error = 0
        begin
            commit tran
            dump tran @dbName with truncate only
        end
        else
            rollback tran
    end

    set rowcount 0

    Of course, if you are performing incremental backups, you will want to modify the dump tran @dbName with truncate only line to read something like dump tran @dbName to "/sybdumps/mydb_tran.dmp". Modify to suit your naming standard.

    HTH,

    Jason L. Froebe

    Help find a cure for breast cancer! Net proceeds benefit the Susan G. Komen Breast Cancer Foundation and the National Philanthropic Trust. Help by donating - I'm walking 60 miles in 3 days in August 2007. (The day I return from TechWave is the first day of the Walk).

    Blog, Tech Blog

Re: Perl DBI (sybase) rapidly delete large number of records
by djp (Hermit) on May 08, 2007 at 02:58 UTC
    Sybase::Xfer may be useful for this; see the -delete_flag and -auto_delete options.
Re: Perl DBI (sybase) rapidly delete large number of records
by mpeppler (Vicar) on May 08, 2007 at 15:55 UTC
    As jfroebe said, the most likely problem is that you don't have the correct index on the target table. That would mean a table scan for each row you delete, which seems to match the performance problems you're seeing.

    Michael

Re: Perl DBI (sybase) rapidly delete large number of records
by TomDLux (Vicar) on May 10, 2007 at 17:09 UTC

    One solution is to extract the data from the database, and feed the data back in, one by one, with a delete command.

    The other solution is to use the nature of a database correctly, operating on sets of data, and keeping the operation with the database, which is optimized to handle such situations.

    Why does the correct solution strike you as being a hack?

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Perl DBI (sybase) rapidly delete large number of records
by Anonymous Monk on May 11, 2007 at 00:50 UTC
    Sounds to me like you just need a transaction wrapped around the whole thing.
    BEGIN
    DELETE FROM y WHERE x=1;
    DELETE FROM y WHERE x=2;
    DELETE FROM y WHERE x=3;
    ...
    COMMIT
    That'll make things way faster.
      Don't assume that what makes things faster with one database engine will make them faster with another.

      In the case of Sybase ASE this probably won't make much difference, but it will potentially cause a lot of locking, and may exceed the size of the transaction log.

      Michael
