Moving SQL from perl to Stored Procedures

by imp (Priest)
on Sep 09, 2006 at 03:13 UTC

Every programming position I have held has been very database-centric, and as a result I have been exposed to a number of approaches to SQL code management. My current preference is to keep the SQL within stored procedures, and treat it as I would a class API.

I have come up with a set of what I feel are best practices, and would be very interested in feedback from the community. My most recent experience in this area has been with Microsoft SQL Server 2000, so I will focus on that in this node.

Document your stored procedures internally

I find it helpful to document the intent for stored procedures within their body, and to also document changes to the stored procedure with a date stamp and a ticket number. This makes it easier for future maintainers.
CREATE PROCEDURE dbo.SOMETHING_SAVE
    @something_id int,
    @name varchar(50)
AS
/*
    Does blah blah
    -- Changed on 2006-09-07 to resolve ticket #570
*/
-- Do stuff
GO

Treat your stored procedures as any other asset - keep them in a versioning system

Your schema and stored procedures should always be recorded in a versioning system. If you find yourself needing to use a version of your application from 2 years ago, you need to be able to recreate the database as it was at that time.

Additionally, when moving from a development to production environment you need to be able to easily identify the changes that have been made.
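
If your tools don't provide an easy way to script the procedures out, a small script can dump the current definitions to files for check-in. A minimal sketch using DBI against the SQL 2000 system catalog (the DSN, credentials and the procs/ directory are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connection details are placeholders - adjust for your environment.
    my $dbh = DBI->connect('dbi:ODBC:mydb', 'user', 'pass', { RaiseError => 1 });

    # syscomments stores each procedure's text in chunks, ordered by colid.
    my $rows = $dbh->selectall_arrayref(q{
        SELECT o.name, c.text
        FROM   sysobjects o
        JOIN   syscomments c ON c.id = o.id
        WHERE  o.type = 'P'
        ORDER  BY o.name, c.colid
    });

    # Reassemble the chunks and write one file per procedure.
    my %text;
    $text{ $_->[0] } .= $_->[1] for @$rows;

    for my $proc (sort keys %text) {
        open my $fh, '>', "procs/$proc.sql" or die "procs/$proc.sql: $!";
        print $fh $text{$proc};
        close $fh;
    }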

Privilege Separation

I prefer to only access the data through stored procedures (and possibly views). With some databases you can revoke all table permissions, and grant selective permissions to individual SPs/Views. This works because of 'Ownership Chaining'.

By doing this you can restrict the level of access each database user is granted to a much finer level. Different interfaces into the system (e.g. admin vs external user) can have access to specific procedures which adhere to business logic.
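
For example (a sketch; portal_user is a hypothetical database user, and the procedure names are from the examples elsewhere in this node):

    -- Lock the table down, then grant execute on specific procedures only.
    REVOKE SELECT, INSERT, UPDATE, DELETE ON dbo.Users FROM portal_user
    GRANT EXECUTE ON dbo.PORTAL_SOMETHING_LIST TO portal_user
    GRANT EXECUTE ON dbo.AUTH_USER TO portal_user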

Another way I like to use this is to have password verification be a black box.

$sql = q{EXEC AUTH_USER @username = ?, @password = ?};
my ($user_id) = $dbh->selectrow_array($sql, {}, $username, $password);
And finally, it offers additional protection from SQL injection. If an attacker can't touch the tables directly, they are less dangerous.

Naming conventions

I like to reverse the normal naming scheme and use NOUN_VERB.

Examples: SOMETHING_BY_ID, SOMETHING_SAVE, SOMETHING_LIST. I often add an extra level to that by prefixing the name with an abbreviation of the intended interface. e.g.:

  • PORTAL_SOMETHING_LIST
  • PORTAL_SOMETHING_BY_ID
  • ADMIN_SOMETHING_LIST
  • ADMIN_SOMETHING_BY_ID
  • ADMIN_SOMETHING_SAVE

I find that this keeps related items grouped when ordered alphabetically.

Benchmark the stored procedures

Make use of whatever tools are available for your database. For MS SQL 2000 there is Query Analyzer, which is a wonderful tool if you enable the 'Show Execution Plan' option.

The profiler tool is also useful for identifying bottlenecks.
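
For rough timings from within Query Analyzer itself, the STATISTICS session options are also handy (a sketch; SOMETHING_LIST is one of the hypothetical procedure names used above):

    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    EXEC dbo.SOMETHING_LIST
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF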

Solutions (MS SQL 2000-centric) to common objections to stored procedures

Dynamic ordering

SQL Server 2000 does not directly support dynamic ordering, but you can emulate it with a CASE statement.

An important thing to remember is that a CASE expression returns a single datatype, and it will not intelligently switch what type is being compared: if some of the fields are char and some are integer, it will try to convert them all to integer - and cause an error.

To avoid this you need to have multiple CASE statements, with the WHEN clauses grouped by data type.

ORDER BY
    CASE @order WHEN 'name'  THEN lastname
                WHEN 'state' THEN state END,
    CASE @order WHEN 'name'    THEN firstname END,
    CASE @order WHEN 'user_id' THEN user_id END
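
For context, here is the same trick embedded in a complete procedure (USER_LIST and the column choices are hypothetical):

    CREATE PROCEDURE dbo.USER_LIST @order varchar(20) = 'name' AS
    SELECT user_id, firstname, lastname, state
    FROM Users
    ORDER BY
        -- char columns in one CASE, integer columns in another
        CASE @order WHEN 'name'  THEN lastname
                    WHEN 'state' THEN state END,
        CASE @order WHEN 'name'    THEN firstname END,
        CASE @order WHEN 'user_id' THEN user_id END
    GO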

Dynamic Where clauses

Make use of boolean short-circuiting.
CREATE PROCEDURE [dbo].[dynamic_example]
    @lastname varchar(50) = NULL,
    @firstname varchar(50) = NULL
AS
SELECT *
FROM Users
WHERE (@lastname IS NULL OR users.lastname LIKE @lastname)
  AND (@firstname IS NULL OR users.firstname LIKE @firstname)
GO
EXEC dynamic_example @firstname = 'chris%'
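
From the Perl side, the call is plain DBI (a sketch):

    my $users = $dbh->selectall_arrayref(
        q{EXEC dynamic_example @firstname = ?},
        { Slice => {} },
        'chris%',
    );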

Update - moved dynamic paging to a reply, instead of linking to my perlmonk.org account

Replies are listed 'Best First'.
Re: Moving SQL from perl to Stored Procedures
by tilly (Archbishop) on Sep 10, 2006 at 01:12 UTC
    I disagree with calling this a best practice without qualifiers or more details.

    There are a lot of arguments on both sides when it comes to moving logic in and out of the database. Either decision is very defensible. In a particular situation it may make sense to choose one over the other, but I do not believe that either choice is universally better.

    Furthermore if you're going to move logic into the database, then there are additional things you need to do as well. I think that if you're saying that you should do the first, you should at least mention the second.

    What are some of the trade-offs?

    Against keeping logic in databases:

    1. Who is developing your application? Probably programmers. What are they best at? Probably their application language (given this site, likely to be Perl). Why make them program in another language?
    2. You present SQL injection attacks as a key problem to solve. That has many solutions. For instance you can use placeholders. Or you can use an object relational manager such as Class::DBI.
    3. Stored procedure languages tend to be quite bad. Business logic etc can often be far more clearly done outside of the database than inside the database.
    4. Having significant logic in stored procedures makes code releases significantly more complex.
    5. If multiple programmers are working on logic in stored procedures in the database, they are more likely to accidentally step on each other's toes.
    6. If there is any chance that you'll need to change databases, the more logic in the database, the worse it is. Very relevant to this point is that proprietary database vendors actively attempt to charge you as close to your cost of switching databases as possible. Those negotiations may go better for you if you really can switch databases easily. (See Information Rules for an explanation of why they do this.)
    7. The database tends to be the hardest part of your system to scale. Verifying logic takes work for your database, and may cause it to run into problems sooner. (Note "may", scalability can go either way.) Yes, I know that database vendors claim that it is more scalable to put logic in the database. It is also in their financial interest that you do so...
    8. As the anonymous monk pointed out above, when you leave some logic in the application and move other logic into the database, you are tightly coupling two pieces of code that are very far from each other. In general, this is an unwise thing to do.

    For putting logic in databases:

    1. Security. While SQL injection attacks can be prevented in other ways, if someone manages to break into your webserver in another way, having the database locked down makes it more difficult for them to do further damage.
    2. When multiple applications access the same database (one of those "applications" may be ad hoc queries), you can't guarantee that they will all implement the same business logic unless it is in the database.
    3. Performance. It is often possible to code business logic to run faster in the database than it will outside. The biggest win comes from avoiding unnecessary round trips between machines. Some may wonder how this squares with my scalability point above. To understand it, remember that scalability is like a Mack truck - a sports car may get your groceries home faster, but the truck can move more groceries.
    4. Avoid race conditions. When logic is out of the database, you sometimes have to choose between accepting a race condition and holding a lock that can be a scalability hit. (See my comment before on scalability going either way...) The same locking done within the database is less of a scalability issue because you do away with the latency of round trips between the database and the client.

    Now looking at those lists, what stands out? The first thing that stands out for me is that there is no single "killer" point that makes it obvious that one is always the right way to go and the other is not. The second is that the list of reasons against logic in the database is longer than the list of reasons to put it there. The third is that the reasons against putting logic in the database are mostly addressable, while the benefits of putting it in the database are mostly things you're stuck with.

    Therefore my conclusion is that, depending on the project, either choice can be reasonable. But if you choose to put logic in the database, then you should address the problems that can arise. Here is a short list of things that I think you should do.

    • Give each developer their own database. That way they won't get in each other's way as they are developing.
    • Create automated procedures for synchronizing your database with what is in source control (a sketch follows below). This will help take the pain out of setting up and maintaining per-developer databases, and those procedures can become part of your release process, addressing that issue.
    • Make it very clear in your application design what the line is between what belongs in the database and what does not. Be very vigilant in maintaining that line. That way you'll limit the coupling issues.
    • Make sure that your developers understand enough about how to work with stored procedures to do what they need to do. They don't need to be experts, but they should be able to understand what a procedure does.

    If you don't do at least some (and probably most, if not all) of those things, then I think it is a bad idea to move logic into the database. If you do all of those things and you actively need the benefits from moving your logic there, then you are probably making a wise choice.
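
    A minimal sketch of the synchronization script suggested above, assuming each stored procedure lives in its own .sql file under sql/procs/ in source control (the layout, DSN and credentials are placeholders):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # Connection details are placeholders - adjust for your environment.
        my $dbh = DBI->connect('dbi:ODBC:dev_db', 'user', 'pass',
                               { RaiseError => 1 });

        for my $file (sort glob 'sql/procs/*.sql') {
            my $sql = do {
                open my $fh, '<', $file or die "$file: $!";
                local $/;          # slurp the whole file
                <$fh>;
            };
            # DBI cannot send the GO batch separator, so split on it ourselves.
            for my $batch (grep { /\S/ } split /^\s*GO\s*$/mi, $sql) {
                $dbh->do($batch);
            }
            print "applied $file\n";
        }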

      The first thing that stands out for me is that there is no single "killer" point ...

      The killer points for me in favour of stored procedures over embedded SQL are:

      • Relational logic and database design is hard to master.

        A good DBA (a rare breed) can save the company their salary many times over. Very few general programmers become really adept at DB design.

      • Applications need only the results, not how to obtain them.

        DB design transcends applications for all but the most trivial. A DBA can provide a cross-application overview and arbitrate between their disparate requirements. Application programmers are usually too caught up in their current application's requirements to step back and subjugate their perception of their application's needs in favour of another application's without such arbitration.

        Isolating your DB design from your applications, and vice versa, is a sound application of the decoupling principle at so many levels. New applications come on line that require changes to the DB schema. Other applications should not have to change to accommodate this. The stored procedures are modified to account for the new schema and existing applications continue to run without change.

      • Business code and data should live together and be application independent.

        Tax rates, depreciation terms and a whole gamut of other typical business calculations, rules and logic live outside of the application's remit, and are subject to change by government edicts and legislation.

        When they change, they change across the board for all affected applications. Changes should be implemented and tested once, not in every application.

      • You do not realise the true benefits of a RDBMS until all your business data exists in a single database.

        It is only once it is possible to perform relational logic across the entire range of company data assets that they really begin to earn their keep.

        Warehousing solutions where the data from individual databases is replicated wholesale into secondary conglomerations miss opportunities for ad-hoc queries that the conglomeration process does not support.

        Similarly, warehousing that attempts to perform cross DB boundary searches via meta-data and meta searching are rarely, if ever, capable of keeping up with changes and accommodating speculative ad-hoc queries. There is usually a fairly substantial delay between the notion of a cross-DB connection query being useful and it becoming available; and a not inconsiderable expense in making it available, which requires a costed business case to be made before budget can be allocated. Those delays, costs and red-tape suppress discovery.

      No web programmer worth his salt would advocate the mixing of Perl and HTML. Why advocate the mixing of Perl and SQL?

      The obligatory analogy: GPs diagnose, prescribe and manage year-on-year health care, but they defer specialist treatment to specialists. Eye, brain, heart specialists, etc.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Colour me unconvinced.

        • Relational logic and database design is hard to master.

          That depends what you mean by "master". But a good working understanding of it is within the grasp of any good programmer. If you are working on an application that heavily depends on a relational database, then I'd consider it your responsibility to gain that understanding.

          This doesn't mean that you'd replace a DBA. There is a lot of detail about, for instance, when to use what kind of database and how to physically lay tables out which developers should not be expected to understand. But a developer should be able to understand how to structure data in a database, and how to access that data.

        • Applications need only the results, not how to obtain them.

          That's a wonderful theory. It is not so wonderful in practice.

          If there is an impedance mismatch between how the application works and how the database works, that mismatch is going to be a constant source of problems. As Joel Spolsky put it, abstractions leak. And the more that the abstraction tries to look different than the underlying reality, the more leakage there is. Therefore for a straightforward CRUD application, the database design dictates a lot of how the application should naturally be designed. (Which is one reason why it is good for developers to understand enough about database design to be able to understand that.)

        • Business code and data should live together and be application independent.

          The way that you have put this presupposes that there are many independent applications accessing the same data, and those applications are tied to some bigger business. That was one of the situations where I stated one should move logic into the database.

          However that presupposition is often not the case. For instance consider an application like RT. RT sets up a private database as part of the installation. It is very reasonable for it to assume that it is the only thing accessing that database. If you break this assumption, that is your problem. It is also not the case that RT's design depends in any way on the corporation where it has been installed. So your presupposition completely fails in the case of this application. (Note: RT gives you lots of ways to export its data, and APIs to manipulate RT. Integrating RT's data into other databases is not hard.)

        • You do not realise the true benefits of a RDBMS until all your business data exists in a single database.

          You do not realize the true limitations of a RDBMS until all your business data exists in a single database and that database melts down.

          I don't know what you do, or who you do it for. However I happen to handle reporting needs for a high performance website. And I know enough about databases to understand the difference between a transactional database and a data warehouse.

          If you attempt to run a large query against our transactional database, you will generally fail. Data is changing too fast, and before you get very far you'll get a 'snapshot too old' error. (If you guessed from that error that I'm talking about Oracle, give yourself a gold star.) But even if you go against the data warehouse, many ad hoc queries are simply not directly feasible. If you have experience running complex queries against billions of rows of data, you'll understand. Reality is far short of the ideal of just putting everything into a database and letting it take care of the details.

        Now you take a cheap shot about mixing Perl and HTML. But if you go back and look at what I've recommended, I never did that. I am talking about how much logic is pushed into the database. If you say, "very little", that doesn't mean that you will randomly mix Perl and SQL. Instead, in a sensibly designed application, you'll have an application layer to handle that interface. The role of that application layer is conceptually similar to a templating system - it creates a bridge from Perl to something else that allows you to avoid mixing the two. I even suggested one possible option for how to build that application layer, namely use Class::DBI.

        In fact you'll want that layer no matter where you choose to put all of that logic. There has to be some sort of interface, and you ideally want that interface to be as cleanly defined on each side as possible. That is basic common sense. What is not obvious is how much logic to put where.

      Excellent points tilly - thank you for the feedback.

      I agree that stored procedures do not fit every situation, and that there are new issues that arise when you decide to embrace that solution - the biggest one for me is coupling. That said, there are many situations where they are the best solution available.

      The intent for the original post was not to suggest that stored procedures are the best practice by the way - it was to discuss best practices when using stored procedures.

Re: Moving SQL from perl to Stored Procedures
by Jenda (Abbot) on Sep 09, 2006 at 11:03 UTC

    Document your stored procedures

    I tend to put the docs just above the CREATE PROCEDURE statement. I also do not include the dates and descriptions of changes, that's what the source control software is for. As I keep all stored procedures as separate files in Perforce and do use the changelist descriptions I do not feel the need to repeat the same info elsewhere. If I need the history I know where to look.

    Identifying changes for promotion

    While I do keep the procs in Perforce and do have separate Dev, QA and Prod branches I usually do not use this when generating the update scripts. I run a service on the servers indexing (generating MD5 hashes of) the objects (pages, dlls, executables, stored procs, views, schemas of tables, ...) and whenever preparing the update simply compare the indexes and pack the differing files.

    Naming conventions

    I like to be able to read the names of the procs. And since the stored procs are commands they should IMHO have the form of a command sentence. FetchThis, InsertThat, UpdateSomething, DeleteWhatever, ExpireJobs, ExportSites, ImportSitePostingParameters, GetSiteName, ...

    The most common verbs are Fetch = return a recordset with some data, Get = return a few pieces of info in OUTPUT params, Insert = insert a new row to a table and return the generated ID, Update = update a row (usually using the ID), Set = insert or update something that doesn't have its own generated ID (like some additional details for some object), Delete = delete something.

    Dynamic Where clauses

    As explained in Re^3: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging it's most efficient, even if inconvenient, to branch out and write several SELECT statements.

Re: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging
by imp (Priest) on Sep 09, 2006 at 04:04 UTC
    Paging is difficult in SQL Server 2000 for the following reasons:
    • You cannot use a variable with TOP
    • You cannot specify an offset with TOP
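
    One common workaround (a sketch; USER_PAGE and the column choices are hypothetical): SET ROWCOUNT, unlike TOP, does accept a variable in SQL 2000, so you can fill a keyed table variable and then select the requested page by range:

        CREATE PROCEDURE dbo.USER_PAGE @offset int, @pagesize int AS
        DECLARE @last int
        SET @last = @offset + @pagesize

        DECLARE @paging_buffer table(
            sort_id int not null identity(1,1) PRIMARY KEY,
            user_id int not null
        )

        -- Capture only as many rows as the requested page needs.
        SET ROWCOUNT @last
        INSERT INTO @paging_buffer (user_id)
            SELECT user_id FROM Users ORDER BY lastname
        SET ROWCOUNT 0

        SELECT u.*
        FROM @paging_buffer p
        JOIN Users u ON u.user_id = p.user_id
        WHERE p.sort_id > @offset AND p.sort_id <= @last
        ORDER BY p.sort_id
        GO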
      Use cursors! Update: BTW: these dynamic WHERE clauses are very expensive because the optimizer cannot optimize them. A better way: generate SQL statements with a fixed filter, and execute them from within a procedure. This is much more performant. There was an article in SQL Mag in the Black Belt SQL column recently. Maybe I should meditate someday about "SQL cursors for the unwary" or something like that.

        You mean build an SQL statement in an nvarchar variable and then execute it? Well well well ... this has two big problems. The first is that this way you force the server to generate a query plan for each execution, which may be a rather expensive operation. And the second, even more important, is that this is prone to SQL injection attacks. And it's even more dangerous than SQL statements built in the pages/scripts: first because it's kinda hidden in something that most believe to be safe, and second because dynamic SQL falls outside the ownership chain and is checked against the caller's own permissions, so to make it work you end up granting direct table access again - losing the lockdown that procedures were supposed to provide!

        I can't say I never use this, but I do try to take this as a last resort solution. It's much better, even if not very convenient to branch out and write several SELECT statements. That way the server may generate and REMEMBER the query plans for each of the cases and there is nothing ad-hoc generated and therefore dangerous:

        CREATE PROCEDURE [dbo].[dynamic_example]
            @lastname varchar(50) = NULL,
            @firstname varchar(50) = NULL
        AS
        BEGIN
            IF (@lastname IS NULL)
                IF (@firstname IS NULL)
                    SELECT * FROM Users
                ELSE
                    SELECT * FROM Users
                    WHERE (users.firstname LIKE @firstname)
            ELSE
                IF (@firstname IS NULL)
                    SELECT * FROM Users
                    WHERE (users.lastname LIKE @lastname)
                ELSE
                    SELECT * FROM Users
                    WHERE (users.lastname LIKE @lastname)
                      AND (users.firstname LIKE @firstname)
        END
        GO
        I know it's inconvenient, especially as the number of conditions increases, but it's the safest and most efficient solution. It's a shame I can't use some macros in SQL :-(
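
        (For reference, a sketch of the parameterised middle ground: if the statement really must be built dynamically, sp_executesql keeps the values out of the string as bind parameters, which avoids the injection problem and lets the server cache a plan per distinct statement. dynamic_example2 is a hypothetical name:)

        CREATE PROCEDURE [dbo].[dynamic_example2]
            @lastname varchar(50) = NULL,
            @firstname varchar(50) = NULL
        AS
        DECLARE @sql nvarchar(4000)
        SET @sql = N'SELECT * FROM Users WHERE 1 = 1'
        IF @lastname IS NOT NULL
            SET @sql = @sql + N' AND lastname LIKE @ln'
        IF @firstname IS NOT NULL
            SET @sql = @sql + N' AND firstname LIKE @fn'
        -- Values travel as parameters, never inside the string itself.
        EXEC sp_executesql @sql,
            N'@ln varchar(50), @fn varchar(50)',
            @ln = @lastname, @fn = @firstname
        GO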

      Could you show me how you assign the sort_id in your first strategy? I don't have access to an SQL Server at the moment, but the difference in the speed of those two strategies sounds too big. Do you use

      declare @paging_buffer table(
          sort_id int not null identity(1,1) PRIMARY KEY,
          user_id int not null,
          firstname varchar(50),
          lastname varchar(50)
      );
      ?

        I suspect that I was a fool when I tested that method, and that the sort_id is a perfectly reasonable approach. The dynamic paging text is from something I wrote a while ago, and I didn't retest my conclusions - my apologies.

        I'll correct the benchmark data provided and add a note about your correction to the end - thanks for catching my error.

        Update - Redid benchmark

        The sort_id approach works better. Thanks again for pointing out the error.
Re: Moving SQL from perl to Stored Procedures
by derby (Abbot) on Sep 09, 2006 at 10:32 UTC

    and treat it as I would a class API

    ++ and given a decent naming convention (we use NOUN_I, NOUN_GET, NOUN_U, NOUN_D - I couldn't convince them to use _C and _R - dbas are a strange breed), you can use your db's stock utilities to automagically create an API class.
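
    A rough sketch of that "automagic" generation, pulling the procedure list from the SQL 2000 catalog (the package name, positional-parameter call style, and lowercased method names are all assumptions):

        package DB::API;
        use strict;
        use warnings;

        sub new {
            my ($class, $dbh) = @_;
            my $self = bless { dbh => $dbh }, $class;

            # Pull the procedure names from the system catalog.
            my $procs = $dbh->selectcol_arrayref(
                q{SELECT name FROM sysobjects WHERE type = 'P'});

            # Install one method per procedure, e.g. $api->noun_get(42).
            for my $proc (@$procs) {
                no strict 'refs';
                *{ __PACKAGE__ . '::' . lc $proc } = sub {
                    my ($self, @bind) = @_;
                    my $sql = "EXEC $proc " . join(', ', ('?') x @bind);
                    return $self->{dbh}->selectall_arrayref(
                        $sql, { Slice => {} }, @bind);
                };
            }
            return $self;
        }

        1;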

    -derby
Re: Moving SQL from perl to Stored Procedures
by dws (Chancellor) on Sep 10, 2006 at 05:56 UTC

    The idea of moving logic into stored procedures makes me squeamish. I've only had direct experience with a significant number of stored procedures once, but lots of indirect exposure through friends.

    There were two common themes.

    The first is that it always seemed to split development into two camps, the non-database developers and the database developers. And too often, friction between the camps built up and hindered progress.

    The second one, lack of ability to unit test business logic that had migrated into stored procedures, may have been resolved by now. But when I was dealing with it, the difficulty of writing unit tests for stored procedures really sucked, and slowed down development. You could test functionally, by throwing data at the stored procedure and then probing the database to make sure the right things had happened, but such tests seem to require a lot of maintenance, and tend to miss problems of the type that a mock object would spot right away.

    I would need to see a pretty compelling performance argument before I would consider putting any significant business logic in stored procedures. But that's me. YMMV.

Re: Moving SQL from perl to Stored Procedures
by gam3 (Curate) on Sep 10, 2006 at 13:38 UTC
    It seems to me that there are several different arguments going on here and one that has nothing to do with Stored procedures is:

    Should there be SQL mixed with Perl?

    This question has nothing to do with the stored procedure question. If you are using code like
    my $sth = $dbh->prepare(q{EXEC dynamic_example @firstname = ?});
    $sth->execute('chris%');
    to get SQL out of your Perl code, then I would suggest that you should use code like
    $dbp->execute('dynamic_example', firstname => 'chris');
    where $dbp holds an object that can either use ad-hoc SQL or stored procedures. The advantages to this approach are that the interface to the database is a single object and all the procedures or SQL that access the database are well known. The main disadvantage is that if the database needs to be accessed in a new way the $dbp object must be extended. And if stored procedures are used, then the database has to be extended as well.
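
    A minimal sketch of such a $dbp object (the procedure map and names are hypothetical):

        package DBProc;
        use strict;
        use warnings;

        # Known procedures and the named parameters each accepts.
        my %procs = (
            dynamic_example => [qw(firstname lastname)],
        );

        sub new {
            my ($class, $dbh) = @_;
            return bless { dbh => $dbh }, $class;
        }

        sub execute {
            my ($self, $proc, %args) = @_;
            my $params = $procs{$proc} or die "unknown procedure '$proc'";
            my @present = grep { exists $args{$_} } @$params;
            my $sql = "EXEC $proc "
                    . join(', ', map { "\@$_ = ?" } @present);
            return $self->{dbh}->selectall_arrayref(
                $sql, { Slice => {} }, @args{@present});
        }

        1;

    Extending the interface then means adding one entry to %procs (and, if stored procedures are used, one procedure to the database).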
    -- gam3
    A picture is worth a thousand words, but takes 200K.
Re: Moving SQL from perl to Stored Procedures
by Anonymous Monk on Sep 09, 2006 at 11:13 UTC
    Stored procedures *shakes head*.

    Any university student will know that dividing application logic between application and SQL server is messy at best. It's a direct violation of the principle of cohesion which seeks to keep related logic together.

    Indeed, the principle of coupling is also attacked using stored procedures.

    I see industry racing to use these "cool ideas" but in reality so many of them are unsound. Too many cowboys in industry.

      You're looking at it wrong -- we're maintaining coupling in the database -- I'm keeping the data with its associated logic.

      Should I leave the data to be misinterpreted by each application's programmers?

      Should data security be moved from the database and to the application?

      Should each application implement the same accessors or business logic? (remember -- they might not all be using the same programming language)

      Any person with experience knows that every situation is different, and there are advantages and disadvantages to almost every method. There may be times when it makes sense to keep all of the business logic out of the database (eg, you have reason to believe that your company may be switching database vendors in the future, and you'll have to re-code all stored procedures), and there are times when keeping it all close to the database makes sense (eg, it's a large database that's used by multiple applications, and you need to keep applications from making calls that might adversely affect the overall performance of the database).
