Moving SQL from perl to Stored Procedures

by imp (Priest)
on Sep 09, 2006 at 03:13 UTC

Every programming position I have held has been very database-centric, and as a result I have been exposed to a number of approaches to SQL code management. My current preference is to keep the SQL within stored procedures, and treat it as I would a class API.

I have come up with a set of what I feel are best practices, and would be very interested in feedback from the community. My most recent experience in this area has been with Microsoft SQL Server 2000, so I will focus on that in this node.

Document your stored procedures internally

I find it helpful to document the intent for stored procedures within their body, and to also document changes to the stored procedure with a date stamp and a ticket number. This makes it easier for future maintainers.
CREATE PROCEDURE dbo.SOMETHING_SAVE
    @something_id int,
    @name varchar(50)
AS
/*
    Does blah blah
    -- Changed on 2006-09-07 to resolve ticket #570
*/
-- Do stuff
GO

Treat your stored procedures as any other asset - keep them in a versioning system

Your schema and stored procedures should always be recorded in a versioning system. If you find yourself needing to use a version of your application from 2 years ago, you need to be able to recreate the database as it was at that time.

Additionally, when moving from a development to production environment you need to be able to easily identify the changes that have been made.
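
If your tools don't provide an easy way to script the procedures out, a small script can dump the current definitions to files for check-in. A minimal sketch using DBI against the SQL 2000 system catalog (the DSN, credentials and the procs/ directory are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connection details are placeholders - adjust for your environment.
    my $dbh = DBI->connect('dbi:ODBC:mydb', 'user', 'pass', { RaiseError => 1 });

    # syscomments stores each procedure's text in chunks, ordered by colid.
    my $rows = $dbh->selectall_arrayref(q{
        SELECT o.name, c.text
        FROM   sysobjects o
        JOIN   syscomments c ON c.id = o.id
        WHERE  o.type = 'P'
        ORDER  BY o.name, c.colid
    });

    # Reassemble the chunks and write one file per procedure.
    my %text;
    $text{ $_->[0] } .= $_->[1] for @$rows;

    for my $proc (sort keys %text) {
        open my $fh, '>', "procs/$proc.sql" or die "procs/$proc.sql: $!";
        print $fh $text{$proc};
        close $fh;
    }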

Privilege Separation

I prefer to only access the data through stored procedures (and possibly views). With some databases you can revoke all table permissions, and grant selective permissions to individual SPs/Views. This works because of 'Ownership Chaining'.

By doing this you can restrict the level of access each database user is granted to a much finer level. Different interfaces into the system (e.g. admin vs external user) can have access to specific procedures which adhere to business logic.
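
For example (a sketch; portal_user is a hypothetical database user, and the procedure names are from the examples elsewhere in this node):

    -- Lock the table down, then grant execute on specific procedures only.
    REVOKE SELECT, INSERT, UPDATE, DELETE ON dbo.Users FROM portal_user
    GRANT EXECUTE ON dbo.PORTAL_SOMETHING_LIST TO portal_user
    GRANT EXECUTE ON dbo.AUTH_USER TO portal_user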

Another way I like to use this is to have password verification be a black box.

$sql = q{EXEC AUTH_USER @username = ?, @password = ?};
my ($user_id) = $dbh->selectrow_array($sql, {}, $username, $password);
And finally, it offers additional protection from SQL injection. If an attacker can't touch the tables directly, they are less dangerous.

Naming conventions

I like to reverse the normal naming scheme and use NOUN_VERB.

Examples: SOMETHING_BY_ID, SOMETHING_SAVE, SOMETHING_LIST. I often add an extra level to that by prefixing the name with an abbreviation of the intended interface. e.g.:

  • PORTAL_SOMETHING_LIST
  • PORTAL_SOMETHING_BY_ID
  • ADMIN_SOMETHING_LIST
  • ADMIN_SOMETHING_BY_ID
  • ADMIN_SOMETHING_SAVE

I find that this keeps related items grouped when ordered alphabetically.

Benchmark the stored procedures

Make use of whatever tools are available for your database. For MS SQL 2000 there is Query Analyzer, which is a wonderful tool if you enable the 'Show Execution Plan' option.

The profiler tool is also useful for identifying bottlenecks.
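
For rough timings from within Query Analyzer itself, the STATISTICS session options are also handy (a sketch; SOMETHING_LIST is one of the hypothetical procedure names used above):

    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    EXEC dbo.SOMETHING_LIST
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF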

Solutions (MS SQL 2000-centric) to common objections to stored procedures

Dynamic ordering

SQL Server 2000 does not directly support dynamic ordering, but you can emulate it with a CASE statement.

An important thing to remember is that a CASE expression returns a single datatype, and it will not intelligently switch what type is being compared: if some of the fields are char and some are integer, it will try to convert them all to integer - and cause an error.

To avoid this you need to have multiple CASE statements, with the WHEN clauses grouped by data type.

ORDER BY
    CASE @order WHEN 'name'  THEN lastname
                WHEN 'state' THEN state END,
    CASE @order WHEN 'name'    THEN firstname END,
    CASE @order WHEN 'user_id' THEN user_id END
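
For context, here is the same trick embedded in a complete procedure (USER_LIST and the column choices are hypothetical):

    CREATE PROCEDURE dbo.USER_LIST @order varchar(20) = 'name' AS
    SELECT user_id, firstname, lastname, state
    FROM Users
    ORDER BY
        -- char columns in one CASE, integer columns in another
        CASE @order WHEN 'name'  THEN lastname
                    WHEN 'state' THEN state END,
        CASE @order WHEN 'name'    THEN firstname END,
        CASE @order WHEN 'user_id' THEN user_id END
    GO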

Dynamic Where clauses

Make use of boolean short-circuiting.
CREATE PROCEDURE [dbo].[dynamic_example]
    @lastname varchar(50) = NULL,
    @firstname varchar(50) = NULL
AS
SELECT *
FROM Users
WHERE (@lastname IS NULL OR users.lastname LIKE @lastname)
  AND (@firstname IS NULL OR users.firstname LIKE @firstname)
GO
EXEC dynamic_example @firstname = 'chris%'
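
From the Perl side, the call is plain DBI (a sketch):

    my $users = $dbh->selectall_arrayref(
        q{EXEC dynamic_example @firstname = ?},
        { Slice => {} },
        'chris%',
    );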

Update - moved dynamic paging to a reply, instead of linking to my perlmonk.org account

Replies are listed 'Best First'.
Re: Moving SQL from perl to Stored Procedures
by tilly (Archbishop) on Sep 10, 2006 at 01:12 UTC
    I disagree with calling this a best practice without qualifiers or more details.

    There are a lot of arguments on both sides when it comes to moving logic in and out of the database. Either decision is very defensible. In a particular situation it may make sense to choose one over the other, but I do not believe that either choice is universally better.

    Furthermore if you're going to move logic into the database, then there are additional things you need to do as well. I think that if you're saying that you should do the first, you should at least mention the second.

    What are some of the trade-offs?

    Against keeping logic in databases:

    1. Who is developing your application? Probably programmers. What are they best at? Probably their application language (given this site, likely to be Perl). Why make them program in another language?
    2. You present SQL injection attacks as a key problem to solve. That has many solutions. For instance you can use placeholders. Or you can use an object relational manager such as Class::DBI.
    3. Stored procedure languages tend to be quite bad. Business logic etc can often be far more clearly done outside of the database than inside the database.
    4. Having significant logic in stored procedures makes code releases significantly more complex.
    5. If multiple programmers are working on logic in stored procedures in the database, they are more likely to accidentally step on each other's toes.
    6. If there is any chance that you'll need to change databases, the more logic in the database, the worse it is. Very relevant to this point is that proprietary database vendors actively attempt to charge you as close to your cost of switching databases as possible. Those negotiations may go better for you if you really can switch databases easily. (See Information Rules for an explanation of why they do this.)
    7. The database tends to be the hardest part of your system to scale. Verifying logic takes work for your database, and may cause it to run into problems sooner. (Note "may", scalability can go either way.) Yes, I know that database vendors claim that it is more scalable to put logic in the database. It is also in their financial interest that you do so...
    8. As the anonymous monk pointed out above, when you leave some logic in the application and move other logic into the database, you are tightly coupling two pieces of code that are very far from each other. In general, this is an unwise thing to do.

    For putting logic in databases:

    1. Security. While SQL injection attacks can be prevented in other ways, if someone manages to break into your webserver in another way, having the database locked down makes it more difficult for them to do further damage.
    2. When multiple applications access the same database (one of those "applications" may be ad hoc queries), you can't guarantee that they will all implement the same business logic unless it is in the database.
    3. Performance. It is often possible to code business logic to run faster in the database than it will outside. The biggest win comes from avoiding unnecessary round trips between machines. Some may wonder how this squares with my scalability point above. To understand it, remember that scalability is like a Mack truck - a sports car may get your groceries home faster, but the truck can move more groceries.
    4. Avoid race conditions. When logic is out of the database, you sometimes have to choose between accepting a race condition and holding a lock that can be a scalability hit. (See my comment before on scalability going either way...) The same locking done within the database is less of a scalability issue because you do away with the latency of round trips between the database and the client.

    Now looking at those lists, what stands out? The first thing that stands out for me is that there is no single "killer" point that makes it obvious that one is always the right way to go and the other is not. The second is that the list of reasons against logic in the database is longer than the list of reasons to put it there. The third is that the reasons against putting logic in the database are mostly addressable, while the benefits of putting it in the database are mostly things you're stuck with.

    Therefore my conclusion is that, depending on the project, either choice can be reasonable. But if you choose to put logic in the database, then you should address the problems that can arise. Here is a short list of things that I think you should do.

    • Give each developer their own database. That way they won't get in each other's way as they are developing.
    • Create automated procedures for synchronizing your database with what is in source control (a sketch follows below). This will help take the pain out of setting up and maintaining per-developer databases, and those procedures can become part of your release process, addressing that issue.
    • Make it very clear in your application design what the line is between what belongs in the database and what does not. Be very vigilant in maintaining that line. That way you'll limit the coupling issues.
    • Make sure that your developers understand enough about how to work with stored procedures to do what they need to do. They don't need to be experts, but they should be able to understand what a procedure does.

    If you don't do at least some (and probably most, if not all) of those things, then I think it is a bad idea to move logic into the database. If you do all of those things and you actively need the benefits from moving your logic there, then you are probably making a wise choice.
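
    A minimal sketch of the synchronization script suggested above, assuming each stored procedure lives in its own .sql file under sql/procs/ in source control (the layout, DSN and credentials are placeholders):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # Connection details are placeholders - adjust for your environment.
        my $dbh = DBI->connect('dbi:ODBC:dev_db', 'user', 'pass',
                               { RaiseError => 1 });

        for my $file (sort glob 'sql/procs/*.sql') {
            my $sql = do {
                open my $fh, '<', $file or die "$file: $!";
                local $/;          # slurp the whole file
                <$fh>;
            };
            # DBI cannot send the GO batch separator, so split on it ourselves.
            for my $batch (grep { /\S/ } split /^\s*GO\s*$/mi, $sql) {
                $dbh->do($batch);
            }
            print "applied $file\n";
        }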

      The first thing that stands out for me is that there is no single "killer" point ...

      The killer points for me in favour of stored procedures over embedded SQL are:

      • Relational logic and database design is hard to master.

        A good DBA (a rare breed) can save the company their salary many times over. Very few general programmers become really adept at DB design.

      • Applications need only the results, not how to obtain them.

        DB design transcends applications for all but the most trivial. A DBA can provide a cross-application overview and arbitrate between their disparate requirements. Application programmers are usually too caught up in their current application's requirements to step back and subjugate their perception of their application's needs in favour of another application's without such arbitration.

        Isolating your DB design from your applications, and vice versa, is a sound application of the decoupling principle at so many levels. New applications come on line that require changes to the DB schema. Other applications should not have to change to accommodate this. The stored procedures are modified to account for the new schema and existing applications continue to run without change.

      • Business code and data should live together and be application independent.

        Tax rates, depreciation terms and a whole gamut of other typical business calculations, rules and logic live outside of the application's remit, and are subject to change by government edicts and legislation.

        When they change, they change across the board for all affected applications. Changes should be implemented and tested once, not in every application.

      • You do not realise the true benefits of a RDBMS until all your business data exists in a single database.

        It is only once it is possible to perform relational logic across the entire range of company data assets that they really begin to earn their keep.

        Warehousing solutions where the data from individual databases is replicated wholesale into secondary conglomerations miss opportunities for ad-hoc queries that the conglomeration process does not support.

        Similarly, warehousing that attempts to perform cross DB boundary searches via meta-data and meta searching are rarely, if ever, capable of keeping up with changes and accommodating speculative ad-hoc queries. There is usually a fairly substantial delay between the notion of a cross-DB connection query being useful and it becoming available; and a not inconsiderable expense in making it available, which requires a costed business case to be made before budget can be allocated. Those delays, costs and red-tape suppress discovery.

      No web programmer worth his salt would advocate the mixing of Perl and HTML. Why advocate the mixing of Perl and SQL?

      The obligatory analogy: GPs diagnose, prescribe and manage year-on-year health care, but they defer specialist treatment to specialists. Eye, brain, heart specialists, etc.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Colour me unconvinced.

        • Relational logic and database design is hard to master.

          That depends what you mean by "master". But a good working understanding of it is within the grasp of any good programmer. If you are working on an application that heavily depends on a relational database, then I'd consider it your responsibility to gain that understanding.

          This doesn't mean that you'd replace a DBA. There is a lot of detail about, for instance, when to use what kind of database and how to physically lay tables out which developers should not be expected to understand. But a developer should be able to understand how to structure data in a database, and how to access that data.

        • Applications need only the results, not how to obtain them.

          That's a wonderful theory. It is not so wonderful in practice.

          If there is an impedance mismatch between how the application works and how the database works, that mismatch is going to be a constant source of problems. As Joel Spolsky put it, abstractions leak. And the more that the abstraction tries to look different than the underlying reality, the more leakage there is. Therefore for a straightforward CRUD application, the database design dictates a lot of how the application should naturally be designed. (Which is one reason why it is good for developers to understand enough about database design to be able to understand that.)

        • Business code and data should live together and be application independent.

          The way that you have put this presupposes that there are many independent applications accessing the same data, and those applications are tied to some bigger business. That was one of the situations where I stated one should move logic into the database.

          However that presupposition is often not the case. For instance consider an application like RT. RT sets up a private database as part of the installation. It is very reasonable for it to assume that it is the only thing accessing that database. If you break this assumption, that is your problem. It is also not the case that RT's design depends in any way on the corporation where it has been installed. So your presupposition completely fails in the case of this application. (Note: RT gives you lots of ways to export its data, and APIs to manipulate RT. Integrating RT's data into other databases is not hard.)

        • You do not realise the true benefits of a RDBMS until all your business data exists in a single database.

          You do not realize the true limitations of a RDBMS until all your business data exists in a single database and that database melts down.

          I don't know what you do, or who you do it for. However I happen to handle reporting needs for a high performance website. And I know enough about databases to understand the difference between a transactional database and a data warehouse.

          If you attempt to run a large query against our transactional database, you will generally fail. Data is changing too fast, and before you get very far you'll get a 'snapshot too old' error. (If you guessed from that error that I'm talking about Oracle, give yourself a gold star.) But even if you go against the data warehouse, many ad hoc queries are simply not directly feasible. If you have experience running complex queries against billions of rows of data, you'll understand. Reality is far short of the ideal of just putting everything into a database and letting it take care of the details.

        Now you take a cheap shot about mixing Perl and HTML. But if you go back and look at what I've recommended, I never did that. I am talking about how much logic is pushed into the database. If you say, "very little", that doesn't mean that you will randomly mix Perl and SQL. Instead, in a sensibly designed application, you'll have an application layer to handle that interface. The role of that application layer is conceptually similar to a templating system - it creates a bridge from Perl to something else that allows you to avoid mixing the two. I even suggested one possible option for how to build that application layer, namely use Class::DBI.

        In fact you'll want that layer no matter where you choose to put all of that logic. There has to be some sort of interface, and you ideally want that interface to be as cleanly defined on each side as possible. That is basic common sense. What is not obvious is how much logic to put where.

      Excellent points tilly - thank you for the feedback.

      I agree that stored procedures do not fit every situation, and that there are new issues that arise when you decide to embrace that solution - the biggest one for me is coupling. That said, there are many situations where they are the best solution available.

      The intent for the original post was not to suggest that stored procedures are the best practice by the way - it was to discuss best practices when using stored procedures.

Re: Moving SQL from perl to Stored Procedures
by Jenda (Abbot) on Sep 09, 2006 at 11:03 UTC

    Document your stored procedures

    I tend to put the docs just above the CREATE PROCEDURE statement. I also do not include the dates and descriptions of changes, that's what the source control software is for. As I keep all stored procedures as separate files in Perforce and do use the changelist descriptions I do not feel the need to repeat the same info elsewhere. If I need the history I know where to look.

    Identifying changes for promotion

    While I do keep the procs in Perforce and do have separate Dev, QA and Prod branches I usually do not use this when generating the update scripts. I run a service on the servers indexing (generating MD5 hashes of) the objects (pages, dlls, executables, stored procs, views, schemas of tables, ...) and whenever preparing the update simply compare the indexes and pack the differing files.

    Naming conventions

    I like to be able to read the names of the procs. And since the stored procs are commands they should IMHO have the form of a command sentence. FetchThis, InsertThat, UpdateSomething, DeleteWhatever, ExpireJobs, ExportSites, ImportSitePostingParameters, GetSiteName, ...

    The most common verbs are Fetch = return a recordset with some data, Get = return a few pieces of info in OUTPUT params, Insert = insert a new row to a table and return the generated ID, Update = update a row (usually using the ID), Set = insert or update something that doesn't have its own generated ID (like some additional details for some object), Delete = delete something.

    Dynamic Where clauses

    As explained in Re^3: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging it's most efficient, even if inconvenient, to branch out and write several SELECT statements.

Re: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging
by imp (Priest) on Sep 09, 2006 at 04:04 UTC
    Paging is difficult in SQL Server 2000 for the following reasons:
    • You cannot use a variable with TOP
    • You cannot specify an offset with TOP
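
    One common workaround (a sketch; USER_PAGE and the column choices are hypothetical): SET ROWCOUNT, unlike TOP, does accept a variable in SQL 2000, so you can fill a keyed table variable and then select the requested page by range:

        CREATE PROCEDURE dbo.USER_PAGE @offset int, @pagesize int AS
        DECLARE @last int
        SET @last = @offset + @pagesize

        DECLARE @paging_buffer table(
            sort_id int not null identity(1,1) PRIMARY KEY,
            user_id int not null
        )

        -- Capture only as many rows as the requested page needs.
        SET ROWCOUNT @last
        INSERT INTO @paging_buffer (user_id)
            SELECT user_id FROM Users ORDER BY lastname
        SET ROWCOUNT 0

        SELECT u.*
        FROM @paging_buffer p
        JOIN Users u ON u.user_id = p.user_id
        WHERE p.sort_id > @offset AND p.sort_id <= @last
        ORDER BY p.sort_id
        GO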
      Use cursors! Update: BTW: these dynamic WHERE clauses are very expensive because the optimizer cannot optimize them. A better way: generate SQL statements with a fixed filter, and execute them from within a procedure. This is much more performant. There was an article in SQL Mag in the Black Belt SQL column recently. Maybe I should meditate someday about "SQL cursors for the unwary" or something like that.

        You mean build an SQL statement in an nvarchar variable and then execute it? Well well well ... this has two big problems. The first is that this way you force the server to generate a query plan for each execution, which may be a rather expensive operation. And the second, even more important, is that this is prone to SQL injection attacks. And it's even more dangerous than SQL statements built in the pages/scripts: first because it's kinda hidden in something that most believe to be safe, and second because dynamic SQL falls outside the ownership chain and is checked against the caller's own permissions, so to make it work you end up granting direct table access again - losing the lockdown that procedures were supposed to provide!

        I can't say I never use this, but I do try to take this as a last resort solution. It's much better, even if not very convenient to branch out and write several SELECT statements. That way the server may generate and REMEMBER the query plans for each of the cases and there is nothing ad-hoc generated and therefore dangerous:

        CREATE PROCEDURE [dbo].[dynamic_example]
            @lastname varchar(50) = NULL,
            @firstname varchar(50) = NULL
        AS
        BEGIN
            IF (@lastname IS NULL)
                IF (@firstname IS NULL)
                    SELECT * FROM Users
                ELSE
                    SELECT * FROM Users
                    WHERE (users.firstname LIKE @firstname)
            ELSE
                IF (@firstname IS NULL)
                    SELECT * FROM Users
                    WHERE (users.lastname LIKE @lastname)
                ELSE
                    SELECT * FROM Users
                    WHERE (users.lastname LIKE @lastname)
                      AND (users.firstname LIKE @firstname)
        END
        GO
        I know it's inconvenient, especially as the number of conditions increases, but it's the safest and most efficient solution. It's a shame I can't use some macros in SQL :-(
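
        (For reference, a sketch of the parameterised middle ground: if the statement really must be built dynamically, sp_executesql keeps the values out of the string as bind parameters, which avoids the injection problem and lets the server cache a plan per distinct statement. dynamic_example2 is a hypothetical name:)

        CREATE PROCEDURE [dbo].[dynamic_example2]
            @lastname varchar(50) = NULL,
            @firstname varchar(50) = NULL
        AS
        DECLARE @sql nvarchar(4000)
        SET @sql = N'SELECT * FROM Users WHERE 1 = 1'
        IF @lastname IS NOT NULL
            SET @sql = @sql + N' AND lastname LIKE @ln'
        IF @firstname IS NOT NULL
            SET @sql = @sql + N' AND firstname LIKE @fn'
        -- Values travel as parameters, never inside the string itself.
        EXEC sp_executesql @sql,
            N'@ln varchar(50), @fn varchar(50)',
            @ln = @lastname, @fn = @firstname
        GO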

      Could you show me how you assign the sort_id in your first strategy? I don't have access to an SQL Server at the moment, but the difference in the speed of those two strategies sounds too big. Do you use

      declare @paging_buffer table(
          sort_id int not null identity(1,1) PRIMARY KEY,
          user_id int not null,
          firstname varchar(50),
          lastname varchar(50)
      );
      ?

        I suspect that I was a fool when I tested that method, and that the sort_id is a perfectly reasonable approach. The dynamic paging text is from something I wrote a while ago, and I didn't retest my conclusions - my apologies.

        I'll correct the benchmark data provided and add a note about your correction to the end - thanks for catching my error.

        Update - Redid benchmark

        The sort_id approach works better. Thanks again for pointing out the error.
Re: Moving SQL from perl to Stored Procedures
by derby (Abbot) on Sep 09, 2006 at 10:32 UTC

    and treat it as I would a class API

    ++ and given a decent naming convention (we use NOUN_I, NOUN_GET, NOUN_U, NOUN_D - I couldn't convince them to use _C and _R - dbas are a strange breed), you can use your db's stock utilities to automagically create an API class.
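
    A rough sketch of that "automagic" generation, pulling the procedure list from the SQL 2000 catalog (the package name, positional-parameter call style, and lowercased method names are all assumptions):

        package DB::API;
        use strict;
        use warnings;

        sub new {
            my ($class, $dbh) = @_;
            my $self = bless { dbh => $dbh }, $class;

            # Pull the procedure names from the system catalog.
            my $procs = $dbh->selectcol_arrayref(
                q{SELECT name FROM sysobjects WHERE type = 'P'});

            # Install one method per procedure, e.g. $api->noun_get(42).
            for my $proc (@$procs) {
                no strict 'refs';
                *{ __PACKAGE__ . '::' . lc $proc } = sub {
                    my ($self, @bind) = @_;
                    my $sql = "EXEC $proc " . join(', ', ('?') x @bind);
                    return $self->{dbh}->selectall_arrayref(
                        $sql, { Slice => {} }, @bind);
                };
            }
            return $self;
        }

        1;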

    -derby
Re: Moving SQL from perl to Stored Procedures
by dws (Chancellor) on Sep 10, 2006 at 05:56 UTC

    The idea of moving logic into stored procedures makes me squeamish. I've only had direct experience with a significant number of stored procedures once, but lots of indirect exposure through friends.

    There were two common themes.

    The first is that it always seemed to split development into two camps, the non-database developers and the database developers. And too often, friction between the camps built up and hindered progress.

    The second one, lack of ability to unit test business logic that had migrated into stored procedures, may have been resolved by now. But when I was dealing with it, the difficulty of writing unit tests for stored procedures really sucked, and slowed down development. You could test functionally, by throwing data at the stored procedure and then probing the database to make sure the right things had happened, but such tests seem to require a lot of maintenance, and tend to miss problems of the type that a mock object would spot right away.

    I would need to see a pretty compelling performance argument before I would consider putting any significant business logic in stored procedures. But that's me. YMMV.

Re: Moving SQL from perl to Stored Procedures
by gam3 (Curate) on Sep 10, 2006 at 13:38 UTC
    It seems to me that there are several different arguments going on here and one that has nothing to do with Stored procedures is:

    Should there be SQL mixed with Perl?

    This question has nothing to do with the stored procedure question. If you are using code like
    my $sth = $dbh->prepare(q{EXEC dynamic_example @firstname = ?});
    $sth->execute('chris%');
    to get SQL out of your Perl code, then I would suggest that you should use code like
    $dbp->execute('dynamic_example', firstname => 'chris');
    where $dbp holds an object that can either use ad-hoc SQL or stored procedures. The advantages to this approach are that the interface to the database is a single object and all the procedures or SQL that access the database are well known. The main disadvantage is that if the database needs to be accessed in a new way the $dbp object must be extended. And if stored procedures are used, then the database has to be extended as well.
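
    A minimal sketch of such a $dbp object (the procedure map and names are hypothetical):

        package DBProc;
        use strict;
        use warnings;

        # Known procedures and the named parameters each accepts.
        my %procs = (
            dynamic_example => [qw(firstname lastname)],
        );

        sub new {
            my ($class, $dbh) = @_;
            return bless { dbh => $dbh }, $class;
        }

        sub execute {
            my ($self, $proc, %args) = @_;
            my $params = $procs{$proc} or die "unknown procedure '$proc'";
            my @present = grep { exists $args{$_} } @$params;
            my $sql = "EXEC $proc "
                    . join(', ', map { "\@$_ = ?" } @present);
            return $self->{dbh}->selectall_arrayref(
                $sql, { Slice => {} }, @args{@present});
        }

        1;

    Extending the interface then means adding one entry to %procs (and, if stored procedures are used, one procedure to the database).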
    -- gam3
    A picture is worth a thousand words, but takes 200K.
Re: Moving SQL from perl to Stored Procedures
by Anonymous Monk on Sep 09, 2006 at 11:13 UTC
    Stored procedures *shakes head*.

    Any university student will know that dividing application logic between application and SQL server is messy at best. It's a direct violation of the principle of cohesion which seeks to keep related logic together.

    Indeed, the principle of coupling is also attacked using stored procedures.

    I see industry racing to use these "cool ideas" but in reality so many of them are unsound. Too many cowboys in industry.

      You're looking at it wrong -- we're maintaining coupling in the database -- I'm keeping the data with its associated logic.

      Should I leave the data to be misinterpreted by each application's programmers?

      Should data security be moved from the database and to the application?

      Should each application implement the same accessors or business logic? (remember -- they might not all be using the same programming language)

      Any person with experience knows that every situation is different, and there are advantages and disadvantages to almost every method. There may be times when it makes sense to keep all of the business logic out of the database (eg, you have reason to believe that your company may be switching database vendors in the future, and you'll have to re-code all stored procedures), and there are times when keeping it all close to the database makes sense (eg, it's a large database that's used by multiple applications, and you need to keep applications from making calls that might adversely affect the overall performance of the database).
