good chemistry is complicated, and a little bit messy -LW |
|
PerlMonks |
DBI recipesby gmax (Abbot) |
on Aug 17, 2003 at 18:47 UTC ( [id://284436]=perlmeditation: print w/replies, xml ) | Need Help?? |
Programming with the DBI becomes a matter of habit. You may choose to code directly with the DBI rather than using on of the many wrappers available on CPAN because of efficiency concerns, or because you are dealing with legacy code, or simply because you want to have a grip at the core of things. Whichever reason for using the DBI directly, the time comes when you have to face one of the simple problems listed here. If you were looking for answers, this is the place to go. If you found the answer on your own, let's compare notes. Either way, enjoy the reading. Table of contents
Checking for an existing tableIf you want to initiate action on a table, or a group of tables from a specific database, it would be better to make sure that such table exists. Sometimes you don't need to worry, since your scripts only apply to a well designed and never-changing database. Sometimes, though, you are not so lucky, or you simply need to check if a table - perhaps a temporary one - was already created. Standard SQL does not have a way to tell you that, even though some dialects provide a useful idiom (MySQL allows CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS). If your database of choice doesn't give you such facility, or you want to create something portable to more than one DBMS, this technique could be handy.
This routine uses DBI's tables method as a primary way of checking. Unfortunately, not every DBMS answers to tables in a consistent way. Therefore, I use a double standard. If tables returns a non empty array, then I test each element until I find one matching the table being searched. If the array is empty, it could be either that the database is empty, or that the DBMS fails to report. In this case, just to be on the safe side, I use the second method, trying to issue a SELECT statement with a false WHERE clause. If this method returns an error, it means that the table does not exist. Using eval and localizing the database handler's attributes ensures some degree of safety. I can't guarantee that it will work with every database driver, but it is the best I could think of. Inserting recordsThere are basically two ways of inserting records through the DBI. The easiest one is using the do method, passing the whole query, without worrying about parameters. Basically, the same query that you'd type in a native database client would also go through a do method. This is something you should do if you have to insert just one record. If you need to insert many of them within one loop, then it is strongly advisable to use prepare//execute. Using prepare and /execute for insertion means (most of the times) using placeholders, which is the recommended method of dealing with this problem. Placeholders are efficient and they guarantee that the values are properly quoted. The only thing sort of unpleasant with them is that you can get confused about how many placeholders to enter or about the order of the values to submit. In the first case, don't worry, since the DBI will complain about a wrong number of placeholders. In the second case, it could become tricky, if you swap things like first and last name. But you don't have to rely only on your patience and accuracy. You can use some idioms to fortify your query. Creating an insertion queryYour first priority when inserting records is creating an insertion query that has the right list of columns and a matching number of placeholders. To ensure that you are always referring to the right list of fields, it is useful to store such list into an array.
In this example, there are only three columns, but the idiom would work the same way if we had 100 or more. From the @fields array it is easy to create a string of fields to be used either in an INSERT or a SELECT statement. In this case, the INSERT query is
Now we can insert some records Inserting scalarsThe easiest insertion is when using scalars as arguments for execute.
It is obvious that executing a query with scalars is only useful when the scalars are updated. For example, within a loop, we may call a function that fills the values with appropriate contents, so that we can call execute with valid items.
When dealing with large data sets, scalars aren't always the best choice. Data is more likely to come in more complex structures, such as lists of lists or lists of hashes. Inserting a list of listsSyntactically, a lol is perhaps the most natural structure to pass to an execute method. This is because execute expects a list of values. Therefore an array is the logical candidate for this function.
The only issue with this insertion is that we must insure that the array elements are in the same order as the fields to insert. No shortcuts available here. Either we pay attention when we encode the array, or we ensure that the items are filled in the proper order when we get the data from an external source. Using a list of lists we trade speed for accuracy. However, if we have some means of making sure that the order is correct, this is the fastest insertion method available from the DBI. I must mention that recent versions of the DBI allow also the method execute_array with the related bind_param_array, and it is useful for mass insertion. However, unless your DBMS supports such a method directly, execute_array is actually slower than other methods. Inserting a list of hashesIf we want to be sure that each value is going to the corresponding column in the database, we can use a data structure where the values are referred by name, rather than by position. A list of hashes looks like the best solution for this problem. The naive approach would be to iterate the list and access the items by name.
But this style of coding, although correct, may take long for a large number of columns, and we can still make a mistake and swap positions, thus resulting in an incorrect insertion. It's better to take advantage of our column list and exploit Perl capability of creating hash slices.
In the above example, not only we write less, but we are also sure that the order of values comes according to the columns list, which was the one we used to create the insertion query in the first place. For large datasets with many columns, this idiom is definitely the best way to go. Read more about this method in chromatic's article about the DBI (off site). About fetching recordsOnce you have successfully inserted records into a database, you'd like to get them back, possibly in a format that suits your current needs, rather than the needs you had when the data was created. Fetching data from a database is a collaborative effort. First, you instruct the database about what you want to get, and the database will eventually comply and prepare for you a dataset ready for consumption. It's important to remember that, no matter how complex your query, the ultimate result will always be a bi-dimensional array, rows and columns filled with the data you asked for. But you don't want always bi-dimensional arrays. Your application needs may be less (or more) demanding than that. Therefore you need to transform the result through the DBI, converting the dataset into the data structure more suitable for your current task. Even though you can't force the DBMS to change the only structure it can deal with, it will assist you in your translation. The following sections examine a few of the idioms offered by the DBI when fetching records. Unlike the DBI manual, where you get help on specific instructions, the matter in this post is presented by task. Hopefully, you can find the recipe tailored for your application. Creating an array from a columnLet's assume you need to get all the names from the employee table, and store them into an array. Depending on how your query is written, you have several choices. Before examining the idiomatic solutions, let's have a look at the brute force approach:
This code serves the purpose and it gets the job done. However, there are better ways. Using map and selectall_arrayrefAn easier way is to fetch the entire dataset as an array ref (actually, it's a reference to an array of references) and then filter the wanted column with map.
Notice that we could have just selected "name", like we did in the previous example. The reason for the extra columns is to demonstrate how to use this method in cases where you have limited control over the columns returned by the DBMS. For example, the output for the DESCRIBE command in Oracle or MySQL will return multiple columns, and you may want to isolate only one of them. Coming back to the example at hand, the result from selectall_arrayref is a reference to a bi-dimensional array. Using the @{} construct it is turned into an array, and map will only filter the second element from each row. As a result, we get a flat array containing only the names. Using selectcol_arrayrefThe DBI has also a specialized function to accomplish this task. selectcol_arrayref returns a reference to an mono-dimensional array. If we have a query that should return only one row, or having the wanted row in first position, then this method returns what we want. The only thing we have to do is de-referencing the array.
using selectall_arrayref and an array sliceThe last one was easy. But we can make our life more complicated, if we want, using yet another approach.
The cryptic looking code in this example is just for educational purposes. I wouldn't do it in a real application, but again it could be useful to know that it is possible. fetchall_arrayref accepts an optional parameter, a "slice," telling the DBI which columns we want and in which format. If we use an array slice, as in this example, its indexes mark the columns to fetch. [0, -2] means take the first and second-to-last column. Therefore, our dataset will have two columns. Then, map will filter the second column, which is the name. Of course, in a three-column dataset, [0, -2] is the same as [0, 1], or [-3, -2]. Just to show what you can do. Getting a list of listsNo sweat here. If we want a list of lists, that's what a RDBMS is always happy to provide you and the DBI is glad to help. There are two methods to get a lol from the DBI. Either you use the prepare /execute / fetchall sequence, or you use selectall_arrayref, which is a handy shortcut. The following snippets are equivalent.
Getting a list of hashesA list of hashes is a common need. For example, if you are using HTML::Template, you should find this method useful. You can either build the loh manually, fetching a hash and pushing it to an array, but the DBI can do it for you quite efficiently.
Notice the empty hashref {} given as argument to fetchall_arrayref. That's an instruction to the DBI that you want each row of the result as an hash reference. A shortcut for the above idiom is to use selectall_arrayref with the Slice parameter.
Be aware that, when fetching hashes, there is a possible pitfall that you should be aware of. See the relevant section below. Getting a partial list of hashesIf you want some degree of control over the result, you may limit the columns to fetch, by including in the hash reference the names of the wanted keys.
The values you pass are not important. You could also say {name=> undef, salary => undef} and it would have been the same. Only the keys are relevant. Notice also that the order of your hashref can be different from the order of the resulting hashref, since hashes in Perl don't have a guaranteed order (You all should know that, but a reminder shouldn't hurt :) ). Getting the whole dataset into a hashIf you aren't satisfied with the default data structures offered by the DBI, you can build your own ones, using some imagination. Transforming a dataset into a hash is easy. Using map and selectall_arrayrefselectall_arrayref returns a list of lists. Using map, you can convert it into a simple hash.
Using map, shift and selectall_arrayrefYou can build a more complex structure, for example assigning to each hash value an array reference. Depending on which column you want to transform into a key, you can either assign the key directly or use shift if the desired key is in first position.
If you wanted to assign "name" as the hash key, then you could either modify the query to have "name" as first column in the dataset or modify the map code to go with your wishes ( map {$_->[1], [ $_->[0], $_->[2] ] }). Getting a hash of hashesThe default DBI method for a hash of hashes is selectall_hashref, which needs as arguments the query to execute and the column to use as key, indicated by position. The position starts with 1. Don't get confused with the array index that would be 0 to get the same column from one row of a lol.
Unlike the manual example above, where the hash was created with map, you can only create a hash of hashes, and the key used is also repeated among the values. The usefulness of such structure is left entirely to the reader imagination. A pitfall with hash of hashesAs always when using hashes, there may be some hidden traps that you need to take into account. If we modify the latest example and we use "salary" as key, we get the following data.
It looks almost the same as the previous one, except that there are only four records instead of five. Where is "Fred?" If you remember how hashes work, the answer is easy. Since the key was "salary", when a new hash with key "5000" was created, its value was overwritten, and the previous one discarded. Lesson learned: whenever you use selectall_hashref, be sure to use a unique key or you'll suffer a data loss. Binding columns and fetchingfetchall_arrayref is sometimes the fastest method for retrieving records. It depends on the number of rows to fetch and on their size. (see Speeding up the DBI for a comparison.) The fastest method, which you can always count on, is fetchrow_arrayref. It is even faster when you combine it with binding. What is binding? To understand what it is and why it is convenient, let me tell you how the DBI fetches records from the DBMS. Whenever a dataset is ready, the DBI uses one variable to transport the record from the DBMS to your application. Then, the data is copied from that variable into yours. With binding, you are telling the DBI "Don't use your bucket to fetch water. Use mine, so we don't waste time pouring from one to the other." The DBI has two methods to bind column results to your variables.
Let's see how to use them. Binding scalarsBinding scalars is straightforward. Just have your scalar variables ready and call bind_col for each column of the dataset. Columns are specified by position. The variable bound must be a reference to your scalar.
Apart from speed, the great advantage of binding scalars is that you can call columns by name, rather than by position, making your application clearer. Notice that the binding must happen after the call to execute. According to the DBI docs, this is not mandatory and a DBD driver could modify this behavior. If you write code with portability in mind, try to follow this rule, though. Binding a listTo simplify the binding process, you can use a list instead of binding each scalar individually.
Remember that the syntax \( list ) returns a list of references to each element of the list. It does not work with arrays, for which you shoud refer to the next section. Remember also that bind_columns requires that you pass as many elements as there are columns in the SELECT statement. Binding an arrayLet's say you are willing to relinquish the easy way of referring to the columns by name and want to bind the result to an array, because you don't want to pollute your application with many variables. Therefore you decide to bind the result set to an array. You can't use the syntax in the previous section, since \@array is a reference to an array, while bind_columns requires a list of references. Once more, map to the rescue, to help passing the reference of each array element to bind_columns. The only tricky part here is to establish the right number of elements for the array
You can also combine scalars and arrays and do something hybrid like this one, even though I don't know what good it can do to you. If you have such an array already established for other purposes, remember that also this idiom is available. Notice that, since the array elements are already references, map intervention is no longer needed.
Fetching a hash without speed penaltyIn another article I measured the relative speed of fetching methods, and fetchrow_hashref came as the slowest one. So much slow, in fact, that you often choose not to use it, reasoning that you'd rather call columns by position than suffer the speed penalty. However, if your only concern is to refer to columns by name, you can use a hash with almost the same efficiency you'd get with an array reference. Binding a hashBinding a hash is not difficult, once you know the syntax. It is as easy as binding a scalar, since you are not binding a hash, but the hash values.
There are two distinct advantages here. The first is speed. You are fetching records at the same speed as fetching an array reference. And second, you are referring to columns by name. Not only that, but you are using the names that you have chosen, not the ones coming from the database, as in the case of fetchrow_hashref.
The difference between this idiom and fetchrow_hashref that you should keep in mind, is that with fetchrow_hashref you are creating a new hash at each loop, while here you are using the same data structure over and over. Therefore, if your purpose was to use a bound hash to push it into an array, forget it, because you would create an array of references where each element is a reference to the same hash. Use this idiom only if you want to improve your application readability, and referring to the columns by name. Binding a large hashDatasets can be quite large. Actually it is in such cases that you want to use hashes instead of arrays, because it is easy to use a record by position when you have just a handful of columns. If your result has fifty columns, instead, referring to them by position is a secure recipe for disaster. On the other hand, binding a large hash to a result set could be a long task. Therefore we can use an idiom similar to the ones that have simplified our life when inserting records. To carry out this task, we create an array of column names that must be in the same order as the ones in the SELECT statement. It could be the same list (recommended!), or an equivalent list with the names of our choice. After that, our ubiquitous map will get the job done.
Comparing efficiency between fetchrow_hashref and manual hash fetchingIf you want to be convinced of this hash fetching method efficiency, here is a simple comparison. Feel free to modify this code to suit your needs. The important thing to note is that a bound hash is almost as fast as an array reference.
A trap to avoid with fetchrow_hashrefBefore closing this long review, I would like to warn against another hidden problem with hashes. It is not something that you find in your everyday applications, but it can happen. So here it goes. The problem arises when your result set, by mean of a JOIN, has one or more columns with the same name. In this case, an arrayref will report all the columns without even noticing that a problem was there, while a hashref will lose the additional columns. Let's consider a simple example (which is meaningless in its stand-alone format, but it's just for the sake of the example).
The CROSS JOIN in this query will return one row with two columns, both containing '1'. Using a hashref, you'll get only one, while using an array-based fetching method you get both of them. Of course, you can solve this problem by using aliases in your query (SELECT a.id AS id1, b.id AS id2) but you never know when code written by somebody else could end up this way. Just be aware of this remote possibility if you find less columns than you were expecting. ConclusionThis is hardly the end of it. Perl is such a rich language and DBI is such a vast module that you can discover new idioms every day. Nevertheless, the list of idioms in this article should be enough to satisfy the average user's curiosity. If you have any good idiom with the DBI, let me hear about it! Update (1) Update (2) Update (3) Update (3) Update (4) _ _ _ _ (_|| | |(_|>< _|
Back to
Meditations
|
|