http://qs321.pair.com?node_id=7563

The easiest way to read data from a database with Perl is by using the DBI module. DBI is available from CPAN, and is quite possibly already installed on your system.

For those in a hurry, here's the quick answer:

use strict; use DBI; # Connect to the database # See footnote 1 my $dbh = DBI->connect('DBI:mysql:my_database', 'my_username', 'my_pas +sword') or die "Couldn't open database: $DBI::errstr; stopped"; # Prepare the SQL query for execution my $sth = $dbh->prepare(<<End_SQL) or die "Couldn't prepare statement: + $DBI::errstr; stopped"; SELECT field_1, field_2, field_3 FROM my_table WHERE field_1 = 'my_con +dition' End_SQL # Execute the query $sth->execute() or die "Couldn't execute statement: $DBI::errstr; stop +ped"; # Fetch each row and print it while ( my ($field1, $field2, $field3) = $sth->fetchrow_array() ) { print STDOUT "Field 1: $field1 Field 2: $field2 Field 3: $field +3\n"; } # Disconnect from the database $dbh->disconnect();

In Detail

DBI is Perl's DataBase Interface. There are many different databases: Oracle, mSQL, MySQL, Ingres... even files containing data separated by columns could be considered databases. DBI works with practically all of them.

To get information from a database, typically we have to:

At the same time, we need to constantly make sure that all is well with our database connection. We'll begin by examining the code snippet above in detail, then move on to making your database code efficient, robust, and elegant. (Well, at least two of those.)

This document assumes that you know enough about SQL to write your own SELECT statement. I don't go into too much detail on the internals of DBI; get the DBI book for more information.

How DBI Works

There are many different databases out there, and each one has its own way of communicating. For that reason, DBI requires a database driver module to talk to the database. The driver module handles all of the details about connecting to and communicating with the database. DBI sits on top of the driver; its job is to make reading from (and writing to) all databases as similar as possible. That way, if your project grows from something that could be handled with mSQL to something needing Oracle, you can make that change without altering the code very much.

Database driver modules are Perl modules like any other, and often they contain documentation. They are all inside the root package DBD. So, to get information on an Oracle driver you've installed, you can type:

perldoc DBD::Oracle

Other than installing the module, and possibly reading the documentation, you rarely need to deal directly with the database driver module. DBI handles all of that.

Connecting To The Database

# Connect to the database my $dbh = DBI->connect('DBI:mysql:my_database', 'my_username', 'my_pas +sword') or die "Couldn't open database: '$DBI::errstr'; stopped";

In order to get information out of the database, you must first connect to it. Since DBI deals with many different kinds of databases, you need to tell it what kind of database you have before it can connect to it. Also, you'll need to give it your username and password.

The string 'DBI:mysql:my_database' is the connect string. Different databases accept different formats for the connect string, but the usual format is:

DBI:driver:database

The 'driver' is the type of database system you have: 'Oracle', 'mysql', etc. 'Database' is the name of the database. Some database systems don't have names for individual databases; you can leave those blank.

'my_username' and 'my_password' are the username and password that you use to access the database system. These are not the username and password that you use to log in to your account! Some databases don't need a special username and password; in that case, leave them out. Check out the driver documentation for details.

If the connection fails for some reason, connect() will return false, and DBI will put its error message into $DBI::errstr. It's important to include that 'or die' in the statement, or else if there are any problems with your database, they won't show up until deep inside your program, and you'll waste hours trying to debug it.

Preparing And Executing The Query

Next, you need to create a query to send to the database, then send it to the database.

You prepare a query using the prepare statement. Its syntax is pretty simple:

my $sth = prepare(q{SELECT foo FROM bar WHERE baz='bap'}) or die "Couldn't prepare statement: $DBI::errstr; stopped";

$sth here is a statement handle. You're going to use it to get all of your information from the database.

Next, to send your query to the database, use the execute() method on the statement handle:

$sth->execute() or die "Couldn't execute statement: $DBI::errstr; stop +ped";
Again, it's important to put the "or die" on the end of the statement.

Reading Data From The Database

Now that you've sent the query, your database will grind a bit and spit out the rows that satisfy it. To read the data from those rows, you need to fetch them, one row at a time. There are three methods that you can use to fetch the data: fetchrow_array(), fetchrow_arrayref(), and fetchrow_hashref(). These methods are all similar. Each time that you call them, they'll return the data for a row returned by the query. When there are no more rows left, they'll return a false value. Fetchrow_array() returns each row as an array of scalar values; it returns an empty list when it's done. It's generally used like so:

# Fetch each row and print it while ( my ($field1, $field2, $field3) = $sth->fetchrow_array() ) { print STDOUT "Field 1: $field1 Field 2: $field2 Field 3: $field +3\n"; }
Fetchrow_arrayref() is similar. It returns each row as a reference to an array. This method is slightly more efficient than calling the fetchrow_array() method, since we don't need to spend time and energy copying the array values. Here's how it's used:
# Fetch each row and print it-- fetchrow_arrayref while ( my $fields = $sth->fetchrow_arrayref() ) { print STDOUT "Field 1: $fields->[0] Field 2: $fields->[1] Field + 3: $fields->[2]\n"; }
Fetchrow_hashref is one of the most flexible of these methods. It returns each row as a reference to a hash. The keys of the hash are the names of the columns, and the values are the values of that column for this particular row. Use it like so:
# Fetch each row and print it-- fetchrow_hashref while ( my $field_hash = $sth->fetchrow_hashref() ) { print STDOUT "Field 1: $field_hash->{'field1'} Field 2: $field_h +ash->{'field2'} Field 3: $field_hash->{'field3'}\n"; }
With this flexibility comes some inefficiency, though-- the computer uses some time creating the hash table, and looking things up in the hash takes longer than finding them in an array. Still, it's very useful for applications where efficiency isn't super-important.

Which brings us to...

Elegance, Efficiency, and Robustness

All of the above is fine when you know precisely what the query will be in advance. What about when you need to create a query where one or two of the variables might change? For example, say that you're writing an application that looks up prices for furniture in a database. The user enters the name of the item of furniture, the program sends the query to the database, and bammo, you've got the price. (Sorry, I'm shopping for sofas right now and getting a bit obsessed.) How do you do it?

Well, one way would be to create a new SQL statement each time you run the query. This will mostly work, but it's not recommended:

# A bad way of doing it while (my $furniture_name = <STDIN>) { chomp($furniture_name); my $sth = $dbh->prepare("SELECT price FROM furniture_prices WHERE +furniture_name='$furniture_name'") or die "Couldn't prepare: '$DBI::errstr'; stopped"; $sth->execute() or die "Couldn't execute: '$DBI::errstr'"; my ($price) = $sth->fetchrow_array(); print STDOUT "Item: $furniture_name Price: $price\n"; }

What's the problem with this? (Other than that there's no user interface.) Well, what if one of the chairs is a "Tess of the D'Ubervilles Chair"? Then we'd suddenly get an error, as the apostrophe would get read as an SQL end-quote. The second problem is efficiency. The computer must waste time in every query preparing a statement which is almost exactly like the statement it ran a second ago.

How can we solve these problems? Well, DBI allows the use of things called placeholders in statements. Placeholders tell the database, "at execution time, we're going to plug a value into here." Then, you pass the values to fill into the placeholders using the execute() command. Using placeholders, the code would look like this:

# A better way of doing it my $sth = $dbh->prepare("SELECT price FROM furniture_prices WHERE furn +iture_name=?") or die "Couldn't prepare: '$DBI::errstr'; stopped"; while (my $furniture_name = <STDIN>) { chomp($furniture_name); $sth->execute($furniture_name) or die "Couldn't execute: '$DBI::er +rstr'"; my ($price) = $sth->fetchrow_array(); print STDOUT "Item: $furniture_name Price: $price\n"; }
There, the statement handle is only prepared once, and we pass $furniture_type via the execute() method. Plus, when placeholders are used, DBI and the driver know enough to escape any troublesome characters like "'".

All of the above works relatively quickly if there's only one row in the response. But what if we wanted to print out a list of all of the different kinds of loveseats in the database? One might write something like this:

# One way of reading multiple rows my $sth = $dbh->prepare("SELECT furniture_name, price FROM furniture_p +rices WHERE furniture_type=?") or die "Couldn't prepare: '$DBI::errstr'; stopped"; while (my $furniture_type = <STDIN>) { chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::er +rstr'"; while ( my ($furniture_name, $price) = $sth->fetchrow_array() ) { print STDOUT "Item: $furniture_name Price: $price\n"; } }
Each fetchrow_array() adds a little bit of extra time. The variable has to be read out of the database, copied to the array, then copied to the database. It'll work, but you can do it much faster. (And cooler.) How? Well, many databases allow you to bind variables directly to a statement, so that as soon as the row is fetched, the variable is filled in with the column's value.

You bind a scalar variable to a column using the bind_col() method.

$sth->bind_col($column, \$scalar)
So, a better way of doing the above would be:
# Faster way of reading multiple rows my $sth = $dbh->prepare("SELECT furniture_name, price FROM furniture_p +rices WHERE furniture_type=?") or die "Couldn't prepare: '$DBI::errstr'; stopped"; while (my $furniture_type = <STDIN>) { my ($furniture_name, $price); chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::er +rstr'"; $sth->bind_col(1, \$furniture_name); $sth->bind_col(2, \$price); while ( $sth->fetch() ) { print STDOUT "Item: $furniture_name Price: $price\n"; } }
That's the most efficient way of reading from the database. The fetch() call above is really just a handy alias to fetchrow_arrayref(), but since we're not doing anything with the array reference, it's clearer to say fetch().

So that's about it. When you're done with the database, call $dbh->disconnect() to close the connection. If you need or want more information on DBI, do 'perldoc DBI', go to the DBI pages. Programming the Perl DBI is an excellent book by Alligator Descartes and Tim Bunce.

Note 1: Different database drivers sometimes handle connection differently; at time of writing DBD::Pg is one of these. See your database driver documentation for how to connect if you're having problems connecting.

Replies are listed 'Best First'.
Extra Information
by Revelation (Deacon) on Mar 26, 2002 at 17:06 UTC
    I'm going to go through some things that I thought were lacking in this tutorial. They aren't necessary, but are helpful.

    Selectrow_array:

    One thing many perl coders try to do is make their code as short as possible. One way to shorten the code is to use selectrow_array, instead of fetcrow_array. This is helpful if you are only executing a query once, and only getting one row of data.
    my @row = $dbh->selectrow_array(q{SELECT my_data1, my_data2 WHERE furn +iture_type=?},undef, $furniture_type);

    This is an extremely simple, and elegant way of selecting, and will be as optimized as any selectrow, that is executed without binding, as $furniture_type is the data executed.

    RaiseError:

    I find RaiseError to be extremely helpful for mission critical queries, and the majority of queries are mission critical to a script. Therefore it may be easyer to just set RaiseError, and stop using or die.
    $dbh = DBI->connect($DBDSN, $DBUser, $DBPassword, {RaiseError => 1});

    To unset RaiseError on a query in which you want to specify something else to do, if the query doesn't work just add the code: $dbh->{RaiseError} = 0;, and set raiserror back to 1, after you are done.

    Bind_columns:

    my ($furniture_name, $price); chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::er +rstr'"; $sth->bind_col(1, \$furniture_name); $sth->bind_col(2, \$price);
    You don't need to bind a specific column here. Instead you can just bind all the columns in order, with bind_columns.
    So instead of all that code you can write:
    chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::errstr +'"; $sth->bind_columns(\my ($furniture_name, $price));

    This gets rid of predeclaring the variables, and using bind_col twice.

    Update: Did some research yesterday and found that bind_columns is faster for *all* queries! Strange, but it seems that bind_columns is faster than a fetchrow_arrayref, even if only one row is being caught. This renders fetchrow_array(ref) obselete in a sense, without using bind_columns, as a few lines of code makes a query much faster, regardless of how many rows are being returned.

    You may wish to check out this to see the benchmarking, and query analysis that shows that.

    Finishing Queries:

    One thing that is rather interesting is the code: $sth->finish! Most tutorials don't teach users to finish their queries, as it's not really necessary, in the majoiry of situation. I would advise reading the finish section of perldoc DBI, as finish is pertinant in some situations. A link to that is here, as there's no reason for me to describe what has so aptly been described by Carlos Ramirez. It's really code optimization, but since you went into bind_col, I figured some more optimization would be helpful.

    Would like to commend the writer of this tutorial on the vast amount of information! Great job.
    Gyan Kapur
    gyan.kapur@rhhllp.com
Re: Reading from a database
by Anonymous Monk on Jul 09, 2001 at 17:08 UTC
    excellent info on optimization.. thanks!
Re: Reading from a database
by Anonymous Monk on Oct 24, 2002 at 23:24 UTC
    The detailed explanation on the code was excellent. I also liked the explanation on elegance, efficiency and robustness
Re: Reading from a database
by Sandy (Curate) on Dec 09, 2003 at 18:33 UTC
    As always, I am in a hurry

    I needed a quick answer on how to use place-holders (saw it in some pro-c code), and I thought, oh heck, lets read the tutorials before I dive into the DBI documentation.

    Lo and behold, the answeres to everything I wanted to ask were here and in the following responses.

    Thankyou

Re: Reading from a database
by witandhumor (Pilgrim) on Apr 24, 2004 at 02:28 UTC
    I am just beginning my journey into using DBI and I found this tutorial very helpful. Thank you for taking the time to post this!!!
Re: Reading from a database
by Anonymous Monk on May 20, 2004 at 17:28 UTC
    How to pass more than one placeholder values in s SQL statement? "SELECT * FROM TABLENAME WHERE COL_1 = ? AND COL_2 = ?"
Re: Reading from a database
by sanPerl (Friar) on Jul 31, 2006 at 20:56 UTC
    A very good tutorial, for a beginner like me. But there is no mention of 'selectall_arrayref'. I think this one eliminates need of separate prepare-execute-fetch processs
Re: Reading from a database
by Anonymous Monk on Jan 09, 2001 at 07:29 UTC
    i would have been interested in reading more about extraction and transformation of data from various sources into different databases using perl scripts.