Very short guide to DBI *********************** About SQL ========= SQL stands for `Structured Query Language'. It was invented by Oracle in the 1970s. It's a straightforward soltuion to the problem of how to get data out of a database. To SQL, a database is made of tables. Each table is a big rectangle with several columns. For example, here's a table named `people': LASTNAME FIRSTNAME ID POSTAL_CODE AGE SEX Gauss Karl 119 19107 30 M Smith Mark 3 T2V 3V4 53 M Noether Emmy 118 19107 31 F Smith Jeff 28 K2G 5J9 19 M Hamilton William 247 10139 2 M Rows of the tables are called `rows' or `records' (or sometimes `tuples'.) Each row represents one object. For example, the first row of the table represents a 30-year-old male whose name is Mark Dominus, who lives at postal code 19107, and whose ID number is 119. The columns are called `fields' or `columns'. There are four important things one can do with such a table: SELECT - Find all the records that have a certain property INSERT - Add new records DELETE - Remove old records UPDATE - Modify records that are already there Those are the four most important SQL commands. Here are examples: SELECT firstname FROM people WHERE lastname = 'Smith' (Locate the first names of all the Smiths.) DELETE FROM people WHERE id = 3 (Delete Mark Smith from the table) UPDATE people SET age = age+1 WHERE id = 247 (William Hamilton just had a birthday.) INSERT INTO people VALUES ('Euler', 'Leonhard', 248, NULL, 58, 'M') (Add Leonhard Euler to the table.) There are a bunch of other SQL commands for creating and discarding tables, for granting and revoking access permissions, for comitting and abandoning transactions, and soforth. More details are outside the scope of the course, but you should be able to find a book in any good bookstore. About Databases =============== Every SQL database is a little different. You talk to some database over the network and make requests of the database engine; other databases you talk to through files or something else. Typically when you buy a commercial database, you get a `library' with it. The library has some functions defined in it for talking to the database. You can write a C program that calls the functions in the library when it wants to talk to the database. Every vendor's library is different. The names of the functions vary, and the order in which you call them varies, and the details of passing queries to the functions and getting the data back out will vary. What DBI is For =============== DBI is designed to protect you from the details of the vendor libraries. It has a very simple interface for saying what SQL queries you want to make, and for getting the results back. DBI doesn't know how to talk to any particular database, but it does know how to locate and load in `DBD' modules. The DBD modules have the vendor libraries in them and know how to talk to the real database. When you ask DBI to make a query for you, it sends to to the appropriate DBD module, which does whatever is necessary to communicate with the real database, gets the results back, and passes them to DBI. Then DBI gives you the results. +-------------+ | | +--+ DBD::Oracle | .......... +--------------+ +-------+ | | +--------+ . . | | | +-+ | |Library |........ Oracle . | Your program +----+ D B I | +----+--------+ . . | | | +-+ .......... +--------------+ +-------+ | +----------+ +--+ | .............. | DBD::CSV | . . | |..... Text Files . +----------+ . . .............. There are DBD modules for talking to every important kind of SQL database. For example, DBD::Oracle will talk to Oracle, and DBD::Sybase will talk to Sybase. DBD::ODBC will talk to any ODBC database including Microsoft Acesss. (There is no DBD module for talking to Access directly; I was mistaken about this in the lecture.) DBD::CSV allows SQL queries on plain text files. A complete list of DBD modules is at http://www.cpan.org/modules/00modlist.long.html#7)DatabaseInterf You can download these from http://www.cpan.org/modules/by-module/DBD/ DBI is available from http://www.cpan.org/modules/by-module/DBI/ How to Use DBI ============== Here's a typical program. When you run it, it waits for you to type a last name. Then it searches the database for people with that last name and prints out the full name and ID number for each person it finds. For example: Enter name> Smith 3: Mark Smith 28: Jeff Smith Enter name> Noether 118: Emmy Noether Enter name> Snonkopus No names matched `Snonkopus'. Enter name> ^D Here is the code: use DBI; my $dbh = DBI->connect('DBI:Oracle:payroll') or die "Couldn't connect to database: " . DBI->errmsg; my $sth = $dbh->prepare('SELECT * FROM people WHERE last_name = ?') or die "Couldn't prepare statement: " . $dbh->errmsg; print "Enter name> "; while ($last_name = <>) { # Read input from the user my @data; chomp $last_name; $sth->execute($last_name) # Execute the query or die "Couldn't execute statement: " . $sth->errmsg; if ($sth->num_rows == 0) { print "No names matched `$last_name'.\n\n"; } # Read the matching records and print them out while (@data = $sth->fetchrow_array()) { my $first_name = $data[0]; my $id = $data[2]; print ">> $id: $first_name $last_name\n"; } print "\n"; print "Enter name> "; } $sth->finish; $dbh->disconnect; Explanation of the Example ========================== use DBI; This loads in the DBI module. Notice that we don't have to load in any DBD module. DBI will do that for us when it needs to. my $dbh = DBI->connect('DBI:Oracle:payroll'); or die "Couldn't connect to database: " . DBI->errmsg; The `connect' call tries to connect to a database. The first argument, `DBI:Oracle:payroll', tells DBI what kind of database it is connecting to. The `Oracle' part tells it to load DBD::Oracle and to use that to communicate with the database. The `payroll' is the name of the database we will be searching. If we were going to supply a username and password to the database, we would do it in the `connect' call: my $dbh = DBI->connect('DBI:Oracle:payroll', 'username', 'password'); or die "Couldn't connect to database: " . DBI->errmsg; If DBI connects to the database, it returns a `database handle' object, which we store into $dbh. This object represents the database connection. We can be connected to many databases at once and have many such database connection obejcts. If DBI can't connect, it returns an undefined value. In this case, we use `die' to abort the program with an error message. DBI->errmsg returns the reason why we couldn't connect---``Bad password'' for example. my $sth = $dbh->prepare('SELECT * FROM people WHERE last_name = ?') or die "Couldn't prepare statement: " . $dbh->errmsg; The `prepare' call prepares a query to be executed by the database. The argument is any SQL at all. On high-end databases, `prepare' will send the SQL to the database server, which will compile it. If `prepare' is successful, it returns a `statement handle' object which represents the statement; otherwise it returns an undefined value and we abort the program. $dbh->errmsg will return the reason for failure, which might be ``Syntax error in SQL''. It gets this reason from the actual database, if possible. The `?' in the SQL will be filled in later. Most databases can handle this. For some databases that don't understand the `?', the DBD module will emulate it for you and will pretend that the database understands how to fill values in later, even though it doesn't. print "Enter name> "; Here we just print a prompt for the user. while ($last_name = <>) { # Read input from the user ... } This loop will repeat over and over again as long as the user enters a last name. If they type a blank line, it will exit. The Perl <> symbol means to read from the terminal or from files named on the command line if there were any. my @data; This declares a variable to hold the data that we will get back from the database. chomp $last_name; This trims the newline character off the end of the user's input. $sth->execute($last_name) # Execute the query or die "Couldn't execute statement: " . $sth->errmsg; `execute' executes the statement that we prepared before. The argument `$last_name' is substituted into the SQL in place of the `?' that we saw earlier. `execute' returns a true value if it succeeds and a false value otherwise, so we about if for some reason the execution fails. if ($sth->num_rows == 0) { print "No names matched `$last_name'.\n\n"; } The `num_rows' method returns the number of rows of the database that were selected. If no rows were selected, then there is nobody in the database with the lastname that the user is looking for. In that case, we print out a message. while (@data = $sth->fetchrow_array()) { ... } `fetchrow_array' returns one of the selected rows from the database. You get back an array whose elements contain the data from the selected row. In this case, the array you get back has six elements. The first element is the person's first name; the second element is the last name; the third element is the ID, and then the other elements are the postal code, age, and sex. Each time you call `fetchrow_array', you get back a different record from the database. When there are no more matching records, `fetchrow_array' returns the empty list and the `while' loop exits. my $first_name = $data[0]; my $id = $data[2]; These lines extract the first name and the ID number from the record data. print ">> $id: $first_name $last_name\n"; This prints out the result. print "\n"; print "Enter name> "; Once we're done reporting about the result of the query, we print another prompt so that the user can enter another name. $sth->finish; $dbh->disconnect; When the user has finished querying the database, they type a blank line and the main `while' loop exits. `finish' tells the database that we won't need this statement handle any more and it can forget the SQL in it. `disconnect' closes the connection to the database. Miscellaneous ============= `do' ---- If you're doing an UPDATE, INSERT, or DELETE there is no data that comes back from the database, so there is a short cut. You can say $dbh->do('DELETE FROM people WHERE age > 65'); for example, and DBI will prepare the statement, execute it, and finish it. `do' returns a true value if it succeeded, and a false value if it failed. Actually, if it succeeds it returns the number of affected rows. In the example it would return the number of rows that were actually deleted. (This value is true even if it is 0, which is bizarre, because 0 is usually false in Perl. But it's convenient becaus eyou can use it either as a number or as a success code, and it works both ways.) Auto-Commit ----------- When you make the `connect' call, you can specify an `Auto-Commit' option which will perform an automatic `COMMIT' operation after every successful query. See the manual for full details. Automatic Error Handling ------------------------ When you make the `connect' call, you can specify a `Raise Errors' option that handles errors for you automatically. When an error occurs, DBI will aport your program instead of returning a failure code. If all you want is to abort the program on an error, this can be convenient. Again, see the manual for details. Don't do This ------------- People might try to write the example program like this: while ($last_name = <>) { my $sth = $dbh->prepare('SELECT * FROM people WHERE last_name = $last_name'); $sth->execute(); # and so on ... } Here we interpolated the value of $last_name directly into the SQL in the `prepare' call. This is a bad thing to do for two reasons. First, `prepare' calls can take a long time. The database server has to compile the SQL and figure out how it is going to run the query. If you have many similar queries, that is a waste of time. Second, it will not work if `$last_name' contains O'Malley D'Amico or some other name with an ', because the ' has a special meaning to SQL, which will not understand when you ask it to prepare a statement that looks like SELECT * FROM people WHERE last_name = O'Malley It will see the ' and complain that don't have a matching ' somewhere else. If you use a last name with a space in it, like `Van Beethoven', it will fail for a similar reason. Just use the `?' form that I showed earlier. It is faster and safer. There is no problem with names that have ' or spaces in them. Any query can have as many ?'s as you want. For example: SELECT * FROM people WHERE age > ? AND sex = ? INSERT INTO people VALUES (?, ?, next_id(), ?, ?, ?)