This section introduces this library’s basic features and covers some practical issues with database programming in general and with this library in particular.
The following annotated program demonstrates how to connect to a database and perform simple queries. Some of the SQL syntax used below is PostgreSQL-specific, such as the syntax of query parameters (1γγ« rather than ?).
First we create a connection. Replace user, db, and password below with the appropriate values for your configuration (see Base Connections for other connection examples):
Use query-exec method to execute a SQL statement for effect.
"create temporary table the_numbers (n integer, d varchar(20))")"insert into the_numbers values (0, 'nothing')")"insert into the_numbers values (1, 'the loneliest number')")
When a query contains a SQL value that isn’t constant, refer to it through a “query parameter” rather than by dynamically computing the SQL query string (see also SQL Injection). Just provide the parameter values after the SQL statement in the query function call:
Every standard query function accepts query parameters. The SQL syntax for query parameters depends on the database system (see Statements). Other options for running parameterized queries are discussed below.
The query function is a more general way to execute a statement. It returns a structure encapsulating information about the statement’s execution. (But some of that information varies from system to system and is subject to change.)
(simple-result '((insert-id . #f) (affected-rows . 1)))
(rows-result
'(((name . "n") (typeid . 23) (type-size . 4) (type-mod . -1))
((name . "d") (typeid . 1043) (type-size . -1) (type-mod . 24)))
'(#(0 "nothing") #(2 "company")))
When the query is known to return rows and when the field descriptions are not needed, it is more convenient to use the query-rows function.
'(#(0 "nothing") #(2 "company"))
Use query-row for queries that are known to return exactly one row.
'#(0 "nothing")
Similarly, use query-list for queries that produce rows of exactly one column.
'("nothing" "the loneliest number" "company" "a crowd")
When a query is known to return a single value (one row and one column), use query-value .
4
query-value: query returned wrong number of rows
statement: "select d from the_numbers where n = 5"
expected: 1
got: 0
When a query may return zero or one rows, as the last example, use query-maybe-row or query-maybe-value instead.
#f
The in-query function produces a sequence that can be used with Racket’s iteration forms:
0: nothing
1: the loneliest number
2: company
3: a crowd
6
Errors in queries generally do not cause the connection to disconnect.
query-value: relation "nosuchtable" does not exist
SQLSTATE: 42P01
"okay to proceed!"
Queries may contain parameters. The easiest way to execute a parameterized query is to provide the parameters “inline” after the SQL statement in the query function call.
"select d from the_numbers where n = 1γγ«"2)"company"
"select n from the_numbers where n > 1γγ« and n < 2γγ«"03)'(1 2)
Alternatively, a parameterized query may be prepared in advance and executed later. Prepared statements can be executed multiple times with different parameter values.
'(0)
'(0 1)
When a connection’s work is done, it should be disconnected.
Database security requires both that the database back end be secured against unauthorized use and that authorized clients are not tricked or subverted into violating the database’s security.
Securing database back ends is mostly beyond the scope of this manual. In brief: choose sufficiently strong authentication methods and keep credentials secure, and follow the principle of least privilege: create and use roles that have the minimum permissions needed.
The following is an incomplete list of security issues related to database client programming.
SQL injection happens when part of a SQL statement that was intended as SQL literal data is instead interpreted as SQL code—possibly malicious SQL code.
Avoid dynamically creating SQL query strings by string concatenation or interpolation (eg, with string-append or format ). In most cases, it is possible to use parameterized queries instead. For example, instead of this
;WRONG! DANGER!usernew-passwd))
;for PostgreSQL, SQLite;for MySQL, SQLite, ODBC
The first form would choke on names like "Patrick O'Connor". Worse, it would be susceptible to attack by malicious input like "me' OR user='root'", which yields the following SQL statement:
UPDATE users SET passwd='whatever' WHERE user='me' OR user='root'
In contrast, using a parameterized query causes the parameterized SQL and its arguments to be submitted to the back end separately; the back end then combines them safely.
Only SQL literal values can be replaced with parameter placeholders; a SQL statement cannot be parameterized over a column name or a sort order, for example. In such cases, constructing the query dynamically may be the only feasible solution. But while the query construction may be influenced by external input, it should never directly incorporate external input without validation. That is, don’t do the following:
;WRONG! DANGER!
Instead, select the inserted SQL from known good alternatives:
;BETTER
Cross-site scripting—which should probably be called “HTML injection” or “markup injection”—is when arbitrary text from an untrusted source is embedded without escaping into an HTML page. The unstructured text from the untrusted source is reinterpreted as markup from the web server; if the reinterpreted markup contains embedded Javascript code, it executes with the security privileges associated with the web server’s domain.
This issue has little to do with databases per se except that such text is often stored in a database. This issue is mitigated by using structured markup representations like SXML or X-expressions (xexprs), since they automatically escape “markup” characters found in embedded text.
When connecting to a database server over a network, use TLS/SSL and take the steps necessary to use it securely. Without TLS, your data and possibly (depending on the authentication mechanism used) your connection credentials are exposed to any attackers with access to the network.
Both postgresql-connect and mysql-connect support TLS; connect with #:ssl'yes and #:ssl-contextsecure-context, where
If the server has a certificate issued by a well-known, trusted certificate authority (CA), you can probably just use (ssl-secure-client-context ) with the default verification sources managed by your operating system.
Otherwise, you must create a context that trusts your server’s certificate. Obtain the server’s certificate or the certificate of its CA as a PEM file; suppose the file is located at "/path/to/server-cert.pem". Create the context as follows:
See also ssl-load-verify-source! for other kinds of verification sources.
For ODBC connections, as always, consult the back end and ODBC driver documentation.
Achieving good database performance mostly consists of good database design and intelligent client behavior.
On the database design side, most important are wise use of indexes and choosing appropriate data representations. As an example of the latter, a regexp-based search using LIKE will probably be slower than a specialized full-text search feature for large data sets. Consult your database back end’s manual for additional performance advice.
The following sections describe a few client-side aspects of performance.
A common mistake is to fetch a large amount of data by running a query to get a set of initial records and then running another query inside a loop with an iteration for each of the initial records. This is sometimes called the “n+1 selects problem.” For example:
(make-contestant-recordnamewins))
The same information can be retrieved in a single query by performing a LEFT OUTER JOIN and grouping the results:
"FROM contestants LEFT OUTER JOIN contests ""ON contestants.id = contests.winner")#:group'(#("name""id"))#:group-mode'(list))])(make-contestant-recordnamewins))
The one-query form will perform better when database communication has high latency. On the other hand, it may duplicate the contents of the non-key name column, using more bandwidth. Another approach is to perform two queries:
"FROM contestants LEFT OUTER JOIN contests ""ON contestants.id = contests.winner")#:group'(#("id"))#:group-mode'(list))])
Compared with the one-query form, the two-query form requires additional communication, but it avoids duplicating name values in the OUTER JOIN results. If additional non-key contestant fields were to be retrieved, the bandwidth savings of this approach would be even greater.
Using transactions can dramatically improve the performance of bulk database operations, especially UPDATE and INSERT statements. As an extreme example, on commodity hardware in 2012, SQLite is capable of executing thousands of INSERT statements per second within a transaction, but it is capable of only dozens of single-INSERT transactions per second.
Connections cache implicitly prepared statements (that is, statements given in string form directly to a query function). The effect of the cache is to eliminate an extra round-trip to the server (to send the statement and receive a prepared statement handle), leaving just a single round-trip (to send parameters and receive results) per execution.
Currently, prepared statements are only cached within a transaction. The statement cache is flushed when entering or leaving a transaction and whenever a DDL statement is executed.
When testing the performance of database-backed programs, remember to test them in environments with realistic latency and bandwidth. High-latency environments may be roughly approximated with the high-latency-connection function, but there’s no substitute for the real thing.
Database systems use transactions to guarantee properties such as atomicity and isolation while accommodating concurrent reads and writes by the database’s clients. Within a transaction a client is insulated from the actions of other clients, but the transaction may be aborted and rolled back if the database system cannot reconcile it with other concurrent interactions. Some database systems are more adept at reconciling transactions than others, and most allow reconciliation to be tuned through the specification of isolation levels.
PostgreSQL supports very fine-grained reconciliation: two transactions that both read and modify the same table concurrently might both be allowed to complete if they involve disjoint sets of rows. However, clients should be prepared to retry transactions that fail with a exn:fail:sql exception with SQLSTATE matching #rx"^40...$"—typically "40001", “could not serialize access due to concurrent update.”
MySQL’s transaction behavior varies based on the storage drivers in use. Clients should be prepared to retry transactions that fail with a exn:fail:sql exception with SQLSTATE matching #rx"^40...$".
SQLite enforces a very coarse-grained policy: only one transaction is allowed to write to the database at a time, and thus concurrent writers are very likely to conflict. Clients should be prepared to retry transactions that fail with a exn:fail:sql exception with SQLSTATE of 'busy.
An alternative to retrying whole SQLite transactions is to start each transaction with the appropriate locking level, since a transaction usually fails when it is unable to upgrade its lock level. Start a transaction that only performs reads in the default mode, and start a transaction that may perform writes in 'immediate mode (see start-transaction ). That converts the problem of retrying whole transactions into the problem of retrying the initial BEGIN TRANSACTION statement, and this library already automatically retries individual statements that fail with 'busy errors. Depending on the length and frequency of the transactions, you may need to adjust busy-retry-limit (see sqlite3-connect ).
ODBC’s behavior varies depending on the driver and back end. See the appropriate database system’s documentation.
Using database connections in a web servlet is more complicated than in a standalone program. A single servlet potentially serves many requests at once, each in a separate request-handling thread. Furthermore, the use of send/suspend , send/suspend/dispatch , etc means that there are many places where a servlet may start and stop executing to service a request.
Why not use a single connection to handle all of a servlet’s requests? That is, create the connection with the servlet instance and never disconnect it. Such a servlet would look something like the following:
"bad-servlet.rkt"
....db-conn....)
The main problem with using one connection for all requests is that multiple threads accessing the same connection are not properly isolated. For example, if one thread is accessing the connection within a transaction and another thread issues a query, the second thread may see invalid data or even disrupt the work of the first thread.
A secondary problem is performance. A connection can only perform a single query at a time, whereas most database systems are capable of concurrent query processing.
The proper way to use database connections in a servlet is to create a connection for each request and disconnect it when the request has been handled. But since a request thread may start and stop executing in many places (due to send/suspend , etc), inserting the code to connect and disconnect at the proper places can be challenging and messy.
A better solution is to use a virtual connection, which automatically creates a request-specific (that is, thread-specific) “actual connection” by need and disconnects it when the request has been handled (that is, when the thread terminates). Different request-handling threads using the same virtual connection are assigned different actual connections, so the requests are properly isolated.
"better-servlet.rkt"
....db-conn....)
This solution preserves the simplicity of the naive solution and fixes the isolation problem but at the cost of creating many short-lived database connections. That cost can be eliminated by using a connection pool:
"best-servlet.rkt"
....db-conn....)
By using a virtual connection backed by a connection pool, a servlet can achieve simplicity, isolation, and performance all at the same time.