Motivated by the first commenter's suggestion on my MS Access question: Python SQL insert to MSAccess with VBScript
I am moving to SQLite databases for a Python application, and I created this query builder class to help implement the SQL logic. One constraint is having to use Python 3.6, which is why I used .format() instead of the more modern f-strings for string formatting. I'm looking for an overall review of my code quality and any suggested improvements.
```python
import sqlite3
from typing import Union, List, Optional
import logging


class QueryBuilder:
    '''
    Query Builder for SQLite databases

    Usage:
        query_builder = QueryBuilder("database.db")
        query_builder.insert("users", {"username": "John", "password": "johnspassword"})
        query_builder.update("users", "password = ?", "username = ?")
        query_builder.delete("users", {"username": "John"})
        query_builder.select("users", ["username", "password"], "username = ?")
    '''

    def __init__(self, db_name: str, log_file_path: str = 'querybuilder.log'):
        '''
        Establish database connection and cursor creation and configure logging.

        Arguments:
        - db_name: The name of the SQLite database

        Usage:
            query_builder = QueryBuilder("database.db")
        '''
        logging.basicConfig(filename=log_file_path, level=logging.INFO,
                            format='%(asctime)s:%(levelname)s:%(message)s')
        self.conn = sqlite3.connect(db_name, isolation_level=None)
        self.cursor = self.conn.cursor()

    def __execute_query(self, query: str, params: Union[tuple, list] = None):
        '''
        Execute SQL query

        Arguments:
        - query: SQL query as a string
        - params: Parameters for the SQL query
        '''
        try:
            if params:
                self.cursor.execute(query, params)
            else:
                self.cursor.execute(query)
            return self.cursor.fetchall()
        except sqlite3.Error as error:
            logging.error("Error while executing SQLite Script: {}".format(error))
            raise

    def select(self, table: str, fields: List[str] = ['*'], where: Optional[str] = None,
               join: Optional[str] = None, group_by: Optional[str] = None,
               order_by: Optional[str] = None):
        '''
        Select records from a table with various clauses

        Arguments:
        - table: Table to select from
        - fields: List of fields to be selected
        - where: WHERE clause
        - join: JOIN clause
        - group_by: GROUP BY clause
        - order_by: ORDER BY clause

        Usage:
            result = query_builder.select("users", ["username", "password"], "username = ?", "LEFT JOIN orders ON users.id = orders.user_id", "username", "orders.id DESC")
        '''
        selected_fields = ', '.join(fields)
        base_query = "SELECT {} FROM {}".format(selected_fields, table)
        if join:
            base_query += " {}".format(join)
        if where:
            base_query += " WHERE {}".format(where)
        if group_by:
            base_query += " GROUP BY {}".format(group_by)
        if order_by:
            base_query += " ORDER BY {}".format(order_by)
        return self.__execute_query(base_query)

    def insert(self, table: str, data: dict):
        '''
        Insert a new record into a table

        Arguments:
        - table: Table to insert into
        - data: Dictionary of columns and values to insert

        Usage:
            query_builder.insert("users", {"username": "John", "password": "johnspassword"})
            This will execute: INSERT INTO users (username,password) VALUES (?,?) with parameters ('John', 'johnspassword')
        '''
        fields = ', '.join(data.keys())
        placeholder = ', '.join('?' * len(data))
        query = "INSERT INTO {} ({}) VALUES ({})".format(table, fields, placeholder)
        return self.__execute_query(query, list(data.values()))

    def update(self, table: str, data: Optional[dict] = None, where: Optional[dict] = None):
        '''
        Update an existing record in a table

        Arguments:
        - table: Table to update
        - data: Dictionary of new data for the record
        - where: Dictionary for WHERE clause to select the record

        Usage:
            query_builder.update("users", {"password": "new_password"}, {"username": "John"})
            This will execute: UPDATE users SET password = ? WHERE username = ? with parameters ('new_password', 'John')
        '''
        if data:
            set_query = ', '.join(["{} = ?".format(k) for k in data.keys()])
            query = "UPDATE {} SET {}".format(table, set_query)
        else:
            print("Error: No data provided to update.")
            return
        if where:
            where_query = ' AND '.join(["{} = ?".format(k) for k in where.keys()])
            query += " WHERE {}".format(where_query)
        params = tuple(list(data.values()) + list(where.values() if where else []))
        return self.__execute_query(query, params)

    def delete(self, table: str, where: Optional[dict] = None):
        '''
        Delete an existing record from a table

        Arguments:
        - table: Table to delete from
        - where: WHERE clause to select the record

        Usage:
            query_builder.delete("users", {"username": "John"})
            This will execute: DELETE FROM users WHERE username = ? with parameter ('John')
        '''
        query = "DELETE FROM {}".format(table)
        if where is not None:
            condition = ' AND '.join(["{} = ?".format(key) for key in where.keys()])
            query += " WHERE {}".format(condition)
            params = tuple(where.values())
        self.__execute_query(query, params)
        return self.__execute_query(query, params)

    def close_connection(self):
        '''
        Close the connection to the database

        Usage:
            query_builder.close_connection()
        '''
        self.conn.close()

    def __del__(self):
        '''
        Class destructor. Closes the connection to the database
        '''
        self.conn.close()
```
2 Answers
It's good that there is some logging built in, but this is not the right way to do it. It is inflexible: the output goes to a file (the console is useful for debugging) and the format cannot even be customized (by way of optional arguments).
This is all really simple. Do logging in your code, but let the caller decide on destination and formatting. After all, this class, like any other class, is meant to be used by another piece of code.
It is enough to do this:
```python
import logging

class QueryBuilder:
    def __init__(self, db_name: str):
        self.logger = logging.getLogger(__name__)
```
And decide in the upstream routines what you want to do with the log messages emitted by your libs, or any other third-party libs doing logging. Imagine if they were all doing their own thing, and logging to different files for example. That negates the benefits of logging, which is to consolidate messages from multiple sources. At times, you will want to mute them. At other times you will want to decrease or increase verbosity level.
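For instance, the application entry point might configure handlers once at startup; the format and level here are illustrative choices, not prescriptions:

```python
import logging

# Application-side configuration (illustrative): the app, not QueryBuilder,
# picks the destination, level, and format -- once, at startup.
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s:%(name)s:%(levelname)s:%(message)s')

# Library code only emits records through its named logger; they flow to
# whatever handlers the application installed.
logger = logging.getLogger('querybuilder')
logger.info('connection to %s opened', 'database.db')
```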
That being said, the logging is underutilized at the moment. In `update`, instead of:

```python
print("Error: No data provided to update.")
return
```

you could call `logger.error` (or `logger.exception`, inside an `except` block) and even raise a ValueError or a custom exception. At the very least you should return something meaningful, even `return False`, to signal a problem. You have passed on this opportunity: there is no easy way for the calling routine to find out whether the call went through as expected. This is problematic.
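A sketch of that failure path (the rest of the method body is abridged; a `self.logger` created via `logging.getLogger` is assumed):

```python
# Sketch of the guard in update(): log the problem, then raise so the
# caller cannot miss it. The rest of the method body is omitted.
def update(self, table, data=None, where=None):
    if not data:
        self.logger.error("no data provided to update table %s", table)
        raise ValueError("No data provided to update.")
    ...
```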
As for the `insert` function, what happens if I pass an empty dict? I haven't tested it, but I expect the code to crash. You don't seem to be validating the data much in depth here.
`close_connection` is not in use at the moment and essentially repeats what `__del__` already does.
This implementation looks too basic to be useful at the moment. It is of course lighter than SQLAlchemy, but it is still very limited and handles only SQLite.
There is no support for bulk operations or transactions either. If an insert/update/delete operation crashes in the middle, then you should roll back to preserve data integrity.
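With the sqlite3 module, rollback is mostly free: the connection object is itself a context manager that commits on success and rolls back on an exception. Note that the original code opens the connection with isolation_level=None (autocommit), which would defeat this; the sketch below uses the default isolation instead.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # default isolation, not autocommit
conn.execute('CREATE TABLE users (username TEXT UNIQUE)')

try:
    # The connection context manager commits on success and rolls back on
    # an exception, so a half-finished batch never reaches the table.
    with conn:
        conn.execute("INSERT INTO users VALUES ('John')")
        conn.execute("INSERT INTO users VALUES ('John')")  # UNIQUE violation
except sqlite3.IntegrityError:
    pass

count = conn.execute('SELECT COUNT(*) FROM users').fetchone()[0]  # 0: both rolled back
```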
There could be a limited use case for bulk operations, provided that you make the code more robust. The code is not bad, but does this class help you very much in your day to day development?
This could work as long as the queries are very basic. Say I want to do a SELECT with a JOIN of two or more tables: I'll have to examine your code to figure out the proper syntax to use, if that is even possible. If I need to do a GROUP BY, then I will probably go back to SQLAlchemy or plain SQL.
version
use Python 3.6
I'm still not convinced by that constraint, given that the interpreter should come from a venv rather than from /usr/bin, but fine, let's roll with it.
```python
from typing import Union, List, Optional
```

We would typically use `list[int]` rather than `List[int]` (Python 3.9+), and `float | str | None` (Python 3.10+) rather than those typing imports. Adding these lines just after the imports would help set expectations and explain our intent:

```python
import sys

assert sys.version_info[:2] >= (3, 6)
```
credentials
Thank you for using bind parameters.

```python
query_builder.update("users", "password = ?", "username = ?")
```

Storing a plaintext password in the table would be Bad. Prefer to store an argon2id hash.
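As a sketch of the idea -- argon2id itself needs a third-party package such as argon2-cffi, so this stand-in uses the stdlib's scrypt, which is also a memory-hard KDF:

```python
import hashlib
import hmac
import os

# Stand-in sketch: scrypt (stdlib) instead of argon2id, which needs a
# third-party package. Either way, only salt + digest are stored,
# never the plaintext password.
def hash_password(password: str) -> bytes:
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt + digest

def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)
```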
Kudos on adding those optional type annotations.
name mangling
I don't understand this signature:

```python
def __execute_query(self, query ...
```

Typically name mangling is not desired -- I recommend you call it `execute_query`. If it is desired, a docstring or comment should explain why mangling is beneficial here. Usually that would involve setting out some rules for maintenance engineers to follow, perhaps restrictions on how they interact with inheritance. Maybe what you really wanted was just a single `_` underscore prefix to show it's private?
bind params

```python
if params:
    self.cursor.execute(query, params)
```

Good, no injection! We know that `params` is a `list` at this point. If you're able to support it being a `dict`, consider preferring that, or even using it exclusively. It's a debugging concern: by the time there are five or six `?` placeholders in there, it can become slightly challenging for a maintenance engineer to insert a seventh parameter in the middle and get everything to line up properly. Sometimes swapped parameters may "look right" and go silently unnoticed. In contrast, a line of source code like `"age": age,` is hard to get wrong; we won't accidentally mistake it for `weight`.
Hmmm, your `def insert()` actually goes in the opposite direction, converting a `dict` by extracting its `.values()` -- interesting.
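A sketch of what dict-style binding looks like with sqlite3's named placeholders:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (username TEXT, age INTEGER)')

# Named placeholders bind from a dict: every value is labeled at the call
# site, so inserting a new parameter in the middle cannot silently shift
# its neighbours the way positional ?s can.
row = {"username": "John", "age": 42}
conn.execute('INSERT INTO users (username, age) VALUES (:username, :age)', row)

fetched = conn.execute('SELECT username, age FROM users').fetchone()
```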
logging

```python
except sqlite3.Error as error:
    logging.error("Error while executing SQLite Script: {}".format(error))
    raise
```

Excellent -- the bare `raise` preserves a proper stack trace for the caller. Someone reading the log, however, won't see that, and there might be multiple candidate call sites. Consider separately logging the query, or even the params, for the benefit of a maintenance engineer trying to diagnose a hard-to-reproduce issue. Modern interpreters make it easy to inspect the stack and reveal the call site for the benefit of a logger; I imagine that even back in 3.6 days it wasn't too difficult to obtain such details -- consider revealing the caller's location in the log.
str vs None
Clearly this works:

```python
def select(self, table: str, fields: List[str] = ['*'], where: Optional[str] = None,
           join: Optional[str] = None, group_by: Optional[str] = None,
           order_by: Optional[str] = None):
```

IDK, for `Optional[str]` maybe just use `str` and default it to the empty string? There's no trouble with unconditionally tacking a default empty string onto the evolving query.
mutable default
For that `fields` parameter, you might prefer that it default to None, and then assign `... = ', '.join(fields or ['*'])` to accomplish the defaulting. Creating the list just once at function definition time, rather than N times for N calls, can yield surprising behavior; Stack Overflow supplies answers describing the effect. It's the sort of thing we try to habitually avoid, even when it causes no trouble in this code, out of kindness to future maintenance engineers.
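The surprise itself is easy to demonstrate in isolation (a generic illustration, not the author's code -- the `['*']` default above is never actually mutated):

```python
# Generic demonstration: the default list is created once, at function
# definition time, so every call relying on the default shares one object.
def append_one(items=[]):
    items.append(1)
    return items

first = append_one()
second = append_one()  # same list object as the first call, now [1, 1]
```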
table alias
It's pretty common for queries to look like `SELECT * FROM my_wonderful_dataset mwd ...`, especially when JOINing several tables. As it stands, I anticipate callers will slightly abuse your API by sneaking the occasional alias into what is supposed to be a `table`. Maybe use some vague terminology like `table_clause` to bless such usage? The `join` parameter is a bit looser, but maybe call it `join_clause`? Similarly, I would expect callers to sometimes tack a HAVING onto `group_by`. Consider adding a `limit` parameter as well.
injection

```python
base_query += " WHERE {}".format(where)
...
return self.__execute_query(base_query)
```

I don't see any way for Little Bobby Tables' mom, or any other caller, to offer bind parameters and have them passed along to the helper. Your docstring's mention of `, "username = ?",` seemed promising, so I'm sure it's just an oversight.
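One possible fix (a hypothetical reworked signature, not the author's API) is to let a params tuple travel alongside the where string:

```python
import sqlite3

# Hypothetical reworked select(): user-supplied values arrive via params
# and are bound by the driver, never pasted into the SQL text.
def select(conn, table, fields=None, where=None, params=()):
    query = 'SELECT {} FROM {}'.format(', '.join(fields or ['*']), table)
    if where:
        query += ' WHERE {}'.format(where)
    return conn.execute(query, params).fetchall()

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (username TEXT)')
conn.execute("INSERT INTO users VALUES ('John')")
rows = select(conn, 'users', ['username'], 'username = ?', ('John',))
```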
queries don't have side effects

```python
def insert(self, ... ): ...
    query = "INSERT INTO ...
```

An INSERT isn't really a "query". Consider calling that variable `dml`, `command`, or just `sql`. Similarly for `update()` and `delete()`.
WHERE clause
Defining different public APIs for the "same thing" is slightly odd. `update()` handles it differently from `select()`:

```python
if where:
    where_query = ' AND '.join(["{} = ?".format(k) for k in where.keys()])
    query += " WHERE {}".format(where_query)
```

Let's think about how a caller would specify a blind query. That's right: pass in a condition that is true for every row:

```python
result = query_builder.select("users", ["username"], "TRUE")
```

Let's apply that insight to the `update()` WHERE clause. We could unconditionally start with `WHERE true`, then tack on zero or more AND conjuncts. Any way you slice it, we obtain valid SQL syntax. No biggie, it's just a little more convenient. The high-level item is to offer consistent behavior to the caller across the various methods that support WHERE clauses. You might want to write a single tiny helper for this, and have `select`, `update`, and `delete` all call it.
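A minimal sketch of such a shared helper (the name `build_where` is mine, not the author's):

```python
# Hypothetical shared helper: one place turns a dict of equality conditions
# into a WHERE clause plus bind parameters, so select/update/delete agree.
def build_where(where=None):
    if not where:
        return '', ()
    clause = ' WHERE ' + ' AND '.join('{} = ?'.format(k) for k in where)
    return clause, tuple(where.values())
```

Each method would then append the returned clause to its statement and extend its parameter tuple with the returned values.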
exceptions

```python
print("Error: No data provided to update.")
```

Please raise a ValueError here. The caller messed up and should be notified. A silent return makes it too easy for a caller's bug to remain in the source repo without being repaired. Also, why do we have `data: Optional[dict] = None` in the signature? We intend it to be mandatory, right? And then the check is really looking for an empty `dict`.
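A sketch of the tightened signature (body abridged): `data` becomes a required parameter, so forgetting it is a TypeError from Python itself, and the explicit guard is left to catch only the empty dict:

```python
# Sketch: data is now a required parameter; the guard catches only the
# empty-dict case and raises instead of printing.
def update(self, table: str, data: dict, where: dict = None):
    if not data:
        raise ValueError("update() needs a non-empty data dict")
    # ... build and execute the UPDATE as before ...
```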
repeat delete
I don't understand this:

```python
def delete(self, ... ):
    ...
    self.__execute_query(query, params)
    return self.__execute_query(query, params)
```

That first one snuck in by accident, right? As written, the DELETE executes twice.
closing
I guess this is pretty OK:

```python
def close_connection(self):
    ...
    self.conn.close()

def __del__(self):
    '''
    Class destructor. Closes the connection to the database
    '''
    self.conn.close()
```

I have my doubts about that `__del__`, but maybe it's needed.
Let's take a step back and think about the design of the public API. First, consider dynamic allocation in C, which gave us the infamous malloc/free API. It's a wonderful API, of course, but it turns out humans are incapable of reliably matching up a bunch of free()s with the corresponding malloc()s; thus were born various OO languages to clean up the garbage automatically. Returning to your query builder: we want to give the caller convenient access in a way that they're unlikely to mess up. Soooo, maybe we don't even offer a public close() method? That is, maybe we want to strongly encourage the caller to always use a `with` context handler when obtaining a DB connection. Then there is simply nothing to think about later; it always gets dealt with. Think about how often you have lately done `f = open(...)` and subsequently `f.close()`. Very few times, right? Because `with` is just too convenient. Same notion here.
BTW, maybe issue a helpful COMMIT just before `.close()`? More generally, the caller is going to need to be able to BEGIN ... COMMIT transactions that are small or large; consider only offering a session (transaction) interface. Possibly you'd care to offer a `.rollback()` method, but wait for the need to arise in calling code before you implement it.
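A sketch of what that could look like (the commit-on-clean-exit policy is my assumption, not the author's design):

```python
import sqlite3

class QueryBuilder:
    '''Sketch: lifetime managed by `with`; commit on clean exit, always close.'''

    def __init__(self, db_name):
        self.conn = sqlite3.connect(db_name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.conn.commit()   # helpful COMMIT just before close
        self.conn.close()
        return False             # never swallow the caller's exception

with QueryBuilder(':memory:') as qb:
    qb.conn.execute('CREATE TABLE users (username TEXT)')
# connection is committed and closed here; nothing for the caller to remember
```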
raw SQL
It's not clear that you already have calling code which exercises all these {select, insert, update, ...} verbs. Consider saving yourself some work by offering raw access to the underlying DB connection. Then wait for a bunch of app code to be authored, with lots of calls into this library. Look at "why was the raw interface needed here?", "what should the library have offered?", in order to guide subsequent library development.
Be sure to write new test functions when you enhance the library. Think of each test function as a "stand in" for the application code which motivated the enhancement. If the tests still show Green bar, then the app code is winning!
elevator pitch
Why use this library?
There are competing ORMs and similar middleware, including SQLAlchemy. Add a module docstring describing the use case where this code wins.
I suspect that "support for 3.6" may figure into such an argument.
code coverage
When you run your automated test suite against this target code, use `pytest --cov --cov-report=term-missing` so the coverage plugin runs. It will tell you which of those many `if`s your test suite is not yet exercising.
Also, maybe you want to offer two layers:
- Synthesize a DML statement from args, and return it.
- Synthesize and execute (with bind parameters) some DML.
A test suite could take advantage of the `_private` interface offered by layer 1; applications would just call public layer 2.
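Sketching that split for insert (the helper name `_build_insert` is mine):

```python
import sqlite3

# Layer 1: pure (sql, params) synthesis -- trivially unit-testable.
def _build_insert(table, data):
    fields = ', '.join(data)
    placeholders = ', '.join('?' * len(data))
    sql = 'INSERT INTO {} ({}) VALUES ({})'.format(table, fields, placeholders)
    return sql, tuple(data.values())

# Layer 2: synthesize and execute with bind parameters.
def insert(conn, table, data):
    sql, params = _build_insert(table, data)
    return conn.execute(sql, params)

sql, params = _build_insert('users', {'username': 'John', 'password': 'x'})
```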
This evolving codebase benefits from the careful design that clearly went into it. No doubt it will continue to improve. It achieves most of its design goals.
I would be reluctant to support an unmaintained 3.6 setup in a production setting, but if, say, 3.8 were used then I would be willing to delegate or accept maintenance tasks on this code base.