I'm trying to find the best way to format an sql query string. When I'm debugging my application I'd like to log to file all the sql query strings, and it is important that the string is properly formated.
Option 1
def myquery():
sql = "select field1, field2, field3, field4 from table where condition1=1 and condition2=2"
con = mymodule.get_connection()
...
- This is good for printing the sql string.
- It is not a good solution if the string is long and not fits the standard width of 80 characters.
Option 2
def query():
sql = """
select field1, field2, field3, field4
from table
where condition1=1
and condition2=2"""
con = mymodule.get_connection()
...
Here the code is clear but when you print the sql query string you get all these annoying white spaces.
u'\nselect field1, field2, field3, field4\n_____from table\n____where condition1=1 \n_____and condition2=2'
Note: I have replaced white spaces with underscore _, because they are trimmed by the editor
Option 3
def query():
sql = """select field1, field2, field3, field4
from table
where condition1=1
and condition2=2"""
con = mymodule.get_connection()
...
- I don't like this option because it breaks the clearness of the well tabulated code.
Option 4
def query():
sql = "select field1, field2, field3, field4 " \
"from table " \
"where condition1=1 " \
"and condition2=2 "
con = mymodule.get_connection()
...
- I don't like this option because all the extra typing in each line and is difficult to edit the query also.
For me the best solution would be Option 2 but I don't like the extra whitespaces when I print the sql string.
Do you know of any other options?
16 Answers 16
Security disclaimer: DO NOT use any kind of string interpolation in SQL queries if the interpolated values can be manipulated by any attacker. See e.g. this question for an example of why this is dangerous, leading directly to SQL injection vulnerabilities. Use parametrized queries instead, which all database engines support. The only reason why the below answer is safe is that the interpolated strings are literals defined inside the same function (they are not related to any kind of user input).
Sorry for posting to such an old thread -- but as someone who also shares a passion for pythonic 'best', I thought I'd share our solution.
The solution is to build SQL statements using python's String Literal Concatenation (http://docs.python.org/), which could be qualified a somewhere between Option 2 and Option 4
Code Sample:
sql = ("SELECT field1, field2, field3, field4 "
"FROM table "
"WHERE condition1=1 "
"AND condition2=2;")
Works as well with f-strings:
fields = "field1, field2, field3, field4"
table = "table"
conditions = "condition1=1 AND condition2=2"
sql = (f"SELECT {fields} "
f"FROM {table} "
f"WHERE {conditions};")
Pros:
- It retains the pythonic 'well tabulated' format, but does not add extraneous space characters (which pollutes logging).
- It avoids the backslash continuation ugliness of Option 4, which makes it difficult to add statements (not to mention white-space blindness).
- And further, it's really simple to expand the statement in VIM (just position the cursor to the insert point, and press SHIFT-O to open a new line).
6 Comments
""" and use textwrap.dedent() before outputting"sql query" to avoid messing with SQL strings (which use single quotes as standard)?You've obviously considered lots of ways to write the SQL such that it prints out okay, but how about changing the 'print' statement you use for debug logging, rather than writing your SQL in ways you don't like? Using your favourite option above, how about a logging function such as this:
def debugLogSQL(sql):
print ' '.join([line.strip() for line in sql.splitlines()]).strip()
sql = """
select field1, field2, field3, field4
from table"""
if debug:
debugLogSQL(sql)
This would also make it trivial to add additional logic to split the logged string across multiple lines if the line is longer than your desired length.
Comments
Cleanest way I have come across is inspired by the sql style guide.
sql = """
SELECT field1, field2, field3, field4
FROM table
WHERE condition1 = 1
AND condition2 = 2;
"""
Essentially, the keywords that begin a clause should be right-aligned and the field names etc, should be left aligned. This looks very neat and is easier to debug as well.
1 Comment
Using 'sqlparse' library we can format the sqls.
>>> import sqlparse
>>> raw = 'select * from foo; select * from bar;'
>>> print(sqlparse.format(raw, reindent=True, keyword_case='upper'))
SELECT *
FROM foo;
SELECT *
FROM bar;
Comments
You can use inspect.cleandoc to nicely format your printed SQL statement.
This works very well with your option 2.
Note: the print("-"*40) is only to demonstrate the superflous blank lines if you do not use cleandoc.
from inspect import cleandoc
def query():
sql = """
select field1, field2, field3, field4
from table
where condition1=1
and condition2=2
"""
print("-"*40)
print(sql)
print("-"*40)
print(cleandoc(sql))
print("-"*40)
query()
Output:
----------------------------------------
select field1, field2, field3, field4
from table
where condition1=1
and condition2=2
----------------------------------------
select field1, field2, field3, field4
from table
where condition1=1
and condition2=2
----------------------------------------
From the docs:
inspect.cleandoc(doc)
Clean up indentation from docstrings that are indented to line up with blocks of code.
All leading whitespace is removed from the first line. Any leading whitespace that can be uniformly removed from the second line onwards is removed. Empty lines at the beginning and end are subsequently removed. Also, all tabs are expanded to spaces.
Comments
sql = ("select field1, field2, field3, field4 "
"from table "
"where condition1={} "
"and condition2={}").format(1, 2)
Output: 'select field1, field2, field3, field4 from table
where condition1=1 and condition2=2'
if the value of condition should be a string, you can do like this:
sql = ("select field1, field2, field3, field4 "
"from table "
"where condition1='{0}' "
"and condition2='{1}'").format('2016-10-12', '2017-10-12')
Output: "select field1, field2, field3, field4 from table where
condition1='2016-10-12' and condition2='2017-10-12'"
3 Comments
format() with SQL strings, it's a major code smell.where condition1=:field1 and then passing in the values as parameters. If you are using .format(), there's going to be a way to pop a ';DROP TABLE Users into your SQL. Have a look PEP-249 for how to use parameters correctly. python.org/dev/peps/pep-0249/#paramstyle This is slightly modified version of @aandis answer. When it comes to raw string, prefix 'r' character before the string. For example:
sql = r"""
SELECT field1, field2, field3, field4
FROM table
WHERE condition1 = 1
AND condition2 = 2;
"""
This is recommended when your query has any special character like '\' which requires escaping and lint tools like flake8 reports it as error.
Comments
The Google style guide: https://google.github.io/styleguide/pyguide#310-strings
Multi-line strings do not flow with the indentation of the rest of the program. If you need to avoid embedding extra space in the string, use either concatenated single-line strings or a multi-line string with
textwrap.dedent()to remove the initial space on each line:Yes: import textwrap long_string = textwrap.dedent("""\ This is also fine, because textwrap.dedent() will collapse common leading spaces in each line.""")
Strings can be surrounded in a pair of matching triple-quotes: """ or '''.
End of lines do not need to be escaped when using triple-quotes, but they will be included in the string.
It is possible to prevent the end of line character by adding a \ at the end of the line.
The following uses one escape to avoid an unwanted initial blank line.
example = """\
SELECT FROM"""
So option 2 modified:
import textwrap
def query():
sql = textwrap.dedent("""\
SELECT field1, field2, field3, field4
FROM table
WHERE condition1=1
AND condition2=2""")
con = mymodule.get_connection()
...
The repr(sql):
'SELECT field1, field2, field3, field4\nFROM table\nWHERE condition1=1\nAND condition2=2'
Comments
The answer to the question is: you don't use string formatting to form queries. This is open to SQL Injection. This is still ranked by OWASP as one of the top 10 sources of vulnerabilities in websites globally. It even has a famous XKCD:
The correct way to supply values to your query is parameterization. There is a Database API specification for this. Different dialects support different styles from those options, but a simple illustration for a generic cursor object:
# Add placeholders to the query
query = """
SELECT
field_1,
field_2,
field_3,
field_4
FROM
my_table
WHERE
condition_1 = :cond_1
AND condition_2 = :cond_2
"""
# Create a dictionary of the parameters you want to pass to the engine
params = {"cond_1": 1, "cond_2": 2}
# Now evecute the query on your cursor
cursor.execute(query, params)
As for the long queries over 80 chars, it's not clear to me why you don't just format it by hand as I have above. Perhaps you were not aware that triple quotes allows for uninterrupted multi-line strings so there is nothing forcing you to have it on a single line.
Typical argument
But roganjosh, I don't have untrusted inputs, so I'll just use string formatting
This argument holds no water. All of the escaping you need is handled by the placeholders so no need for r strings or other contortions to escape your input. I have never seen a single instance of string formatting that ends up being easier than doing it the correct way from the start.
Table names
This is where it gets a bit more tricky. Technically, parameterization will not work on table names, only data parameters. In the case of psycopg2 you can use the sql module to make table names dynamic and also sanitized but I'm not sure whether all other dialects support such features. In this case it is acceptable to use string formatting to add the table name but, if the code takes any kind of user input, you should compare it to an approved list. This could be a simple hard-coded list or you could generate a list of table names from a schema report from the DB.
5 Comments
mogrify method implemented by psycopg* and pymysql: it will return a string with escaped values interpolated, without executing the query.If your application uses more than one SQL query, I would also add one more option: templating with Jinja2.
It helps to keep my code clean from SQL strings, but would be an overkill if your application only uses a simple SQL query.
In Python:
import jinja2
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader("templates_folder_path/"), trim_blocks=True, lstrip_blocks=True)
template = jinja_env.get_template("sql_template.jinja2")
config = {"cols_keep": ["col1", "col2", "col3"], "from": "datasource", "where": {"equal_to": {'col1': 'value1', 'col2': 'value2'}}
query = template.render(config=config)
The config is a dictionary, that could be derived from a file, e.g. YAML, JSON, etc.
In sql_template.jinja2
SELECT
{% for col in config.cols_keep -%}
{{ col }}
{%- if not loop.last -%}
,
{% endif %}
{% endfor +%}
FROM {{ config.from }}
{# EQUAL #}
{% if config.where.equal_to %}
WHERE
{% for col, value in config.where.equal_to.items() %}
{% if value is string %}
{{ col }} = "{{ value }}"
{% else %}
{{ col }} = {{ value }}
{% endif %}
{% if not loop.last %} AND {% endif %}
{% endfor -%}
{% endif -%}
For formatting, if you find difficult with Jinja2, you can use the sqlparse library. However, you probably can format only with Jinja2 if you keep tweaking the whitespace control. That said, here is a sqlparse example:
import sqlparse
query_parsed = sqlparse.format(query, reindent=True, keyword_case='upper')
References:
- Jinja2 Whitespace Control: https://jinja.palletsprojects.com/en/3.0.x/templates/#whitespace-control
- https://towardsdatascience.com/a-simple-approach-to-templated-sql-queries-in-python-adc4f0dc511
- https://towardsdatascience.com/jinja-sql-️-7e4dff8d8778
- https://medium.com/analytics-and-data/jinja-the-sql-way-of-the-ninja-9a64fc815564
- https://pypi.org/project/sqlparse/
There is also another library that could help - JinjaSQL:
Hope that helps, Thanks!
Comments
textwrap.dedent doesn't handle nested indents nor end of line. This worked:
re.sub(r' ?\n', ' ', re.sub(r' {2,}', '', f'''
select
x,
y
from zzz
where
a = 'a'
and (
b = 'b'
or c = 'c'
)
group by
d,
e;
''')).strip()
Removes all end of line characters and any occurrence of whitespace of 2 or more. Result:
"select x, y from zzz where a = 'a' and ( b = 'b' or c = 'c' ) group by d, e;"
If you're gonna use it a lot, better compile the regexp though
Comments
you could put the field names into an array "fields", and then:
sql = 'select %s from table where condition1=1 and condition2=2' % (
', '.join(fields))
3 Comments
I would suggest sticking to option 2 (I'm always using it for queries any more complex than SELECT * FROM table) and if you want to print it in a nice way you may always use a separate module.
Comments
For short queries that can fit on one or two lines, I use the string literal solution in the top-voted solution above. For longer queries, I break them out to .sql files. I then use a wrapper function to load the file and execute the script, something like:
script_cache = {}
def execute_script(cursor,script,*args,**kwargs):
if not script in script_cache:
with open(script,'r') as s:
script_cache[script] = s
return cursor.execute(script_cache[script],*args,**kwargs)
Of course this often lives inside a class so I don't usually have to pass cursor explicitly. I also generally use codecs.open(), but this gets the general idea across. Then SQL scripts are completely self-contained in their own files with their own syntax highlighting.
Comments
To avoid formatting entirely, I think a great solution is to use procedures.
Calling a procedure gives you the result of whatever query you want to put in this procedure. You can actually process multiple queries within a procedure. The call will just return the last query that was called.
MYSQL
DROP PROCEDURE IF EXISTS example;
DELIMITER //
CREATE PROCEDURE example()
BEGIN
SELECT 2+222+2222+222+222+2222+2222 AS this_is_a_really_long_string_test;
END //
DELIMITER;
#calling the procedure gives you the result of whatever query you want to put in this procedure. You can actually process multiple queries within a procedure. The call just returns the last query result
call example;
Python
sql =('call example;')
Comments
sql = """\
select field1, field2, field3, field4
from table
where condition1=1
and condition2=2
"""
[edit in responese to comment]
Having an SQL string inside a method does NOT mean that you have to "tabulate" it:
>>> class Foo:
... def fubar(self):
... sql = """\
... select *
... from frobozz
... where zorkmids > 10
... ;"""
... print sql
...
>>> Foo().fubar()
select *
from frobozz
where zorkmids > 10
;
>>>
6 Comments
select. My answer doesn't have leading spaces. What lead you to form the opinion that they are the same?
textwrap.dedentfor output to sql file or debug logging.