0

I'm building a Python function that creates SQL statements as a string that will later be passed to the database to extract data using a pyodbc connection.

I have trouble finding the right way to handle different combinations of filter arguments in both SELECT and WHERE statement without using multiple if/else statements and without breaking the syntax of the SQL.

Here is a minimal reproducible example:

sqldata.py

import pandas as pd
class SqlData(object):
 def __init__(self, customers=None, last_version=None, attributes=None, age=None):
 self.customers = customers
 self.last_version = last_version
 self.attributes = attributes
 self.default_columns = ["customer_id", "date", "value"]
 self.age = age
 def buildSQL(self):
 default_columns = ", ".join(self.default_columns)
 if self.attributes == None:
 attributes = " "
 else:
 attributes = " ," + ", ".join(self.attributes) + " "
 if self.customers == None:
 customers = " "
 else:
 customers = "customer_id in ('" + "','".join(self.customers) + "')"
 if self.last_version == None:
 last_version = " "
 else:
 last_version = " AND last_version = 'Y'"
 if self.age == None:
 age = " "
 else:
 age = " AND age > " + self.age
 self.sql = "SELECT " + default_columns + attributes + \
 "FROM myTable " + \
 "WHERE " + customers + last_version + age
 return self.sql

__init__.py

from sqldata import SqlData

__main__.py

from sqldata import SqlData
data = SqlData(customers = ["Arthur", "Ben"],
 last_version = True, 
 attributes = ["name", "age"],
 age = "14")
print(data.buildSQL())

If all arguments are given it works just fine and I get the following:

SELECT customer_id, date, value ,name, age 
FROM myTable 
WHERE customer_id in ('Arthur','Ben') AND last_version = 'Y' AND age > 14

But if customers is not given, it makes no more sense:

SELECT customer_id, date, value ,name, age 
FROM myTable 
WHERE AND last_version = 'Y' AND age > 14

Ideally, when customers is not given, I would like to have the following:

SELECT customer_id, date, value ,name, age 
FROM myTable 
WHERE last_version = 'Y' AND age > 14

All in all, I would like to know the following:

  1. Is it possible to "bypass" the use of those multiple if/else statements ? There must be a better way.
  2. How to handle the case where the customers variable is not given, hence breaking the syntax of the SQL statement?
mkrieger1
24.2k7 gold badges68 silver badges84 bronze badges
asked Jul 21, 2021 at 10:31
3
  • You are already testing if customers are given. Why don't you use the same test to determine whether you should output AND or not? Commented Jul 21, 2021 at 10:36
  • you should consider even more possibilities. like what if customers, last_version, and age are all None? then you shouldn't have a WHERE clause at all. I don't really have an answer for you, but my suggestion is you should separate your code to smaller functions, that deal with the different clauses of the query separately, then create a function that combines them in the correct manner Commented Jul 21, 2021 at 10:45
  • Also, for the if/else statements, take a look at this answer Commented Jul 21, 2021 at 10:48

1 Answer 1

2

I have used a dictionary to represent each column. I then use that dictionary to build the list of columns and conditions:

class SqlData(object):
 def __init__(self, customers=None, last_version=None, attributes=None, age=None):
 self.columns = {}
 self.columns["customer_id"] = customers
 self.columns["last_version"] = 'Y' if last_version else None
 self.columns["age"] = int(age) if age else None
 self.default_columns = ["customer_id", "date", "value"]
 self.attributes = attributes
 def buildSQL(self):
 # Build a list of column names
 if self.attributes:
 default_columns = ",".join(self.default_columns + self.attributes)
 else:
 default_columns = ",".join(self.default_columns)
 # Build a list of columns
 opts = []
 for k,v in self.columns.items():
 if self.columns[k]:
 if isinstance(self.columns[k], int):
 opts.append(f"{k}>{v}")
 elif isinstance(self.columns[k], list):
 # Put words in single quotes
 lst = ','.join([f"'{i}'" for i in v])
 opts.append(f"{k} in ({lst})")
 else:
 opts.append(f"{k}='{v}'")
 if 0 == len(opts):
 return f"SELECT {default_columns} FROM myTable"
 else:
 return f"SELECT {default_columns} FROM myTable WHERE {' AND '.join(opts)}"
test = SqlData(customers = ["Arthur", "Ben"],
 last_version = True, 
 attributes = ["name", "age"],
 age = 14)
print(test.buildSQL())
test = SqlData(age = 14)
print(test.buildSQL())
test = SqlData()
print(test.buildSQL())

Output:

SELECT customer_id,date,value,name,age FROM myTable WHERE customer_id in ('Arthur','Ben') AND last_version='Y' AND age>14
SELECT customer_id,date,value FROM myTable WHERE age>14
SELECT customer_id,date,value FROM myTable

Note: depending on where the input comes from, doing things this way opens you up to sql injection (see: Little Bobby Tables). Preferably you would use prepared statements but I don't know if Python (or pyodbc) supports that (Does Python support MySQL prepared statements?).

answered Jul 21, 2021 at 11:22
Sign up to request clarification or add additional context in comments.

1 Comment

Clear. The use of dictionary really makes it more readable. Many thanks.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.