I need to perform SQL query which will get products from database based on few dimensions which I will pass in URL.
import itertools
sql_where = ''
dimensions = ['a','b', 'c', 'd', 'e', 'f', 'f2', 'g']
for dimension in dimensions:
for onetwo, direction in zip( range(1, 3), itertools.cycle('><') ):
globals()[dimension+str(onetwo)] = str( request.GET.get( dimension+str(onetwo) ) )
sql_where += " AND ( `{dimension}` {direction}= '{get_dimension}' OR '{get_dimension}' = ('None' OR '') )".format(dimension=dimension, direction=direction, get_dimension=globals()[dimension+str(onetwo)])
sql = """SELECT * FROM t_wishbone_dimensions WHERE 1=1""" + sql_where
print sql
Example output for /search?a1=34&a2=37&c1=50&c2=75
SELECT * FROM t_wishbone_dimensions
WHERE 1=1
AND ( `a` >= '34' OR '34' = ('None' OR '') )
AND ( `a` <= '37' OR '37' = ('None' OR '') )
AND ( `b` >= 'None' OR 'None' = ('None' OR '') )
AND ( `b` <= 'None' OR 'None' = ('None' OR '') )
AND ( `c` >= '50' OR '50' = ('None' OR '') )
AND ( `c` <= '75' OR '75' = ('None' OR '') )
AND ( `d` >= 'None' OR 'None' = ('None' OR '') )
AND ( `d` <= 'None' OR 'None' = ('None' OR '') )
AND ( `e` >= 'None' OR 'None' = ('None' OR '') )
AND ( `e` <= 'None' OR 'None' = ('None' OR '') )
AND ( `f` >= 'None' OR 'None' = ('None' OR '') )
AND ( `f` <= 'None' OR 'None' = ('None' OR '') )
AND ( `f2` >= 'None' OR 'None' = ('None' OR '') )
AND ( `f2` <= 'None' OR 'None' = ('None' OR '') )
AND ( `g` >= 'None' OR 'None' = ('None' OR '') )
AND ( `g` <= 'None' OR 'None' = ('None' OR '') )
What do you think about my solution?
2 Answers 2
Since I didn't want to setup a webserver for this, I added a bare class, which should behave in the same way as your request.GET
:
class request:
GET = dict(a1=34, a2=37, c1=50, c2=75)
You should avoid setting keys in globals()
, it clutters the namespace too much. Instead just use the default
keyword of the get
method of a dictionary, which will give you the specified value if the key is not present.
get_dimension = request.GET.get(dim, 'None')
You can make your string formatting a bit easier to understand if you split it up a bit. I made the string itself a constant (avoiding having to allocate it multiple times) and used positional arguments, instead of keywords (which, admittedly, hurts the readability slightly).
Instead of doing a string addition every loop, it is better to append to a list and do one join
at the end, because string addition means creating an intermediate new string, since strings are immutable.
sql = "SELECT * FROM t_wishbone_dimensions WHERE 1=1" + "\n".join(sql_where)
I would also use itertools.product
to replace your double loop, so something like:
directions = {'1': '>', '2': '<'}
dimensions = ['a', 'b', 'c', 'd', 'e', 'f', 'f2', 'g']
for dimension, i in itertools.product(dimensions, ('1', '2')):
direction = directions[i]
...
To improve readability you should separate the AND
s with a newline each (like you did in your example output).
Lastly, python has an official style guide, PEP8, which recommends having a blank after a comma in an argument list (in dimensions
after 'a',
one was missing), as well as not adding unnecessary whitespace (so usually no str( something )
, but just str(something)
).
Final code:
import itertools
class request:
# should handle the same same as your `request.GET`
GET = dict(a1=34, a2=37, c1=50, c2=75)
dimensions = ['a', 'b', 'c', 'd', 'e', 'f', 'f2', 'g']
DIRS = {'1': '>', '2': '<'}
AND_STR = "AND ( `{0}` {1}= '{2}' OR '{2}' = ('None' OR '') )"
sql_where = ['']
for dimension, i in itertools.product(dimensions, ('1', '2')):
val = request.GET.get(dimension+i, 'None')
sql_where.append(AND_STR.format(dimension, DIRS[i], val))
sql = "SELECT * FROM t_wishbone_dimensions WHERE 1=1" + "\n".join(sql_where)
print sql
Output:
SELECT * FROM t_wishbone_dimensions WHERE 1=1
AND ( `a` >= '34' OR '34' = ('None' OR '') )
AND ( `a` <= '37' OR '37' = ('None' OR '') )
AND ( `b` >= 'None' OR 'None' = ('None' OR '') )
AND ( `b` <= 'None' OR 'None' = ('None' OR '') )
AND ( `c` >= '50' OR '50' = ('None' OR '') )
AND ( `c` <= '75' OR '75' = ('None' OR '') )
AND ( `d` >= 'None' OR 'None' = ('None' OR '') )
AND ( `d` <= 'None' OR 'None' = ('None' OR '') )
AND ( `e` >= 'None' OR 'None' = ('None' OR '') )
AND ( `e` <= 'None' OR 'None' = ('None' OR '') )
AND ( `f` >= 'None' OR 'None' = ('None' OR '') )
AND ( `f` <= 'None' OR 'None' = ('None' OR '') )
AND ( `f2` >= 'None' OR 'None' = ('None' OR '') )
AND ( `f2` <= 'None' OR 'None' = ('None' OR '') )
AND ( `g` >= 'None' OR 'None' = ('None' OR '') )
AND ( `g` <= 'None' OR 'None' = ('None' OR '') )
-
\$\begingroup\$ Thank you very much for your involvement! I added unnecessary whitespaces between bracklets because it causes better readability for me. Especially in long lines of code. Even if it is not properly way. Why did you use square brackets in
sql_where
var? itertools.product() is better solution than combine zip() with itertools.cycle() :) \$\endgroup\$user3041764– user30417642016年08月30日 05:58:41 +00:00Commented Aug 30, 2016 at 5:58 -
\$\begingroup\$ I made
sql_where
a list, because string addition is costly, and list appending followed by a singlejoin
not so much. I know why you put the space there, and I sometimes agree. In the end, it is a style guide. But adhering to it as much as possible ensures that other people can easily read your code as well as you can. \$\endgroup\$Graipher– Graipher2016年08月30日 08:54:12 +00:00Commented Aug 30, 2016 at 8:54
The query that ends up being generated is VERY difficult to read/understand and will likely cause you problems in debugging down the line.
I guess I am not understanding why you have a bunch of WHERE clause fragments for parameters/dimensions that are not even pertinent to the filtering operation.
Taking your example URL of:
/search?a1=34&a2=37&c1=50&c2=75
I would think you should strive for a resulting WHERE clause that looks like:
WHERE a BETWEEN '34' AND '37'
AND c BETWEEN '50' AND '75'
Note that you would probably need to work with the parameter values as integers rather than strings as shown in this example if you did not want a lexicographical comparison.
Note that this could also greatly improve the performance of the query itself as you will no longer be requiring the query execution to have to look at all the indexes for all fields even in the cases where the field filters are totally meaningless.
The bottom line here is to write a good query first, then figure out how to generate it dynamically in your code.
To get to this point, I think you need to abandon the approach of iterating through each known dimension/field, concatenating meaningless where clauses in the process and actually work with the parameters passed via GET. Those are what you should be iterating on.
You might also consider using a URL format like
/search?a[]=34&a[]=37&c[]=50&c[]=75
Where you can use the bracket notation to indicate an array of values. This allows for you to handle a list of values of arbitrary length rather than have a series of a*, b*, c*, etc. values that you need to account for throughout your code. Of course, you might need to use getlist()
or similar to get the list of values rather than just the last item from the list.
In the query, you might consider not using SELECT *
. This is a bit of lazy code writing, IMO, and it makes it harder for someone reading the code, who may not have visibility into the database schema itself, to understand what the expected field values are. This can also cause more data being returned than is necessary as in most complex applications, the database tables may have a number of timestamp/audit type of fields that are not really meant to be consumed by the application, so why send this field data to the application if it is not needed?
Finally, I would point out that some of your lines of code are too long, making the code hard to read.
This line for example is particularly bad:
sql_where += " AND ( `{dimension}` {direction}= '{get_dimension}' OR '{get_dimension}' = ('None' OR '') )".format(dimension=dimension, direction=direction, get_dimension=globals()[dimension+str(onetwo)])
Break this up across multiple lines so that it is easier to read and understand.
Update - data validation
You also are not currently validating your input at all. I think if you implement proper input validation, this would go hand in hand with answering your question about how to build SQL query filter based only on those parameters passed.
At the very start of your code, I would iterate through each of the parameters passed in the request and validate that:
- the parameters match your expected available dimensions; and
- that the values of the parameters match whatever data type and value rules you might consider (i.e. integer values, non-zero length strings, etc.)
In doing this, you now know exactly which dimensions you should be applying to your filter - only those that pass this validation.
How you decide to handle cases of unexpected parameters or validation failure is really up to you, but clearly in those cases, you would not want to apply filtering on those dimensions.
-
\$\begingroup\$
WHERE a IN ('34', '37') AND c IN ('50', '75')
buta
is not 34 or 37 but34 <= a <= 37
. When will I work only with parameters passed via GET, what with security then? Should I compare passed parameters with allowed list? I usedSELECT *
because that table include only dimensions columns, so I need get everything. But I know what you mean. \$\endgroup\$user3041764– user30417642016年08月30日 17:36:42 +00:00Commented Aug 30, 2016 at 17:36 -
\$\begingroup\$ @user3041764 O I didnt get the meaning of the filtering fields a1, a2 as being bounds - more meaningful parameter names and field names would REALY help in this case. It just goes to show how unintelligible the current query is. Yes. You should validate all passed parameters in this and every other application you write. This is user input and should never be the basis for forming a query without such validation as you currently have SQL injection vulnerability. You could potentially keep your iteration the same, but perhaps just not modify the where clause if parameter not present. \$\endgroup\$Mike Brant– Mike Brant2016年08月30日 18:02:14 +00:00Commented Aug 30, 2016 at 18:02
-
\$\begingroup\$ @user3041764 I modified my answer to show the use of a "BETWEEN" clause instead of IN based on your comment. This actually introduces the problem of needing to work with these values perhaps as integers instead of strings, as you are likely not wanting to have cases where '11' evaluates as being between '1' and '2' as would be the case if you were searching lexicographically. \$\endgroup\$Mike Brant– Mike Brant2016年08月30日 18:06:42 +00:00Commented Aug 30, 2016 at 18:06
-
\$\begingroup\$ I changet
GET
method toPOST
and I do$('form').serialize()
and it pass all inputs to request, even if they are empty. I tried$("form :input[value!='']").serialize()
but with no result. Have you idea how can I filter and build query based only on passed and not empty paremeters? \$\endgroup\$user3041764– user30417642016年09月01日 11:51:10 +00:00Commented Sep 1, 2016 at 11:51 -
1\$\begingroup\$ @user3041764 I guess I didn't understand the question as there was no front end code as part of this review at all. If you have a form that is POSTing to this python script - you have two simply choices. Either filter about empty form fields in javascript before posting or in python simply ignore fields with empty values as part of the validation process. \$\endgroup\$Mike Brant– Mike Brant2016年09月01日 17:12:11 +00:00Commented Sep 1, 2016 at 17:12
'34' = ('None' OR '')
makes no sense. What are you really trying to accomplish? \$\endgroup\$OR `a` == 'None' OR `a` == ''
? \$\endgroup\$"AND ( {0} {1}= '{2}' )"
and I eliminate the dimensions, which are not given, by the condition:val = request.GET.get(dimension+i, None) if val is not None: sql_where.append(AND_STR.format(dimension, DIRS[int(i)], val))
\$\endgroup\$