I have done this small piece of code used to search for a given word across several fields of a Django model.
Goal was to have first entries equals to the word, then entries starting by the world and finally just entries containing it.
I would like to have feedback on it :
# -*- coding: utf-8 -*-
"""Module created to search a specific term in several fields of a model"""
from itertools import chain
import operator
from django.db.models import Q
__author__ = 'trnsnt'
def search_for_term(model, term, fields, number=10):
"""
Generic method used to search term on list of fields.
Return first fields equals to term, then fields starting by the term, and at the end fields containing term
:param model: Django model class
:param term: Searched term
:param fields: Fields to look for
:param number: Number of wanted response
:return: List of objects
"""
searcher = ['iexact', 'istartswith', 'icontains']
objects = []
for search in searcher:
curr_filter = reduce(operator.or_, [Q(('%s__%s' % (field, search), term)) for field in fields])
# Need to search for number to handle duplicate objects
objects = uniquify(objects, model.objects.filter(curr_filter)[:number])[:number]
if len(objects) == number:
return objects
return objects
def uniquify(*queryests):
"""Take several queryset and return list of unique object
:param queryests: QuerySets we want to uniquify
"""
seen = set()
return [x for x in chain(*queryests) if not (x in seen or seen.add(x))]
-
\$\begingroup\$ Welcome to Code Review! Is there anything in particular that you're interested in getting feedback or responses on? \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年10月29日 16:10:22 +00:00Commented Oct 29, 2015 at 16:10
-
\$\begingroup\$ Thanks. I worked with Python & Django for several years, but for now I did not have the opportunity to share on what I am doing. I would like to some advises on what to improve ! \$\endgroup\$trnsnt– trnsnt2015年10月29日 16:18:15 +00:00Commented Oct 29, 2015 at 16:18
2 Answers 2
Nice doc-strings. I enjoyed seeing your code. I have just a few comments.
Use built-in functions instead of rolling your own
My guess would be that you could improve the efficiency of uniquify
and also make the code more expressive/easier to digest by returning the list version of a union of all the sets in querysets
rather than rolling your own solution. Something like this should work (I didn't run it):
full_set = set().union(*query_sets)
return list(full_set)
Another option relying on built-in functionality would be to return a set out of the chaining you already do:
return set(chain(*queryests))
In fact coming back to this, I think you should return a set. It doesn't seem that you later need a list since you're just checking membership. I believe checking set
membership is much faster than checking list
membership.
Logic
I apologize if I am misunderstanding your logic, but it seems in search_for_term
that you want to return early when you find at least number
number of results. However, right now you only return early if you find exactly that number of results. Wouldn't you want to modify your early return as:
if len(objects) >= number:
return objects
If not it might be helpful to add a comment explaining the logic of returning when you have exactly a certain number of results.
Little nits
Notice you've got a typo in your parameter name queryests
. I would recommend using snake_case
as in my example above.
Since you're only using operator.or_
, consider importing just that function.
-
1\$\begingroup\$ Thanks for your review, really instructive. To clarify about
len(objects) == number
, it is because previous line I doobjects = uniquify(queryset)[:number]
, solen(objects)
cannot be greater than number (am I wrong ?) \$\endgroup\$trnsnt– trnsnt2015年10月29日 20:19:07 +00:00Commented Oct 29, 2015 at 20:19 -
\$\begingroup\$ @trnsnt you're not wrong, but you have a slight bug however. What if
number
is10
and the'iexact'
query returns20
results. Then out of the ten firsts, you got two duplicates... You'll be throwing the ten last exact results in favor of oneistartswith
. Kinda counterintuitive. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2015年10月29日 22:53:56 +00:00Commented Oct 29, 2015 at 22:53 -
1\$\begingroup\$ if
number
is10
andiexact
return 20 resultsuniquify(models, queryset[:number])[:number]
will havelen
equals tonumber
, no ? \$\endgroup\$trnsnt– trnsnt2015年10月30日 19:49:25 +00:00Commented Oct 30, 2015 at 19:49
What you're doing here is executing three database queries and combining the results in Python. Potentially the database returns \3ドルn\$ results, of which you throw away \2ドルn\$. This is wasted effort. When \$n\$ is small you might be able to live with this, but to handle large \$n\$ it would be a good idea to get the database to run just one query and return just the results you need.
How can you do that? Well, if you were writing plain SQL, then you'd write something like this:
SELECT table.*
FROM table, (SELECT 0 AS score
UNION ALL
SELECT 1 AS score
UNION ALL
SELECT 2 AS score) AS _s
WHERE _s.score = 0 AND table.field1 ILIKE 'term'
OR _s.score = 0 AND table.field2 ILIKE 'term'
OR ...
OR _s.score = 1 AND table.field1 ILIKE 'term%'
OR _s.score = 1 AND table.field2 ILIKE 'term%'
OR ...
OR _s.score = 2 AND table.field1 ILIKE '%term%'
OR _s.score = 2 AND table.field2 ILIKE '%term%'
OR ...
GROUP BY table.id
ORDER BY MIN(_s.score)
LIMIT 10;
The idea behind this query is to score each match according to how good it is using a join with a table of scores. The GROUP BY table.id
clause ensures that a table row is returned at most once; the ORDER BY
clause ensure that we return the best match.
So you could execute this as a raw SQL query, perhaps like this:
import django.db
_OPS = [('iexact', '{}'), ('istartswith', '{}%'), ('icontains', '%{}%')]
_SCORE = ' UNION ALL '.join('SELECT {} AS score'.format(i)
for i in range(len(_OPS)))
_QUERY = '''
SELECT {table}.*
FROM {table}, ({score}) AS _s
WHERE {where}
GROUP BY {table}.{primary_key}
ORDER BY MIN(_s.score)
LIMIT {max_results}
'''
def search_for_term(model, term, fields, max_results=10):
quote = django.db.connection.ops.quote_name
table = quote(model._meta.db_table)
clauses = []
params = []
for score, (op, match) in enumerate(_OPS):
op = django.db.connection.operators[op]
for field in map(quote, fields):
clauses.append("_s.score={} AND {}.{} {}"
.format(score, table, field, op))
params.append(match.format(term))
query = _QUERY.format(table=model._meta.db_table,
score=_SCORE,
where=' OR '.join(clauses),
primary_key=quote(model._meta.pk.name),
max_results=max_results)
return model.objects.raw(query, params)
(If the search terms need to contain percent signs or backslashes then you'll need to escape them.)
However, you would almost certainly be much better off using your database's full text search capability. For example, see the documentation for Sqlite3 or PostgreSQL.