Django Haystack with Elasticsearch and Postgres
Haystack provides modular search for Django. It features a unified, familiar API that lets us plug in different search backends (such as Solr or Elasticsearch) without having to modify our code.
The Elasticsearch application built with Haystack in this article can be found on GitHub at Einsteinish/Django-Haystack-Elasticsearch.
[Figure: tree-1.png]
Let's install the packages in a virtualenv:
$ virtualenv venv
$ source venv/bin/activate
(venv) pip install -r requirements.txt
By default, Postgres uses an authentication scheme called "peer authentication" for local connections. Basically, this means that if the user's operating system username matches a valid Postgres username, that user can log in with no further authentication.
When we run psql as the current operating system user k, psql also tries to connect to a database named after that user by default. No such database exists yet, which is why we get the following error:
(venv)k@laptop:~/TEST/DJ2$ psql
psql: FATAL: database "k" does not exist
Actually, during the Postgres installation, an operating system user named postgres was created to correspond to the postgres PostgreSQL administrative user. We need to change to this user to perform administrative tasks:
(venv)k@laptop:~/TEST/DJ2$ sudo su - postgres
We should now be in a shell session for the postgres user. Log into a Postgres session by typing:
postgres@laptop:~$ psql
psql (9.3.9)
Type "help" for help.
First, we will create a database for our Django project (haystack-elasticsearch). Each project should have its own isolated database for security reasons. We will call our database search_app_db, and we will also create a database user k for the project to connect as:
postgres=# CREATE DATABASE search_app_db;
CREATE DATABASE
postgres=# CREATE USER k WITH PASSWORD 'password';
CREATE ROLE
Afterwards, we'll modify a few of the connection parameters for the user we just created. This will speed up database operations so that the correct values do not have to be queried and set each time a connection is established.
We're setting the default encoding to UTF-8, which Django expects. We're also setting the default transaction isolation scheme to "read committed", which blocks reads from uncommitted transactions. Lastly, we are setting the timezone. By default, our Django projects will be set to use UTC:
postgres=# ALTER ROLE k SET client_encoding TO 'utf8';
ALTER ROLE
postgres=# ALTER ROLE k SET default_transaction_isolation TO 'read committed';
ALTER ROLE
postgres=# ALTER ROLE k SET timezone TO 'UTC';
ALTER ROLE
Now, all we need to do is give our database user access rights to the database we created:
postgres=# GRANT ALL PRIVILEGES ON DATABASE search_app_db TO k;
GRANT
We can list the databases using \l:
postgres=# \l
                                  List of databases
     Name      |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
---------------+----------+----------+-------------+-------------+-----------------------
 bogotobogo    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 bogotobogo2   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 search_app_db | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
               |          |          |             |             | postgres=CTc/postgres+
               |          |          |             |             | sfvue=CTc/postgres   +
               |          |          |             |             | k=CTc/postgres
 searchdb      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 template1     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 test1         | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 testdb        | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
(9 rows)
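The article does not show the Django side of this database connection. The following is a minimal sketch of a DATABASES entry for settings.py that would match the database and role created above. The host, port, and the use of the psycopg2 driver are assumptions, not values taken from the original project:

```python
# settings.py (sketch): connect Django to the Postgres database created above.
# HOST and PORT are assumed values; adjust them to the local installation.
DATABASES = {
    'default': {
        # Postgres backend name as used by Django 1.8 (requires psycopg2).
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'search_app_db',   # database created with CREATE DATABASE
        'USER': 'k',               # role created with CREATE USER
        'PASSWORD': 'password',    # password given in CREATE USER
        'HOST': 'localhost',       # assumption: Postgres runs on this machine
        'PORT': '5432',            # assumption: default Postgres port
    }
}
```

With HOST set to localhost the connection goes over TCP, where password authentication typically applies instead of the peer authentication described earlier; the exact behavior depends on the server's pg_hba.conf.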
As with most Django applications, we should add haystack to INSTALLED_APPS in our settings.py:
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django.contrib.admin',
    'haystack',
    'search_app',
)
Now, let's add the Haystack connection settings to settings.py and set a default index name:
# HAYSTACK settings
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.BaseSignalProcessor'
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 12
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
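One detail of these settings is worth spelling out. BaseSignalProcessor does not react to model saves or deletes, so with it the index only changes when we run Haystack's management commands (for example python manage.py rebuild_index or update_index). If the index should follow each save and delete as it happens, Haystack 2.x also ships a RealtimeSignalProcessor. A minimal sketch of that variant, assuming Haystack 2.x is the installed version:

```python
# settings.py (sketch): alternative to BaseSignalProcessor, assuming Haystack 2.x.
# RealtimeSignalProcessor updates the search index on every model save/delete.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
```

The trade-off is that every write to an indexed model then also triggers a request to Elasticsearch, which may or may not be acceptable depending on write volume.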
SearchIndex objects are the way Haystack determines what data should be placed in the search index and handles the flow of data in. We can think of them as being similar to Django Models or Forms in that they are field-based and manipulate/store data.
We generally create a unique SearchIndex for each type of Model we wish to index, though we can reuse the same SearchIndex between different models if we take care in doing so and our field names are very standardized.
To build a SearchIndex, all that's necessary is to subclass both indexes.SearchIndex and indexes.Indexable, define the fields we want to store data with, and define a get_model method.
We'll create the following DocumentIndex to correspond to our Document model. This code generally goes in a search_indexes.py file within the app it applies to, though that is not required. This allows Haystack to automatically pick it up. The DocumentIndex should look like this (search_app/search_indexes.py):
from haystack import indexes
from search_app.models import Document


class DocumentIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        return Document
In this index, we're providing use_template=True on the text field. This allows us to use a data template (rather than error-prone concatenation) to build the document the search engine will use in searching. We'll need to create a new template inside our template directory, search_app/templates/search/indexes/search_app/document_text.txt.
We need to place the following into that template file:
{{ object.name }}
{{ object.body }}
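To make the effect of the data template concrete, here is a plain-Python stand-in for what rendering it produces for one object. It is only an illustration and not Haystack's code: the namedtuple Document and the helper build_document_text are hypothetical, and the newline separator is an assumption about how the template is laid out.

```python
from collections import namedtuple

# Hypothetical stand-in for the Document model; only the two fields
# referenced by the template (name and body) are modeled here.
Document = namedtuple('Document', ['name', 'body'])


def build_document_text(obj):
    """Mimic document_text.txt: the object's name, a newline, then its body."""
    return '{0}\n{1}'.format(obj.name, obj.body)


doc = Document(name='Haystack notes', body='Modular search for Django.')
text = build_document_text(doc)
print(text)
```

The resulting string is what ends up in the text field of the index, so a search term that appears in either the name or the body can match the document.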
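Also, to integrate Haystack with the Django admin, create search_app/search_sites.py inside our application. Note that haystack.autodiscover() is a Haystack 1.x convention; Haystack 2.x, whose Indexable-style index we defined earlier, discovers search_indexes.py modules on its own, so this file should only matter on 1.x: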
import haystack

haystack.autodiscover()
Within our URLconf (search_app/urls.py), include haystack.urls, as in the last pattern of the following file:
from django.conf.urls import patterns, include, url
from django.contrib import admin
from search_app import settings

admin.autodiscover()

# Uncomment the next two lines to enable the admin:
# from django.contrib import admin
# admin.autodiscover()

urlpatterns = patterns('',
    url(r'^$', 'search_app.views.home', name='home'),
    url(r'^about/', 'search_app.views.about', name='about'),
    url(r'^admin/', include(admin.site.urls)),
    (r'^media/(?P<path>.*)$', 'django.views.static.serve',
     {'document_root': settings.MEDIA_ROOT}),
    (r'^search/', include('haystack.urls')),
)
This will pull in the default URLconf for Haystack. It consists of a single URLconf that points to a SearchView instance. We can change this class's behavior by passing it any of several keyword arguments or override it entirely with our own view.
Our search template (search_app/templates/search/search.html for the default case) will likely be very simple. The following is enough to get going (our template/block names will likely differ):
{% extends 'index.html' %}
{% block content %}
<div class="page-header">
  <h3>Search Results</h3>
</div>
<div class="row">
  <div class="span12">
    <table class="table table-bordered">
      <thead>
        <tr>
          <th>Name</th>
          <th>Text</th>
        </tr>
      </thead>
      <tbody>
        {% for result in page.object_list %}
        <tr>
          <td>{{ result.object.name }}</td>
          <td>{{ result.object.body }}</td>
        </tr>
        {% empty %}
        <tr>
          <td colspan="2">No results found.</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>
</div>
{% endblock %}
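In this template, page is the current page of search results, and HAYSTACK_SEARCH_RESULTS_PER_PAGE (12 in our settings) controls how many results land on each page. Haystack's default view relies on Django's paginator for this; the sketch below is only a plain-Python illustration of the slicing involved, and the paginate helper and the fake hit names are hypothetical.

```python
def paginate(results, page_number, per_page=12):
    """Return the slice of results shown on a 1-based page number."""
    start = (page_number - 1) * per_page
    return results[start:start + per_page]


# Thirty fake hits standing in for search results.
hits = ['doc-{0}'.format(i) for i in range(1, 31)]

first_page = paginate(hits, 1)   # hits 1 through 12
last_page = paginate(hits, 3)    # hits 25 through 30 (a partial page)
print(len(first_page), len(last_page))
```

With 30 hits and 12 per page, the third page is the last one and holds only the remaining six results.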
With the default URL configuration, the search form has to send a GET request with a parameter named q to /search. The form lives in search_app/templates/index.html:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <link rel="stylesheet" href='/media/styles/bootstrap.min.css' type="text/css" media="screen"/>
  <link rel="stylesheet" href='/media/styles/bootstrap-responsive.min.css' type="text/css" media="screen"/>
  <title>Bogotobogo Django Haystack with Elasticsearch</title>
  <style type="text/css">
    body { padding-top: 60px; }
  </style>
</head>
<body>
  <div class="navbar navbar-fixed-top">
    <div class="navbar-inner">
      <div class="container">
        <a href="/" class="brand">Bogotobogo Django Haystack with Elasticsearch</a>
        <div class="nav-collapse">
          <ul class="nav">
            <li><a href="/">Home</a></li>
            <li><a href="/about">About</a></li>
            <li><a href="/admin">Admin</a></li>
          </ul>
          <ul class="nav pull-right">
            <li class="divider-vertical"></li>
            <form action="/search" method="get" class="navbar-search pull-left">
              <input type="text" placeholder="Search" class="search-query span2" name="q">
            </form>
          </ul>
        </div> <!-- /.nav-collapse -->
      </div>
    </div> <!-- /navbar-inner -->
  </div>
  <div class="container">
    {% block content %}
    <div class="row">
      <div class="span12">
        <div class="hero-unit">
          <h1>Bogotobogo Django Haystack with Elasticsearch.</h1>
          <br/>
          <p>This example illustrates basic search features of <a href="http://www.einsteinish.com" target="_blank">einsteinish.com</a> (Search as a service powered by <a href="http://www.elasticsearch.org" target="_blank">Elasticsearch</a>).</p>
          <p>Each CRUD operation on documents is reflected to search index in real time.</p>
          <p>To test Einsteinish's search features, please register and create resources/topics.</p>
          <p>The search feature is using <a href="http://haystacksearch.org/" target="_blank">Haystack</a> modular search for Django.</p>
        </div>
      </div>
    </div>
    {% endblock %}
    <hr>
    <footer class="footer">
      <p>2016 einsteinish.com</p>
      <p>Built with<a href="http://twitter.github.com/bootstrap" target="_blank"> Bootstrap</a></p>
    </footer>
  </div>
</body>
</html>
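The request that the search form sends can be reproduced by hand, which is handy when testing the view without a browser. The sketch below builds the same kind of URL with the Python 3 standard library; the search_url helper is hypothetical, and the trailing slash in the base path is an assumption chosen to match the r'^search/' pattern in the URLconf.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(query, base='/search/'):
    """Build the GET URL for a search, with the query in the q parameter."""
    return '{0}?{1}'.format(base, urlencode({'q': query}))


url = search_url('django haystack')
print(url)  # /search/?q=django+haystack

# Decoding the query string gives back the original search terms.
params = parse_qs(urlsplit(url).query)
print(params['q'][0])  # django haystack
```

urlencode takes care of escaping spaces and other special characters, so the value arrives in the view exactly as the user typed it.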
Please visit einsteinish.com to see the Haystack-Elasticsearch in action.