Installing Elastic Search

Ubuntu

Dependencies

The easiest option is to install OpenJDK (Elastic Search 1.4 needs Java 7 or later):
sudo apt-get install openjdk-7-jre
Elastic Search recommends Oracle Java, which can be installed as follows:
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
sudo apt-get -y install oracle-java8-installer

Installation

  1. Obtain the GPG key for the source
    wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
  2. Add the source to Ubuntu's sources list
    echo 'deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main' | sudo tee /etc/apt/sources.list.d/elasticsearch.list
  3. Update the package list from the sources list
    sudo apt-get update
  4. Install the Elastic Search package
    sudo apt-get -y install elasticsearch=1.4.4

Configuration

Configuration can be extensive depending on what you require. The out-of-the-box settings are good enough to get you started once you make the following change.

  1. Edit the configuration file with a text editor (using vim for this example)
    sudo vim /etc/elasticsearch/elasticsearch.yml
  2. Search for the line that contains `network.host` and change it to the following (you may just need to uncomment it). This allows connections from the local computer only.
    network.host: localhost
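
To confirm the edit took effect, a small helper can scan the file for the active setting. This is a minimal sketch, not part of Elastic Search: the function name `find_network_host` is made up for illustration, and it assumes the flat dotted-key style (`network.host: ...`) used in the stock `elasticsearch.yml`, so it will not see a nested `network:` / `host:` block.

```python
def find_network_host(config_text):
    """Return the active network.host value, or None if unset or commented out."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out settings such as "#network.host: ..."
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() == "network.host":
            # Drop any trailing inline comment and surrounding whitespace
            return value.split("#", 1)[0].strip()
    return None
```

Passing it the contents of the configuration file edited above should return `localhost` once the line is uncommented and saved.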

Running the Service

To start (or restart) the service use:
sudo service elasticsearch restart
To start the service at boot:
sudo update-rc.d elasticsearch defaults 95 10

OSX

Dependencies

Download and install Java from https://www.java.com/en/download/

Installation

  1. Install with Homebrew (assuming `brew` is already installed)
    brew install elasticsearch

Configuration

Configuration can be extensive depending on what you require. The out-of-the-box settings are good enough to get you started once you make the following change.

  1. Edit the configuration file with a text editor (using vim for this example)
    sudo vim /usr/local/opt/elasticsearch/config/elasticsearch.yml
  2. Search for the line that contains `network.host` and change it to the following (you may just need to uncomment it). This allows connections from the local computer only.
    network.host: localhost

Running the Service

To start the service use:

elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml

To start the service at boot:

ln -sfv /usr/local/opt/elasticsearch/*.plist ~/Library/LaunchAgents
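
On either platform it is worth checking that the node answers before wiring it into Django. The sketch below assumes the default HTTP port 9200 on localhost and that the node's root URL returns a JSON document containing a `version.number` field; the function names are illustrative only.

```python
import json
from urllib.request import urlopen


def extract_version(body):
    """Pull the version number out of the JSON document served at the node's root URL."""
    info = json.loads(body)
    return info["version"]["number"]


def elasticsearch_version(url="http://localhost:9200/", timeout=5):
    """Fetch the root document from a running node and return its version string."""
    # urlopen raises URLError if nothing is listening on the given address
    with urlopen(url, timeout=timeout) as response:
        return extract_version(response.read().decode("utf-8"))
```

Calling `elasticsearch_version()` with the service running should return the installed version; a connection error usually means the service has not started or is bound to a different address.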

Time to get it running with Django!

Okay, cool: we have Elastic Search running and a Django project... now what?

This comes down entirely to what you need to do with your search. If you need fairly standard search that you don't have to customise or spend much time configuring, `django-haystack` is probably the way to go. If you need heavy customisation and want to take advantage of specific features of Elastic Search, then `elasticsearch` and `elasticsearch_dsl` are probably what you are looking for.

`django-haystack` allows the search backend (such as Elastic Search) to be swapped out. Because of this, backend-specific functionality is not always supported, but common search functionality is.

We will look at using `django-haystack` now.

Django Haystack

https://django-haystack.readthedocs.org

Django Haystack Installation

  1. Install using `pip`:
    pip install django-haystack
  2. Add the app to `INSTALLED_APPS` in `settings.py`:
    INSTALLED_APPS += ['haystack', ]
  3. Add configuration to `settings.py`:
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
            'URL': 'http://127.0.0.1:9200/',
            'INDEX_NAME': 'haystack',
        },
    }
  4. Add search views to your URL conf by adding the following URL pattern:
    url(r'^search/', include('haystack.urls')),
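
If the Elastic Search URL differs between development and production, the connection settings can be built by a small function instead of being hard-coded. This is only a sketch: the helper name `haystack_connections` and the `ELASTICSEARCH_URL` environment variable are assumptions made for this example, not something Haystack defines.

```python
import os


def haystack_connections(url=None, index_name="haystack"):
    """Build a HAYSTACK_CONNECTIONS dict for the Elasticsearch backend."""
    if url is None:
        # ELASTICSEARCH_URL is a hypothetical variable name chosen for this sketch
        url = os.environ.get("ELASTICSEARCH_URL", "http://127.0.0.1:9200/")
    return {
        "default": {
            "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
            "URL": url,
            "INDEX_NAME": index_name,
        },
    }
```

In `settings.py` this would be used as `HAYSTACK_CONNECTIONS = haystack_connections()`, falling back to the local URL shown in step 3 when the variable is not set.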

Django Haystack Search Template

A template must be defined for the search view to use. It expects the template to be located at `search/search.html`. The Django Haystack documentation provides the following example to get you started:

{% extends 'base.html' %} 

{% block content %}
<h2>Search</h2>

<form method="get" action=".">
<table>
{{ form.as_table }}
<tr>
<td></td>
<td>
<input type="submit" value="Search">
</td>
</tr>
</table>
{% if query %}
<h3>Results</h3>
{% for result in page.object_list %}
<p>
<a href="{{ result.object.get_absolute_url }}">
{{ result.object.title }}
</a>
</p>
{% empty %}
<p>No results found.</p>
{% endfor %}
{% if page.has_previous or page.has_next %}
<div>
{% if page.has_previous %}
<a href="?q={{ query }}&page={{ page.previous_page_number }}">
{% endif %}
« Previous
{% if page.has_previous %}
</a>
{% endif %}
|
{% if page.has_next %}
<a href="?q={{ query }}&page={{ page.next_page_number }}">
{% endif %}
Next »
{% if page.has_next %}
</a>
{% endif %}
</div>
{% endif %}
{% else %}
{# Show some example queries to run, maybe query syntax, something else? #}
{% endif %}
</form>
{% endblock %}

Django Haystack Integrating with Models

Django Haystack uses `SearchIndex` classes (which are similar to Django models in terms of field-based storage) to determine which data should be used for searching.

Search indexes are generally created for each model type but can occasionally be used across many similar model types.

It is easiest to put your search indexes in a file named `search_indexes.py` inside your app as this allows Django Haystack to automatically detect it.

Django Haystack Search Index

If we have the following model:

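from django.conf import settings
from django.db import models
from django.utils import timezone


class TestModel(models.Model):
    title = models.CharField(max_length=255)
    body = models.TextField()
    # on_delete is mandatory from Django 2.0 onwards
    author = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)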
    publication_date = models.DateTimeField(default=timezone.now)

The following index would be applicable:

from django.utils import timezone
from haystack import indexes
from myapp.models import TestModel


class TestModelIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='author')
    pub_date = indexes.DateTimeField(model_attr='publication_date')

    def get_model(self):
        return TestModel

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(publication_date__lte=timezone.now())

As we set `use_template=True` on the `text` field, we must provide a template for it to use. The template lets us choose exactly which model fields go into the indexed document instead of concatenating field values by hand.

The template for an index follows the following pattern:

search/indexes/[the app name]/[model name used for the index]_[index field name].txt

So for the previous example the template would exist at:

search/indexes/myapp/testmodel_text.txt
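
The naming pattern can be written as a one-line function, which is handy for double-checking where a template should live. This is a sketch of the pattern described above, not Haystack's own code; `index_template_path` is a name invented for this example.

```python
def index_template_path(app_label, model_name, field_name):
    """Return the data template path for one index field, per the pattern above."""
    # The model name is lowercased in the file name, e.g. TestModel -> testmodel
    return "search/indexes/{0}/{1}_{2}.txt".format(
        app_label, model_name.lower(), field_name
    )
```

For the index above, `index_template_path("myapp", "TestModel", "text")` gives `search/indexes/myapp/testmodel_text.txt`.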

Each object in the queryset is rendered with this template, with the object itself (in this case a `TestModel` instance) available in the context as `object`, so we can output the relevant fields as follows.

{{ object.title }}
{{ object.author.get_full_name }}
{{ object.body }}

Building and Rebuilding your index

To build your index from scratch (this clears any existing index first), run the following management command:

./manage.py rebuild_index

To update your index run the following management command:

./manage.py update_index 

The update index command can be scheduled with a cron job to run at set intervals on your site.
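
A scheduled job could call a small wrapper like the one below instead of invoking the command directly. It is a minimal sketch: the function names and the default `manage.py` path are assumptions for illustration. The optional `--age` flag limits the update to recently changed objects, and as far as I know it only has an effect when the index defines `get_updated_field`.

```python
import subprocess
import sys


def update_index_command(manage_py="manage.py", age_hours=None):
    """Build the argument list for running Haystack's update_index command."""
    command = [sys.executable, manage_py, "update_index"]
    if age_hours is not None:
        # --age limits the update to objects changed in the last N hours
        command.append("--age={0}".format(age_hours))
    return command


def run_update_index(manage_py="manage.py", age_hours=None):
    """Run update_index and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(update_index_command(manage_py, age_hours))
```

An hourly cron entry could then run a script that calls `run_update_index(age_hours=1)` from the project directory.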

You now have basic search!
