Solr Setup

The purpose of this document is to show the steps that were initially used to set up the instanc incase it needs to be re-created at some point.

Solr Setup

Overview

Our EC2 instance is a nano/small Ubuntu 18.04 server running Solr 8.2.0 via Jetty. Solr is installed manually via the suggested setup steps on the Solr website:

https://lucene.apache.org/solr/guide/8_1/taking-solr-to-production.html#taking-solr-to-production

EC2 Instance

The first thing is the EC2 instance setup. We don't want our instance available to the outside world, so the first thing to do is to lock-down the EC2 instance that Solr will be installed on.

For our Django servers and Solr to communicate, they need:

  • To be in the same VPC
  • Be in the same subnet (not 100% about this)

First find the instance and take note of the private IP, here 172.31.19.142:

Screenshot 1

Then click the "subnet ID" to be brought to the relevant subnet page:

Screenshot 2

Take note of the IPv4 CIDR value 172.31.16.0/20.

Then go to back to the (Solr) instance page and click on the "security group":

Screenshot 3

Hit "edit" and add a new rule that restricts connections to the subnet this instance is in. This now means that only AWS instance can communicate with our Solr instance.

SSH

Also note that SSH is set up on port 30000 and is publically available.

Installation

Install Java:

sudo apt install -y default-jdk
wget http://www-eu.apache.org/dist/lucene/solr/8.2.0/solr-8.2.0.tgz
tar zxf solr-8.2.0.tgz solr-8.2.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-8.2.0.tgz

You can start/stop and restart with

sudo service solr restart
sudo service solr stop
sudo service solr start

Create a core

Add a core:

sudo -u solr /opt/solr/bin/solr create -c findhello-staging -n data_driven_schema_configs

This will add our new core configuration and data to /var/solr/data/findhello-staging/

Generate a schema.xml

The schema is the file Solr uses to create its indexes. It explains what fields we want in our search documents. When we create a core, Solr adds a default schema but we need to generate one specific to our Django app. Thankfull, Haystack can do that for us.

So back on the Django server, run the command:

python manage.py build_solr_schema

This will output a schema file that you need to copy and paste to the Solr config on the Solr instance at /var/solr/data/findhello-staging/conf/schema.xml

WARNING: You will need to update the schema any time you make a change to the structure of our search documents (i.e. search_indexes.py) in Django. Haystack doesn't automatically tell Solr that the structure of our documents has changed, so you need to update the file manually and reindex/update the index.

Create a solrconfig.xml

Like the schema, we also need to update the Solr settings for our core so that it's compatible with our needs.

Get the default Haystack Solr Config from the Haystack Github page:

https://github.com/django-haystack/django-haystack/blob/master/haystack/templates/search_configuration/solrconfig.xml

and copy it the core config folder on the Solr instance:

sudo mv /var/solr/data/findhello-staging/conf/solrconfig.xml /var/solr/data/findhello-staging/conf/solrconfig.xml.bak
sudo vim /var/solr/data/findhello-staging/conf/solrconfig.xml
...

Configure Django

Make sure your Haystack settings are correct. In you .env:

HAYSTACK_ADMIN_URL="http://172.31.19.142:8983/solr/"
HAYSTACK_URL="http://172.31.19.142/solr/findhello-staging/"
HAYSTACK_ENGINE="haystack.backends.solr_backend.SolrEngine"

Note that 172.31.19.142 is the private IP of the AWS instance with Solr installed.


Making changes to Solr

If you change your search_indexes in Django, you need to replace the schema.xml and solrconfig.xml mentioned above with the new versions. Ideally, you generate these files via the manage.py build_schema_files command from Haystack locally on your machine and then commit them to Github.

Then, take the commited files and update them on the Solr instance:

sudo vim /var/solr/data/findhello-staging/conf/solrconfig.xml
sudo vim /var/solr/data/findhello-staging/conf/schema.xml

You then need to make sure to reload the core:

curl 'http://127.0.0.1:8983/solr/admin/cores?action=RELOAD&core=findhello-staging&wt=json'

You should then rebuild the index from the Django staging/production instance

python manage.py rebuild_index

Updating the index

http://docs.haystacksearch.org/en/master/searchindex_api.html#keeping-the-index-fresh

The index needs to be updated periodically:

python manage.py update_index -r

(Note: the inclusion of the -r flag ensures that the index removes any records that were deleted directly from the DB)

So we have a crontab job that does this hourly:

crontab -e
DJANGO_SETTINGS_MODULE=config.settings.production
*/5 * * * * /home/ubuntu/.venvs/find_hello/bin/python /home/ubuntu/find-hello-api/manage.py update_index -r >> /home/ubuntu/find-hello-api/logfiles/cron.out 2>&1

Note: it's important to include the DJANGO_SETTINGS_MODULE environment variable here so that cron uses the correct settings file when running the Django Haystack management command. Exclusing this was previously responsible for us missing the fact that the index wasn't being updated.

Accesing the admin panel

Becaues we have blocked port :8983 to the public, we can't see the admin panel. One trick here is to perform a port-forward via SSH.

In your local ~/.ssh/config add the following:

Host solr.usahello.org
    HostName 52.38.105.19
    User ubuntu
    Port 30000
    LocalForward 8001 localhost:8983

This will allow you to connect to the remote Solr admin by accessing http://localhost:8001 on your own machine.


Troubleshooting

Error regarding currency.xml

After installing the schema with Haystack, I was noticing this error:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core mycore: Error while parsing currency configuration file currency.xml

This is due to the default haystack schema that it generates includes fields for currency, for example:

<dynamicField name="*_c"   type="currency" indexed="true"  stored="true"/>
...
<fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />
...

So delete them! Or you can add an example currency.xml to the core's config folder:

https://raw.githubusercontent.com/apache/lucene-solr/master/solr/example/files/conf/currency.xml

Error regarding QueryElevationComponent

If you see this:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent

Delete the following from the solrconfig.xml:

<!-- Query Elevation Component

       http://wiki.apache.org/solr/QueryElevationComponent

       a search component that enables you to configure the top
       results for a given query regardless of the normal lucene
       scoring.
    -->
  <searchComponent name="elevator" class="solr.QueryElevationComponent" >
    <!-- pick a fieldType to analyze queries -->
    <str name="queryFieldType">string</str>
    <str name="config-file">elevate.xml</str>
  </searchComponent>

  <!-- A request handler for demonstrating the elevator component -->
  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>