In this tutorial I am going to look at the possibility of using Drupal 7 as a content management system that powers another high-performance application. To illustrate this, I will use the Silex PHP microframework with Elasticsearch as the data source. The goal is to create a proof of concept demonstrating how these three technologies can work together.
The article comes with a git repository that you should check out, which contains more complete code than can be presented in the tutorial itself. Additionally, if you are unfamiliar with any of the three open source projects being used, I recommend following the links above and also checking out the documentation on their respective websites.
The tutorial will be split into two pieces, because there is quite a lot of ground to cover.
In this part, we’ll set up Elasticsearch on the server and integrate it with Drupal by creating a small, custom module that will insert, update, and delete Drupal nodes into Elasticsearch.
In the second part, we’ll create a small Silex app that fetches and displays the node data directly from Elasticsearch, completely bypassing the Drupal installation.
Elasticsearch
The first step is to install Elasticsearch on the server. Assuming you are using Linux, you can follow this guide and set it up to run when the server starts. There are a number of configuration options you can set here.
A very important thing to remember is that Elasticsearch has no access control so, once it is running on your server, it is publicly accessible through the (default) 9200 port. To avoid exposing your data, make sure that in the configuration file you uncomment this line:
network.bind_host: localhost
And add the following one:
script.disable_dynamic: true
These options ensure that Elasticsearch is accessible only from the local machine and that dynamic scripting is disabled. Both are recommended security measures.
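Put together, the relevant portion of the Elasticsearch configuration file would look something like this (the exact location depends on how you installed Elasticsearch; on most Linux package installs it is /etc/elasticsearch/elasticsearch.yml):

```yaml
# Bind only to the loopback interface so Elasticsearch
# cannot be reached from outside the server.
network.bind_host: localhost

# Disable dynamic (inline) scripting, which could otherwise
# be abused through crafted search requests.
script.disable_dynamic: true
```

Remember to restart the Elasticsearch service after changing these settings.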
Drupal
The next step is to set up the Drupal site on the same server. Using the Elasticsearch Connector Drupal module, you get some integration with the Elasticsearch instance out of the box: it bundles the PHP SDK for Elasticsearch, provides some statistics about your cluster, and ships some other helpful submodules. I’ll leave it up to you to explore those at your leisure.
Once the connector module is enabled, in your custom module you can retrieve the Elasticsearch client object wrapper to access data:
$client = elasticsearch_connector_get_client_by_id('my_cluster_id');
Here, my_cluster_id is the Drupal machine name that you gave to the Elasticsearch cluster (at admin/config/elasticsearch-connector/clusters). The $client object will now allow you to perform all sorts of operations, as illustrated in the docs I referenced above.
Inserting data
The first thing we need to do is make sure we insert some Drupal data into Elasticsearch. Sticking to nodes for now, we can write a hook_node_insert() implementation that will save every new node to Elasticsearch. Here’s an example, inside a custom module called elastic:
/**
 * Implements hook_node_insert().
 */
function elastic_node_insert($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);
  if (!$params) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }
  $result = $client->index($params);
  if ($result && $result['created'] === FALSE) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }
  drupal_set_message(t('The node has been saved to Elasticsearch.'));
}
As you can see, we instantiate a client object that we use to index the data from the node. You may be wondering what _elastic_prepare_node() is:
/**
 * Prepares a node to be added to Elasticsearch.
 *
 * @param $node
 * @return array|null
 */
function _elastic_prepare_node($node) {
  if (!is_object($node)) {
    return;
  }
  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'body' => array(),
  );
  // Add the simple properties.
  $wanted = array('vid', 'uid', 'title', 'log', 'status', 'comment', 'promote', 'sticky', 'nid', 'type', 'language', 'created', 'changed', 'revision_timestamp', 'revision_uid');
  $exist = array_filter($wanted, function($property) use ($node) {
    return property_exists($node, $property);
  });
  foreach ($exist as $field) {
    $params['body'][$field] = $node->{$field};
  }
  // Add the body field if it exists.
  $body_field = isset($node->body) ? field_get_items('node', $node, 'body') : FALSE;
  if ($body_field) {
    $params['body']['body'] = $body_field;
  }
  // Add the image field if it exists.
  $image_field = isset($node->field_image) ? field_get_items('node', $node, 'field_image') : FALSE;
  if ($image_field) {
    $params['body']['field_image'] = array_map(function($img) {
      $img = file_load($img['fid']);
      $img->url = file_create_url($img->uri);
      return $img;
    }, $image_field);
  }
  return $params;
}
It is just a helper function I wrote, responsible for “serializing” the node data and getting it ready for insertion into Elasticsearch. This is just an example, and definitely not a complete or fully scalable one. It also assumes that the image field is named field_image. An important point to note is that we are inserting the nodes into the node index, with the type set to $node->type.
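To make the mapping concrete, here is a sketch of what an indexed document might look like for an article node. The values are hypothetical, and the body structure is simplified; the actual shape depends on your fields and field_get_items() output:

```json
{
  "_index": "node",
  "_type": "article",
  "_source": {
    "nid": "5",
    "vid": "5",
    "uid": "1",
    "title": "My first article",
    "status": "1",
    "language": "und",
    "created": "1426508860",
    "changed": "1426508860",
    "body": [
      {
        "value": "<p>The article body goes here.</p>",
        "format": "filtered_html"
      }
    ]
  }
}
```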
Updating data
Inserting is not enough; we also need to make sure that node changes are reflected in Elasticsearch. We can do this with a hook_node_update() implementation:
/**
 * Implements hook_node_update().
 */
function elastic_node_update($node) {
  // Bail out if this is a new node being inserted.
  if (!empty($node->is_new)) {
    return;
  }
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);
  if (!$params) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }
  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }
  $params['id'] = $result['hits']['hits'][0]['_id'];
  $version = $result['hits']['hits'][0]['_version'];
  $index = $client->index($params);
  if ($index['_version'] !== $version + 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }
  drupal_set_message(t('The node has been updated in Elasticsearch.'));
}
We again use the helper function to prepare our node for insertion, but this time we also search for the node in Elasticsearch to make sure we are updating and not creating a new one. This happens using another helper function I wrote as an example:
/**
 * Helper function that returns a node from Elasticsearch by its nid.
 *
 * @param $client
 * @param $node
 * @return mixed
 */
function _elastic_perform_node_search_by_id($client, $node) {
  $search = array(
    'index' => 'node',
    'type' => $node->type,
    'version' => TRUE,
    'body' => array(
      'query' => array(
        'match' => array(
          'nid' => $node->nid,
        ),
      ),
    ),
  );
  return $client->search($search);
}
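For reference, the search performed by this helper is equivalent to the following raw request against the Elasticsearch REST API (assuming an article node with nid 5; the version=true parameter asks Elasticsearch to return the document version with each hit):

```
GET /node/article/_search?version=true
{
  "query": {
    "match": {
      "nid": 5
    }
  }
}
```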
You’ll notice that I am asking Elasticsearch to return the document version as well. This is so that I can check if a document has been updated with my request.
Deleting data
The last (for now) feature we need is the ability to remove the data from Elasticsearch when a node gets deleted. hook_node_delete() can help us with that:
/**
 * Implements hook_node_delete().
 */
function elastic_node_delete($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  // If the node is in Elasticsearch, remove it.
  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem deleting this node from Elasticsearch.'));
    return;
  }
  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'id' => $result['hits']['hits'][0]['_id'],
  );
  $result = $client->delete($params);
  if ($result && $result['found'] !== TRUE) {
    drupal_set_message(t('There was a problem deleting this node from Elasticsearch.'));
    return;
  }
  drupal_set_message(t('The node has been deleted from Elasticsearch.'));
}
Again, we search for the node in Elasticsearch and use the returned ID as a marker to delete the document.
Please keep in mind, though, that early returns such as those illustrated above are not ideal inside Drupal hook implementations, unless that is more or less all the functionality that needs to go in them. I recommend splitting the logic into helper functions if you need to perform other, unrelated tasks inside these hooks.
This is enough to get us started using Elasticsearch as a very simple data source on top of Drupal. With this basic code in place, you can navigate to your Drupal site and start creating some nodes, updating them and deleting them.
One way to check whether Elasticsearch actually gets populated is to temporarily disable the remote access restriction we set up earlier. Make sure you only do this in your local development environment. This way, you can perform HTTP requests directly from the browser and get JSON data back from Elasticsearch.
You can do a quick search for all the nodes in Elasticsearch by navigating to this URL:
http://localhost:9200/node/_search
…where localhost points to your local server and 9200 is the default Elasticsearch port.
For article nodes only:
http://localhost:9200/node/article/_search
And for individual articles, by their auto-generated Elasticsearch IDs:
http://localhost:9200/node/article/AUnJgdPGGE7A1g9FtqdV
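The data you get back from the search URLs is JSON with roughly the following shape (the values here are hypothetical). The _id fields are the auto-generated identifiers that the update and delete hooks above look up:

```json
{
  "took": 2,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "node",
        "_type": "article",
        "_id": "AUnJgdPGGE7A1g9FtqdV",
        "_score": 1.0,
        "_source": {
          "nid": "5",
          "title": "My first article"
        }
      }
    ]
  }
}
```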
Go ahead and check out the Elasticsearch documentation for all the amazing ways you can interact with it.
Conclusion
We’ve seen in this article how we can start working to integrate Elasticsearch with Drupal. Obviously, there is far more we can do based on even the small things we’ve accomplished. We can extend the integration to other entities and even Drupal configuration if needed. In any case, we now have some Drupal data in Elasticsearch, ready to be used from an external application.
That external application will be the task for the second part of this tutorial. We’ll be setting up a small Silex app that, using the Elasticsearch PHP SDK, will read the Drupal data directly from Elasticsearch. As with part 1, above, we won’t be going through a step-by-step tutorial on accomplishing a given task, but instead will explore one of the ways that you can start building this integration. See you there.
Frequently Asked Questions (FAQs) about Elasticsearch and Drupal Integration
How can I troubleshoot issues with Elasticsearch and Drupal integration?
Troubleshooting issues with Elasticsearch and Drupal integration can be a complex process. First, ensure that you have correctly followed all the installation and integration steps. Check your Drupal and Elasticsearch logs for any error messages. If you’re using the Elasticsearch Connector module, you can enable the “Elasticsearch Helper” module, which provides additional debugging information. If you’re still experiencing issues, consider reaching out to the Drupal and Elasticsearch communities for further assistance.
Can I use Elasticsearch with older versions of Drupal?
Yes, you can use Elasticsearch with older versions of Drupal. However, the process may vary depending on the version of Drupal you’re using. The Elasticsearch Connector module, for example, supports Drupal 7 and 8. Always check the compatibility of the module or plugin you’re using with your Drupal version.
How can I optimize the performance of Elasticsearch with Drupal?
Optimizing the performance of Elasticsearch with Drupal involves several steps. First, ensure that your Elasticsearch server has sufficient resources (CPU, memory, and disk space). You can also optimize your Elasticsearch queries and indices. Additionally, consider using Drupal’s caching mechanisms to reduce the load on your Elasticsearch server.
How can I secure my Elasticsearch and Drupal integration?
Securing your Elasticsearch and Drupal integration involves several steps. First, ensure that your Elasticsearch server is not publicly accessible. You can do this by configuring your firewall rules or using a VPN. Additionally, consider using HTTPS for communication between Drupal and Elasticsearch. Finally, always keep your Drupal and Elasticsearch software up-to-date to benefit from the latest security patches.
Can I use Elasticsearch with Drupal multisite?
Yes, you can use Elasticsearch with Drupal multisite. However, you may need to configure each site separately, depending on your specific requirements. The Elasticsearch Connector module, for example, allows you to configure different Elasticsearch clusters for each site.
How can I index custom fields in Elasticsearch with Drupal?
Indexing custom fields in Elasticsearch with Drupal involves configuring your Elasticsearch index mappings and Drupal’s search API fields. This process may vary depending on the specific module or plugin you’re using. Always refer to the documentation of the module or plugin for specific instructions.
How can I use Elasticsearch for faceted search in Drupal?
Using Elasticsearch for faceted search in Drupal involves configuring your Elasticsearch index and Drupal’s search API fields to support faceting. You can then use modules like Facet API or Search API Facets to create and display facets in your Drupal site.
Can I use Elasticsearch with Drupal Commerce?
Yes, you can use Elasticsearch with Drupal Commerce. This can be achieved by indexing your Drupal Commerce products in Elasticsearch and then using Drupal’s search API to query and display the products. You may need to configure your Elasticsearch index mappings and Drupal’s search API fields to support product attributes.
How can I monitor the performance of Elasticsearch with Drupal?
Monitoring the performance of Elasticsearch with Drupal can be achieved using various tools. Elasticsearch itself provides a monitoring API that you can use to track various metrics. Additionally, you can use tools like Kibana or ElasticHQ to visualize your Elasticsearch performance data.
How can I handle multilingual content with Elasticsearch and Drupal?
Handling multilingual content with Elasticsearch and Drupal involves configuring your Elasticsearch index and Drupal’s search API fields to support multiple languages. You can then use Drupal’s multilingual features to manage and display content in different languages.
Daniel Sipos is a Drupal developer who lives in Brussels, Belgium. He works professionally with Drupal but likes to use other PHP frameworks and technologies as well. He runs webomelette.com, a Drupal blog where he writes articles and tutorials about Drupal development, theming and site building.