Flexible searching and indexing for web applications and sites is almost always useful and sometimes absolutely essential. While there are many complex solutions that manage data and allow you to retrieve and interact with it through HTTP methods, ElasticSearch has gained popularity due to its easy configuration and incredible malleability.
Elasticsearch is an open-source search engine built on top of Apache Lucene, a full-text search-engine library.
CRUD stands for create, read, update, and delete. These are all operations that are needed to effectively administer persistent data storage. Luckily, these also have logical equivalents in HTTP methods, which makes it easy to interact with ElasticSearch using standard HTTP requests. The CRUD operations map to the HTTP methods POST, GET, PUT, and DELETE, respectively.
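The pairing above can be written down as a small lookup table. This is a minimal sketch; `CRUD_TO_HTTP` is a hypothetical name and the table only restates the mapping given in the text.

```python
# Hypothetical lookup table restating the CRUD-to-HTTP pairing described above.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}

for operation, method in CRUD_TO_HTTP.items():
    print(f"{operation:>6} -> {method}")
```

As the indexing discussion shows, ElasticSearch blurs this table a little: a PUT to a document URL creates the document if it is missing and overwrites it otherwise.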
In order to use ElasticSearch for anything useful, such as searching, the first step is to populate an index with some data. This process is known as indexing.
Documents are indexed—stored and made searchable—by using the index API.
In ElasticSearch, indexing corresponds to both “Create” and “Update” in CRUD: if we index a document with a given type and ID that doesn’t already exist, it’s inserted. If a document with the same type and ID already exists, it’s overwritten.
From our perspective as users of ElasticSearch, a document is a JSON object. As such, a document can have fields in the form of JSON properties. Such properties can be values such as strings or numbers, but they can also be other JSON objects.
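To make this concrete, here is a hypothetical document expressed as a Python dictionary. The field names and values are illustrative only; the point is that a property can hold a string, a number or another JSON object, and that the whole thing serializes to a single JSON object.

```python
import json

# A hypothetical document: scalar properties plus one nested JSON object.
document = {
    "title": "Example Movie",   # string property
    "year": 2000,               # number property
    "director": {               # property whose value is another JSON object
        "name": "Jane Doe",
    },
}

body = json.dumps(document)
print(body)

# Parsing the JSON back yields the same structure, nested object included.
assert json.loads(body)["director"]["name"] == "Jane Doe"
```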
In order to create a document, we make a PUT request to the REST API at a URL made up of the index name, type name and ID. That is: http://localhost:9200/// with a JSON object included as the PUT data.
Index and type are required, while the ID part is optional. If we don’t specify an ID, ElasticSearch will generate one for us. However, if we don’t specify an ID, we should use POST instead of PUT. The index name is arbitrary. If there isn’t already an index with that name on the server, one will be created using the default configuration.
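A minimal sketch of the URL pattern and the PUT/POST choice, using only Python's standard library. It assumes ElasticSearch listens on `http://localhost:9200` as in the text; `build_index_request` is a hypothetical helper. It only builds the request; passing the result to `urllib.request.urlopen` would actually send it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local ElasticSearch node


def build_index_request(index, doc_type, document, doc_id=None):
    """Build (but do not send) an indexing request.

    With an ID: PUT to <base>/<index>/<type>/<id>.
    Without an ID: POST to <base>/<index>/<type> and let ElasticSearch pick the ID.
    """
    url = f"{BASE_URL}/{index}/{doc_type}"
    method = "POST"
    if doc_id is not None:
        url += f"/{doc_id}"
        method = "PUT"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),   # the JSON object as request body
        headers={"Content-Type": "application/json"},  # declare the body as JSON
        method=method,
    )


req = build_index_request("movies", "movie", {"title": "Example Movie"}, doc_id="1")
print(req.get_method(), req.full_url)
```

Using the index name “movies”, type “movie” and ID “1” from the example that follows, this prints `PUT http://localhost:9200/movies/movie/1`; omitting `doc_id` switches the verb to POST and drops the last path segment.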
As for the type name, it too is arbitrary. It serves several purposes, including:
Each type has its own ID space.
Different types can have different mappings (“schema” that defines how properties/fields should be indexed).
Although it’s possible, and common, to search over multiple types, it’s easy to search only for one or more specific type(s).
Let’s index something! We can put just about anything into our index as long as it can be represented as a single JSON object. For the sake of having something to work with we’ll be indexing, and later searching for, movies. Here’s a classic one:
Sample JSON object
To index the above JSON object we decide on an index name (“movies”), a type name (“movie”) and an ID (“1”) and make a request following the pattern described above with the JSON object in the body.
A request that indexes the sample JSON object as a document of type ‘movie’ in an index named ‘movies’
Execute the above request using cURL, or paste it into Sense and hit the green arrow to run it. After doing so, provided that ElasticSearch is running, you should see a response like this:
Response from ElasticSearch to the indexing request.
The request for, and result of, indexing the movie in Sense.
As you can see, the response from ElasticSearch is also a JSON object. Its properties describe the result of the operation. The first three properties simply echo the information that we specified in the URL that we made the request to. While this can be convenient in some cases, it may seem redundant. However, remember that the ID part of the URL is optional. If we don’t specify an ID, the _id property will be generated for us, and its value may then be of great interest to us.
The fourth property, _version, tells us that this is the first version of this document (the document with type “movie” and ID “1”) in the index. This is also confirmed by the fifth property, “created”, whose value is true.
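Because a generated `_id` may be the only handle we get on a new document, a client typically reads it straight from the response. This sketch parses a body shaped like the one described here (`_index`, `_type`, `_id`, `_version`, `created`); the sample values are hypothetical and `summarize_index_response` is an illustrative helper.

```python
import json


def summarize_index_response(raw_body):
    """Pull out the fields an indexing response is described as containing."""
    body = json.loads(raw_body)
    return {
        "location": (body["_index"], body["_type"], body["_id"]),
        "version": body["_version"],
        "created": body["created"],
    }


# Hypothetical response to the first indexing request for movies/movie/1.
sample = '{"_index": "movies", "_type": "movie", "_id": "1", "_version": 1, "created": true}'
summary = summarize_index_response(sample)
print(summary)
```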
Now that we’ve got a movie in our index, let’s look at how we can update it by adding a list of genres. To do that, we simply index it again using the same ID. In other words, we make the exact same indexing request as before, but with an extended JSON object containing genres.
Indexing request with the same URL as before but with an updated JSON payload.
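Updating by re-indexing can be sketched as copying the original document, adding the new property and sending the result to the same URL with PUT. The earlier document and the genre values are hypothetical; the request is built but not sent.

```python
import json
import urllib.request

URL = "http://localhost:9200/movies/movie/1"  # same URL as the original indexing request

original = {"title": "Example Movie", "year": 2000}   # hypothetical earlier document
updated = {**original, "genres": ["Crime", "Drama"]}  # extended copy with a genres list

# Same URL and verb as before; only the JSON body differs.
req = urllib.request.Request(
    URL,
    data=json.dumps(updated).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

The whole document is sent again: re-indexing overwrites the stored document rather than merging into it, which is why the title and year are repeated in the body.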
This time the response from ElasticSearch looks like this:
The response after performing the updated indexing request.
Not surprisingly, the first three properties are the same as before. However, the _version property now reflects that the document has been updated, as its version number is now 2. The created property is also different, now having the value false. This tells us that the document already existed and therefore wasn’t created from scratch.
It may seem that the created property is redundant. Wouldn’t it be enough to inspect the _version property to see if its value is greater than one? In many cases that would work. However, if we were to delete the document, the version number wouldn’t be reset, meaning that if we later indexed a document with the same ID, the version number would be greater than one.
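The argument can be illustrated with a toy in-memory model. This is not how ElasticSearch is implemented; it only mimics the behavior described above, under the assumption that a delete also bumps the version counter and that the counter outlives the deleted document. `ToyIndex` is a hypothetical class.

```python
class ToyIndex:
    """Toy model: per-ID version counters that outlive deleted documents."""

    def __init__(self):
        self.docs = {}      # id -> document body
        self.versions = {}  # id -> last version number, kept after deletes

    def index(self, doc_id, document):
        created = doc_id not in self.docs
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        self.docs[doc_id] = document
        return {"_version": self.versions[doc_id], "created": created}

    def delete(self, doc_id):
        found = self.docs.pop(doc_id, None) is not None
        if found:
            self.versions[doc_id] += 1  # assumption: deletes bump the version too
        return {"found": found}


idx = ToyIndex()
print(idx.index("1", {"title": "Example Movie"}))  # first version, created
idx.delete("1")
print(idx.index("1", {"title": "Example Movie"}))  # version > 1, yet created again
```

The second response has a `_version` greater than one while `created` is still true, so only `created` reliably answers whether the document already existed.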
So, what’s the purpose of the _version property then? While it can be used to track how many times a document has been modified, its primary purpose is to allow for optimistic concurrency control.
If we supply a version in indexing requests, ElasticSearch will only overwrite the document if the supplied version is the same as that of the document in the index. To try this out, add a version query string parameter to the URL of the request with “1” as its value, making it look like this:
Indexing request with a ‘version’ query string parameter.
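Adding the parameter is plain URL construction. A sketch using the standard library's `urllib.parse.urlencode`; `versioned_url` is a hypothetical helper and the base URL is the local node assumed throughout.

```python
from urllib.parse import urlencode


def versioned_url(index, doc_type, doc_id, version, base="http://localhost:9200"):
    """Document URL with a ?version=... query string for optimistic concurrency control."""
    return f"{base}/{index}/{doc_type}/{doc_id}?{urlencode({'version': version})}"


print(versioned_url("movies", "movie", "1", 1))
```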
Now the response from ElasticSearch is different. This time it contains an error property with a message explaining that the indexing didn’t happen due to a version conflict.
Response from ElasticSearch indicating a version conflict.
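The rule behind the conflict is a compare-and-set: a write goes through only if the supplied version equals the stored one. Below is a toy sketch of that rule, not ElasticSearch's code; `VersionConflict` and `conditional_index` are hypothetical names. It replays the sequence from this section, where a document already at version 2 rejects a write that claims version 1.

```python
class VersionConflict(Exception):
    """Raised by the toy model when the supplied version is stale."""


def conditional_index(store, doc_id, document, version=None):
    """Toy compare-and-set indexing; store maps id -> (version, document)."""
    current = store[doc_id][0] if doc_id in store else 0
    if version is not None and version != current:
        # The toy's own message, not ElasticSearch's wording.
        raise VersionConflict(f"version conflict: stored {current}, supplied {version}")
    store[doc_id] = (current + 1, document)
    return current + 1


store = {}
conditional_index(store, "1", {"title": "Example Movie"})                       # version 1
conditional_index(store, "1", {"title": "Example Movie", "genres": ["Drama"]})  # version 2
try:
    conditional_index(store, "1", {"title": "Stale edit"}, version=1)
except VersionConflict as err:
    print("rejected:", err)  # the stored document is left untouched
```

A client that receives the conflict would typically re-read the document, reapply its change and retry with the fresh version number.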
Getting by ID
We’ve seen how to index documents, both new ones and existing ones, and have looked at how ElasticSearch responds to such requests. However, we haven’t actually confirmed that the documents exist, only that ES tells us so.
So, how do we retrieve a document from an ElasticSearch index? Of course, we could search for it. However, that’s overkill if we only want to retrieve a single document with a known ID. A simpler and faster approach is to retrieve it by ID.
In order to do that, we make a GET request to the same URL as when we indexed it, only this time the ID part of the URL is mandatory. In other words, to retrieve a document by ID from ElasticSearch, we make a GET request to http://localhost:9200///. Let’s try it with our movie using the following request:
As you can see, the result object contains metadata similar to what we saw when indexing, such as index, type and version. Last but not least, it has a property named _source, which contains the actual document body. There’s not much more to say about GET, as it’s pretty straightforward. Let’s move on to the final CRUD operation.
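A sketch of retrieval by ID that hands back only `_source`, the document body. The `opener` parameter is a hypothetical convenience: it defaults to `urllib.request.urlopen` for real use and lets the function be exercised without a running node. The fake response is illustrative.

```python
import io
import json
import urllib.request


def get_source(index, doc_type, doc_id, opener=urllib.request.urlopen,
               base="http://localhost:9200"):
    """GET the document by ID and return its stored body (_source)."""
    req = urllib.request.Request(f"{base}/{index}/{doc_type}/{doc_id}", method="GET")
    with opener(req) as resp:
        body = json.load(resp)
    return body["_source"]


# Stand-in for a live node: returns a hypothetical GET response.
def fake_opener(req):
    payload = {"_index": "movies", "_type": "movie", "_id": "1", "_version": 2,
               "found": True, "_source": {"title": "Example Movie"}}
    return io.BytesIO(json.dumps(payload).encode("utf-8"))


print(get_source("movies", "movie", "1", opener=fake_opener))
```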
In order to remove a single document from the index by ID we again use the same URL as for indexing and retrieving it, only this time we change the HTTP verb to DELETE.
Request for deleting the movie with ID 1.
curl -XDELETE "http://localhost:9200/movies/movie/1"
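The same request from Python, as a sketch: `delete_document` and its injectable `opener` are hypothetical, and the fake response carries only a few metadata fields plus the flag reporting whether the document was found.

```python
import io
import json
import urllib.request


def delete_document(index, doc_type, doc_id, opener=urllib.request.urlopen,
                    base="http://localhost:9200"):
    """Send DELETE for the document URL and report whether the document was found."""
    req = urllib.request.Request(f"{base}/{index}/{doc_type}/{doc_id}", method="DELETE")
    with opener(req) as resp:
        body = json.load(resp)
    return body.get("found", False)


sent = []  # records requests made through the fake opener


def fake_opener(req):
    sent.append((req.get_method(), req.full_url))
    return io.BytesIO(b'{"found": true, "_index": "movies", "_type": "movie", "_id": "1"}')


print(delete_document("movies", "movie", "1", opener=fake_opener))
print(sent)
```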
The response object contains some of the usual suspects in terms of metadata, along with a property named “found” indicating that the document was indeed found and that the operation was successful.
Response to the DELETE request.
If we, after executing the DELETE request, switch back to GET we can verify that the document has indeed been deleted:
Response when making the DELETE request a second time.
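In code, checking that a document is gone usually means treating HTTP 404 as “not there”. The sketch below assumes ElasticSearch answers a GET for a missing document with status 404, which `urllib` surfaces as `HTTPError`; `fetch_or_none` and the fake opener are hypothetical.

```python
import io
import json
import urllib.error
import urllib.request


def fetch_or_none(url, opener=urllib.request.urlopen):
    """Return the parsed GET response, or None if the server answers 404."""
    req = urllib.request.Request(url, method="GET")
    try:
        with opener(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # document does not exist (for example, it was deleted)
        raise  # any other HTTP error is not a "missing document"


def fake_opener_after_delete(req):
    # Stand-in for a node on which the document has already been deleted.
    raise urllib.error.HTTPError(req.full_url, 404, "Not Found", None, io.BytesIO())


print(fetch_or_none("http://localhost:9200/movies/movie/1", opener=fake_opener_after_delete))
```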