Building a simple SQL interface for ElasticSearch

Rengarajan
3 min readAug 10, 2020
Photo by Caspar Camille Rubin on Unsplash

ElasticSearch provides the ability to not only store and retrieve text in an efficient manner but can be used as an effective analytics store. While the ElasticSearch query DSL is fantastic and provides all the capabilities for a developer to retrieve data, it can quickly become verbose and complicated.

Use Case

One of the most common use cases of ElasticSearch is log analysis and we ingested logs from multiple sources into an ElasticSearch cluster mainly for troubleshooting purposes. The log analysis started with typical ingestion pipeline where the logs are parsed, cleansed, enriched with additional metadata and indexed in ElasticSearch.

The end users of the log analysis application we wrote are technical support engineers who are comfortable in writing SQL to query to analyze log data. By letting the users write their queries in a SQL like manner, helped us to open up the majority of the querying capabilities of ElasticSearch without them learning a new DSL.

Data Model

Data is stored as a collection of documents and indices. For the purpose of this post, let’s assume the following mapping to the relational data model.

  • Documents = Rows
  • Index = Table

So that we can try to build an interface which can take a SQL query

select * from employee where age >= 30 limit 20

and translate it into

This can be executed using standard ElasticSearch high level or low level clients to fetch the results.

SQL Parsing

For parsing the SQL query, we can use any of the available SQL parser, in this case we used presto-parser . Using the parser you can get an AST tree for the select statement.

Query 
QueryBody
QuerySpecification
Select
*
From
Table[employee]
Where
GREATER_THAN_OR_EQUAL
QualifiedName[age]
Long[30]
Limit: 20

Now that we have a tree, we can visit tree nodes and leverage the ElasticSearch QueryBuilder to build out the translation. The presto parser provides various tree traversal classes and for one such class is DefaultExpressionTraversalVisitor which can be extended to build and ExpressionVisitor for the where operations.For e.g the below code listing is a simplistic way of handling comparison operations in the where clause of the query.

Aggregations as Functions

In the previous section we saw how to handle a typical where clause. ElasticSearch comes with a battery of aggregations like,

  • Terms
  • Statistical Operations like min, max, median etc.
  • Date Histograms for time series data

The approach we took in this case was to model these aggregations as SQL functions. For e.g a simple query to find the departments where the employees with age ≥ 30 are working can be found using

select terms(department) from employee where age >= 30 limit 0

Query Execution

The generated query is then executed using standard ElasticSearch Java clients. This helps in embedding this query layer into most of the existing Java applications and provide the application developers a familiar SQL like query capability for ElasticSearch.

Alternatives

--

--

Rengarajan

Software Engineer | Passionate Learner | Lifelong Student