Elasticsearch in Ruby

A tiny project using the search engine elasticsearch in Ruby.

I’m going to write about getting started with elasticsearch by building a small project in Ruby. But what is elasticsearch, you ask?

elasticsearch is a flexible and powerful open source, distributed real-time search and analytics engine for the cloud.

That pretty much sums it up. There is also a really good overview of the project at elasticsearch.org/overview.

Installing elasticsearch

Elasticsearch is available from Homebrew:

$ brew install elasticsearch

The current version of elasticsearch is 0.20.6 as of this writing.

elasticsearch-head

A nice little HTML5 front end for elasticsearch. Let’s install it!

(Not using Homebrew? Replace the string in backticks with the path to the elasticsearch plugin binary.)

`brew list elasticsearch|grep -m1 plugin` \
        -install mobz/elasticsearch-head

Ruby and libraries for elasticsearch

I’m going to use Ruby 2.0.0, but 1.9.3 should also work with some minor changes (the keyword arguments used in the search app further down, for example, are a 2.0 feature).

Stretcher

The most popular gems for talking to elasticsearch seem to be Tire and RubberBand, but I’m going to use Stretcher just for the fun of it.

Stretcher is designed to reflect the actual elasticsearch API as closely as possible, so you can work directly from the elasticsearch query-dsl documentation.
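
To make that concrete, here is a minimal sketch of a query-dsl match query written as a plain Ruby hash. It only uses the standard library to print the JSON such a hash serializes to; the search word and the size of 10 are example values, and I’m assuming a client like Stretcher sends essentially this JSON as the search body.

```ruby
require 'json'

# A match query from the query-dsl, written as a plain Ruby hash.
# The word 'elasticsearch' and the size of 10 are example values.
query = {
  size: 10,
  query: {
    match: { text: 'elasticsearch' }
  }
}

# Serialize the hash to the JSON that would go over the wire.
body = JSON.generate(query)
puts body
# prints {"size":10,"query":{"match":{"text":"elasticsearch"}}}
```

The hash mirrors the JSON in the query-dsl documentation key for key, which is why the documentation can be read almost as if it were Ruby.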

$ gem install stretcher

Now we’ll start elasticsearch (in the foreground):

$ elasticsearch -f

Dataset

We obviously need some data to search through. I’m going to use a dump of my tweets, retrieved from Your Twitter archive on the Twitter settings page.

Importing the data using Ruby

You would probably want to use something like the CSV River Plugin in production.

I’ll just write a short import script using Ruby and the stretcher gem:

tweet_importer.rb

#!/usr/bin/env ruby
require 'csv'
require 'stretcher'

# Make sure we have piped some data to the script
if STDIN.tty?
  puts 'unzip -p tweets.zip tweets.csv | ./tweet_importer.rb'
  exit
end

# Connect to elasticsearch
es = Stretcher::Server.new('http://localhost:9200')

# Delete the tweets index if it exists
es.index(:tweets).delete if es.index(:tweets).exists?

# Bulk index the tweet documents
es.index(:tweets).bulk_index [].tap { |docs|
  # Parse the CSV data from STDIN
  CSV.parse(STDIN.read, headers: true) do |row|
    docs << row.to_hash.merge({
      '_type' => 'tweet',
      '_id'   => row['tweet_id']
    })
  end
}
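
To see what the script hands to bulk_index, here is a small sketch of the same row-to-document mapping run against an inline sample. The two-row sample and the helper name build_docs are made up for illustration; the real archive has more columns than tweet_id and text.

```ruby
require 'csv'

# Hypothetical two-row sample standing in for tweets.csv.
sample = "tweet_id,text\n1,hello world\n2,trying out elasticsearch\n"

# Turn each CSV row into a document hash, adding the '_type' and '_id'
# keys in the same way as the import script does.
def build_docs(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    row.to_hash.merge('_type' => 'tweet', '_id' => row['tweet_id'])
  end
end

docs = build_docs(sample)
docs.each { |doc| puts doc.inspect }
```

Every column of the CSV ends up as a field on the document, and the tweet id doubles as the document id, so importing the same archive twice should overwrite documents rather than duplicate them.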

Now let’s import the tweets:

$ unzip -p tweets.zip tweets.csv | ./tweet_importer.rb
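
For the curious: the elasticsearch bulk API takes newline-delimited JSON, with one action line followed by one source line per document. The sketch below builds such a body by hand. The helper name bulk_body is made up, and this is not how Stretcher is necessarily implemented, just roughly what any client has to produce.

```ruby
require 'json'

# Build a bulk request body for the given index from documents that carry
# '_type' and '_id' keys: an action line, then a source line, per document.
def bulk_body(index, docs)
  docs.map { |doc|
    source = doc.dup
    meta = {
      '_index' => index,
      '_type'  => source.delete('_type'),
      '_id'    => source.delete('_id')
    }
    JSON.generate({ 'index' => meta }) + "\n" + JSON.generate(source) + "\n"
  }.join
end

# Example document; the text is a placeholder.
docs = [{ '_type' => 'tweet', '_id' => '1', 'text' => 'hello world' }]
puts bulk_body('tweets', docs)
```

Sending everything in one request like this is what makes the import quick compared to indexing the tweets one at a time.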

If everything went well, then you should be able to query the index using elasticsearch-head over at http://localhost:9200/_plugin/head/

Ok, what now?

Let’s write a small web application that lets you find tweets matching a certain word. We’ll use the template language Slim and the web framework Sinatra.

Let’s start by installing them:

$ gem install sinatra slim

Now we are ready to write our little app.

tweet_search.rb

require 'slim'
require 'sinatra'
require 'stretcher'

configure do
  ES = Stretcher::Server.new('http://localhost:9200')
end

class Tweets
  def self.match(text: 'elasticsearch', size: 1000)
    ES.index(:tweets).search size: size, query: {
      match: { text: text }
    }
  end
end

get "/" do
  redirect "/elasticsearch"
end

get "/:word" do
  slim :index, locals: {
    tweets: Tweets.match(text: params[:word])
  }
end

__END__
@@ layout
doctype html
html
  body== yield

@@ index
h1= "#{tweets.total} tweets matching “#{params[:word]}”"
ul
  - tweets.results.each do |tweet|
    li= tweet.text
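
If you are on Ruby 1.9.3, the keyword arguments in Tweets.match are one part that needs changing, since that syntax arrived with 2.0. A minimal sketch of one way around it is an options hash merged over the defaults. The helper name match_body is made up for illustration, and it only builds the search body instead of calling elasticsearch:

```ruby
# Ruby 1.9.3-compatible stand-in for keyword arguments:
# merge the caller's options over a hash of defaults.
def match_body(options = {})
  opts = { text: 'elasticsearch', size: 1000 }.merge(options)
  {
    size: opts[:size],
    query: { match: { text: opts[:text] } }
  }
end

p match_body                # uses both defaults
p match_body(text: 'ruby')  # overrides only the search word
```

The call sites look the same as with real keyword arguments, so the Sinatra routes would not need to change.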

Now you just need to start the app:

$ ruby tweet_search.rb
== Sinatra/1.4.2 has taken the stage on 4567
for development with backup from Thin
>> Thin web server (v1.5.0 codename Knife)
>> Maximum connections set to 1024
>> Listening on localhost:4567, CTRL+C to stop

Open your browser and go to http://localhost:4567/