I’m going to write about getting started with elasticsearch by doing a small project in Ruby. But what is elasticsearch, you ask?
elasticsearch is a flexible and powerful open source, distributed real-time search and analytics engine for the cloud.
That pretty much sums it up. There is also a really good overview of the project at elasticsearch.org/overview.
Installing elasticsearch
Elasticsearch is available from Homebrew:
$ brew install elasticsearch
The current version of elasticsearch is 0.20.6 as of this writing.
elasticsearch-head
A nice little HTML5 front end for elasticsearch. Let’s install it!
(Not using Homebrew? Replace the string in backticks with the path to the elasticsearch plugin binary.)
`brew list elasticsearch|grep -m1 plugin` \
-install mobz/elasticsearch-head
Ruby and libraries for elasticsearch
I’m going to use Ruby 2.0.0, but 1.9.3 should also work with some minor changes (for example, replacing the keyword arguments used in the app further down).
Stretcher
The most popular Ruby gems for talking to elasticsearch seem to be Tire and RubberBand, but I’m going to use Stretcher just for the fun of it.
Stretcher is designed to reflect the actual elasticsearch API as closely as possible, so you can work directly from the elasticsearch Query DSL documentation.
$ gem install stretcher
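Because Stretcher takes queries as plain Ruby hashes, anything in the Query DSL docs translates almost one to one. Here is a small sketch of that mapping using only the standard library. The match_query helper is a name I made up for illustration; it is not part of Stretcher.

```ruby
require 'json'

# Hypothetical helper (not part of Stretcher): builds a search body
# for a match query as a plain Ruby hash.
def match_query(field, text, size = 10)
  { size: size, query: { match: { field => text } } }
end

body = match_query(:text, 'elasticsearch', 5)

# The hash serializes to the JSON shape the Query DSL documents.
puts JSON.generate(body)
# prints {"size":5,"query":{"match":{"text":"elasticsearch"}}}
```

A hash of this shape is what gets passed to search in the web app later in this post.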
Now we’ll start elasticsearch (in the foreground):
$ elasticsearch -f
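Before moving on, it can be handy to check that the server is actually answering. A minimal sketch, assuming elasticsearch listens on localhost:9200 (the same address the scripts in this post connect to); elasticsearch_up? is a helper name of my own, and the two-second timeouts are arbitrary.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns true if something answers with a 2xx
# status at the given URL, false if the connection fails or times out.
def elasticsearch_up?(url = 'http://localhost:9200')
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: 2, read_timeout: 2) do |http|
    http.get('/')
  end
  response.is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, Timeout::Error, IOError
  false
end

if elasticsearch_up?
  puts 'elasticsearch is up'
else
  puts 'elasticsearch is not reachable'
end
```

This only tells you that an HTTP server replied successfully on that port, not that it is healthy, but it is enough to catch a server that never started.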
Dataset
We obviously need some data to search through. I’m going to use a dump of my tweets, downloaded via “Your Twitter archive” on the Twitter settings page.
Importing the data using Ruby
You would probably want to use something like the CSV River Plugin in production.
I’ll just write a short import script using Ruby and the stretcher gem:
tweet_importer.rb
#!/usr/bin/env ruby
require 'csv'
require 'stretcher'
# Make sure we have piped some data to the script
if STDIN.tty?
  puts 'unzip -p tweets.zip tweets.csv | ./tweet_importer.rb'
  exit
end
# Connect to elasticsearch
es = Stretcher::Server.new('http://localhost:9200')
# Delete the tweets index if it exists
es.index(:tweets).delete if es.index(:tweets).exists?
# Bulk index the tweet documents
es.index(:tweets).bulk_index [].tap { |docs|
  # Parse the CSV data from STDIN
  CSV.parse(STDIN.read, headers: true) do |row|
    docs << row.to_hash.merge({
      '_type' => 'tweet',
      '_id' => row['tweet_id']
    })
  end
}
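To see what the script hands to bulk_index, here is the row-to-document step on its own, without a server. The two CSV rows are made up for illustration (a real archive has more columns), and rows_to_docs is my own helper name.

```ruby
require 'csv'

# Made-up sample standing in for tweets.csv. Only the tweet_id and
# text column names come from the scripts in this post.
sample = <<SAMPLE_CSV
tweet_id,text
1,Hello elasticsearch
2,"Commas, inside quotes"
SAMPLE_CSV

# Hypothetical helper: turns CSV text into the array of hashes that
# the import script builds, adding _type and _id to every row.
def rows_to_docs(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    row.to_hash.merge('_type' => 'tweet', '_id' => row['tweet_id'])
  end
end

docs = rows_to_docs(sample)
docs.each { |doc| puts doc.inspect }
```

Note that every value, including tweet_id, arrives as a string because no CSV converters are used.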
Now let’s import the tweets:
$ unzip -p tweets.zip tweets.csv | ./tweet_importer.rb
If everything went well, then you should be able to query the index using elasticsearch-head over at http://localhost:9200/_plugin/head/
Ok, what now?
Let’s write a small web application that lets you find tweets matching a certain word. We’ll use the template language Slim and the web framework Sinatra.
Let’s start by installing them:
$ gem install sinatra slim
Now we are ready to write our little app.
tweet_search.rb
require 'slim'
require 'sinatra'
require 'stretcher'
configure do
  ES = Stretcher::Server.new('http://localhost:9200')
end
class Tweets
  def self.match(text: 'elasticsearch', size: 1000)
    ES.index(:tweets).search size: size, query: {
      match: { text: text }
    }
  end
end
get "/" do
  redirect "/elasticsearch"
end
get "/:word" do
  slim :index, locals: {
    tweets: Tweets.match(text: params[:word])
  }
end
__END__
@@ layout
doctype html
html
  body== yield
@@ index
h1= "#{tweets.total} tweets matching “#{params[:word]}”"
ul
  - tweets.results.each do |tweet|
    li= tweet.text
Now you just need to start the app:
$ ruby tweet_search.rb
== Sinatra/1.4.2 has taken the stage on 4567 for development with backup from Thin
>> Thin web server (v1.5.0 codename Knife)
>> Maximum connections set to 1024
>> Listening on localhost:4567, CTRL+C to stop
Open your browser and go to http://localhost:4567/
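One thing to watch when typing search words into the address bar: characters such as # or spaces need to be percent-encoded, otherwise a hashtag is treated as a URL fragment and never reaches the app. A small sketch using the standard library; search_url is a hypothetical helper, and the base URL assumes Sinatra’s default port 4567 as in the output above.

```ruby
require 'erb'

# Hypothetical helper: builds the URL for the /:word route of the app,
# percent-encoding the search word.
def search_url(word, base = 'http://localhost:4567')
  "#{base}/#{ERB::Util.url_encode(word)}"
end

puts search_url('#ruby')        # prints http://localhost:4567/%23ruby
puts search_url('hello world')  # prints http://localhost:4567/hello%20world
```

As far as I know, Sinatra decodes the route parameter again before it ends up in params[:word], so the app itself should not need any changes.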