Quick tour with Elasticsearch 6.x

While researching data storage options for log collection, searching, and analytics, I found Elasticsearch to be an ideal choice for the following reasons:

  • Performance: queries over millions of records return within milliseconds, thanks to document indexing with the Lucene engine running under the hood.
  • Scalability: Elasticsearch can be expanded by simply configuring new nodes when more resources are needed.
  • Integration: it is compatible with the Elastic Stack (Beats: Metricbeat, Filebeat, Heartbeat, etc.) and other tools (Fluentd, Grafana, etc.), supporting many ways to monitor multiple systems and services.

Installation

At the time of writing, the latest Elasticsearch is version 7.x, which introduces numerous breaking changes and does not yet work smoothly with our current services, so ES 6.8.5 is selected. The installation is documented in detail on the official website. Since our server runs CentOS 7, the RPM installation method is used.

yum install -y java-1.8.0-openjdk-devel

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.5.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.5.rpm.sha512
shasum -a 512 -c elasticsearch-6.8.5.rpm.sha512
sudo rpm --install elasticsearch-6.8.5.rpm

Follow the recommended steps below to start or stop ES.

sudo chkconfig --add elasticsearch
sudo -i service elasticsearch start
sudo -i service elasticsearch stop
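On CentOS 7 the SysV-style commands above are redirected to systemd under the hood; the native equivalents are below (assuming the RPM installed the bundled elasticsearch.service unit, which it does by default):

```shell
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
sudo systemctl stop elasticsearch.service
```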

Configuration

System scope

This is where you can configure the location of the Elasticsearch or Java installation; keep the defaults. In version 7.x, a separate Java installation is no longer required.

nano /etc/sysconfig/elasticsearch
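For reference, the environment overrides in this file look roughly like the following (variable names follow the 6.x RPM packaging; the values shown are illustrative defaults, not settings you must change):

```
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
ES_PATH_CONF=/etc/elasticsearch
MAX_OPEN_FILES=65536
```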

Application scope

elasticsearch.yml

Within the elasticsearch.yml file, generic information for the targeted cluster, node, data storage path, etc. can be customized to override the default settings. In this example, I changed the cluster name, node name, and Elasticsearch data location.

nano /etc/elasticsearch/elasticsearch.yml
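A minimal sketch of the settings changed in this example (the cluster name matches the elastic01.log file shown later; the node name and data path are illustrative):

```yaml
cluster.name: elastic01
node.name: node01
path.data: /elasticdata
```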

jvm.options

This is where the JVM resources dedicated to your Elasticsearch application are configured. The initial and maximum heap sizes are the settings to pay attention to.

Best practice is to allocate half of your available system memory to achieve optimal performance with Elasticsearch. Our CentOS box has 3GB of RAM free when running alongside other applications, so a heap size of 1.5GB would be recommended. However, for experimental purposes with a few small logs, I reserved only 1GB as below. Check your available RAM with the command free -m.

nano /etc/elasticsearch/jvm.options
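With 1GB reserved, the heap lines in jvm.options become the following (initial -Xms and maximum -Xmx should be set equal to avoid heap resizing pauses):

```
-Xms1g
-Xmx1g
```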

Change elastic data location

In some cases there is a need to store Elasticsearch data on a separate drive for better management; the steps below will help. For instance, I want to store data in /elasticdata/.

mkdir /elasticdata/

Check the user/group running the Elasticsearch application:

cat /etc/passwd

Change the owner of the elasticdata location to the Elasticsearch application user:

chown elasticsearch:elasticsearch /elasticdata/

Restart the Elasticsearch service; you can check the RAM status and running services afterwards.

service elasticsearch restart
free -m
ps -eaf|grep elastic

Log exploration

The log will be generated in /var/log/elasticsearch/

ls -latr /var/log/elasticsearch/

Explore the log:

tail /var/log/elasticsearch/elastic01.log

Quick tour with Elasticsearch

Physical data

You can check the data in our node, which is currently node 0. It is mostly empty as there are no indices available.

ls /elasticdata/nodes/0/

Curl command

Now you can verify that Elasticsearch is working with simple queries as below to check cluster health and statistics:

curl -XGET 'localhost:9200/_cluster/health?pretty'

curl -XGET 'localhost:9200/_cluster/stats?human&pretty'
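If you query these endpoints from code, the response is plain JSON and easy to inspect; a minimal sketch (the sample response below is illustrative and trimmed, not captured from a real cluster):

```python
import json

def cluster_status(health_json):
    """Extract the status field (green/yellow/red) from a _cluster/health response."""
    return json.loads(health_json)["status"]

# Illustrative, trimmed response; the field names follow the _cluster/health API.
sample = '{"cluster_name": "elastic01", "status": "yellow", "number_of_nodes": 1}'
```

Note that a single-node cluster typically reports yellow once indices exist, because replica shards cannot be allocated on the same node as their primaries.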

Python ingestion with requests library

First, you can dump some fake data with a simple Python 2 script as below.

from datetime import timedelta, datetime
import random
import string
import requests
import json

f = requests.Session()
headers = {'content-type': 'application/json'}
start_date = datetime(2019, 11, 25)
for x in range(60*48):
    print x,
    single_date = start_date + timedelta(minutes=x)
    dt = single_date.strftime("%Y-%m-%d")
    url = "http://localhost:9200/{}-fake/fake".format(dt)
    doc = {
        "user": "fauie.com",
        "post_date": single_date.isoformat(),
        "message": ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(12)),
        "x": x,
        "cpu": random.randint(0, 100),
        "io": random.randint(0, 100)
    }
    resp = f.post(url, data=json.dumps(doc), headers=headers, verify=False)
    print resp.text
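Indexing one document per HTTP request is slow for 2,880 documents. Elasticsearch's _bulk endpoint accepts many documents in one request as newline-delimited JSON; a minimal sketch of building such a payload (build_bulk_body is a hypothetical helper written for this example; the index and type names mirror the script above):

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON _bulk payload: an action line, then a source line, per doc.

    `docs` is an iterable of (index_name, doc_dict) pairs; the `fake` type
    matches the mapping type used in the script above (ES 6.x still uses types).
    """
    lines = []
    for index_name, doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_type": "fake"}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

# POST the result to http://localhost:9200/_bulk with
# the Content-Type: application/x-ndjson header.
body = build_bulk_body([
    ("2019-11-25-fake", {"user": "fauie.com", "cpu": 42, "io": 7}),
    ("2019-11-25-fake", {"user": "fauie.com", "cpu": 13, "io": 99}),
])
```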

Python API

Once you get familiar, take advantage of the available Python API client (see the link below) to get your job done in a timely manner.
https://elasticsearch-py.readthedocs.io/en/master/
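As a minimal sketch, the same ingestion with the elasticsearch-py 6.x client might look like this (gen_docs and ingest are hypothetical helpers written for this example; the cluster address is assumed to be localhost:9200):

```python
from datetime import timedelta, datetime
import random

def gen_docs(start, minutes):
    """Yield one fake metric document per minute, like the script above."""
    for x in range(minutes):
        ts = start + timedelta(minutes=x)
        yield {
            "user": "fauie.com",
            "post_date": ts.isoformat(),
            "cpu": random.randint(0, 100),
            "io": random.randint(0, 100),
        }

def ingest(minutes=60 * 48, host="localhost:9200"):
    """Index generated docs into daily indices; needs a running 6.x cluster."""
    from elasticsearch import Elasticsearch  # pip install "elasticsearch>=6,<7"
    es = Elasticsearch([host])
    for doc in gen_docs(datetime(2019, 11, 25), minutes):
        index_name = doc["post_date"][:10] + "-fake"  # e.g. 2019-11-25-fake
        es.index(index=index_name, doc_type="fake", body=doc)
```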

