While researching a good data store for log collection, search, and analytics, I found Elasticsearch to be an ideal choice for the following reasons:
- Performance: queries over millions of records can return within milliseconds, thanks to the document indexing done by the Lucene engine running under the hood.
- Scalability: Elasticsearch can be expanded by simply configuring new nodes when more resources are needed.
- Integration: it is compatible with the Elastic Stack (Beats: Metricbeat, Filebeat, Heartbeat, etc.) and with other tools (Fluentd, Grafana, etc.), which together cover many purposes for monitoring multiple systems and services.
At the time of writing, the latest Elasticsearch is version 7.x, which has numerous breaking changes and does not work 100% smoothly with our current services, so ES 6.8.5 was selected. The installation is documented in detail on the official website. Since our server is running CentOS 7, the RPM installation method is used.
yum install -y java-1.8.0-openjdk-devel
Follow the recommended steps so that the ES service can be started and stopped:
sudo chkconfig --add elasticsearch
This is where you can configure the location of the Elasticsearch or Java installation; keep the defaults. In version 7.x, a separate Java installation is not required.
Within the elasticsearch.yml file, generic settings for the target cluster, node, data storage path, etc. can be customized to override the defaults. In this example, I changed the cluster name, the node name, and the Elasticsearch data location.
This is where the JVM resources dedicated to your Elasticsearch application are configured. The initial and maximum heap sizes are the settings to pay attention to.
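As a minimal sketch of the kind of overrides described above, the Python snippet below renders the three settings as `elasticsearch.yml` lines. The keys `cluster.name`, `node.name`, and `path.data` are standard Elasticsearch settings; the values `my-cluster` and `node-1` are placeholders rather than the names used on our server, and `/elasticdata` is the data location used later in this post. The helper `render_overrides` is a made-up name for illustration.

```python
def render_overrides(settings):
    """Render a flat dict of settings as 'key: value' lines for elasticsearch.yml."""
    lines = ["{}: {}".format(key, value) for key, value in settings.items()]
    return "\n".join(lines) + "\n"


overrides = {
    "cluster.name": "my-cluster",  # placeholder value, not the real cluster name
    "node.name": "node-1",         # placeholder value, not the real node name
    "path.data": "/elasticdata",   # data location used later in this post
}

# Print the lines that would be added to (or changed in) elasticsearch.yml.
print(render_overrides(overrides), end="")
```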
Best practice is to allocate half of your available system memory to the heap to get the best performance from Elasticsearch. Our CentOS server has 3 GB of RAM free when running alongside other applications, so the recommended heap size is 1.5 GB. However, for experimenting with a few small logs, I reserved only 1 GB as below. Check your available RAM with the command
In some cases, Elasticsearch data needs to be stored on a separate drive for better management; the steps below will help. For instance, I want to store data in
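The half-of-memory rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are made up, the 31 GB cap is an assumption based on the common advice to keep the heap below roughly 32 GB, and giving `-Xms` and `-Xmx` the same value is likewise an assumption based on the usual recommendation, not something this setup requires.

```python
def recommended_heap_mb(available_mb, cap_mb=31 * 1024):
    """Return half of the available memory in MB, never above cap_mb (assumed cap)."""
    return min(available_mb // 2, cap_mb)


def jvm_heap_options(heap_mb):
    """Return -Xms/-Xmx lines with the initial and maximum heap set to the same size."""
    return ["-Xms{}m".format(heap_mb), "-Xmx{}m".format(heap_mb)]


available_mb = 3 * 1024                       # 3 GB free, as on our server
heap_mb = recommended_heap_mb(available_mb)   # half of 3 GB is 1536 MB (1.5 GB)
print(jvm_heap_options(heap_mb))

# For the small experiment in this post, 1 GB was reserved instead.
print(jvm_heap_options(1024))
```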
Check the user/group running the elasticsearch application:
Change the owner of the elasticdata location to the elasticsearch application user:
chown elasticsearch:elasticsearch /elasticdata/
Restart the elasticsearch service; you can check the RAM status and running services after that.
service elasticsearch restart
The log will be generated in
ls -latr /var/log/elasticsearch/
Explore the log:
tail /var/log/elasticsearch/elastic01.log
You can check the data on our node, which is currently node 0. It is mostly empty, as there are no indices available yet.
Now you can verify that Elasticsearch is working with the simple queries below, which check the cluster health and statistics:
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_cluster/stats?human&pretty'
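The same health check can also be scripted. The following is a minimal Python 3 sketch using only the standard library; it assumes the node listens on `http://localhost:9200` without authentication, and the helper names (`fetch_cluster_health`, `summarize_health`, `is_usable`) are made up for illustration. The fields read here (`cluster_name`, `status`, `number_of_nodes`, `unassigned_shards`) are part of the `_cluster/health` response.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url="http://localhost:9200"):
    """GET /_cluster/health and return the parsed JSON body as a dict."""
    with urlopen(base_url + "/_cluster/health", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_health(health):
    """Build a one-line summary from a _cluster/health response dict."""
    return "{}: status={}, nodes={}, unassigned_shards={}".format(
        health.get("cluster_name", "?"),
        health.get("status", "unknown"),
        health.get("number_of_nodes", 0),
        health.get("unassigned_shards", 0),
    )


def is_usable(health):
    """Treat green and yellow as usable; red means some primary shards are missing."""
    return health.get("status") in ("green", "yellow")
```

With the service running, `print(summarize_health(fetch_cluster_health()))` prints the summary line. On a single node, a yellow status is common once indices with replicas exist, because the replica shards have no second node to be assigned to.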
To start, you can load some fake data with a simple Python 2 script as below.
from datetime import timedelta, date, datetime
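A minimal Python 3 sketch of this kind of generator is shown here; it is not the original script. The index name `fake-logs`, the field names, and the log levels are all made up for illustration. It builds a newline-delimited body in the format expected by the `_bulk` API, with `_type` set to `_doc`, which 6.x accepts (in 7.x types are deprecated).

```python
import json
import random
from datetime import datetime, timedelta

LEVELS = ["INFO", "WARN", "ERROR"]  # made-up log levels


def fake_log_docs(count, start, step=timedelta(seconds=30), seed=0):
    """Yield `count` fake log documents with evenly spaced timestamps."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    for i in range(count):
        yield {
            "@timestamp": (start + i * step).isoformat(),
            "level": rng.choice(LEVELS),
            "message": "fake event {}".format(i),
        }


def to_bulk_body(docs, index="fake-logs", doc_type="_doc"):
    """Serialize documents as action/source line pairs for the _bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


body = to_bulk_body(fake_log_docs(3, datetime(2020, 1, 1)))
print(body, end="")
```

The resulting body can be sent with a POST to `localhost:9200/_bulk` using the header `Content-Type: application/x-ndjson`.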
Once you are familiar with it, take advantage of the available Python API client (see the link below) to get your job done in a timely manner.