Compact multiple small files on HDFS

Dec 5 2020 Store>Hadoop

Hadoop can handle very large files, but it runs into performance issues when there are too many small files. The reason is explained in detail here. In short, every file and block stored on the DataNodes needs roughly 150 bytes of RAM on the NameNode. The more files there are, the more NameNode memory is required, which in turn degrades the performance of the whole Hadoop cluster.
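To make the 150-byte rule of thumb concrete, here is a rough back-of-the-envelope sketch. It assumes each file costs one object plus one object per block, a 128 MB block size (the common `dfs.blocksize` default in Hadoop 2.x and later), and ignores directories. The file counts and sizes are hypothetical numbers chosen for illustration, and the result is an estimate, not a measured NameNode heap size.

```python
import math

BYTES_PER_OBJECT = 150  # approximate NameNode memory per file or block object
BLOCK_SIZE_MB = 128     # assumed HDFS block size (dfs.blocksize)


def namenode_bytes(num_files: int, file_size_mb: float) -> int:
    """Rough NameNode memory in bytes: one object per file plus one per block.

    Directories are ignored; this is an estimate based on the ~150 byte rule.
    """
    blocks_per_file = max(1, math.ceil(file_size_mb / BLOCK_SIZE_MB))
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# Hypothetical dataset: ~10 TB stored as 10 million files of 1 MB each...
small = namenode_bytes(10_000_000, 1)
# ...versus the same ~10 TB compacted into 128 MB files (78,125 files).
compacted = namenode_bytes(78_125, 128)

print(f"small files: {small / 1e9:.1f} GB")        # small files: 3.0 GB
print(f"compacted:   {compacted / 1e6:.1f} MB")    # compacted:   23.4 MB
```

Compacting the same amount of data into block-sized files cuts the number of objects the NameNode has to track by more than two orders of magnitude, which is the motivation for merging small files.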

Quick tour with Elasticsearch 6.x

Nov 26 2019 Store

While researching a good data storage technique for log collection, searching, and analytics, I found Elasticsearch to be an ideal choice for the following reasons:

Performance: queries over millions of records can return within milliseconds, thanks to the document indexing done by the Lucene engine running under the hood.
Scalability: Elasticsearch can be scaled out simply by configuring new nodes when more resources are needed.
Integration: it is compatible with the Elastic Stack (Beats such as Metricbeat, Filebeat, Heartbeat, etc.) and with other tools (Fluentd, Grafana, etc.), which makes it useful for monitoring multiple systems and services.
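The speed mentioned under Performance comes largely from Lucene's inverted index, which maps each term to the documents containing it, so a query looks up terms instead of scanning every document. The sketch below is a toy illustration of that idea only, using hypothetical log lines and helper names (`build_index`, `search`); Lucene's real index also tracks term frequencies and positions, applies analyzers, and compresses its postings lists.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Toy inverted index: map each lowercase whitespace token to doc ids."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing ALL query terms (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect postings lists
    return result


# Hypothetical log lines used only for illustration.
logs = {
    1: "ERROR disk full on node-1",
    2: "INFO backup finished on node-2",
    3: "ERROR timeout on node-2",
}
idx = build_index(logs)
print(sorted(search(idx, "error node-2")))  # [3]
```

Each lookup touches only the postings of the query terms, which is why the cost of a query grows with the number of matches rather than with the total number of stored documents.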
