#hadoop

Hadoop handles very large files well, but it runs into performance problems when it has to store a large number of small files. The reason is explained in detail here. In short, every single file on a data node needs about 150 bytes of RAM on the name node. The more files there are, the more memory is required, which consequently degrades the performance of the whole Hadoop cluster.
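To get a feel for the numbers, here is a minimal sketch of the arithmetic in Python. It assumes the approximate 150 bytes per file quoted above and ignores everything else the name node keeps in memory; the function name and constant are made up for this illustration.

```python
# Rough estimate of name node memory consumed by file metadata.
# BYTES_PER_FILE is the approximate figure quoted in the text, not a measured value.
BYTES_PER_FILE = 150


def namenode_memory_bytes(file_count: int, bytes_per_file: int = BYTES_PER_FILE) -> int:
    """Return the estimated name node RAM (in bytes) needed to track file_count files."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    return file_count * bytes_per_file


if __name__ == "__main__":
    # Compare a modest cluster with one that holds a very large number of small files.
    for count in (1_000_000, 100_000_000):
        gigabytes = namenode_memory_bytes(count) / 1e9
        print(f"{count:>11,} files -> {gigabytes:.2f} GB")
```

With this estimate, 100 million small files already cost around 15 GB of name node RAM, no matter how little data each file holds.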


Kerberos is an authentication protocol that issues tickets to allow nodes to communicate over a non-secured network. Each user must periodically obtain a fresh ticket by running the kinit command. In Kerberos, users are called principals. Principals can be divided roughly into the following groups:

  • System users – principals for communication between services in Hadoop cluster
  • Common users
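As an illustration of the periodic kinit step, here is a small Python sketch that renews a ticket from a keytab only when the current one is missing or expired. It assumes the MIT Kerberos command-line tools (kinit, klist) are on the PATH; the helper names, the principal, and the keytab path are hypothetical.

```python
import subprocess


def build_kinit_command(principal: str, keytab: str) -> list:
    """Build a kinit call that authenticates `principal` from a keytab file."""
    # -k: use a keytab instead of prompting for a password; -t: path to that keytab.
    return ["kinit", "-k", "-t", keytab, principal]


def has_valid_ticket() -> bool:
    """Return True if the credential cache holds an unexpired ticket."""
    # klist -s prints nothing and reports validity through its exit status.
    return subprocess.run(["klist", "-s"]).returncode == 0


def ensure_ticket(principal: str, keytab: str) -> None:
    """Request a new ticket only when no valid one is available."""
    if not has_valid_ticket():
        subprocess.run(build_kinit_command(principal, keytab), check=True)
```

A system user such as a service principal would typically run something like `ensure_ticket("hdfs/node1.example.com@EXAMPLE.COM", "/etc/security/keytabs/hdfs.keytab")` from a scheduled job (both values are placeholders), while a common user would usually just type kinit and enter a password.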


YARN stands for Yet Another Resource Negotiator. It was introduced in Hadoop version 2 so that a cluster is no longer limited to MapReduce and can also run other data processing frameworks such as Spark, Storm, etc.

