This post provides a general setup for starting Spark development on a local computer. It is helpful for getting started, experimenting with Spark's functionality, or even running a small project.

Read More

Understanding the Spark hierarchy in terms of hardware and software design will help you develop an optimized Spark application.

Read More

Apart from the core APIs, which require external systems to install a Kafka client for integration, Kafka also supports the Kafka Connect API with a REST API for more flexibility in communicating with different systems. This note collects a list of the basic APIs I use most often.

Read More

When working with Kafka, I often need to quickly interact with a Kafka cluster via the command line. This post is my collection of commands I use frequently in daily work on projects that integrate Kafka.

Read More

Spark is a unified engine designed for large-scale distributed data processing and machine learning on compute clusters, whether running on-premises or in the cloud. It improves on Hadoop MapReduce by keeping intermediate computations in memory, making it much faster (up to 100x for some in-memory workloads) than Hadoop MapReduce.

Read More

Kerberos is an authentication protocol that issues tickets to allow communication between nodes on a non-secure network. Each user must periodically obtain or renew a ticket with the kinit command. In Kerberos, users are called principals. Principals can be divided roughly into several groups:

  • System users – principals for communication between services in Hadoop cluster
  • Common users

Read More

YARN stands for Yet Another Resource Negotiator. It was introduced in Hadoop version 2 to support data processing frameworks beyond MapReduce, such as Spark and Storm.

Read More

Apache Kafka is a distributed streaming platform. It is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault tolerant, wicked fast, and runs in production in thousands of companies.

Read More

Code-Server

With the wide adoption of cloud computing, a browser-based integrated development environment (IDE) has become a need for boosting developers’ productivity. People can collaboratively view, edit, and commit from any device with an internet-connected browser. Additionally, you no longer need to worry about setting up your local development configuration. You can consider Cloud9 (AWS) or a paid service like codeanywhere.

Read More

Kibana

Kibana is the part of the ELK stack that visualizes data from Elasticsearch. Beyond that, Kibana comes with many features and plug-ins, such as Elasticsearch node and infrastructure monitoring, user roles, life cycle management, and experimenting with queries against an Elasticsearch database.

Spend some time with the Kibana demo page to get a feel for it. Click Here.

Read More
