#spark

Spark & Kafka docker-compose

Sep 20 2020 Process>Spark

To quickly launch Spark and Kafka for local development, Docker is a top choice because of its flexibility and isolated environment. It saves a lot of time otherwise spent manually installing packages and resolving conflicts between them.

Setup Spark local development

Aug 13 2020 Process>Spark

This post provides a general setup for Spark development on a local computer. It is helpful for getting started, experimenting with Spark functionality, or even running a small project.

Spark hierarchy

Jul 12 2020 Process>Spark

Understanding the Spark hierarchy in terms of hardware and software design will help you develop better-optimized Spark applications.

Spark fundamental

Jun 6 2020 Process>Spark

Spark is a unified engine designed for large-scale distributed data processing and machine learning on compute clusters, whether running on-premises or in the cloud. It improves on Hadoop MapReduce by keeping intermediate computations in memory, which can make it much faster (up to 100x on some workloads) than Hadoop MapReduce.
