Setup Spark local development

Aug 13 2020 Data

This post provides a general setup for starting Spark development on a local computer. It is helpful for getting started, experimenting with Spark's features, or even running a small project.

Spark fundamental

Jun 6 2020 Data

Spark is a unified engine designed for large-scale distributed data processing and machine learning on compute clusters, whether running on-premises or in the cloud. It replaces Hadoop MapReduce and keeps intermediate results in memory, which can make it much faster than Hadoop MapReduce (up to 100x is the commonly cited figure for some in-memory workloads).

Kafka fundamental

Mar 20 2020 Data

Apache Kafka is a distributed streaming platform. It is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault tolerant, wicked fast, and runs in production in thousands of companies.

Code-server, the VSCode for cloud

Dec 31 2019 DevOps

With the wide adoption of the cloud, a browser-based integrated development environment (IDE) has become a need for boosting developers’ productivity. People can collaboratively view, edit, and commit from any device with an internet-connected browser. Additionally, you no longer have to worry about setting up your local development configuration. You can consider Cloud9 (AWS) or a paid service like codeanywhere.

Quick tour with Elasticsearch 6.x

Nov 26 2019 Data

While researching a good data storage technology for log collection, search, and analytics, I found Elasticsearch to be an ideal choice for the following reasons:

Performance: fast queries over millions of records within milliseconds, thanks to the document indexing technique of the Lucene engine running under the hood.
Scalability: Elasticsearch can be expanded by simply configuring new nodes when more resources are needed.
Integration: it is compatible with the Elastic Stack (Beats: Metricbeat, Filebeat, Heartbeat, etc.) and other tools (Fluentd, Grafana, etc.), which support many purposes for monitoring multiple systems and services.
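
To give a rough feel for why indexed lookups are fast, here is a minimal sketch of an inverted index, the general idea behind Lucene-style document indexing. It is a toy illustration only, not how Elasticsearch or Lucene are actually implemented; the function names (`tokenize`, `build_inverted_index`, `search`) and the sample log lines are made up for this example.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Look up each term's postings set directly instead of scanning documents.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical log lines, made up for illustration.
logs = {
    1: "ERROR disk full on node-1",
    2: "INFO service started on node-2",
    3: "ERROR timeout on node-2",
}
index = build_inverted_index(logs)
matches = search(index, "error node")
```

A query only touches the postings sets of its own terms, so the cost depends on how many documents contain those terms rather than on the total number of stored documents. A real engine adds much more on top of this (relevance scoring, compressed on-disk segments, distributed shards).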

#tutorial
