Kafka

Spark is powerful not only for batch data processing but also for streaming. Starting with version 2.x, Spark provides a new stream-processing paradigm called Structured Streaming, built on the Spark SQL engine. It makes stream processing easier for developers than the DStream API of earlier versions. This post walks through the basics of getting started with Spark Structured Streaming and covers the settings needed to work with the most common streaming technology, Kafka.
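As a rough sketch of what the Kafka setting looks like, the PySpark code below subscribes to a topic and prints the decoded records to the console. The broker address, topic name, and checkpoint path are placeholders chosen for illustration. At runtime Spark also needs the `spark-sql-kafka-0-10` connector package on its classpath. Only the option names (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`) come from Spark's Kafka source; the helper function names are hypothetical.

```python
# Sketch: consume a Kafka topic with Spark Structured Streaming (PySpark).
# Assumptions (hypothetical): broker "localhost:9092", topic "events",
# checkpoint dir "/tmp/checkpoints/events". The spark-sql-kafka-0-10
# connector must be available to Spark when the query actually runs.


def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the option map for Spark's built-in "kafka" source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_console_query(spark, bootstrap_servers, topic, checkpoint_dir):
    """Start a streaming query that prints Kafka records to the console.

    `spark` is an existing pyspark.sql.SparkSession supplied by the caller.
    """
    df = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()  # columns include key, value, topic, partition, offset, timestamp
    )
    # Kafka keys and values arrive as binary; cast them to strings for display.
    decoded = df.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )
    return (
        decoded.writeStream
        .format("console")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_dir)
        .start()  # returns a StreamingQuery handle
    )
```

With a `SparkSession` named `spark` already created, calling `start_console_query(spark, "localhost:9092", "events", "/tmp/checkpoints/events").awaitTermination()` would block while records stream in. Passing the session in, rather than creating it inside the helper, keeps the option-building logic testable without a cluster.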

Apart from its core client APIs, which require each external system to install a Kafka client library for integration, Kafka also offers Kafka Connect, which exposes a REST API for more flexible communication with different systems. This note collects a list of frequently used basic APIs.
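For a feel of how these REST calls fit together, here is a minimal Python sketch that builds requests for a handful of Kafka Connect endpoints using only the standard library. It assumes a Connect worker at `http://localhost:8083` (the default REST port); the class name `ConnectClient`, the helper `send`, and the example connector name are hypothetical. Building `Request` objects separately from sending them makes it easy to inspect a call before firing it at a real worker.

```python
import json
import urllib.request
from urllib.parse import quote


class ConnectClient:
    """Hypothetical helper that builds Kafka Connect REST API requests.

    Assumes a Connect worker at base_url (8083 is the default REST port).
    """

    def __init__(self, base_url="http://localhost:8083"):
        self.base_url = base_url.rstrip("/")

    def _request(self, method, path, body=None):
        # Serialize the JSON payload, if any, and attach JSON headers.
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            self.base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json",
                     "Accept": "application/json"},
        )

    def _path(self, name, suffix=""):
        # URL-encode connector names so spaces or slashes stay valid.
        return f"/connectors/{quote(name, safe='')}{suffix}"

    def list_connectors(self):
        return self._request("GET", "/connectors")

    def create_connector(self, name, config):
        return self._request("POST", "/connectors", {"name": name, "config": config})

    def get_status(self, name):
        return self._request("GET", self._path(name, "/status"))

    def update_config(self, name, config):
        # PUT creates the connector if missing, otherwise replaces its config.
        return self._request("PUT", self._path(name, "/config"), config)

    def pause(self, name):
        return self._request("PUT", self._path(name, "/pause"))

    def resume(self, name):
        return self._request("PUT", self._path(name, "/resume"))

    def restart(self, name):
        return self._request("POST", self._path(name, "/restart"))

    def delete(self, name):
        return self._request("DELETE", self._path(name))


def send(request, timeout=10):
    """Execute a request; decode JSON, or return None for empty bodies (e.g. 204)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    return json.loads(raw) if raw else None
```

Against a running worker, `send(ConnectClient().get_status("file-source"))` would return the status JSON for a connector named `file-source` (a placeholder name), and `send(ConnectClient().list_connectors())` would return the list of connector names.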

When working with Kafka, I often need to interact with the cluster quickly from the command line. This post collects the commands I use most often in daily work on projects that integrate Kafka.
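To give a flavor of these commands, here is a small Python helper that assembles a few common invocations of the scripts shipped in the Apache Kafka `bin/` directory and can run them via `subprocess`. The install path and broker address are assumptions, and the function names are hypothetical. On Confluent Platform the scripts are typically named without the `.sh` suffix, and Kafka releases before 2.2 used `--zookeeper` instead of `--bootstrap-server` for `kafka-topics.sh`.

```python
import shlex
import subprocess

# Hypothetical defaults: adjust to your own installation.
KAFKA_BIN = "bin"             # bin/ directory of an Apache Kafka download
BOOTSTRAP = "localhost:9092"  # broker address used in these examples


def _tool(name):
    return f"{KAFKA_BIN}/{name}.sh"


def list_topics(bootstrap=BOOTSTRAP):
    return [_tool("kafka-topics"), "--bootstrap-server", bootstrap, "--list"]


def create_topic(topic, partitions, replication_factor, bootstrap=BOOTSTRAP):
    return [_tool("kafka-topics"), "--bootstrap-server", bootstrap,
            "--create", "--topic", topic,
            "--partitions", str(partitions),
            "--replication-factor", str(replication_factor)]


def describe_topic(topic, bootstrap=BOOTSTRAP):
    return [_tool("kafka-topics"), "--bootstrap-server", bootstrap,
            "--describe", "--topic", topic]


def consume_from_beginning(topic, max_messages=None, bootstrap=BOOTSTRAP):
    argv = [_tool("kafka-console-consumer"), "--bootstrap-server", bootstrap,
            "--topic", topic, "--from-beginning"]
    if max_messages is not None:
        # Without a limit the console consumer keeps running until interrupted.
        argv += ["--max-messages", str(max_messages)]
    return argv


def describe_group(group, bootstrap=BOOTSTRAP):
    return [_tool("kafka-consumer-groups"), "--bootstrap-server", bootstrap,
            "--describe", "--group", group]


def run(argv):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

`print(shlex.join(create_topic("orders", 3, 1)))` prints the equivalent shell command for copying into a terminal, while `run(list_topics())` executes it against the assumed broker. Returning argument lists instead of shell strings avoids quoting problems when topic or group names contain special characters.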

Apache Kafka is a distributed streaming platform. It is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault tolerant, wicked fast, and runs in production in thousands of companies.
