# Hadoop

Sparkify is a music streaming service similar to Spotify. Every user's activity on the Sparkify application is logged and sent to a Kafka cluster. To improve the business, the data team collects this data into a Big Data platform for further processing, analysis, and insight extraction to drive decisions. One of the focus topics is churn prediction.
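
For illustration, here is a minimal sketch of how an application event might be shipped to Kafka. The broker address, topic name (`sparkify_events`), and event schema are assumptions for the example, not the actual Sparkify pipeline:

```python
# Minimal sketch: send a Sparkify activity event to Kafka.
# Broker address, topic name, and event fields are illustrative assumptions.
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "userId": 42,          # hypothetical event schema
    "page": "NextSong",
    "ts": 1538352117000,
}
producer.send("sparkify_events", value=event)  # assumed topic name
producer.flush()
```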


Hadoop can handle very large files, but it runs into performance issues when storing too many small files. The reason is explained in detail here. In short, every file, directory, and block on the data nodes is represented as an object in the NameNode's memory, and each object takes about 150 bytes of RAM. The more files there are, the more memory is required, which consequently degrades the performance of the whole Hadoop cluster.
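
A quick back-of-envelope calculation makes the impact concrete. The two-objects-per-file assumption (one inode plus one block per small file) is a simplification:

```python
# Back-of-envelope NameNode memory estimate for the small-files problem.
# Assumes ~150 bytes per namespace object (file, directory, or block)
# and that each small file occupies exactly one block.
BYTES_PER_OBJECT = 150

def namenode_memory_bytes(num_files: int) -> int:
    # One inode object plus one block object per small file.
    return num_files * 2 * BYTES_PER_OBJECT

# 10 million small files already need roughly 3 GB of NameNode heap;
# the same data merged into far fewer large files would need a fraction of that.
print(namenode_memory_bytes(10_000_000) / 1024**3)  # ~2.79 GiB
```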

