# Big Data Stack

## Introduction
Below is the list of software in my main stack for working with Big Data.
| Software | Description |
| --- | --- |
| [Apache Hadoop](http://hadoop.apache.org) | Reliable, scalable, distributed computing and storage |
| [Apache Airflow](https://airflow.apache.org) | The scheduler that handles all job triggers |
| [Apache Spark](https://spark.apache.org/) | High-performance batch processing |
| [Apache Flink](https://flink.apache.org) | Stateful computations over data streams |
| [Apache HBase](https://hbase.apache.org) | NoSQL database |
| [Apache Cassandra](http://cassandra.apache.org) | Manages massive amounts of data, fast, without losing sleep |
| [Apache Kafka](https://kafka.apache.org) | Distributed streaming platform |
| [Apache Hive](https://hive.apache.org) | Data warehouse software |
| [PrestoDb](http://prestodb.github.io) | Distributed SQL query engine for Big Data |
| [Apache Superset](https://superset.incubator.apache.org) | Modern, enterprise-ready business intelligence web application |
| [Alluxio](https://www.alluxio.org) | Memory-speed virtual distributed storage |
| [Druid](http://druid.io) | High-performance real-time analytics database |
## Detailed Usage
### Apache Airflow

### Apache Spark

### Apache Flink
I use Flink for my R&D stream-processing projects. In practice, because of business requirements, I only process data in batch, so all my Flink projects are R&D work to become familiar with data streaming.
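The batch vs. streaming distinction above can be sketched conceptually. The snippet below is a minimal, plain-Python illustration (not Flink API code): a batch job computes one result over a bounded dataset, while a streaming job keeps running state and emits an updated result per incoming event, which is the model Flink's DataStream API is built around.

```python
# Conceptual sketch only (plain Python, not Flink): word count done as
# a batch job over a bounded dataset vs. as a streaming job that keeps
# state and emits a result per event.
from collections import Counter
from typing import Dict, Iterable, Iterator


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch: the whole bounded dataset is available up front."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def streaming_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming: state is updated per event and results are emitted
    continuously; the input could in principle be unbounded."""
    state: Counter = Counter()
    for line in lines:          # each line arrives as an event
        state.update(line.split())
        yield dict(state)       # emit the current running counts


if __name__ == "__main__":
    data = ["big data", "big stream"]
    print(batch_word_count(data))                # one final result
    for snapshot in streaming_word_count(data):  # one result per event
        print(snapshot)
```

In real Flink the per-event state would be managed (and checkpointed) by the runtime rather than held in a local `Counter`, which is what makes its stateful stream processing fault tolerant.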