Big Data Stack

Introduction

Below is the list of software in my main stack for working with Big Data.

Apache Hadoop (http://hadoop.apache.org): Reliable, scalable, distributed computing and storage
Apache Airflow (https://airflow.apache.org): The scheduler that handles all job triggers
Apache Spark (https://spark.apache.org/): High-performance batch processing
Apache Flink (https://flink.apache.org): Stateful computations over data streams
Apache HBase (https://hbase.apache.org): NoSQL database
Apache Cassandra (http://cassandra.apache.org): Manage massive amounts of data, fast, without losing sleep
Apache Kafka (https://kafka.apache.org): Distributed streaming platform
Apache Hive (https://hive.apache.org): Data warehouse software
PrestoDB (http://prestodb.github.io): Distributed SQL query engine for Big Data
Apache Superset (https://superset.incubator.apache.org): Modern, enterprise-ready business intelligence web application
Alluxio (https://www.alluxio.org): Memory-speed virtual distributed storage
Druid (http://druid.io): High-performance real-time analytics database

Detailed Usage

Apache Airflow

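As listed above, Airflow is the scheduler that triggers every job in the stack. The snippet below is a minimal sketch of such a DAG, assuming a hypothetical nightly Spark batch job; the DAG id, schedule, and spark-submit command are illustrative placeholders, not the actual production setup.

    # Minimal Airflow DAG sketch: trigger a nightly Spark batch job.
    # The DAG id, schedule, and command are illustrative assumptions.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_spark_batch",        # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",        # trigger once per day
        catchup=False,
    ) as dag:
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            # Hypothetical spark-submit call; replace with the real batch job.
            bash_command="spark-submit --master yarn /jobs/daily_batch.py",
        )
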
Apache Spark

I use it for my R&D stream-processing projects. In practice, because of business requirements, I only process data in batch. All of my Flink projects are R&D projects to get familiar with data streaming.
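
For the production batch side, a Spark job usually boils down to read, aggregate, write. Here is a minimal PySpark sketch of that pattern, assuming hypothetical HDFS paths and column names rather than the actual pipelines.

    # Minimal PySpark batch sketch: read, aggregate, write.
    # Paths and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily_batch").getOrCreate()

    # Read raw events from HDFS (hypothetical location).
    events = spark.read.parquet("hdfs:///data/events/")

    # Aggregate events per day and type (hypothetical columns).
    daily_counts = (
        events
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("event_count"))
    )

    # Write the result back to HDFS, overwriting the previous run.
    daily_counts.write.mode("overwrite").parquet("hdfs:///data/daily_counts/")

    spark.stop()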