# Big Data Stack

## Introduction
Below is the list of software in my main stack for working with Big Data:
| Software | Description |
| --- | --- |
| [Apache Hadoop](http://hadoop.apache.org) | Reliable, scalable, distributed computing and storage |
| [Apache Airflow](https://airflow.apache.org) | The scheduler that handles all job triggers |
| [Apache Spark](https://spark.apache.org/) | High-performance batch processing |
| [Apache Flink](https://flink.apache.org) | Stateful computations over data streams |
| [Apache HBase](https://hbase.apache.org) | NoSQL database |
| [Apache Cassandra](http://cassandra.apache.org) | Manage massive amounts of data, fast, without losing sleep |
| [Apache Kafka](https://kafka.apache.org) | Distributed streaming platform |
| [Apache Hive](https://hive.apache.org) | Data warehouse software |
| [PrestoDB](http://prestodb.github.io) | Distributed SQL query engine for Big Data |
| [Apache Superset](https://superset.incubator.apache.org) | Modern, enterprise-ready business intelligence web application |
| [Alluxio](https://www.alluxio.org) | Memory-speed virtual distributed storage |
| [Druid](http://druid.io) | High-performance real-time analytics database |
## Detailed Usage
### Apache Airflow
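Airflow models a pipeline as a DAG of tasks and runs each task only after its upstream dependencies succeed. The core idea can be sketched without Airflow itself, using only Python's standard library (the task names below are hypothetical, purely for illustration):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical ETL pipeline: each task maps to the set of tasks it depends on,
# the same dependency structure an Airflow DAG would declare.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A scheduler must run tasks in a dependency-respecting (topological) order.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'load', 'report']
```

A real Airflow deployment adds scheduling intervals, retries, and distributed executors on top of this ordering idea.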
### Apache Spark
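Spark's batch model applies a map step to each partition of a distributed dataset and then reduces the partial results. A minimal sketch of that map/reduce shape in plain Python (no Spark API; the toy partitions are invented for illustration):

```python
from collections import Counter
from functools import reduce

# Toy in-memory "partitions" standing in for a distributed dataset.
partitions = [
    ["big data", "spark spark"],
    ["data data big"],
]

# Map phase: each partition produces its own partial word counts.
mapped = [Counter(w for line in part for w in line.split()) for part in partitions]

# Reduce phase: merge the partial counts into a global result.
totals = reduce(lambda a, b: a + b, mapped, Counter())
print(totals)  # Counter({'data': 3, 'big': 2, 'spark': 2})
```

In Spark the same logic would be an RDD/DataFrame transformation pipeline, with the partitions and the reduce step distributed across the cluster.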
### Apache Flink
I use Flink for my R&D stream-processing projects. Because of current business requirements I only process data in batch, so all of my Flink projects are R&D work to get familiar with data streaming.
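The key idea I explore in those projects is Flink's keyed state: per-key state that is updated as each event arrives. A framework-free sketch of that idea in plain Python (illustrative only, not the Flink API):

```python
from collections import defaultdict

def keyed_running_count(events):
    """For each incoming event key, update per-key state and emit
    (key, running_count) -- the essence of a keyed stateful stream job."""
    state = defaultdict(int)  # per-key state, like Flink's ValueState
    for key in events:
        state[key] += 1
        yield key, state[key]

out = list(keyed_running_count(["a", "b", "a", "a", "b"]))
print(out)  # [('a', 1), ('b', 1), ('a', 2), ('a', 3), ('b', 2)]
```

Flink adds what this sketch omits: fault-tolerant state backends, checkpointing, and event-time semantics.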