I came across this series of blog posts by Carol McDonald discussing the architecture of an end-to-end application that combines streaming data with machine learning to analyze, in real time, where and when Uber cars are clustered, so as to predict and visualize the most popular Uber locations.
Because these posts were originally published on different dates, I find it convenient to list them in one place and discuss them together.
- Part 1: End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 1: Spark Machin… Published November 28th 2016. Get an introduction to using Apache Spark’s machine learning K-means algorithm to cluster Uber data based on location. Source code: GitHub - caroljmcdonald/spark-ml-kmeans-uber
- Part 2: End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 2: Kafka and Sp… Published January 5th 2017. Learn how to use a Spark ML model in a Spark Streaming application and how to integrate Spark Streaming with MapR Streams to consume and produce messages with the Kafka API. Source code: GitHub - caroljmcdonald/mapr-sparkml-streaming-uber
- Part 3: End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 3: Real-Time Dashboard Using Vert.X. Published May 4th 2017. Build a real-time dashboard to visualize the cluster data on a Google map. Source code: GitHub - caroljmcdonald/mapr-sparkstreaming-vertx-uberheatmap
- Part 4: End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 4: Spark Streami… Published June 9th 2017. Go over how Spark Streaming writes to MapR-DB using the Spark HBase and MapR-DB Binary connector, and how to read from MapR-DB Binary using Spark SQL and DataFrames. Source code: GitHub - caroljmcdonald/mapr-sparkml-streaming-uber
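
For anyone new to the clustering idea behind Part 1, here is a minimal plain-Python sketch of K-means on pickup coordinates. This is not the code from the blogs, which use Spark ML; the function name `kmeans`, the parameters, and the sample coordinates are all made up for illustration, and squared latitude/longitude differences are used as a rough distance.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's K-means on (lat, lon) tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as starting centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the index of its nearest center.
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep a center that lost all points
        if new_centers == centers:
            break  # assignments are stable, so stop early
        centers = new_centers
    return centers, labels

# Made-up pickup locations: three near one spot, three near another.
pickups = [(40.75, -73.99), (40.76, -73.98), (40.74, -74.00),
           (40.64, -73.78), (40.65, -73.79), (40.63, -73.77)]
centers, labels = kmeans(pickups, k=2)
for point, label in zip(pickups, labels):
    print(point, "-> cluster", label)
```

In the actual application the cluster centers come from a trained Spark ML model and the points arrive as a stream, but the assign-then-recenter loop above is the core of what K-means does.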
Please share your experiences using the complete code, data, and instructions to run this application on MapR Sandbox 5.2 (including MapR-ES and Spark 2.1), along with any ideas for improving the app.