Building an end-to-end data pipeline to ingest, process, and store high-speed Uber trip data.
In this session, we will explore the power of streaming real-time events in the context of the IoT and connected cars.
We will look at a solution that combines real-time data streams with iterative machine learning to predict and visualize popular Uber trip locations in New York City. We will cover ingesting the real-time data (location, date, time), analyzing it to produce location clusters, and providing real-time dashboards. You will see the end-to-end process required to build this application using Apache APIs for Kafka, Spark, HBase, and other technologies.
According to Gartner, by 2020 smart cities will be using about 1.39 billion connected cars, IoT sensors, and devices. Analyzing behavior patterns within cities will allow optimization of traffic, better planning decisions, and smarter advertising. You may be excited about the possibilities of exploiting data streams to gain actionable insights from continuously produced data in real time, but find it difficult to conceptualize how to implement such a solution and how it can fit into your business. In this presentation, we will walk you through an architecture that combines data streaming with machine learning to enhance an Uber service with the ability to analyze, predict, and visualize the most popular taxi pick-up and drop-off locations by date and time, so that drivers' locations can be optimized.
• Part 1: Spark machine learning
• Part 2: Kafka and Spark Streaming
• Part 3: Real-time dashboard using Vert.x
• Part 4: Spark Streaming, DataFrames, and HBase
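
To give a feel for the "location clusters" idea behind Part 1, here is a minimal sketch in plain Python. It assumes a k-means-style clustering of (latitude, longitude) pick-up points; the abstract does not name the algorithm, and the session itself uses Spark rather than this hand-written loop. The function names and the sample coordinates are hypothetical, made up only for illustration.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)

# Hypothetical sample pick-ups: three near one spot, three near another.
SAMPLE_PICKUPS: List[Point] = [
    (40.750, -73.990), (40.752, -73.988), (40.748, -73.992),
    (40.640, -73.780), (40.642, -73.778), (40.638, -73.782),
]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: List[Point], k: int,
           iterations: int = 20, seed: int = 0) -> List[Point]:
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Euclidean distance on raw degrees is an assumption that is
    only reasonable for a small area such as a single city.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for i, members in enumerate(clusters):
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lon = sum(m[1] for m in members) / len(members)
                updated.append((lat, lon))
            else:
                updated.append(centroids[i])  # keep an empty cluster in place
        if updated == centroids:
            break  # assignments are stable, so stop early
        centroids = updated
    return centroids


centers = kmeans(SAMPLE_PICKUPS, k=2)
print(centers)
```

Each returned centroid can be read as a "popular location"; in a streaming setting, an incoming trip would be tagged with the index returned by `nearest` before being counted or stored.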