How to Perfect Lambda Architecture with Oracle Data Integrator (and Kafka / MapR-ES)

Discussion created by Azharuddin on Aug 2, 2017

"Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation. The rise of lambda architecture is correlated with the growth of big data, real-time analytics, and the drive to mitigate the latencies of map-reduce."

In this blog, I'm going to show you how to configure MapR Streams (aka MapR-ES, which implements the Kafka API) on Oracle Data Integrator with Spark Streaming to create a true lambda architecture: a speed layer complementing the batch and serving layers.

I'll skip the "hailing and praising" part about ODI in this post, but I do want to highlight one point: the mappings designed for this blog, just like every other mapping you might have designed since the very first release of ODI, will run as native code on your Hadoop/Spark cluster, 100% out of the box, without you coding a single line or worrying about how and where.

I've done this on MapR, so I can kill two birds with one stone: showing you the steps for both MapR Streams and Kafka. Since the two aren't much different in concept or in API implementation, you can easily apply the same steps if you are using Kafka.

If you are unfamiliar with MapR Streams and/or Kafka concepts, I suggest you spend some time reading about them. The content that follows assumes you know what MapR Streams and Kafka are (and, of course, ODI); even if you don't, you'll still get a good idea of the possible capabilities.

Obviously, we need to have the MapR Streams path and topics created. Unlike Kafka, MapR uses its own API via the "maprcli" command-line utility to create and define topics, so this step would be slightly different if you are using commodity Kafka. The web has plenty of examples of how to create and configure Kafka topics and servers, so you aren't alone.
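For reference, on a commodity Kafka cluster of that era (where kafka-topics.sh still talks to ZooKeeper), the equivalent step would look something like the sketch below. The ZooKeeper address, partition count, and replication factor are illustrative assumptions, not values from my environment; the topic names are the two used later in this demo:

    # Create the two demo topics on a plain Kafka cluster
    kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic registrations
    kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic registrations2

    # Verify that both topics exist
    kafka-topics.sh --list --zookeeper localhost:2181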

For the sake of this demo, I've created one path and two topics under that path. We'll let ODI consume from one of those topics (registrations) and produce to another (registrations2), so you can see how that works in action via ODI.
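On MapR, that translates to maprcli calls like the sketch below. The stream path /demo/registration-stream is a hypothetical name for illustration (this post doesn't pin down the actual path); the topic names match the two used in this demo:

    # Create the stream (path), then the two topics under it
    # (/demo/registration-stream is an illustrative path, not the real one)
    maprcli stream create -path /demo/registration-stream
    maprcli stream topic create -path /demo/registration-stream -topic registrations
    maprcli stream topic create -path /demo/registration-stream -topic registrations2

    # Confirm both topics exist under the path
    maprcli stream topic list -path /demo/registration-stream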

Source: datafloq
