Apart from these, what other components fit into each of the domains shown in the diagram?
Great question, Sai Attaluri!! The Hadoop ecosystem includes many components, among them many of the open source projects from the Apache Software Foundation. The ecosystem is always growing and changing, so it is nearly impossible to keep an up-to-date, comprehensive list.
I have heard about other Hadoop components such as Apache Cassandra, Apache Ambari, and Apache Kafka. Where do these components play their role in MapR?
Sai Attaluri, because MapR uses Hadoop APIs and is POSIX compliant, you can use any of these other tools (and many others) on MapR if you like! MapR also has its own message bus, MapR Streams, which uses the Kafka API. Apache Drill and Apache Hive are SQL-on-Hadoop options, while Apache Cassandra is a NoSQL database queried with its own SQL-like language (CQL).
I hope this answers your question!
Another ecosystem table (from http://slideplayer.com/slide/4706189/). It is difficult to find one with all the Hadoop frameworks linked.
If you are looking at the streaming side, here is another one comparing Kafka Streams, Flink, Storm, Samza, and Apex.
I think Impala belongs to Cloudera. Am I correct?
Impala is an Apache project (incubating status), open to contributors.
What is the difference between Batch Spark and Spark Streaming? This question comes from an observation of the diagram above.
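Spark Streaming works in real time (or close to it): it processes data in small micro-batches as the data arrives. It is resource intensive, but it delivers results faster. Batch Spark works as a batch job over data that has already been collected (you might run it overnight, while your computers are getting less use, and wait for results).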
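To make the distinction a bit more concrete, here is a minimal plain-Python sketch. It is not the Spark API, only an illustration of the two processing styles: a batch job sees the whole dataset at once, while a streaming job handles small micro-batches as they arrive and keeps a running result. The function names (`batch_word_count`, `micro_batches`, `streaming_word_count`) and the sample lines are made up for this example.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Counter:
    """Batch style: the full dataset is available up front and is processed in one job."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: group an incoming stream of lines into small batches."""
    batch: List[str] = []
    for line in stream:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def streaming_word_count(stream: Iterable[str], batch_size: int = 2) -> Iterator[Counter]:
    """Streaming style: emit an updated running count after every micro-batch."""
    running = Counter()
    for batch in micro_batches(stream, batch_size):
        running.update(batch_word_count(batch))
        yield Counter(running)  # snapshot of the result so far


# Made-up sample data for illustration only.
lines = ["spark streaming", "spark batch", "batch job"]

final_batch = batch_word_count(lines)                            # one result, at the end
snapshots = list(streaming_word_count(iter(lines), batch_size=2))  # results along the way
```

The batch version produces a single answer once all the data has been read, whereas the streaming version gives you an intermediate answer after every micro-batch. Once the stream ends, the last snapshot matches the batch result.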
The Spark framework was meant only for real-time processing. This is the first time I am hearing of Spark being used for offline processing. Did MapR customize Spark for batch processing?