What is Apache Spark?
Apache Spark is a general-purpose engine for large-scale data processing. Spark supports rapid application development for big data and allows for code reuse across batch, interactive and streaming applications. Spark also provides advanced execution graphs with in-memory pipelining to speed up end-to-end application performance.
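To make the idea of pipelining concrete, here is a minimal pure-Python sketch. It is not Spark's API: the `LazyPipeline` class and the helper functions `double` and `is_multiple_of_four` are hypothetical names invented for illustration. The sketch records transformations as a chain of stages and only runs them when a result is requested, so each record flows through every stage in turn without an intermediate collection being built between stages.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Stage = Callable[[Iterator[int]], Iterator[int]]


class LazyPipeline:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, source: Iterable[int], stages: Optional[List[Stage]] = None):
        self._source = source
        self._stages = stages or []

    def map(self, fn: Callable[[int], int]) -> "LazyPipeline":
        # Record the transformation; nothing is computed yet.
        return LazyPipeline(self._source, self._stages + [lambda it: (fn(x) for x in it)])

    def filter(self, pred: Callable[[int], bool]) -> "LazyPipeline":
        # Record the filter; nothing is computed yet.
        return LazyPipeline(self._source, self._stages + [lambda it: (x for x in it if pred(x))])

    def collect(self) -> List[int]:
        # Chain the recorded stages as generators, then pull records through them.
        it: Iterator[int] = iter(self._source)
        for stage in self._stages:
            it = stage(it)
        return list(it)


trace: List[str] = []


def double(x: int) -> int:
    trace.append(f"double({x})")
    return x * 2


def is_multiple_of_four(x: int) -> bool:
    trace.append(f"check({x})")
    return x % 4 == 0


pipeline = LazyPipeline([1, 2, 3]).map(double).filter(is_multiple_of_four)
result = pipeline.collect()
print(result)  # [4]
print(trace)   # stages interleave per record: double(1), check(2), double(2), ...
```

The trace shows the two stages interleaving record by record, which is the behavior the word "pipelining" refers to in this toy model. A real engine additionally distributes the work across a cluster and optimizes the execution graph, which this sketch does not attempt.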
Key Benefits of Running Spark on MapR
- Analytics on Consistent Data: The MapR Converged Data Platform enables data scientists to perform analytics on consistent data in both development and production environments through features such as mirroring and consistent snapshots.
- Secure Multi-Tenant Applications: The MapR Converged Data Platform enables development of reliable and secure multi-tenant applications.
- Run Streaming and NoSQL Workloads Together: The MapR Converged Data Platform enables development of streaming and NoSQL applications on a single cluster. By using Spark Streaming, MapR-ES, and MapR-DB together, you can build real-time operational applications that ingest data at high speeds and feed real-time dashboards.
- Faster Batch Applications: You can now develop and deploy batch applications that run 10-100x faster in production environments with in-memory processing of data.
- Complex ETL Data Pipelines: You can leverage the complete Spark stack to build complex ETL pipelines that merge streaming, machine learning, and SQL operations in a single program.
- Advanced Analytics: You can leverage MLlib and GraphX to develop applications that combine the power of machine learning with graph technology. This can speed up application development and let data scientists test new hypotheses faster.