We’re pleased to announce the general release of the MapR Ecosystem Pack (MEP) version 3.0.
As you know, MapR Ecosystem Packs (MEPs) deliver ecosystem upgrades decoupled from core platform upgrades, allowing customers to upgrade their tooling independently of their MapR Converged Data Platform.
MEP 3.0 focuses on making Spark truly enterprise-ready, with a series of stability and security fixes, and on speeding up ETL and batch processing with a faster version of Hive. New features and upgrades include:
Apache Spark 2.1.0
Spark 2.1 in MapR focuses on improvements in enterprise-ready stability and security, including:
- More than 1200 fixes on the Spark 2.X line
- MapR-SASL support for encrypted Thrift server connection
- Scalable partition handling
- Stable data type APIs
Apache Hive 2.1.1
As part of our MEP 3.0 release, we're providing a faster version of Hive, which significantly improves the speed of data processing tasks, delivers lower latency for interactive queries, and increases throughput for batch queries.
Some key improvements include:
- 2x faster ETL through an enhanced cost-based optimizer (CBO), faster type conversions, and dynamic partition pruning
- New HiveServer UI with new diagnostics and monitoring tools
- Dynamically partitioned hash joins, which operate on unsorted inputs and eliminate the sorting step
- Vectorized query execution that greatly reduces the CPU usage for typical query operations, like scans, filters, aggregates, and joins
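Several of these optimizations are controlled by session-level settings. The sketch below shows one way to turn them on before a batch query; the property names are standard Hive configuration keys, while the host and table names in the commented usage are illustrative.

```python
# Sketch: enabling the Hive 2.1 optimizations described above at the
# session level before running a batch query. The property names are
# standard Hive configuration keys; the host and table names in the
# commented usage are illustrative placeholders.

OPTIMIZER_SETTINGS = {
    "hive.cbo.enable": "true",                      # cost-based optimizer
    "hive.vectorized.execution.enabled": "true",    # vectorized execution
    "hive.vectorized.execution.reduce.enabled": "true",
}

def set_statements(settings):
    """Render a dict of Hive properties as per-session SET statements."""
    return ["SET {}={}".format(k, v) for k, v in sorted(settings.items())]

def run_batch_query(cursor, sql):
    """Apply the optimizer settings on the session, then run the query."""
    for stmt in set_statements(OPTIMIZER_SETTINGS):
        cursor.execute(stmt)
    cursor.execute(sql)
    return cursor.fetchall()

# Against a live HiveServer2 (requires the third-party `pyhive` package):
#   from pyhive import hive
#   cursor = hive.connect(host="hs2.example.com", port=10000).cursor()
#   rows = run_batch_query(
#       cursor, "SELECT region, COUNT(*) FROM sales GROUP BY region")
```

Because the flags are set per session, you can enable them only for the workloads that benefit, without changing cluster-wide defaults.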
Apache Drill 1.10
Drill 1.10 is another important milestone for Apache Drill, with roughly 110 bug fixes and enhancements around BI tool integration, end-to-end security, performance, and usability. Some highlights of this release:
- Tableau native connectivity
- Support for Kerberos and MapR-SASL authentication between the client and Drillbit
- Support for the CREATE TEMPORARY TABLE AS (CTTAS) command
- Ability to query data with Hue 3.12 (experimental only)
- Improved compatibility with Hive/Spark generated Parquet files
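As an illustration of the CTTAS command, the sketch below builds such a statement and shows how it could be submitted through a Drillbit's REST endpoint. It uses only the Python standard library; the Drillbit URL and the table and file paths are illustrative.

```python
# Sketch: building and (optionally) submitting a CREATE TEMPORARY TABLE AS
# (CTTAS) statement through a Drillbit's REST endpoint. Standard library
# only; the Drillbit URL and the table/file paths are illustrative.
import json
from urllib.request import Request, urlopen

def cttas_payload(temp_table, select_sql):
    """Build the JSON body Drill's /query.json endpoint expects."""
    sql = "CREATE TEMPORARY TABLE {} AS {}".format(temp_table, select_sql)
    return {"queryType": "SQL", "query": sql}

def submit(drillbit_url, payload):
    """POST a query to a Drillbit (requires a live, reachable cluster)."""
    req = Request(
        drillbit_url + "/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

payload = cttas_payload(
    "tmp_high_value",
    "SELECT * FROM dfs.`/data/orders.parquet` WHERE amount > 1000",
)
# submit("http://drillbit.example.com:8047", payload)  # illustrative host
```

Note that temporary tables exist only for the duration of the session that created them, so CTTAS is most useful over a long-lived JDBC/ODBC connection.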
New Features & Additions
Native Spark Connector for MapR-DB JSON
The Native Spark Connector for MapR-DB JSON is a new API that makes it easier to build real-time or batch pipelines between your data and MapR-DB, and to leverage Spark or Spark Streaming within those pipelines. Compared to other connectors for MapR-DB, such as the JDBC connector, the Native Spark Connector is more efficient, and the code is simpler to write. It includes:
- Two new APIs that allow you to load data from a MapR-DB JSON table to a Spark RDD or save a Spark RDD to a MapR-DB JSON table
- A custom partitioner that allows you to partition data for better performance
- Data locality: When the connector reads data from MapR-DB, it uses the data locality feature of MapR-DB to spawn the Spark executors
Batch Data Transformation with MapR-DB as a Source and Destination for Spark
Spark HBase and MapR-DB Binary Connector
The new Spark HBase and MapR-DB Binary Connector provides the ability to write applications that consume binary tables from HBase and MapR-DB and use them in Spark. New features:
- It allows writing directly to HBase HFiles for bulk insertion into HBase
- Spark SQL can query tables stored in HBase
MapR-SASL support for encrypted Thrift server connection
In MEP 3.0, MapR introduces enhanced security for Spark with the Spark SQL Thrift JDBC/ODBC (Spark Thrift) server for MapR Spark 2.1. It includes the following:
- Secure inbound client connections to the Spark Thrift server using MapR-SASL, in addition to Kerberos
- Secured Spark connections to the Hive Metastore
- Support for impersonation on SELECT statements
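Clients select the authentication mechanism in the JDBC connection string. The sketch below composes such URLs for both mechanisms; the host and principal are illustrative, and the `auth=maprsasl` parameter follows the JDBC URL convention used on MapR clusters, so verify it against your cluster's documentation.

```python
# Sketch: composing Hive-compatible JDBC URLs for the Spark Thrift server
# under the two supported mechanisms. The host and principal are
# illustrative, and the `auth=maprsasl` parameter is the JDBC URL
# convention used on MapR clusters -- verify against your cluster's docs.

def thrift_jdbc_url(host, port=10000, database="default",
                    mechanism="maprsasl", principal=None):
    """Compose a JDBC URL for the Spark Thrift server."""
    base = "jdbc:hive2://{}:{}/{}".format(host, port, database)
    if mechanism == "maprsasl":
        return base + ";auth=maprsasl"
    if mechanism == "kerberos":
        if principal is None:
            raise ValueError("Kerberos requires the server principal")
        return base + ";principal=" + principal
    raise ValueError("unsupported mechanism: " + mechanism)

print(thrift_jdbc_url("sparkthrift.example.com"))
# jdbc:hive2://sparkthrift.example.com:10000/default;auth=maprsasl
print(thrift_jdbc_url("sparkthrift.example.com", mechanism="kerberos",
                      principal="mapr/_HOST@EXAMPLE.COM"))
```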
For more information, see the Spark section of the MapR documentation.
MapR Streams C Applications
With MapR core Release 5.2.1, you can develop C applications for MapR Streams. The MapR Streams C Client is a distribution of librdkafka that integrates with MapR Streams.
MapR Streams Python Applications
With MapR core Release 5.2.1, you can create Python applications for MapR Streams using the MapR Streams Python client. The Streams Python client is a binding for librdkafka and contains support for high-level consumers.
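A minimal producer/consumer flow with the Python client might look like the sketch below. The client exposes a confluent-kafka-style interface over librdkafka; the `mapr_streams_python` package name and the `/demo_stream` stream path are assumptions for illustration, so check them against the MapR Streams documentation.

```python
# Sketch: addressing and producing to a MapR Streams topic from Python.
# The client is a binding over librdkafka with a confluent-kafka-style
# interface; the package name `mapr_streams_python` and the stream path
# /demo_stream below are assumptions for illustration.

def stream_topic(stream_path, topic):
    """MapR Streams topics are addressed as '<stream path>:<topic name>'."""
    if not stream_path.startswith("/"):
        raise ValueError("a stream is a path in the MapR filesystem")
    return "{}:{}".format(stream_path, topic)

# Against a live cluster (assumed package name and interface):
#   from mapr_streams_python import Producer, Consumer
#   topic = stream_topic("/demo_stream", "clicks")
#   p = Producer({"streams.producer.default.stream": "/demo_stream"})
#   p.produce(topic, value="hello from python")
#   p.flush()
#   c = Consumer({"group.id": "demo-group", "auto.offset.reset": "earliest"})
#   c.subscribe([topic])
#   msg = c.poll(timeout=5.0)
#   if msg is not None and not msg.error():
#       print(msg.value())
#   c.close()
```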
All Components
MEP 3.0 is supported on MapR 5.2.x. For the full list of included components and for upgrade guidance, see:
- MapR Ecosystem Packs (MEP) Overview
- MapR Ecosystem Packs (MEP) Matrix
- Upgrading MapR Ecosystem Packs
- Apache Hadoop & Related Components
Have a Question?
Ask in the comments below.