Rachel Silver

Announcing: MEP 3.0 Released

Blog post created by Rachel Silver on Apr 10, 2017

Announcing MapR Ecosystem Pack (MEP) 3.0!

Date: 4/10/2017


We’re pleased to announce the general release of the MapR Ecosystem Pack (MEP) version 3.0.


As you know, MapR Ecosystem Packs are a way to deliver ecosystem upgrades decoupled from core platform upgrades, allowing customers to upgrade their tooling independently of their MapR Converged Data Platform.


MEP 3.0 focuses on making Spark truly enterprise-ready through a series of stability and security fixes, and on speeding up ETL and batch processing with a faster version of Hive. New features and upgrades:


Key Upgrades


Apache Spark 2.1.0
Spark 2.1 in MapR focuses on enterprise-grade stability and security improvements, including:

  • More than 1200 fixes on the Spark 2.X line
  • MapR-SASL support for encrypted Thrift server connection
  • Scalable partition handling
  • Stable data type APIs


Apache Hive 2.1.1
As part of our MEP 3.0 release, we're providing a faster version of Hive, which significantly speeds up data processing tasks, lowers latency for interactive queries, and increases throughput for batch queries.
Some key improvements include:

  • 2X Faster ETL through an enhanced cost-based optimizer (CBO), faster type conversions, and dynamic partition pruning
  • New HiveServer UI with new diagnostics and monitoring tools
  • Dynamically partitioned hash joins, which provide unsorted inputs in order to eliminate the sorting step
  • Vectorized query execution that greatly reduces the CPU usage for typical query operations, like scans, filters, aggregates, and joins
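
The optimizations listed above correspond to session-level settings in upstream Apache Hive. The following is a minimal sketch of a Python helper that renders `SET` statements for a Hive session. The property names come from Apache Hive; whether each one takes effect, or is already enabled by default, in the MapR build of Hive 2.1.1 is an assumption to check against the release notes. The names `HIVE_TUNING` and `render_hive_settings` are hypothetical.

```python
# Upstream Apache Hive properties related to the features listed above.
# Assumption: defaults and engine support may differ in the MapR distribution.
HIVE_TUNING = {
    "hive.cbo.enable": "true",                           # cost-based optimizer
    "hive.vectorized.execution.enabled": "true",         # vectorized query execution
    "hive.optimize.dynamic.partition.hashjoin": "true",  # dynamically partitioned hash joins (Tez)
    "hive.tez.dynamic.partition.pruning": "true",        # dynamic partition pruning (Tez)
}


def render_hive_settings(settings):
    """Return one 'SET key=value;' line per property, in insertion order."""
    return "\n".join(f"SET {key}={value};" for key, value in settings.items())


if __name__ == "__main__":
    # Paste the printed lines at the top of a Hive session or script.
    print(render_hive_settings(HIVE_TUNING))
```

Keeping the settings in one dictionary makes it easy to toggle a single optimization when comparing query plans or run times.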


Apache Drill 1.10

Continuing with the iterative releases, Drill 1.10 is another important milestone for Apache Drill, with numerous enhancements around BI tool integration, end-to-end security, performance, and usability. Some highlights of this release, which contains ~110 bug fixes and improvements:

  • Tableau native connectivity
  • Support for Kerberos and MapR-SASL authentication between the client and Drillbit
  • Support for the CREATE TEMPORARY TABLE AS (CTTAS) command
  • Ability to query data with Hue 3.12 (experimental only)
  • Improved compatibility with Hive/Spark generated Parquet files
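
As an illustration of the CTTAS command mentioned above, here is a minimal sketch that builds a `CREATE TEMPORARY TABLE ... AS` statement as a string. The helper name `cttas`, the table name, and the `dfs` path are hypothetical; the statement shape follows the Apache Drill syntax, in which a temporary table is visible only to the session that created it.

```python
def cttas(table_name, select_sql, partition_by=None):
    """Build a Drill CREATE TEMPORARY TABLE ... AS statement.

    partition_by is an optional list of column names for the
    PARTITION BY clause.
    """
    parts = [f"CREATE TEMPORARY TABLE {table_name}"]
    if partition_by:
        parts.append("PARTITION BY (" + ", ".join(partition_by) + ")")
    parts.append(f"AS {select_sql}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical source file and table name, for illustration only.
    print(cttas("recent_orders", "SELECT * FROM dfs.`/data/orders.json`"))
```

The resulting statement can be run from any Drill client that keeps a session open, such as a JDBC or ODBC connection.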



New Features & Additions


Native Spark Connector for MapR-DB JSON
The Native Spark Connector for MapR-DB JSON is a new API that makes it easier to build real-time or batch pipelines between your data and MapR-DB and to leverage Spark or Spark Streaming within the pipeline. Compared to other connectors for MapR-DB, such as the JDBC connector, the Native Spark Connector is more efficient, and the code is simpler to write. It includes:

  • Two new APIs that allow you to load data from a MapR-DB JSON table to a Spark RDD or save a Spark RDD to a MapR-DB JSON table
  • A custom partitioner that allows you to partition data for better performance
  • Data locality: When the connector reads data from MapR-DB, it uses the data locality feature of MapR-DB to spawn the Spark executors


[Figure: Batch Data Transformation with MapR-DB as a Source and Destination for Spark]



Spark HBase and MapR-DB Binary Connector
The new Spark HBase and MapR-DB Binary Connector provides the ability to write applications that consume binary tables from HBase and MapR-DB and use them in Spark. New features:

  • It allows writing directly to HBase HFiles for bulk insertion into HBase
  • Spark SQL can draw on tables that are represented in HBase



MapR-SASL support for encrypted Thrift server connection

In MEP 3.0, MapR introduces enhanced security for Spark with the Spark SQL Thrift JDBC/ODBC (Spark Thrift) server for MapR Spark 2.1. It includes the following:

  • Secure connections using MapR-SASL in addition to Kerberos for inbound client connections to the Spark Thrift server
  • Spark connections to Hive Metastore
  • Support for impersonation on SELECT statements


For more information, see the Spark documentation.
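
For illustration, a client connection to the Spark Thrift server over MapR-SASL might use a JDBC URL like the one built below. This is a sketch under stated assumptions: the `auth=maprsasl` URL parameter follows the convention of the MapR Hive JDBC driver, the default port 2304 is assumed to be the Spark Thrift server port on MapR, and the host name and the helper `spark_thrift_jdbc_url` are hypothetical. Verify all of these against your cluster configuration.

```python
def spark_thrift_jdbc_url(host, port=2304, database="default", auth="maprsasl"):
    """Build a HiveServer2-style JDBC URL for the Spark Thrift server.

    Assumptions: port 2304 and the 'auth=maprsasl' parameter; check both
    against the cluster's actual Spark Thrift server settings.
    """
    return f"jdbc:hive2://{host}:{port}/{database};auth={auth}"


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(spark_thrift_jdbc_url("node1.example.com"))
```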



MapR Streams C Applications
With MapR core Release 5.2.1, you can develop C applications for MapR Streams. The MapR Streams C Client is a distribution of librdkafka that integrates with MapR Streams.


MapR Streams Python Applications
With MapR core Release 5.2.1, you can create Python applications for MapR Streams using the MapR Streams Python client. The Streams Python client is a binding for librdkafka and contains support for high-level consumers.
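
A minimal consumer sketch follows. It assumes the client is installed as the `mapr_streams_python` package and exposes a `Consumer` class in the style of other librdkafka Python bindings; the stream path, topic, and group ID are hypothetical, as are the helper names `topic_path` and `consume`. Treat it as a starting point rather than a reference implementation.

```python
def topic_path(stream_path, topic):
    """MapR Streams addresses a topic as '<stream path>:<topic name>'."""
    return f"{stream_path}:{topic}"


def consume(topic, group_id, max_messages=10, timeout=1.0):
    """Poll up to max_messages messages and return their decoded values."""
    # Imported lazily so topic_path() works without the client installed.
    # Assumption: the package name and Consumer API match your client version.
    from mapr_streams_python import Consumer

    consumer = Consumer({
        "group.id": group_id,
        "default.topic.config": {"auto.offset.reset": "earliest"},
    })
    consumer.subscribe([topic])
    values = []
    try:
        while len(values) < max_messages:
            msg = consumer.poll(timeout=timeout)
            if msg is None:
                break  # nothing arrived within the timeout
            if msg.error():
                break  # stop on any error, including end of partition
            values.append(msg.value().decode("utf-8"))
    finally:
        consumer.close()  # always release the consumer
    return values


if __name__ == "__main__":
    # Hypothetical stream path, topic, and group ID.
    for value in consume(topic_path("/apps/events", "clicks"), "demo-group"):
        print(value)
```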


All Components (* denotes re-release)
The following is a list of components included in the MEP 3.0 release, supported for MapR 5.2.X.



MEP 3.0 Contents
Release notes and documentation are provided for each component.

  • Apache Drill 1.10
  • Apache Hive 2.1.1
  • Apache Flume 1.7
  • Apache HBase 1.1.8
  • AsyncHBase 1.7
  • Apache Mahout 0.12.0
  • Apache Myriad 0.1.0
  • Apache Oozie 4.3.0
  • Apache Pig 0.16*
  • Apache Sentry 1.7
  • Apache Spark 2.1.0
  • Apache Sqoop 1.4.6
  • Apache Sqoop2 1.99.7
  • Apache Storm 0.10.0*
  • HttpFS 1.0*
  • Hue 3.12
  • Impala 2.7
  • Kafka Connect for MapR Streams
  • Kafka REST Proxy for MapR Streams
  • MapR Installer Stanzas
  • MapR Teradata Connector (Powered by TDCH) (release notes under Sqoop)
  • Native Spark Connector for MapR-DB JSON
  • Spark HBase and MapR-DB Binary Connector
  • MapR Streams C Applications
  • MapR Streams Python Applications


MEP 3.0.0:

Index of /releases/MEP/MEP-3.0 

UI Installer:

Index of /releases/installer 




Have a Question?

Ask in the comments below.