Announcing MapR Ecosystem Pack (MEP) 2.0!
We’re pleased to announce the general release of the MapR Ecosystem Pack (MEP) version 2.0. This is the second major MEP release since we began delivering ecosystem upgrades through this process.
If you’re new to this process, MapR Ecosystem Packs deliver ecosystem upgrades decoupled from core upgrades, so you can upgrade your tooling independently of your MapR Converged Data Platform.
For more information about the MEP process, please see our post on the MapR Ecosystem Packs Process here:
MapR Ecosystem Packs Process
MEP 2.0 contains a series of important upgrades and new features:
Spark 2.0.1 GA
- Runs on the same engine as SparkSQL.
- Allows access to data from a variety of different data sources.
- Can run database-like operations or allow for passing in custom code.
Spark as a Compiler:
- Whole-stage code generation is provided by the second-generation Tungsten engine.
- Eliminates the need for multiple JVM calls by flattening SQL queries into one single function evaluated as bytecode at runtime.
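To make the idea of whole-stage code generation concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: the operator functions, the `compile_fused_plan` helper, and the sample query (roughly `SELECT SUM(amount * rate) FROM sales WHERE amount > min_amount`) are all hypothetical and exist only for illustration. The first half runs the query as a chain of separate operators; the second half generates source for one fused function and compiles it at runtime.

```python
from typing import Callable, Iterable, Iterator


# Operator-at-a-time model: each operator is its own generator, so every
# row crosses several function boundaries before it reaches the result.
def scan(rows: Iterable[dict]) -> Iterator[dict]:
    for row in rows:
        yield row


def filter_op(child: Iterator[dict], min_amount: float) -> Iterator[dict]:
    for row in child:
        if row["amount"] > min_amount:
            yield row


def project_op(child: Iterator[dict], rate: float) -> Iterator[float]:
    for row in child:
        yield row["amount"] * rate


def run_iterator_plan(rows: Iterable[dict], min_amount: float, rate: float) -> float:
    return sum(project_op(filter_op(scan(rows), min_amount), rate))


# Fused model: emit source code for the whole pipeline as a single loop,
# then compile it to bytecode once and reuse the resulting function.
def compile_fused_plan(min_amount: float, rate: float) -> Callable[[Iterable[dict]], float]:
    source = f"""
def fused(rows):
    total = 0.0
    for row in rows:
        amount = row["amount"]
        if amount > {min_amount!r}:
            total += amount * {rate!r}
    return total
"""
    namespace: dict = {}
    exec(compile(source, "<generated-plan>", "exec"), namespace)
    return namespace["fused"]
```

Both plans return the same result for the same input; the fused version simply does the scan, filter, and projection inside one loop instead of passing rows between operators.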
Note that Structured Streaming, an exciting feature that provides a tabular view of streaming data, is still an alpha feature in the community release, so its APIs remain experimental. Stay tuned for updates as this feature becomes GA and is subsequently supported in the MapR Platform.
The key highlights of the Apache Drill 1.9 release include:
- Enhanced Parquet Performance - Improved query performance for I/O-intensive analytic queries using an optimized Parquet reader, as well as significant performance boosts for targeted queries by reducing I/O via Parquet filter pushdown and Limit operator pushdown. These techniques complement Drill's other optimizations, including partition pruning and metadata caching, to further enhance performance.
- Flexible and Dynamic UDFs - Enables data scientists, analysts, and developers to develop and deploy custom Drill SQL functions (UDFs) in a self-service fashion, without restarting Drill services in the cluster or involving IT. This is especially useful in large, multi-tenant organizations where restarting Drill services is disruptive to users. It also helps users get value from their data faster with Apache Drill.
- Seamless BI tool integration - In this release, Drill introduces a variety of SQL improvements to enable optimal BI tool integration. These include support for a variety of join syntax generated by Tableau and other BI tools, as well as improvements related to the number of metadata queries that BI tools generate. These enhancements improve the overall interactive user experience.
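The I/O savings from filter pushdown and limit pushdown can be sketched with a simplified model. This is not Drill's Parquet reader: the `RowGroup` class and `scan_greater_than` function are hypothetical, and the only assumption carried over from Parquet is that each row group stores min/max statistics for a column, which lets a reader skip groups that cannot match a predicate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group with column statistics."""
    min_value: int
    max_value: int
    values: List[int]


def scan_greater_than(
    groups: List[RowGroup], threshold: int, limit: Optional[int] = None
) -> Tuple[List[int], int]:
    """Return (matching values, number of row groups actually read)."""
    out: List[int] = []
    groups_read = 0
    for group in groups:
        # Limit pushdown: stop scanning once enough rows were produced.
        if limit is not None and len(out) >= limit:
            break
        # Filter pushdown: the statistics prove no value here can match,
        # so the group is skipped without reading its values.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        for value in group.values:
            if value > threshold:
                out.append(value)
                if limit is not None and len(out) >= limit:
                    break
    return out, groups_read
```

With three row groups covering 1-10, 11-20, and 21-30, a filter of `value > 12` never touches the first group, and adding a limit of 1 stops the scan after the second group.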
Hue 3.10 has provided the following improvements:
- Oozie improvements
  - External Workflow Graph
  - Single Action Execution
  - New Ability: Dryrun Oozie job
- New SQL Query Editor works over JDBC
  - Look for an upcoming Community post on how to use this with Apache Drill!
- Directory and file-based document management
  - Users can create their own directories and subdirectories and drag and drop documents within the simple file browser interface
MapR Installer Stanzas
MapR Installer Stanzas enable API-driven installation. You build a configuration file, called a “stanza,” that contains the layout and settings for a cluster installation and pass it programmatically to the installer.
Kafka Connect for MapR Streams
Kafka Connect for MapR Streams is a new way to easily connect common data systems with Kafka by providing prebuilt connectors for legacy and modern data stores.
Kafka REST Proxy for MapR Streams
Kafka REST Proxy for MapR Streams provides the ability for any device that can communicate using HTTP to easily publish/subscribe to Kafka topics.
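As a rough illustration of publishing over HTTP, the sketch below builds (but does not send) a POST request using only the Python standard library. Everything specific in it is an assumption rather than something stated in this post: the host and port, the hypothetical stream path `/sample-stream` and topic `sensor-readings`, the `/topics/<name>` endpoint, the `application/vnd.kafka.json.v1+json` content type, and the `records` payload shape all follow the conventions of the open source Kafka REST Proxy v1 API. Check the MapR documentation for the exact endpoints and formats your version supports.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_publish_request(base_url: str, topic: str, values: list) -> Request:
    """Build a POST request that publishes JSON values to a topic.

    Assumption: the topic is addressed as /topics/<percent-encoded name>,
    where a MapR Streams topic name looks like "/stream-path:topic".
    """
    # Percent-encode the whole topic name, including "/" and ":".
    path = "/topics/" + quote(topic, safe="")
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v1+json"},
        method="POST",
    )


# Hypothetical proxy address and topic name, for illustration only.
request = build_publish_request(
    "http://localhost:8082",
    "/sample-stream:sensor-readings",
    [{"temp": 21.5}],
)
```

To actually send the message against a running proxy, pass `request` to `urllib.request.urlopen`. Any HTTP-capable client or device could issue the same request.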
MapR Teradata Connector (powered by Teradata Connector for Hadoop)
In partnership with Teradata, we're introducing the Teradata Connector for MapR, a MapR implementation of the Teradata Connector for Hadoop (TDCH). This is a Sqoop wrapper, built into MapR Sqoop, that facilitates bulk data transfer between Hadoop and external data storage.
All Components (* denotes re-release)
The following is a list of components included in the MEP 2.0 release, supported for MapR 5.2.
|MEP 2.0 Contents||Release Notes||Documentation|
|Apache Drill 1.9||Release Notes||Documentation|
|Apache Hive 1.2.1*||Release Notes||Documentation|
|Apache Flume 1.6||Release Notes||Documentation|
|Apache HBase 1.1.1||Release Notes||Documentation|
|AsyncHBase 1.7||Release Notes||Documentation|
|Apache Mahout 0.12.0*||Release Notes||Documentation|
|Apache Myriad 0.1.0||Release Notes||Documentation|
|Apache Oozie 4.2.0*||Release Notes||Documentation|
|Apache Pig 0.16||Release Notes||Documentation|
|Apache Sentry 1.6||Release Notes||Documentation|
|Apache Spark 2.0.1||Release Notes||Documentation|
|Apache Sqoop 1.4.6*||Release Notes||Documentation|
|Apache Sqoop2 1.99.7||Release Notes||Documentation|
|Apache Storm 0.10.0*||Release Notes||Documentation|
|HttpFS 1.0||Release Notes||Documentation|
|Hue 3.10||Release Notes||Documentation|
|Impala 2.5||Release Notes||Documentation|
|Kafka Connect for MapR Streams||Release Notes||Documentation|
|Kafka REST Proxy for MapR Streams||Release Notes||Documentation|
|MapR Installer Stanzas||Release Notes||Documentation|
|MapR Teradata Connector (Powered by TDCH)||Release Notes (Sqoop)||Documentation|
- MapR Ecosystem Packs (MEP) Overview
- MapR Ecosystem Packs (MEP) Matrix
- Upgrading MapR Ecosystem Packs
- Apache Hadoop & Related Components
Have a Question?
Ask in Answers or comment below.