Apache Drill 1.8 Released on MapR Converged Data Platform

Blog post created by nrentachintala on Sep 14, 2016

Today we are excited to announce the availability of Drill 1.8 on the MapR Converged Data Platform.

Drill 1.8 is a production release on MapR and another important milestone in Drill’s steady progress. Here are the key highlights of the release.

  • Integration with YARN (available on the MapR Platform only)

Starting with Drill 1.8, customers can deploy and manage Drill as a YARN application alongside other compute frameworks on the MapR cluster. This simplifies deploying and managing Drill in environments with large clusters. Note that in this mode Drill runs as a long-running service under YARN; because Drill queries carry interactive SLAs, it does not spin up YARN containers for each individual query. This differs from MapReduce and Spark batch jobs, where every job execution is launched as a YARN application.

The Drill-on-YARN integration includes a new client tool to launch Drill as a YARN application; a new Drill Application Master (AM) that coordinates with the YARN ResourceManager to obtain resources for the Drill service; CPU and memory controls on the Drill service; the ability to easily add and remove nodes from the Drill cluster; and the ability to launch multiple Drill clusters in a single MapR cluster. New web console features help manage Drill deployments under YARN. Please refer to the documentation to learn more.

  • Enhanced query performance
    • Partition pruning enhancements that evaluate query filters at the leaf-directory level rather than per file (Drill-4589). This significantly improves planning performance for queries over large numbers of files.
    • Improved metadata cache performance
      • Metadata cache pruning for queries involving a large number of partitions (Drill-4786)
      • Optimizations on reading the metadata cache for queries on a single partition (Drill-4530)

    • Faster INFORMATION_SCHEMA queries on Hive tables: this enhancement optimizes the calls Drill makes to the Hive metastore to retrieve metadata, reducing query-planning overhead.
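
To make the idea behind Drill-4589 concrete, here is a simplified Java sketch; it is not Drill's planner code. It assumes a hypothetical table laid out as /t/<year>/<quarter>/<file> and a filter on the directory path, roughly analogous to WHERE dir0 = '2016'. The class, method, and path names are all illustrative. The point it shows: evaluating the filter once per leaf directory, instead of once per file, reduces the number of evaluations from the file count to the directory count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical illustration (not Drill's planner code): compare evaluating a
 * directory filter once per file vs. once per leaf directory.
 */
final class DirectoryPruningSketch {

    /** Wraps a filter and counts how many times it is evaluated. */
    static final class CountingFilter implements Predicate<String> {
        private final Predicate<String> inner;
        int calls = 0;

        CountingFilter(Predicate<String> inner) { this.inner = inner; }

        @Override
        public boolean test(String dir) {
            calls++;
            return inner.test(dir);
        }
    }

    /** Leaf directory of a path, e.g. "/t/2016/Q1/a.parquet" -> "/t/2016/Q1". */
    static String leafDir(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }

    /** File-level pruning: the filter runs once for every file. */
    static List<String> pruneByFile(List<String> files, Predicate<String> dirFilter) {
        List<String> kept = new ArrayList<>();
        for (String f : files) {
            if (dirFilter.test(leafDir(f))) kept.add(f);
        }
        return kept;
    }

    /** Directory-level pruning: group files by leaf directory, filter once per directory. */
    static List<String> pruneByDirectory(List<String> files, Predicate<String> dirFilter) {
        Map<String, List<String>> byDir = new LinkedHashMap<>();
        for (String f : files) {
            byDir.computeIfAbsent(leafDir(f), d -> new ArrayList<>()).add(f);
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byDir.entrySet()) {
            if (dirFilter.test(e.getKey())) kept.addAll(e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical layout: table root /t, then year (dir0) and quarter (dir1) directories.
        List<String> files = Arrays.asList(
                "/t/2016/Q1/a.parquet", "/t/2016/Q1/b.parquet",
                "/t/2016/Q2/c.parquet", "/t/2015/Q4/d.parquet");
        // Roughly analogous to WHERE dir0 = '2016' for a table rooted at /t.
        Predicate<String> year2016 = dir -> dir.startsWith("/t/2016/");

        CountingFilter perFile = new CountingFilter(year2016);
        List<String> a = pruneByFile(files, perFile);
        CountingFilter perDir = new CountingFilter(year2016);
        List<String> b = pruneByDirectory(files, perDir);

        System.out.println("file-level:      kept " + a.size() + ", filter evaluations " + perFile.calls);
        System.out.println("directory-level: kept " + b.size() + ", filter evaluations " + perDir.calls);
    }
}
```

Both approaches keep the same files; only the number of filter evaluations differs, and that gap widens as the number of files per partition directory grows.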

  • Monitoring via JMX & MapR Spyglass (Drill-4564)

A variety of Drill metrics are now exposed via JMX, making production Drill deployments easier to monitor. Users can view these metrics with any JMX monitoring tool, such as JConsole, or in the Drill web console. Drill is also now integrated with MapR Monitoring, which lets users capture these metrics and build custom dashboards that show trends across system and query metrics, making it easier to manage the health of a Drill cluster and to diagnose and troubleshoot issues. Sample JMX-based Drill metrics include drill.queries.running, drill.queries.completed, heap.used, direct.used, and waiting.count. For more information on Drill monitoring, refer to the documentation here and here.
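
As a starting point for scripting against these metrics, the following Java sketch uses the standard javax.management API to list MBeans whose ObjectName contains a given fragment, either in the local JVM or in a remote JVM reached through a JMX service URL. Several details are assumptions rather than documented Drill behavior: the ObjectName domain under which a Drillbit registers its metrics, the host and port on which remote JMX is reachable, and the class name JmxMetricBrowser. That is why the sketch matches by substring (default "drill.queries") instead of by an exact name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/**
 * Sketch: list JMX MBeans whose ObjectName contains a fragment such as
 * "drill.queries". The exact ObjectName Drill uses is an assumption here,
 * so matching is done by substring.
 */
final class JmxMetricBrowser {

    /** Returns "objectName -> attr1 attr2 ..." for every matching MBean. */
    static List<String> findMatching(MBeanServerConnection conn, String fragment) {
        Set<ObjectName> names;
        try {
            // A null pattern and null query return every registered MBean name.
            names = conn.queryNames(null, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> out = new ArrayList<>();
        for (ObjectName name : names) {
            if (!name.toString().contains(fragment)) continue;
            try {
                StringBuilder sb = new StringBuilder(name.toString()).append(" ->");
                for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                    sb.append(' ').append(attr.getName());
                }
                out.add(sb.toString());
            } catch (JMException | IOException e) {
                // The MBean may have been unregistered or be unreadable; skip it.
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String fragment = args.length > 0 ? args[0] : "drill.queries";
        if (args.length > 1) {
            // Remote JVM, e.g. service:jmx:rmi:///jndi/rmi://HOST:PORT/jmxrmi
            // (HOST and PORT are placeholders; remote JMX must be enabled on the Drillbit).
            JMXServiceURL url = new JMXServiceURL(args[1]);
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                findMatching(connector.getMBeanServerConnection(), fragment).forEach(System.out::println);
            }
        } else {
            // No URL given: inspect the current JVM's own MBeans.
            findMatching(ManagementFactory.getPlatformMBeanServer(), fragment).forEach(System.out::println);
        }
    }
}
```

Listing attribute names rather than reading values keeps the sketch independent of how each metric is modeled (gauge, counter, and so on); once the real ObjectNames are known, conn.getAttribute can read specific values.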

  • Additional enhancements

The 1.8 release also introduces a variety of new SQL and usability features, including:

    • HBase 1.x support (Drill-4199)
    • Multibyte line delimiters for the text reader (Drill-3149)
    • Ability to return the directory associated with a workspace on the fly (Drill-4514)
    • Ability to return file names as part of queries
    • Hive CHAR data type support
    • DROP TABLE IF EXISTS SQL command support
    • Support for nested aggregate expressions for window aggregates
    • Improvements to MaxDir/MinDir functions
    • Split function
    • Access to Drill logs in the web UI
    • JDBC/ODBC client IP addresses in Drill audit logs
    • Many more improvements and bug fixes
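
With multibyte line delimiters (Drill-3149), a record boundary can no longer be detected by inspecting one byte at a time. The Java sketch below is a hypothetical illustration, not Drill's text reader: it checks for the full delimiter at each position of an in-memory byte array and assumes UTF-8 content. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: split raw bytes into records on a multi-byte
 * line delimiter such as "\r\n" or "||". Not Drill's implementation.
 */
final class MultiByteDelimiterSketch {

    /** True if the whole delimiter occurs in data starting at offset pos. */
    static boolean matchesAt(byte[] data, int pos, byte[] delim) {
        if (pos + delim.length > data.length) return false;
        for (int k = 0; k < delim.length; k++) {
            if (data[pos + k] != delim[k]) return false;
        }
        return true;
    }

    /** Splits data into UTF-8 records; a trailing delimiter does not yield an empty record. */
    static List<String> splitRecords(byte[] data, byte[] delim) {
        if (delim.length == 0) throw new IllegalArgumentException("delimiter must not be empty");
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            if (matchesAt(data, i, delim)) {
                // Emit the bytes before the delimiter, then skip the whole delimiter.
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delim.length;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] input = "a,b||c,d||e".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "||".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitRecords(input, delim));
        System.out.println(splitRecords("x\r\ny\r\n".getBytes(StandardCharsets.UTF_8),
                "\r\n".getBytes(StandardCharsets.UTF_8)).equals(Arrays.asList("x", "y")));
    }
}
```

Checking the full delimiter at every position is O(n·m) in the worst case, which is acceptable for the short delimiters typical of text files; a streaming reader would additionally have to handle a delimiter that spans two input buffers.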

There are many additional exciting features in Drill 1.8. Download the MapR release and try it out!