
MapR Patch Release - December 2016

Blog post created by dsorenson13 on Feb 2, 2017


The following patches are interim patches released between maintenance release cycles. Each patch is cumulative and includes all prior patches released for the same branch.

5.2.0

Bug 12856

Description - When the hadoop fs -rmr command is run, it reads the entire directory contents into memory before starting the delete process. This can result in an Out Of Memory error.

Resolution - This fix includes a new hadoop mfs -rmr <path> command that:

  • does not build the entire readdir file list in memory. Instead, each time 1MB of readdir data has been read, the command unlinks and removes those files and directories.
  • does not fetch the attributes of the entries returned by readdir.
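The batching behavior described above can be illustrated with a short Python sketch against a local filesystem. It is not MapR's implementation: the function name `rmr_batched` is hypothetical, and measuring the 1MB of readdir data as the total encoded length of entry names is an assumption made for illustration.

```python
import os

# Assumed budget of readdir data per batch (1MB), measured here as name bytes.
BATCH_BYTES = 1 << 20


def rmr_batched(path: str, batch_bytes: int = BATCH_BYTES) -> None:
    """Recursively delete `path`, removing entries in bounded batches.

    Hypothetical sketch: names are collected from os.scandir until their
    combined length reaches `batch_bytes`; that batch is removed before the
    directory is read again, so the full listing is never held in memory.
    DirEntry.is_dir(follow_symlinks=False) relies on the entry type that
    readdir reports where the platform provides it, so no explicit stat.
    """
    while True:
        batch = []
        size = 0
        with os.scandir(path) as entries:
            for entry in entries:
                batch.append((entry.path, entry.is_dir(follow_symlinks=False)))
                size += len(entry.name.encode())
                if size >= batch_bytes:
                    break
        if not batch:
            break  # directory is now empty
        for child, is_dir in batch:
            if is_dir:
                rmr_batched(child, batch_bytes)
            else:
                os.unlink(child)  # files and symlinks are unlinked
    os.rmdir(path)
```

Removing each batch before reading the next keeps memory bounded by the batch size rather than by the directory size. Because the directory is reopened after every batch, entries that were already removed are not listed again.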

Bug 20965

Description - When working with multiple clusters, synchronization issues were causing MapRFileSystem to throw a NullPointerException.

Resolution - With this fix, MapRFileSystem has been improved to better support interaction with multiple clusters as well as address synchronization issues.

Bug 23257

Description - In MCS, new NFS VIPs were visible in the NFS HA > VIP Assignments tab, but not in the NFS HA > NFS Setup tab.

Resolution - With this fix, the NFS VIPs are made available in both the NFS HA > VIP Assignments tab and the NFS HA > NFS Setup tab.

Bug 24139 & 25184

Description - When limit spread was enabled and nodes were more than 85% full, the CLDB did not allocate containers for IOs on non-local volumes.

Resolution - With this fix, the CLDB now allocates new containers so that the IO does not fail.

Bug 24155

Description - Disk setup timed out when running trim on flash drives took a long time.

Resolution - With this fix, disk setup completes successfully, and the warning message (“Starting Trim of SSD drives, it may take a long time to complete”) is written to the log file.

Bug 24249

Description - When map/reduce jobs ran with older versions of MapR classes, the system hung because those older classes linked to the native library installed on the cluster nodes, which had been updated to a newer MapR version.

Resolution - With this fix, the new fs.mapr.bailout.on.library.mismatch parameter detects mismatched libraries, fails the map/reduce job, and logs an appropriate error message. The parameter is enabled by default. To let such a job run, disable the parameter on all TaskTracker nodes by setting it to false in the core-site.xml file, and then resubmit the job.
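To confirm how a node's core-site.xml sets this parameter, the Hadoop-style `<configuration>` file can be read with Python's standard library. This is a hypothetical helper, not a MapR tool; it assumes the usual `<property>` / `<name>` / `<value>` layout and treats values other than `true` or `false` as unset.

```python
import xml.etree.ElementTree as ET


def get_bool_property(xml_text: str, name: str, default: bool) -> bool:
    """Read a boolean <property> from Hadoop-style XML.

    Returns `default` when the property is absent or its value is neither
    "true" nor "false" (case-insensitive); this fallback is an assumption.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = (prop.findtext("value") or "").strip().lower()
            if value in ("true", "false"):
                return value == "true"
            return default
    return default


# Example core-site.xml fragment that disables the library-mismatch check.
CORE_SITE = """<configuration>
  <property>
    <name>fs.mapr.bailout.on.library.mismatch</name>
    <value>false</value>
  </property>
</configuration>"""

# The parameter is enabled by default, so pass default=True.
print(get_bool_property(CORE_SITE, "fs.mapr.bailout.on.library.mismatch", True))  # False
```

The same helper can be pointed at fs.mapr.bind.retries with `default=False`, since that parameter is off unless set.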

Bug 24352

Description - During mirror resync, the destination resends the maximum version number that it previously sent during the last mirroring action. This causes problems when the workload consists of many small files in the namespace container.

Resolution - In this patch, mirror synchronization has been optimized for the case where only a small percentage of inodes have changed. During a mirror sync operation, the destination sends the version number from the most recent mirror resync operation. While scanning inodes to identify those that have changed since the last resync operation, MFS now compares the version number sent by the destination with the version of each allocation group, which keeps track of the inodes in that group. If the allocation group version is:

  • higher than the last resync version, MFS checks the allocation group for changed inodes.
  • less than or equal to the last resync version, MFS does not read the inodes in the allocation group, because the allocation group has not changed since the last resync operation.
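The per-allocation-group check above can be sketched in Python. This is a simplified model, not MFS code: the `AllocationGroup` class, its fields, and the assumption that a group's version is the highest version of any inode change it records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AllocationGroup:
    # Hypothetical model: `version` is the highest version of any inode
    # change recorded in this group; `inodes` maps inode id -> version.
    version: int
    inodes: Dict[int, int] = field(default_factory=dict)


def changed_inodes(groups: List[AllocationGroup], last_resync_version: int) -> List[int]:
    """Return ids of inodes changed since `last_resync_version`."""
    changed = []
    for group in groups:
        if group.version <= last_resync_version:
            # Unchanged since the last resync: skip reading its inodes.
            continue
        for inode_id, version in group.inodes.items():
            if version > last_resync_version:
                changed.append(inode_id)
    return changed


groups = [
    AllocationGroup(version=7, inodes={1: 7, 2: 3}),
    AllocationGroup(version=4, inodes={3: 4, 4: 2}),  # skipped when last resync >= 4
]
print(changed_inodes(groups, last_resync_version=5))  # [1]
```

When only a few groups have changed, most of the inode reads are skipped, which is where the saving comes from for workloads with many small files.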

Bug 24618

Description - Remote mirror volumes could not be created on secure clusters using MCS, even when the appropriate tickets were present.

Resolution - With this fix, remote mirror volumes can now be created on secure clusters using MCS.

Bug 24846

Description - If the topology of a node changed (after a CLDB failover), the list of nodes under a topology could not be determined because the new non-leaf topologies were not updated.

Resolution - With this fix, the inner nodes of the topology graph will be updated correctly, and the list of nodes under an inner (non-leaf) topology will be determined correctly.

Bug 24965

Description - On large clusters, the bind sometimes failed with a message indicating that the port was unavailable when running MR jobs, specifically reducer tasks.

Resolution - With this fix, if the new fs.mapr.bind.retries configuration parameter in the core-site.xml file is set to true, the client retries the bind during client initialization for 5 minutes before failing. By default, fs.mapr.bind.retries is set to false.
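The retry behavior can be sketched as a deadline-bounded loop in Python. This is an illustration only: `bind_with_retries` and its `delay`, `clock`, and `sleep` parameters are hypothetical, and the one-second retry interval is an assumption, since the notes specify only the 5-minute window.

```python
import socket
import time

# Retry window described for fs.mapr.bind.retries=true.
RETRY_WINDOW_SECONDS = 5 * 60


def bind_with_retries(sock, addr, retries_enabled, window=RETRY_WINDOW_SECONDS,
                      delay=1.0, clock=time.monotonic, sleep=time.sleep):
    """Bind `sock` to `addr`, retrying failed binds until `window` seconds pass.

    With retries disabled (the documented default), the first failure is
    raised immediately. `delay`, `clock`, and `sleep` are hypothetical knobs
    for this sketch; the patch notes do not specify a retry interval.
    """
    deadline = clock() + window
    while True:
        try:
            sock.bind(addr)
            return
        except OSError:
            if not retries_enabled or clock() >= deadline:
                raise
            sleep(delay)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 asks the OS for any free port, so this bind succeeds at once.
        bind_with_retries(s, ("127.0.0.1", 0), retries_enabled=True)
        print("bound to", s.getsockname())
```

Injecting `clock` and `sleep` keeps the loop testable without waiting five minutes.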

Bug 24969

Description - The maprcli volume create command was not setting group ownership to the user's primary group when the user's primary GID was not the first GID in the list of GIDs.

Resolution - With this fix, the primary GID of the user performing the operation is now the first GID in the list of GIDs.
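The reordering described in the resolution can be shown with a small Python helper. The function name is hypothetical, and the assumption that the remaining GIDs keep their original order is not stated in the notes.

```python
import os
from typing import List


def gids_primary_first(primary_gid: int, gids: List[int]) -> List[int]:
    """Place primary_gid first, keeping the remaining GIDs in their original order."""
    return [primary_gid] + [g for g in gids if g != primary_gid]


print(gids_primary_first(1000, [27, 1000, 4]))  # [1000, 27, 4]

if __name__ == "__main__" and hasattr(os, "getgid"):
    # POSIX only: reorder the current process's supplementary groups.
    print(gids_primary_first(os.getgid(), os.getgroups()))
```

With the primary GID first, code that takes the first entry as the group owner picks the user's primary group.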

Bug 24971

Description - When the mirroring operation started after a CLDB failover, it sometimes sent requests to a slave CLDB whose data was stale, causing the mirroring operation to hang. If another CLDB failover happened during this time, the new CLDB master discarded the data resynchronized by the old mirroring operation while marking the mirroring operation as successful. This resulted in a data mismatch between the source and destination.

Resolution - With this fix, mirroring requests are sent only to the master CLDB node.

Bug 24915

Description - In version 5.1, running the expandaudit utility on volumes can result in very large (more than 1GB) audit log files due to incorrect GETATTR (get attributes) cache handling.

Resolution - With this fix, the expandaudit utility no longer issues further GETATTR calls for a file identifier whose original GETATTR call failed.
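The fix amounts to remembering failed lookups as well as successful ones. The Python sketch below is a hypothetical model of that negative caching, not expandaudit's code; `getattr_fn` stands in for the real GETATTR call and is assumed to raise OSError on failure.

```python
class GetattrCache:
    """Caches GETATTR results, including failures, per file identifier.

    Hypothetical sketch: `getattr_fn` stands in for the real GETATTR call
    and is assumed to raise OSError on failure.
    """

    def __init__(self, getattr_fn):
        self._getattr = getattr_fn
        self._attrs = {}       # fid -> attributes from a successful call
        self._failed = set()   # fids whose GETATTR call already failed
        self.calls = 0         # number of GETATTR calls actually issued

    def lookup(self, fid):
        if fid in self._attrs:
            return self._attrs[fid]
        if fid in self._failed:
            return None  # earlier call failed: do not issue another GETATTR
        self.calls += 1
        try:
            attrs = self._getattr(fid)
        except OSError:
            self._failed.add(fid)
            return None
        self._attrs[fid] = attrs
        return attrs


def fake_getattr(fid):
    if fid == "missing":
        raise OSError("no such file")
    return {"fid": fid, "size": 0}


cache = GetattrCache(fake_getattr)
for fid in ["missing", "missing", "ok", "ok"]:
    cache.lookup(fid)
print(cache.calls)  # 2
```

Without the failure set, every audit record that references a missing file would trigger another GETATTR call.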

Bug 24610

Description - In a secure cluster, when there are intermittent connection drops (between MFS-MFS or client-MFS), the client and/or server could crash during authentication.

Resolution - With this fix, the client and/or server will not crash during authentication if there are intermittent connection drops.

Bug 24585

Description - Excessive logging in the CLDB audit caused the cldbaudit.log file to grow very large.

Resolution - With this fix, queries to the CLDB for the ZK string are no longer logged for auditing, which reduces the size of the cldbaudit.log file.

Bug 25177

Description - When using FairScheduler with maxAMShare enabled, the total amResourceUsage per queue was not calculated properly, which could cause applications to hang in the ACCEPTED state.

Resolution - AM resource usage is now calculated as expected, and YARN jobs no longer get stuck in the ACCEPTED state.
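For context, Apache Hadoop's FairScheduler admits a new application master to a queue only if the queue's current AM resource usage plus the new AM's resources stays within `maxAMShare` times the queue's fair share; a `maxAMShare` of -1 disables the check. The Python model below is a simplified, memory-only sketch of that check (the class and method names are hypothetical). It shows why a usage counter that is not decremented correctly would leave later applications waiting in ACCEPTED.

```python
from dataclasses import dataclass


@dataclass
class LeafQueue:
    fair_share_mb: int
    max_am_share: float = 0.5      # fraction of fair share; -1.0 disables the check
    am_resource_usage_mb: int = 0  # memory currently held by running AMs

    def can_run_app_am(self, am_mb: int) -> bool:
        if self.max_am_share == -1.0:
            return True
        max_am_mb = self.fair_share_mb * self.max_am_share
        return self.am_resource_usage_mb + am_mb <= max_am_mb

    def am_started(self, am_mb: int) -> None:
        self.am_resource_usage_mb += am_mb

    def am_finished(self, am_mb: int) -> None:
        # If this decrement were missed, usage would only grow and the check
        # above would eventually reject every new AM.
        self.am_resource_usage_mb -= am_mb


q = LeafQueue(fair_share_mb=8192)  # AM limit: 0.5 * 8192 = 4096 MB
q.am_started(2048)
q.am_started(2048)
print(q.can_run_app_am(1024))      # False: 4096 + 1024 > 4096
q.am_finished(2048)
print(q.can_run_app_am(1024))      # True: 2048 + 1024 <= 4096
```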

Bug 25290

Description - At times, while writes were in progress and the user's group IDs changed, the FUSE process crashed.

Resolution - With this fix, the FUSE process will not crash while writes are in progress.

Bug 25426

Description - The server rejected encrypted writes because the expected length did not match the RPC data length, which caused the server to crash.

Resolution - With this fix, the server no longer crashes, because the expected length always matches the RPC data length for encrypted writes.

5.1.0

Bug 13187

Description - The maprcli volume create command was not setting group ownership to the user's primary group.

Resolution - With this fix, the maprcli volume create command will set group ownership to the user's primary group.

Bug 20965

Description - When working with multiple clusters, synchronization issues were causing MapRFileSystem to throw a NullPointerException.

Resolution - With this fix, MapRFileSystem has been improved to better support interaction with multiple clusters as well as address synchronization issues.

Bug 23257

Description - In MCS, new NFS VIPs were visible in the NFS HA > VIP Assignments tab, but not in the NFS HA > NFS Setup tab.

Resolution - With this fix, the NFS VIPs are made available in both the NFS HA > VIP Assignments tab and the NFS HA > NFS Setup tab.

Bug 23975

Description - MFS failed to start in some Docker containers because it attempted to determine the number of NUMA nodes from /sys/devices/system/node.

Resolution - With this fix, MFS works in Docker containers.
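The notes do not say how MFS now handles this, so the following is only an illustration of the general pattern: count the `nodeN` entries under /sys/devices/system/node and fall back to a single node when that directory is missing or unreadable, as it can be inside a container. The function name and the fallback value of 1 are assumptions.

```python
import os
import re


def count_numa_nodes(sysfs_dir: str = "/sys/devices/system/node") -> int:
    """Count NUMA nodes from sysfs, defaulting to 1 if sysfs is unavailable."""
    try:
        names = os.listdir(sysfs_dir)
    except OSError:
        return 1  # e.g. the path is not exposed inside a container
    count = sum(1 for name in names if re.fullmatch(r"node\d+", name))
    return max(count, 1)


print(count_numa_nodes())
```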

Bug 24139

Description - When limit spread was enabled and nodes were more than 85% full, the CLDB did not allocate containers for IOs on non-local volumes.

Resolution - With this fix, the CLDB now allocates new containers so that the IO does not fail.

Bug 24155

Description - Disk setup timed out when running trim on flash drives took a long time.

Resolution - With this fix, disk setup completes successfully, and the warning message (“Starting Trim of SSD drives, it may take a long time to complete”) is written to the log file.

Bug 24249

Description - When map/reduce jobs ran with older versions of MapR classes, the system hung because those older classes linked to the native library installed on the cluster nodes, which had been updated to a newer MapR version.

Resolution - With this fix, the new fs.mapr.bailout.on.library.mismatch parameter detects mismatched libraries, fails the map/reduce job, and logs an appropriate error message. The parameter is enabled by default. To let such a job run, disable the parameter on all TaskTracker nodes by setting it to false in the core-site.xml file, and then resubmit the job.

Bug 24585

Description - Excessive logging in the CLDB audit caused the cldbaudit.log file to grow very large.

Resolution - With this fix, queries to the CLDB for the ZK string are no longer logged for auditing, which reduces the size of the cldbaudit.log file.

Bug 24610

Description - In a secure cluster, when there are intermittent connection drops (between MFS-MFS or client-MFS), the client and/or server could crash during authentication.

Resolution - With this fix, the client and/or server will not crash during authentication if there are intermittent connection drops.

Bug 24812

Description - Apache Hadoop could not look up the status of a finished job because job.xml had already been removed from the search directory. Hive interpreted the job as having failed and generated an exception.

Resolution - With this fix, Apache Hadoop correctly reports the status of the finished job.

Bug 24965

Description - On large clusters, the bind sometimes failed with a message indicating that the port was unavailable when running MR jobs, specifically reducer tasks.

Resolution - With this fix, if the new fs.mapr.bind.retries configuration parameter in the core-site.xml file is set to true, the client retries the bind during client initialization for 5 minutes before failing. By default, fs.mapr.bind.retries is set to false.

Bug 24915

Description - In version 5.1, running the expandaudit utility on volumes can result in very large (more than 1GB) audit log files due to incorrect GETATTR (get attributes) cache handling.

Resolution - With this fix, the expandaudit utility no longer issues further GETATTR calls for a file identifier whose original GETATTR call failed.

Bug 25003

Description - When a specific queue uses all of its resources, the UsedResources tab in the Resource Manager UI might show a greater value than the MaxResources tab. This happens when another application is submitted and its application master container size is included.

Resolution - With this fix, no additional containers can be assigned to a queue once its UsedResources value has reached the MaxResources limit.
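One way to express this guard is to check, before assigning any container (including an application master container), that the queue's usage after the assignment still fits within its maximum. The Python sketch below is illustrative only: the names are hypothetical, and modeling resources as memory plus vcores is an assumption.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    memory_mb: int
    vcores: int


def fits_in(smaller: Resource, bigger: Resource) -> bool:
    return smaller.memory_mb <= bigger.memory_mb and smaller.vcores <= bigger.vcores


def can_assign_container(used: Resource, max_resources: Resource, request: Resource) -> bool:
    """Allow a container (AM or task) only if the queue stays within its max."""
    after = Resource(used.memory_mb + request.memory_mb, used.vcores + request.vcores)
    return fits_in(after, max_resources)


queue_max = Resource(memory_mb=8192, vcores=8)
print(can_assign_container(Resource(8192, 8), queue_max, Resource(1024, 1)))  # False
print(can_assign_container(Resource(4096, 4), queue_max, Resource(1024, 1)))  # True
```

Counting the AM container in `request` is what keeps UsedResources from climbing past MaxResources when a new application is submitted to a full queue.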

Bug 25177

Description - When using FairScheduler with maxAMShare enabled, the total amResourceUsage per queue was not calculated properly, which could cause applications to hang in the ACCEPTED state.

Resolution - AM resource usage is now calculated as expected, and YARN jobs no longer get stuck in the ACCEPTED state.

5.0.0

Bug 13187

Description - The maprcli volume create command was not setting group ownership to the user's primary group.

Resolution - With this fix, the maprcli volume create command sets group ownership to the user's primary group.

Bug 20965

Description - When working with multiple clusters, synchronization issues were causing MapRFileSystem to throw a NullPointerException.

Resolution - With this fix, MapRFileSystem has been improved to better support interaction with multiple clusters as well as address synchronization issues.

Bug 24139

Description - When limit spread was enabled and nodes were more than 85% full, the CLDB did not allocate containers for IOs on non-local volumes.

Resolution - With this fix, the CLDB now allocates new containers so that the IO does not fail.

Bug 24249

Description - When map/reduce jobs ran with older versions of MapR classes, the system hung because those older classes linked to the native library installed on the cluster nodes, which had been updated to a newer MapR version.

Resolution - With this fix, the new fs.mapr.bailout.on.library.mismatch parameter detects mismatched libraries, fails the map/reduce job, and logs an appropriate error message. The parameter is enabled by default. To let such a job run, disable the parameter on all TaskTracker nodes by setting it to false in the core-site.xml file, and then resubmit the job.

Bug 24618

Description - Remote mirror volumes could not be created on secure clusters using MCS, even when the appropriate tickets were present.

Resolution - With this fix, remote mirror volumes can now be created on secure clusters using MCS.

Bug 24965

Description - On large clusters, the bind sometimes failed with a message indicating that the port was unavailable when running MR jobs, specifically reducer tasks.

Resolution - With this fix, if the new fs.mapr.bind.retries configuration parameter in the core-site.xml file is set to true, the client retries the bind during client initialization for 5 minutes before failing. By default, fs.mapr.bind.retries is set to false.

Bug 24969

Description - The maprcli volume create command was not setting group ownership to the user's primary group when the user's primary GID was not the first GID in the list of GIDs.

Resolution - With this fix, the primary GID of the user performing the operation is now the first GID in the list of GIDs.

Bug 24971

Description - When the mirroring operation started after a CLDB failover, it sometimes sent requests to a slave CLDB whose data was stale, causing the mirroring operation to hang. If another CLDB failover happened during this time, the new CLDB master discarded the data resynchronized by the old mirroring operation while marking the mirroring operation as successful. This resulted in a data mismatch between the source and destination.

Resolution - With this fix, mirroring requests are sent only to the master CLDB node.

Bug 25003

Description - When a specific queue uses all of its resources, the UsedResources tab in the Resource Manager UI might show a greater value than the MaxResources tab. This happens when another application is submitted and its application master container size is included.

Resolution - With this fix, no additional containers can be assigned to a queue once its UsedResources value has reached the MaxResources limit.

Bug 25041

Description - When a newly added node was made the master of the name container, MFS crashed while deleting files in the background.

Resolution - With this fix, MFS will not crash when a newly added node is made the master of the name container.

4.1.0

Bug 24139

Description - When limit spread was enabled and nodes were more than 85% full, the CLDB did not allocate containers for IOs on non-local volumes.

Resolution - With this fix, the CLDB now allocates new containers so that the IO does not fail.

Bug 24969

Description - The maprcli volume create command was not setting group ownership to the user's primary group when the user's primary GID was not the first GID in the list of GIDs.

Resolution - With this fix, the primary GID of the user performing the operation is now the first GID in the list of GIDs.

Bug 24971

Description - When the mirroring operation started after a CLDB failover, it sometimes sent requests to a slave CLDB whose data was stale, causing the mirroring operation to hang. If another CLDB failover happened during this time, the new CLDB master discarded the data resynchronized by the old mirroring operation while marking the mirroring operation as successful. This resulted in a data mismatch between the source and destination.

Resolution - With this fix, mirroring requests are sent only to the master CLDB node.

Bug 25041

Description - When a newly added node was made the master of the name container, MFS crashed while deleting files in the background.

Resolution - With this fix, MFS will not crash when a newly added node is made the master of the name container.
