
MapR CLDB failed to come on line within 600 seconds

Question asked by dzndrx on Mar 13, 2017
Latest reply on Mar 14, 2017 by dzndrx

Hi, I'm setting up a 3-node cluster as a bare-metal offline installation, and this error keeps popping up. I've done all the preconfiguration/preparation of the nodes, and they all pass the verification process. I don't know what causes this error.

I attached cldb.log, hoststats.log and warden.log. Please help me with this matter; this is a live project for a client.

 

Here is the list of preconfiguration steps that I performed. I also modified the /etc/pam.d/su file, because one of the steps requires checking that /etc/pam.d/su contains the following settings. These are the settings:

 

#%PAM-1.0
auth sufficient pam_rootok.so
# Uncomment the following line to implicitly trust users in the "wheel" group.
#auth sufficient pam_wheel.so trust use_uid
# Uncomment the following line to require a user to be in the "wheel" group.
#auth required pam_wheel.so use_uid
auth include system-auth
account sufficient pam_succeed_if.so uid = 0 use_uid quiet
account include system-auth
password include system-auth
session include system-auth
session required pam_limits.so
session optional pam_xauth.so
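
The line in this file that matters for the nofile/nproc limits in the checklist is `session required pam_limits.so`, since pam_limits is the module that applies /etc/security/limits.conf to a session. A minimal sketch of a check for it, assuming a POSIX shell; the function name `pam_limits_enabled` is made up for illustration, and the pattern only covers the simple control keywords:

```shell
# Succeed if FILE contains an active (uncommented) session line that loads
# pam_limits.so. Hypothetical helper; bracketed PAM controls are not handled.
pam_limits_enabled() {
    grep -Eq '^[[:space:]]*session[[:space:]]+(required|requisite|sufficient|optional)[[:space:]]+pam_limits\.so' "$1"
}
```

It could be called as `pam_limits_enabled /etc/pam.d/su && echo "pam_limits is active for su"`.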

 

 

Do this on each node
-MEMORY CONFIGURATION

-service numad stop
-chkconfig numad off
-Edit the file /etc/sysctl.conf and add the following line: vm.overcommit_memory=0
-On each node, set TCP retries for net.ipv4.tcp_retries2 to 5 so that MapR can detect unreachable nodes with less latency.
-In /etc/sysctl.conf, add the following line: net.ipv4.tcp_retries2=5
-Run sysctl -p
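
To confirm that both settings really ended up in /etc/sysctl.conf on every node, something like the following could be run. This is only a sketch, assuming a POSIX shell with sed and awk; `sysctl_conf_value` and `check_sysctl` are hypothetical helper names, and the parser strips all whitespace, so it is only suitable for single-valued keys like these two:

```shell
# Print the value assigned to KEY in a sysctl.conf-style FILE.
# If the key appears more than once, the last assignment is reported.
sysctl_conf_value() {
    sed -e 's/[[:space:]]//g' -e '/^[#;]/d' "$1" |
        awk -F= -v k="$2" '$1 == k { v = $2 } END { if (v != "") print v }'
}

# Compare the configured value of KEY in FILE with the expected value.
check_sysctl() (
    got=$(sysctl_conf_value "$1" "$2")
    if [ "$got" = "$3" ]; then
        echo "OK   $2=$got"
    else
        echo "FAIL $2: want $3, found ${got:-nothing}"
        false
    fi
)
```

For example, `check_sysctl /etc/sysctl.conf vm.overcommit_memory 0` and `check_sysctl /etc/sysctl.conf net.ipv4.tcp_retries2 5`. The values the running kernel is using can be read with `sysctl -n net.ipv4.tcp_retries2`.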

-DRIVE CONFIGURATION

-If you have a RAID controller, configure it to run in HBA mode. For LSI MegaRAID controllers that do not support HBA, configure the following drive group settings for optimal performance:

[RAID Level] RAID0
[Stripe Size] >=256K
[Cache Policy or I/O Policy] Cached IO or Cached
[Read Policy] Always Read Ahead or Read Ahead
[Write Policy] Write-Through
[Disk Cache Policy or Drive Cache] Disabled

-LOCAL STORAGE

-/opt [at least 128 GB]
-/tmp [at least 10 GB]
-/opt/mapr/zkdata [about 500 MB]
-swap space 24 to 126 GB [110% of physical memory]
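
The 110% figure for swap can be checked against what a node actually has. A rough sketch, assuming a Linux /proc/meminfo and a POSIX shell; the helper names are hypothetical, and only the 110% rule is checked, not the 24 to 126 GB range:

```shell
# Recommended swap in kB: 110% of physical memory (integer arithmetic).
recommended_swap_kb() {
    echo $(( $1 * 110 / 100 ))
}

# Print the kB value of FIELD (e.g. MemTotal) from a /proc/meminfo-style file.
meminfo_kb() {
    awk -v f="$1:" '$1 == f { print $2 }' "${2:-/proc/meminfo}"
}

# Succeed if SWAP_KB is at least 110% of MEM_KB.
swap_is_enough() {
    [ "$2" -ge "$(recommended_swap_kb "$1")" ]
}
```

On a node this could be used as `swap_is_enough "$(meminfo_kb MemTotal)" "$(meminfo_kb SwapTotal)" || echo "swap below 110% of RAM"`. Sizes of /opt and /tmp can be read with `df -h /opt /tmp`.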
-RESOLVABILITY

-Unique hostname for each node [hostname -f]
-Resolvable with all other nodes with both forward and reverse DNS lookup
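
Forward and reverse resolution can be tested from each node against every other node. The sketch below assumes `getent` is available and a POSIX shell; the function names are hypothetical. Note that `getent hosts` follows /etc/nsswitch.conf, so it also consults /etc/hosts and is not a pure DNS test:

```shell
# Forward lookup: print the first address returned for NAME.
forward_ip() { getent hosts "$1" | awk 'NR == 1 { print $1 }'; }

# Reverse lookup: print the canonical name returned for IP.
reverse_name() { getent hosts "$1" | awk 'NR == 1 { print $2 }'; }

# Compare two host names, ignoring case and a trailing dot.
same_host() (
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]'); a=${a%.}
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]'); b=${b%.}
    [ "$a" = "$b" ]
)

# Check that NAME resolves to an address and that address resolves back to NAME.
check_node() (
    ip=$(forward_ip "$1")
    if [ -z "$ip" ]; then
        echo "FAIL $1: no forward lookup"; false
    elif ! same_host "$(reverse_name "$ip")" "$1"; then
        echo "FAIL $1: $ip does not resolve back to it"; false
    else
        echo "OK   $1 <-> $ip"
    fi
)
```

With the fully qualified name of each node (the output of `hostname -f` there) it would be run as `check_node <fqdn-of-node>`, once per node, on every node.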

-SYSLOG
-Syslog must be enabled on each node to preserve logs regarding killed processes or failed jobs. Modern variants such as syslog-ng and rsyslog may be in use instead, which makes it harder to confirm that a syslog daemon is present. One of the following commands should suffice:
-service syslog status

-INFRA

-umask for the root user is set to 0022 in /etc/profile
-set nofile and nproc to 64000
-In /etc/security/limits.conf, add the following line: mapr - nofile 64000
-In /etc/security/limits.d/90-nproc.conf, add the following line: mapr - nproc 64000
-disable selinux
-disable stock linux nfs
-disable iptables/firewalld
-echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
-check that /etc/pam.d/su contains the following settings
#%PAM-1.0
auth sufficient pam_rootok.so
# Uncomment the following line to implicitly trust users in the "wheel" group.
#auth sufficient pam_wheel.so trust use_uid
# Uncomment the following line to require a user to be in the "wheel" group.
#auth required pam_wheel.so use_uid
auth include system-auth
account sufficient pam_succeed_if.so uid = 0 use_uid quiet
account include system-auth
password include system-auth
session include system-auth
session required pam_limits.so
session optional pam_xauth.so

-finally, use ulimit to verify the settings:
-reboot the system
-run ulimit -n
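
Besides `ulimit -n`, the limits files themselves can be checked, because a misspelled item name is not applied. The item name pam_limits recognizes for the process count is `nproc`. A small sketch, assuming a POSIX shell with awk; `limit_value` is a hypothetical helper and only matches lines written for the exact user name given:

```shell
# Print the value set for USER and ITEM in a limits.conf-style FILE
# (columns: domain, type, item, value). The last matching line is reported.
limit_value() {
    awk -v u="$2" -v i="$3" \
        '$1 == u && $3 == i { v = $4 } END { if (v != "") print v }' "$1"
}
```

For example, `limit_value /etc/security/limits.conf mapr nofile` and `limit_value /etc/security/limits.d/90-nproc.conf mapr nproc` should both print 64000. The limits the mapr user actually gets can be read with `su - mapr -c 'ulimit -n; ulimit -u'`, where `-n` is open files and `-u` is user processes.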

 

 

 
