Upgrade OS on MapR Cluster


Author: Mufeed Usman

 

Original Publication Date: March 5, 2015

Updated: February 10, 2016

 

The scope of this explanation is confined to the actions to be taken from a MapR perspective. For the OS upgrade itself, refer to the documentation from the OS vendor.

 

The first step is to isolate the node undergoing the OS upgrade from the cluster so that there is as little interference as possible. At a high level, there are two ways of doing this. The path to choose will depend on an evaluation of the PROS & CONS of each method. The methods are:

 

(a) Putting the node in maintenance mode

    - Refer to the following for the package dependencies of the OS version being upgraded to:

http://doc.mapr.com/display/RelNotes/Packages+and+Dependencies+for+MapR+Software

    - Back up the /opt/mapr/hostid & /opt/mapr/conf/disktab files. (This is MANDATORY if the OS partition is wiped clean and reinstalled as part of the upgrade. Otherwise it is good practice to save the information.)
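
      For example, a minimal backup sketch; the backup directory below is illustrative, and the copy should ultimately land somewhere that survives the reinstall (see the off-node copy further down):

# mkdir -p /root/mapr-backup
# cp -p /opt/mapr/hostid /opt/mapr/conf/disktab /root/mapr-backup/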

    - Capture the symbolic link information for the MapR init scripts under /etc/init.d. (This is MANDATORY if the OS partition is wiped clean and reinstalled as part of the upgrade. Otherwise it is good practice to save the information.)

      For example:

 

# ls -l /etc/init.d/mapr*
lrwxrwxrwx. 1 root root 31 Jun 18 10:23 /etc/init.d/mapr-cldb -> /opt/mapr/initscripts/mapr-cldb
lrwxrwxrwx. 1 root root 36 Jun 22 09:46 /etc/init.d/mapr-hoststats -> /opt/mapr/initscripts/mapr-hoststats
lrwxrwxrwx. 1 root root 30 Jun 18 10:23 /etc/init.d/mapr-mfs -> /opt/mapr/initscripts/mapr-mfs
lrwxrwxrwx. 1 root root 36 Jun 22 09:46 /etc/init.d/mapr-nfsserver -> /opt/mapr/initscripts/mapr-nfsserver
lrwxrwxrwx. 1 root root 33 Jun 18 10:23 /etc/init.d/mapr-warden -> /opt/mapr/initscripts/mapr-warden
lrwxrwxrwx. 1 root root 49 Jun 18 10:23 /etc/init.d/mapr-zookeeper -> /opt/mapr/zookeeper/zookeeper-3.4.5/bin/zookeeper

    - Capture the symbolic link information for the MapR libraries under /usr/lib64. If links like the ones below are removed, services such as hoststats will not start. For example:

# ls -l /usr/lib64 | grep mapr
lrwxrwxrwx 1 root root 35 Sep 21 10:52 libsoci_core.so.3.1 -> /opt/mapr/lib/libsoci_core.so.3.1.0*
lrwxrwxrwx 1 root root 36 Sep 21 10:52 libsoci_mysql.so.3.1 -> /opt/mapr/lib/libsoci_mysql.so.3.1.0*

    - Capture the disk permission information as follows:

# ls -l /dev/sd*
brw-rw----. 1 root disk 8, 0 Jun 11 20:51 /dev/sda
brw-rw----. 1 root disk 8, 1 Jun 11 20:51 /dev/sda1
brw-rw----. 1 root disk 8, 2 Jun 11 20:51 /dev/sda2
brw-rw----. 1 root mapr 8, 16 Jul 16 04:15 /dev/sdb
brw-rw----. 1 root mapr 8, 32 Jul 16 04:22 /dev/sdc
brw-rw----. 1 root mapr 8, 48 Jul 16 04:22 /dev/sdd
brw-rw----. 1 root mapr 8, 64 Jul 16 04:26 /dev/sde
brw-rw----. 1 root mapr 8, 80 Jul 16 04:27 /dev/sdf
brw-rw----. 1 root mapr 8, 96 Jul 16 04:23 /dev/sdg
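
      To keep these captures available after the upgrade, the listings can be written to files and copied off the node together with the hostid and disktab backup. A sketch; the backup directory, remote user, host, and path are all illustrative:

# ls -l /etc/init.d/mapr* > /root/mapr-backup/initd-links.txt
# ls -l /usr/lib64 | grep mapr > /root/mapr-backup/lib64-links.txt
# ls -l /dev/sd* > /root/mapr-backup/disk-perms.txt
# scp -rp /root/mapr-backup admin@backuphost:/backups/n73/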

    - Put the node in maintenance mode (a command-level sketch of the full sequence follows this list)

    - Stop MapR services on the node

    - Upgrade the OS on the node

    - Restore the /opt/mapr/hostid & /opt/mapr/conf/disktab files. (This is MANDATORY if the OS partition is wiped clean and reinstalled as part of the upgrade. Not required if the OS partition is untouched.)

    - Recreate the symbolic links to the init scripts and libraries captured earlier. (This is MANDATORY if the OS partition is wiped clean and reinstalled as part of the upgrade. Not required if the OS partition is untouched.)

    - Ensure the disk permissions are intact (as per the earlier capture). If not, modify them accordingly.

    - Start MapR services on the node

    - Take the node back out of maintenance mode
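
Put together, the flow for method (a) looks roughly like the sketch below. The hostname, timeout, device names, and backup paths are illustrative; the mapr-zookeeper steps apply only if the node runs ZooKeeper, and the maintenance-mode commands are covered in the documentation linked further below:

# maprcli node maintenance -nodes n73 -timeoutminutes 120   # put the node in maintenance mode
# service mapr-warden stop
# service mapr-zookeeper stop

  (perform the OS upgrade as per the vendor documentation)

# cp -p /root/mapr-backup/hostid /opt/mapr/hostid            # only if the OS partition was wiped
# cp -p /root/mapr-backup/disktab /opt/mapr/conf/disktab
# ln -s /opt/mapr/initscripts/mapr-warden /etc/init.d/mapr-warden           # repeat per the captured listing
# ln -s /opt/mapr/lib/libsoci_core.so.3.1.0 /usr/lib64/libsoci_core.so.3.1  # repeat per the captured listing
# chown root:mapr /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg     # match the captured permissions
# service mapr-zookeeper start
# service mapr-warden start
# maprcli node maintenance -nodes n73 -timeoutminutes 0      # take the node back out of maintenance mode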

 

(b) Decommissioning the node

    - Perform the same preparation steps as in method (a): review the package dependencies for the target OS version, back up the /opt/mapr/hostid & /opt/mapr/conf/disktab files, and capture the MapR init script symbolic links, the MapR library symbolic links, and the disk permission information.

    - Move the node to the /decommissioned topology and let it drain (the maprcli commands for moving the node and verifying the drain are shown further below)

    - Stop MapR services on the node

    - Upgrade the OS on the node

    - Restore the /opt/mapr/hostid & /opt/mapr/conf/disktab files. (This is MANDATORY if the OS partition is wiped clean and reinstalled as part of the upgrade. Not required if the OS partition is untouched.)

    - Recreate the symbolic links to the init scripts and libraries captured earlier. (This is MANDATORY if the OS partition is wiped clean and reinstalled as part of the upgrade. Not required if the OS partition is untouched.)

    - Ensure the disk permissions are intact (as per the earlier capture). If not, modify them accordingly.

    - Start MapR services on the node

    - Move the node back to its original topology (usually under /data)

 

For either of the above methods, it is advisable to handle the nodes running critical cluster services (the master CLDB, the ZooKeeper leader, and the active JobTracker/ResourceManager) last, to avoid unnecessary failovers when these services are stopped.

 

The above two methods are discussed in more detail in the following section.

 

(a) Putting the node in maintenance mode: This method is ideal when the containers in the cluster have a replication factor of 3, so that the cluster can afford to continue functioning with 2 copies of the containers. It is recommended to go through the following link for a better understanding of what happens in maintenance mode and the steps to be carried out.

 

Link: http://doc.mapr.com/display/MapR/Performing+Maintenance+on+a+Node

 

PROS: Shorter time for execution. No additional network or disk load.

CONS: Risk of running the cluster with 2 copies of container data.

 

(b) Decommissioning the node: This method is ideal when the time taken to perform the upgrade is not critical, that is, when it is acceptable to take the time to drain the node being decommissioned of its container copies so that no data is under-replicated during the process. It is also the slower approach, since each node needs to be fully drained, and it is intensive in terms of network and disk utilization.

 

The existing rack topology and the server id information can be gathered as follows:

# maprcli node list -columns racktopo,id
id                   racktopo                hostname  ip
7421804265708548418  /data/default-rack/n65  n65       10.10.70.65
5563461926583275637  /data/default-rack/n66  n66       10.10.70.66
4033431907806572591  /data/default-rack/n69  n69       10.10.70.69
8676496755481576303  /data/default-rack/n72  n72       10.10.70.72
3274467792102590278  /data/default-rack/n73  n73       10.10.70.73

To move a node to the /decommissioned topology, run the following (the example below corresponds to node n73):

# maprcli node move -serverids 3274467792102590278 -topology /decommissioned

 

The topology should now be changed as follows:

# maprcli node list -columns racktopo,id -filter '[hostname==n73]' 
id                   racktopo             hostname  ip
3274467792102590278  /decommissioned/n73  n73       10.10.70.73,192.168.122.1

 

To ensure that no non-local cluster volume data resides on this node, use the following:

# maprcli volume list -columns volumename | grep -v local | while read x 
do
maprcli dump volumenodes -volumename $x -json | grep 10.10.70.73:5660
done
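
If desired, the same check can be wrapped in a simple polling loop that re-runs until the node is fully drained. A rough sketch; the IP address and the sleep interval are placeholders, and the header line of the volume list is skipped explicitly:

# while true
do
  remaining=$(maprcli volume list -columns volumename | grep -v local | grep -v volumename | \
    while read x; do maprcli dump volumenodes -volumename $x -json | grep 10.10.70.73:5660; done)
  [ -z "$remaining" ] && echo "node drained" && break
  echo "still draining ..."
  sleep 300
done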

 

When the node has been fully drained, this check should not produce any output. For comparison, on a node still part of the original topology (n72 here), the same check returns entries:

 

[root@n72 ~]# maprcli volume list -columns volumename | grep -v local | while read x
do
maprcli dump volumenodes -volumename $x -json | grep 10.10.70.72:5660
done
  "10.10.70.72:5660--7-VALID"
  "10.10.70.72:5660--3-VALID"
  "10.10.70.72:5660--3-VALID"
  "10.10.70.72:5660--4-VALID"
  "10.10.70.72:5660--3-VALID"
  "10.10.70.72:5660--3-VALID"
  "10.10.70.72:5660--3-VALID"
  "10.10.70.72:5660--4-VALID"
  "10.10.70.72:5660--5-VALID"
  "10.10.70.72:5660--3-VALID"

Once the OS upgrade is over, move the node back to its original rack topology:

# maprcli node move -serverids 3274467792102590278 -topology /data/default-rack/

 

The topology should be changed back as follows:

# maprcli node list -columns racktopo,id -filter '[hostname==n73]' 
id                   racktopo                hostname  ip
3274467792102590278  /data/default-rack/n73  n73       10.10.70.73,192.168.122.1

 

Link1: http://doc.mapr.com/display/MapR/Removing+Nodes+from+a+Cluster (The steps here need not be followed in their entirety, as the node is only being moved to the /decommissioned topology and moved back after the OS upgrade.)

 

Link2: http://doc.mapr.com/display/MapR/node+remove (Reference for the actual command to be executed)

Link3: http://doc.mapr.com/display/MapR/Adding+Nodes+to+a+Cluster (Here also, you only need to run configure.sh with the options that define the CLDB nodes and the ZooKeeper ensemble, and then start Warden; a sketch follows.)
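
For reference, rejoining the node typically amounts to something like the following. The cluster name and the CLDB/ZooKeeper hostnames are illustrative, and the mapr-zookeeper step applies only if the node runs ZooKeeper:

# /opt/mapr/server/configure.sh -N my.cluster.com -C n65,n66 -Z n65,n66,n69
# service mapr-zookeeper start
# service mapr-warden start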

 

PROS: No risk of under-replicated data.

CONS: Longer time for execution. Higher network and disk load.
