How to purge old data in MapR Metrics Database

Document created by Hao Zhu Employee on Feb 18, 2016Last modified by Hao Zhu Employee on Feb 18, 2016
Version 3Show Document
  • View in full screen mode

Author: Hao Zhu

Original Publication Date: February 24, 2015

Goal:

How to purge old data if MapR metrics database is running out of disk space.

Solution:

1. Identify nodes which are running hoststats service.

$ maprcli node list -filter '[csvc==hoststats]' -columns service

2. Identify retention policy of MapR metrics.Login to one of the above nodes and check the content of /opt/mapr/conf/db.conf.Normally the largest table in MapR metrics database is "METRIC_TRANSACTION" where the finest data granularity is kept. It is partitioned by day and has 3 partitions by default. The parameter "db.partition.finest.count.days" controls how many days worth of data will be stored.Similarly, table "METRIC_TRANSACTION_SUMMARY_DAILY" by default has 15 partitions which is controlled by the parameter "db.partition.fine.count.days";Table "METRIC_TRANSACTION_SUMMARY_YEARLY" by default has 100 partitions which is controlled by parameter "db.partition.coarse.count.years".

 

3. Change retention policy.If the metrics database has enough disk space currently and it is not an urgent requirement to purge old data immediately, the retention policy can be changed on the nodes for 2 different services:a. Change the retention policy in /opt/mapr/conf/db.conf on all nodes which are running "hoststats" service, and then restart "hoststats" services using below command:

$ maprcli node services -name hoststats -action restart -filter '[csvc==hoststats]'

This is to make sure all "hoststats" services insert data into correct partitions of MySQL tables.b. Change the retention policy in /opt/mapr/conf/db.conf on all nodes which are running webserver service, and then restart "webserver" services using below command:

$ maprcli node services -name webserver -action restart -filter '[csvc==webserver]'

Warden where the first webserver service lives will periodically purge the metrics data according to retention policy.Below command is used to identify the node of the first webserver:

$ maprcli node list -columns "configuredservice,hostname" | grep "webserver" | head -n 1

Reducing the retention policy may not purge the data immediately.If things are urgent, please follow below steps to purge the old data immediately.

 

4.  Identify the largest table.Normally the table "METRIC_TRANSACTION" is the largest table,  however it is best to double check.On the MySQL node identify the MySQL data directory by checking the "--datadir" option of the mysqld process.

For example:

$ ps -ef|grep -i mysqld|grep -v grep

root 1270 1 0 15:45 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql

--socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql

mysql 1378 1270 0 15:45 ? 00:00:13 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql

--user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock

List files in the metrics database directory under MySQL data directory. For example, the metrics database name is "metrics":

$ ls -altr /var/lib/mysql/metrics

The file name includes the table name.

For example:

$ ls -altr /var/lib/mysql/metrics|grep sometable

-rw-rw----. 1 mysql mysql 8880 Mar 2 14:59 sometable.frm

-rw-rw----. 1 mysql mysql 159770112 Mar 2 17:17 sometable.MYD

-rw-rw----. 1 mysql mysql 1024 Mar 2 17:17 sometable.MYI

Another way to get the table size is to run below SQL in "metrics" database and order by table size.

mysql> SELECT table_name AS "Tables", round(((data_length + index_length) / 1024 / 1024), 2)

"Size in MB" FROM information_schema.TABLES WHERE table_schema = "metrics" ORDER BY (data_length + index_length) DESC;

+-----------------------------------+------------+

| Tables | Size in MB |

+-----------------------------------+------------+

| METRIC_TRANSACTION | 65.39 |

| METRIC_TRANSACTION_SUMMARY_DAILY | 35.23 |

| METRIC_TRANSACTION_EVENT | 9.59 |

| METRIC_TRANSACTION_SUMMARY_YEARLY | 3.34 |

| JOB_ATTRIBUTES | 0.52 |

| JOB | 0.06 |

| TASK_ATTEMPT | 0.05 |

| METRIC_TRANSACTION_SUMMARY_BATCH | 0.03 |

| JOB_HISTORY_PROCESSING_BATCH | 0.03 |

| TASK | 0.03 |

| NODE | 0.02 |

| TASK_EVENT | 0.02 |

| TASK_ATTEMPT_EVENT | 0.02 |

| TASK_ATTEMPT_ATTRIBUTES | 0.02 |

| JOB_EVENT | 0.02 |

+-----------------------------------+------------+

15 rows in set (0.04 sec)

Assume table "METRIC_TRANSACTION" is the largest table in below steps.

5. Identify which partition(s) need to be purged.

First use "show create table <tablename>" to find out how many partitions belong to this table.

For example, table "METRIC_TRANSACTION" has 3 partitions -- p0, p1 and p2.

 

mysql> show create table METRIC_TRANSACTION ;

+--------------------+----------------------------------------------------------------------+

| Table | Create Table |

+--------------------+----------------------------------------------------------------------+

| METRIC_TRANSACTION | CREATE TABLE `METRIC_TRANSACTION` (

  `PARTITION_ID` tinyint(4) NOT NULL,

  `EVENT_TIME` bigint(20) NOT NULL,

  `M_NAME` varchar(64) NOT NULL,

  `M_VALUE` bigint(20) NOT NULL,

  `NODE_ID` varchar(64) NOT NULL,

  `CLUSTER_ID` varchar(32) NOT NULL,

  `USER_ID` varchar(64) DEFAULT NULL,

  `PROCESS_ID` varchar(64) DEFAULT NULL,

  `CREATED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,

  PRIMARY KEY (`PARTITION_ID`,`CLUSTER_ID`,`NODE_ID`,`M_NAME`,`EVENT_TIME`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

/*!50100 PARTITION BY LIST (PARTITION_ID)

(PARTITION p2 VALUES IN (2) ENGINE = InnoDB,

PARTITION p1 VALUES IN (1) ENGINE = InnoDB,

PARTITION p0 VALUES IN (0) ENGINE = InnoDB) */ |

+--------------------+--------------------------------------------------------------------+

1 row in set (0.00 sec)

Since each partition of "METRIC_TRANSACTION" contains one day's data, use below SQL to find out which partitions have the oldest data.

For example:

mysql> select PARTITION_ID,max(EVENT_TIME) from METRIC_TRANSACTION group by PARTITION_ID order by 2 desc;

+--------------+-----------------+

| PARTITION_ID | max(EVENT_TIME) |

+--------------+-----------------+

| 0 | 1424817768000 |

| 2 | 1424814429000 |

+--------------+-----------------+

2 rows in set (0.10 sec)

Above result shows partition 0 has the most current data, partition 2 has older data, other partitions have no data.Assume the decision is to purge partition 1 and 2 of "METRIC_TRANSACTION" in below steps.

6. Drop and recreate partitions.

If partition 1 and 2 need to be dropped and recreated:

ALTER TABLE METRIC_TRANSACTION DROP PARTITION p1;

ALTER TABLE METRIC_TRANSACTION ADD PARTITION (PARTITION p1 VALUES IN (1));

 

ALTER TABLE METRIC_TRANSACTION DROP PARTITION p2;

ALTER TABLE METRIC_TRANSACTION ADD PARTITION (PARTITION p2 VALUES IN (2));

After that, verify current partition number using below SQL:

show create table METRIC_TRANSACTION ;

7. Verify disk space.

Make sure the disk space is back after dropping old partitions of the largest table(s).

Attachments

    Outcomes