Data Safety and Data Recoverability: A Snapshot How-to in a Snap


Author: Mufeed Usman

 

Original Publication Date: August 25, 2014

 

It doesn’t matter whether it’s big data or small data; it’s always BIG for the user. Big data investments mean big expected ROI and big business value. That’s why, when customers are searching for a big data platform, it’s important for them to ask the right questions. Our customers typically plan ahead for enterprise deployments, so they ask good questions such as:

  • "How can I protect my data?"
  • "How do I restore my data if there’s a problem, and how easy is it to do?"
  • "Can you help me deliver on my service-level agreements?"

The questions above, and others like them, come in different flavors, but they all boil down to one concern: data safety and data recoverability.

MapR, by design, has built-in native data protection in the form of container replication. To add more punch to that, we've included snapshot and mirroring capabilities. Our snapshot function is what I’ll focus on today.

Snapshots are point-in-time, static views of data. In other words, they capture the state of the storage system at the time the snapshot command is issued. When implemented with consistency in mind, as they are in MapR, they are guaranteed to reflect the data exactly as it was when the snapshot was taken. Snapshots are useful in a variety of scenarios, one of which is recovering data that was corrupted by user or application errors. They are also good for establishing a baseline view of data upon which point-in-time querying, audit processes, or machine learning techniques can be applied.

You’ll notice that taking a snapshot in MapR is quick and very space-efficient. And because snapshots are accessible directly from the file system, organizations do not need to go through significant effort to retrieve snapshot data.
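
For instance, once a snapshot exists, it appears under the volume’s hidden .snapshot directory and can be browsed like any other directory. A minimal sketch, assuming the cluster is NFS-mounted at /mapr/my.cluster.com as in the walkthrough below:

# Browse snapshot contents over NFS (the .snapshot directory is hidden but directly addressable)
ls /mapr/my.cluster.com/snapshot_src/.snapshot/

# The same view through the Hadoop CLI
hadoop fs -ls /snapshot_src/.snapshot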

There are two things worth noting. First, taking a snapshot is a volume-level process; you cannot (and it wouldn’t make sense to) take a snapshot of a single file or a single subdirectory. Second, though the method to create a snapshot is the same, the path to recovery varies slightly depending on the type of data to be recovered: regular files versus MapR-DB tables.

The finer details of how snapshots work are out of scope for this post. The intent here is simply to give you a quick how-to on the most common techniques for restoring from a snapshot.

Snapshot and Restore of Simple Files

For this example, I have created two volumes: snapshot_src and snapshot_dst (source and destination, respectively). Note that snapshots do not require a specific “destination” volume; I’m creating one here only as a place to put the recovered snapshot data in this example. In practice, you would most likely overwrite your corrupted data file with the valid version from the snapshot.

[root@mu-node-64 ~]# maprcli volume create -name snapshot_src -path /snapshot_src -topology /data

[root@mu-node-64 ~]# maprcli volume create -name snapshot_dst -path /snapshot_dst -topology /data

[root@mu-node-64 ~]#

[root@mu-node-64 ~]# ls -ld /mapr/my.cluster.com/snapshot_*

drwxr-xr-x. 2 root root 0 May 21 14:02 /mapr/my.cluster.com/snapshot_dst

drwxr-xr-x. 2 root root 0 May 21 14:01 /mapr/my.cluster.com/snapshot_src

Fig.1

Now I’ll create some example text files in the source volume:

[root@mu-node-64 ~]# cd /mapr/my.cluster.com/snapshot_src/

[root@mu-node-64 snapshot_src]# echo a > file1.txt

[root@mu-node-64 snapshot_src]# echo b > file2.txt

[root@mu-node-64 snapshot_src]# ls -ltr

total 1

-rw-r--r--. 1 root root 2 May 21 14:04 file1.txt

-rw-r--r--. 1 root root 2 May 21 14:04 file2.txt

Fig.2

Notice that in Figure 2, I used the Linux cd command to go to the Hadoop directory and created files as if they resided on a regular Linux file system. I can do this because I am using the MapR NFS interface, which lets me access my Hadoop data from the Linux command line.
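
Purely to illustrate that equivalence, the same files could have been created through the Hadoop command line instead. A minimal sketch (paths match the example above):

# Stage a file locally, then copy it into the volume with the Hadoop CLI
echo a > /tmp/file1.txt
hadoop fs -put /tmp/file1.txt /snapshot_src/file1.txt

# Confirm the file landed in the source volume
hadoop fs -ls /snapshot_src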

Now, I can take the snapshot as follows:

[root@mu-node-64 snapshot_src]# maprcli volume snapshot create -snapshotname snapshot.snapshot_src -volume snapshot_src

[root@mu-node-64 snapshot_src]# maprcli volume snapshot list

cumulativeReclaimSizeMB creationtime ownername snapshotid snapshotname volumeid volumename ownertype volumepath

0 Wed May 21 14:07:09 IST 2014 root 256000055 snapshot.snapshot_src 136224524 snapshot_src 1 /snapshot_src

[root@mu-node-64 snapshot_src]#

[root@mu-node-64 snapshot_src]# hadoop fs -ls /snapshot_src/.snapshot

Found 1 items

drwxr-xr-x - root root 2 2014-05-21 14:04 /snapshot_src/.snapshot/snapshot.snapshot_src

Fig.3

And now I want to restore the snapshot.snapshot_src directory, which contains the snapshot data, by copying it to the destination volume. Since the snapshot looks exactly like a file system directory, the restore simply uses the hadoop fs -cp command, just like on any Hadoop distribution. It is pretty straightforward, as shown below:

[root@mu-node-64 snapshot_src]# hadoop fs -cp /snapshot_src/.snapshot/snapshot.snapshot_src/ /snapshot_dst

[root@mu-node-64 snapshot_src]# cd /mapr/my.cluster.com/snapshot_dst/snapshot.snapshot_src

[root@mu-node-64 snapshot.snapshot_src]#

[root@mu-node-64 snapshot.snapshot_src]# ls -ltr

total 1

-rwxr-xr-x. 1 root root 2 May 21 14:09 file1.txt

-rwxr-xr-x. 1 root root 2 May 21 14:09 file2.txt

Fig.4

As you can probably tell, I also could have used the standard Linux cp command instead of hadoop fs -cp to restore the snapshot data.
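
For completeness, here is a sketch of that alternative, assuming the same NFS mount point (/mapr/my.cluster.com) used throughout this post:

# Copy the entire snapshot directory into the destination volume over NFS
cp -r /mapr/my.cluster.com/snapshot_src/.snapshot/snapshot.snapshot_src /mapr/my.cluster.com/snapshot_dst/

# Or restore a single corrupted file in place from the snapshot
cp /mapr/my.cluster.com/snapshot_src/.snapshot/snapshot.snapshot_src/file1.txt /mapr/my.cluster.com/snapshot_src/file1.txt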

Snapshot and Restore of MapR-DB Tables

As mentioned earlier, restoring MapR-DB tables is a little different. I set up the source table by first creating a tables subdirectory in the snapshot_src volume and then creating the table, as follows:

[root@mu-node-64 snapshot.snapshot_src]# hadoop fs -mkdir /snapshot_src/tables

[root@mu-node-64 snapshot.snapshot_src]# maprcli table create -path /snapshot_src/tables/table01

[root@mu-node-64 snapshot.snapshot_src]# maprcli table cf create -path /snapshot_src/tables/table01 -cfname table01_cf01

[root@mu-node-64 snapshot.snapshot_src]#

[root@mu-node-64 snapshot.snapshot_src]# maprcli table cf list -path /snapshot_src/tables/table01

readperm appendperm inmemory versionperm cfname writeperm compressionperm memoryperm compression ttl maxversions minversions

u:root u:root false u:root table01_cf01 u:root u:root u:root lz4 2147483647 3 0

[root@mu-node-64 snapshot.snapshot_src]#

[root@mu-node-64 snapshot.snapshot_src]# hbase shell

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 0.94.17-mapr-1403-SNAPSHOT, rbb690294807b1bf405176c2dfbcff0e815849f4e, Tue Apr 1 14:07:41 PDT 2014

 

Not all HBase shell commands are applicable to MapR tables.

Consult MapR documentation for the list of supported commands.

 

hbase(main):001:0> put '/snapshot_src/tables/table01', 'row1', 'table01_cf01', 'table01_value01'

0 row(s) in 0.3270 seconds

 

hbase(main):002:0> scan '/snapshot_src/tables/table01'

ROW COLUMN+CELL 

row1 column=table01_cf01:, timestamp=1400663440775, value=table01_value01 

1 row(s) in 0.0400 seconds

 

hbase(main):003:0>

Fig.5

Then I take the snapshot, as shown below:

[root@mu-node-64 ~]# maprcli volume snapshot create -snapshotname snapshot1.snapshot_src -volume snapshot_src

[root@mu-node-64 ~]# maprcli volume snapshot list

cumulativeReclaimSizeMB creationtime ownername snapshotid snapshotname volumeid volumename ownertype volumepath

0 Wed May 21 14:07:09 IST 2014 root 256000055 snapshot.snapshot_src 136224524 snapshot_src 1 /snapshot_src

0 Wed May 21 14:44:32 IST 2014 root 256000056 snapshot1.snapshot_src 136224524 snapshot_src 1 /snapshot_src

[root@mu-node-64 ~]#

Fig.6

The latest snapshot is the one bearing snapshotid 256000056. To restore the table, I’ll first set up the destination table as shown below:

[root@mu-node-64 ~]# hadoop fs -mkdir /snapshot_dst/tables

[root@mu-node-64 ~]# maprcli table create -path /snapshot_dst/tables/table01

[root@mu-node-64 ~]#

Fig.7

And the column family as shown below:

[root@mu-node-64 ~]# maprcli table cf create -path /snapshot_dst/tables/table01 -cfname table01_cf01

[root@mu-node-64 ~]#

Fig.8

The table copy itself is carried out with the HBase CopyTable utility, as shown below:

[root@mu-node-64 ~]# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/snapshot_dst/tables/table01 /snapshot_src/.snapshot/snapshot1.snapshot_src/tables/table01

14/05/21 14:47:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/05/21 14:47:32 INFO security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution

Fig. 9

Upon successful execution of the job, you should be greeted with output similar to what is shown in Fig. 10 and Fig. 11.

14/05/21 15:01:09 INFO mapred.JobClient: Running job: job_201405201959_0014

14/05/21 15:01:10 INFO mapred.JobClient: map 0% reduce 0%

14/05/21 15:01:23 INFO mapred.JobClient: map 100% reduce 0%

14/05/21 15:01:24 INFO mapred.JobClient: Job job_201405201959_0014 completed successfully

14/05/21 15:01:24 INFO mapred.JobClient: Counters: 17

14/05/21 15:01:24 INFO mapred.JobClient: Job Counters

14/05/21 15:01:24 INFO mapred.JobClient: Aggregate execution time of mappers(ms)=5035

14/05/21 15:01:24 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/21 15:01:24 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

14/05/21 15:01:24 INFO mapred.JobClient: Rack-local map tasks=1

14/05/21 15:01:24 INFO mapred.JobClient: Launched map tasks=1

14/05/21 15:01:24 INFO mapred.JobClient: Aggregate execution time of reducers(ms)=0

14/05/21 15:01:24 INFO mapred.JobClient: FileSystemCounters

14/05/21 15:01:24 INFO mapred.JobClient: MAPRFS_BYTES_READ=122

14/05/21 15:01:24 INFO mapred.JobClient: MAPRFS_BYTES_WRITTEN=1701

14/05/21 15:01:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=76870

14/05/21 15:01:24 INFO mapred.JobClient: Map-Reduce Framework

14/05/21 15:01:24 INFO mapred.JobClient: Map input records=1

14/05/21 15:01:24 INFO mapred.JobClient: PHYSICAL_MEMORY_BYTES=164352000

14/05/21 15:01:24 INFO mapred.JobClient: Spilled Records=0

14/05/21 15:01:24 INFO mapred.JobClient: CPU_MILLISECONDS=460

14/05/21 15:01:24 INFO mapred.JobClient: VIRTUAL_MEMORY_BYTES=2499231744

14/05/21 15:01:24 INFO mapred.JobClient: Map output records=1

14/05/21 15:01:24 INFO mapred.JobClient: SPLIT_RAW_BYTES=122

14/05/21 15:01:24 INFO mapred.JobClient: GC time elapsed (ms)=16

Fig.10

 

[root@mu-node-64 ~]# hbase shell

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 0.94.17-mapr-1403-SNAPSHOT, rbb690294807b1bf405176c2dfbcff0e815849f4e, Tue Apr 1 14:07:41 PDT 2014

 

Not all HBase shell commands are applicable to MapR tables.

Consult MapR documentation for the list of supported commands.

 

hbase(main):001:0> scan '/snapshot_src/tables/table01'

ROW COLUMN+CELL 

row1 column=table01_cf01:, timestamp=1400663440775, value=table01_value01 

1 row(s) in 0.3190 seconds

 

hbase(main):002:0> scan '/snapshot_dst/tables/table01'

ROW COLUMN+CELL 

row1 column=table01_cf01:, timestamp=1400663440775, value=table01_value01 

1 row(s) in 0.0090 seconds

 

hbase(main):003:0>

Fig.11

Mistakes to Avoid

There are a few common mistakes you can run into when executing the commands above, so I’ve laid them out here, along with how to resolve them, for quick reference.

Remember that MapR-DB tables have to be restored with the HBase CopyTable command. If I were to attempt a restore with a simple hadoop fs -cp command, it would fail:

[root@mu-node-64 ~]# hadoop fs -cp /snapshot_src/.snapshot/snapshot1.snapshot_src/tables/table01 /snapshot_dst

cp: Cannot copy MDP Tables

[root@mu-node-64 ~]#

Fig.12

Remember to create the table to which you’ll restore your snapshot table data, or else you will hit errors like the ones below. Be sure to follow the steps in Fig. 7 shown earlier.

14/05/21 14:47:32 INFO fs.JobTrackerWatcher: Current running JobTracker is: mu-node-64/10.250.50.64:9001

2014-05-21 14:47:33,3158 ERROR Client fs/client/fileclient/cc/dbclient.cc:186 Thread: 140289152771840 OpenTable failed for path /snapshot_dst/tables/table01, LookupFid error No such file or directory(2)

14/05/21 14:47:33 ERROR mapreduce.TableOutputFormat: java.io.IOException: Open failed for table: /snapshot_dst/tables/table01, error: No such file or directory (2)

14/05/21 14:47:33 INFO mapred.JobClient: Cleaning up the staging area maprfs:/var/mapr/cluster/mapred/jobTracker/staging/root/.staging/job_201405201959_0012

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Open failed for table: /snapshot_dst/tables/table01, error: No such file or directory (2)

  at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:206)

  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)

  at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)

Fig. 13

Also remember to set up the column family for the destination table, or else you’ll see the errors below. If you follow the steps in Fig. 8, you should be able to avoid this problem.

14/05/21 14:51:54 INFO mapred.JobClient: Running job: job_201405201959_0013

14/05/21 14:51:55 INFO mapred.JobClient: map 0% reduce 0%

14/05/21 14:52:13 INFO mapred.JobClient: Task Id : attempt_201405201959_0013_m_000000_0, Status : FAILED on node mu-node-65

java.io.IOException: Invalid column family table01_cf01

  at com.mapr.fs.PutConverter.createMapRPut(PutConverter.java:76)

 

attempt_201405201959_0013_m_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).

attempt_201405201959_0013_m_000000_0: log4j:WARN Please initialize the log4j system properly.

14/05/21 14:52:18 INFO mapred.JobClient: Task Id : attempt_201405201959_0013_m_000000_1, Status : FAILED on node mu-node-66

java.io.IOException: Invalid column family table01_cf01

  at com.mapr.fs.PutConverter.createMapRPut(PutConverter.java:76)

 

attempt_201405201959_0013_m_000000_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).

attempt_201405201959_0013_m_000000_2: log4j:WARN Please initialize the log4j system properly.

14/05/21 14:52:32 INFO mapred.JobClient: Job job_201405201959_0013 failed with state FAILED due to: NA

14/05/21 14:52:32 INFO mapred.JobClient: Counters: 7

14/05/21 14:52:32 INFO mapred.JobClient: Job Counters

14/05/21 14:52:32 INFO mapred.JobClient: Aggregate execution time of mappers(ms)=24140

14/05/21 14:52:32 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/21 14:52:32 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

14/05/21 14:52:32 INFO mapred.JobClient: Rack-local map tasks=4

14/05/21 14:52:32 INFO mapred.JobClient: Launched map tasks=4

14/05/21 14:52:32 INFO mapred.JobClient: Aggregate execution time of reducers(ms)=0

14/05/21 14:52:32 INFO mapred.JobClient: Failed map tasks=1

[root@mu-node-64 ~]#

Fig. 14

As a side note, if you get “Cannot resolve the host name” errors like the message below, you likely do not have functional reverse DNS (rDNS). This is harmless, though, and the copy should still go through. A quick way to confirm the missing rDNS entry is sketched after Fig. 15.

14/05/21 14:51:47 INFO mapreduce.TableOutputFormat: Created table instance for /snapshot_dst/tables/table01

14/05/21 14:51:54 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /10.250.50.64 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '64.50.250.10.in-addr.arpa'

Fig. 15
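
To confirm that reverse lookups are indeed the issue, a quick check from any cluster node looks like the sketch below. This assumes the host and dig utilities (from bind-utils) are installed, and uses the node IP from the log above:

# Attempt a reverse lookup for the node address; a failure here confirms the missing rDNS entry
host 10.250.50.64

# Equivalent check using dig
dig -x 10.250.50.64 +short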

 

Happy restores!
