
Issues faced during data import from MySQL to MapR-FS and/or MapR-DB

Question asked by anirban.das on Sep 5, 2016
Latest reply on Sep 8, 2016 by mufeed
  1. We have a 5-node cluster with 16 GB RAM and a 185 GB storage pool each. We have around 85 GB of data in our MySQL database and want to push that data into a MapR-DB table. We have explored a lot and found some ways to do that, but we need some assistance from the MapR team.

     Sample code, commands, and detailed steps are required.

     

    • Can we push all the data at once to MapR-FS using Sqoop?
    • Can we push data from MapR-FS into a MapR-DB table? We need an example (sample code, detailed steps).
    • Is it possible to send this data directly from MySQL to a MapR-DB table using Sqoop? If yes, can you please provide a sample command?
    • We have to push this SQL data into a MapR-DB table with multiple column families. Are there any sample code, commands, or steps for pushing data into multiple column families in a MapR-DB table? In this case, how do we map columns to column families?
    • We tried to push this SQL data into HBase with Sqoop in 2 commands. The first command ran successfully, and we were able to push data (5 columns) into one column family in HBase.
    • But when we tried to push another set of columns into HBase under another column family, the cluster went down. We saw huge memory consumption, and almost 95% of the disk space was occupied while the 2nd Sqoop command was executing. Ultimately the 2nd command did not succeed, and the HBase table was populated with partial data.
    • We also saw the HBase RegionServers (5) go down in this situation. How can we configure the cluster so that the load is distributed evenly across the nodes?
    • Why was so much disk space used? The data volume is 85 GB, so at most it should take 85 × 3 = 255 GB, where 3 is the default replication factor. We don't know why that much memory and disk space was occupied. Are there any suggestions for bulk loading data into MapR-FS / MapR-DB so that disk space usage can be optimized?
    • We have to execute the same process where the data source is a file system instead of a MySQL database.
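
For reference, the two-step import described above (one Sqoop run per column family) could be sketched roughly as below. This is only an illustration of the approach, not a verified solution: the host, database, table, column, and column family names are all hypothetical placeholders, and passing a MapR-DB table path to `--hbase-table` is an assumption that should be checked against the MapR Sqoop documentation. The script only prints the commands so they can be reviewed before anything is run against the cluster.

```shell
#!/bin/sh
# Dry-run sketch: prints one "sqoop import" command per column family.
# Every value below is a hypothetical placeholder, not taken from the original post.
JDBC_URL="jdbc:mysql://mysql-host:3306/sourcedb"
DB_USER="sqoop_user"
PASSWORD_FILE="/user/mapr/.mysql.password"
SRC_TABLE="customers"
ROW_KEY="id"
TARGET_TABLE="/tables/customers"   # assumed MapR-DB table path
MAPPERS=4                          # fewer parallel mappers means less load per run

# Print a sqoop import command that loads the given columns into one column family.
# $1 = column family name, $2 = comma-separated columns (must include the row key column).
build_import_cmd() {
  family="$1"
  columns="$2"
  echo "sqoop import" \
    "--connect $JDBC_URL" \
    "--username $DB_USER" \
    "--password-file $PASSWORD_FILE" \
    "--table $SRC_TABLE" \
    "--columns $columns" \
    "--hbase-table $TARGET_TABLE" \
    "--column-family $family" \
    "--hbase-row-key $ROW_KEY" \
    "--num-mappers $MAPPERS"
}

# One run per column family; the row key column is repeated so rows line up.
build_import_cmd cf1 "id,name,email"
build_import_cmd cf2 "id,city,country"
```

Each Sqoop run writes into a single column family, so the mapping of columns to families is expressed by which columns are listed in each run.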
