Is it possible to configure Sqoop with the MapR client to run on my local machine? I would like to use my local instance of Sqoop to trigger a Sqoop job that moves data held in Hive tables on my tenancy to a SQL Server. How can I do this?
A Sqoop job is a MapReduce job, so you can launch it from any edge node, that is, a machine that is cluster-aware and has the MapR client code installed.
Since you indicated that you want to push data from Hadoop to SQL Server, it doesn't matter where the SQL engine resides, as long as it is up and running and accepts JDBC connections from the MapR cluster.
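Launched from an edge node, the export might look something like the sketch below. The host, database, user, and table names are placeholders, and this assumes Sqoop 1 with the Microsoft JDBC driver for SQL Server on Sqoop's classpath:

```shell
# Hypothetical names throughout (sqlhost, sales, dbo_orders, orders).
# Run from an edge node with the MapR client and Sqoop installed.
sqoop export \
  --connect "jdbc:sqlserver://sqlhost:1433;databaseName=sales" \
  --username sqoop_user -P \
  --table dbo_orders \
  --hcatalog-database default \
  --hcatalog-table orders
```

The `--hcatalog-table` option lets Sqoop read the Hive table through HCatalog directly, instead of pointing `--export-dir` at the warehouse files by hand.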
Hi David Lu
Please check the Sqoop documentation below:
Sqoop - MapR 5.0 Documentation - doc.mapr.com
Assuming you will be using Sqoop2, you have to perform two steps.
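In Sqoop2 the two steps typically are: create links for the source and target, then create and start a job that connects them. A rough sketch from the Sqoop2 shell; the link names, job name, and JDBC URL are placeholders, and the exact options vary by Sqoop2 version:

```shell
# Hypothetical names throughout; prompts and flags differ across Sqoop2 releases.
sqoop2-shell

# Step 1: create links (one for the cluster side, one for the JDBC side)
sqoop:000> create link -connector hdfs-connector
sqoop:000> create link -connector generic-jdbc-connector
# ...the shell then prompts for the link name, JDBC driver class, and
# connection string, e.g. jdbc:sqlserver://sqlhost:1433;databaseName=sales

# Step 2: create a job from the source link to the target link, then start it
sqoop:000> create job -f "hive-link" -t "sqlserver-link"
sqoop:000> start job -name export_orders
```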
Let me know if you have any doubts.
If you're going to use sqoop, then you're probably going to have HiveServer2 running Hive on top of MapRDB / HBase.
Once you extract the data from MapR-DB, you might as well put it into a format that supports a fast load into the target DB. Some RDBMSs offer a bulk-load capability (YMMV depending on the RDBMS), and it will be faster.
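For SQL Server specifically, one bulk-load path is to write the extract out as delimited files and load them with `bcp`. A sketch, where the server, database, table, and file names are placeholders:

```shell
# Character-mode bulk load of a CSV extract into SQL Server.
# sqlhost, sales, dbo.orders, and orders.csv are hypothetical names.
bcp sales.dbo.orders in orders.csv \
  -S sqlhost -U sqoop_user -P '***' \
  -c -t ',' -r '\n' -b 50000
```

The `-b` batch size controls how many rows commit per transaction, which matters for load speed and log growth on large extracts.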
The reason I don't like Sqoop is that its performance rests on two assumptions: that your data is evenly distributed across the split key, and that the RDBMS can handle enough parallel worker connections.
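The even-distribution assumption is easy to break: Sqoop partitions work by slicing the split column's [min, max] range into equal intervals, so a skewed key leaves most mappers idle. A small illustration of that partitioning scheme (the key distribution is made up):

```python
# Sketch of how Sqoop-style range partitioning behaves on skewed keys.
# Sqoop splits [min(key), max(key)] into num-mappers equal ranges; rows
# are NOT guaranteed to be spread evenly across those ranges.

def split_boundaries(lo, hi, num_splits):
    """Evenly spaced (lo, hi) ranges, as Sqoop does for numeric split columns."""
    step = (hi - lo) / num_splits
    return [(lo + i * step, lo + (i + 1) * step) for i in range(num_splits)]

# Hypothetical skewed key distribution: 90 small ids plus one outlier.
keys = list(range(0, 90)) + [1000]
splits = split_boundaries(min(keys), max(keys), 4)

# Count rows per split (the last range is closed at the top end).
counts = [sum(1 for k in keys if lo <= k < hi or (hi == max(keys) and k == hi))
          for lo, hi in splits]
print(counts)  # → [90, 0, 0, 1]: one mapper does nearly all the work
```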