
Python packages in the Spark/Hadoop cluster and Edge Nodes

Question asked by sambitkumohanty183 on Aug 4, 2017
Latest reply on Aug 7, 2017 by maprcommunity

How do we use the following Python packages in the Spark/Hadoop cluster:

  • numpy
  • scikit-learn
  • pandas


Also, do we need to install them on the edge node as well as on the cluster nodes and all of the executor nodes?

The end goal is to make these dependencies available on every executor node where the Spark jobs execute, in both yarn-client and yarn-cluster mode.

I am not sure whether MapR already has a way to manage Python dependencies centrally and push them to the executor nodes at run time.
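One common, non-MapR-specific approach to the "push dependencies to executors at run time" part is to pack a virtualenv containing numpy, scikit-learn, and pandas into an archive and ship it with the job via `spark-submit --archives`; YARN unpacks the archive on every executor node, and `PYSPARK_PYTHON` points the workers at the bundled interpreter. The file names (`pyspark_venv.tar.gz`, `my_job.py`) below are illustrative, and this is a sketch assuming a Spark-on-YARN setup with the `venv-pack` tool installed:

```shell
# Build and pack a virtualenv with the needed packages (done once, on the edge node).
python -m venv pyspark_venv
source pyspark_venv/bin/activate
pip install numpy scikit-learn pandas venv-pack
venv-pack -o pyspark_venv.tar.gz

# Ship the archive with the job; YARN extracts it on each executor
# under the alias after '#', and PYSPARK_PYTHON selects that interpreter.
export PYSPARK_DRIVER_PYTHON=python               # driver uses the local env (yarn-client mode)
export PYSPARK_PYTHON=./environment/bin/python    # executors use the shipped env
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives pyspark_venv.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  my_job.py
```

With this pattern the packages only need to be installed once on the edge node where the archive is built; individual cluster/executor nodes do not need a system-wide install.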