
Facing issues while running Spark jobs on Jupyterhub

Question asked by Arunav on Jun 5, 2017
Latest reply on Jun 6, 2017 by maprcommunity


I'm facing errors while running Spark jobs with Jupyterhub on a MapR cluster. 


I have a 5-node MapR cluster running RHEL 6.8 and MapR 5.2. The system default Python version is 2.6. I have Jupyterhub installed on one of the servers, with two kernels: py2 (Python 2.7) and py3 (Python 3.5).

I'm launching Spark code with the py2 (Python 2.7) kernel of Jupyterhub. I get an error saying:


Here's my code:

import findspark


import pyspark

from pyspark.sql import SparkSession

from pyspark.sql.types import *

from pyspark import SparkConf, SparkContext


# spark.yarn.dist.files takes one comma-separated string of paths
conf = (SparkConf()
        .set('spark.yarn.dist.files', 'file:/opt/mapr/spark/spark-2.0.1/python/lib/,file:/opt/mapr/spark/spark-2.0.1/python/lib/')
        .setExecutorEnv('PYTHONPATH', '')
        .set('spark.yarn.appMasterEnv.PYSPARK_PYTHON', '/usr/bin/python')
        .set('spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON', '/usr/bin/python')
        .setMaster("yarn")
        .setAppName("SparkTest")
        .set("spark.executor.instances", "10")
        .set("spark.executor.cores", "3")
        .set("spark.executor.memory", "8G"))


sc = pyspark.SparkContext(conf = conf)
spark = pyspark.sql.SparkSession(sc)



As I understand it, when Spark code runs from Jupyterhub, the Spark driver picks up the Python version of the Jupyterhub kernel (2.7), while the workers use the system default Python version (2.6).

I ran the same code with spark-submit and it runs successfully there. Evidently, in that case both the workers and the driver pick up the system default Python version (2.6).
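
To make the suspected mismatch concrete, here is a minimal sketch that compares the Python version of the notebook kernel with the version reported by another interpreter on the same host. The helper names `python_version` and `versions_match` are hypothetical, and the check only covers the node the notebook runs on; it assumes the other cluster nodes have the same interpreters installed at the same paths.

```python
import subprocess
import sys


def python_version(interpreter):
    """Return (major, minor) as reported by the given Python executable."""
    out = subprocess.check_output(
        [interpreter, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
    )
    major, minor = out.decode().strip().split(".")
    return int(major), int(minor)


def versions_match(driver_python, worker_python):
    """True when both interpreters report the same major.minor version."""
    return python_version(driver_python) == python_version(worker_python)
```

In the py2 kernel, `versions_match(sys.executable, '/usr/bin/python')` would compare the kernel's interpreter with the system default one; if my understanding above is right, it should return False on this setup.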


Following some suggestions I found online for this error, I set parameters in the code to point the Spark driver at the default Python version (2.6), but they did not have the expected effect.
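
For context, PySpark also reads the `PYSPARK_PYTHON` environment variable in the driver process when the SparkContext is created and uses it as the interpreter for the workers. The following is only a sketch of setting it from the notebook, not a confirmed fix for this cluster; the helper name `pin_worker_python` is hypothetical, and the interpreter path passed to it is assumed to exist on every node.

```python
import os


def pin_worker_python(interpreter_path, environ=os.environ):
    """Set PYSPARK_PYTHON and return the value it had before (or None).

    This must run before the SparkContext is created, because PySpark
    reads the variable at that point.
    """
    previous = environ.get("PYSPARK_PYTHON")
    environ["PYSPARK_PYTHON"] = interpreter_path
    return previous


# Example call (the path is an assumption; use an interpreter whose
# version matches the notebook kernel):
# pin_worker_python("/usr/bin/python")
```

The optional `environ` argument exists only so the helper can be tried against a plain dictionary without touching the real process environment.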





Can someone please guide me on how to resolve this error? 



Arunava Addy