Rachel Silver

How To: Collaborate & Share Notebooks using the MapR Data Science Refinery

Blog Post created by Rachel Silver on Nov 29, 2017

So, let's say you want to share your notebooks with a colleague or just utilize the persistent storage provided by MapR-FS. How would you go about doing this?

 

Convergence and portability are at the heart of our design of the MapR Data Science Refinery container. The goal was to create something really agile that would allow our customers to create turn-key development environments that they could spin up or down as needed, while still leveraging their MapR Converged Data Platform as a persistent store.

 

Being able to reach the global namespace from inside the container is exactly what you need to create a secure collaborative environment. Why create a new space, with all of the security and IT overhead involved, when you already have all of this securely in place in your cluster? Simplicity is key here.

 

A typical container can only access the space inside the container and the underlying filesystem on which you're running Docker. That's fine for some use cases, but it doesn't really take advantage of the benefits that the MapR Converged Data Platform offers.

 

 

So, we've included the MapR POSIX Client for Containers in our build, enabling you to access your global namespace from inside the container.

 

   

 

And, since security is handled by passing a security ticket into the container, you (and your collaborators) only have access to those parts of the file system that you've explicitly been granted permissions for.
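One quick way to see what the ticketed user can actually do with a given path from inside the container is a plain POSIX permission check. The following is a minimal sketch, not MapR tooling: `check_access` is a hypothetical helper, and the `/mapr` default is an assumed mount point that you should replace with the directory you care about.

```shell
#!/bin/sh
# check_access is a hypothetical helper, not part of any MapR tooling.
# It reports what the current user may do with a directory, using only
# standard POSIX test operators.
check_access() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        # Either the directory does not exist or it is not visible to this user.
        echo "not-found"
    elif [ -r "$dir" ] && [ -w "$dir" ]; then
        echo "read-write"
    elif [ -r "$dir" ]; then
        echo "read-only"
    else
        echo "no-access"
    fi
}

# Assumed mount point; pass a different path as the first argument.
TARGET="${1:-/mapr}"
echo "${TARGET}: $(check_access "${TARGET}")"
```

Running this as each collaborator against the same directory gives a quick picture of who can read or edit what, without leaving the container.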

 

So, let's say that you've created a space in your global namespace, where you want to share notebooks with others, and that this space is:

/user/mapr/zeppelin/shared-notebooks

 

This space contains a notebook called Churn Prediction with Spark, which our very own Ian Downard has shared with you.

 

 

To access this notebook from the Data Science Refinery container, simply point Zeppelin at this directory in the docker run command. The switch that you need is:

 

-e ZEPPELIN_NOTEBOOK_DIR=/mapr/my.cluster.com/user/mapr/zeppelin/shared-notebooks/
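To show where that switch sits, here is a dry-run sketch that only composes and prints a docker run command without launching anything. The cluster name and shared path come from this post; `DSR_IMAGE` is a hypothetical placeholder for whatever image name and tag you use, and the other options your deployment needs (cluster, user, ticket, ports) are deliberately left out.

```shell
#!/bin/sh
# Dry-run sketch: compose and print a docker run command; nothing is launched.

# Values taken from the post.
CLUSTER_NAME="my.cluster.com"
SHARED_DIR="/user/mapr/zeppelin/shared-notebooks"

# Inside the container, the global namespace is reached under /mapr/<cluster name>.
NOTEBOOK_DIR="/mapr/${CLUSTER_NAME}${SHARED_DIR}/"

# Hypothetical placeholder: substitute the image name and tag you actually use.
DSR_IMAGE="your-data-science-refinery-image"

# Any other options your deployment needs would sit alongside the -e switch;
# they are omitted here on purpose.
DOCKER_CMD="docker run -it -e ZEPPELIN_NOTEBOOK_DIR=${NOTEBOOK_DIR} ${DSR_IMAGE}"

echo "${DOCKER_CMD}"
```

Keeping the cluster name and the shared path in separate variables makes it easy for each collaborator to reuse the same script against a different cluster.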

 

Now, when you spin up your container and log into Zeppelin, you'll see that this notebook has been added to your list.

 

 

In addition, Zeppelin's notebook storage has built-in integrations, including versioning support with Git. Read more here:

Apache Zeppelin 0.7.3 Documentation: Notebook Storage for Apache Zeppelin 
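As a rough sketch of what switching on Git-backed storage might look like in this setup: Zeppelin reads its storage backend from the `ZEPPELIN_NOTEBOOK_STORAGE` environment variable (or the `zeppelin.notebook.storage` property in `zeppelin-site.xml`), and `org.apache.zeppelin.notebook.repo.GitNotebookRepo` is the Git-backed implementation covered in that documentation. Whether the Data Science Refinery container forwards this variable to Zeppelin is an assumption here, so the snippet below is a dry run that only prints the extra switch.

```shell
#!/bin/sh
# Dry-run sketch: print an extra docker run switch; nothing is launched.

# Zeppelin's Git-backed notebook storage class.
STORAGE_CLASS="org.apache.zeppelin.notebook.repo.GitNotebookRepo"

# Assumption: the container passes this environment variable through to Zeppelin.
GIT_STORAGE_SWITCH="-e ZEPPELIN_NOTEBOOK_STORAGE=${STORAGE_CLASS}"

echo "${GIT_STORAGE_SWITCH}"
```

If the variable is not honored in your container, the same class name can be set through `zeppelin.notebook.storage` in `zeppelin-site.xml` instead.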

 

You can watch a demo of this collaboration workflow here (the link is pinned to the start of the relevant demo):

Self Service Data Science for ML AI - YouTube 

 
