Recently, MapR launched the MapR Data Science Refinery, a novel way to deliver data science functionality and connectivity for your MapR Converged Data Platform.
One of the great advantages to this product is the ability to deploy this workspace from wherever you choose to do your work: an edge node, a cloud instance, or even your personal laptop!
Below are the steps that are required to run the MapR Data Science Refinery from a Mac.
First, install and start Docker for your operating system. You'll be given a choice between Docker Community Edition (CE) and Docker Enterprise Edition (EE); either works for this purpose.
There are some basic commands for Mac here:
Get started with Docker for Mac | Docker Documentation
For example, to enable shell completion, create symlinks to these files:
ln -s /Applications/Docker.app/Contents/Resources/etc/docker.bash-completion /usr/local/etc/bash_completion.d/docker
ln -s /Applications/Docker.app/Contents/Resources/etc/docker-machine.bash-completion /usr/local/etc/bash_completion.d/docker-machine
ln -s /Applications/Docker.app/Contents/Resources/etc/docker-compose.bash-completion /usr/local/etc/bash_completion.d/docker-compose
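The three commands above follow one pattern, so they can be wrapped in a small loop. This is only a minimal sketch: the function name `link_docker_completions` and its two directory arguments are made up for illustration, and it assumes the completion files are named as in the commands above.

```shell
# Hypothetical helper: link Docker's bash completion files into a completion directory.
#   $1 = directory that holds the *.bash-completion files
#   $2 = bash completion directory to link into
link_docker_completions() {
    src_dir=$1
    dest_dir=$2
    for tool in docker docker-machine docker-compose; do
        src="$src_dir/$tool.bash-completion"
        if [ -f "$src" ]; then
            # -f replaces a stale link left over from an earlier install.
            ln -sf "$src" "$dest_dir/$tool"
        else
            # Not every Docker release ships all three files, so skip missing ones.
            echo "skipping $tool: $src not found" >&2
        fi
    done
}
```

With the paths used above, the call would be `link_docker_completions /Applications/Docker.app/Contents/Resources/etc /usr/local/etc/bash_completion.d`.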
Once Docker is installed, pull the image into your local Docker image store. Our Docker Hub is located here, and the command to pull the most recent version of the CentOS image from your Mac terminal is:
docker pull maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
After the pull completes, you can confirm that the image now exists in your local image store by running:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/maprtech/data-science-refinery v1.0_6.0.0_4.0.0_centos7 <IMAGE ID>
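If you would rather script this check than read the table, something like the following should work. It is a sketch only: `image_is_local` is a hypothetical helper name, and it relies on `docker image inspect` exiting with a non-zero status when the image is not present locally.

```shell
# Hypothetical helper: succeed only if the given image reference is in the local image store.
image_is_local() {
    # `docker image inspect` exits non-zero when the image is not present locally;
    # its JSON output is not needed here, so it is discarded.
    docker image inspect "$1" >/dev/null 2>&1
}
```

For example, `image_is_local maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7 || echo "image missing, run docker pull first"`.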
For a secure cluster, the only remaining prerequisite is your MapR-SASL ticket, stored somewhere on this host. For the steps to generate this ticket, please see this document:
Administrator's Reference for 'maprlogin'
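Before moving on, it can help to confirm where the ticket file actually is. The sketch below assumes the usual convention that a ticket is written to `/tmp/maprticket_<uid>` unless the `MAPR_TICKETFILE_LOCATION` environment variable points somewhere else; `find_mapr_ticket` is a hypothetical helper name, so adjust it to your setup.

```shell
# Hypothetical helper: print the path of the MapR ticket file, or fail if none is readable.
find_mapr_ticket() {
    # Assumption: tickets default to /tmp/maprticket_<uid> unless
    # MAPR_TICKETFILE_LOCATION overrides the location.
    ticket=${MAPR_TICKETFILE_LOCATION:-/tmp/maprticket_$(id -u)}
    if [ -r "$ticket" ]; then
        printf '%s\n' "$ticket"
    else
        echo "no readable ticket at $ticket" >&2
        return 1
    fi
}
```

The printed path is the value to substitute for the ticket file placeholder in the next step.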
Next, use the docker run command, replacing the variables shown in angle brackets as needed. For more information on this command and its options, please visit this document:
Understanding Zeppelin Docker Parameters
docker run -it -p 9995:9995 \
-e HOST_IP=<docker-host-ip> -p 10000-10010:10000-10010 \
-e MAPR_CLUSTER=<cluster-name> -e MAPR_CLDB_HOSTS=<cldb-ip-list> \
-e MAPR_CONTAINER_USER=<user-name> -e MAPR_CONTAINER_PASSWORD=<password> \
-e MAPR_CONTAINER_GROUP=<group-name> -e MAPR_CONTAINER_UID=<uid> -e MAPR_CONTAINER_GID=<gid> \
-e MAPR_TICKETFILE_LOCATION=</path/to/ticket/file> -e MAPR_MOUNT_PATH=/mapr \
--cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse \
-e MAPR_HS_HOST=<historyserver-ip> -e ZEPPELIN_NOTEBOOK_DIR=<path-for-notebook-storage> \
-e MAPR_TZ=<time-zone> -v </path/to/ticket/file>:/tmp/mapr_ticket:ro maprtech/data-science-refinery:latest
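With this many `-e` options, one alternative is to keep the settings in a file and pass it with Docker's `--env-file` option. The following is a sketch under that assumption, not the documented way to start the container: `write_dsr_env_file` and the file name `dsr.env` are hypothetical, and the variable names are simply the ones from the command above.

```shell
# Hypothetical helper: write whichever container settings are set in the
# current shell to an env file that `docker run --env-file` can read.
write_dsr_env_file() {
    out=$1
    : > "$out"          # create or truncate the file
    chmod 600 "$out"    # it may hold a password, so restrict access first
    for name in HOST_IP MAPR_CLUSTER MAPR_CLDB_HOSTS MAPR_CONTAINER_USER \
                MAPR_CONTAINER_PASSWORD MAPR_CONTAINER_GROUP MAPR_CONTAINER_UID \
                MAPR_CONTAINER_GID MAPR_TICKETFILE_LOCATION MAPR_MOUNT_PATH \
                MAPR_HS_HOST ZEPPELIN_NOTEBOOK_DIR MAPR_TZ; do
        # Look up the variable whose name is in $name; unset or empty ones are skipped.
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value" >> "$out"
        fi
    done
}

# Usage sketch (placeholders as in the command above):
#   MAPR_CLUSTER=<cluster-name> MAPR_CLDB_HOSTS=<cldb-ip-list> ...
#   write_dsr_env_file dsr.env
#   docker run -it -p 9995:9995 -p 10000-10010:10000-10010 --env-file dsr.env \
#     --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse \
#     -v </path/to/ticket/file>:/tmp/mapr_ticket:ro maprtech/data-science-refinery:latest
```

Docker reads each line of an env file literally, so values should be written without surrounding quotes, as the helper does.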
That's it! Now you can log in to Zeppelin by visiting the UI on port 9995 of your Docker host, the port published by the docker run command above.
Log in using the credentials that you provided in the docker run command. The authorization for the jobs themselves, whether Spark, POSIX, or JDBC, is provided by your MapR-SASL ticket.
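Zeppelin can take a little while to come up after the container starts, so a small polling loop may save some browser refreshing. This is a sketch: `wait_for_url` is a hypothetical helper, and the URL in the usage note assumes the UI is served over HTTPS on the published port 9995 of the local machine.

```shell
# Hypothetical helper: poll a URL until the server answers or the attempts run out.
#   $1 = URL to poll
#   $2 = maximum number of attempts (default 30)
wait_for_url() {
    url=$1
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -k accepts a self-signed certificate; -s and -o /dev/null silence the output.
        # Any HTTP response at all counts as "the server is up".
        if curl -k -s -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "gave up on $url after $attempts attempts" >&2
    return 1
}
```

For example, `wait_for_url https://localhost:9995` would retry for about a minute before giving up.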
In addition, you can peruse the file system from inside the container using POSIX or Hadoop syntax from the CLI or Zeppelin. This is made possible by the MapR POSIX Client For Containers, which allows MapR customers to mount their global namespace to their Docker container.
$ ls -la /mapr/my.cluster.com/
drwxr-xr-x 10 mapr mapr 9 Nov 27 08:55 .
dr-xr-xr-x 3 root root 1 Dec 16 17:43 ..
drwxr-xr-x 3 mapr mapr 1 Nov 27 08:51 apps
drwxr-xr-x 2 mapr mapr 0 Nov 27 08:48 hbase
$ hadoop fs -ls /
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Found 8 items
drwxr-xr-x - mapr mapr 1 2017-11-27 08:51 /apps
drwxr-xr-x - mapr mapr 0 2017-11-27 08:48 /hbase
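The SLF4J lines in the output above are notices written to standard error rather than part of the listing. If they get in the way, a small wrapper can filter them out. This is a sketch, and `hadoop_quiet` is a hypothetical name:

```shell
# Hypothetical wrapper: run a hadoop command and hide only the SLF4J notices.
hadoop_quiet() {
    # Merge stderr into stdout, then drop lines that start with "SLF4J:",
    # so that genuine error messages remain visible.
    hadoop "$@" 2>&1 | grep -v '^SLF4J:'
}
```

For example, `hadoop_quiet fs -ls /` should print only the "Found ... items" line and the directory entries. Note that the exit status is then grep's, not hadoop's.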
After running the docker run command, you may see the following error:
Started service mapr-posix-client-container [FAILED]
This error can be safely ignored as it is a remnant of an issue with the MapR Persistent Application Client Container (PACC).
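If you want some reassurance that the mount works despite the message, you can check from inside the container that the mount point is populated. A minimal sketch, with `mapr_mount_ok` as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the given path is a directory with at least one entry.
mapr_mount_ok() {
    mount_dir=$1
    # An unmounted mount point is typically missing or empty, so both checks must pass.
    [ -d "$mount_dir" ] && [ -n "$(ls -A "$mount_dir" 2>/dev/null)" ]
}
```

For example, `mapr_mount_ok /mapr/my.cluster.com && echo "mount looks fine"`, using the cluster path from the listing above.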
Your web browser may also warn you that the site is unsafe when you visit the Apache Zeppelin UI.
This is expected behavior if you haven't installed an SSL certificate for this instance.
More troubleshooting information can be found here:
Troubleshooting Data Science Refinery