Recently, MapR launched the MapR Data Science Refinery, a novel way to deliver data science functionality and connectivity for your MapR Converged Data Platform.
Below are the steps required to run it from an edge node. The edge node can be an on-premises server or a cloud or VM-based host; the only requirement is that it runs a supported Linux distribution. The supported operating systems are:
- CentOS 7.x
- Ubuntu 14
- Ubuntu 16
First, install and start Docker for your operating system. You can choose between Docker Community Edition (CE) and Docker Enterprise Edition (EE); either works for this purpose.
Once Docker is installed, pull the image into your local Docker image repository. The image is published on Docker Hub, and the command to pull the most recent version is:
$ docker pull maprtech/data-science-refinery
After you've run this, you can confirm that the image is now in your local repository by running:
$ docker images
REPOSITORY                                 TAG                        IMAGE ID     CREATED   SIZE
docker.io/maprtech/data-science-refinery   v1.0_6.0.0_4.0.0_centos7   <IMAGE ID>
At this point, the only other thing you need for a secure cluster is your MapR-SASL ticket, stored somewhere on this host. For the steps to generate this ticket, see the MapR documentation.
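A quick pre-flight check can confirm that the ticket file is where you expect it before you mount it into the container. This shell sketch assumes the ticket was generated with maprlogin password, so it looks at the path in MAPR_TICKETFILE_LOCATION if that variable is set and otherwise at /tmp/maprticket_<uid>, which is where maprlogin usually writes it. The function name find_dsr_ticket is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: print the path of a readable, non-empty MapR-SASL ticket.
# Assumption: MAPR_TICKETFILE_LOCATION (if set) points at the ticket; otherwise
# fall back to /tmp/maprticket_<uid>, the usual maprlogin location.
find_dsr_ticket() {
  local ticket="${MAPR_TICKETFILE_LOCATION:-/tmp/maprticket_$(id -u)}"
  if [ -r "$ticket" ] && [ -s "$ticket" ]; then
    printf '%s\n' "$ticket"
    return 0
  fi
  printf 'No readable MapR ticket at %s; generate one first (e.g. with maprlogin password).\n' "$ticket" >&2
  return 1
}
```

For example, TICKET=$(find_dsr_ticket) || exit 1 stores the path you would substitute for </path/to/ticket/file> in the Docker Run command.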
We recommend putting the environment variables in a file instead of passing them individually to the Docker Run command, because problems are easier to spot that way. Here is an example file, 'env.list', that we pass to the Docker Run command:
MAPR_HS_HOST=<needed if you're using Pig>
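Docker reads an --env-file literally: one NAME=value per line, blank lines and lines starting with # are ignored, quotes are kept as part of the value, and a bare NAME is copied from the host environment. A small check can catch the usual mistakes before the container starts. The check_env_file function below is a hypothetical helper written for illustration, not part of Docker or MapR; it is deliberately stricter than Docker about variable names, and it flags leftover <placeholder> values such as the MAPR_HS_HOST line above.

```bash
#!/bin/sh
# Hypothetical pre-flight check for a docker --env-file.
# Returns non-zero and reports file:line for each suspicious entry.
check_env_file() {
  local file="$1" status=0 lineno=0 line name value
  if [ ! -r "$file" ]; then
    echo "check_env_file: cannot read $file" >&2
    return 1
  fi
  # read without IFS= trims surrounding whitespace, like Docker's leading trim
  while read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    case "$line" in
      ''|'#'*) continue ;;                 # blank line or comment
    esac
    name=${line%%=*}
    # Stricter than Docker: only letters, digits and _; catches "NAME = value"
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "$file:$lineno: invalid variable name '$name'" >&2
        status=1
        continue ;;
    esac
    case "$line" in
      *=*) value=${line#*=} ;;
      *) continue ;;                       # bare NAME: value comes from the host
    esac
    case "$value" in
      \"*\"|\'*\')
        echo "$file:$lineno: $name is quoted; docker keeps the quotes in the value" >&2
        status=1 ;;
      *'<'*'>'*)
        echo "$file:$lineno: $name still contains a <placeholder>" >&2
        status=1 ;;
    esac
  done < "$file"
  return "$status"
}
```

Running check_env_file ./env.list before the Docker Run command prints each problem with its line number.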
Next, start the container with the Docker Run command, passing in env.list with the --env-file option (see the documentation for more information on this command and its options):
$ docker run --rm -it --env-file ./env.list \
    --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse \
    -p 9995:9995 -p 10000-10010:10000-10010 \
    -v </path/to/ticket/file>:/tmp/dsr_ticket:ro \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    docker.io/maprtech/data-science-refinery
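If you start the container often, wrapping the command in a small function keeps the flags in one place and fails early on a missing ticket or env file. This is a sketch only: DSR_TICKET, DSR_ENV_FILE, DSR_IMAGE and DRY_RUN are hypothetical variable names chosen for the example, and the flags are copied unchanged from the command above. With DRY_RUN=1 the function prints the arguments instead of starting the container.

```bash
#!/bin/sh
# Hypothetical wrapper around the Data Science Refinery docker run command.
run_dsr() {
  local ticket="${DSR_TICKET:-}"
  local env_file="${DSR_ENV_FILE:-./env.list}"
  local image="${DSR_IMAGE:-docker.io/maprtech/data-science-refinery}"

  # Fail early with a clear message instead of a confusing docker error.
  if [ -z "$ticket" ] || [ ! -r "$ticket" ]; then
    echo "run_dsr: set DSR_TICKET to a readable MapR-SASL ticket file" >&2
    return 1
  fi
  if [ ! -r "$env_file" ]; then
    echo "run_dsr: cannot read env file $env_file" >&2
    return 1
  fi

  # Same flags as the documented command; only the paths are parameterized.
  set -- run --rm -it \
    --env-file "$env_file" \
    --cap-add SYS_ADMIN --cap-add SYS_RESOURCE \
    --device /dev/fuse \
    -p 9995:9995 -p 10000-10010:10000-10010 \
    -v "$ticket:/tmp/dsr_ticket:ro" \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    "$image"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' docker "$@"    # print one argument per line instead of running
  else
    docker "$@"
  fi
}
```

For example, DSR_TICKET=</path/to/ticket/file> DRY_RUN=1 run_dsr shows exactly what would be executed; drop DRY_RUN=1 to start the container.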
That's it! You can now log in to Zeppelin by visiting the UI at the following address:
https://<edge node hostname or IP>:9995
Log in with the credentials that you provided in the Docker Run command. Authorization for the jobs themselves, whether Spark, POSIX, or JDBC, is provided by your MapR-SASL ticket.
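If the page does not load right away, a polling loop run from a second terminal on the edge node (docker run -it keeps the first terminal attached to the container) can tell you when the UI starts answering. This sketch assumes curl is installed and that the UI is served over HTTPS on port 9995, the port published by the Docker Run command; -k skips certificate verification because the instance may not have a trusted SSL certificate. The function name wait_for_zeppelin is hypothetical.

```bash
#!/bin/sh
# Hypothetical readiness check: poll https://HOST:PORT/ until any HTTP response
# arrives (no -f, so 401/404 also count as "up") or the timeout expires.
wait_for_zeppelin() {
  local host="${1:-localhost}" port="${2:-9995}" timeout="${3:-120}"
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if curl -ks -o /dev/null --max-time 5 "https://$host:$port/"; then
      echo "Zeppelin UI is up at https://$host:$port/"
      return 0
    fi
    sleep 2
    # Counts only the sleep intervals, so the real wait can be somewhat longer.
    elapsed=$((elapsed + 2))
  done
  echo "Zeppelin UI did not respond at https://$host:$port/ within ${timeout}s" >&2
  return 1
}
```

For example, wait_for_zeppelin localhost 9995 180 waits up to roughly three minutes before giving up.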
You can also browse the file system with POSIX or Hadoop syntax, from either the CLI or Zeppelin. This is made possible by the MapR POSIX Client for Containers, which lets MapR customers mount their cluster's global namespace inside a Docker container.
$ ls -la /mapr/my.cluster.com/
drwxr-xr-x 10 mapr mapr 9 Nov 27 08:55 .
dr-xr-xr-x 3 root root 1 Dec 16 17:43 ..
drwxr-xr-x 3 mapr mapr 1 Nov 27 08:51 apps
drwxr-xr-x 2 mapr mapr 0 Nov 27 08:48 hbase
$ hadoop fs -ls /
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Found 8 items
drwxr-xr-x - mapr mapr 1 2017-11-27 08:51 /apps
drwxr-xr-x - mapr mapr 0 2017-11-27 08:48 /hbase
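Both listings show the same directories (apps, hbase): the POSIX mount exposes the cluster namespace under /mapr/<cluster name>, while hadoop fs addresses it from /. The helper below is a hypothetical convenience for translating a hadoop fs path into its POSIX equivalent; it assumes the mount point follows the /mapr/my.cluster.com pattern shown above and takes the cluster name as an argument rather than discovering it.

```bash
#!/bin/sh
# Hypothetical helper: map an absolute cluster path (hadoop fs style) to the
# same location under the POSIX mount, assuming it is /mapr/<cluster name>.
to_posix_path() {
  local cluster="$1" path="$2"
  case "$path" in
    /*) ;;
    *)
      echo "to_posix_path: expected an absolute cluster path, got '$path'" >&2
      return 1 ;;
  esac
  printf '/mapr/%s%s\n' "$cluster" "$path"
}
```

With it, hadoop fs -ls /apps and ls -la "$(to_posix_path my.cluster.com /apps)" list the same directory.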
If you see the following error after running the Docker Run command:
Started service mapr-posix-client-container [FAILED]
This error can be safely ignored as it is a remnant of an issue with the MapR Persistent Application Client Container (PACC).
Your web browser may warn that the site is unsafe when you visit the Apache Zeppelin UI. This is expected behavior if you haven't installed an SSL certificate for this instance.
More troubleshooting information can be found here: