Will Apache Zeppelin be added to MapR's roadmap as a supported application?
Apache Zeppelin is a very promising notebook initiative. Several MapR customers are using and/or experimenting with it. You can run Zeppelin on MapR (and with Apache Drill). Here is a useful page for you
The Hue project that MapR packages and supports has started to also introduce some preliminary notebook support.
Joel - I would love to hear your experience with Zeppelin and any other notebook style tools you have used.
At this point, we have not seen any major push from our customers to add Zeppelin to the MapR distribution.
Anoop Dawar, please provide your insights on this.
I am working with Zeppelin right now. I am currently running Zeppelin in Docker on a MapR-based Zeta Architecture install (based on the work done by Jim Scott at MapR). My main work is with Apache Drill and the JDBC interpreter, and I plan on doing more with PySpark and other interpreters as well. Thus far I am collecting notes, but I will summarize here when I get further in my analysis. It's very promising at this point. I'd love to keep a topic like this going, just tossing out observations, interest levels, etc. Perhaps we can even work out a basic tutorial for how to get it working (in an unsupported fashion at this point) with existing MapR installations.
Splendid, John Omernik. The community would benefit immensely from such first-hand experiences. Looking forward to more updates.
So I just finished up my "raw" testing. What I mean by this is "can I get this to work", and the answer is a resounding yes. I am running a version of Jim Scott's Zeta Architecture, so I had some requirements for Zeppelin. I wanted it to run within a Docker container, I wanted it to be able to interact with Drill and Spark out of the box, I wanted to understand how to create more instances of Zeppelin (one for each user, for example), and then "come up for air" (with this post).
My first challenge was using MapR/Spark on Mesos. This wasn't anyone's fault but my own: I had a faulty upgrade process to MapR 5.1, and the links in HADOOP_HOME to all the MapR libs in /opt/mapr/lib did not get created. This caused lots of issues until Yuliya Feldman was able to help me troubleshoot it. Awesome assistance! Fixing this allowed Mesos/Spark/MapR to interact properly, and got Myriad working again.
My first challenge with the Docker containers was that to submit an app to Mesos from within a Docker container you need the libmesos.so files, which means you need a lot of other dependencies. That sounded like a fat Docker container, so I decided to go the YARN route (running on Myriad).
After getting Myriad working, and Spark working on Myriad, I started playing with running PySpark from within a container. As you can see from another post on the boards here, it was a crazy bout of frustrating back and forth. YARN wouldn't work in bridged mode, and MapR wouldn't work with Spark in host mode. Eventually, I went the Mesos route again. I created a Dockerfile that included all the dependencies for Mesos, and then abused volume mounting in Marathon/Docker/Mesos. Since every node was a MapR node, I passed through /opt/mapr and /usr/local/lib. I guess I could have copied libmesos into the container, but frankly, it was getting large as it is. I will revisit this when I start to clean it up. I also mapped a Spark location and a Hadoop location from maprfs into the container (read-only). This helped with pathing and some other things. Like I said, the goal was to get things working together, not to make it pretty.
Basically, with that in place, and running in host mode (I had to use host mode because I use auth in Mesos, and apparently it requires that when authenticating frameworks), it all worked. I was able to run queries both against Drill, using the JDBC interpreter (remember to add the Drill JDBC file to /interpreters/jdbc), and through Spark.
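For anyone trying the Drill side, here is a rough sketch of the settings I would expect the JDBC interpreter to need. It is only an illustration, not what was actually configured above: the property names (default.driver, default.url, and so on) are the ones used by Zeppelin's generic JDBC interpreter in the releases I know of, the URL follows Drill's documented ZooKeeper form, and the ZooKeeper hosts, port, and cluster ID are made-up placeholders.

```python
def drill_jdbc_properties(zk_hosts, cluster_id, drill_dir="drill",
                          user="", password=""):
    """Build interpreter properties for Zeppelin's JDBC interpreter -> Drill.

    zk_hosts   : list of "host:port" strings for the ZooKeeper quorum
    cluster_id : Drill cluster ID registered in ZooKeeper
    drill_dir  : ZooKeeper directory used by Drill ("drill" is the usual default)
    """
    # Drill URL form: jdbc:drill:zk=<host:port>[,<host:port>...]/<dir>/<cluster id>
    url = "jdbc:drill:zk={}/{}/{}".format(",".join(zk_hosts), drill_dir, cluster_id)
    return {
        "default.driver": "org.apache.drill.jdbc.Driver",
        "default.url": url,
        "default.user": user,
        "default.password": password,
    }


if __name__ == "__main__":
    # Hypothetical hosts and cluster ID, for illustration only.
    props = drill_jdbc_properties(["zk1:5181", "zk2:5181"], "my-drillbits")
    for key in sorted(props):
        print("{} = {}".format(key, props[key]))
```

The values would be pasted into the interpreter settings page by hand; the function just keeps the URL format in one place so it is easy to check.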
So far so good. My next steps are to clean up and package this to make it easier to deploy in this setup.
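# Java and Python runtimes (update and install in one layer so the package index is never stale)
RUN apt-get update && apt-get install -y openjdk-7-jre python python-dev
# mapr user with UID 700
RUN adduser --disabled-password --gecos '' --uid=700 mapr
# Dependencies for Mesos
RUN apt-get -y install build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev
# Exec form needs the command and its argument as separate list items
CMD ["python", "-V"]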
"PRODUCTION_READY":"True", "CONTAINERIZER":"Docker", "ZETAENV":"Prod"
"ZEPPELIN_MEM":"-Xms1024m -Xmx1024m -XX:MaxPermSize=512m",
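To tie the pieces together, the Marathon side of the setup described earlier (host networking, with /opt/mapr and /usr/local/lib passed through from the node) might look roughly like the sketch below. This is an assumption-heavy illustration rather than the actual app definition: the image name, app ID, and resource sizes are hypothetical, the read-only mode on the two volumes is a guess, and placing PRODUCTION_READY, CONTAINERIZER, and ZETAENV under "labels" is also a guess, since only fragments are shown above.

```python
import json


def build_zeppelin_app(image, app_id="/zeppelin", cpus=1.0, mem=2048):
    """Return a Marathon app definition (as a dict) for a Zeppelin container."""
    return {
        "id": app_id,            # hypothetical app ID
        "cpus": cpus,            # hypothetical sizing
        "mem": mem,
        "instances": 1,
        # Assumed to be labels; the original fragment does not say where they live.
        "labels": {
            "PRODUCTION_READY": "True",
            "CONTAINERIZER": "Docker",
            "ZETAENV": "Prod",
        },
        "env": {
            "ZEPPELIN_MEM": "-Xms1024m -Xmx1024m -XX:MaxPermSize=512m",
        },
        "container": {
            "type": "DOCKER",
            # Host networking, needed here because of Mesos framework auth.
            "docker": {"image": image, "network": "HOST"},
            # Pass the node's MapR client and libmesos locations into the container.
            "volumes": [
                {"containerPath": "/opt/mapr", "hostPath": "/opt/mapr", "mode": "RO"},
                {"containerPath": "/usr/local/lib", "hostPath": "/usr/local/lib", "mode": "RO"},
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name; print JSON suitable for posting to Marathon.
    print(json.dumps(build_zeppelin_app("example/zeppelin:latest"), indent=2))
```

Running one Zeppelin per user would then mostly be a matter of calling this with a different app_id for each user.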
We are looking to explore script-plus-visualization tools like Zeppelin, IPython Notebook, etc. They would mainly be used for sharing findings that help stakeholders analyze data faster and make decisions. I think integration with any of these open source tools would be a great help.