MichaelSegel

Securing Zeppelin

Blog Post created by MichaelSegel on Oct 5, 2017

Apache Zeppelin is an open source notebook that supports multiple interpreters and seems to be one of the favorites for working with Spark.

 

Zeppelin runs on a wide variety of platforms, so you can run tests and develop code either on a cluster or away from one.

 

As always, in today’s world it is no longer paranoia to think about securing your environment. This article focuses on the setup needed to secure Zeppelin and is meant as a supplement to the standard installation, concentrating on adding security.

 

Zeppelin itself is easy to install. You can download the pre-built binaries from the Apache Zeppelin site (with ‘wget’ or from your browser), follow the instructions, and start using it right away.

 

The benefit is that you can build a small test environment on your laptop without having to work on a cluster, which reduces the impact on a shared environment.

 

However, it’s important to understand how to set up and secure your environment as well.

 

Locking down the environment

 

Running Locally

Zeppelin can run locally on your desktop. In most cases your desktop doesn’t have a static IP address, so when securing your environment you will want to force Zeppelin to be available only on localhost (127.0.0.1).

 

In order to set this up, two entries in the $ZEPPELIN_HOME/conf/zeppelin-site.xml file have to be set.

 

<property>
  <name>zeppelin.server.addr</name>
  <value>127.0.0.1</value>
  <description>Server address</description>
</property>

<property>
  <name>zeppelin.server.port</name>
  <value>9080</value>
  <description>Server port.</description>
</property>
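If you want to double-check these two settings before restarting, a short script can read them back. The sketch below is an illustration only, not part of Zeppelin; it assumes the usual Hadoop-style layout of zeppelin-site.xml (a <configuration> root holding <property> elements with <name> and <value>), and the helper names read_properties and check_site_config are hypothetical.

```python
import ipaddress
import xml.etree.ElementTree as ET


def read_properties(xml_text: str) -> dict:
    """Collect {name: value} pairs from <property> elements in zeppelin-site.xml text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def check_site_config(xml_text: str) -> list:
    """Return warnings about the listen address and port (hypothetical helper, not a Zeppelin API)."""
    props = read_properties(xml_text)
    warnings = []

    addr = props.get("zeppelin.server.addr")
    if addr is None:
        warnings.append("zeppelin.server.addr is not set")
    elif addr != "localhost":
        try:
            # Only loopback addresses (127.0.0.0/8, ::1) keep Zeppelin off the network.
            if not ipaddress.ip_address(addr).is_loopback:
                warnings.append(f"zeppelin.server.addr={addr} is reachable from other hosts")
        except ValueError:
            warnings.append(f"zeppelin.server.addr={addr} is not an IP literal; check what it resolves to")

    port = props.get("zeppelin.server.port")
    if port is None or port == "8080":
        warnings.append("zeppelin.server.port is unset or 8080, which many other services also use")
    return warnings
```

Passing it the contents of $ZEPPELIN_HOME/conf/zeppelin-site.xml should return an empty list once both properties above are in place.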

 

Setting zeppelin.server.addr to the localhost address means Zeppelin listens only on the loopback interface, so no one outside your desktop can access the Zeppelin service.

 

The zeppelin.server.port is changed from its default of 8080 because that port is also the default for many other services. By choosing a different, unique port, you can keep it consistent between the instance on your desktop and the one on a server. This isn’t necessary, but it does make life easier.
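Before settling on a port such as 9080, it can be worth checking that nothing else on the machine is already listening on it. A minimal sketch using Python’s standard socket module; it only tests whether something accepts TCP connections on the loopback interface, and port_is_free is a hypothetical helper name.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection,
        # and a nonzero error code (e.g. connection refused) otherwise.
        return s.connect_ex((host, port)) != 0
```

For example, `port_is_free(9080)` should return True on a desktop where neither Zeppelin nor anything else has claimed that port yet.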

 

Beyond those two settings, there isn’t much else that you need to change.

Notice that there are also properties for setting up SSL certificates. While the documentation contains directions on how to set up SSL directly, there is a bug where running with PKCS12 certificates causes an error, and in following up, no resolution could be found. The recommendation is to use nginx as a proxy server to manage the secure connection. (More on this later.)

 

Since Zeppelin only listens on the 127.0.0.1 interface, SSL isn’t really required.

 

The next issue is that, by default, there is no user authentication. Zeppelin provides authentication through Apache Shiro. From the Apache Shiro website:

Apache Shiro™ is a powerful and easy-to-use Java security framework that performs authentication, authorization, cryptography, and session management. With Shiro’s easy-to-understand API, you can quickly and easily secure any application – from the smallest mobile applications to the largest web and enterprise applications.

(see: https://shiro.apache.org/)


While it may be OK to run local code as an anonymous user, it’s also possible for Zeppelin to run locally yet access a remotely maintained cluster that may not accept anonymous users.

 

To set up Shiro, copy $ZEPPELIN_HOME/conf/shiro.ini.template to $ZEPPELIN_HOME/conf/shiro.ini.

Since my SOHO environment is rather small and limited to a handful of people, I am not yet running LDAP (maybe one day…), so I practice the K.I.S.S. principle. The only thing I need from Shiro is a set of local users.

[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
#admin = password1, admin
#user1 = password2, role1, role2
#user2 = password3, role3
#user3 = password4, role2

When you find the [users] section in the template, you’ll notice that the admin and userX entries are not commented out. Any entry of the form <user> = <password>, <role> [, <role2>, <role3> …] is active, so you really don’t want to leave an entry there that could give someone access. Comment them out, as shown above.
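Because every uncommented line in [users] grants access, it can be worth scanning shiro.ini for leftover template accounts before starting the service. A rough sketch in Python; it does not use Shiro itself and assumes the simple INI layout shown above (bracketed section headers, comments starting with # or ;). The function name active_users is hypothetical.

```python
def active_users(ini_text: str) -> list:
    """Return user names defined (uncommented) in the [users] section of shiro.ini text.

    Hypothetical helper: a plain line scan, not Shiro's own INI parser.
    """
    users = []
    in_users = False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Entering a new section; only [users] is of interest here.
            in_users = line == "[users]"
            continue
        if not in_users or not line or line.startswith(("#", ";")):
            continue
        if "=" in line:
            # Entries look like: <user> = <password>, <role> [, <role2> ...]
            users.append(line.split("=", 1)[0].strip())
    return users
```

Run against a freshly copied template, this would list admin, user1, user2 and user3; after you comment those out and add your own entries, it should list only the accounts you intended.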

 

So you will need to create an entry for yourself and other users.

 

Note: There are other entries below this section. One section details where you can use LDAP for authenticating users. If you are running LDAP, it would be a good idea to set this up to use LDAP.

 

Once you’ve made and saved the changes, that’s pretty much it. Start or restart the service, and users will authenticate against the entries in shiro.ini.

 

Note the following:

While attempting to follow the SSL setup, I was directed to a Stack Overflow conversation on how to accomplish this. IMHO, it’s a major red flag when the documentation references a Stack Overflow article on how to set up and configure anything.

Running on a Server


Suppose you want to run Zeppelin on your cluster? Zeppelin then becomes a shared resource. You can configure Zeppelin to run Spark contexts per user or per notebook instead of a single instance for the entire service.

The larger issue is that you will need to use an external interface to gain access, and even if you’re behind a firewall, you will need to have SSL turned on. Because I want to be able to run notes from outside my network, I have to have SSL in place. As I alluded to earlier, configuring SSL from within Zeppelin wasn’t working, and the only guidance was to set up a proxy using nginx instead. (This came from two reliable sources.)

 

With nginx, you should keep the configuration we have already set up: Zeppelin listens only on localhost and relies on the proxy server to handle external connections. Since my Linux server is sitting next to me, I have a monitor attached to it, so I can easily test connections to localhost and make sure my Zeppelin instance is up and running. I followed the same steps that I used to set up my desktop, and it ran without a hitch.
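The same “is it up?” check can also be scripted from the server itself. A minimal sketch, assuming Zeppelin listens on 127.0.0.1:9080 as configured earlier; it treats any HTTP response, including an error status, as proof that something is serving on that port, and zeppelin_is_up is a hypothetical name.

```python
import urllib.error
import urllib.request


def zeppelin_is_up(url: str = "http://127.0.0.1:9080/", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url; any HTTP status counts as 'up'."""
    # Bypass any http_proxy settings so the check really goes to the local listener.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status (e.g. 401 or 501).
        return True
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return False
```

Because the check goes straight to the loopback address, it tests Zeppelin itself rather than the nginx proxy in front of it.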

 

Unlike the instructions for setting up SSL directly, the nginx information on the Zeppelin site was very helpful. You can find it here:

https://zeppelin.apache.org/docs/0.7.3/security/authentication.html#http-basic-authentication-using-nginx

 

There are various ways of obtaining nginx. Since I run CentOS, I could have pulled down a version via yum; if you run a different Linux distribution, you can use its equivalent tool. Downloading it from the official site will, of course, get you the latest stable release.

While the documentation for setting up nginx with Zeppelin is better, there are still gaps… yet it’s still pretty straightforward.

 

nginx keeps its configuration in the /etc/nginx directory, and the site configuration files go in its ./conf.d subdirectory. I should have taken better notes, but going from memory, there was one file there: the default configuration file. I suggest that you set that file aside by renaming it to another file name; I chose default.conf.ini. Based on the documentation, I was under the impression that nginx loads every *.conf file in that directory, so a file with a different extension is ignored.
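The renaming trick works because the stock nginx.conf shipped by the common packages typically pulls in site files with an `include /etc/nginx/conf.d/*.conf;` line, so only names ending in .conf are loaded. The snippet below merely mimics that glob with Python’s fnmatch to show which files in the directory would be picked up; it is an illustration, not nginx’s own matching code, and loaded_by_include is a hypothetical name.

```python
import fnmatch


def loaded_by_include(filenames, pattern="*.conf"):
    """Return the names an nginx 'include conf.d/*.conf;'-style glob would match (illustration only)."""
    # fnmatchcase is case-sensitive, like glob matching on a Linux filesystem.
    return sorted(name for name in filenames if fnmatch.fnmatchcase(name, pattern))
```

With a directory holding default.conf.ini, zeppelin.conf, key.pem and cert.pem, only zeppelin.conf matches, which is also why keeping the key pair in conf.d does not confuse nginx.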

 

I then created a zeppelin.conf file and pasted in the section from the Zeppelin documentation.

 

upstream zeppelin {
    server [YOUR-ZEPPELIN-SERVER-IP]:[YOUR-ZEPPELIN-SERVER-PORT];   # For security, it is highly recommended to make this address/port non-publicly accessible
}

# Zeppelin Website
server {
    listen [YOUR-ZEPPELIN-WEB-SERVER-PORT];
    listen 443 ssl;                                     # optional, to serve HTTPS connection
    server_name [YOUR-ZEPPELIN-SERVER-HOST];             # for example: zeppelin.mycompany.com

    ssl_certificate [PATH-TO-YOUR-CERT-FILE];           # optional, to serve HTTPS connection
    ssl_certificate_key [PATH-TO-YOUR-CERT-KEY-FILE];   # optional, to serve HTTPS connection

    if ($ssl_protocol = "") {
        rewrite ^ https://$host$request_uri? permanent; # optional, to force use of HTTPS
    }

    location / {   # For regular webserver support
        proxy_pass http://zeppelin;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }

    location /ws { # For websocket support
        proxy_pass http://zeppelin/ws;
        proxy_http_version 1.1;
        proxy_set_header Upgrade websocket;
        proxy_set_header Connection upgrade;
        proxy_read_timeout 86400;
    }
}

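As you can see, the configuration is pretty straightforward. Note the comment in the upstream zeppelin block: the upstream address and port should not be publicly accessible. This is why binding Zeppelin to the loopback (localhost) interface is a good idea; with the settings from earlier, the upstream server entry simply points at 127.0.0.1:9080.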

 

Since the goal of using nginx is to create a secure (SSL) interface to Zeppelin, we need to create a public/private key pair. A quick search will turn up lots of options. Note: If you don’t already have OpenSSL installed on your server, you should set it up ASAP. Using OpenSSL, the following command works:

 

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365

 

Note that this will create a 4096-bit key. That’s a bit of overkill for this implementation; however, the minimum recommended key length for SSL/TLS connections these days is 2048 bits, so it’s not unreasonably long.

 

Note: If you use this command as is, you will be prompted for a passphrase, which is used to encrypt the private key. The downside is that each time you start or restart the web service, you will have to enter the passphrase manually. Using the -nodes option removes this requirement, but the private key is then stored unencrypted on disk. You can change the permissions on the key file to control access.
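If you go the -nodes route, restricting the key to its owner is the usual step. A small sketch, assuming a POSIX system and that the key file is owned by root (the account nginx’s master process normally starts as); restrict_key_file is a hypothetical helper, and the shell equivalent is simply chmod 600 on the file.

```python
import os
import stat


def restrict_key_file(path: str) -> int:
    """Set a private key file to mode 0600 (owner read/write only) and return the resulting mode.

    Hypothetical helper; assumes POSIX permission semantics.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Run as root, `restrict_key_file("/etc/nginx/conf.d/key.pem")` would return 0o600, leaving the unencrypted key unreadable to group and other users.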

 

For nginx, I created the key pair in the ./conf.d directory and set their paths in the ssl_certificate and ssl_certificate_key entries of the zeppelin.conf file.

 

After the edits, if you start the service, you’re up and running.

Well, almost…

 

Further Tweaking

 

If you try to use the service, nginx asks you for a user name and password. That comes from this section of the configuration:

    location / {   # For regular webserver support
        proxy_pass http://zeppelin;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }

In this section, the auth_basic directive is set to “Restricted”, which turns on HTTP basic authentication; the string itself is the realm name shown in the login prompt.

The setting auth_basic_user_file is set to the path of the password file.

 

The instructions for setting up the password file are also found on Zeppelin’s setup page. While this is secure, does it make sense? You enter a password at the proxy before you can reach the protected web service, which then asks you to log in again. Our goal in using nginx was to set up SSL so that all traffic between the client and the server is encrypted rather than out in the open. For our use case, it makes more sense for the connection to establish an SSL session and then take you to your Zeppelin service, where you authenticate.

 

The simple fix is to set auth_basic to off. Users still authenticate through Shiro, so they don’t have to log in twice, and your notebooks do not run as ‘anonymous’.
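If you manage zeppelin.conf with scripts, the change is a one-line substitution. A small sketch using Python’s re module; it assumes the realm is written as a double-quoted string, as in the configuration above, and disable_auth_basic is a hypothetical helper. After editing, running `nginx -t` to validate the configuration and then reloading nginx applies the change.

```python
import re

# Matches e.g. '        auth_basic "Restricted";' but not 'auth_basic_user_file ...',
# because the directive name must be followed by whitespace and a quoted realm.
_AUTH_BASIC = re.compile(r'(?m)^([ \t]*)auth_basic[ \t]+"[^"]*"[ \t]*;')


def disable_auth_basic(conf_text: str) -> str:
    """Replace every 'auth_basic "<realm>";' directive with 'auth_basic off;' (hypothetical helper)."""
    return _AUTH_BASIC.sub(r"\1auth_basic off;", conf_text)
```

The auth_basic_user_file line is left in place; with auth_basic off, nginx simply no longer consults it.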

 

In Summary

 

Running Zeppelin out of the box with no security is not a good idea. This article demonstrates some simple tweaks that help lock down your environment, whether you run Zeppelin on a desktop or connect to a service running on the edge of your cluster.


I am sure there are other things one could do to further lock down Zeppelin or tie it into your network. At a minimum, you will want to authenticate your users via Shiro and offer an SSL connection.

 

With Zeppelin up and running, now the real fun can begin.
