Has anyone set up Dr. Elephant on a cluster? Any feedback, and can you provide information on the setup?
GitHub - bretlowery/dr-elephant-mapr: MapR compatible fork of Dr Elephant 2.0.6
Open Sourcing Dr. Elephant: Self-Serve Performance Tuning for Hadoop and Spark | LinkedIn Engineering
Installing Dr.Elephant on MapR
yum install -y java-1.8.0-openjdk-devel
Check that the Java version is 1.8:
[root@maprdemo logs]# java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-b15)
OpenJDK 64-Bit Server VM (build 25.111-b15, mixed mode)
yum install -y nodejs
yum install -y npm
Download the file
Unzip the file into a folder where you have write access. I downloaded the file into /home/mapr.
Add activator to your path, and also add it to your login profile $HOME/.profile
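As a minimal sketch of the PATH step, the snippet below exports the path for the current shell and appends the same line to $HOME/.profile only once. The unpack directory /home/mapr/activator and the bin/ subdirectory are assumptions, not taken from the post; adjust both to match the folder you actually unzipped.

```shell
# Hypothetical unpack location; adjust to where you unzipped activator.
ACTIVATOR_HOME="${ACTIVATOR_HOME:-/home/mapr/activator}"

# Assumption: the launcher is in a bin/ subdirectory; drop "/bin" if it sits
# at the top level of the unzipped folder.
ACTIVATOR_PATH_LINE="export PATH=\"$ACTIVATOR_HOME/bin:\$PATH\""

# Append the export line to a profile file unless it is already there.
persist_path_line() {
  grep -qxF "$ACTIVATOR_PATH_LINE" "$1" 2>/dev/null || echo "$ACTIVATOR_PATH_LINE" >> "$1"
}

export PATH="$ACTIVATOR_HOME/bin:$PATH"   # current shell
persist_path_line "$HOME/.profile"        # future logins
```

Running it a second time leaves the profile unchanged, so it is safe to re-run.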
yum install -y git
git clone https://github.com/linkedin/dr-elephant.git
Change the Hadoop & Spark versions in compile.conf
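A rough sketch of that edit, assuming compile.conf uses hadoop_version= and spark_version= keys as in the upstream repository. The Hadoop value 2.7.0 matches the example job path in this post; the Spark value shown in the usage comment is only a placeholder, so substitute your cluster's version.

```shell
# Rewrite the version keys in a compile.conf-style file.
# A backup of the original is kept next to it with a .bak suffix.
set_versions() {
  conf="$1"; hadoop="$2"; spark="$3"
  sed -i.bak \
    -e "s/^hadoop_version=.*/hadoop_version=$hadoop/" \
    -e "s/^spark_version=.*/spark_version=$spark/" \
    "$conf"
}

# Example, run inside the dr-elephant checkout (Spark version is a placeholder):
# set_versions compile.conf 2.7.0 1.6.1
# ./compile.sh compile.conf
```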
First create a user named elephant: log in to mysql as root, and change localhost according to your hostname settings.
mysql -u root
> CREATE USER 'elephant'@'localhost' IDENTIFIED BY 'elephant';
> CREATE DATABASE drelephant;
> GRANT ALL PRIVILEGES ON *.* TO 'elephant'@'localhost';
> FLUSH PRIVILEGES;
Then create the tables and indexes from the files 1.sql, 2.sql and 3.sql. Don’t run the SQL scripts as is, because they contain drop statements as well. The scripts are in the dr-elephant directory under /home/mapr/dr-elephant/conf/evolutions/default
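One way to skip the drop statements, sketched under the assumption that these files follow the standard Play evolutions layout, with a "# --- !Ups" section holding the create statements and a "# --- !Downs" section holding the drops. The helper name ups_only is made up for illustration. Check its output by eye before piping it into mysql.

```shell
# Print only the "Ups" section of a Play evolution script, skipping "Downs".
ups_only() {
  awk '
    /^# --- !Ups/   { up = 1; next }
    /^# --- !Downs/ { up = 0; next }
    up              { print }
  ' "$1"
}

# Example, using the user and database created above:
# for f in 1.sql 2.sql 3.sql; do
#   ups_only "/home/mapr/dr-elephant/conf/evolutions/default/$f"
# done | mysql -u elephant -p drelephant
```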
Before you start dr.elephant disable evolutions in
Now start Dr.elephant
Make sure Dr. Elephant started without any errors; check dr.log.
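A quick way to scan the log, as a sketch only. The helper name show_log_errors is made up, and the location of dr.log depends on where you started Dr. Elephant, so pass the path that applies to your install.

```shell
# Show the last 20 lines mentioning ERROR or Exception, with line numbers.
# Prints nothing when the log is clean.
show_log_errors() {
  grep -nE 'ERROR|Exception' "$1" | tail -n 20
}

# Example:
# show_log_errors dr.log
```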
Change the hostname/IP according to your environment; you should then be able to see the Dr. Elephant dashboard.
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1607.jar pi 100 100
After the job completes, you can see the analysis in the Dr. Elephant UI.
Hi Bret Lowery,
I noticed that you published a personal blog about Dr. Elephant. Could you please share your experience and knowledge here?
Thank you very much!
Any update on this?
I was able to git clone the project and set up activator/Play to compile. While starting Dr. Elephant, I see the output below, as it is unable to find /bin/..
# ./start.sh /opt/dr-elephant-mapr-master/app-conf/
Using config dir: /opt/dr-elephant-mapr-master/app-conf
Using config file: /opt/dr-elephant-mapr-master/app-conf/elephant.conf
Reading from config file...
db_url: <host>:3306
db_name: drelephant
db_user: drelephant_user
http port: 8010
error: I couldn't find any dr. Elephant executable.
Thank you so much for sharing the latest update. Still waiting for any members who have Dr. Elephant knowledge/experience to share.
Thank you. The above works with GitHub - bretlowery/dr-elephant-mapr: MapR compatible fork of Dr Elephant 2.0.6. I'm able to view MapReduce jobs in the Dr. Elephant dashboard, but Spark jobs fail with the error below. I tried setting event_log_dir in FetcherConf.xml, but the same error appears. Any suggestions?
ERROR [dr-el-executor-thread-0] com.linkedin.drelephant.ElephantRunner : can't find Spark conf; please set SPARK_HOME or SPARK_CONF_DIR
ERROR [dr-el-executor-thread-0] com.linkedin.drelephant.ElephantRunner : java.lang.IllegalStateException: can't find Spark conf; please set SPARK_HOME or SPARK_CONF_DIR
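The message itself suggests one thing to try: export SPARK_HOME or SPARK_CONF_DIR in the environment of the user that launches Dr. Elephant, then restart it. A minimal sketch follows; the Spark path is a placeholder rather than a real directory, and whether this resolves the fetcher error on MapR is an assumption that has not been verified in this thread.

```shell
# Placeholder path; point it at the Spark install on the Dr. Elephant node.
export SPARK_HOME="${SPARK_HOME:-/opt/mapr/spark/spark-x.y.z}"
export SPARK_CONF_DIR="${SPARK_CONF_DIR:-$SPARK_HOME/conf}"

# Sanity check: warn if the directory has no spark-defaults.conf.
[ -f "$SPARK_CONF_DIR/spark-defaults.conf" ] ||
  echo "warning: no spark-defaults.conf in $SPARK_CONF_DIR" >&2
```

Set these in the same shell (or login profile) that runs start.sh, so the Dr. Elephant process inherits them.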
Also inviting Bret Lowery, who wrote Dr. Elephant for MapR, to help diagnose this and share his knowledge.