
Impala Catalog Services High Availability

Question asked by rpillai on Feb 22, 2018
Latest reply on Feb 23, 2018 by Murshid Chalaev

I have the Impala catalog server installed on two servers: one active and one in standby mode.

However, my Impala env.sh names only one catalog server. When that server goes down, my Impala-based application goes down too, unless I either change env.sh on all my nodes to point to the active catalog server and restart the Impala daemons, or fail the catalog service back over to the server named in env.sh. How can I make this highly available? I want the Impala servers to connect to the active catalog server automatically.

Here is what my env.sh looks like:

 

export IMPALA_HOME=/opt/mapr/impala/impala-2.5.0
export MAPR_HOME=/opt/mapr
export IMPALA_VERSION=2.5.0
export LIBHDFS_OPTS="-Dhadoop.login=hybrid -Dhadoop.login=hybrid_keytab -Djavax.security.auth.useSubjectCredsOnly=false"

# Get the generic mapr environment variables
[ -f ${MAPR_HOME}/conf/env.sh ] && . ${MAPR_HOME}/conf/env.sh

# This MUST point to the node running statestore
IMPALA_STATE_STORE_HOST=my-hadoop-p4.rp.local
IMPALA_STATE_STORE_PORT=24000
CATALOG_SERVICE_HOST=my-hadoop-p1.rp.local

#Set the Shared Memory to 128 MB
export MAPR_CLIENT_SHMEM=16384

# These impact the impala server and can be optionally changed
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=${IMPALA_HOME}/logs
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-authorized_proxy_user_config=mapr=* \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-catalog_service_host=${CATALOG_SERVICE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} \
-mem_limit=50% \
-idle_session_timeout=50 \
-idle_query_timeout=20 \
-query_log_size=1000 \
"

# These impact the state store daemon and can be optionally changed
IMPALA_STATE_STORE_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-catalog_service_host=${CATALOG_SERVICE_HOST} \
"

IMPALA_CATALOG_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
"

# for troubleshooting
ENABLE_CORE_DUMPS=false

# Impala figures these out at runtime, but they can be overridden here.
# (Normally, they should be commented out.)
# HIVE_HOME=${MAPR_HOME}/hive/hive-*
# HBASE_HOME=${MAPR_HOME}/hbase/hbase-*
# HADOOP_HOME=${MAPR_HOME}/hadoop/hadoop-*

# No longer used ...
# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
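
For illustration, here is a minimal sketch of one possible workaround. It is not a confirmed HA mechanism. The idea is that env.sh picks the catalog host when a daemon starts, by probing each candidate on the catalog service port. It assumes that only the active catalogd is listening, that the port is the Impala default of 26000, and that bash and `timeout` are available on the nodes. The standby host name and the helper names `probe_host` and `pick_catalog_host` are hypothetical.

```shell
# Port the catalog service listens on (26000 is the Impala default; adjust if overridden).
CATALOG_SERVICE_PORT="${CATALOG_SERVICE_PORT:-26000}"

# probe_host HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within 2 seconds.
# Uses bash's /dev/tcp, so it shells out to bash explicitly.
probe_host() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# pick_catalog_host PROBE HOST...
# Prints the first HOST for which "PROBE HOST PORT" succeeds.
# Returns non-zero if no candidate answers.
pick_catalog_host() {
  probe="$1"
  shift
  for host in "$@"; do
    if "$probe" "$host" "$CATALOG_SERVICE_PORT"; then
      echo "$host"
      return 0
    fi
  done
  return 1
}

# Possible use in env.sh, in place of the fixed CATALOG_SERVICE_HOST line
# (standby-catalog.rp.local is a placeholder, not a real host from this cluster):
# CATALOG_SERVICE_HOST=$(pick_catalog_host probe_host \
#     my-hadoop-p1.rp.local standby-catalog.rp.local)
```

This only chooses the host at startup, so Impala daemons that are already running would still need a restart after a failover.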
