
MapR+Streamsets batch mode: cluster start timeout error?

Question asked by reedv on Jan 12, 2018
Latest reply on Jan 16, 2018 by maprcommunity

Trying to use StreamSets to simply move data from a MapR FS (TSV) origin to a MapR FS destination (JSON) in cluster batch mode (MapR 6.0). The SDC configurations were set up following this video for integrating MapR and StreamSets. I am able to validate and "preview" the pipeline in the SDC web UI, but when actually trying to start the pipeline, the UI says "Starting" for a long time and then throws an error that 'the cluster application was unable to start'.

2018-01-12 13:42:28,605      ingest2sa_demodata_batch/ingest2sademodatabatchadca8442-cb00-4a0e-929b-df2babe4fd41      
ERROR      Unexpected error starting pipeline:
    java.lang.IllegalStateException: Timed out after waiting 121 seconds for for cluster application to start. Submit command is not alive.      
    ClusterRunner      *admin           runner-pool-2-thread-1
java.lang.IllegalStateException: Timed out after waiting 121 seconds for for cluster application to start. Submit command is not alive.
     at com.streamsets.datacollector.cluster.ClusterProviderImpl.startPipelineInternal(
     at com.streamsets.datacollector.cluster.ClusterProviderImpl.startPipeline(
     at com.streamsets.datacollector.execution.cluster.ClusterHelper.submit(
     at com.streamsets.datacollector.execution.runner.cluster.ClusterRunner.doStart(
     at com.streamsets.datacollector.execution.runner.cluster.ClusterRunner.start(
     at com.streamsets.datacollector.execution.runner.common.AsyncRunner.lambda$start$3(
     at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(
     at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$
     at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(
     at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(
     at java.util.concurrent.ScheduledThreadPoolExecutor$
     at com.streamsets.datacollector.metrics.MetricSafeScheduledExecutorService$
     at java.util.concurrent.ThreadPoolExecutor.runWorker(
     at java.util.concurrent.ThreadPoolExecutor$

Yet, checking the SDC process owner and MapR login configuration, I can see:

ps -aux | grep sdc | grep maprlogin
mapr 43341 24.4 1.7 6939680 1192992 ? Sl 13:38 1:57 /usr/bin/java
-classpath /opt/streamsets-datacollector/libexec/bootstrap-libs/main/streamsets-datacollector-bootstrap-* 


[root@mapr002 libexec]# netstat -an | grep 18630
tcp6       0      0 :::18630                :::*                    LISTEN    
tcp6       0      0      ESTABLISHED
tcp6       0      0      ESTABLISHED
tcp6       0      0      ESTABLISHED

Furthermore, the StreamSets docs for MapR say:

To run MapR commands in the cluster, Data Collector can run as a user account granted access in a MapR user ticket. For example, if a user ticket is generated for the "myuser" user account, then configure Data Collector to run as the "myuser" user account.
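Given that note, one thing worth double-checking is whether the account running SDC actually holds a valid, unexpired MapR user ticket. A minimal check (the default ticket location below is an assumption; MAPR_TICKETFILE_LOCATION overrides it on real installs):

```shell
# Sketch: locate the MapR user ticket for the current account.
# Default location is /tmp/maprticket_<uid> unless MAPR_TICKETFILE_LOCATION
# is set (assumed defaults -- adjust for your cluster).
TICKET="${MAPR_TICKETFILE_LOCATION:-/tmp/maprticket_$(id -u)}"
echo "expected ticket file: $TICKET"

# On a MapR node, these show whether the ticket exists and who it is for:
#   ls -l "$TICKET"
#   maprlogin print -ticketfile "$TICKET"   # prints principal and expiry
# If it is missing or expired, regenerate it as the user that runs SDC:
#   maprlogin password
```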

Yet, I do have impersonation mode activated, as evidenced by:

[mapr@mapr002 ~]$ ps -aux | grep sdc | grep mapr
mapr      43341  0.3  1.9 7674492 1286284 ?     Sl   Jan12  13:00 ... -Dmaprlogin.password.enabled=true ...

and running wordcount from the MapR-provided examples jar, reading from the intended origin and writing to the intended destination MapR FS locations, runs fine:

[mapr@mapr002 ~]$ hadoop jar \
/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1710.jar \
wordcount \
/path/to/origin/data/dir/ \
...

The wordcount job succeeds even when I set HADOOP_CONF_DIR=<$SDC_DIST/resources/hadoop-conf-dir-I-made> before the hadoop command, so that it uses the same resources SDC would see.
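One way to rule out drift between the conf directory handed to SDC and the cluster's live Hadoop configuration is a file-by-file diff. A sketch (both default paths are assumptions based on the paths in this post; substitute your own):

```shell
# Sketch: compare the Hadoop conf dir handed to SDC with the cluster's live
# conf dir. Both default paths below are assumptions -- substitute your own.
diff_conf() {
  sdc_conf=$1
  cluster_conf=$2
  for f in core-site.xml yarn-site.xml mapred-site.xml; do
    if [ -r "$sdc_conf/$f" ] && [ -r "$cluster_conf/$f" ]; then
      echo "== $f =="
      diff "$sdc_conf/$f" "$cluster_conf/$f" || true
    fi
  done
}

diff_conf "${SDC_DIST:-/opt/streamsets-datacollector}/resources/hadoop-conf-dir-I-made" \
          /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
```

If the diff is empty, the conf dirs match and the problem likely lies elsewhere (ticket, permissions, or the submit command itself).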

These outputs, plus the fact that the pipeline can be previewed and validated in the SDC web UI, leave me confused about what is failing here. Has anyone hit a similar problem, and do you know what can be done (and why)?