
Why do YARN jobs fail?

Question asked by ANIKADOS on Nov 7, 2017
Latest reply on Nov 9, 2017 by cathy


I am trying to launch a MapReduce job, but I get an error when executing jobs from the shell or from Hive:

hive> select count(*) from employee;
Query ID = mapr_20171107135114_a574713d-7d69-45e1-aa73-d4de07a3059b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1510052734193_0005, Tracking URL = http://hdpsrvpre2.intranet.darty.fr:8088/proxy/application_1510052734193_0005/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1510052734193_0005
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-11-07 13:51:25,951 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1510052734193_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

In the ResourceManager logs, this is what I find:

2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from LAUNCHED to FINAL_SAVING
2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1510052734193_0005_000002 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from FINAL_SAVING to FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1510052734193_0005 with final state: FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1510052734193_0005 State change from ACCEPTED to FINAL_SAVING
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1510052734193_0005
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1510052734193_0005_000002 is done. finalState=FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for app: application_1510052734193_0005 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/application_1510052734193_0005
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1510052734193_0005 requests cleared
2017-11-07 13:51:25,296 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1510052734193_0005 failed 2 times due to AM Container for appattempt_1510052734193_0005_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hdpsrvpre2.intranet.darty.fr:8088/cluster/app/application_1510052734193_0005Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e10_1510052734193_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:748)

Shell output: main : command provided 1
main : user is mapr
main : requested yarn user is mapr


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
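
Following the hint in the diagnostics above ("click on links to logs of each attempt"), this is roughly how I try to get at the stderr of the failed AM container (a sketch, assuming log aggregation is enabled on this cluster; the local userlogs path below is an assumption based on a default MapR Hadoop 2.7.0 layout, not something confirmed here):

# Pull the aggregated container logs for the failed application
yarn logs -applicationId application_1510052734193_0005

# If log aggregation is not enabled, the same stderr should be on the node that ran
# container_e10_1510052734193_0005_02_000001, under the NodeManager's local log
# directory, for example (assumed default location):
ls /opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1510052734193_0005/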

Any idea about the reason?
