
Benchmarking script teragen fails on a cluster

Question asked by dzlabs on Apr 12, 2015
Latest reply on Apr 18, 2015 by dzlabs
I have a 3-node cluster on which I'm trying to run the application benchmarking scripts teragen and terasort, as follows:
    
    root@n1:~# maprcli volume create -name benchmarks -replication 1 -mount 1 -path /benchmarks
    mapr@n1:/root$ yarn jar /opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1-mapr-1501.jar teragen 5000000 /benchmarks/teragen1
    15/04/12 14:05:46 INFO terasort.TeraSort: Generating 5000000 using 2
    15/04/12 14:05:46 INFO mapreduce.JobSubmitter: number of splits:2
    15/04/12 14:05:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428571979306_0003
    15/04/12 14:05:47 INFO security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
    15/04/12 14:05:47 INFO impl.YarnClientImpl: Submitted application application_1428571979306_0003
    15/04/12 14:05:47 INFO mapreduce.Job: The url to track the job: http://n3:8088/proxy/application_1428571979306_0003/
    15/04/12 14:05:47 INFO mapreduce.Job: Running job: job_1428571979306_0003
    15/04/12 14:05:52 INFO mapreduce.Job: Job job_1428571979306_0003 running in uber mode : false
    15/04/12 14:05:52 INFO mapreduce.Job:  map 0% reduce 0%
    15/04/12 14:06:01 INFO mapreduce.Job:  map 50% reduce 0%
    15/04/12 14:06:09 INFO mapreduce.Job: Task Id : attempt_1428571979306_0003_m_000001_0, Status : FAILED
    15/04/12 14:06:23 INFO mapreduce.Job: Task Id : attempt_1428571979306_0003_m_000001_1, Status : FAILED
    15/04/12 14:06:35 INFO mapreduce.Job: Task Id : attempt_1428571979306_0003_m_000001_2, Status : FAILED
    15/04/12 14:06:53 INFO mapreduce.Job:  map 100% reduce 0%
    15/04/12 14:06:53 INFO mapreduce.Job: Job job_1428571979306_0003 failed with state FAILED due to: Task failed task_1428571979306_0003_m_000001
    Job failed as tasks failed. failedMaps:1 failedReduces:0
    
    15/04/12 14:06:54 INFO mapreduce.Job: Counters: 34
            File System Counters
                    FILE: Number of bytes read=0
                    FILE: Number of bytes written=80162
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    MAPRFS: Number of bytes read=82
                    MAPRFS: Number of bytes written=250000000
                    MAPRFS: Number of read operations=7
                    MAPRFS: Number of large read operations=0
                    MAPRFS: Number of write operations=5029297
            Job Counters
                    Failed map tasks=4
                    Killed map tasks=1
                    Launched map tasks=6
                    Other local map tasks=6
                    Total time spent by all maps in occupied slots (ms)=64331
                    Total time spent by all reduces in occupied slots (ms)=0
                    Total time spent by all map tasks (ms)=64331
                    Total vcore-seconds taken by all map tasks=64331
                    Total megabyte-seconds taken by all map tasks=65874944
                    DISK_MILLIS_MAPS=32167
            Map-Reduce Framework
                    Map input records=2500000
                    Map output records=2500000
                    Input split bytes=82
                    Spilled Records=0
                    Failed Shuffles=0
                    Merged Map outputs=0
                    GC time elapsed (ms)=35
                    CPU time spent (ms)=3150
                    Physical memory (bytes) snapshot=323448832
                    Virtual memory (bytes) snapshot=1646190592
                    Total committed heap usage (bytes)=261619712
            org.apache.hadoop.examples.terasort.TeraGen$Counters
                    CHECKSUM=5369395528751711
            File Input Format Counters
                    Bytes Read=0
            File Output Format Counters
                    Bytes Written=250000000
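
As a sanity check on these counters, here is a minimal sketch, assuming TeraGen's fixed 100-byte rows (which matches the 2,500,000 output records and 250,000,000 bytes written above). The function names are my own and purely illustrative. The numbers suggest that exactly half of the requested data was written, which is consistent with one of the two map tasks succeeding.

```python
# Sanity check of the TeraGen counters, assuming fixed 100-byte rows.
ROW_BYTES = 100  # assumption: each TeraGen row is 100 bytes


def expected_bytes(rows: int) -> int:
    """Total output size TeraGen should produce for the requested row count."""
    return rows * ROW_BYTES


def completed_fraction(bytes_written: int, rows_requested: int) -> float:
    """Share of the requested output that was actually written."""
    return bytes_written / expected_bytes(rows_requested)


if __name__ == "__main__":
    requested_rows = 5_000_000   # from the teragen command line
    bytes_written = 250_000_000  # "MAPRFS: Number of bytes written" counter
    print(expected_bytes(requested_rows))
    print(completed_fraction(bytes_written, requested_rows))
```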
    
The second map task fails. When I checked the task logs, stdout was empty. Here is the content of stderr:

    2015-04-12 14:05:55,0304 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1290 Thread: 9600 Lookup of volume mapr.var failed, error Connection reset by peer(104), CLDB: 192.168.2.201:7222 backing off ...
    2015-04-12 14:05:56,0309 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1290 Thread: 9600 Lookup of volume mapr.var failed, error Connection reset by peer(104), CLDB: 192.168.2.200:7222 backing off ...

And here is the content of the syslog file:

    2015-04-12 14:05:54,084 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
    2015-04-12 14:05:54,084 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
    2015-04-12 14:05:54,227 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
    2015-04-12 14:05:54,308 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    2015-04-12 14:05:54,308 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
    2015-04-12 14:05:54,366 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
    2015-04-12 14:05:54,366 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1428571979306_0003, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@7846b8c3)
    2015-04-12 14:05:54,417 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
    2015-04-12 14:05:54,605 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /tmp/hadoop-mapr/nm-local-dir/usercache/mapr/appcache/application_1428571979306_0003
    2015-04-12 14:05:54,725 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
    2015-04-12 14:05:54,726 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
    2015-04-12 14:05:54,963 INFO [main] org.apache.hadoop.mapred.Task: mapOutputFile class: org.apache.hadoop.mapred.MapRFsOutputFile
    2015-04-12 14:05:54,964 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    2015-04-12 14:05:54,987 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    2015-04-12 14:05:57,040 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@37734b10
    2015-04-12 14:05:59,806 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1428571979306_0003_m_000000_0 is done. And is in the process of committing
    2015-04-12 14:05:59,835 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1428571979306_0003_m_000000_0 is allowed to commit now
    2015-04-12 14:05:59,838 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1428571979306_0003_m_000000_0' to maprfs:/benchmarks/teragen1/_temporary/1/task_1428571979306_0003_m_000000
    2015-04-12 14:05:59,860 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1428571979306_0003_m_000000_0' done.
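
Note that the syslog above is from attempt_1428571979306_0003_m_000000_0, which completed successfully, whereas the failed attempts in the client output all belong to task m_000001. Below is a minimal sketch for pulling the failed attempt IDs out of the client output so that the matching task logs can be located. The helper names and the regular expression are my own and hypothetical, not part of Hadoop.

```python
import re
from typing import List

# Matches client lines such as:
#   ... INFO mapreduce.Job: Task Id : attempt_..._m_000001_0, Status : FAILED
ATTEMPT_RE = re.compile(
    r"Task Id : (attempt_\d+_\d+_[mr]_\d+_\d+), Status : FAILED"
)


def failed_attempts(client_log: str) -> List[str]:
    """Return the IDs of all task attempts reported as FAILED."""
    return ATTEMPT_RE.findall(client_log)


def task_of(attempt_id: str) -> str:
    """Map an attempt ID to its task ID by dropping the attempt counter."""
    body = attempt_id[len("attempt_"):]
    return "task_" + body.rsplit("_", 1)[0]


SAMPLE = """\
15/04/12 14:06:09 INFO mapreduce.Job: Task Id : attempt_1428571979306_0003_m_000001_0, Status : FAILED
15/04/12 14:06:23 INFO mapreduce.Job: Task Id : attempt_1428571979306_0003_m_000001_1, Status : FAILED
15/04/12 14:06:35 INFO mapreduce.Job: Task Id : attempt_1428571979306_0003_m_000001_2, Status : FAILED
"""

if __name__ == "__main__":
    for attempt in failed_attempts(SAMPLE):
        print(attempt, "->", task_of(attempt))
```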

Any idea what is causing the job failure?





