
A strange error about IOException when I use Nutch

Question asked by sworddance on Nov 25, 2012
Latest reply on Nov 25, 2012 by sworddance
I am using the WebGraph class in Nutch and have written a new kind of FileInputFormat. It works well on some files read through my new FileInputFormat, but fails on others. A rough sketch of the input format is below, followed by the full error log.
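
For context, here is a minimal sketch of the kind of FileInputFormat I mean, written against the old org.apache.hadoop.mapred API that appears in the log. It is not my actual class; the class name and the use of LineRecordReader are placeholders that only show the shape of the format.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.LineRecordReader;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    // Placeholder name; not the real class from my project.
    public class ParseResultInputFormat extends FileInputFormat<LongWritable, Text> {

      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
        // Keep each parse_* file in a single split so a record never straddles splits.
        return false;
      }

      @Override
      public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
          JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(split.toString());
        // The real format parses its own record layout; LineRecordReader is only a stand-in.
        return new LineRecordReader(job, (FileSplit) split);
      }
    }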


    12/11/26 10:15:02 INFO webgraph.WebGraph: WebGraphDb: starting at 2012-11-26 10:15:02
    12/11/26 10:15:02 INFO webgraph.WebGraph: WebGraphDb: webgraphdb: /home/tian.yuchen/test_data/mySE/webgraph
    12/11/26 10:15:02 INFO webgraph.WebGraph: WebGraphDb: URL normalize: false
    12/11/26 10:15:02 INFO webgraph.WebGraph: WebGraphDb: URL filter: false
    12/11/26 10:15:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    12/11/26 10:15:02 INFO webgraph.WebGraph: OutlinkDb: adding input: /home/tian.yuchen/test_data/mySE/parseResult/3
    /home/tian.yuchen/test_data/mySE/parseResult/3
    12/11/26 10:15:02 INFO webgraph.WebGraph: OutlinkDb: adding input: /home/tian.yuchen/test_data/mySE/webgraph/outlinks
    12/11/26 10:15:02 INFO webgraph.WebGraph: OutlinkDb: running
    12/11/26 10:15:02 INFO mapred.FileInputFormat: Total input paths to process : 1
    the filelist to be handled
    file:/home/tian.yuchen/test_data/mySE/parseResult/3/parse_1340680336192
    12/11/26 10:15:02 INFO mapred.JobClient: Running job: job_local_0001
    12/11/26 10:15:02 INFO util.ProcessTree: setsid exited with exit code 0
    12/11/26 10:15:02 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2a0ab444
    the Reader path12/11/26 10:15:02 INFO mapred.MapTask: numReduceTasks: 1
    
    /home/tian.yuchen/test_data/mySE/parseResult/3/parse_1340680336192
    12/11/26 10:15:02 INFO mapred.MapTask: io.sort.mb = 100
    12/11/26 10:15:03 INFO mapred.MapTask: data buffer = 79691776/99614720
    12/11/26 10:15:03 INFO mapred.MapTask: record buffer = 262144/327680
    12/11/26 10:15:03 INFO mapred.MapTask: Starting flush of map output
    12/11/26 10:15:03 INFO mapred.JobClient:  map 0% reduce 0%
    12/11/26 10:15:03 INFO mapred.MapTask: Finished spill 0
    12/11/26 10:15:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    12/11/26 10:15:05 INFO mapred.LocalJobRunner: file:/home/tian.yuchen/test_data/mySE/parseResult/3/parse_1340680336192:0+6348869
    12/11/26 10:15:05 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
    12/11/26 10:15:05 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6632060c
    12/11/26 10:15:05 INFO mapred.LocalJobRunner:
    12/11/26 10:15:05 INFO mapred.Merger: Merging 1 sorted segments
    12/11/26 10:15:05 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 6589570 bytes
    12/11/26 10:15:05 INFO mapred.LocalJobRunner:
    12/11/26 10:15:05 INFO compress.CodecPool: Got brand-new compressor
    12/11/26 10:15:05 WARN domain.DomainSuffixes: java.net.MalformedURLException
     at java.net.URL.<init>(URL.java:601)
     at java.net.URL.<init>(URL.java:464)
     at java.net.URL.<init>(URL.java:413)
     at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
     at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
     at org.apache.nutch.util.domain.DomainSuffixesReader.read(DomainSuffixesReader.java:54)
     at org.apache.nutch.util.domain.DomainSuffixes.<init>(DomainSuffixes.java:44)
     at org.apache.nutch.util.domain.DomainSuffixes.getInstance(DomainSuffixes.java:57)
     at org.apache.nutch.util.URLUtil.getDomainName(URLUtil.java:142)
     at org.apache.nutch.util.URLUtil.getDomainName(URLUtil.java:172)
     at org.apache.nutch.scoring.webgraph.WebGraph$OutlinkDb.reduce(WebGraph.java:433)
     at org.apache.nutch.scoring.webgraph.WebGraph$OutlinkDb.reduce(WebGraph.java:1)
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
    
    12/11/26 10:15:06 INFO mapred.JobClient:  map 100% reduce 0%
    12/11/26 10:15:07 WARN mapred.LocalJobRunner: job_local_0001
    java.net.MalformedURLException: For input string: "#default#homepage);this.setHomePage(http:"
     at java.net.URL.<init>(URL.java:601)
     at java.net.URL.<init>(URL.java:464)
     at java.net.URL.<init>(URL.java:413)
     at org.apache.nutch.util.URLUtil.getDomainName(URLUtil.java:172)
     at org.apache.nutch.scoring.webgraph.WebGraph$OutlinkDb.reduce(WebGraph.java:445)
     at org.apache.nutch.scoring.webgraph.WebGraph$OutlinkDb.reduce(WebGraph.java:1)
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
    12/11/26 10:15:07 INFO mapred.JobClient: Job complete: job_local_0001
    12/11/26 10:15:07 INFO mapred.JobClient: Counters: 20
    12/11/26 10:15:07 INFO mapred.JobClient:   File Input Format Counters
    12/11/26 10:15:07 INFO mapred.JobClient:     Bytes Read=0
    12/11/26 10:15:07 INFO mapred.JobClient:   FileSystemCounters
    12/11/26 10:15:07 INFO mapred.JobClient:     FILE_BYTES_READ=559289
    12/11/26 10:15:07 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=7207262
    12/11/26 10:15:07 INFO mapred.JobClient:   Map-Reduce Framework
    12/11/26 10:15:07 INFO mapred.JobClient:     Map output materialized bytes=6589574
    12/11/26 10:15:07 INFO mapred.JobClient:     Map input records=667
    12/11/26 10:15:07 INFO mapred.JobClient:     Reduce shuffle bytes=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Spilled Records=50520
    12/11/26 10:15:07 INFO mapred.JobClient:     Map output bytes=6484215
    12/11/26 10:15:07 INFO mapred.JobClient:     Total committed heap usage (bytes)=631439360
    12/11/26 10:15:07 INFO mapred.JobClient:     CPU time spent (ms)=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Map input bytes=6336146
    12/11/26 10:15:07 INFO mapred.JobClient:     SPLIT_RAW_BYTES=124
    12/11/26 10:15:07 INFO mapred.JobClient:     Combine input records=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Reduce input records=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Reduce input groups=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Combine output records=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Reduce output records=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
    12/11/26 10:15:07 INFO mapred.JobClient:     Map output records=50520
    
    **12/11/26 10:15:07 INFO mapred.JobClient: Job Failed: NA**
    
    **12/11/26 10:15:07 ERROR webgraph.WebGraph: java.io.IOException: Job failed!
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
     at org.apache.nutch.scoring.webgraph.WebGraph.createWebGraph(WebGraph.java:689)
     at org.apache.nutch.scoring.webgraph.WebGraph.run(WebGraph.java:868)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.nutch.scoring.webgraph.WebGraph.main(WebGraph.java:794)**
    
    
    12/11/26 10:15:07 ERROR webgraph.WebGraph: WebGraph: java.io.IOException: Job failed!
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
     at org.apache.nutch.scoring.webgraph.WebGraph.createWebGraph(WebGraph.java:689)
     at org.apache.nutch.scoring.webgraph.WebGraph.run(WebGraph.java:868)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.nutch.scoring.webgraph.WebGraph.main(WebGraph.java:794)
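
From the stack traces above, my guess is that the job dies because one of the outlinks in my parse data is not a real URL: WebGraph$OutlinkDb.reduce calls URLUtil.getDomainName, which constructs a java.net.URL, and that constructor throws MalformedURLException on the string shown in the log. The snippet below is only an illustration of that failure mode and of one possible guard (dropping outlinks that java.net.URL refuses to parse); it is not Nutch code.

    import java.net.MalformedURLException;
    import java.net.URL;

    public class OutlinkCheck {

      public static void main(String[] args) {
        // The exact string reported in the MalformedURLException above.
        String badOutlink = "#default#homepage);this.setHomePage(http:";
        System.out.println(isParseableUrl(badOutlink)
            ? "ok: " + badOutlink
            : "dropping malformed outlink: " + badOutlink);
      }

      // One possible guard: filter out outlinks that java.net.URL cannot parse
      // before they reach WebGraph's OutlinkDb reducer.
      static boolean isParseableUrl(String url) {
        try {
          new URL(url);
          return true;
        } catch (MalformedURLException e) {
          return false;
        }
      }
    }

If that diagnosis is right, filtering or normalizing outlinks like this before they reach WebGraph should let the job complete on these files as well.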
