
Loading a text file from HDFS to MongoDB

Question asked by sailaja on Jul 26, 2012
Latest reply on Jul 26, 2012 by sailaja
While trying to load a text file from HDFS to MongoDB through a MapReduce Java program, I am getting the following error:

        12/07/26 19:19:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        12/07/26 19:19:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        12/07/26 19:19:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        12/07/26 19:19:02 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
        12/07/26 19:19:02 INFO input.FileInputFormat: Total input paths to process : 1
        12/07/26 19:19:02 WARN snappy.LoadSnappy: Snappy native library not loaded
        12/07/26 19:19:02 INFO mapred.JobClient: Running job: job_local_0001 should setup context
        12/07/26 19:19:02 INFO mapred.MapTask: io.sort.mb = 100
        12/07/26 19:19:02 INFO mapred.MapTask: data buffer = 79691776/99614720
        12/07/26 19:19:02 INFO mapred.MapTask: record buffer = 262144/327680
        12/07/26 19:19:02 INFO mapred.MapTask: Starting flush of map output
        12/07/26 19:19:02 INFO mapred.MapTask: Finished spill 0
        12/07/26 19:19:02 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
        12/07/26 19:19:02 INFO mapred.LocalJobRunner:
        12/07/26 19:19:02 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now should commit task
        12/07/26 19:19:02 INFO mapred.LocalJobRunner:
        12/07/26 19:19:02 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done. should setup context
        12/07/26 19:19:02 INFO mapred.LocalJobRunner:
        12/07/26 19:19:02 INFO mapred.Merger: Merging 1 sorted segments
        12/07/26 19:19:02 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 20 bytes
        12/07/26 19:19:02 INFO mapred.LocalJobRunner:
        12/07/26 19:19:02 WARN mapred.FileOutputCommitter: Output path is null in cleanup
        12/07/26 19:19:02 WARN mapred.LocalJobRunner: job_local_0001
        java.lang.IllegalArgumentException: Unable to connect to MongoDB Output Collection.
            at com.mongodb.hadoop.util.MongoConfigUtil.getOutputCollection(MongoConfigUtil.java:272)
            at com.mongodb.hadoop.MongoOutputFormat.getRecordWriter(MongoOutputFormat.java:41)
            at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
            at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
            at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:256)
        Caused by: java.lang.IllegalArgumentException: Unable to connect to collection: null
            at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:262)
            at com.mongodb.hadoop.util.MongoConfigUtil.getOutputCollection(MongoConfigUtil.java:269)
            ... 4 more
        Caused by: java.lang.NullPointerException
            at com.mongodb.Mongo$Holder._toKey(Mongo.java:679)
            at com.mongodb.Mongo$Holder.connect(Mongo.java:657)
            at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:259)
            ... 5 more
        12/07/26 19:19:03 INFO mapred.JobClient:  map 100% reduce 0%
        12/07/26 19:19:03 INFO mapred.JobClient: Job complete: job_local_0001
        12/07/26 19:19:03 INFO mapred.JobClient: Counters: 14
        12/07/26 19:19:03 INFO mapred.JobClient:   FileSystemCounters
        12/07/26 19:19:03 INFO mapred.JobClient:     FILE_BYTES_READ=219
        12/07/26 19:19:03 INFO mapred.JobClient:     HDFS_BYTES_READ=11
        12/07/26 19:19:03 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=57858
        12/07/26 19:19:03 INFO mapred.JobClient:   Map-Reduce Framework
        12/07/26 19:19:03 INFO mapred.JobClient:     Reduce input groups=0
        12/07/26 19:19:03 INFO mapred.JobClient:     Combine output records=1
        12/07/26 19:19:03 INFO mapred.JobClient:     Map input records=1
        12/07/26 19:19:03 INFO mapred.JobClient:     Reduce shuffle bytes=0
        12/07/26 19:19:03 INFO mapred.JobClient:     Reduce output records=0
        12/07/26 19:19:03 INFO mapred.JobClient:     Spilled Records=1
        12/07/26 19:19:03 INFO mapred.JobClient:     Map output bytes=16
        12/07/26 19:19:03 INFO mapred.JobClient:     Combine input records=1
        12/07/26 19:19:03 INFO mapred.JobClient:     Map output records=1
        12/07/26 19:19:03 INFO mapred.JobClient:     SPLIT_RAW_BYTES=133
        12/07/26 19:19:03 INFO mapred.JobClient:     Reduce input records=0
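
The stack trace bottoms out in the Mongo driver while the connector resolves the output collection ("Unable to connect to collection: null"), which suggests the connector never saw a usable output URI rather than a network failure. As a quick sanity check, connecting to the same URI outside Hadoop rules out the mongod side. A minimal sketch, assuming the 2.x Java driver the connector uses (MongoPing is just an illustrative name; host, port, database, and collection are taken from the URI in my code below):

import java.net.UnknownHostException;

import com.mongodb.DB;
import com.mongodb.Mongo;

public class MongoPing {
    public static void main(String[] args) throws UnknownHostException {
        // Same host and port as the output URI passed to the job.
        Mongo mongo = new Mongo("127.0.0.1", 12333);
        DB db = mongo.getDB("test");
        // getCollectionNames() forces a server round trip, so this fails
        // fast if nothing is listening on 127.0.0.1:12333.
        System.out.println(db.getCollectionNames());
        mongo.close();
    }
}

If this standalone check succeeds, the problem is in how the job receives the URI rather than in mongod itself.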



Following is my code:


import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class WordCountH2M {

    private static final Log log = LogFactory.getLog(WordCountH2M.class);

    // Emits each input line as a key with a count of 1.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable one = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(value.toString()), one);
        }
    }

    // Sums the counts emitted for each key.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) {
        try {
            final Configuration conf = new Configuration();

            final Job job = new Job(conf, "word count");
            job.setJarByClass(WordCountH2M.class);

            FileInputFormat.addInputPath(job, new Path("hdfs:****user/user1/input-data/samplefile/file.txt"));

            MongoConfigUtil.setOutputURI(conf, "mongodb://127.0.0.1:12333/test.ss1");

            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(MongoOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
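
One detail worth noting about the driver code: Hadoop's Job constructor takes a copy of the Configuration it is given, so MongoConfigUtil.setOutputURI(conf, ...) called after new Job(conf, ...) never reaches the configuration the job actually runs with, which would leave the connector's output collection null. A minimal sketch of the driver setup with the URI set first (same URI and job name as above; this is offered as a possible explanation for the error, not a verified fix):

        final Configuration conf = new Configuration();
        // Set the output URI before constructing the Job, because Job
        // copies the Configuration and ignores later changes to conf.
        MongoConfigUtil.setOutputURI(conf, "mongodb://127.0.0.1:12333/test.ss1");

        final Job job = new Job(conf, "word count");
        // ...the rest of the job setup is unchanged.

Equivalently, the URI can be set on job.getConfiguration() after the Job has been created, since that is the configuration instance the job actually uses.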
