
merging two files

Question asked by frazman on Sep 25, 2012
Latest reply on Sep 27, 2012 by lohit
I am new to the Hadoop framework, so it would help me if someone could guide me through this.
I have two types of files:
dirA/  --> file_a, file_b, file_c

dirB/  --> another_file_a, another_file_b...

Files in dirA contain transaction information.

Something like:

       id, time_stamp
       1, some_time_stamp
       2, some_another_time_stamp
       1, another_time_stamp

This kind of information is scattered across all the files in dirA.
The first step: given a time frame (let's say the last week), I want to find all the unique ids that occur within that time frame, and save them to a file.
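A minimal sketch of this first step, written as plain Python functions in the style of a Hadoop Streaming mapper and reducer (all function names are hypothetical). It assumes the timestamps are Unix epoch seconds, which the post does not specify, and that each line is comma-separated like the sample above; `run_local` only imitates Hadoop's shuffle/sort in memory so the logic can be checked without a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def id_mapper(lines: Iterable[str], start: int, end: int) -> Iterator[str]:
    """Emit the id of every transaction whose timestamp falls in [start, end).

    Assumes (hypothetically) each line is "id, epoch_seconds"; the header
    line and malformed lines are skipped.
    """
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # header ("id, time_stamp") or a bad record
        record_id, ts = parts[0], int(parts[1])
        if start <= ts < end:
            yield record_id


def distinct_reducer(sorted_ids: Iterable[str]) -> Iterator[str]:
    """Input arrives sorted by key (as Hadoop guarantees), so emit each id once."""
    for record_id, _group in groupby(sorted_ids):
        yield record_id


def run_local(files: List[List[str]], start: int, end: int) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce over several input files."""
    mapped = [i for lines in files for i in id_mapper(lines, start, end)]
    return list(distinct_reducer(sorted(mapped)))
```

For example, with `file_a = ["id, time_stamp", "1, 100", "2, 250"]` and `file_b = ["1, 180", "3, 999"]`, `run_local([file_a, file_b], 100, 300)` returns `['1', '2']`. Because the reducer relies on sorted input, deduplication costs nothing beyond the sort Hadoop already performs.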

Next, the files in dirB contain address information.
Something like:

        id, address, zip code
         1, fooadd, 12345
         and so on

I then take all the unique ids output by the first step as input and look up their address and zip code.
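If the set of unique ids from the first step is small enough to fit in memory, one common option is a map-side (replicated) join: every mapper loads the id list (in Hadoop this file is typically shipped to each node with the distributed cache) and scans the address files, keeping only matching rows, so no reducer is needed. A minimal sketch of the per-mapper logic, assuming the `id, address, zip code` layout shown above and no commas inside the address field; the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def load_ids(id_lines: Iterable[str]) -> Set[str]:
    """Read the step-1 output (assumed: one id per line) into an in-memory set."""
    return {line.strip() for line in id_lines if line.strip()}


def address_join_mapper(address_lines: Iterable[str],
                        wanted: Set[str]) -> Iterator[Tuple[str, str, str]]:
    """Emit (id, address, zip) for address records whose id is in `wanted`.

    Assumes (hypothetically) exactly three comma-separated fields per line,
    i.e. no commas inside the address itself.
    """
    for line in address_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip malformed lines
        record_id, address, zip_code = parts
        if record_id in wanted:  # O(1) set lookup per address record
            yield record_id, address, zip_code
```

For example, `list(address_join_mapper(["1, fooadd, 12345", "3, baradd, 67890"], load_ids(["1", "2"])))` returns `[('1', 'fooadd', '12345')]`.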

Basically, the final output is like an SQL join.

Find all the unique ids within a time frame and then merge in the address information.
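Alternatively, both steps can be done in a single pass as a reduce-side join, a common way to express an SQL inner join in MapReduce: the mapper tags each record with its source, and the reducer sees every record for one id together. A minimal sketch, assuming (hypothetically) epoch-second timestamps, comma-separated lines as in the samples above, and no commas inside addresses; the tags `"T"`/`"A"` and all function names are illustrative only, and `run_local` merely imitates Hadoop's shuffle/sort.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (id, tag, payload)


def tag_transactions(lines: Iterable[str], start: int, end: int) -> Iterator[Record]:
    """Tag in-window transactions with "T" (timestamps assumed to be epoch seconds)."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[1].isdigit() and start <= int(parts[1]) < end:
            yield parts[0], "T", ""


def tag_addresses(lines: Iterable[str]) -> Iterator[Record]:
    """Tag address records with "A"; the payload keeps "address,zip"."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3:
            yield parts[0], "A", parts[1] + "," + parts[2]


def join_reducer(sorted_records: Iterable[Record]) -> Iterator[str]:
    """Emit an id's address(es) only if it had at least one in-window transaction."""
    for record_id, group in groupby(sorted_records, key=itemgetter(0)):
        seen_txn, addresses = False, []
        for _, tag, payload in group:
            if tag == "T":
                seen_txn = True
            else:
                addresses.append(payload)  # buffer: value order is not guaranteed
        if seen_txn:
            for payload in addresses:
                yield record_id + "," + payload


def run_local(txn_lines: List[str], addr_lines: List[str],
              start: int, end: int) -> List[str]:
    records = list(tag_transactions(txn_lines, start, end))
    records += list(tag_addresses(addr_lines))
    records.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle/sort by key
    return list(join_reducer(records))
```

The reducer buffers address payloads instead of assuming transaction records arrive first, because Hadoop does not order the values within a key unless a secondary sort is configured. An id with many transactions still produces one output line per address record, so no separate deduplication step is needed.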

I would greatly appreciate any help.