
culling information from sequence file headers

Question asked by dreamerboy on Oct 30, 2013
Hi -

I don't have access to the code that created a sequence file. However, the header seems to have a lot of schema information embedded in it. It's easy to print the file's contents as text using 'hadoop fs -text ...'; however, I would like to use the header information to parse that text into separate fields. I have checked all the books ("Definitive Guide", "Hadoop in Action") with no joy. Does anyone know how to parse this header information and how to use it? (Is it metadata? I take it the numbers in parentheses are field widths, but do the other numbers have any meaning as offsets?) Thanks much.

SEQ^F!org.apache.hadoop.io.LongWritable^Yorg.apache.hadoop.io.Text^A^A'org.apache.hadoop.io.compress.GzipCodec^@^@^@l^A0^Sacct_num string(16)^A1^Vacct_num_ext string(7)^B10^Zacqr_bus_id_used string(8)^C100^[issr_bin_ctry_cd decimal(3)^C101^Zissr_bin_regn_cd string(2)^C102^[mrch_issr_tier_cd string(1)^C103^Rxbrdr_cd string(1)^C104^Xsales_type_id decimal(4)^C105^Zacqr_bin_regn_cd string(2)^C106^[mrch_loc_id_new decimal(10)^C107^Rnew_line string(1)^B11^^prch_dt datetime("YYYY-MM-DD")^B12^Qntfn_cd string(1)^B13^Rfrd_type string(1)^B14&central_proc_dt datetime("YYYY-MM-DD")^B15^_frd_fgn_tran_amt decimal(14, 2)^B16^^frd_us_tran_amt decimal(12, 2)^B17^Qcurr_cd string(3)^B18^]is_mrch_catg_cd decimal(4, 0)^B19^Xissr_gnrt_auth string(1)^A2^Zacct_num_seq decimal(5, 0)^ ...
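The header in the dump appears to follow the standard SequenceFile layout, which accounts for the stray characters. It starts with "SEQ" plus a version byte (^F = 6). Next come the key and value class names, each preceded by its length (! = 33 and ^Y = 25 characters), then two flags for record and block compression (^A^A) and the codec class name (' = 39). The header ends with a 4-byte entry count (^@^@^@l = 108) and that many key/value string pairs. So the ^S in front of "acct_num string(16)" is that string's length (19), not an offset. The keys "0", "1", "10", "100", ... look like column positions. They come out in string order because Hadoop keeps this metadata in a sorted map. The class below is a minimal sketch, assuming a version-6 header. `SeqHeader`, `read`, `fields` and `Field` are hypothetical names, and the "name type(args)" pattern is inferred from this one dump, not from any documented schema.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone reader for a SequenceFile header (no Hadoop jars needed).
// Assumes the version-6 layout; it stops before the 16-byte sync marker and
// does not decompress any records.
public class SeqHeader {
    public int version;
    public String keyClass, valueClass, codec;
    public boolean compressed, blockCompressed;
    public final Map<String, String> metadata = new LinkedHashMap<>(); // file order

    // One schema entry such as "acct_num string(16)" (args = "16").
    public static final class Field {
        public final int position;
        public final String name, type, args;
        Field(int position, String name, String type, String args) {
            this.position = position; this.name = name; this.type = type; this.args = args;
        }
    }

    // Hadoop's variable-length integer format, mirroring WritableUtils.readVLong.
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) return first;          // -112..127 is stored in one byte
        boolean negative = first < -120;
        int extra = negative ? -(first + 120) : -(first + 112);
        long v = 0;
        for (int i = 0; i < extra; i++) v = (v << 8) | (in.readByte() & 0xFF);
        return negative ? ~v : v;
    }

    // Serialized Text: a VInt byte count followed by that many UTF-8 bytes.
    static String readText(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readVLong(in)];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static SeqHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q')
            throw new IOException("not a SequenceFile");
        SeqHeader h = new SeqHeader();
        h.version = in.readByte();                // ^F in the dump = 6
        if (h.version != 6) throw new IOException("sketch only handles version 6");
        h.keyClass = readText(in);
        h.valueClass = readText(in);
        h.compressed = in.readBoolean();          // ^A = true
        h.blockCompressed = in.readBoolean();
        if (h.compressed) h.codec = readText(in);
        int count = in.readInt();                 // 4-byte big-endian count, ^@^@^@l = 108
        for (int i = 0; i < count; i++) {
            String key = readText(in);
            h.metadata.put(key, readText(in));
        }
        return h;
    }

    // "name type" or "name type(args)"; pattern inferred from the dump, not a spec.
    private static final Pattern SPEC = Pattern.compile("(\\S+)\\s+(\\w+)(?:\\((.*)\\))?");

    // Orders entries by numeric key; non-numeric keys and unmatched values are skipped.
    public static List<Field> fields(Map<String, String> metadata) {
        TreeMap<Integer, Field> byPosition = new TreeMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            Matcher m = SPEC.matcher(e.getValue().trim());
            if (!e.getKey().matches("\\d+") || !m.matches()) continue;
            int pos = Integer.parseInt(e.getKey());
            byPosition.put(pos, new Field(pos, m.group(1), m.group(2), m.group(3)));
        }
        return new ArrayList<>(byPosition.values());
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            SeqHeader h = read(in);
            System.out.println(h.keyClass + " / " + h.valueClass + ", codec " + h.codec);
            for (Field f : fields(h.metadata))
                System.out.println(f.position + "\t" + f.name + "\t" + f.type + "\t" + f.args);
        }
    }
}
```

To try it, copy the file locally first (for example with `hadoop fs -get`) and run `java SeqHeader <localfile>`. If the Hadoop jars are available, `SequenceFile.Reader#getMetadata()` should return the same key/value pairs without hand-parsing. Either way, this recovers only the schema. The header shown does not say how the fields are laid out inside each Text value, so that still has to be checked against the `hadoop fs -text` output.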
