
Rowkey design for analytical use case

Question asked by charanthota on May 4, 2016
Latest reply on May 5, 2016 by Ted Dunning

I have a dump table with a large number of records, and I want to roll the data up into a summary table so that I can avoid recomputing it every time I need it. The grouping and aggregation can vary. Below is my row key design, which puts all summary records in a single table.

 

Sample RowKeys

 

processor:FIS_currency:USD_date:20160504

processor:FIS_currency:EUR_date:20160504

processor:GPS_currency:INR_date:20160504

processor:FIS_currency:INR_date:20160504

processor:GPS_currency:USD_date:20160504

program:MyChoice_currency:USD_date:20160504

program:AdvanceCash_currency:USD_date:20160504

program:MyChoice_currency:EUR_date:20160504

program:AdvanceCash_currency:EUR_date:20160504

processor:FIS_program:AdvanceCash_date:20160504

processor:GPS_program:MyChoice_date:20160504

currency:USD_channel:ATM_date:20160504

currency:EUR_channel:POS_date:20160504

currency:INR_date:20160504
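The sample keys above all follow one pattern: `name:value` pairs joined by `_`, with the date last. A minimal sketch of a helper that builds such keys could look like the following; the function name `make_rowkey` and its argument layout are hypothetical and not part of the original design.

```python
def make_rowkey(dimensions, date):
    """Build a key such as 'processor:FIS_currency:USD_date:20160504'.

    dimensions: ordered list of (name, value) pairs, e.g. [("processor", "FIS")]
    date: date string in yyyymmdd form, always appended as the last field
    """
    parts = [f"{name}:{value}" for name, value in dimensions]
    parts.append(f"date:{date}")
    return "_".join(parts)


if __name__ == "__main__":
    print(make_rowkey([("processor", "FIS"), ("currency", "USD")], "20160504"))
    print(make_rowkey([("currency", "INR")], "20160504"))
```

Because the order of the pairs is decided by the caller, changing the order of parameters in the key only means passing the pairs in a different order.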

 

 

Query patterns against these row keys can look like:

processor:*_currency:*_date:20160504 --- get data for all processors, for all currencies, for a particular date

processor:*_currency:*_date:* --- get data for all processors, for all currencies, for all days

program:*_currency:*_date:* --- get data for all programs, for all currencies, for all days

currency:USD_channel:*_date:* --- get data for USD currency, for all channels, for all days

currency:INR_date:* --- get data for INR currency, for all days
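To see what these patterns cost, it can help to separate the literal prefix of a pattern from the part that has to be filtered. The sketch below assumes the table is a sorted key-value store that is scanned by key prefix (the post does not name the store), and that field values never contain `_`. The helper names `scan_prefix` and `matches` are hypothetical.

```python
import re


def scan_prefix(pattern):
    """Return the literal text before the first '*'.

    In a store sorted by row key, only this part can narrow the scan
    to a contiguous key range; the rest must be checked row by row.
    """
    return pattern.split("*", 1)[0]


def matches(pattern, rowkey):
    """Check a row key against a pattern where '*' stands for one field value.

    Assumption: a field value never contains '_', so '*' is translated
    to '[^_]*' rather than '.*'.
    """
    regex = "[^_]*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, rowkey) is not None


if __name__ == "__main__":
    pattern = "processor:*_currency:*_date:20160504"
    print(scan_prefix(pattern))
    print(matches(pattern, "processor:FIS_currency:USD_date:20160504"))
    print(matches(pattern, "processor:GPS_program:MyChoice_date:20160504"))
```

Under these assumptions, a pattern such as `processor:*_currency:*_date:20160504` narrows the scan only to keys starting with `processor:`, and the date is applied as a filter afterwards, whereas `currency:USD_channel:*_date:*` has the longer prefix `currency:USD_channel:`.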

 

Instead of using multiple tables, I chose a single table so that I can easily change the order of the parameters in my row key at any time. My question is: will this row key design lead to hot spotting?

 

Previously I thought of using row keys like FIS_USD_20160504, without embedding any metadata. That would distribute the data, but it leads to semantic issues. To avoid those, I added the metadata labels, but now I fear this will lead to hot spotting, since many of my row keys start with the same word, such as program, processor, or currency.
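The difference between the two designs can be made concrete by counting how many keys share a leading token. The sketch below uses a few of the sample keys and two hypothetical helpers, `strip_labels` and `leading_token`. It only shows how keys cluster in sort order; whether that clustering causes hot spotting depends on the write pattern and on how the store splits key ranges, which are not described here.

```python
import re
from collections import Counter


def strip_labels(rowkey):
    """Turn 'processor:FIS_currency:USD_date:20160504' into 'FIS_USD_20160504'."""
    return "_".join(part.split(":", 1)[1] for part in rowkey.split("_"))


def leading_token(rowkey):
    """Return the text before the first ':' or '_' in the key."""
    return re.split(r"[:_]", rowkey, maxsplit=1)[0]


labeled = [
    "processor:FIS_currency:USD_date:20160504",
    "processor:GPS_currency:INR_date:20160504",
    "program:MyChoice_currency:USD_date:20160504",
    "currency:INR_date:20160504",
]
unlabeled = [strip_labels(key) for key in labeled]

labeled_counts = Counter(leading_token(key) for key in labeled)
unlabeled_counts = Counter(leading_token(key) for key in unlabeled)

if __name__ == "__main__":
    print(sorted(labeled))
    print(labeled_counts)
    print(sorted(unlabeled))
    print(unlabeled_counts)
```

With labels, every key falls under one of a handful of leading words, so each summary type occupies one contiguous range of the sorted key space. Without labels, the leading token is the first field value, so there are as many distinct starting points as there are values.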
