
Understanding ColumnFamily - Best practices for ETL

Question asked by bhardwaj_rajesh on Aug 14, 2014
Latest reply on Aug 14, 2014 by bhardwaj_rajesh
Hello,
We run ETL jobs that load time-series data into the database. We want to understand the best way to use column families.
From the architecture, I understand that data belonging to the same column family is stored together in HFiles/tablets.
For example, we receive a huge number of files, each containing data for a given key (node and timestamp), so we can receive multiple files for the same node.

Because we have to load data as it arrives, it is possible that by the time the next data for a given key arrives, the cache has already filled and been synced to disk. In that case, does the colocation happen at compaction time? Does HBase also keep track of which column family is in which disk file?

Is it bad practice to write columns belonging to the same column family in different puts (the puts can be staggered in time)?
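
To make the scenario concrete, here is a toy Python model of how we currently picture it. It is only a sketch of our understanding, not the HBase API: `ToyStore`, `flush_threshold`, and the row keys are hypothetical names made up for illustration, and the real flush and compaction logic is certainly more involved. It models a single column family whose in-memory buffer is written out as a new file each time it fills, so two staggered puts for the same key land in different files until a compaction merges them.

```python
class ToyStore:
    """Toy model of the storage for ONE column family (hypothetical, not HBase code)."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # cells held in memory before a flush
        self.memstore = {}                      # (row_key, qualifier) -> value
        self.files = []                         # flushed files, oldest first

    def put(self, row_key, qualifier, value):
        self.memstore[(row_key, qualifier)] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered cells out as a new sorted, immutable file.
        if self.memstore:
            self.files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def compact(self):
        # Merge all files into one; newer files overwrite older cells.
        merged = {}
        for f in self.files:
            merged.update(f)
        self.files = [dict(sorted(merged.items()))] if merged else []

    def files_containing(self, row_key):
        return [i for i, f in enumerate(self.files)
                if any(r == row_key for (r, _q) in f)]

    def get_row(self, row_key):
        # A read has to merge every file plus the in-memory buffer.
        row = {}
        for source in self.files + [self.memstore]:
            for (r, q), v in source.items():
                if r == row_key:
                    row[q] = v
        return row


store = ToyStore(flush_threshold=2)

# First file arrives for node1; another node's data fills the buffer and triggers a flush.
store.put("node1|t0", "cpu", 1)
store.put("node2|t0", "cpu", 5)

# Later, a second file arrives for the same key and is flushed separately.
store.put("node1|t0", "mem", 2)
store.put("node3|t0", "cpu", 7)

before = store.files_containing("node1|t0")  # the row is spread over two files
row = store.get_row("node1|t0")              # a read still sees both columns

store.compact()
after = store.files_containing("node1|t0")   # one file after compaction
```

If this picture is roughly right, staggered puts cost extra files to read until compaction runs, but the row is still returned complete. Please correct us if the real behavior differs.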
