Q) Support Cost Base Optimization and predicate pushdown features are enabled in Hive. Which is true about 2 queries?
Select employees.id, b.sales from employees LEFT JOIN sales ON (employees.id = sales.employee_id) WHERE day_id between '2015-01-01' AND '2015-03-31';
Select employees.id, b.sales from employees LEFT JOIN sales ON (employees.id = sales.employee_id AND day_id between '2015-01-01' AND '2015-03-31');
A) Both queries will run with equal efficiency.
B) Query 1 will be faster.
C) Query 2 will be faster.
D) Both query 1 and query 2 will run faster by setting hive.optimize.ppd = false;
I know query 2 will run faster because of pre-filtering that happens on 'day_id' column and hence hive will have lesser data load while joining.
I got to know this by reading into some blog but i am not convinced can somebody please explain principles of CBO and Predicate pushdown? Or at least point me to good website with good introduction into this concepts!