Data Layout & Deployment

How do I calculate the MEDIAN of a column using Drill window functions?

See all drill best practice FAQs.

Data Layout & Deployment

How do I calculate the MEDIAN of a column using Drill window functions?

See all drill best practice FAQs.

Here is the definition of MEDIAN:

https://en.wikipedia.org/wiki/Median

In Drill, you can use a window function to implement the MEDIAN of a column.

Here are two examples with different data samples:

- Sample data:

`select cast(columns[0] as int) as age from dfs.root.`/tmp/median.csv` order by 1;`

+------+

| age |

+------+

| 10 |

| 15 |

| 20 |

| 20 |

| 40 |

| 50 |

+------+

6 rows selected (0.231 seconds)To calculate MEDIAN:

`with ordered_ages as (`

select age,row_number() over (order by age) as row_id,

(select count(1) from (select columns[0] as age from dfs.root.`/tmp/median.csv`)) as ct

from (select cast(columns[0] as int) as age from dfs.root.`/tmp/median.csv`)

)

select avg(age) as median

from ordered_ages

where row_id between ct/2.0 and ct/2.0 + 1;

+---------+

| median |

+---------+

| 20.0 |

+---------+

1 row selected (0.396 seconds)- Sample data:

`select cast(columns[0] as int) as age from dfs.root.`/tmp/median.csv` order by 1;`

+------+

| age |

+------+

| 10 |

| 15 |

| 20 |

| 21 |

| 40 |

| 50 |

+------+

6 rows selected (0.219 seconds)To calculate MEDIAN:

`with ordered_ages as (`

select age,row_number() over (order by age) as row_id,

(select count(1) from (select columns[0] as age from dfs.root.`/tmp/median.csv`)) as ct

from (select cast(columns[0] as int) as age from dfs.root.`/tmp/median.csv`)

)

select avg(age) as median

from ordered_ages

where row_id between ct/2.0 and ct/2.0 + 1;

+---------+

| median |

+---------+

| 20.5 |

+---------+

1 row selected (0.378 seconds)

Here is the definition of MEDIAN:

https://en.wikipedia.org/wiki/Median

In Drill, you can use a window function to implement the MEDIAN of a column.

Here are two examples with different data samples:

To calculate MEDIAN:

To calculate MEDIAN: