AnsweredAssumed Answered

Apache Drill - Get max directory in each path?

Question asked by john.humphreys on Jan 26, 2018
Latest reply on Jan 26, 2018 by maprcommunity

Drill lets you query multiple directories simultaneously, and lets you control which ones with the dir0/dir1/etc variables.

It also lets you find the MAXDIR or MINDIR with its directory functions. So, you can, for example, write a new version of files in a directory and ensure drill always uses the newest one.

Is there a way to query the newest version of each leaf directory though? For example.

  • 2018/
    • 01/
      • v1/
      • v2/
    • 02/
      • v1/
      • v2/

I'd like to select only the data in the v2 directories for each month. So, dir0 would be 2018, dir1 would be *, and I'd want the MAX(dir2).

I was thinking of something like this:

SELECT count(*) FROM dfs.`/path/drill-data/`
where dir0 = '2018'
and dir1 = '*'
and dir3 = MAXDIR('dfs', dir1);

but it doesn't seem to work; it says something about a null-related error with the MAXDIR function. I suspect I need to provide a full path as the second parameter but then I think it would probably choose a single max directory and not one per leaf folder.