Spark SQL - Aggregate results of distinct_set()

Question asked by john.humphreys on Aug 3, 2017
Latest reply on Aug 7, 2017 by john.humphreys

Let's say I have 2 data frames.


DF1 may have values {3, 4, 5} in column A of various rows.

DF2 may have values {4, 5, 6} in column A of various rows.


I can aggregate these into a set of distinct elements using collect_set(A) (which I called distinct_set in the title), assuming all those rows fall into the same grouping.


At this point I have a set in the resulting data frame. Is there any way to aggregate that set with another set?


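Basically, if I have 2 data frames resulting from the first aggregation, I want to be able to aggregate their results into a single set of distinct elements per group, e.g. {3, 4, 5} and {4, 5, 6} should become {3, 4, 5, 6}.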
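
To make the desired result concrete, here is a minimal plain-Python sketch of the semantics I'm after. It is not Spark code: the helper names (collect_set_by_key, merge_set_results), the grouping key "g1", and the sample rows are all made up for illustration. The first function models the per-group distinct-set aggregation, and the second models the step I'm asking about, merging two such results.

```python
from collections import defaultdict


def collect_set_by_key(rows):
    """Stage 1 (hypothetical helper): group (key, A) rows by key and keep distinct A values."""
    grouped = defaultdict(set)
    for key, a in rows:
        grouped[key].add(a)
    return dict(grouped)


def merge_set_results(left, right):
    """Stage 2 (hypothetical helper): union the per-key sets of two stage-1 results."""
    merged = {key: set(values) for key, values in left.items()}
    for key, values in right.items():
        merged.setdefault(key, set()).update(values)
    return merged


# Made-up rows of (grouping key, value of column A) standing in for DF1 and DF2.
df1_rows = [("g1", 3), ("g1", 4), ("g1", 5), ("g1", 4)]
df2_rows = [("g1", 4), ("g1", 5), ("g1", 6)]

agg1 = collect_set_by_key(df1_rows)  # first aggregation over DF1
agg2 = collect_set_by_key(df2_rows)  # first aggregation over DF2

# The result I want: one set per group holding the distinct values from both.
combined = merge_set_results(agg1, agg2)
print(sorted(combined["g1"]))  # [3, 4, 5, 6]
```

What I'm looking for is the Spark SQL equivalent of merge_set_results, applied to the set columns of the two aggregated data frames.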