Filter an RDD by another RDD in Spark

Question asked by kksethi02 on Feb 23, 2016
Latest reply on Mar 18, 2016 by Hao Zhu
I have 2 RDDs:
Rdd1:(String,CompactBuffer)
(3,CompactBuffer(3, 5, 6, 7, 8, 9))
(4,CompactBuffer(2, 4))
(1,CompactBuffer(1, 4, 5, 7, 8, 9))
(5,CompactBuffer(1, 8))
(2,CompactBuffer(1, 2, 3, 4, 6, 8, 9))
Rdd2:(String,Long)
(1,6)
(2,7)
(3,6)
I want to filter Rdd1 by the keys of Rdd2, i.e. keep only the entries of Rdd1 whose keys also appear in Rdd2. After filtering, the result should have the form:
Rdd3:
(3,CompactBuffer(3, 5, 6, 7, 8, 9))
(1,CompactBuffer(1, 4, 5, 7, 8, 9))
(2,CompactBuffer(1, 2, 3, 4, 6, 8, 9))
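A minimal sketch of two common ways to do this in Scala, not a definitive answer. It assumes Rdd1 is an RDD[(String, Iterable[Int])] (for example, the output of groupByKey) and Rdd2 is an RDD[(String, Long)]. The names FilterByKeys, filterByBroadcastKeys and filterByJoin are made up for illustration. The first variant assumes the keys of Rdd2 fit in driver memory; the second stays distributed and uses a join.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; names are illustrative only.
object FilterByKeys {

  // Variant 1: collect the keys of keysRdd, broadcast them, and filter.
  // Assumption: the key set is small enough to collect on the driver.
  def filterByBroadcastKeys(sc: SparkContext,
                            data: RDD[(String, Iterable[Int])],
                            keysRdd: RDD[(String, Long)]): RDD[(String, Iterable[Int])] = {
    val keys = sc.broadcast(keysRdd.keys.collect().toSet)
    data.filter { case (k, _) => keys.value.contains(k) }
  }

  // Variant 2: inner join on the key, then drop the joined-in value.
  // distinct() guards against duplicate keys in keysRdd, which would
  // otherwise repeat entries of data in the join output.
  def filterByJoin(data: RDD[(String, Iterable[Int])],
                   keysRdd: RDD[(String, Long)]): RDD[(String, Iterable[Int])] = {
    val keyOnly = keysRdd.keys.distinct().map(k => (k, ()))
    data.join(keyOnly).mapValues(_._1)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying the sketch out.
    val conf = new SparkConf().setAppName("FilterByKeys").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Sample data copied from the question.
      val rdd1 = sc.parallelize(Seq(
        ("3", Iterable(3, 5, 6, 7, 8, 9)),
        ("4", Iterable(2, 4)),
        ("1", Iterable(1, 4, 5, 7, 8, 9)),
        ("5", Iterable(1, 8)),
        ("2", Iterable(1, 2, 3, 4, 6, 8, 9))
      ))
      val rdd2 = sc.parallelize(Seq(("1", 6L), ("2", 7L), ("3", 6L)))

      filterByBroadcastKeys(sc, rdd1, rdd2).collect().foreach(println)
      filterByJoin(rdd1, rdd2).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The broadcast variant avoids a shuffle, so it tends to be the cheaper choice when Rdd2 is small. The join variant is the safer default when the size of Rdd2 is unknown. Neither variant guarantees the order of the resulting entries.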