
Spark rank in Scala based on the second and third tuple fields of an RDD

Question asked by madhureddy915 on Sep 30, 2015
Latest reply on Mar 18, 2016 by Hao Zhu
Hi, I would like to assign a rank to each row based on the second and third tuple fields. Sample data is shown below. I would like to add "1" as a fourth field if the row's third field is the maximum for its id. If rows with the same id share the same third field value, the tie is broken by the second field: the row with the maximum second field gets "1". All other rows get "0" as the fourth field. I hope the requirement is clear:

    (32609,878,199)
    (32609,832,199)
    (45470,231,199)
    (42482,1001,299)
    (42482,16,291)
code: val rank = matching.map { case (x1, x2, x3) => (x1, x2, x3, x3.toInt * 100000 + x2.toInt) }.sortBy(-_._4).groupBy(_._1)

Result: rank.take(10).foreach(println)

(32609,CompactBuffer((32609,878,199,19900878), (32609,832,199,19900832)))
(45470,CompactBuffer((45470,231,199,19900231)))
(42482,CompactBuffer((42482,1001,299,29901001), (42482,16,291,29100016)))
Desired output would be:

(32609,878,199,1)
(32609,832,199,0)
(45470,231,199,1)
(42482,1001,299,1)
(42482,16,291,0)
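
One possible way to get from the grouped result to this flat output is sketched below. It is only a sketch under assumptions: the fields are taken to be `Int` (the post does not state their types), `RankSketch`, `Row`, `flagMax`, `sample` and `flagged` are hypothetical names, and plain Scala collections stand in for the RDD so the example runs without Spark. Instead of the composite number, it compares the pair (third field, second field) directly within each id group, which avoids assuming the second field stays below 100000.

```scala
object RankSketch {
  // Assumption: all three fields are Ints; the post does not state the types.
  type Row = (Int, Int, Int)

  // Within one id group, flag the row with the largest (third, second) pair
  // with 1 and every other row with 0. Exact duplicate rows would all get 1.
  def flagMax(rows: Iterable[Row]): Seq[(Int, Int, Int, Int)] = {
    val best = rows.maxBy { case (_, x2, x3) => (x3, x2) }
    rows.toSeq.map { case r @ (x1, x2, x3) =>
      (x1, x2, x3, if (r == best) 1 else 0)
    }
  }

  // Sample data from the post.
  val sample: Seq[Row] = Seq(
    (32609, 878, 199),
    (32609, 832, 199),
    (45470, 231, 199),
    (42482, 1001, 299),
    (42482, 16, 291)
  )

  // Group by id, flag each group, and flatten back to one row per tuple.
  // The order of the groups in the result is not guaranteed.
  val flagged: Seq[(Int, Int, Int, Int)] =
    sample.groupBy(_._1).values.flatMap(flagMax).toSeq

  def main(args: Array[String]): Unit =
    flagged.foreach(println)
}
```

On an RDD the same helper could presumably be applied as `matching.groupBy(_._1).flatMap { case (_, rows) => flagMax(rows) }`, since `groupBy` on an RDD yields an `Iterable` of rows per key. Row order in the result is not guaranteed in either version.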
