
Spark Streaming + MapR Streams - Unfair partition consumption

Question asked by john.humphreys on Oct 6, 2017
Latest reply on Oct 6, 2017 by john.humphreys

I have a 20-partition topic in Kafka and am reading it with Spark Streaming (8 executors, 3 cores each), using the direct stream method of reading.
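For context, here is a minimal sketch of the kind of direct-stream setup described above, written against the standard Spark kafka010 direct-stream API. The topic name, broker address, group id, and batch interval are placeholders I've assumed, not details from the original post.

```scala
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30)) // batch interval is an assumption

    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG        -> "broker:9092",       // placeholder
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG   -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.GROUP_ID_CONFIG                 -> "my-consumer-group", // placeholder
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG        -> "earliest"
    )

    // One direct input DStream over the 20-partition topic; each batch maps
    // the topic's partitions to RDD partitions one-to-one.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-20-partition-topic"), kafkaParams)
    )

    stream.foreachRDD(rdd => println(s"batch record count: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```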

I'm having problems because the first 12 partitions are being read at a faster rate than the last 8, for some reason. As a result, the data in the last 8 is getting stale (well, staler).

Partitions 12-19 are only around 90% caught up to partitions 0-11, but we're talking about billions of messages, so the staleness of data that sits 10% back in a topic partition is pretty significant.

Is this normal? Can I make sure Kafka consumes the partitions more fairly?
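To quantify which partitions are falling behind, the per-batch per-partition offset ranges of the direct stream can be logged via the standard HasOffsetRanges API. This is a hedged sketch continuing from the setup above (the `stream` variable is the one created there); Spark also exposes `spark.streaming.kafka.maxRatePerPartition` as a per-partition rate cap, which bears directly on how evenly partitions are drained per batch.

```scala
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // The direct stream's RDDs carry the exact offset range pulled from each
  // Kafka partition in this batch.
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.sortBy(_.partition).foreach { r =>
    // untilOffset - fromOffset = records consumed from this partition this batch
    println(s"partition=${r.partition} from=${r.fromOffset} until=${r.untilOffset} count=${r.count()}")
  }
}
```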
