I have a 20-partition topic in Kafka and am reading it with Spark Streaming (8 executors, 3 cores each). I'm using the direct-stream method of reading.
The problem is that the first 12 partitions are being read at a faster rate than the last 8, for reasons I can't identify. As a result, data in the last 8 partitions is getting stale (well, staler).
Partitions 12-19 are around 90% caught up to partitions 0-11, but we're talking about billions of messages, so being 10% behind in a topic partition amounts to a significant staleness.
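To put that 10% in perspective, here's the back-of-the-envelope arithmetic (the per-partition message count is an illustrative assumption, not an exact figure from my topic):

```python
# Rough staleness estimate for one of the slow partitions.
# ASSUMPTION: messages_per_partition is illustrative ("billions" overall).
messages_per_partition = 2_000_000_000
caught_up_fraction = 0.90  # partitions 12-19 are ~90% caught up

# Messages the slow partition is behind the fast ones.
lag_messages = round(messages_per_partition * (1 - caught_up_fraction))
print(lag_messages)  # on the order of hundreds of millions of messages
```

So even a seemingly small percentage gap translates into an enormous absolute backlog.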
Is this normal? Is there a way to make sure the Kafka partitions are consumed more evenly?