
Reliable usage of DirectKafkaAPI

Question asked by mahdi62b on Oct 20, 2016
Latest reply on Nov 4, 2016 by aalvarez

I am planning to develop a reliable streaming application based on the direct Kafka API. I will have one producer and one consumer. I wanted to know the best approach to achieve reliability in my consumer. I can think of two solutions:

  1. Increasing the retention time of messages in Kafka
  2. Using write-ahead logs
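
To make the two options concrete, these are the settings I have in mind. This is only a sketch: the retention value is a placeholder I picked for illustration, and as far as I understand the Spark property applies to receiver-based input streams, which is part of what confuses me.

```properties
# Option 1: Kafka broker config (server.properties).
# 336 hours (14 days) is an illustrative placeholder, not a recommendation.
log.retention.hours=336

# Option 2: Spark config (spark-defaults.conf).
# Enables the write-ahead log for receiver-based input streams.
spark.streaming.receiver.writeAheadLog.enable  true
```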

I am a bit confused about the use of write-ahead logs with the direct Kafka API, since there is no receiver. However, the documentation says:

"Exactly-once semantics: The first approach uses Kafka’s high level API to store consumed offsets in Zookeeper. This is traditionally the way to consume data from Kafka. While this approach (in combination with write ahead logs) can ensure zero data loss (i.e. at-least once semantics), there is a small chance some records may get consumed twice under some failures. "

So I wanted to know which approach is best: is it enough to increase the retention time (TTL) of messages in Kafka, or do I also have to enable write-ahead logs?

I guess it would be good practice not to rely on just one of the above, since the backup data (retained messages, checkpoint files) could be lost and recovery could then fail.
