AnsweredAssumed Answered

MapR Streams - Sequential Processing of All Messages

Question asked by john.humphreys on Aug 1, 2017
Latest reply on Aug 11, 2017 by john.humphreys

How can I guarantee that messages in a MapR stream will be written/read sequentially?  I think I have 1 partition in my topic and it still appears my messages are not always in order when read (or somehow they're being written out of order).


Currently, I'm doing the following:

  1. Create a MapR stream with:
    • maprcli stream create -path /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream

  2. Create a topic in the stream with:
    • maprcli stream topic create -path /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream -topic test-topic

  3. Write a pile of strings to the stream by copy-pasting a simple file into this process:
    • /opt/mapr/kafka/kafka-0.9.0/bin/ --topic /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream:test-topic --broker-list
  4. Consume from the topic in Spark Streaming using:
    • KafkaUtils.createDirectStream[String,String](...)


I also write event logs from the Spark Streaming application to another monitoring topic created in the same stream in the same way.  I consume these event logs, for now, with this:

  • /opt/mapr/kafka/kafka-0.9.0/bin/ --topic /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream:monitoring-topic --new-consumer --bootstrap-server


The Problem


For the monitoring topic, I frequently see messages out of order slightly (e.g. task 2's start log will display before task 1's end log).


For the test-topic (the main one for Spark Streaming), messages usually come in order but I have observed multiple instances where they don't as well.


Topic Properties


Here is the output of the topic info command which I believe says it only has one partition (the only displayed partition is partition 0) unless someone can correct me on that.  


-bash-4.1$ maprcli stream topic info -path /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream -topic test-topic

fid minoffsetacrossconsumers mintimestampacrossconsumers maxtimestamp servers logicalsize partitionid mintimestamp physicalsize maxoffset master
9099.32.2103262 0 1969-12-31T07:00:00.000-0500 2017-07-31T05:15:57.244-0400,, 131072 0 2017-07-31T05:15:54.518-0400 73728 707