AnsweredAssumed Answered

MapR Streams - Sequential Processing of All Messages

Question asked by john.humphreys on Aug 1, 2017
Latest reply on Aug 11, 2017 by john.humphreys

How can I guarantee that messages in a MapR stream will be written/read sequentially?  I think I have 1 partition in my topic and it still appears my messages are not always in order when read (or somehow they're being written out of order).

 

Currently, I'm doing the following:

  1. Create a MapR stream with:
    • maprcli stream create -path /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream

  2. Create a topic in the stream with:
    • maprcli stream topic create -path /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream -topic test-topic

  3. Write a pile of strings to the stream by copy-pasting a simple file into this process:
    • /opt/mapr/kafka/kafka-0.9.0/bin/kafka-console-producer.sh --topic /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream:test-topic --broker-list this.will.be.ignored:9092
  4. Consume from the topic in Spark Streaming using:
    • KafkaUtils.createDirectStream[String,String](...)

 

I also write event logs from the Spark Streaming application to another monitoring topic created in the same stream in the same way.  I consume these event logs, for now, with this:

  • /opt/mapr/kafka/kafka-0.9.0/bin/kafka-console-consumer.sh --topic /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream:monitoring-topic --new-consumer --bootstrap-server this.will.be.ignored:9092

 

The Problem

 

For the monitoring topic, I frequently see messages out of order slightly (e.g. task 2's start log will display before task 1's end log).

 

For the test-topic (the main one for Spark Streaming), messages usually come in order but I have observed multiple instances where they don't as well.

 

Topic Properties

 

Here is the output of the topic info command which I believe says it only has one partition (the only displayed partition is partition 0) unless someone can correct me on that.  

 

-bash-4.1$ maprcli stream topic info -path /nmr/eis/sysm/pmp/work/dev/streaming-test/test-stream -topic test-topic


fid minoffsetacrossconsumers mintimestampacrossconsumers maxtimestamp servers logicalsize partitionid mintimestamp physicalsize maxoffset master
9099.32.2103262 0 1969-12-31T07:00:00.000-0500 2017-07-31T05:15:57.244-0400 psclxcpdevsys10.nomura.com:5660, psclxcpdevsys11.nomura.com:5660, psclxcpdevsys12.nomura.com:5660 131072 0 2017-07-31T05:15:54.518-0400 73728 707 psclxcpdevsys11.nomura.com:5660

Outcomes