MapR Streams, like MapR-DB, supports complex replication topologies, including loops, combined streams, and more. As in MapR-DB, MapR Streams replication automatically detects replication loops. However, certain replication patterns can make the system do duplicate work, because the same message may be replicated over multiple paths.
For example, the replication loop A->B->C->A is safe: messages do not keep going around the loop, so they neither waste resources nor get duplicated. On the other hand, the topology that combines A->B->C with A->C consumes extra system resources due to duplicated message traffic, because C receives the same messages from both A and B. C automatically suppresses the duplicates, but it still stores them initially, which consumes network and disk resources. In addition, MapR Streams uses an optimized internal I/O path, which duplicated messages slow down.
Note: Internally, streams in MapR Streams are optimized MapR-DB tables. To prevent I/O amplification, a bucket that is full of messages is not flushed to spills, because the messages are already in sorted order. When duplicates arrive over multiple replication paths, however, the bucket contains duplicated messages. As a result, the system has to copy the bucket file into segments to eliminate the duplicates, which wastes resources.
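When planning a topology, it can help to check in advance whether any cluster would receive the same messages over more than one path. The following is a minimal sketch in Python, not part of MapR Streams or its tooling: the function names `count_simple_paths` and `redundant_pairs` are hypothetical, and the topology is assumed to be given as a plain list of (source, destination) replication links. It counts only paths that never revisit a cluster, on the assumption that loop detection stops messages from going around a loop again.

```python
def count_simple_paths(edges, src, dst):
    """Count the replication paths from src to dst that never revisit a cluster."""
    # Build an adjacency list from the (source, destination) links.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)

    def walk(node, visited):
        if node == dst:
            return 1
        total = 0
        for nxt in graph.get(node, ()):
            # Skip clusters already on this path (assumed loop detection).
            if nxt not in visited:
                total += walk(nxt, visited | {nxt})
        return total

    return walk(src, {src})


def redundant_pairs(edges):
    """Return (source, destination) pairs connected by more than one path."""
    nodes = sorted({n for edge in edges for n in edge})
    return [
        (s, d)
        for s in nodes
        for d in nodes
        if s != d and count_simple_paths(edges, s, d) > 1
    ]


# The loop A->B->C->A: every cluster is reached over exactly one path.
loop = [("A", "B"), ("B", "C"), ("C", "A")]

# A->B->C combined with A->C: C is reached from A over two paths.
fan_in = [("A", "B"), ("B", "C"), ("A", "C")]

print(redundant_pairs(loop))
print(redundant_pairs(fan_in))
```

An empty result suggests that no cluster receives duplicated traffic. Each pair that is reported marks a destination that would have to store and then suppress duplicate messages. The path enumeration is exhaustive, so the sketch is only practical for small topologies.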