Can I send a Kafka message to MapR Streams directly? In other words, can I reuse my Kafka project and redirect it to Streams?
Yes you can, assuming your application is Java. To accomplish this, install mapr-client on your system, make sure the output of `mapr classpath` is on your application's classpath, and configure your application to talk to a topic in a MapR Stream using the /path/to/stream:topic convention.
See this article for a quick start -
MapR 5.1 Documentation
What about if your Kafka producer is hosted outside of the MapR cluster?
There's no broker in the MapR Streams implementation, and there doesn't seem to be anything listening for Kafka messages on a TCP port either. So am I right in thinking that whilst MapR Streams offers an implementation of the Kafka API, it doesn't implement the Kafka 'over the wire' protocol? What's the best way to deal with that?
Take a look at Tug's blog on getting started with Streams here -
Getting Started with MapR Streams | MapR
Basically what you do is install our client library, tell it where your MapR cluster is, and any Kafka application running on your server will talk to our client and messages will get where they need to go. No need to specify broker IPs in your app.
I had a read of the MapR Streams chapter, but can't see which configuration option I'd set to point to the MapR cluster (the bootstrap.servers value is irrelevant/unused because there is no broker).
However, not to worry - we're going to change tack a little. Our Kafka producer would have been a Flume agent, so we'll change the topology to use two Flume agents, with an Avro sink on one connecting to an Avro source on the other, to get our external data into the cluster.
Pointing to the MapR cluster is done at the time you install the MapR client libraries. Typically you would do a yum/apt install of mapr-client, then run configure.sh to point at the CLDBs of your cluster. Once done, all apps that call Kafka APIs will get their messages sent to the cluster, as long as the topic name is a valid stream:topic name like "/mapr/my.cluster.com/users/james/mystream:mytopic". Here, the client will see that my.cluster.com is mapped to a list of CLDBs (set via configure.sh), and send accordingly. If you give a simple Kafka-style topic name like "mytopic", the client will look for bootstrap.servers and fail.
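To make the naming convention concrete, here is a rough Java sketch that splits a full name into its stream path and topic, and builds producer properties without any bootstrap.servers entry. The class and method names (StreamTopicSketch, isStreamTopic, streamPath, topic, producerProps) are made up for illustration, and the check is only my approximation of the convention described above, not the client's actual resolution logic. It uses only the JDK, so it runs without the MapR or Kafka jars.

```java
import java.util.Properties;

public class StreamTopicSketch {

    // Rough check for the /path/to/stream:topic convention (an assumption,
    // not the client's real validation): absolute path, a colon, then a topic.
    public static boolean isStreamTopic(String name) {
        int colon = name.lastIndexOf(':');
        return name.startsWith("/") && colon > 1 && colon < name.length() - 1;
    }

    // Everything before the last colon is the stream path.
    public static String streamPath(String name) {
        return name.substring(0, name.lastIndexOf(':'));
    }

    // Everything after the last colon is the topic within that stream.
    public static String topic(String name) {
        return name.substring(name.lastIndexOf(':') + 1);
    }

    // Producer settings with no bootstrap.servers entry, since the cluster
    // location comes from the client install (configure.sh), not the app.
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        String full = "/mapr/my.cluster.com/users/james/mystream:mytopic";
        System.out.println(isStreamTopic(full));      // true
        System.out.println(streamPath(full));         // /mapr/my.cluster.com/users/james/mystream
        System.out.println(topic(full));              // mytopic
        System.out.println(isStreamTopic("mytopic")); // false
        System.out.println(producerProps().containsKey("bootstrap.servers")); // false
    }
}
```

In a real application you would pass these properties to a KafkaProducer and use the full "/path/to/stream:topic" string as the topic name in each ProducerRecord, with the `mapr classpath` jars on the classpath.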
Flume works too - our packaged version of Flume comes with the Streams client built in. Here are some example configs -