
Divolte-collector with MAPR, Storm, Kafka and Cassandra

Question asked by sarahjohn388 on Dec 1, 2017
Latest reply on Dec 1, 2017 by maprcommunity

Hello All, 

I am not sure if I can get help for this on here, but I thought it was worth a try.

I have a 3-node cluster on AWS running MapR M3, with Storm, Kafka, Divolte Collector and Cassandra installed. I would like to try some of the clickstream examples, but I am running into an issue with the tcp-kafka-consumer example. I am also quite new to Java and distributed processing, so I have some clarification questions.

I am not sure where to post this, because the problem feels specific to Divolte Collector, and I also have gaps in my understanding of the Javadoc concept and of building and running jar files. I figured someone here could point me to some resources or help with some clarifications. The problem: I can't get the JSON string to appear in the console where netcat is listening on a socket for clicks:

Divolte tcp-kafka-consumer example

Everything works until the netcat part (step 7), and my knowledge gap is with step 6.

Step 1: install and configure Divolte Collector


The install works, and the hello-world click collection looks promising.

Step 2: download, unpack and run Kafka

# In one terminal session
cd kafka_2.10-
./ ../config/

# Leave Zookeeper running and in another terminal session, do:
cd kafka_2.10-
./ ../config/

No errors, and I tested the Kafka examples, so this seems to be working as well.

Step 3: start Divolte Collector

Go into the bin directory of your installation and run:

    cd divolte-collector-0.2/bin

Step 3 went without a hitch; I can load the default Divolte Collector test page.

Step 4: host your Javadoc files

Set up an HTTP server that serves the Javadoc files that you generated or downloaded for the examples. If you have Python installed, you can use this:

    cd <your-javadoc-directory>
    python -m SimpleHTTPServer

OK, so I can reach the Javadoc pages.

Step 5: listen on TCP port 1234

    nc -kl 1234

Note: when using netcat (nc) as TCP server, make sure that you configure the Kafka consumer to use only 1 thread, because nc won't handle multiple incoming connections.

I tested netcat by opening the port and sending messages to it, so I figure I don't have any port issues on AWS.
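
Since the note in step 5 warns that nc won't handle multiple incoming connections, a small listener that accepts several connections at once could stand in for netcat while debugging. The following is only a minimal sketch in Python 3. It assumes the consumer sends newline-terminated text over TCP, which the example's instructions do not confirm, and the names `PrintingHandler`, `ListenerServer`, `start_listener` and `main` are made up for illustration.

```python
import socketserver
import sys
import threading
import time


class PrintingHandler(socketserver.StreamRequestHandler):
    """Print every line received on one connection."""

    def handle(self):
        # self.rfile yields one bytes object per received line.
        for raw in self.rfile:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            self.server.received.append(line)
            print(line, flush=True)


class ListenerServer(socketserver.ThreadingTCPServer):
    """TCP server that handles each connection in its own thread."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, PrintingHandler)
        self.received = []  # lines seen so far, kept for inspection


def start_listener(host="0.0.0.0", port=1234):
    """Start the listener in a background thread and return the server."""
    server = ListenerServer((host, port))
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def main():
    """Listen on port 1234 (the port from step 5) until Ctrl-C."""
    server = start_listener()
    print("listening on port", server.server_address[1], file=sys.stderr)
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        server.shutdown()
        server.server_close()
```

Calling `main()` would take the place of `nc -kl 1234`. If lines show up here but not in netcat, the consumer's thread count would be the first thing to look at.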

Step 6: run the example

    cd divolte-examples/tcp-kafka-consumer
    mvn clean package
    java -jar target/tcp-kafka-consumer-*-jar-with-dependencies.jar

Note: for this to work, you need to have the avro-schema project installed into your local Maven repository.

I installed avro-schema by running mvn clean install in the avro project that comes with the examples, as per the instructions here.

Step 7: click around and check that you see events being flushed to the console where you run netcat

When you click around the Javadoc pages, your console should show events in JSON format similar to this:

I don't see the clicks in my netcat window :( Investigating the issue, I looked at the Console and Network tabs in Chrome Developer Tools, and it seems Divolte is running, but I am not sure how to dig further. This is the console view. Any ideas or pointers?
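
One possible way to narrow this down is to confirm that every service in the chain is at least accepting TCP connections. The sketch below is Python 3 and rests on several assumptions: that everything runs on one host, and that the stock default ports are in use (8290 for Divolte Collector, 2181 for ZooKeeper, 9092 for Kafka), plus port 1234 from step 5. A MapR cluster may well use different ports, so the values would need adjusting. `port_is_open` and `check_pipeline` are hypothetical helper names.

```python
import socket

# Assumed default ports; change these to match the actual configuration.
DEFAULT_PORTS = {
    "divolte-collector": 8290,
    "zookeeper": 2181,
    "kafka": 9092,
    "netcat-listener": 1234,
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_pipeline(host="localhost", ports=None):
    """Map each service name to whether its port accepts connections."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_pipeline().items():
        status = "open" if reachable else "NOT reachable"
        print(f"{name}: {status}")
```

An open port only shows that something is listening. It does not show that events are actually flowing through Kafka, so this would be a first check rather than a full diagnosis.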


Thanks anyway.