Data Systems that Integrate with MapR-ES via Kafka Connect

Blog Post created by slimbaltagi on May 22, 2017

As of today, May 22, 2017, there are over 70 Kafka Connect connectors for streaming data into and out of Apache Kafka!

The connectors themselves, for the different applications and data systems, are not maintained in the Apache Kafka main code base. An easy way to discover Kafka Connect resources, including connectors, is to search GitHub for 'kafka-connect' or to open this URL directly: https://github.com/search?q=kafka-connect.

 

Kafka Connect is included in MapR Streams; please see Kafka Connect for MapR Streams. Now, through simple configuration and with no code necessary, we can leverage these Kafka Connect connectors for large-scale streaming of data into and out of Kafka/MapR Streams for a variety of data systems!
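
For example, running a source connector with the standalone Kafka Connect worker takes nothing more than two small properties files. The sketch below is a minimal example using the FileStreamSource connector that ships with Apache Kafka; the input file, the connector name, and the MapR Streams topic /sample-stream:topic1 are placeholders, and the worker settings shown are those of the Apache Kafka distribution (they may differ slightly in Kafka Connect for MapR Streams).

    # worker configuration (connect-standalone.properties)
    bootstrap.servers=localhost:9092
    key.converter=org.apache.kafka.connect.storage.StringConverter
    value.converter=org.apache.kafka.connect.storage.StringConverter
    offset.storage.file.filename=/tmp/connect.offsets

    # connector configuration (file-source.properties)
    name=sample-file-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    file=/tmp/input.txt
    topic=/sample-stream:topic1

    # run the standalone worker with both files
    bin/connect-standalone.sh connect-standalone.properties file-source.properties

Each new line appended to /tmp/input.txt is then published as a message to the topic.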

I categorized the available Kafka Connect connectors into several categories, specifying each connector's type as either source, for getting data from another data system into Apache Kafka, or sink, for getting data from Kafka into another data system (a sample sink configuration follows the list):

  • Change Data Capture: Attunity Replicate (Source), Dbvisit Replicate Connector for Oracle (Source), Oracle GoldenGate (Source), IBM Data Replication (Source), Debezium [MySQL, PostgreSQL, MongoDB] (Source)
  • Databases: JDBC (Source, Sink), MySQL, Blockchain, Edge Intelligence, InfluxDB (Sink), KineticaDB (Sink), KLP-PostgreSQL (Sink) from InfoBright, SAP HANA (Source, Sink), Vertica (Source, Sink), VoltDB (Sink), RethinkDB (Sink), OpenTSDB (Sink)
  • NoSQL: Azure DocumentDB (Sink), Aerospike (Sink), Cassandra (Source, Sink), Couchbase (Source), Druid (Sink), DynamoDB (Source, Sink), HBase (Source, Sink), MongoDB (Source, Sink), Redis (Sink), MarkLogic (Sink)
  • File Systems: FTP (Source), HTTP (Source), File (Source, Sink), FileSystem (Source), HDFS (Sink), Apache Kudu (Sink), spooldir (Source)
  • Log: Splunk (Source, Sink), Syslog (Source)
  • Search: Elasticsearch (Sink), Solr (Source, Sink)
  • Object Stores: Amazon S3, Google Cloud Storage, Azure Blob Store (on the roadmap)
  • Mainframe: Syncsort DMX (Source, Sink)
  • IoT: Azure IoT Hub (Source), CoAP [Constrained Application Protocol] (Source, Sink), MQTT (Source), Flogo (Source)
  • Data Warehouse: BigQuery (Sink), Hive (Sink)
  • In-Memory Databases (IMDB): Apache Ignite (Source, Sink), Hazelcast (Sink)
  • Messaging: AMQP, Google Pub/Sub (Source, Sink), JMS (Source, Sink), Amazon SQS (Source), MQTT (Source), Slack via webhooks (Sink), RabbitMQ, AWS Kinesis
  • Application Feeds: Bloomberg Feeds (Source), Jenkins (Source), Salesforce (Source), IRC (Internet Relay Chat) (Source), PubNub, Mobile Apps, Twitter (Source, Sink), Yahoo Finance (Source), GitHub (Source)
  • Analytics: Mixpanel (Source)
  • JMX: JMX (Source)
  • Content Extraction: DocumentSource
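
As an illustration of the sink direction, here is a minimal sketch of an Elasticsearch sink configuration, assuming Confluent's Elasticsearch connector is installed; the connector name, topic, and Elasticsearch URL are placeholders.

    name=sample-elasticsearch-sink
    connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    tasks.max=1
    topics=/sample-stream:topic1
    connection.url=http://localhost:9200
    type.name=kafka-connect
    key.ignore=true

In principle, the same configuration style applies whether the topic lives in Apache Kafka or in a MapR stream referenced by its /stream:topic path.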

 

A few examples of use cases for Kafka Connect connectors (a sample source configuration follows the list) would be:

  • Publishing SQL Tables (or an entire SQL database) into Apache Kafka
  • Consuming streams from Apache Kafka into HDFS for batch processing
  • Consuming streams from Apache Kafka into Elasticsearch for secondary indexing
  • Integrating legacy systems such as mainframe ones with Apache Kafka
  • … 
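
For the first use case above, a JDBC source connector can publish a table into a topic by polling it incrementally. The sketch below assumes Confluent's JDBC connector, a hypothetical MySQL database sampledb with an orders table whose id column auto-increments, and a placeholder topic prefix:

    name=sample-jdbc-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    tasks.max=1
    connection.url=jdbc:mysql://localhost:3306/sampledb?user=connect&password=secret
    table.whitelist=orders
    mode=incrementing
    incrementing.column.name=id
    topic.prefix=/sample-stream:

Rows from the orders table would then land in the topic /sample-stream:orders, from which an HDFS sink or an Elasticsearch sink connector could pick them up for batch processing or secondary indexing.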

 

Please share your experience using Kafka Connect connectors with MapR Streams in the comments section!

 

Thanks 

Slim Baltagi

Advanced Analytics LLC
