
Building Kafka Connect connectors between data systems and Kafka

Blog Post created by slimbaltagi on May 29, 2017

Kafka Connect is a framework for streaming data between Apache Kafka and other data systems. It has been included with Apache Kafka since the 0.9 release on November 24th, 2015, and also ships with MapR Streams. With Kafka Connect, you can use many pre-built connectors without writing any code. The connectors themselves, for different applications and data systems, are not maintained within the Apache Kafka main code base. For a list of Kafka Connect connectors and categories, please check my blog Data Systems that Integrate with MapR-Streams via Kafka Connect.

If there isn’t a pre-built Kafka Connect connector already available for you to import/export streaming data between your data system and Apache Kafka, you can build your own custom Kafka Connect connector by:

1.   Focusing on the few domain-specific copying details, such as:

  • Moving data from your source into Kafka, or from Kafka into your destination.
  • Supporting the relevant delivery semantics (at most once, at least once, exactly once).

2.   Leveraging the Kafka Connect framework for the common requirements it abstracts away for you, such as:

  • Data conversion (Serialization / de-serialization)
  • Parallelism / scaling
  • Load balancing
  • Fault tolerance / failover
  • Partitioning / scale-out
  • Schema Registry integration
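To make the division of labor above concrete, here is a minimal, self-contained Java sketch of the at-least-once pattern a source task typically follows: read from a resumable position, emit records from that position, and let the stored offset drive recovery after a restart. The class and method names below are illustrative placeholders, not the real org.apache.kafka.connect API, so the sketch runs without any Connect dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a source task's copy loop (not the real Connect API).
// The "offset" is the position in the source; persisting it after delivery
// gives at-least-once semantics: a crash may re-deliver, but never skips data.
public class OffsetTrackingReader {
    private final List<String> source;  // stands in for an external data system
    private long offset;                // last committed read position

    public OffsetTrackingReader(List<String> source, long committedOffset) {
        this.source = source;
        this.offset = committedOffset;
    }

    // Analogous in spirit to SourceTask.poll(): return the next batch
    // and advance the offset.
    public List<String> poll(int maxBatch) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxBatch && offset < source.size()) {
            batch.add(source.get((int) offset));
            offset++;
        }
        return batch;
    }

    public long currentOffset() { return offset; }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e");

        // First run: read two records, then "crash" after committing offset 2.
        OffsetTrackingReader run1 = new OffsetTrackingReader(data, 0);
        System.out.println(run1.poll(2));          // [a, b]
        long committed = run1.currentOffset();     // 2

        // Restart from the committed offset: no records are lost.
        OffsetTrackingReader run2 = new OffsetTrackingReader(data, committed);
        System.out.println(run2.poll(10));         // [c, d, e]
    }
}
```

In the real framework, the offset map you attach to each record is persisted by the Connect worker, so your task only has to read it back on startup.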

Now, where and how to start? You might begin by reading the related documentation, such as the ‘Connector Developer Guide’, and learning the Kafka Connect API from the related Java docs. Nevertheless, seeing working code is the best way to start learning how to build your own Kafka Connect connector! In addition to the developer guides, the Javadoc API, and the source code of available connectors, there is also a Maven archetype to generate a skeleton connector implementation, as well as utilities and common components. Because these Kafka Connect resources are available piecemeal from the Apache Software Foundation and from vendors, I decided to list them in this article for the benefit of current and potential developers of Kafka Connect connectors.
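Once you have a connector class built, the quickest way to exercise it is with a standalone Connect worker. A minimal configuration might look like the following; the file names and the connector class are placeholders for your own build, while the property keys themselves are standard Kafka Connect settings:

```properties
# connect-standalone-worker.properties -- worker settings
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets

# my-source-connector.properties -- connector settings
# (com.example.MySourceConnector is a hypothetical class name)
name=my-source-connector
connector.class=com.example.MySourceConnector
tasks.max=2
topic=my-topic
```

The two files are then passed together to the standalone worker, e.g. `bin/connect-standalone.sh connect-standalone-worker.properties my-source-connector.properties`.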

RESOURCES TO BUILD KAFKA CONNECT CONNECTORS:

  1. END-TO-END WALKTHROUGH: Kafka Connect Twitter design and code
  2. SOURCE CODE AT GITHUB OF MANY CONNECTORS: Learn from the source code of Kafka Connect connectors hosted on GitHub. Example: Kafka Connect Twitter
  3. API: The Connect API allows implementing connectors that continually pull data from some source data system into Kafka, or push data from Kafka into some sink data system. Javadoc, SourceConnector class, SinkConnector class
  4. MAVEN ARCHETYPE: This Maven archetype generates a skeleton Kafka Connect connector project.
  5. UTILITIES: Because some patterns for splitting work among tasks are so common, utilities are provided to simplify these cases: ConnectorUtils, Kafka Connect Utils library at GitHub
  6. COMMONS: Kafka Connect Common
  7. VIDEOS: Kafka Connect Overview, Kafka Connect Sources and Sinks, Developing Connectors in the Kafka Connect Framework
  8. DEVELOPER GUIDES: Connector Developer Guide - Apache Software Foundation, Connector Developer Guide - Confluent, Partner Development Guide for Kafka Connect
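As an example of what the utilities in item 5 handle for you, consider the common chore of dividing N source "partitions" (tables, files, shards, ...) evenly among at most maxTasks tasks. The sketch below is a self-contained, simplified reimplementation of that grouping idea for illustration; in a real connector you would call the framework's ConnectorUtils helper instead.

```java
import java.util.ArrayList;
import java.util.List;

// Evenly divides a list of source "partitions" into numGroups contiguous
// groups, mirroring the idea behind ConnectorUtils in the Kafka Connect API.
// A connector would call this from taskConfigs() to assign work to tasks.
public class PartitionGrouper {
    public static <T> List<List<T>> groupPartitions(List<T> elements, int numGroups) {
        if (numGroups <= 0) throw new IllegalArgumentException("numGroups must be positive");
        List<List<T>> result = new ArrayList<>();
        int n = elements.size();
        int base = n / numGroups;   // minimum size of each group
        int extra = n % numGroups;  // first 'extra' groups get one more element
        int idx = 0;
        for (int g = 0; g < numGroups; g++) {
            int size = base + (g < extra ? 1 : 0);
            result.add(new ArrayList<>(elements.subList(idx, idx + size)));
            idx += size;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tables = List.of("t1", "t2", "t3", "t4", "t5");
        System.out.println(groupPartitions(tables, 2)); // [[t1, t2, t3], [t4, t5]]
    }
}
```

Each inner list would then become the configuration for one task, so adding tasks scales the copy work out automatically.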

Good luck building your custom Kafka Connect connector if there isn't already a pre-built one available for your data system. Your comments on this article are much appreciated. Thank you!
