Intro to Big Data AppHub: S3 to HDFS Sync App & HDFS to Kafka Sync App Templates - 3/8/17 - Remote Attendees welcome!

Document created by Patrick Moran on Feb 28, 2017
Version 1


Date: Wednesday, March 8, 2017
Time: 10:00am - 12:00pm PST
Registration Link:



To make critical business decisions in real time, many businesses today rely on a variety of data, which arrives in large volumes. Variety and volume together make big data applications complex operations. Big data applications require businesses to combine transactional data with structured, semi-structured, and unstructured data for deep and holistic insights.


And, time is of the essence: to derive the most valuable insights and drive key decisions, large amounts of data have to be continuously ingested into Hadoop data lakes as well as other destinations. As a result, data ingestion poses the first challenge for businesses, which must be overcome before embarking on data analysis.


With its various Application Templates for ingestion, DataTorrent allows users to: 

Ingest vast amounts of data with enterprise-grade operability and performance guarantees provided by its underlying Apache Apex framework. Those guarantees include fault tolerance, linear scalability, high throughput, low latency, and end-to-end exactly-once processing. 

Quickly launch template applications to ingest raw data, while also providing an easy, iterative way to add business logic and processing stages such as parse, dedupe, filter, transform, and enrich to ingestion pipelines. Visualize metrics on throughput, latency, and application data in real time throughout execution.
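The processing stages named above can be pictured as simple functions chained over an ingestion stream. The sketch below is purely illustrative: the record shape, field names, and function names are made up for this example and are not DataTorrent or Apache Apex APIs.

```python
# Illustrative sketch of parse, dedupe, filter, and enrich stages chained
# over an ingestion stream. All names and record shapes are hypothetical.
import json

def parse(line):
    """Parse a raw JSON line into a record; return None if malformed."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None

def dedupe(records):
    """Drop records whose 'id' has already been seen."""
    seen = set()
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            yield rec

def transform(rec):
    """Example enrichment: add a derived field."""
    rec["amount_usd"] = rec["amount_cents"] / 100.0
    return rec

raw_lines = [
    '{"id": 1, "amount_cents": 250}',
    'not json',                        # dropped by parse
    '{"id": 1, "amount_cents": 250}',  # dropped by dedupe
    '{"id": 2, "amount_cents": 999}',
]

parsed = (r for r in map(parse, raw_lines) if r is not None)  # parse + filter
result = [transform(r) for r in dedupe(parsed)]               # dedupe + enrich
print(result)
```

In the actual templates, each stage would be an operator in the pipeline's directed acyclic graph rather than a plain function, which is what lets Apex scale and checkpoint each stage independently.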


Template descriptions:

S3 to HDFS Sync: The S3 to HDFS Sync Application Template continuously ingests files as blocks from the configured Amazon S3 location to the destination path in HDFS, while retaining one-to-one file traceability.
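A minimal sketch of the block-wise copy idea: each source file is read in fixed-size blocks and reassembled under the same relative path at the destination, so the one-to-one file mapping is preserved. Local directories stand in for the S3 bucket and the HDFS destination here; the block size and paths are illustrative only, not the template's actual configuration.

```python
# Sketch only: block-wise file sync with one-to-one file traceability.
# Local dirs stand in for S3 and HDFS; BLOCK_SIZE is tiny for demonstration.
import os
import tempfile

BLOCK_SIZE = 4  # tiny block size so the example produces multiple blocks

def sync_file(src_path, dst_path):
    """Copy src to dst in fixed-size blocks; return the number of blocks."""
    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
    blocks = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            dst.write(block)  # the real template parallelizes block transfer
            blocks += 1
    return blocks

# Demo: "bucket/logs/a.txt" maps to the same relative path under "hdfs/".
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "bucket", "logs", "a.txt")
    dst = os.path.join(root, "hdfs", "logs", "a.txt")
    os.makedirs(os.path.dirname(src))
    with open(src, "wb") as f:
        f.write(b"hello world")
    n = sync_file(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()
    print(n)  # 3 blocks of <=4 bytes for an 11-byte file
```

Splitting files into blocks is what lets the template read and write large files in parallel while still being able to reassemble each file exactly at the destination.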

HDFS to Kafka: The HDFS to Kafka Application Template continuously reads lines from the configured Hadoop HDFS file(s) and writes each line as a message to the configured Apache Kafka topic.
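The core behavior of that template can be sketched in a few lines: read a file line by line and emit each line as one message on a topic. In this sketch a plain in-memory class stands in for the Kafka producer and a string buffer stands in for the HDFS file; the topic name is made up for the example.

```python
# Sketch only: one line in -> one Kafka message out.
# FakeProducer is a stand-in; a real pipeline would use a Kafka producer.
import io

class FakeProducer:
    """Collects (topic, value) pairs instead of sending to a broker."""
    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))

def publish_lines(fileobj, producer, topic):
    """Publish each line of fileobj as one message; return the count."""
    count = 0
    for line in fileobj:
        producer.send(topic, line.rstrip("\n").encode("utf-8"))
        count += 1
    return count

producer = FakeProducer()
data = io.StringIO("event one\nevent two\nevent three\n")
n = publish_lines(data, producer, "ingest-topic")
print(n)  # 3 messages, one per line
```

The template adds what this sketch omits: continuous monitoring of the HDFS directory for new files, fault-tolerant offsets into each file, and exactly-once delivery into Kafka.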


Speakers:

Ashwin Putta, Product Manager at DataTorrent, Committer for Apache Apex

Dr. Munagala V. Ramanath ("Ram"), Software Engineer at DataTorrent, Committer for Apache Apex

Sanjay Pujare, Engineer at DataTorrent



Agenda:

10:00 AM – 10:30 AM – Introduction to AppHub, AppHub principles, TCO, time to market - Ashwin

10:30 AM – 11:00 AM – S3 to HDFS Sync - Ram

11:00 AM – 11:30 AM – HDFS to Kafka - Sanjay

11:30 AM – 12:00 PM – Upcoming templates, cloud, roadmap - Ashwin


Please RSVP for the event and follow the instructions in the event description.



For deeper engagement with Apache Apex: download Apex, view past meetup webinars and slides, and read the docs.

To reduce time to market, look at operable app-templates that you can quickly import and launch. 

Examples: HDFS-Sync, Kafka-HDFS, HDFS-Line-Copy, S3-HDFS, and HDFS-Kafka.

Free DataTorrent Enterprise Edition for qualifying startups. Check it out!

Brought to you by DataTorrent, creators of Apache Apex.