MapR-specific tutorials coming soon!
- Log Shipping to Elasticsearch - Read web log files from a local filesystem directory, enrich some of the fields (e.g. with a GeoIP lookup), and write the records to Elasticsearch.
- Simple Kafka Enablement using StreamSets Data Collector
- Ingesting Local Data into Azure Data Lake Store - Read records from a local CSV-formatted file, mask PII (credit card numbers), and write the records to a JSON-formatted file in Azure Data Lake Store.
Writing Custom Pipeline Stages
- Creating a Custom StreamSets Origin - Build a simple custom origin that reads a Git repository's commit log and produces the corresponding records.
- Creating a Custom Multithreaded StreamSets Origin - A more advanced tutorial focusing on building an origin that supports parallel execution, so the pipeline can run in multiple threads.
- Creating a Custom StreamSets Processor - Build a simple custom processor that reads metadata tags from image files and writes them to the records as fields.
- Creating a Custom StreamSets Destination - Build a simple custom destination that writes batches of records to a webhook.
- Ingesting Drifting Data into Hive and Impala - Build a pipeline that handles schema changes in MySQL, creating and altering Hive tables accordingly.
- Creating a StreamSets Spark Transformer in Java - Build a simple Java Spark Transformer that computes a credit card's issuing network from its number.
- Creating a StreamSets Spark Transformer in Scala - Build a simple Scala Spark Transformer that computes a credit card's issuing network from its number.
The Data Collector documentation also includes an extended tutorial that walks through basic Data Collector functionality, including creating, previewing and running a pipeline, and creating alerts.