
Ingesting from Spark (Batch) to OpenTSDB

Question asked by john.humphreys on May 25, 2017
Latest reply on May 31, 2017 by john.humphreys

We're in the process of migrating a legacy batch-based system (Spark) to a streaming system (also Spark).

  • The old system ingests very large data files into a columnar database using Sqoop.
  • The new system should ingest the same data into OpenTSDB running on MapR-DB.

We need to shut down our legacy database and replace it with OpenTSDB as our first step (for business reasons).


How can I ingest from Spark (batch-oriented) into OpenTSDB?

  • Sqoop doesn't support OpenTSDB.
  • If I use OpenTSDB's REST API from a Spark batch job, retried or speculative tasks could write the same values multiple times from different executors (I don't see a way to prevent this).
  • Anything else I can think of (e.g., invoking an external app to ingest the results) would be failure-prone, and it would be hard to guarantee it ran to completion or was rescheduled on failure.
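On the duplicate-write concern: OpenTSDB stores one value per (metric, timestamp, tags) tuple, so a retried task re-posting the same datapoints generally overwrites rather than double-counts them, which may make the REST approach safer than it first appears. A minimal sketch of batching rows into `/api/put` payloads from within a Spark partition follows; the metric name, tags, and OpenTSDB host are made-up placeholders, not anything from the original system:

```python
import json

def to_opentsdb_points(rows, metric, tags):
    """Convert (timestamp, value) rows into OpenTSDB /api/put JSON datapoints.

    `metric` and `tags` are illustrative; real names depend on your schema.
    """
    return [
        {"metric": metric, "timestamp": int(ts), "value": float(v), "tags": tags}
        for ts, v in rows
    ]

# Inside a Spark batch job, one might then do (sketch, not executed here):
#
#   df.rdd.foreachPartition(lambda part: requests.post(
#       "http://opentsdb-host:4242/api/put",          # hypothetical host
#       data=json.dumps(to_opentsdb_points(part, "cpu.load", {"host": "web01"})),
#       headers={"Content-Type": "application/json"}))

points = to_opentsdb_points([(1495728000, 0.5)], "cpu.load", {"host": "web01"})
print(json.dumps(points))
```

Because identical datapoints overwrite each other, a failed-and-retried partition would simply re-send the same payload; the main thing to avoid is non-deterministic values (e.g., wall-clock timestamps generated inside the executor).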