DataOps: An Agile Method for Data-Driven Organizations

File uploaded by slimbaltagi on Apr 3, 2018

Slides from a presentation by Ellen Friedman at the 2018 Strata Data Conference in San Jose on March 7, 2018.

 

"No longer a new idea, big data is fast becoming a core competency for many organizations. According to surveys by NewVantage Partners, as of 2016, 62% of F1000 firms and industry leaders report they have at least one big data application in production, double the number that reported the same in 2013. By 2017, over 80% say their big data investments are successful. But the 2017 report goes on to highlight the major challenge now: dealing with the difficulty of organizational and cultural change around big data.

Another challenge involves the practical logistics of data and application management that are necessary to deliver value in real world settings. Data science and machine learning techniques are playing an increasingly important role in driving value for big data projects. However, as data science and machine learning start to move from R&D to production, organizations are finding unexpected challenges. For instance, with machine learning, it turns out that selecting models and tuning parameters is the easy part; much harder are the logistical aspects—that is, the work involved with curating training datasets, versioning datasets, training models, benchmarking models, deploying them to production, and improving them iteratively. Overcoming these logistical challenges becomes critical for an organization’s ability to derive value from data-intensive applications.

DataOps is an emerging practice that helps with these challenges. At its core is cross-skill communication among data scientists, data engineers, application developers, and operations staff, with a better focus on a shared, data-driven goal. This collaboration fosters an Agile process for flexibility and fast time to value. A successful DataOps practice is also a good fit for emerging approaches designed to deal with logistical aspects of data-intensive applications.

Ellen Friedman offers an overview of DataOps and explains how to implement it.

Topics include:

  • What DataOps is and why it improves focus and flexibility
  • The steps needed to build a DataOps approach
  • Why this style of work makes it more likely to stay on time and on focus
  • The connection between DataOps and microservices
  • How DataOps provides a good fit for use of a global data fabric"

Outcomes