MapR’s Ted Dunning says we’ll make better sense of bigger data in 2018

Document created by slimbaltagi on Dec 7, 2017

The article below was published by Virginia Backaitis at Digitizing Polaris. I am copying it here for your convenience.


If you haven’t met Ted Dunning, look for him at the next big data, analytics, machine learning or Apache Software Foundation conference you go to. He is like one of those passionate, generous professors you had in college who can begin a conversation at whatever level you need to start it, whether you’re an expert or have just learned that advances in machine learning are changing everything. You’ll come away from the dialogue not only enlightened, but also feeling that the world is a bit bigger and more filled with wonder than when the conversation began.


He offered Digitizing Polaris seven predictions for big data in 2018:


1. Machine Learning Will Go from “In Vogue” to “In Production”

Increasingly, machine learning will be seen as a normal part of business rather than as something unusual, especially as more enterprises begin to reap real business value from machine learning systems. AI will continue to get a lot of buzz, but it will be a much broader set of machine learning approaches that deliver valuable insights across many enterprises in different sectors.


Additionally, people are likely to see that the most successful systems occur where people focus more on the problem than on the tool. They will recognize how important it is to frame the question correctly, have realistic goals, have access to appropriate data at scale, and have a realistic plan to convert machine learning results into action.


2. Organizations Will Recognize that 90% of Machine Learning Success is in the Logistics (rather than the algorithm or the model)


It may sound less exciting or cool, but being able to manage data effectively is essential to running successful machine learning systems in the real world. This is true for the complete life cycle, from managing input data to developing machine learning models to maintaining them in production. The good news is that with effective architecture and good planning, much of this can be handled at the platform level rather than the application level, and that cuts across many systems handled by different machine learning tools. In other words, you don’t have to come up with a new plan for logistics with every different project.


Because we think people will increasingly recognize the need for efficient machine learning logistics, we also think there will be a trend toward stream-based architectures and a global data fabric as part of their overall organization.


3. Rapid Kubernetes Adoption Forms the Foundation for Multi-cloud Deployments


We predict runaway success of Kubernetes, but it is running away with the prize of adoption so fast that this may quickly be more of an observation than a prediction in 2018.


So far, however, almost everybody is thinking of Kubernetes as a way of organizing and orchestrating computation in a cloud. Over the next year, we expect Kubernetes to increasingly become the way that leading-edge companies organize and orchestrate computation across multiple clouds, both public and private. On-premises computation is moving to a container-and-orchestration style at light speed, but the real revolution will come when you can interchangeably schedule work anywhere that it makes sense to do so.


But … this only talks about the computation. What about the data? Well, there are two specific predictions related to that (4 and 5).


4. Big Data Systems Will Become the Center of Gravity (and Building a Global Data Fabric is One Key Way to Do That)


In the past, big data and the projects built around it have been isolated, in many cases special projects or experiments that at best complemented traditional systems. Now, big data is becoming an essential asset and enterprises are transforming into data-driven concerns. This transformation naturally leads to big data systems becoming the center of gravity for enterprises, in terms of data size, storage and access as well as operations and analytics.


As a result, more businesses will be looking for ways to build a global data fabric that breaks down silos to give comprehensive access to data from many sources and to computation for truly multi-tenant systems.


5. Leading Organizations Knit Data Flows into a Data Fabric


This coming year, we will see more and more businesses treat computation in terms of data flows rather than data that is just processed and landed in a database. These data flows capture key business events and mirror business structure. A unified data fabric will be the foundation for building these large-scale flow-based systems. Such a fabric will necessarily support multiple kinds of computation that are appropriate in different contexts. More and more, databases will become the natural partner and complement of a dataflow. The emerging trend is to have a data fabric that provides the data in motion and data at rest needed for multi-cloud computation orchestrated by tools like Kubernetes.


6. DataOps Emerges as Key Organizational Approach to Drive Agility


We have lately seen the beginning of a trend toward embedding data scientists and data-focused developers into otherwise traditional DevOps teams to form what we call a DataOps team. This approach involves much better communication, better focus and goal orientation by cross-skilled teams, and results (importantly) in faster time to value and better agility. Organizing work in a DataOps style gives an enterprise better ability to respond to changing conditions in a timely and appropriate way — it provides the flexibility and efficiency at the human level needed to take advantage of new technologies and architectures.


For example, as machine learning becomes mainstream (see item 1), switching to DataOps teams becomes very natural, and we expect this to become very popular this year. This will let some companies pull away from the pack, but it can be incredibly hard for core IT to keep up with the resulting demands. Security teams will also be hard-pressed.


7. Processing Extends to the IoT Edge


In this upcoming year, we aren’t just going to see data fabrics and computation that span on-premises facilities into multiple clouds. We are also going to see full-scale data fabric extend right to the edge next to devices, and, in some cases, we will see threads of the fabric extend right into the devices themselves.