What are some of the major criteria that would qualify a project for a Hadoop implementation?
E.g., high data volume, better analytics, etc.
Sujith Kumar - It was so nice to meet you today at Big Data Everywhere Chicago on April 28th, 2016,
and I'm pleased to see you in the community. This question is a great one for members of the community to answer. Let's crowdsource this. I'll highlight this in the next community roundup. Can you elaborate a bit more on your question to make sure you get valuable answers?
I think George Demarest might be able to help share some use case information with you and Chase Hooley might be able to point out some relevant Converge Blog posts as well.
Jim Scott John Omernik Robert Novak -- It would be great to hear about the use cases you would share with business and IT leaders who may not be familiar with Hadoop/MapR, to help them understand the value.
So many answers for this question...
At the most basic level, if you have more data than you can store and/or process on a single computer, it's worth considering Hadoop. As the volume of data, the variety of users, and the demands on the data increase, Hadoop looks even more pertinent. And as you start to want more rapid iterations across the data (refining or replacing the questions you're asking or the methods you're using to process it), it gets even more interesting.
Big data means different things to different people, so there's no minimum or maximum size of data at which new-world data methods apply. If you have 50GB of really, really important data that you need to slice, dice, chop, and refry constantly, that qualifies just as much as 50TB that you're running hourly analytics against.
As the Community Manager mentioned, we could definitely use more insight into your interests to provide a more precise answer. But in short, if it's more than you can handle on your laptop, give Hadoop a try.
Here are some resources that may be helpful. There are many more resources and I'll let the experts in the community take it from here.
Jack Norris shares examples of companies using high-frequency decisioning applications to make small, automated adjustments to:
In this Whiteboard Walkthrough, Jim Scott , Director of Enterprise Strategy and Architecture at MapR, discusses a business use case that leverages the power of MapR Streams.
MapR Named One of the Top 10 Banking Analytics Solution Providers for 2016 by Banking CIO Outlook Magazine | MapR
by Sean O'Dowd
MapR provides a converged platform that delivers enterprise-grade security, reliability, and real-time performance while dramatically lowering both hardware and operational costs, but we also power the digital banking industry in much broader ways by providing the following:
Whitepaper: Addressing Fraud and Privacy Issues in Hadoop for the Financial Services Industry
MapR use cases, along with other topics, are covered.
There are a couple of ebooks available that will also give you some great use case examples.
Hope you find these useful.
I have a very simplistic view on this: since I have spent most of my working years in enterprise data warehousing, I tend to use EDW as a comparison base.
Furthermore, I use "Hadoop" in broad terms, i.e., Big Data, and not just MapReduce.
Data warehousing is great; it allows for processing and reporting on pretty large volumes of data, given the MPP/clustering technology that has been around in that space since forever. However, this processing power comes at a price: first, the price of the systems themselves, which is very steep; then the processing needed to fit the data into a fairly limited format/model; and finally the access language, SQL, which is great for most set-based processing needs but quickly becomes cumbersome when addressing problems not easily translatable to set-based logic, which is typically 20% of a warehouse solution, imho.
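To make that set-based limitation concrete, here's a minimal sketch in plain Python, with made-up sample data and a hypothetical sessionize helper. It performs sessionization: assigning a user's click events to sessions whenever the gap between consecutive events exceeds 30 minutes. This kind of ordered, stateful pass is a simple loop in procedural code, but it's awkward to express in pure set-based SQL.

```python
# A sketch of sessionization, assuming (user_id, epoch_seconds) event pairs.
# Sample data and the sessionize helper are illustrative, not from any product.
from itertools import groupby
from operator import itemgetter

SESSION_GAP_SECS = 30 * 60  # a 30-minute gap of inactivity ends a session

events = [
    ("alice", 1000), ("alice", 1200), ("alice", 9000),
    ("bob",   500),  ("bob",   4000),
]

def sessionize(events):
    """Yield (user_id, session_number) for each event, in time order per user."""
    for user, user_events in groupby(sorted(events), key=itemgetter(0)):
        session, last_ts = 0, None
        for _, ts in user_events:
            if last_ts is not None and ts - last_ts > SESSION_GAP_SECS:
                session += 1  # gap too large: start a new session
            yield (user, session)
            last_ts = ts

print(list(sessionize(events)))
# [('alice', 0), ('alice', 0), ('alice', 1), ('bob', 0), ('bob', 1)]
```

In a Big Data stack, the same per-user loop would typically run inside a Spark or MapReduce job, letting the cluster handle the scale.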
Big Data systems (MapR) can do almost all the things a traditional enterprise data warehouse can do, with far fewer constraints both in terms of hardware and in the amount of processing needed to "align" the data to a predefined format/model. They feature an MPP platform for SQL (Apache Drill) that supports most of the ANSI SQL standard, and they provide many more avenues for optimization and specialization if needed (such as running MapReduce or Apache Spark jobs to process the data by means other than SQL). At the same time, the platform provides a set of core services, such as replication, mirroring, point-in-time snapshots, a global namespace, platform security, and YARN, to mention a few, giving a very unified environment. It is based on commodity hardware and has resiliency built in.
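As a concrete illustration of the SQL-on-anything point: Drill exposes a REST endpoint (by default on its 8047 web port) that accepts ANSI SQL and can query raw files in place via the dfs storage plugin. Here's a minimal sketch in Python; the host and the JSON file path are assumptions.

```python
# A sketch of querying a raw JSON file with ANSI SQL through Apache Drill's
# REST API. The Drill host and the file path /data/users.json are assumptions.
import requests

DRILL_URL = "http://localhost:8047/query.json"  # Drill's default web port

payload = {
    "queryType": "SQL",
    # A hypothetical JSON file queried directly, with no ETL or schema load
    "query": "SELECT name, age FROM dfs.`/data/users.json` WHERE age > 30 LIMIT 10",
}

resp = requests.post(DRILL_URL, json=payload)
resp.raise_for_status()

for row in resp.json().get("rows", []):
    print(row)  # each row comes back as a dict keyed by column name
```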
Due to its flexible nature, it can also process semi-structured and unstructured data, and binary data streams if needed. With the advent of streams in the core data platform, it can hook directly into more or less any data transport and provide guaranteed delivery and more, giving you a great framework for building new types of applications that use whichever parts of the platform make sense for the use case; the variations are basically endless.
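Since MapR Streams speaks the Apache Kafka API, acknowledged publish/subscribe looks roughly like the sketch below. For illustration it uses the open-source kafka-python client against a generic broker; the broker address and topic name are assumptions, and a real MapR Streams deployment would use MapR's Kafka-compatible client with its stream:topic style paths.

```python
# A sketch of acknowledged publish/subscribe via the Kafka API
# (pip install kafka-python). Broker address and topic name are assumptions.
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"   # assumed broker address
TOPIC = "sensor-events"     # assumed topic name

# acks="all" makes the broker confirm replication before acknowledging
producer = KafkaProducer(bootstrap_servers=BROKER, acks="all")
future = producer.send(TOPIC, b'{"sensor": 7, "temp": 21.5}')
metadata = future.get(timeout=10)  # block until delivery is acknowledged
print(f"delivered to partition {metadata.partition} @ offset {metadata.offset}")
producer.flush()

# An independent consumer replays the topic from the beginning
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating after 5s of silence
)
for message in consumer:
    print(message.value)
```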
So, imho, it is not so much about supporting particular use cases. Due to the flexible nature of a platform like MapR, you can handle almost "any" use case. I see it as a giant tool chest: where I previously had only a few tools to use for all problems, I now have all of them, plus specialized tools for almost any variant, thanks to the open source communities, and I still have the enterprise features needed to operationalize and support it.
So:
Higher volume.
More variation in formats and in possible ways of processing.
More analytics per unit of time.
Simpler processing by selecting the right tool for the job.
Easy specialization if needed.
Point-in-time consistent snapshots that can serve as a basis for reporting, backup, and more.
And so on.
Jim Scott just wrote a blog post that may be useful: Solving Problems with the Right Technology: Hadoop and RDBMS | MapR
Open source Hadoop is roughly a decade old, though most really serious development has occurred in the last five years. And in some ways, it does represent the future of data. In most organizations, data volumes are doubling about every two years. Most of that growth is unstructured or semi-structured data. Unstructured data is to RDBMS what oil is to water. Mix the two and you gum up the works. But if unstructured data is where data growth is going, then data processing must follow the same path, right?
And that is where Hadoop shines. It is purpose-built to handle enormous volumes of unstructured data. It can scale in a way that harmonizes with the hockey-stick growth of unstructured data, which a typical RDBMS cannot. Where RDBMS are usually run on pricey commercial servers, Hadoop is just fine running on commodity hardware. And Hadoop splits typical data queries among various nodes, making it relatively fault-tolerant.
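To make "splitting queries among nodes" concrete, here is a classic word-count sketch in the Hadoop Streaming style: the framework partitions the input across nodes, runs the map phase on each split, then sorts and groups by key before the reduce phase sums the counts. The script name, jar path, and HDFS paths below are illustrative assumptions.

```python
# wordcount.py -- a minimal Hadoop Streaming sketch (names/paths are assumptions).
# Run the same script as mapper ("wordcount.py map") and reducer ("wordcount.py reduce").
import sys

def mapper():
    # Each mapper sees one split of the input and emits ("word", 1) pairs.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts by key between phases, so equal words arrive consecutively
    # and can be summed in a single pass.
    current, count = None, 0
    for line in sys.stdin:
        word, _, n = line.rstrip("\n").partition("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()

# Submitted with something like (jar and HDFS paths are assumptions):
#   hadoop jar hadoop-streaming.jar \
#     -input /data/corpus -output /data/wordcounts \
#     -mapper "wordcount.py map" -reducer "wordcount.py reduce" -file wordcount.py
```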
Fast, Scalable, Streaming Applications with MapR Streams, Spark Streaming, and MapR-DB | MapR
For real-life Hadoop use cases, I wrote an answer on Quora that you can read:
Answer - Quora
I've researched and cataloged over a hundred MapR customer use cases. There is wonderful variety in size and scope, and my sense is that we are truly seeing just the tip of the iceberg as far as Hadoop/Spark use cases go. The image below is an attempt to show the progression from early-phase use cases to the most advanced. The idea of "knowing to doing" is that the more advanced use cases are more likely to change how the business operates, not just what business leaders know about their business.
Another thing to note is that as companies mature in their big data use cases, the applications become more "verticalized", meaning they advance customers' capabilities in the industry in which they compete. I hope this helps.