Data Lake:
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. (Standard definition)
When it comes to Big Data, a system has to deal with structured, semi-structured, and unstructured data, delivering accurate results in a minimal amount of time.
A data lake has a completely different structure when compared with a traditional data warehouse.
To my knowledge, data lakes can be of two types:
Standard Data lake -
This type supports all kinds of data, irrespective of domain. It consists of a data integration layer, a data management layer, and an analytics/BI layer, with the Hadoop ecosystem residing alongside.
Because it is domain-agnostic, the streaming agents capture data in general; no further filtering is required.
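The standard path can be sketched as a small landing function. This is a minimal illustration, not a production design: the function name `land_raw`, the `lake/raw` directory layout, and the metadata fields are all assumptions made for the example. The point is that the payload is stored untouched, in its native format, with no filtering at ingestion time.

```python
import json
import time
from pathlib import Path

def land_raw(payload: bytes, source: str, lake_root: Path = Path("lake/raw")) -> Path:
    """Land an incoming payload as-is, in its native format.

    No parsing or filtering happens here; only minimal metadata
    (source name, arrival time) is recorded next to the data.
    """
    ts = time.strftime("%Y%m%d-%H%M%S")
    target = lake_root / source
    target.mkdir(parents=True, exist_ok=True)
    data_file = target / f"{ts}.bin"
    data_file.write_bytes(payload)  # raw bytes, untouched
    (target / f"{ts}.meta.json").write_text(
        json.dumps({"source": source, "arrived": ts})
    )
    return data_file
```

Any streaming agent, from any domain, could call this with whatever bytes it captured; schema decisions are deferred until the data is actually read.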
Domain-Specific Data lake -
This type supports data specific to the domain it comes from. Suppose the streaming data comes from Telecom; then the data lake is designed specifically for Telecom. If it is commerce data, the lake deals more with customer purchase and demand analytics.
Because it is tied to one domain, the streaming agents fetch data only from that designed domain.
The filtering technique and pre-processing schema have to be defined before feeding data into the lake.
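A pre-defined filtering step of the kind described above might look like the following sketch. The schema here is hypothetical (a made-up set of Telecom call fields), chosen only to show the idea: the gate is defined before ingestion, and records that do not fit the domain schema never enter the lake.

```python
# Hypothetical pre-processing schema for a Telecom-specific lake.
TELECOM_SCHEMA = {"call_id", "caller", "callee", "duration_sec"}

def admit(record: dict, schema: set = TELECOM_SCHEMA):
    """Gatekeeper defined before the lake: reject records that lack
    the required domain fields, and trim extra fields from those that fit."""
    if not schema.issubset(record):
        return None  # off-domain or incomplete record is rejected at the gate
    return {k: record[k] for k in schema}
```

A commerce record such as `{"user": "X", "purchase": "book"}` would be rejected by this Telecom gate, while a complete call record passes through with any extra fields dropped.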
If we create a system that has both features, it can be coined a hybrid data lake model. Such a model would increase operational performance and expand business space in big data, supporting more advanced analytics in medicine and other domains.
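One way to picture the hybrid idea is a router that sends records down a domain-specific path when a filter is registered for that domain, and otherwise falls back to the standard keep-everything path. Everything below is an illustrative assumption (the `telecom_filter`, the `DOMAIN_FILTERS` registry, and the routing labels are invented for this sketch).

```python
def telecom_filter(record: dict):
    """Hypothetical domain filter: require basic call fields, else reject."""
    required = {"call_id", "duration_sec"}
    if not required.issubset(record):
        return None
    return {k: record[k] for k in required}

DOMAIN_FILTERS = {"telecom": telecom_filter}  # registered domain pipelines

def hybrid_ingest(record: dict, domain: str):
    """Route a record through the hybrid lake.

    A registered domain takes the domain-specific filtering path;
    anything else falls back to the standard path and is kept raw.
    """
    fn = DOMAIN_FILTERS.get(domain)
    if fn is None:
        return ("raw", record)      # standard path: store everything as-is
    cleaned = fn(record)
    if cleaned is None:
        return ("rejected", None)   # fails the domain schema at the gate
    return ("curated", cleaned)
```

The design choice is that unknown domains are never lost: they land raw, so the standard lake's flexibility is preserved alongside the domain-specific curation.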