Data Lake

A data lake is an information system consisting of the following 2 characteristics


 * A parallel system able to store big data
 * A system able to perform computations on the data without moving the data

Currently, Hadoop is the most common technology to implement a data lake, but it might not be that way forever. Thus it is important to distinguish the difference between Hadoop and a data lake. A data lake is a concept, and Hadoop is a technology to implement the concept.

“Data Lake” is a massive, easily accessible data repository for storing “big data”. Unlike traditional data warehouses, which are optimized for data analysis by storing only some attributes and dropping data below the level aggregation, a data lake is designed to retain all attributes, especially when you do not yet know what the scope of data or its use.

A data lake holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.

https://www.bbvaopenmind.com/en/data-lake-an-opportunity-or-a-dream-for-big-data/

The Data Lake is an approach to big data architecture that focuses on storing unstructured or semi-structured data in its original form, in a single repository that serves multiple analytic use cases or services.

https://www.kdnuggets.com/2018/06/why-data-lake-matters.html