Databases are typically structured around a defined schema. Data is organized as a set of tables with columns and rows: columns hold attributes, and each row represents an object or entity. Databases are typically designed for transactional workloads, not for data analytics.
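To make the "defined schema" and "transactional" points concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `accounts` table, its columns, and the `transfer` helper are made up for illustration and are not from the video.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one transaction.

    Returns True on success, False if the schema's CHECK constraint
    rejects the change (in which case nothing is applied).
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, then debit: if the debit would overdraw the
            # source account, the credit above is rolled back as well.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

# Hypothetical schema: columns are attributes, each row is one account.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 50)],
)
conn.commit()
```

Calling `transfer(conn, 1, 2, 30)` moves the money, while `transfer(conn, 1, 2, 500)` fails and leaves both rows exactly as they were, which is the all-or-nothing behavior a transactional database is built for.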
A data warehouse sits on top of several databases and is used for business intelligence. It consumes data from all of these databases and creates a layer optimized for data analytics. The schema is applied on import (schema-on-write).
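As a rough sketch of schema-on-import, the snippet below loads rows from two pretend source databases into one analytics table, coercing every row to the warehouse schema before it is written. It uses sqlite3 only as a stand-in for a real warehouse; the `sales` table, the source names, and the `load_sales` helper are all assumptions made for this example.

```python
import sqlite3

def load_sales(warehouse, source, rows):
    """Coerce each row to the warehouse schema, then insert it.

    Rows that cannot be coerced (for example a non-numeric amount)
    raise an error here, at write time, instead of at query time.
    """
    cleaned = [
        (source, str(r["region"]).lower(), float(r["amount"]))
        for r in rows
    ]
    with warehouse:  # one transaction per load
        warehouse.executemany(
            "INSERT INTO sales (source, region, amount) VALUES (?, ?, ?)",
            cleaned,
        )

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales ("
    " source TEXT NOT NULL,"
    " region TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)

# Two hypothetical operational databases with slightly inconsistent data.
load_sales(warehouse, "orders_db", [
    {"region": "EU", "amount": "10.5"},
    {"region": "us", "amount": 20},
])
load_sales(warehouse, "pos_db", [
    {"region": "eu", "amount": 4.5},
])

# An analytics-style query across all sources.
totals = dict(
    warehouse.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

Because the cleanup happens on the way in, the analytics query at the end can assume consistent region names and numeric amounts.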
A data lake is a centralized repository for storing structured and unstructured data. Data lakes can store raw data as is, without any structure (schema), so there is no need to run ETL or transformation jobs before storing it. You can store many types of data, such as images, text, files, and videos.
You can also store machine learning model artifacts, real-time data, and analytics outputs in data lakes. Processing can be done on export, so the schema is defined on read (schema-on-read).
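Here is a small sketch of what schema-on-read can look like. A plain dictionary stands in for the data lake (in practice this would be object storage), raw content is kept exactly as it arrived, and a schema is only applied when a consumer reads it. The object keys, field names, and the `read_with_schema` helper are hypothetical.

```python
import json

# Stand-in for a data lake: raw objects stored as is, with no schema enforced.
raw_objects = {
    "events/2024-01-01.jsonl": (
        '{"user": "a", "clicks": "3"}\n'
        '{"user": "b", "clicks": 5, "extra": true}\n'
    ),
    "notes/readme.txt": "free-form text, stored as is",
}

def read_with_schema(raw_text, schema):
    """Parse JSON lines and apply `schema` (field name -> type) at read time.

    Only the fields named in the schema are kept, and each value is cast
    to the requested type. A missing field raises KeyError.
    """
    records = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        obj = json.loads(line)
        records.append({field: cast(obj[field]) for field, cast in schema.items()})
    return records

# The consumer decides the schema when reading, not when storing.
clicks = read_with_schema(
    raw_objects["events/2024-01-01.jsonl"],
    {"user": str, "clicks": int},
)
```

A different consumer could read the same raw file with a different schema, which is the main trade-off against a warehouse: writes are cheap and flexible, but every reader has to deal with inconsistent data.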
#database #datalake #datawarehouse #s3