Delivering timely data in a data warehouse is a challenge many of us face, and there is often no straightforward solution. By combining batch and streaming data pipelines, you can leverage the Delta Lake format to provide an enterprise data warehouse at near real-time frequency. Delta Lake eases the ETL workload by enabling ACID transactions in a warehousing environment, and coupling it with Structured Streaming lets you achieve a low-latency data warehouse. In this talk, we’ll cover how to use Delta Lake to improve the latency of ingesting and storing your data warehouse tables, and how to use Spark Structured Streaming to build the aggregations and tables that drive your warehouse.
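The pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the Kafka broker, topic, table paths, and checkpoint locations are all hypothetical, and a real pipeline would parse the payload against a schema.

```scala
// Sketch: stream raw events into a Delta "bronze" table, then maintain
// an aggregate "gold" table with Structured Streaming.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("delta-streaming-warehouse")
  .getOrCreate()

// 1. Low-latency ingestion: append raw events to a Delta table.
val rawEvents = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
  .option("subscribe", "orders")                    // hypothetical topic
  .load()
  .selectExpr("CAST(value AS STRING) AS json", "timestamp")

rawEvents.writeStream
  .format("delta")
  .option("checkpointLocation", "/chk/bronze_orders")
  .start("/warehouse/bronze_orders")

// 2. Streaming aggregation: read the Delta table as a stream and
//    maintain hourly event counts in a downstream table.
val hourlyCounts = spark.readStream
  .format("delta")
  .load("/warehouse/bronze_orders")
  .groupBy(window(col("timestamp"), "1 hour"))
  .count()

hourlyCounts.writeStream
  .format("delta")
  .outputMode("complete")
  .option("checkpointLocation", "/chk/gold_hourly")
  .start("/warehouse/gold_hourly_counts")
```

Because the Delta sink commits each micro-batch as an ACID transaction, downstream readers of the aggregate table always see a consistent snapshot rather than partially written files.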
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unified-data-analytics-platform
Take an in-depth look at modern data warehousing using BigQuery and how to operate your data warehouse in the cloud. During this session, we’ll share lessons learned and best practices from prior implementations, giving you the playbook for building your own modern data warehouse.
BigQuery ML → https://bit.ly/2Khkqoq
BigQuery for data warehouse practitioners → https://bit.ly/2TWh6P9
Next ’19 Data Analytics Sessions here → https://bit.ly/Next19DataAnalytics
Next ‘19 All Sessions playlist → https://bit.ly/Next19AllSessions
Subscribe to the GCP Channel → https://bit.ly/GCloudPlatform
Speaker(s): Ryan McDowell, Alban Perillat-Merceroz
Session ID: DA307