Course Overview
The two main components of any data pipeline are data lakes and data warehouses. This course highlights use cases for each type of storage and examines the data lake and warehouse solutions available on Google Cloud in technical detail. It also describes the role of a data engineer, explains the benefits of a successful data pipeline to business operations, and examines why data engineering should be done in a cloud environment.
Who should attend
This course is intended for developers who are responsible for querying datasets, visualizing query results, and creating reports.
Specific job roles include:
- Data engineers
- Data analysts
- Database administrators
- Big data architects
Certifications
This course is part of the following Certifications:
Prerequisites
Basic proficiency with a common query language such as SQL.
Course Objectives
- Differentiate between data lakes and data warehouses.
- Explore use cases for each type of storage and the available data lake and warehouse solutions on Google Cloud.
- Discuss the role of a data engineer and the benefits of a successful data pipeline to business operations.
- Examine why data engineering should be done in a cloud environment.
Outline: Modernizing Data Lakes and Data Warehouses with Google Cloud (MDLDW)
Module 1 - Introduction to Data Engineering
Topics:
- The role of a data engineer
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Partnering effectively with other data teams
- Managing data access and governance
- Building production-ready pipelines
- Google Cloud customer case study
Objectives:
- Discuss the role of a data engineer.
- Discuss benefits of doing data engineering in the cloud.
- Discuss the challenges of data engineering practice and how building data pipelines in the cloud helps to address them.
- Review the purpose of a data lake versus a data warehouse, and when to use each.
Module 2 - Building a Data Lake
Topics:
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building a data lake by using Cloud Storage
- Securing Cloud Storage
- Storing all sorts of data types
- Cloud SQL as your OLTP system
Objectives:
- Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
- Explain how to use Cloud SQL for a relational data lake.
Module 3 - Building a Data Warehouse
Topics:
- The modern data warehouse
- Introduction to BigQuery
- Getting started with BigQuery
- Loading data into BigQuery
- Exploring schemas
- Schema design
- Nested and repeated fields
- Optimizing with partitioning and clustering
Objectives:
- Discuss the requirements of a modern data warehouse.
- Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
- Discuss the core concepts of BigQuery and review options for loading data into BigQuery.