Course Overview
This deep dive technical course provides technical training for Enterprise Data Warehouses (EDW) solutions using BigQuery.
The course provides guidance and training on technical challenges that can arise when migrating existing EDW and ETL pipelines to Google Cloud. Learners are provided with practical experience with real migration tasks, such as data migration, schema optimization, and SQL Query conversion and optimization.
The course also covers key aspects of ETL pipeline migration to Dataproc and using Pub/Sub, Dataflow, and Cloud Data Fusion.
Who should attend
The core audience for this course is Google Cloud Partners with the following relevant job roles:
- Data Warehouse Deployment Engineers
- Data Warehouse Consultants
- Data Warehouse Architects
These roles, though not the core audience, may find the course relevant should they meet the requirements:
- Technical Project Leads
- Technical Project Managers
- Data / Business Analysts
Prerequisites
Required:
- Have completed the Data Engineering on Google Cloud training.
- Be a Google Cloud Certified Professional Data Engineer or have equivalent expertise in Data Engineering.
- Have access to Cloud Connect - Partners.
Recommended:
- Experience building data processing pipelines.
- Experience with Apache Beam and Apache Hadoop.
- Java or Python programming expertise.
Organizational requirements:
- The Cloud Partner organization must have implemented at least one Data Warehouse solution previously on any Data Warehouse platform.
Course Objectives
- Discuss key elements of Google Data Warehouse solution portfolio and strategy.
- Map Enterprise Data Warehouses concepts and components to BigQuery and Google data services.
- Identify best practices for migrating Data Warehouses to BigQuery and demonstrate key skills required to perform successful migrations.
- Implement data load and transformation pipelines for a BigQuery Data Warehouse using Google data processing and integration services.
- Implement a streaming analytics solution using Pub/Sub, Dataflow, and BigQuery.
- Use Looker and LookML to generate reports and gain insights.
- Explore the GIS, GIS Visualization, and Machine Learning enhancements to BigQuery.
Outline: Data Warehousing for Partners (DWP)
Module 1 - Data Warehouse Solutions on Google Cloud
Topics:
- Implementing Big Data Solutions on Google Cloud
- Customer Needs
- Sample Architectures
- Migration Strategies and Planning
- Working with PSO
Objectives:
- Describe the Google portfolio of Data Warehouse and Data Processing services
- Identify the Google strategy for Data Warehouse products and services
- Locate technical resources for Data Warehouse partners
Module 2 - BigQuery for Data Warehousing Professionals
Topics:
- BigQuery Concepts
- BigQuery Permissions and Security
- Monitoring and Auditing
- Schema Design
- Partitioning and Clustering
- Data Capture and Load Jobs
- Handling Change and Slowly Changing Dimensions
- Querying Data
- Managing Workloads and Concurrency
- Analyzing Data
- Sizing and Cost Management
- Query Optimization
- Storage Optimization
Objectives:
- Describe the key components of a successful Data Warehouse implementation on BigQuery
- Identify best practices for implementing a Data Warehouse with BigQuery
- Use the Google Cloud console to access public datasets
- Perform queries using the console and analyze query results using client libraries
- Combine ecommerce datasets to create enhanced datasets using BigQuery joins and unions
Module 3 - Migrating to BigQuery
Topics:
- Migration Phases
- Security
- Google Cloud data warehouse Architecture
- Post Migration
- User Adoption
Objectives:
- Assess an existing data warehouse and develop a strategy to migrate it to BigQuery
- Describe best practices for migrating existing data warehouses to BigQuery
- Identify key resources, tools, and partner assets for migrating to BigQuery
- Migrate sample SQL Server data to BigQuery using Striim
- Identify resources to translate product-specific SQL queries to BigQuery Standard SQL
Module 4 - ETL Tools and Positioning
Topics:
- Dataproc
- Cloud Data Fushion
- Dataflow
Objectives:
- Describe the key features of Dataproc, Cloud Data Fusion, and Dataflow
- Migrate Apache Spark Jobs to Dataproc
- Identify best practices for creating Dataflow workflows using Dataflow templates
- Configure Cloud Data Fusion to create a data transformation pipeline joining multiple sources with BigQuery as an output data sink
- Build data pipelines that will ingest data from Cloud Storage into BigQuery using Dataflow
Module 5 - Streaming Analytics
Topics:
- Why Streaming Analytics?
- The Pub/Sub Service
- Dataflow Windows and Triggers
- Dataflow Sources and Sinks
- Migration and Adoption Challenges
Objectives:
- Identify the components of a streaming analytics solution on Google Cloud
- Create a streaming IoT pipeline using Pub/Sub and Kafka
- Explore design patterns and optimization considerations for streaming analytics solutions
- Create and run a streaming Dataflow pipeline that ingests data from Pub/Sub to BigQuery using a Dataflow template
Module 6 - Introduction to Looker as a Data Platform
Topics:
- Looker Platform Overview
- Looker Platform Architecture
- Paradigm Shift: Modeling Language versus Hardcoded SQL
- Core Analytical Concepts
Objectives:
- Navigate the Looker platform
- Describe the Looker platform architecture
- Discover the advantages of Looker Modeling Language (LookML) over hardcoded SQL
- Describe the four core analytical concepts in Looker
- Analyze and visualize data using Explores in Looker
Module 7 - BigQuery Extended Capabilities
Topics:
- BigQuery GIS
- BigQuery ML
Objectives:
- Describe the key features of BigQuery GIS and BigQuery ML
- Analyze data using BigQuery GIS functions and visualize results using BigQuery Geo Viz
- Train and evaluate an ML model with BigQuery ML to predict taxi fares