Google Cloud Certified Professional Data Engineer 2023
$199.99
Shop on Udemy

Description

Designing data processing systems

Selecting the appropriate storage technologies. Considerations include:
● Mapping storage systems to business requirements
● Data modeling
● Trade-offs involving latency, throughput, transactions
● Distributed systems
● Schema design

Designing data pipelines. Considerations include:
● Data publishing and visualization (e.g., BigQuery)
● Batch and streaming data (e.g., Dataflow, Dataproc, Apache Beam, Apache Spark and Hadoop ecosystem, Pub/Sub, Apache Kafka)
● Online (interactive) vs. batch predictions
● Job automation and orchestration (e.g., Cloud Composer)

Designing a data processing solution. Considerations include:
● Choice of infrastructure
● System availability and fault tolerance
● Use of distributed systems
● Capacity planning
● Hybrid cloud and edge computing
● Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)
● At least once, in-order, and exactly once, etc., event processing

Migrating data warehousing and data processing. Considerations include:
● Awareness of current state and how to migrate a design to a future state
● Migrating from on-premises to cloud (Data Transfer Service, Transfer Appliance, Cloud Networking)
● Validating a migration

Building and operationalizing data processing systems

Building and operationalizing storage systems. Considerations include:
● Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Datastore, Memorystore)
● Storage costs and performance
● Life cycle management of data

Building and operationalizing pipelines. Considerations include:
● Data cleansing
● Batch and streaming
● Transformation
● Data acquisition and import
● Integrating with new data sources

Building and operationalizing processing infrastructure. Considerations include:
● Provisioning resources
● Monitoring pipelines
● Adjusting pipelines
● Testing and quality control

Operationalizing machine learning models

Leveraging pre-built ML models as a service. Considerations include:
● ML APIs (e.g., Vision API, Speech API)
● Customizing ML APIs (e.g., AutoML Vision, AutoML text)
● Conversational experiences (e.g., Dialogflow)

Deploying an ML pipeline. Considerations include:
● Ingesting appropriate data
● Retraining of machine learning models (AI Platform Prediction and Training, BigQuery ML, Kubeflow, Spark ML)
● Continuous evaluation

Choosing the appropriate training and serving infrastructure. Considerations include:
● Distributed vs. single machine
● Use of edge compute
● Hardware accelerators (e.g., GPU, TPU)

Measuring, monitoring, and troubleshooting machine learning models. Considerations include:
● Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics)
● Impact of dependencies of machine learning models
● Common sources of error (e.g., assumptions about data)

Ensuring solution quality

Designing for security and compliance. Considerations include:
● Identity and access management (e.g., Cloud IAM)
● Data security (encryption, key management)
● Ensuring privacy (e.g., Data Loss Prevention API)
● Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children's Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))

Ensuring scalability and efficiency. Considerations include:
● Building and running test suites
● Pipeline monitoring (e.g., Cloud Monitoring)
● Assessing, troubleshooting, and improving data representations and data processing infrastructure
● Resizing and autoscaling resources

Ensuring reliability and fidelity. Considerations include:
● Performing data preparation and quality control (e.g., Dataprep)
● Verification and monitoring
● Planning, executing, and stress testing data recovery (fault tolerance, rerunning failed jobs, performing retrospective re-analysis)
● Choosing between ACID, idempotent, eventually consistent requirements

Ensuring flexibility and portability. Considerations include:
● Mapping to current and future business requirements
● Designing for data and application portability (e.g., multicloud, data residency requirements)
● Data staging, cataloging, and discovery
