NEW Snowflake Advanced Data Scientist DSA-C02 Exam 2023!!
$129.99

Description

The SnowPro Advanced: Data Scientist exam tests advanced knowledge and skills used to apply comprehensive data science principles, tools, and methodologies using Snowflake. The exam will assess skills through scenario-based questions and real-world examples. This certification will test the ability to:
● Outline data science concepts
● Implement Snowflake data science best practices
● Prepare data and perform feature engineering in Snowflake
● Train and use machine learning models
● Use data visualization to present a business case

Domain                                          Weighting on Exam
1.0 Data Science Concepts                       15%
2.0 Data Pipelining                             19%
3.0 Data Preparation and Feature Engineering    30%
4.0 Model Development                           20%
5.0 Model Deployment                            16%

Domain 1.0: Data Science Concepts
1.1 Define machine learning concepts for data science workloads.
● Machine Learning
○ Supervised learning
○ Unsupervised learning
1.2 Outline machine learning problem types.
● Supervised Learning
○ Structured Data
■ Linear regression
■ Binary classification
■ Multi-class classification
■ Time-series forecasting
○ Unstructured Data
■ Image classification
■ Segmentation
● Unsupervised Learning
○ Clustering
○ Association models
1.3 Summarize the machine learning lifecycle.
● Data collection
● Data visualization and exploration
● Feature engineering
● Training models
● Model deployment
● Model monitoring and evaluation (e.g., model explainability, precision, recall, accuracy, confusion matrix)
● Model versioning
1.4 Define statistical concepts for data science.
● Normal versus skewed distributions (e.g., mean, outliers)
● Central limit theorem
● Z and T tests
● Bootstrapping
● Confidence intervals

Domain 2.0: Data Pipelining
2.1 Enrich data by consuming data sharing sources.
● Snowflake Marketplace
● Direct Sharing
● Shared database considerations
2.2 Build a data science pipeline.
● Automation of data transformation with streams and tasks
● Python User-Defined Functions (UDFs)
● Python User-Defined Table Functions (UDTFs)
● Python stored procedures
● Integration with machine learning platforms (e.g., connectors, ML partners, etc.)

Domain 3.0: Data Preparation and Feature Engineering
3.1 Prepare and clean data in Snowflake.
● Use Snowpark for Python and SQL
○ Aggregate
○ Joins
○ Identify critical data
○ Remove duplicates
○ Remove irrelevant fields
○ Handle missing values
○ Data type casting
○ Sampling data
3.2 Perform exploratory data analysis in Snowflake.
● Snowpark and SQL
○ Identify initial patterns (i.e., data profiling)
○ Connect external machine learning platforms and/or notebooks (e.g., Jupyter)
● Use Snowflake native statistical functions to analyze and calculate descriptive data statistics.
○ Window functions
○ MIN/MAX/AVG/STDDEV
○ VARIANCE
○ TOPn
○ Approximation/high-performing functions
● Linear Regression
○ Find the slope and intercept
○ Verify the dependencies on dependent and independent variables
3.3 Perform feature engineering on Snowflake data.
● Preprocessing
○ Scaling data
○ Encoding
○ Normalization
● Data Transformations
○ Data frames (i.e., Pandas, Snowpark)
○ Derived features (e.g., average spend)
● Binarizing data
○ Binning continuous data into intervals
○ Label encoding
○ One-hot encoding
3.4 Visualize and interpret the data to present a business case.
● Statistical summaries
○ Snowsight with SQL
○ Streamlit
○ Interpret open-source graph libraries
○ Identify data outliers
● Common types of visualization formats
○ Bar charts
○ Scatterplots
○ Heat maps

Domain 4.0: Model Development
4.1 Connect data science tools directly to data in Snowflake.
● Connecting Python to Snowflake
○ Snowpark
○ Python connector with Pandas support
○ Spark connector
● Snowflake Best Practices
○ One platform, one copy of data, many workloads
○ Enrich datasets using the Snowflake Marketplace
○ External tables
○ External functions
○ Zero-copy cloning for training snapshots
○ Data governance
4.2 Train a data science model.
● Hyperparameter tuning
● Optimization metric selection (e.g., log loss, AUC, RMSE)
● Partitioning
○ Cross-validation
○ Train/validation hold-out
● Down/up-sampling
● Training with Python stored procedures
● Training outside Snowflake through external functions
● Training with Python User-Defined Table Functions (UDTFs)
4.3 Validate a data science model.
● ROC curve/confusion matrix
○ Calculate the expected payout of the model
● Regression problems
● Residuals plot
○ Interpret graphics with context
● Model metrics
4.4 Interpret a model.
● Feature impact
● Partial dependence plots
● Confidence intervals

Domain 5.0: Model Deployment
5.1 Move a data science model into production.
● Use an externally hosted model
○ External functions
○ Pre-built models
● Deploy a model in Snowflake
○ Vectorized/scalar Python User-Defined Functions (UDFs)
○ Pre-built models
○ Storing predictions
○ Stage commands
5.2 Determine the effectiveness of a model and retrain if necessary.
● Metrics for model evaluation
○ Data drift/model decay
■ Data distribution comparisons
● Do the data used for predictions look similar to the training data?
● Do the same data points give the same predictions once a model is deployed?
○ Area under the curve
○ Accuracy, precision, recall
● User-Defined Functions (UDFs)
5.3 Outline model lifecycle and validation tools.
● Streams and tasks
● Metadata tagging
● Model versioning with partner tools
● Automation of model retraining
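To make a few of the exam topics above concrete, here are short, generic illustrations in plain Python (not Snowflake-specific, and not official exam material). First, bootstrapping and confidence intervals from Domain 1.4: a minimal percentile-bootstrap sketch, using only the standard library, where the sample data and resample count are made up for illustration.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for a sample statistic."""
    rng = random.Random(seed)
    # Resample the data with replacement many times and compute the statistic each time.
    estimates = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

sample = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2, 11.7, 9.5, 12.6, 10.8]
low, high = bootstrap_ci(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The idea being tested is that resampling the observed data approximates the sampling distribution of a statistic without distributional assumptions.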
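Domain 3.3 lists label encoding, one-hot encoding, and binning continuous data into intervals. A minimal sketch of all three, with hypothetical example data; in practice these would run over Snowpark or Pandas data frames rather than plain lists.

```python
def label_encode(values):
    """Map each distinct category to an integer label."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Expand each categorical value into a 0/1 indicator vector."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]

def bin_values(values, edges):
    """Bin continuous values: the bin index is how many edges the value meets or exceeds."""
    return [sum(v >= e for e in edges) for v in values]

colors = ["red", "blue", "red", "green"]
labels, mapping = label_encode(colors)                        # blue=0, green=1, red=2
vectors = one_hot_encode(colors)                              # one 0/1 column per category
spend_bins = bin_values([5.0, 42.0, 120.0], edges=[10, 100])  # low / mid / high spend
```

One-hot encoding avoids implying an order between categories, which label encoding does imply; binning turns a continuous feature such as average spend into discrete intervals.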
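Domain 4.2 covers partitioning via cross-validation. A minimal sketch of k-fold index splitting from scratch (in practice a library such as scikit-learn would be used; the fold count and dataset size here are arbitrary).

```python
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, disjoint folds for cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices round-robin into k folds.
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(n=10, k=5)
for holdout in folds:
    train = [j for f in folds if f is not holdout for j in f]
    # ...fit the model on `train`, evaluate on `holdout`, then average the k scores...
```

Each observation serves as validation data exactly once, which gives a less noisy estimate of generalization than a single train/validation hold-out split.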
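Domains 1.3 and 4.3 both reference the confusion matrix and the precision/recall/accuracy metrics derived from it. A minimal sketch for binary labels, with made-up prediction vectors.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)           # of predicted positives, how many were correct
recall    = tp / (tp + fn)           # of actual positives, how many were found
accuracy  = (tp + tn) / len(y_true)  # overall fraction of correct predictions
```

Precision and recall trade off against each other, which is why the exam pairs them with the ROC curve and area under the curve rather than relying on accuracy alone.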
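Finally, Domain 5.2 asks whether the data arriving at a deployed model still look like the training data. One crude (illustrative, not production-grade) drift signal is the shift in a feature's mean measured in units of the training standard deviation; real monitoring would compare full distributions, e.g. with a population stability index or a Kolmogorov-Smirnov test. All values below are invented.

```python
import statistics

def drift_score(train, live):
    """Crude drift signal: absolute shift in the live mean, in training-std units."""
    mu, sigma = statistics.mean(train), statistics.stdev(train)
    return abs(statistics.mean(live) - mu) / sigma

train_feature = [10.0, 11.2, 9.8, 10.5, 10.1, 9.9, 10.7, 10.3]
stable_live   = [10.2, 10.4, 9.7, 10.6]   # resembles the training distribution
shifted_live  = [14.1, 13.8, 14.5, 14.0]  # distribution has moved

print(drift_score(train_feature, stable_live))   # small -> no action needed
print(drift_score(train_feature, shifted_live))  # large -> candidate for retraining
```

A score that grows over time is the "model decay" trigger the outline ties to automated retraining via streams and tasks.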
