Practical Data Science (PDS) Specialization

  • 3 courses
  • Advanced
  • 3 months (5 hours/week)
  • Antje Barth, Sireesha Muppala, Shelbee Eigenbrode, Chris Fregly
  • Amazon Web Services (AWS)

What you will learn

  • Prepare data, detect statistical data biases, and perform feature engineering at scale to train models.
  • Automatically train, evaluate, and tune models with automated machine learning (AutoML).
  • Store and manage machine learning features using a feature store.
  • Debug, profile, tune, and evaluate models while tracking data lineage and model artifacts.
  • Build, deploy, monitor, and operationalize end-to-end machine learning pipelines.
  • Build data labeling and human-in-the-loop pipelines to improve model performance with human intelligence.

Skills you will gain

  • Automated Machine Learning (AutoML)
  • Natural Language Processing with BERT
  • ML Pipelines and ML Operations (MLOps)
  • A/B Testing, Model Deployment, and Monitoring
  • Data Labeling at Scale
  • Data Ingestion
  • Exploratory Data Analysis
  • Statistical Data Bias Detection
  • Multi-class Classification with FastText and BlazingText
  • Feature Engineering and Feature Store
  • Model Training, Tuning, and Deployment with BERT
  • Model Debugging, Profiling, and Evaluation
  • Artifact and Lineage Tracking
  • Distributed Model Training and Hyperparameter Tuning
  • Cost Savings and Performance Improvements
  • Human-in-the-Loop Pipelines

Development environments might not have the same requirements as production environments. Moving data science and machine learning projects from idea to production requires state-of-the-art skills. You need to architect and implement your projects for scale and operational efficiency. Data science is an interdisciplinary field that combines domain knowledge with mathematics, statistics, data visualization, and programming skills.

The Practical Data Science Specialization brings together these disciplines using purpose-built ML tools in the AWS cloud. It helps you develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker. 

This Specialization is designed for data-focused developers, scientists, and analysts familiar with the Python and SQL programming languages who want to learn how to build, train, and deploy scalable, end-to-end ML pipelines – both automated and human-in-the-loop – in the AWS cloud.

Each of the 10 weeks features a comprehensive lab developed specifically for this Specialization that provides hands-on experience with state-of-the-art algorithms for natural language processing (NLP) and natural language understanding (NLU), including BERT and FastText using Amazon SageMaker.

By the end of this program, you will be ready to: 

  1. Ingest, register, and explore datasets
  2. Detect statistical bias in a dataset
  3. Automatically train and select models with AutoML
  4. Create machine learning features from raw data
  5. Save and manage features in a feature store
  6. Train and evaluate models using built-in algorithms and custom BERT models
  7. Debug, profile, and compare models to improve performance
  8. Build and run a complete ML pipeline end-to-end
  9. Optimize model performance using hyperparameter tuning
  10. Deploy and monitor models
  11. Perform data labeling at scale
  12. Build a human-in-the-loop pipeline to improve model performance
  13. Reduce cost and improve performance of data products

Syllabus

Course 1: Analyze Datasets and Train ML Models using AutoML

In the first course, you will learn foundational concepts for exploratory data analysis (EDA), automated machine learning (AutoML), and text classification algorithms. With Amazon SageMaker Clarify and Amazon SageMaker Data Wrangler, you will analyze a dataset for statistical bias, transform the dataset into machine-readable features, and select the most important features to train a multi-class text classifier. You will then use AutoML to automatically train, tune, and deploy the best text-classification model for the given dataset with Amazon SageMaker Autopilot. Next, you will work with Amazon SageMaker BlazingText, a highly optimized and scalable implementation of the popular FastText algorithm, to train a text classifier with very little code.
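
To make the Autopilot workflow concrete, here is a minimal sketch using the SageMaker Python SDK. It is not taken from the course labs; the S3 path and the label column name are placeholder assumptions:

    # Minimal AutoML sketch with SageMaker Autopilot (SageMaker Python SDK v2).
    # Assumes a SageMaker environment; bucket and column names are placeholders.
    import sagemaker
    from sagemaker.automl.automl import AutoML

    session = sagemaker.Session()

    automl = AutoML(
        role=sagemaker.get_execution_role(),
        target_attribute_name="sentiment",   # hypothetical label column
        max_candidates=10,                   # cap the number of model candidates
        sagemaker_session=session,
    )

    # Autopilot analyzes the data, then trains and tunes candidate models.
    automl.fit(inputs="s3://my-bucket/train.csv", wait=True)

    # Inspect the best candidate it found.
    best = automl.best_candidate()
    print(best["CandidateName"], best["FinalAutoMLJobObjectiveMetric"])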

Week 1: Explore the Use Case and Analyze the Dataset

  • Ingest, explore, and visualize a product review dataset for multi-class text classification.

Week 2: Data Bias and Feature Importance

  • Determine the most important features in a dataset and detect statistical biases.
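
A hedged sketch of what pre-training bias detection looks like with SageMaker Clarify; the bucket, column names, and facet are illustrative assumptions, not the lab's exact configuration:

    # Pre-training bias detection with SageMaker Clarify (placeholder names).
    import sagemaker
    from sagemaker import clarify

    session = sagemaker.Session()

    processor = clarify.SageMakerClarifyProcessor(
        role=sagemaker.get_execution_role(),
        instance_count=1,
        instance_type="ml.m5.xlarge",
        sagemaker_session=session,
    )

    data_config = clarify.DataConfig(
        s3_data_input_path="s3://my-bucket/reviews.csv",
        s3_output_path="s3://my-bucket/clarify-output",
        label="sentiment",                  # hypothetical label column
        headers=["review_body", "product_category", "sentiment"],
        dataset_type="text/csv",
    )

    bias_config = clarify.BiasConfig(
        label_values_or_threshold=[1],      # the favorable label value
        facet_name="product_category",      # the facet checked for imbalance
    )

    # Computes metrics such as class imbalance (CI) and difference in
    # positive proportions in labels (DPL).
    processor.run_pre_training_bias(data_config=data_config,
                                    bias_config=bias_config)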

Week 3: Automated Machine Learning

  • Inspect and compare models generated with automated machine learning (AutoML).

Week 4: Built-in Algorithms

  • Train a text classifier with BlazingText and deploy the classifier as a real-time inference endpoint to serve predictions.
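
As a rough sketch (paths and hyperparameters are assumptions, not the course's exact settings), training and deploying a BlazingText classifier with the SageMaker Python SDK looks like this:

    # Train and deploy a BlazingText text classifier (placeholder paths).
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    session = sagemaker.Session()

    # Resolve the BlazingText container image for the current region.
    image_uri = image_uris.retrieve("blazingtext", session.boto_region_name)

    estimator = Estimator(
        image_uri=image_uri,
        role=sagemaker.get_execution_role(),
        instance_count=1,
        instance_type="ml.c5.2xlarge",
        sagemaker_session=session,
    )
    # "supervised" mode turns BlazingText into a FastText-style classifier.
    estimator.set_hyperparameters(mode="supervised", epochs=10)

    estimator.fit({"train": "s3://my-bucket/blazingtext/train"})

    # Deploy the trained model as a real-time inference endpoint.
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type="ml.m5.large")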

Course 2: Build, Train, and Deploy ML Pipelines using BERT

In the second course, you will learn to automate a natural language processing task by building an end-to-end machine learning pipeline using Hugging Face’s highly optimized implementation of the state-of-the-art BERT algorithm with Amazon SageMaker Pipelines. Your pipeline will first transform the dataset into BERT-readable features and store the features in the Amazon SageMaker Feature Store. It will then fine-tune a text classification model to the dataset using a Hugging Face pre-trained model, which has learned to understand human language from millions of Wikipedia documents. Finally, your pipeline will evaluate the model’s accuracy and only deploy the model if the accuracy exceeds a given threshold.
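
The "deploy only if accuracy exceeds a threshold" idea maps to a ConditionStep in SageMaker Pipelines. The compact sketch below is illustrative, not the course pipeline: a pipeline parameter stands in for the accuracy that a real pipeline would read from an evaluation report via JsonGet, and the deployment branch is left empty:

    # Conditional-deployment pattern in SageMaker Pipelines (illustrative).
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.parameters import ParameterFloat
    from sagemaker.workflow.condition_step import ConditionStep
    from sagemaker.workflow.conditions import ConditionLessThan
    from sagemaker.workflow.fail_step import FailStep

    # Stand-in for a measured accuracy; a real pipeline would use JsonGet
    # to read it from the evaluation step's report file.
    accuracy = ParameterFloat(name="TestAccuracy", default_value=0.0)

    check = ConditionStep(
        name="CheckAccuracy",
        conditions=[ConditionLessThan(left=accuracy, right=0.90)],
        if_steps=[FailStep(name="AccuracyTooLow",
                           error_message="Accuracy below 0.90; not deploying.")],
        else_steps=[],  # model-registration/deployment steps would go here
    )

    pipeline = Pipeline(name="bert-pipeline", parameters=[accuracy],
                        steps=[check])
    # pipeline.upsert(role_arn=...); pipeline.start(parameters={"TestAccuracy": 0.93})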

Week 1: Feature Engineering and Feature Store

  • Transform a raw text dataset into machine learning features and store features in a feature store.
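
A hedged sketch of the feature-store step, assuming a toy DataFrame; the group name, record identifier, and features are placeholders:

    # Create a feature group and ingest engineered features (placeholder schema).
    import time
    import pandas as pd
    import sagemaker
    from sagemaker.feature_store.feature_group import FeatureGroup

    session = sagemaker.Session()

    df = pd.DataFrame({
        "review_id": ["r1", "r2"],        # record identifier
        "input_ids_length": [128, 128],   # example engineered feature
        "event_time": [time.time()] * 2,  # required event-time feature
    })
    # Feature definitions are inferred from dtypes; strings need "string" dtype.
    df["review_id"] = df["review_id"].astype("string")

    feature_group = FeatureGroup(name="reviews-features",
                                 sagemaker_session=session)
    feature_group.load_feature_definitions(data_frame=df)
    feature_group.create(
        s3_uri="s3://my-bucket/feature-store",
        record_identifier_name="review_id",
        event_time_feature_name="event_time",
        role_arn=sagemaker.get_execution_role(),
        enable_online_store=True,
    )

    # Creation is asynchronous; wait before ingesting.
    while feature_group.describe()["FeatureGroupStatus"] == "Creating":
        time.sleep(5)

    # Write the rows to the offline (and online) store.
    feature_group.ingest(data_frame=df, max_workers=1, wait=True)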

Week 2: Train, Debug, and Profile a Machine Learning Model

  • Fine-tune, debug, and profile a pre-trained BERT model.
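
For a sense of how fine-tuning, debugging, and profiling fit together, here is a sketch using the SageMaker Hugging Face estimator with a built-in Debugger rule attached; the training script name, checkpoint, and framework versions are assumptions:

    # Fine-tune a pre-trained BERT model with Debugger and Profiler attached.
    import sagemaker
    from sagemaker.huggingface import HuggingFace
    from sagemaker.debugger import Rule, rule_configs, ProfilerConfig

    estimator = HuggingFace(
        entry_point="train.py",            # hypothetical script using transformers
        role=sagemaker.get_execution_role(),
        instance_type="ml.p3.2xlarge",
        instance_count=1,
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
        hyperparameters={
            "model_name": "bert-base-uncased",  # pre-trained checkpoint to fine-tune
            "epochs": 3,
        },
        # A built-in rule flags training problems, e.g. a loss that stops decreasing.
        rules=[Rule.sagemaker(rule_configs.loss_not_decreasing())],
        # Sample CPU/GPU utilization every 500 ms for profiling.
        profiler_config=ProfilerConfig(system_monitor_interval_millis=500),
    )

    estimator.fit({"train": "s3://my-bucket/bert/train",
                   "validation": "s3://my-bucket/bert/validation"})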

Week 3: Deploy End-To-End Machine Learning Pipelines

  • Orchestrate ML workflows and track model lineage and artifacts in an end-to-end machine learning pipeline.
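
Recorded lineage can be inspected afterwards; a small sketch, assuming an existing training job name:

    # Show the lineage (inputs, code, artifacts) recorded for a training job.
    import sagemaker
    from sagemaker.lineage.visualizer import LineageTableVisualizer

    viz = LineageTableVisualizer(sagemaker.Session())
    df = viz.show(training_job_name="my-training-job")  # hypothetical job name
    print(df)  # one row per association: artifact, direction, type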

Course 3: Optimize ML Models and Deploy Human-in-the-Loop Pipelines

In the third course, you will learn a series of performance-improvement and cost-reduction techniques to automatically tune model accuracy, compare prediction performance, and generate new training data with human intelligence. After tuning your text classifier using Amazon SageMaker hyperparameter tuning (HPT), you will deploy two model candidates into an A/B test to compare their real-time prediction performance and automatically scale the winning model using Amazon SageMaker Hosting. Lastly, you will set up a human-in-the-loop pipeline to fix misclassified predictions and generate new training data using Amazon Augmented AI and Amazon SageMaker Ground Truth.
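
A sketch of automatic model tuning with the SageMaker Python SDK; the estimator, metric regex, and hyperparameter ranges are placeholder assumptions:

    # Hyperparameter tuning (HPT) with a placeholder PyTorch estimator.
    import sagemaker
    from sagemaker.pytorch import PyTorch
    from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

    estimator = PyTorch(
        entry_point="train.py",            # hypothetical training script
        role=sagemaker.get_execution_role(),
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="1.13",
        py_version="py39",
    )

    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:accuracy",
        hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-5, 1e-3)},
        metric_definitions=[{"Name": "validation:accuracy",
                             "Regex": "val_accuracy: ([0-9\\.]+)"}],  # parsed from logs
        strategy="Random",         # or "Bayesian"
        max_jobs=8,                # total training jobs to launch
        max_parallel_jobs=2,       # jobs run concurrently
    )

    tuner.fit({"train": "s3://my-bucket/train",
               "validation": "s3://my-bucket/validation"})
    print(tuner.best_training_job())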

Week 1: Advanced Model Training, Tuning, and Evaluation

  • Train, tune, and evaluate models using data-parallel and model-parallel strategies and automatic model tuning.
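
Enabling the SageMaker distributed data parallel library is mostly a matter of estimator configuration; a sketch, assuming a PyTorch training script:

    # Data-parallel training: each GPU holds a model replica and processes a
    # slice of every mini-batch (script and paths are placeholders).
    import sagemaker
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",          # hypothetical training script
        role=sagemaker.get_execution_role(),
        instance_count=2,                # scale out across 2 nodes
        instance_type="ml.p3.16xlarge",  # multi-GPU type required by the library
        framework_version="1.13",
        py_version="py39",
        distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    )
    estimator.fit({"train": "s3://my-bucket/train"})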

Week 2: Advanced Model Deployment and Monitoring

  • Deploy models with A/B testing, monitor model performance, and detect drift from baseline metrics.
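
An A/B test maps to two production variants behind one endpoint. A boto3 sketch, assuming two already-created SageMaker models with placeholder names:

    # Split traffic 50/50 between two model variants on a single endpoint.
    import boto3

    sm = boto3.client("sagemaker")

    sm.create_endpoint_config(
        EndpointConfigName="reviews-ab-config",
        ProductionVariants=[
            {"VariantName": "VariantA",
             "ModelName": "model-a",        # hypothetical existing model
             "InstanceType": "ml.m5.large",
             "InitialInstanceCount": 1,
             "InitialVariantWeight": 0.5},  # 50% of traffic
            {"VariantName": "VariantB",
             "ModelName": "model-b",
             "InstanceType": "ml.m5.large",
             "InitialInstanceCount": 1,
             "InitialVariantWeight": 0.5},
        ],
    )
    sm.create_endpoint(EndpointName="reviews-ab",
                       EndpointConfigName="reviews-ab-config")

    # Traffic can later be shifted to the winning variant without downtime
    # via sm.update_endpoint_weights_and_capacities(...).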

Week 3: Data Labeling and Human-in-the-Loop Pipelines

  • Label data at scale using private human workforces and build human-in-the-loop pipelines.
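
Routing a low-confidence prediction to human reviewers uses Amazon Augmented AI's human loops. A sketch, assuming a flow definition has already been created (the ARN and payload are placeholders):

    # Start a human loop so workers can re-label a low-confidence prediction.
    import json
    import uuid
    import boto3

    a2i = boto3.client("sagemaker-a2i-runtime")

    response = a2i.start_human_loop(
        HumanLoopName=f"relabel-{uuid.uuid4()}",
        FlowDefinitionArn=("arn:aws:sagemaker:us-east-1:123456789012:"
                           "flow-definition/fix-labels"),  # placeholder ARN
        HumanLoopInput={
            # Payload rendered in the worker task UI; the schema is task-specific.
            "InputContent": json.dumps({"taskObject": "I simply love this product!"})
        },
    )
    print(response["HumanLoopArn"])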

Program Instructors

Antje Barth, Instructor

Senior Developer Advocate, AI and Machine Learning, Amazon Web Services (AWS)

Sireesha Muppala, Instructor

Principal Solutions Architect, AI and Machine Learning, Amazon Web Services (AWS)

Shelbee Eigenbrode, Instructor

Principal Solutions Architect, AI and Machine Learning, Amazon Web Services (AWS)

Chris Fregly, Instructor

Principal Developer Advocate, AI and Machine Learning, Amazon Web Services (AWS)

Frequently Asked Questions

What will I learn in this Specialization?

In this Specialization, you will master how to remove the obstacles that keep data science projects from advancing quickly from idea to deployment, and how to scale to thousands of models or more. You will learn how to prepare massive datasets that originate from social media channels, mobile/web applications, or other public/private data sources and that do not fit on local hardware, and how to use them to train models. You will also learn how to tune models for the highest accuracy, and then deploy and manage them over time.

What is practical data science?

Practical data science is geared toward handling massive datasets that do not fit on your local hardware and could originate from multiple sources. One of the biggest benefits of developing and running data science projects in the cloud is the agility and elasticity that the cloud offers to scale up and out at minimum cost.

The cloud brings together data, low-cost storage, security, and ML services along with high-performance CPUs and GPUs for model training and deployment. As a result, you can store as much data as you need and use high-performance compute elastically, so it’s much faster to realize the value of machine learning.

This Specialization shows you how to analyze and clean data, extract relevant features, and use tools purpose-built for ML, such as visual development environments, debuggers, profilers, and pipelines, to scale model building, training, and deployment.

What background knowledge is necessary?

Learners should have a working knowledge of ML algorithms and principles, be proficient in Python programming at an intermediate level, and be familiar with Jupyter notebooks and statistics. We recommend completing the Deep Learning Specialization or an equivalent program.

Learners should also be familiar with the fundamentals of AWS and cloud computing. Completing the Coursera course AWS Cloud Technical Essentials, or similar, provides the prerequisite knowledge.

How long does it take to complete the Specialization?

This Specialization consists of three courses. At the rate of 5 hours a week, it typically takes 4 weeks to complete Course 1, 3 weeks to complete Course 2, and 3 weeks to complete Course 3.

Who created this Specialization?

This Specialization was created by Antje Barth, Sireesha Muppala, Shelbee Eigenbrode, and Chris Fregly.

Antje Barth is a Senior Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS). She is co-author of the O’Reilly book Data Science on AWS. Antje frequently speaks at AI/ML conferences, events, and meetups around the world. Previously, Antje worked in technical evangelism and solutions engineering at Cisco and MapR, focused on data center technologies, big data, and AI applications. Antje is also a co-founder of the Düsseldorf chapter of Women in Big Data.

Sireesha Muppala is a Principal Solutions Architect, AI/ML, at Amazon Web Services (AWS) who guides customers on architecting and implementing machine learning solutions at scale. She received her Ph.D. in Computer Science from the University of Colorado, Colorado Springs, and has authored several research papers, white papers, and blog articles. Sireesha frequently speaks at industry conferences, events, and meetups. She co-founded the Denver chapter of Women in Big Data.

Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She holds 6 AWS certifications and has been in technology for 23 years, spanning multiple industries, technologies, and roles. She is currently focused on combining her DevOps and ML background to deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee co-founded the Denver chapter of Women in Big Data.

Chris Fregly is a Principal Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is co-author of the O’Reilly book Data Science on AWS. Chris is also the founder of the global Data Science on AWS Meetup series. He regularly speaks at AI and machine learning conferences around the world, including O’Reilly AI, the Open Data Science Conference (ODSC), and the NVIDIA GPU Technology Conference (GTC).

How many courses are in the Specialization?

The Practical Data Science Specialization is made up of 3 courses.

Where can I take this Specialization?

You can enroll in the Practical Data Science Specialization on Coursera. You will watch videos and complete assignments on Coursera as well.

Do I need to take the courses in a specific order?

We recommend taking the courses in the prescribed order for a logical and thorough learning experience.

How much does the Specialization cost?

A Coursera subscription costs $49/month.

Is financial aid available?

Yes, Coursera provides financial aid to learners who cannot afford the fee.

Can I audit the courses for free?

You can audit the courses in the Practical Data Science Specialization for free. Note that you will not receive a certificate at the end of a course if you choose to audit it for free instead of purchasing it.

Will I receive a certificate?

You will receive a certificate at the end of each course if you pay for the courses and complete the programming assignments. There is a limit of 180 days of certificate eligibility, after which you must re-purchase the course to obtain a certificate. If you audit the course for free, you will not receive a certificate.

If you complete all 3 courses in the Specialization, you will also receive an additional certificate showing that you completed the entire Specialization.

Is the Specialization available for enterprise teams?

Visit coursera.org/business for more information, to choose a plan, and to contact Coursera. For each plan, you decide the number of courses every member can enroll in and the collection of courses they can choose from.

How do I get a receipt for my purchase?

  1. Go to your Coursera account.
  2. Click on My Purchases and find the relevant course or Specialization.
  3. Click Email Receipt and wait up to 24 hours to receive the receipt.