How to Build a Fully Automated Data Drift Detection Pipeline | by Khuyen Tran | Aug, 2023

The workflow comprises the following tasks:

  1. Fetch reference data from the Postgres database.
  2. Get the current production data from the web.
  3. Detect data drift by comparing the reference and current data.
  4. Append the current data to the existing Postgres database.
  5. When there is data drift, the following actions are taken:
     • Send a Slack message to alert the data team.
     • Retrain the model to update its performance.
     • Push the updated model to S3 for storage.
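
To make the control flow of these steps concrete, here is a minimal sketch in plain Python. It is not the article's implementation: the function names (`run_workflow`, `detect_drift`) and the callables passed in are hypothetical placeholders for the Postgres, web, Slack, training, and S3 pieces, and the drift check is a simple stand-in (a mean-shift test on a single numeric column) rather than any particular drift detection library.

```python
from statistics import mean, stdev
from typing import Callable, List


def detect_drift(reference: List[float], current: List[float],
                 threshold: float = 2.0) -> bool:
    """Stand-in drift check (an assumption, not the article's method).

    Flags drift when the current mean is more than `threshold` reference
    standard deviations away from the reference mean.
    """
    ref_std = stdev(reference)  # needs at least two reference values
    shift = abs(mean(current) - mean(reference))
    if ref_std == 0:
        return shift > 0
    return shift / ref_std > threshold


def run_workflow(
    fetch_reference: Callable[[], List[float]],   # step 1: read from Postgres
    fetch_current: Callable[[], List[float]],     # step 2: pull from the web
    append_current: Callable[[List[float]], None],  # step 4: write to Postgres
    send_alert: Callable[[str], None],            # step 5: Slack message
    retrain: Callable[[List[float]], object],     # step 5: retrain the model
    push_model: Callable[[object], None],         # step 5: upload to S3
) -> bool:
    """Run the steps in order and return whether drift was detected."""
    reference = fetch_reference()
    current = fetch_current()
    drifted = detect_drift(reference, current)
    # The current data is appended whether or not drift was found.
    append_current(current)
    if drifted:
        send_alert("Data drift detected")
        # Assumption: retraining uses the reference and current data together.
        model = retrain(reference + current)
        push_model(model)
    return drifted
```

Passing the external systems in as callables keeps the drift logic testable without a database, a Slack workspace, or an S3 bucket. Note that step 4 runs on every execution, while the three actions under step 5 run only when drift is found.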

This workflow is scheduled to run at specific times, such as 11:00 AM every Monday.
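
As an illustration of what such a schedule means, "11:00 AM every Monday" corresponds to the standard cron expression `0 11 * * 1`. The sketch below uses only the Python standard library to compute the next run time; the helper name `next_monday_11am` is hypothetical, and an orchestration tool would normally handle this for you.

```python
from datetime import datetime, timedelta

# Standard cron fields: minute hour day-of-month month day-of-week (1 = Monday)
CRON_MONDAY_11AM = "0 11 * * 1"


def next_monday_11am(now: datetime) -> datetime:
    """Return the first Monday 11:00 strictly after `now`."""
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0 in Python
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=11, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Already at or past 11:00 on a Monday, so move to the following week.
        candidate += timedelta(days=7)
    return candidate
```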

Image by Author

Overall, the workflow includes two types of tasks: data science and data engineering tasks.

Data science tasks, represented by pink boxes, are performed by data scientists and involve data drift…
