The workflow comprises the following tasks:
- Fetch reference data from the Postgres database.
- Get the current production data from the web.
- Detect data drift by comparing the reference and current data.
- Append the current data to the existing Postgres database.
- When there is data drift, the following actions are taken:
  - Send a Slack message to alert the data team.
  - Retrain the model to update its performance.
  - Push the updated model to S3 for storage.
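
The control flow of the steps above can be sketched in plain Python. This is a minimal illustration, not the article's implementation: the function names, the in-memory list standing in for the Postgres table, and the mean-shift drift check with its `threshold` parameter are all assumptions made for the example. The Slack, retraining, and S3 steps appear only as named placeholders.

```python
from statistics import mean, stdev


def detect_drift(reference, current, threshold=2.0):
    """Illustrative check (an assumption, not the article's method): flag drift
    when the current mean is more than `threshold` reference standard
    deviations away from the reference mean."""
    shift = abs(mean(current) - mean(reference)) / stdev(reference)
    return shift > threshold


def run_workflow(reference, current, database):
    """Run the steps in order and return (drift_detected, steps_executed)."""
    # The reference and current data are passed in, standing in for the fetches.
    steps = ["fetch_reference", "fetch_current"]

    drift = detect_drift(reference, current)
    steps.append("detect_drift")

    # Stands in for appending the current data to the Postgres database.
    database.extend(current)
    steps.append("append_current")

    if drift:
        # Placeholders for the Slack alert, retraining, and S3 upload.
        steps += ["send_slack_alert", "retrain_model", "push_model_to_s3"]
    return drift, steps
```

Note that the current data is appended whether or not drift is found, while the alert, retraining, and upload run only on the drift branch.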
This workflow is scheduled to run at specific times, such as 11:00 AM every Monday.
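
As an illustration of the example schedule, the sketch below computes the next Monday 11:00 AM run from a given timestamp using only the standard library. The function name `next_monday_11am` is hypothetical, and time zones are ignored for simplicity. In standard cron syntax, the same schedule would be written `0 11 * * 1`.

```python
from datetime import datetime, timedelta


def next_monday_11am(now: datetime) -> datetime:
    """Return the first Monday 11:00 strictly after `now`."""
    # datetime.weekday() returns 0 for Monday.
    days_ahead = (0 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=11, minute=0, second=0, microsecond=0
    )
    # If it is already Monday at or past 11:00, move to the following week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

In practice, an orchestration tool would handle this scheduling; the function only shows what "11:00 AM every Monday" resolves to.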
Overall, the workflow includes two types of tasks: data science and data engineering.
Data science tasks, represented by pink boxes, are performed by data scientists and involve data drift…