Unit testing dbt models has long been one of the most critical missing pieces of the dbt ecosystem. This article proposes a new unit testing approach that relies on standards and dbt best practices.
Ever since dbt introduced software engineering best practices to the realm of data engineering, its functionalities and the ecosystem around it have kept expanding to cover yet more areas of the data transformation space.
However, one essential piece of the “data engineering with software engineering best practices” puzzle remains unsolved: unit testing.
Why unit tests matter, why no line of code can be called “production-ready” without them, and how they differ from dbt Tests and data quality tests has already been brilliantly explained elsewhere. But if we had to summarize their importance in a one-minute elevator pitch, it would be this:
In data engineering there are generally two different elements that we want to test: the data and our code — dbt Tests (and other data quality systems/tools) allow us to test the data, while unit tests allow us to test our code.
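To make the distinction concrete, here is a minimal sketch of the "test our code" side. The model SQL, table schema, and fixture values below are all hypothetical, and SQLite is used purely as a convenient in-memory stand-in engine (a real dbt setup would run against your warehouse); the point is only the pattern: fixed inputs in, asserted outputs out, no production data involved. A dbt Test, by contrast, would run checks such as `not_null` or `unique` against the actual data.

```python
import sqlite3

# Hypothetical model SQL: a simplified stand-in for a dbt model
# that aggregates order totals per customer.
MODEL_SQL = """
select customer_id, sum(amount) as total_amount
from orders
group by customer_id
order by customer_id
"""

def run_model_on_fixture(fixture_rows):
    """Run the model SQL against an in-memory fixture table."""
    con = sqlite3.connect(":memory:")
    con.execute("create table orders (customer_id integer, amount real)")
    con.executemany("insert into orders values (?, ?)", fixture_rows)
    return con.execute(MODEL_SQL).fetchall()

# The unit test: hand-written inputs and hand-computed expected outputs.
result = run_model_on_fixture([(1, 10.0), (1, 5.0), (2, 7.5)])
assert result == [(1, 15.0), (2, 7.5)]
```

Because both the inputs and the expected outputs are pinned in the test itself, this check exercises the transformation logic alone, which is exactly what data quality tests cannot do.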
With the above in mind, it’s only natural that there have been multiple community initiatives to give dbt an open-source unit testing capability (such as Equal Experts’ dbt Unit Testing package or GoDataDriven’s dbt-focused Pytest plugin). However, these packages remain limited in functionality and come with a steep learning curve.
This article introduces a different approach, one that is both simpler and more elegant, relying on standards and dbt best practices to implement a scalable and reliable unit testing process.
Before diving into the approach, let’s first define the level at which we want to run our unit tests. The question to answer…