1. Introduction: Why Data Contracts Matter in Modern Analytics
Modern analytics stacks are becoming increasingly complex, with data flowing from numerous sources through multiple transformation layers before reaching dashboards, reports, and ML models. Without clear agreements about data structure, meaning, and quality, this complexity quickly leads to instability and distrust.
This is where data contracts come in: they are formal agreements that define what data should look like at different stages of your pipeline.
Data contracts aren’t just nice-to-have documentation; they’re executable specifications that verify your data matches expected patterns, which gives you:
- Prevention of silent data pipeline failures
- Immediate detection of schema violations
- Clearer communication between data producers and consumers
- Reduced time spent debugging unexpected data issues
In this guide, we’ll explore how to implement robust data contracts in dbt using the recently added schema contract enforcement features. You’ll learn implementation strategies, best practices, and how to integrate these contracts into your broader data quality framework.
2. Understanding dbt Contracts: More Than Just Types
Since version 1.5, dbt has offered model contracts that go beyond simple type annotations. Here’s what they enable you to do:
Core Contract Capabilities
- Define expected column names and data types for models
- Enforce presence of required columns
- Turn enforcement on or off per model or per folder
- Validate the model’s output against that definition when it is built in your data warehouse
Constraints in dbt-core 1.5+
dbt-core 1.5 also added constraints, which a contracted model can declare alongside its columns:
- Constraints on column values (e.g., not_null, unique, check)
- Primary key and foreign key declarations
- Error messages that list the mismatched columns when a contract fails
These features have transformed dbt contracts from basic type checking to comprehensive schema management tools that can enforce complex rules across your data models.
3. When to Use Contracts: Strategic Implementation
Data contracts aren’t needed everywhere, and over-implementing them can create unnecessary maintenance overhead. Here’s a strategic approach to where contracts deliver the most value:
High-Value Use Cases
- Interface layers between teams or systems
- Published data products consumed by multiple stakeholders
- Critical analytical tables that power key business decisions
- External data sharing with partners or customers
- Models with strict SLAs where failures are costly
Lower-Value Use Cases
- Internal intermediate models
- Exploratory or experimental models
- Rapidly evolving models during development
📌 Key Insight: Focus contract enforcement where stability and reliability matter most: interfaces, outputs, and critical business entities. Don’t try to contract everything at once.
4. How to Implement dbt Contracts: A Step-by-Step Guide
Let’s walk through implementing contracts in dbt from basic to advanced patterns.
Setting Up Your Project for Contracts
First, make sure you’re using dbt version 1.5 or later. Then, add these configurations to your dbt_project.yml to enable contracts:
# dbt_project.yml
models:
  your_project_name:
    +contract:
      enforced: false # Default: don't enforce contracts
    marts: # Apply contracts to exposed marts/dimensional models
      +contract:
        enforced: true # Enable contract enforcement
Basic Column-Level Contracts
For your first model contract, focus on defining the essential columns and their expected types:
# models/marts/core/schema.yml
version: 2
models:
  - name: dim_customers
    description: "Core customer dimension with validated schema"
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: varchar
        description: "Primary key for the customer dimension"
        constraints:
          - type: not_null
          - type: unique
      - name: customer_email
        data_type: varchar
        description: "Customer email address"
        constraints:
          - type: not_null
      - name: customer_name
        data_type: varchar
        description: "Customer full name"
      - name: signup_date
        data_type: date
        description: "Date when customer signed up"
      - name: total_orders
        data_type: integer
        description: "Count of customer's lifetime orders"
        constraints:
          - type: not_null
          - type: check # Only enforced on platforms that support check constraints
            expression: "total_orders >= 0"
What Happens Behind the Scenes
When you run dbt build, dbt will:
- Generate a contract validation query for dim_customers
- Compare the model’s actual schema against the expected contract
- Fail the build if:
  - Required columns are missing
  - Column data types don’t match
  - Column constraints are violated (on platforms that enforce them)
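To make the comparison step concrete, here is a minimal Python sketch of the kind of check described above. It is not dbt's implementation: the function name check_contract and the dictionary inputs are assumptions made for illustration, type names are compared case-insensitively as a simplification, and columns the contract never declared are reported as a mismatch.

```python
def check_contract(declared: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Compare declared contract columns with the columns a model returns.

    Both arguments map column name -> data type. Returns a list of problems;
    an empty list means the schema matches the contract.
    """
    problems = []
    for name, expected_type in declared.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name].lower() != expected_type.lower():
            problems.append(
                f"type mismatch on {name}: expected {expected_type}, got {actual[name]}"
            )
    # Columns the model returns but the contract never declared
    for name in actual:
        if name not in declared:
            problems.append(f"undeclared column: {name}")
    return problems


# Hypothetical contract and query result for dim_customers
contract = {"customer_id": "varchar", "signup_date": "date", "total_orders": "integer"}
returned = {"customer_id": "VARCHAR", "signup_date": "timestamp"}
for problem in check_contract(contract, returned):
    print(problem)
```

A build step would treat any non-empty result as a failure, which mirrors the "fail the build if" conditions listed above.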
Database-Specific Data Types
dbt compares each declared data_type against the type your warehouse reports, so declare types using names your platform understands. Here’s a reference table of common equivalents across warehouses:
Generic Type | Snowflake | BigQuery | Redshift | Postgres |
---|---|---|---|---|
varchar | VARCHAR | STRING | VARCHAR | VARCHAR |
integer | INTEGER | INT64 | INTEGER | INTEGER |
float | FLOAT | FLOAT64 | FLOAT | FLOAT |
numeric | NUMERIC | NUMERIC | NUMERIC | NUMERIC |
boolean | BOOLEAN | BOOL | BOOLEAN | BOOLEAN |
timestamp | TIMESTAMP | TIMESTAMP | TIMESTAMP | TIMESTAMP |
date | DATE | DATE | DATE | DATE |
array | ARRAY | ARRAY | SUPER | ARRAY |
object | OBJECT | STRUCT | SUPER | JSONB |
When writing data_type values, use the name from the column that matches your warehouse; type names are not always portable across databases.
5. Advanced Contract Enforcement Strategies
Once you’re comfortable with basic contracts, you can implement more sophisticated enforcement approaches.
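Contract Strictness Levels
dbt-core does not offer configurable strictness levels: the contract config is switched on with enforced, and an enforced contract is always exact. Every column the model returns must be declared with a matching name and data_type, and every declared column must be returned; only column order is ignored.
# models/marts/finance/schema.yml
models:
  - name: fct_transactions
    config:
      contract:
        enforced: true # Exact match on column names and types; there is no strict/non-strict option
What an enforced contract checks:
- Column names: the model must return exactly the declared columns, no more and no fewer
- Data types: each returned column must have the declared data_type
Partial Contracts
Because the match is exact, a contract cannot cover only a subset of a model’s columns. For large models, either declare every column, or put the contract on a narrower downstream model that selects only the critical columns:
# models/marts/finance/schema.yml
models:
  - name: large_analytical_model
    config:
      contract:
        enforced: true
    columns:
      # An enforced contract must list every column the model returns
      - name: transaction_id
        data_type: varchar
      - name: amount
        data_type: numeric(18,2)
      # ...declare the remaining columns here as well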
Conditional Contract Enforcement
Use dbt’s macro system to conditionally enforce contracts in different environments:
-- models/marts/core/dim_products.sql
{{
  config(
    contract = {
      'enforced': env_var('DBT_ENVIRONMENT', 'development') == 'production'
    }
  )
}}
select
  product_id,
  product_name,
  category,
  price
from {{ ref('stg_products') }}
This approach allows you to:
- Enforce contracts strictly in production
- Be more permissive during development and testing
6. Real-World Implementation: Data Contract Patterns
Let’s explore practical contract patterns for different types of models.
Pattern 1: Core Dimensional Models
For dimension tables that represent core business entities:
# models/marts/core/schema.yml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: varchar
        constraints:
          - type: not_null
          - type: unique
      - name: customer_email
        data_type: varchar
        tests:
          - not_null
          - unique
      - name: first_name
        data_type: varchar
      - name: last_name
        data_type: varchar
      - name: full_name
        data_type: varchar
      - name: signup_date
        data_type: date
      - name: customer_status
        data_type: varchar
        constraints:
          - type: check # Only enforced on platforms that support check constraints
            expression: "customer_status in ('active', 'inactive', 'churned')"
      - name: is_deleted
        data_type: boolean
        constraints:
          - type: not_null
Pattern 2: Fact Tables with Constraints
For fact tables that capture business events:
# models/marts/sales/schema.yml
models:
  - name: fct_orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: varchar
        constraints:
          - type: not_null
          - type: unique
      - name: customer_id
        data_type: varchar
        constraints:
          - type: not_null
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_date
        data_type: date
        constraints:
          - type: not_null
      - name: order_status
        data_type: varchar
        constraints:
          - type: not_null
          - type: check # Only enforced on platforms that support check constraints
            expression: "order_status in ('pending', 'processing', 'shipped', 'delivered', 'cancelled')"
      - name: item_count
        data_type: integer
        constraints:
          - type: not_null
          - type: check
            expression: "item_count >= 1"
      - name: order_amount
        data_type: numeric(18,2)
        constraints:
          - type: not_null
      - name: shipping_cost
        data_type: numeric(18,2)
      - name: tax_amount
        data_type: numeric(18,2)
      - name: total_amount
        data_type: numeric(18,2)
        constraints:
          - type: not_null
Pattern 3: Data Product APIs
For models explicitly exposed to consumers:
# models/apis/public_data_products/schema.yml
models:
  - name: product_api_daily_sales
    description: |
      Public data product showing daily sales aggregates.
      This model has a strict contract that will not change
      without explicit versioning and migration support.
    config:
      contract:
        enforced: true
    columns:
      - name: date_day
        data_type: date
        constraints:
          # Not unique on its own: the grain is one row per date_day and product_id
          - type: not_null
      - name: product_id
        data_type: varchar
        constraints:
          - type: not_null
      - name: product_name
        data_type: varchar
        constraints:
          - type: not_null
      - name: category
        data_type: varchar
        constraints:
          - type: not_null
      - name: total_quantity_sold
        data_type: integer
        constraints:
          - type: not_null
          - type: check # Only enforced on platforms that support check constraints
            expression: "total_quantity_sold >= 0"
      - name: total_revenue
        data_type: numeric(18,2)
        constraints:
          - type: not_null
      - name: average_unit_price
        data_type: numeric(18,2)
7. Implementing Contracts with Existing Tests
Data contracts and tests serve complementary purposes:
- Contracts enforce structural expectations (columns, types)
- Tests verify data quality expectations (uniqueness, relationships)
Here’s how to implement both effectively:
Combine Contracts with Tests
For comprehensive validation, combine contracts with tests:
# models/marts/core/schema.yml
models:
  - name: dim_products
    config:
      contract:
        enforced: true
    columns:
      - name: product_id
        data_type: varchar
        constraints:
          - type: not_null
        tests:
          - unique
      - name: product_name
        data_type: varchar
        # No not_null constraint here: the test below tolerates some missing names
        tests:
          - dbt_utils.not_null_proportion:
              at_least: 0.99 # Allow up to 1% missing names
      - name: category_id
        data_type: varchar
        constraints:
          - type: not_null
        tests:
          - relationships:
              to: ref('dim_categories')
              field: category_id
      - name: price
        data_type: numeric(18,2)
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000
              mostly: 0.95 # Allow some outliers
Testing vs. Constraints: When to Use Each
Validation Type | Use Constraints When | Use Tests When |
---|---|---|
Not null checks | Critical for model functionality | Monitoring data quality |
Uniqueness | Core identity requirement | Statistical validation |
Accepted values | Small, stable set of values | Larger, changing set of values |
Data boundaries | Hard limits that shouldn’t be crossed | Statistical ranges with exceptions |
Relationships | N/A - Use tests for this | Always use tests for relationships |
📌 Best Practice: Use constraints for structural guarantees that should never be violated, and tests for data quality checks that may have exceptions or require monitoring.
8. Evolving Data Contracts Over Time
Data contracts shouldn’t be rigid - they need to evolve. Here’s how to manage that evolution:
Contract Versioning Strategy
- Minor changes (adding optional columns, relaxing constraints):
  - Can be done without breaking consumers
  - Update documentation and notify users
- Major changes (removing columns, changing types, adding required fields):
  - Create a new version of the model
  - Support both versions during migration period
  - Explicitly deprecate the old version
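The minor/major rule above can be expressed as a small classifier. The following Python sketch is illustrative only: the Column dataclass and the classify_change function are hypothetical names, and "required" is taken here to mean that a column carries a not_null constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    data_type: str
    required: bool = False  # Assumption: True when the column has a not_null constraint


def classify_change(old: dict[str, Column], new: dict[str, Column]) -> str:
    """Return 'major', 'minor', or 'none' for a change between two contract versions."""
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    shared = old.keys() & new.keys()

    retyped = {name for name in shared if old[name].data_type != new[name].data_type}
    added_required = {name for name in added if new[name].required}

    # Removing columns, changing types, or adding required fields breaks consumers
    if removed or retyped or added_required:
        return "major"

    # Adding optional columns or relaxing a constraint is backward compatible
    relaxed = {name for name in shared if old[name].required and not new[name].required}
    if added or relaxed:
        return "minor"
    return "none"


v1 = {"customer_id": Column("varchar", required=True), "name": Column("varchar")}
v2 = {"customer_id": Column("varchar", required=True), "first_name": Column("varchar")}
print(classify_change(v1, v2))  # 'name' was removed, so this counts as a major change
```

A check like this could run in code review to flag when a pull request needs a new model version rather than an in-place edit.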
Example: Versioning a Contract
# Original model
models:
  - name: customer_api_v1
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: varchar
      - name: email
        data_type: varchar
      - name: name
        data_type: varchar
When adding a breaking change:
# New version with breaking changes
models:
  - name: customer_api_v2
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: varchar
      - name: email
        data_type: varchar
      - name: first_name # Split name into components
        data_type: varchar
      - name: last_name
        data_type: varchar
      - name: phone # New required field
        data_type: varchar
        constraints:
          - type: not_null
  # Keep old version during transition
  - name: customer_api_v1
    config:
      contract:
        enforced: true
      materialized: view # Make it a view on top of v2
    columns:
      - name: customer_id
        data_type: varchar
      - name: email
        data_type: varchar
      - name: name
        data_type: varchar
With corresponding SQL for backward compatibility:
-- models/apis/customer_api_v1.sql
{{
  config(
    contract = {
      'enforced': true
    },
    materialized = 'view'
  )
}}
select
  customer_id,
  email,
  concat(first_name, ' ', last_name) as name
from {{ ref('customer_api_v2') }}
9. Handling Failures and Troubleshooting
When contracts fail, you need clear debugging paths:
Common Contract Failure Scenarios
Failure Type | Example Error | Troubleshooting Steps |
---|---|---|
Missing column | Column 'customer_status' not found in model | Check model SQL for missing column, verify it’s being selected |
Type mismatch | Expected type 'numeric', got 'varchar' | Examine source data, add explicit casting in model |
Constraint violation | Not null constraint failed for column 'order_id' | Check for null handling in joins, verify source data quality |
Contract reference error | Contract not found for model 'dim_products' | Check schema file paths, verify model name spelling |
Debugging Contract Issues
When you encounter contract errors, here’s a systematic approach:
- Examine the error message for specific column and type information
- Preview model data with a simple SELECT to verify actual types and values
- Check model SQL for missing columns or incorrect transformations
- Verify upstream data hasn’t changed unexpectedly
- Consider if contract is too strict for the current use case
Example troubleshooting command:
# Run with --fail-fast to stop at the first error
dbt build --select my_model --fail-fast
Then check the compiled SQL and actual output:
# Compile the model without running it
dbt compile --select my_model
# Check the compiled SQL under ./target/compiled/{project_name}/ (it mirrors your models/ directory layout)
# Run this in your warehouse to examine actual data and types
10. Advanced Patterns: Beyond Basic Contracts
Let’s explore some advanced patterns for large-scale implementations:
Contract Inheritance
For related models that share similar contracts:
# Define reusable column definitions using YAML anchors.
# YAML cannot concatenate lists, so each column gets its own anchor.
# (Verify how your dbt version treats an extra top-level key such as base_contracts.)
base_contracts:
  user_id: &user_id
    name: user_id
    data_type: varchar
    constraints:
      - type: not_null
      - type: unique
  email: &email
    name: email
    data_type: varchar
    constraints:
      - type: not_null
  signup_date: &signup_date
    name: signup_date
    data_type: timestamp
models:
  # Inherit the base contract
  - name: dim_users
    config:
      contract:
        enforced: true
    columns: # Reference the base columns
      - *user_id
      - *email
      - *signup_date
  # Extend the base contract
  - name: dim_premium_users
    config:
      contract:
        enforced: true
    columns:
      # Include base columns
      - *user_id
      - *email
      - *signup_date
      # Add more columns
      - name: subscription_level
        data_type: varchar
      - name: monthly_fee
        data_type: numeric(12,2)
Automated Contract Generation
For existing models without contracts, you can bootstrap contracts using dbt’s run-operation:
-- macros/generate_model_contract.sql
{% macro generate_model_contract(model_name) %}
  {# Look up the already-built relation in the current target schema #}
  {% set relation = adapter.get_relation(
      database=target.database,
      schema=target.schema,
      identifier=model_name
  ) %}
  {% if relation %}
    {% set columns = adapter.get_columns_in_relation(relation) %}
    {% if execute %}
      {{ log('# Contract for ' ~ model_name ~ ':', info=True) }}
      {{ log('columns:', info=True) }}
      {% for column in columns %}
        {{ log('  - name: ' ~ column.name, info=True) }}
        {{ log('    data_type: ' ~ column.data_type, info=True) }}
      {% endfor %}
    {% endif %}
  {% else %}
    {{ exceptions.raise_compiler_error("Model " ~ model_name ~ " does not exist in the current environment.") }}
  {% endif %}
{% endmacro %}
Run it to print contract YAML that you can paste into your schema file:
dbt run-operation generate_model_contract --args '{"model_name": "dim_customers"}'
11. Integration with dbt Project Structure
Contracts should be integrated thoughtfully into your overall dbt project structure:
Organizing Contract Files
For large projects, consider these options:
- Embedded in schema.yml (simplest approach)
  models/
    marts/
      core/
        schema.yml # Contains both tests and contracts
- Dedicated contract files (for complex projects)
  models/
    marts/
      core/
        schema.yml # Contains tests and docs
        contracts.yml # Contains only contracts
Contract Implementation by Layer
Different layers of your dbt project require different contract approaches:
Layer | Contract Approach | Example |
---|---|---|
Sources | Monitoring, not enforcement | Source freshness checks |
Staging | Light contracts on critical fields | Basic type enforcement |
Intermediate | Minimal contracts | Focus on critical models only |
Marts/Dimensions | Comprehensive contracts | Full schema enforcement |
APIs/Exposed models | Strict contracts | Version and document carefully |
Naming Conventions for Versioned Contracts
For explicit contract versioning:
models/
  apis/
    v1/
      customer_api.sql
      schema.yml
    v2/
      customer_api.sql
      schema.yml
12. Integrating Contracts with Your Data Development Lifecycle
To get maximum value from contracts, integrate them into your full development cycle:
CI/CD Integration
- Run contract validation in CI:
  # .github/workflows/dbt-contracts.yml
  name: Validate dbt Contracts
  on:
    pull_request:
      branches: [ main ]
  jobs:
    validate-contracts:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v3
        - name: Setup dbt
          uses: dbt-labs/dbt-github-actions/setup@v1.0
        - name: Run contract validation
          # Contracts are checked when models are built, not at compile time
          run: dbt build --select tag:contract-critical
- Pre-commit hooks for early feedback:
  # .pre-commit-config.yaml
  repos:
    - repo: local
      hooks:
        - id: dbt-compile
          name: dbt compile
          entry: dbt compile --select tag:contract-critical
          language: system
          pass_filenames: false
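As a complement to the CI steps above, a pipeline can also confirm that every model tagged contract-critical actually has its contract switched on. The Python sketch below is one possible gate, not an official dbt tool. It assumes the layout of target/manifest.json in recent dbt versions (a nodes mapping whose model entries carry resource_type, name, tags, and config.contract.enforced); check these keys against your own manifest. The function names are hypothetical.

```python
import json
from pathlib import Path


def _contract_enforced(node: dict) -> bool:
    """Read config.contract.enforced defensively; missing keys count as not enforced."""
    contract = (node.get("config") or {}).get("contract") or {}
    return bool(contract.get("enforced", False))


def find_unenforced(manifest: dict, tag: str = "contract-critical") -> list[str]:
    """Return names of models carrying `tag` whose contract is not enforced."""
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
        and tag in node.get("tags", [])
        and not _contract_enforced(node)
    )


def main(manifest_path: str = "target/manifest.json") -> int:
    """Exit-code style entry point: 0 when every tagged model is contracted."""
    manifest = json.loads(Path(manifest_path).read_text())
    offenders = find_unenforced(manifest)
    for name in offenders:
        print(f"contract not enforced: {name}")
    return 1 if offenders else 0
```

In a CI job this would run after any dbt command that writes the manifest, with the script ending in sys.exit(main()) so a non-zero result fails the pull request.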
Documentation Integration
Enhance your dbt docs with contract information:
# models/marts/core/schema.yml
models:
  - name: dim_customers
    description: |
      Core customer dimension table.
      ## Contract Information
      This model has a strict contract that guarantees:
      - Every row has a unique customer_id
      - Email addresses are always present
      - Customer status is one of: active, inactive, churned
      See our [data contract documentation](link-to-docs) for details on
      breaking vs. non-breaking changes.
    config:
      contract:
        enforced: true
This documentation will appear in your dbt docs site, helping users understand the guarantees provided by the contract.
13. Measuring Contract Effectiveness
To demonstrate the value of contracts, track metrics like:
- Contract coverage:
  - % of critical models with contracts
  - % of columns under contract enforcement
- Contract violations:
  - Count of contract failures caught in CI/CD
  - Time saved by early detection
- Consumer impact:
  - Reduction in downstream data quality issues
  - Decreased time spent debugging schema issues
Use these metrics to guide your contract implementation strategy and show the business value of your data quality investments.
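The two coverage metrics can be computed from dbt's manifest. The following Python sketch shows one way to do it under stated assumptions: model entries in the manifest expose tags, columns, and config.contract.enforced (verify this against your own target/manifest.json), critical models are identified by a contract-critical tag, and only columns documented in YAML are counted because those are the ones the manifest records. The function name contract_coverage is hypothetical.

```python
def contract_coverage(manifest: dict, critical_tag: str = "contract-critical") -> dict[str, float]:
    """Compute contract coverage percentages from a parsed dbt manifest."""

    def is_enforced(node: dict) -> bool:
        contract = (node.get("config") or {}).get("contract") or {}
        return bool(contract.get("enforced", False))

    def pct(part: int, whole: int) -> float:
        # Avoid dividing by zero when a project has no matching models or columns
        return round(100.0 * part / whole, 1) if whole else 0.0

    models = [
        node for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    ]
    critical = [m for m in models if critical_tag in m.get("tags", [])]
    contracted = [m for m in models if is_enforced(m)]

    total_columns = sum(len(m.get("columns") or {}) for m in models)
    contracted_columns = sum(len(m.get("columns") or {}) for m in contracted)

    return {
        "critical_models_with_contracts_pct": pct(
            sum(1 for m in critical if is_enforced(m)), len(critical)
        ),
        "columns_under_contract_pct": pct(contracted_columns, total_columns),
    }
```

Tracking these two numbers over time (for example, per release) gives a simple trend line for the coverage metrics listed above.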
14. Conclusion: The Future of Data Contracts in dbt
Data contracts represent a significant advancement in how we manage data quality and reliability in analytics workflows. By defining and enforcing expectations about data structure directly in dbt, we create more resilient pipelines and clearer communication between data producers and consumers.
As you implement data contracts in your organization:
- Start small with your most critical models
- Balance strictness against development velocity
- Integrate contracts with your existing testing strategy
- Document your contracts for stakeholders
- Version and evolve contracts intentionally
Remember that contracts are not a silver bullet for data quality, but part of a comprehensive approach that includes testing, monitoring, and governance. Use them strategically to reinforce the foundation of your data platform and build trust in your analytics outputs.
What are your experiences with data contracts in dbt? Have you found certain patterns particularly effective? Share your thoughts in the comments!