Whether you’re a solo analytics engineer or part of a large data team, documentation is often an afterthought in data projects, but in dbt, it’s a first-class citizen that can dramatically improve your team’s understanding and efficiency.

dbt (data build tool) is a popular open-source framework for transforming raw data in your warehouse with SQL and maintaining modular, testable pipelines.

Who this guide is for:

  • Analytics engineers using dbt for data transformation
  • Data engineers establishing documentation standards
  • Data team leads looking to improve team collaboration
  • Anyone transitioning from ad-hoc SQL to dbt pipelines

This guide walks you through creating comprehensive, meaningful documentation for your dbt projects, with practical examples from real-world implementations.

Why Documentation Matters in dbt

Documentation in dbt isn’t just about writing descriptions; it’s about creating a living, breathing map of your data transformations. Good documentation:

  • Provides clarity on data lineage - Traces data from source to consumption
  • Helps new team members understand complex transformations - Reduces onboarding time
  • Acts as a self-service resource for analysts and stakeholders - Reduces ad-hoc questions
  • Supports data governance and compliance efforts - Critical for regulated industries
  • Improves data quality - Well-documented data is typically better understood and maintained

Here’s an example of before/after documentation:

Before documentation - Hard to understand the purpose and relationships:

version: 2
 
models:
  - name: order_items
    columns:
      - name: id
      - name: order_id
      - name: product_id
      - name: quantity
      - name: price

After documentation - Clear purpose and relationships:

version: 2
 
models:
  - name: order_items
    description: >
      Individual line items within customer orders. 
      One order can have multiple items.
    columns:
      - name: id
        description: Surrogate key, format 'oi_[order_id]_[product_id]'
        data_type: INTEGER
        tests:
          - unique
          - not_null
      - name: order_id
        description: Foreign key to orders table
        data_type: INTEGER
        tests:
          - relationships:
              to: ref('orders')
              field: order_id
      - name: product_id
        description: Foreign key to products table
        data_type: INTEGER
        tests:
          - relationships:
              to: ref('products')
              field: product_id
      - name: quantity
        description: Number of units purchased (integer)
        data_type: INTEGER
      - name: price
        description: Price per unit in USD at time of purchase
        data_type: FLOAT

It’s pretty straightforward that the second example is way more convenient for both technical and non-technical users to have as much information as possible regarding the order_items model and its columns.

Documentation Strategies in dbt

1. Column-Level Documentation

Use .md files to document individual columns with precise, informative descriptions. It’s pretty useful because each time I want to document the order_id column in different models I can use the {{ doc() }} function (more information on dbt documentation).

columns:
      - name: order_id
        description: '{{ doc("order_id") }}'
        data_type: STRING

Here’s an example based on our orders.md on how to document using Markdown:

{% docs order_id %}
Unique identifier for each order in the Shopify system. 
Primary key for tracking individual transactions.
{% enddocs %}
 
{% docs session_id %}
Unique identifier of a browsing session in Shopify.
 
Helps link multiple user interactions (e.g., page views, cart additions) within the same visit.
{% enddocs %}

When to use {{ doc() }} vs inline description?

  • Use {{ doc() }} when you have reusable documentation blocks that appear in multiple models or columns (e.g., order_id, customer_id). This avoids duplication and keeps your docs DRY (Don’t Repeat Yourself).
  • Use inline description: when the context is unique to that specific model or column. For example, a calculated metric that only appears once, or something highly contextual to a particular dataset.

An example of how it looks in the dbt docs UI

Best Practices:

  • Be specific about the column’s purpose
  • Explain any calculations or transformations
  • Mention the data type and format
  • Provide context about the source system
  • Document business rules affecting the data

✅ Good vs. ❌ Bad Description Examples

Bad:

{% docs total_price %}
Order amount
{% enddocs %}

❌ Too vague — it doesn’t explain what is included in the amount, what currency is used, or where it comes from.

Good:

{% docs total_price %}
Total monetary value of the order in USD.  
Includes the cost of all items, taxes, discounts, and shipping fees.
{% enddocs %}

✅ Clear, specific, and informative. Includes data type, units, and all components included in the total.

NOTE

In the rest of the tutorial I describe directly the column or models in the yaml to give context information but in the repository I’m using doc blocks.

2. Model-Level Documentation

In your schema.yml files, add descriptions that explain the purpose and transformation logic of each model:

version: 2
 
models:
  - name: fct_revenue_by_campaign
    description: >
      Aggregates order details with campaign information to provide a comprehensive view of campaign performance. Used by marketing team to evaluate ROI on campaigns and optimize budget allocation.
    columns:
      - name: campaign_id
        description: Unique identifier for marketing campaigns
	    data_type: INTEGER
      - name: orders_count
        description: Number of distinct orders attributed to each campaign
	    data_type: INTEGER
        tests:
          - not_null

Best Practices:

  • Describe the model’s business purpose, not just its technical function
  • Explain any complex joins or transformations
  • Link related models or sources
  • Add tests to validate data quality
  • Include information about who uses the model and for what purpose

3. Enhancing Model Documentation with meta and tags

In addition to writing good descriptions, dbt allows you to enrich your models with metadata and classification using the meta and tags fields in your schema.yml files.

Use tags to categorize models

Tags help you classify models by domain, purpose, or team ownership. You can filter and search by tags in the dbt Docs UI, or use them in automation workflows like CI/CD, testing, or documentation generation.

Examples:

`tags: ["marketing", "finance", "core_data", "daily_refresh"]`

Use meta for structured metadata

The meta field allows you to store additional model-level information that can be used internally or by external tools like data catalogs and dashboards. This structured metadata can improve documentation and enable automation.

Useful fields to include:

meta:
  owner: "marketing_analytics"
  data_owner: "jane.smith@company.com"
  refreshed: "daily"
  upstream_sources: ["shopify", "google_analytics"]
  related_dashboards: ["marketing_campaign_overview", "executive_summary"]
  contains_pii: false
  sla_hours: 4
  maturity: "production"  # vs. "development" or "deprecated"

You can include any custom key-value pairs relevant to your team. Some use cases:

  • Identifying model owners for alerts or governance
  • Indicating data maturity (development, production, deprecated)
  • PII identification for compliance
  • SLA requirements for monitoring
  • Related business processes or dashboards

For more information, see the dbt official documentation on meta fields.

4. Source Documentation

Document your source tables to provide transparency about raw data:

version: 2
 
sources:
  - name: shopify
    description: >
      Raw e-commerce data from Shopify platform.
      Data is loaded via Fivetran connector with 15-minute sync frequency.
    tables:
      - name: orders
        description: Contains all customer order transactions
        columns:
          - name: order_id
            description: Unique identifier for Shopify orders
            data_type: INTEGER
            tests:
              - unique
              - not_null
          - name: customer_id
            description: Reference to the customer who placed the order
            data_type: INTEGER
          - name: created_at
            description: Timestamp when the order was created in UTC
            data_type: TIMESTAMP
          - name: total_amount
            description: Total order amount in USD
            data_type: FLOAT

An example of documented source below:

5. Documenting Tests

Tests are a critical part of data quality, and documenting their purpose helps team members understand expectations for the data.

version: 2
 
models:
  - name: customers
    description: Deduplicated customer records from multiple sources
    columns:
      - name: customer_id
        description: Unique identifier for each customer
        data_type: INTEGER
        tests:
          - unique:
              severity: error
              meta:
                description: "Each customer should appear exactly once in this table"
          - relationships:
              to: ref('orders')
              field: customer_id
              meta:
                description: "All customers should have at least one order"

Advanced Documentation Techniques

Documentation Generation & Hosting

Use dbt docs generate to create interactive documentation:

# Generate documentation
dbt docs generate
 
# Serve documentation locally
dbt docs serve --port 8080

This creates a web interface at http://localhost:8080 where team members can explore your entire data model.

Hosting Options for Production

Since the website generated by dbt docs generate is a static website, these hosting options are perfect for production environments:

  1. Static Site Hosting:
    • Amazon S3 + CloudFront (or equivalent on GCP / Azure)
    • GitHub Pages
    • Netlify or Vercel
  2. Internal Knowledge Base - Embed within internal wikis like Confluence

Another tutorial will be published soon in order to explain how you could deploy the dbt documentation on GitHub Pages.

Integrating Documentation into Your Workflow

Documentation is most effective when it’s integrated into your development workflow:

Documentation-First Approach

  1. Start with documentation before writing SQL
    • Define the purpose and expected columns
    • Align with stakeholders on naming and definitions
    • Document expected business rules
  2. Include documentation in code reviews
    • Make documentation a required part of pull requests
    • Add documentation checklist items to PR templates
  3. Automate documentation checks
    • Use CI/CD to ensure models have descriptions
    • Flag models or columns missing documentation

Example PR Template

## Changes
- Added new customer segmentation model
 
## Documentation
- [x] Model has a clear description
- [x] All columns are documented
- [x] Added appropriate tests
- [x] Updated lineage documentation for downstream models

Common Documentation Pitfalls to Avoid

  1. Vague Descriptions:

    • Bad: “This table has order data”
    • Good: “Contains completed e-commerce orders from Shopify, including transaction details and customer information”
  2. Outdated Documentation:

    • Treat documentation as code
    • Version control your docs
    • Update documentation with each model change
  3. Overlooking Transformations:

    • Explain complex calculations
    • Document any data cleaning or aggregation logic
  4. Inconsistent Standards:

    • Establish team-wide documentation conventions
    • Use templates for common model types
  5. Documentation/Code Mismatch:

    • Ensure documentation reflects the current implementation
    • Review documentation during refactoring

Documentation for Different Audiences

One last tip: it’s also important to tailor your documentation for different user groups depending on their business/technical skills. A Chief Marketing Officier would prefer to understand the business outcomes of a model while an Analytics Engineer would focus more on the calculation method.

Technical Users (Data Engineers, Analytics Engineers):

  • Include implementation details
  • Document optimization techniques
  • Explain complex SQL operations

Business Users (Analysts, Stakeholders):

  • Focus on business meaning and context
  • Explain calculation methodologies
  • Link to relevant business processes

An example of dual-purpose documentation:

models:
  - name: fct_customer_ltv
    description: >
      Customer lifetime value calculations.
      
      **Business Context:** Used to identify high-value customers for
      retention marketing programs.
      
      **Technical Details:** Uses a 3-year rolling window and discounted
      cash flow methodology with 10% annual discount rate.

Conclusion

Effective dbt documentation transforms your data project from a black box into a transparent, understandable system. By investing time in clear, comprehensive documentation, you create a valuable resource that enhances collaboration, reduces onboarding time, and improves overall data literacy.

Documentation in dbt is more than just comments—it’s a critical component of your data infrastructure that enables self-service analytics and supports data governance efforts.

Summary of Best Practices

AreaBest Practices
Column Documentation• Be specific and detailed
• Include data types and units
• Explain transformations
Model Documentation• Focus on business purpose
• Document complex logic
• Link related models
Metadata• Use consistent tags
• Add ownership information
• Document SLAs and refresh schedules
Workflow• Document before coding
• Review docs in PRs
• Keep documentation updated
Usability• Consider different audiences
• Host docs for easy access
• Maintain consistency

You can explore a live version of the dbt documentation for this tutorial project here:
🌐 dbt Documentation Example (GitHub Pages)

👉 The full source code for the project is available on GitHub:
📂 p-munhoz/dbt-documentation-tutorial

Have your own best practices? Share them with the community to help build better data culture together!

Happy documenting! 📝