Context
You have probably renamed or removed a model in dbt before, only to notice a few weeks or months later that the old model and the new one still co-exist in your data warehouse.
This is particularly annoying because it can be misleading, especially when you aren’t the owner of the table and don’t know which one is deprecated.
Unfortunately, dbt doesn’t offer a single command to remove deprecated models. The good news is that with macros you can build one yourself.
Two macros (or one if you want)
I created two macros to do it:
list_deleted_models.sql
{% macro list_deleted_models() -%}
-- First step, listing all the current_models
{%- if execute -%}
{%- set current_models = [] -%}
{%- for node in graph.nodes.values() -%}
{#- graph.nodes also contains tests and analyses; only keep materialized resources -#}
{%- if node.resource_type in ['model', 'seed', 'snapshot'] -%}
{%- do current_models.append(node.name) -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
-- Build, for each table, a ready-to-use reference like TABLE `project.dataset.table`
-- (replace() strips the 'BASE ' prefix from the BASE TABLE type)
select
table_catalog,
table_schema,
table_name,
concat(
replace(table_type, 'BASE ', ''),
' `', table_catalog, '.', table_schema, '.', table_name, '`'
) as command
from `region-eu.information_schema.tables` -- my datasets are in the EU region; adjust the region to yours
where
-- I filter out every table that doesn't exist in my current_models variable
upper(table_name) not in (
{%- for model in current_models -%}
'{{ model.upper() }}' {%- if not loop.last -%}, {% endif -%}
{%- endfor -%}
)
-- I use regexes to only consider models in the staging/hub/mart/expo/snapshot/seeds layers, to avoid removing raw data
and regexp_contains(
lower(table_name), r'^(stg|prep|fact|fct|dim|mart|expo|snapshot|seed)_'
)
and regexp_contains(
lower(table_schema), r'_(staging|seeds|mart|hub|expo|snapshots)$'
)
{%- endmacro %}
delete_old_models.sql
{% macro delete_old_models() -%}
{% set query = list_deleted_models() %}
{% set results = run_query(query) %}
{% set drop_commands = [] %}
{# Process the query results #}
{% if results and results.columns %}
{% for row in results.rows %}
{% set catalog = row['table_catalog'] %}
{% set dataset = row['table_schema'] %}
{% set table = row['table_name'] %}
{# Construct the DROP TABLE command; backticks are required when the project id contains hyphens #}
{% set drop_command = "DROP TABLE `{}.{}.{}`".format(catalog, dataset, table) %}
{% do drop_commands.append(drop_command) %}
{% endfor %}
{% endif %}
{# Execute each DROP TABLE command #}
{% for command in drop_commands %}
{% do run_query(command) %}
{% endfor %}
{% endmacro %}
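Dropping tables automatically can be scary, so before enabling this you may want to preview what would be removed. Here is a minimal sketch of such a helper (the macro name preview_deleted_models is my own addition, not part of the macros above):

```sql
{% macro preview_deleted_models() -%}
{# Runs the detection query and logs the candidates instead of dropping them #}
{% set results = run_query(list_deleted_models()) %}
{% for row in results.rows %}
{% do log("Would drop: " ~ row['table_catalog'] ~ "." ~ row['table_schema'] ~ "." ~ row['table_name'], info=True) %}
{% endfor %}
{%- endmacro %}
```

You can invoke it on demand with dbt run-operation preview_deleted_models.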
I split the logic into two macros to avoid maintaining one big macro, but it is perfectly fine to merge them into a single one.
on-run-end config
Then I use the on-run-end project config in dbt_project.yml to execute the delete_old_models macro at the end of each run:
on-run-end:
  - "{{ delete_old_models() }}"
With this in place, removed models are dropped from BigQuery every day.
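If you run the same project against several targets, you may not want the cleanup to fire from a developer's machine. One option is to wrap the body of delete_old_models in a target check, along these lines (the target name prod is an assumption; adapt it to your profiles.yml):

```sql
{% macro delete_old_models() -%}
{# Only clean up from the production target; 'prod' is a hypothetical target name #}
{% if target.name == 'prod' %}
{# ... body of the macro shown above ... #}
{% endif %}
{%- endmacro %}
```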
Conclusion
Here is a quick method to delete deprecated models on BigQuery, but I’m pretty convinced the same approach could be adapted to other warehouses.
Source of inspiration: https://discourse.getdbt.com/t/faq-cleaning-up-removed-models-from-your-production-schema/113