You’ve hit the point where copy-pasting SQL between models feels wrong. The same date truncation logic appears in five places. The same null-handling pattern keeps showing up. Your instinct says “make a macro,” and that instinct is worth questioning.
Most dbt projects have too many macros, not too few. The drive to eliminate duplication creates abstractions that obscure what the SQL actually does. A model that requires reading four macro files just to understand a single query is unreadable. Good macro design creates code that your future self (and your teammates) can maintain, not just code with fewer repeated lines. If you’re looking for macros to adopt rather than build, see essential macros every dbt project needs. For the fundamentals of Jinja in dbt, see the dbt macros guide.
The rule of three
The first time you write a pattern, just write it. Inline. No macro.
The second time, note the pattern. Maybe add a comment: “Similar logic in base__shopify__orders.” Still no macro.
The third time, you have enough information to extract something useful. You’ve seen how the pattern actually gets used. You know which parts vary and which stay constant. Now you can build an abstraction that fits reality instead of guessing at what you might need.
This rule comes from software engineering practice, but dbt’s own documentation endorses it: “Favor readability when mixing Jinja with SQL, even if it means repeating some lines.”
In practice, the progression looks like this:
```sql
-- First time: inline
SELECT
    order_id,
    ROUND(amount_cents / 100.0, 2) AS order__amount_dollars
FROM {{ ref('base__shopify__orders') }}

-- Second time: still inline, but you notice the pattern
SELECT
    payment_id,
    ROUND(amount_cents / 100.0, 2) AS payment__amount_dollars
FROM {{ ref('base__stripe__payments') }}

-- Third time: now extract the macro
{% macro cents_to_dollars(column_name, scale=2) %}
    ROUND({{ column_name }} / 100.0, {{ scale }})
{% endmacro %}
```
The temptation is to skip ahead. You see the pattern on day one and think, “I’ll definitely use this again.” Sometimes you’re right. More often, the second and third uses have subtle differences that break your original abstraction. Waiting lets you build the right abstraction instead of the imagined one.
The cost of over-abstraction
Premature macros create three problems:
Reduced readability. Every macro reference is a mental context switch. The reader has to open another file, understand the macro’s logic, then return to the model. With inline code, everything is visible in one place.
Increased complexity. To handle all the cases you imagined, you add parameters. Each parameter is a decision point. A macro with seven parameters is harder to use correctly than inline code would have been.
Fragile code. When a macro serves five models, changes get scary. You want to fix the behavior for one model but can’t risk breaking the others. So you add another parameter, making the complexity problem worse.
The rule of three is about gathering enough information to build the right abstraction, not just counting occurrences.
Single responsibility macros
Once you’ve decided a macro is worth creating, keep it focused on one job.
This macro tries to do too much:
```sql
{% macro process_amount(column, apply_discount=false, discount_rate=0.1, convert_currency=false, target_currency='USD') %}
    {% set result = column %}
    {% if apply_discount %}
        {% set result = result ~ ' * (1 - ' ~ discount_rate ~ ')' %}
    {% endif %}
    {% if convert_currency %}
        {% set result = result ~ ' * get_exchange_rate(\'' ~ target_currency ~ '\')' %}
    {% endif %}
    {{ result }}
{% endmacro %}
```
This macro has five parameters and combines three different operations. Using it requires understanding all the flags, testing it means covering every combination, and changing the discount logic risks breaking currency conversion.
Break it into focused macros instead:
```sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ROUND({{ column_name }} / 100.0, {{ scale }})
{% endmacro %}

{% macro apply_discount(amount_column, discount_rate) %}
    {{ amount_column }} * (1 - {{ discount_rate }})
{% endmacro %}

{% macro convert_currency(amount_column, target_currency) %}
    {{ amount_column }} * {{ get_exchange_rate(target_currency) }}
{% endmacro %}
```
Each macro does one thing, and the name explains what. Composing them in a model is explicit:
```sql
SELECT
    order_id,
    {{ convert_currency(
        apply_discount(cents_to_dollars('amount_cents'), 0.1),
        'EUR'
    ) }} AS order__discounted_amount_eur
FROM {{ ref('base__shopify__orders') }}
```
Yes, the model code is longer than a single `process_amount` call would be. But anyone reading it can see exactly what’s happening. When finance asks “how do we calculate discounted amounts in EUR?”, the answer is right there in the SQL.
Warning sign: parameter explosion. When a macro grows past five or six parameters, it’s probably doing too much. Look for ways to split it. If splitting doesn’t work cleanly, the abstraction might not be the right one.
Naming conventions that scale
A macro’s name is its documentation at the call site. Make it count.
Use verb prefixes for actions:
- `get_` for macros that retrieve data: `get_column_values`, `get_latest_partition`
- `generate_` for macros that create SQL: `generate_surrogate_key`, `generate_schema_name`
- `format_` for macros that transform output: `format_timestamp`, `format_currency`
Use descriptive names for transformations:
- `cents_to_dollars` over `convert_amount`
- `extract_domain_from_email` over `parse_email`
- `calculate_days_since` over `date_diff_helper`
The name should tell you what the macro does without opening the file.
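For example, a minimal sketch of what `extract_domain_from_email` might look like (assuming a warehouse that supports `SPLIT_PART`, such as Snowflake, Redshift, or Postgres):
```sql
{% macro extract_domain_from_email(email_column) %}
    -- Everything after the '@' is the domain; SPLIT_PART syntax is warehouse-specific
    SPLIT_PART({{ email_column }}, '@', 2)
{% endmacro %}
```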
Use underscore prefix for internal helpers:
```sql
{% macro _build_join_condition(columns) %}
    {# Internal helper, not for direct use in models #}
{% endmacro %}

{% macro generate_merge_statement(source, target, columns) %}
    {# Uses _build_join_condition internally #}
{% endmacro %}
```
The underscore signals “don’t call this directly.” Your team can use `generate_merge_statement` without understanding the internals.
One macro per file, filename matches macro name. When you need `cents_to_dollars`, you know it lives in `cents_to_dollars.sql`. No hunting. No scrolling through a 500-line `utils.sql` file. This rule comes from the Brooklyn Data style guide, and it’s one of the most practical conventions you can adopt.
Folder structure and organization
Macros need a logical home. This structure scales well:
```
macros/
├── _macros.yml                  # Documentation for all macros
├── generate_schema_name.sql     # dbt override macros at root
├── generate_alias_name.sql
├── utils/
│   ├── cents_to_dollars.sql
│   ├── limit_data_in_dev.sql
│   └── add_audit_columns.sql
├── transformations/
│   ├── finance/
│   │   ├── calculate_mrr.sql
│   │   └── prorate_amount.sql
│   └── marketing/
│       ├── attribution_weight.sql
│       └── channel_grouping.sql
└── tests/
    └── test_row_count_match.sql
```
Override macros live at the root. dbt’s `generate_schema_name`, `generate_alias_name`, and `generate_database_name` are special. Keeping them at the root of `macros/` makes them easy to find and signals their importance.
Utility macros get their own folder. Generic helpers like `add_audit_columns` (sketched below) or `limit_data_in_dev` belong in `utils/`. These are the macros any model might use.
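As a rough sketch of such a helper, `add_audit_columns` might be as small as this (the column names are illustrative; `invocation_id` is a built-in dbt context variable):
```sql
{% macro add_audit_columns() %}
    -- Illustrative audit columns; rename to match your conventions
    CURRENT_TIMESTAMP AS dbt_loaded_at,
    '{{ invocation_id }}' AS dbt_invocation_id
{% endmacro %}
```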
Domain-specific macros group by business area. Finance has its own logic. Marketing has its own logic. Grouping by domain makes it easier to find relevant macros and to assign ownership.
Custom generic tests go in tests/. When you write a test macro that can apply to any column or model (like `test_row_count_match`, sketched below), put it with the other tests.
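A sketch of what `test_row_count_match` could look like as a generic test (generic tests return rows on failure; the two-argument signature here is an assumption):
```sql
{% test row_count_match(model, compare_model) %}
-- Fails when the two relations disagree on row count:
-- dbt treats any returned rows as test failures
WITH counts AS (
    SELECT
        (SELECT COUNT(*) FROM {{ model }}) AS model_count,
        (SELECT COUNT(*) FROM {{ compare_model }}) AS compare_count
)
SELECT *
FROM counts
WHERE model_count != compare_count
{% endtest %}
```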
The exact structure matters less than consistency. Pick a pattern, document it, and stick to it. For the broader project layout, see my guide to dbt project structure.
Documenting macros
Documentation in _macros.yml beats comments in SQL files. YAML documentation shows up in dbt docs, supports structured argument definitions, and creates a single source of truth.
````yaml
version: 2

macros:
  - name: cents_to_dollars
    description: |
      Converts an integer cents column to a decimal dollars value.

      ## Usage
      ```sql
      {{ cents_to_dollars('amount_cents') }}
      {{ cents_to_dollars('amount_cents', scale=4) }}
      ```

      ## Notes
      - Assumes input is already an integer (no validation)
      - Uses ROUND for consistent decimal places
    arguments:
      - name: column_name
        type: string
        description: The column containing cents values
      - name: scale
        type: integer
        description: Number of decimal places. Defaults to 2.

  - name: limit_data_in_dev
    description: |
      Adds a date filter in development environments to limit data volume.
      Returns empty string in production.

      ## Usage
      Add to WHERE clause with AND:
      ```sql
      WHERE 1=1
      {{ limit_data_in_dev('created_at', 7) }}
      ```
    arguments:
      - name: column_name
        type: string
        description: Date column to filter on. Defaults to 'created_at'.
      - name: days
        type: integer
        description: Number of days to include. Defaults to 3.
````
The description includes a usage example because that’s what developers actually need. They want to copy-paste something that works, then modify it. Abstract argument descriptions don’t help as much as a concrete example.
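To make the documented behavior concrete, here is a minimal sketch of `limit_data_in_dev` itself (the `target.name == 'dev'` check and the Snowflake-style `DATEADD` are assumptions; adjust both to your environment names and warehouse):
```sql
{% macro limit_data_in_dev(column_name='created_at', days=3) %}
    {% if target.name == 'dev' %}
        -- Only emitted in dev; in production the macro renders nothing
        AND {{ column_name }} >= DATEADD('day', -{{ days }}, CURRENT_DATE)
    {% endif %}
{% endmacro %}
```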
Testing macro behavior
Macros need tests like any other code, and they fit naturally into your broader testing strategy. dbt gives you two approaches.
Integration tests create models that use your macros and verify the output:
```sql
-- models/tests/test_cents_to_dollars.sql
{{ config(materialized='table', schema='dbt_tests') }}

WITH test_data AS (
    SELECT 1000 AS amount_cents, 10.00 AS expected
    UNION ALL
    SELECT 999 AS amount_cents, 9.99 AS expected
    UNION ALL
    SELECT 1 AS amount_cents, 0.01 AS expected
)

SELECT
    amount_cents,
    {{ cents_to_dollars('amount_cents') }} AS actual,
    expected,
    {{ cents_to_dollars('amount_cents') }} = expected AS passed
FROM test_data
```
Run `dbt build --select test_cents_to_dollars` and check that all rows have `passed = true`.
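To make that check automatic instead of eyeballing rows, you could pair the model with a singular test that returns the failures (a sketch; the filename is hypothetical, and dbt fails the test if any rows come back):
```sql
-- tests/assert_cents_to_dollars_passes.sql
SELECT *
FROM {{ ref('test_cents_to_dollars') }}
WHERE NOT passed
```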
Unit tests (dbt 1.8+) let you test SQL logic before materialization:
```yaml
unit_tests:
  - name: test_cents_to_dollars_conversion
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, amount_cents: 1000}
          - {order_id: 2, amount_cents: 50}
    expect:
      rows:
        - {order_id: 1, order__amount_dollars: 10.00}
        - {order_id: 2, order__amount_dollars: 0.50}
```
This tests the macro in context, as part of a real model transformation. For a deeper dive, see the complete guide to dbt unit testing.
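For reference, the `mrt__finance__orders` model under test might be as simple as this (an assumed shape, inferred from the unit test’s columns):
```sql
-- models/marts/finance/mrt__finance__orders.sql (hypothetical path)
SELECT
    order_id,
    {{ cents_to_dollars('amount_cents') }} AS order__amount_dollars
FROM {{ ref('base__shopify__orders') }}
```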
Check compiled output. When debugging, run `dbt compile --select model_name` and look at `target/compiled/`. The compiled SQL is the truth; if it looks wrong, the macro is wrong.
Handling breaking changes
Sometimes you need to change how a macro works, and the change will affect existing behavior.
The dbt-utils package demonstrates the pattern:
- Create a new macro with the improved behavior and a clear name
- Keep the old macro but add a deprecation warning using `exceptions.warn()`
- Document the migration path in your changelog or README
- Remove the old macro in a major version
```sql
{% macro cents_to_dollars_v2(column_name, scale=2, round_mode='half_up') %}
    {# New version with configurable rounding #}
{% endmacro %}

{% macro cents_to_dollars(column_name, scale=2) %}
    {{ exceptions.warn("cents_to_dollars is deprecated. Use cents_to_dollars_v2 instead.") }}
    ROUND({{ column_name }} / 100.0, {{ scale }})
{% endmacro %}
```
For internal projects, you might not need this formality. If your team is small and you can update all usages in one PR, just change the macro and fix the call sites. Versioning matters more when macros are shared across teams or published as packages.
Communicate changes clearly. A Slack message, a PR description, a note in your project’s changelog, whatever works for your team. Surprise behavior changes cause more friction than the change itself.
Start with restraint
The best macro code is often no macro at all. Inline SQL, repeated a few times, gives you the information you need to build the right abstraction when it finally earns one. A focused macro with a clear name and good documentation pays for itself; a premature one just adds indirection.
Your goal is building a dbt project that your team can understand and maintain. Sometimes that means three similar lines of code instead of one macro call. That’s good judgment, not failure.