Window functions are prime candidates for unit testing because their behavior depends on three interacting factors: partitioning (which rows belong together), ordering (sequence within each partition), and framing (which rows are included in the calculation). Get any one wrong and your results will be subtly incorrect in ways that are hard to spot through code review.
The key to testing window functions well is designing test data that would produce different results if any of the three factors were wrong.
The Design Principles
Three principles for window function test data:
-
Use out-of-order source rows. If your window uses
ORDER BY event_timestamp, put the source rows in non-chronological order. This proves the ordering clause actually works. If someone accidentally changes the order toevent_id, the test fails. -
Include multiple partitions. If your window uses
PARTITION BY user_id, include at least two users. This proves partition isolation — User 1’s calculations shouldn’t bleed into User 2’s results. -
Choose values that make wrong results obvious. If you’re testing a cumulative sum, pick input values that produce a unique running total at each step. Values like 1, 1, 1 make it hard to distinguish a correct 3-row sum from an incorrect 2-row sum.
Complete Example: Multiple Window Functions
Consider this model that calculates session-level metrics using three different window functions:
-- models/intermediate/int__users_sessions.sqlselect user_id, event_timestamp, row_number() over (partition by user_id order by event_timestamp) as session_event_number, first_value(page_path) over (partition by user_id order by event_timestamp) as landing_page, sum(page_views) over ( partition by user_id order by event_timestamp rows unbounded preceding ) as cumulative_page_viewsfrom {{ ref('base__ga4__events') }}unit_tests: - name: test_int_users_sessions_window_functions model: int__users_sessions description: "Window functions should correctly partition and order by user" given: - input: ref('base__ga4__events') rows: # User 1's events (deliberately out of order) - {user_id: 1, event_timestamp: "2024-06-01 10:05:00", page_path: "/products", page_views: 2} - {user_id: 1, event_timestamp: "2024-06-01 10:00:00", page_path: "/home", page_views: 1} - {user_id: 1, event_timestamp: "2024-06-01 10:10:00", page_path: "/cart", page_views: 1} # User 2's events - {user_id: 2, event_timestamp: "2024-06-01 11:00:00", page_path: "/about", page_views: 3} expect: rows: # User 1 - ordered by timestamp, not source order - {user_id: 1, event_timestamp: "2024-06-01 10:00:00", session_event_number: 1, landing_page: "/home", cumulative_page_views: 1} - {user_id: 1, event_timestamp: "2024-06-01 10:05:00", session_event_number: 2, landing_page: "/home", cumulative_page_views: 3} - {user_id: 1, event_timestamp: "2024-06-01 10:10:00", session_event_number: 3, landing_page: "/home", cumulative_page_views: 4} # User 2 - {user_id: 2, event_timestamp: "2024-06-01 11:00:00", session_event_number: 1, landing_page: "/about", cumulative_page_views: 3}Let’s trace through why this test data is deliberately designed:
Out-of-order source rows. User 1’s events are listed as 10:05, 10:00, 10:10 in the given block. The expected output reorders them to 10:00, 10:05, 10:10. If the model accidentally ordered by something other than event_timestamp, the session_event_number values and landing_page would be wrong.
Multiple users with partition isolation. User 2 has one event. Their session_event_number should be 1, not 4. Their landing_page should be “/about”, not “/home”. If the PARTITION BY user_id clause were missing, User 2’s values would incorporate User 1’s data.
Distinctive cumulative values. The page_views values (1, 2, 1, 3) produce unique running totals: 1, 3, 4. If the frame clause were ROWS BETWEEN 1 PRECEDING AND CURRENT ROW instead of ROWS UNBOUNDED PRECEDING, the values would be 1, 3, 3 — the test would catch the difference.
This single test verifies ROW_NUMBER, FIRST_VALUE, and SUM with a window frame — all in one pass. If any of the window specifications are wrong, at least one expected value will be incorrect.
What Each Window Function Type Needs
ROW_NUMBER / RANK / DENSE_RANK: Test with ties. If two rows have the same timestamp, does ROW_NUMBER produce different values (non-deterministic unless you add a tiebreaker)? Does RANK produce the same value? Include tied rows in your test data if your model uses ranking.
FIRST_VALUE / LAST_VALUE: Test that the ordering is correct. LAST_VALUE in particular is tricky — its default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means it returns the current row, not the actual last row. If your model uses LAST_VALUE, the test should verify whether you need ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
LAG / LEAD: Test the first and last rows in each partition. LAG on the first row returns NULL (or the default value). LEAD on the last row returns NULL. These boundary cases are where off-by-one errors appear.
SUM / AVG with frame clauses: The frame specification is the biggest source of bugs. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW gives a running total. ROWS BETWEEN 1 PRECEDING AND CURRENT ROW gives a rolling two-row sum. The difference matters, and it’s hard to catch in code review. Choose test values that produce different results for different frame widths.
Common Pitfalls
Testing with already-sorted data. If your source rows happen to be in the right order, the test passes even if the ORDER BY clause is missing. Always put source rows in wrong order deliberately.
Single-row partitions only. If every partition has exactly one row, window functions trivially produce correct results. You need at least one partition with multiple rows to exercise the logic.
Symmetric values. If all page_views values are 1, then cumulative_page_views at row 2 is 2 regardless of whether the frame includes 1 or 2 preceding rows. Use asymmetric values so each possible frame width produces a different result.