
Meta Ads Pipeline Maintenance

Operational practices for keeping a Meta Ads pipeline running — token expiry monitoring, spend reconciliation, API version lifecycle management, and circuit breaker patterns.


Meta’s API changes frequently enough that pipeline maintenance is a recurring cost. Four practices keep pipelines running reliably: token expiry tracking, spend reconciliation, API version lifecycle management, and a circuit breaker that stops extraction after consecutive failures rather than exhausting rate limits on a broken endpoint.

Token Expiry Monitoring

The most common silent failure in Meta pipelines is token expiration. When a long-lived user access token expires, the API starts returning 401 errors. Unless you’re monitoring for auth errors specifically, the pipeline stops and nothing alerts you. By the time someone notices (usually when a dashboard goes stale), you’ve lost days of data.

Use System User Access Tokens. Unlike user tokens that expire after ~60 days, System User tokens don’t expire. Set them up once in Business Settings under System Users, and they remain valid until you revoke them. This eliminates the category of silent expiration failures entirely for most pipelines.

If you’re stuck with user tokens (common when using BigQuery Data Transfer Service for Meta, which requires a user token), build expiry monitoring:

import datetime
from typing import Optional

from facebook_business.api import FacebookAdsApi


def check_token_expiry(access_token: str) -> Optional[datetime.datetime]:
    """Return the token's expiry, or None for a non-expiring token."""
    api = FacebookAdsApi.init(access_token=access_token)
    # The token debug endpoint reports expiry metadata
    debug = api.call(
        'GET',
        '/debug_token',
        {'input_token': access_token, 'access_token': access_token},
    ).json()
    expires_at = debug['data'].get('expires_at', 0)
    return datetime.datetime.fromtimestamp(expires_at) if expires_at else None

Alert one week before the 60-day mark. One week gives you enough time to refresh without an emergency. A day’s notice does not.
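The alerting logic itself is small enough to sketch. A minimal version, assuming a seven-day window and treating a `None` expiry as a non-expiring token (e.g. a System User token); how the alert actually fires (Slack, PagerDuty, email) is left out:

```python
import datetime
from typing import Optional

# Alert when the token expires within this many days
ALERT_WINDOW_DAYS = 7


def should_alert(expires_at: Optional[datetime.datetime],
                 now: Optional[datetime.datetime] = None) -> bool:
    """Decide whether to fire a token-expiry alert.

    expires_at=None means a non-expiring token, so there is
    nothing to alert on.
    """
    if expires_at is None:
        return False
    now = now or datetime.datetime.now()
    days_left = (expires_at - now).days
    return days_left <= ALERT_WINDOW_DAYS
```

Run this daily against the result of the expiry check; a scheduled job that only fires within the final week is quiet the rest of the time.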

Spend Reconciliation

The second operational practice that pays for itself quickly: weekly comparison of warehouse spend totals against Ads Manager.

Pull your warehouse total for the trailing 7 days:

SELECT
  SUM(spend) AS warehouse_spend_7d,
  CURRENT_DATE() AS report_date
FROM `your_project.meta_ads.ad_insights_daily`
WHERE date_start >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)

Compare this against the Ads Manager interface (or the API) for the same period. A variance under 5% is normal — timezone differences, attribution window rounding, and modeling uncertainty all contribute. A variance over 10% on a consistent basis means something is wrong with your pipeline.
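The thresholds translate directly into a check you can run from a script. A minimal sketch, using the 5% and 10% bands described above (the function names are illustrative, not part of any library):

```python
def spend_variance_pct(warehouse_spend: float, reference_spend: float) -> float:
    """Absolute variance of warehouse spend vs. the Ads Manager reference,
    as a fraction of the reference (0.05 == 5%)."""
    if reference_spend == 0:
        return 0.0 if warehouse_spend == 0 else float('inf')
    return abs(warehouse_spend - reference_spend) / reference_spend


def classify_variance(variance: float) -> str:
    """Bucket the variance: <5% is normal, 5-10% is worth a look,
    >10% means investigate the pipeline."""
    if variance < 0.05:
        return 'normal'
    if variance <= 0.10:
        return 'review'
    return 'investigate'
```

A weekly job computes the two totals, classifies the variance, and alerts on anything other than `'normal'`.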

Common causes of large variances:

  • Lookback window too short: you’re not re-pulling data that Meta has retroactively updated
  • Attribution window mismatch: your API call and Ads Manager use different windows
  • Missing ad accounts: your pipeline doesn’t have access to all accounts contributing to Ads Manager totals
  • Currency normalization bug: spend in non-primary currencies isn’t being converted

Automate this check. Run it as a dbt test, a scheduled query, or a simple script with an alert. The value is in the frequency and consistency — catching variance early means fixing a small problem before it becomes a large data quality incident.

-- dbt singular test: assert spend reconciliation within 10%
-- tests/assert_meta_spend_reconciles.sql
WITH warehouse_spend AS (
  SELECT SUM(spend) AS total_spend
  FROM {{ ref('mrt__marketing__meta__campaign_performance') }}
  WHERE date_day >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
),

-- Compare against a manually maintained reference or API check.
-- This pattern flags for manual review when the warehouse looks off.
-- Adjust the threshold based on acceptable variance for your setup.
meta_reference AS (
  SELECT SUM(spend) AS total_spend
  FROM {{ source('meta_ads', 'spend_reference') }}
  WHERE date_day >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
)

SELECT
  warehouse_spend.total_spend AS warehouse_total,
  meta_reference.total_spend AS reference_total,
  ABS(warehouse_spend.total_spend - meta_reference.total_spend) /
    NULLIF(meta_reference.total_spend, 0) AS variance_pct
FROM warehouse_spend
CROSS JOIN meta_reference
WHERE ABS(warehouse_spend.total_spend - meta_reference.total_spend) /
  NULLIF(meta_reference.total_spend, 0) > 0.10

API Version Lifecycle Management

Meta releases roughly two major API versions per year, each supported for approximately two years. That means you need to upgrade your pipeline’s API version at least once a year to stay on a supported version.

The process isn’t just changing a version number in a URL. New API versions sometimes rename fields, change data types, or restructure responses. The canonical place to find these changes is Meta’s API changelog.

Build a version tracking discipline:

  1. Note the current version when you deploy your pipeline
  2. Subscribe to Meta’s developer changelog announcements
  3. Plan upgrades at least six months before the version’s end-of-life date
  4. Test on the new version in a staging environment before cutting over

If you’re using a managed extraction tool (Fivetran, Airbyte), they handle API version upgrades for you — this is a large part of what you’re paying for. If you’re running custom extraction, version management is your responsibility.

The failure mode to avoid: discovering a version is deprecated because your API calls start returning errors in production. At that point you’re doing an emergency migration under pressure. The same work done proactively, six months earlier, is a scheduled maintenance task.
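The "six months before EOL" rule is easy to automate once you record each version's end-of-life date. A sketch, assuming a small hand-maintained mapping (the version key and date below are placeholders, not real EOL dates — take the actual dates from Meta's changelog):

```python
import datetime

# Hypothetical mapping of API versions to their published EOL dates.
# Populate from Meta's changelog; the date here is a placeholder.
VERSION_EOL = {
    'v19.0': datetime.date(2026, 2, 4),
}


def months_to_eol(version: str, today: datetime.date) -> float:
    """Approximate months remaining before the given version's EOL."""
    eol = VERSION_EOL[version]
    return (eol - today).days / 30.44  # average days per month


def upgrade_due(version: str, today: datetime.date,
                threshold_months: float = 6.0) -> bool:
    """True when the upgrade should already be on the schedule."""
    return months_to_eol(version, today) <= threshold_months
```

A monthly scheduled check against this mapping turns the emergency migration into a calendar entry.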

Circuit Breaker Pattern

A pipeline that hits errors should stop, not retry forever. Every failed API call burns rate limit budget and delays the actual problem diagnosis.

The circuit breaker pattern: track consecutive failures and pause extraction after a threshold is hit.

import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Stopped after failures
    HALF_OPEN = "half_open"  # Testing recovery


class MetaApiCircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=300):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, fn, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker open — pausing extraction")
        try:
            result = fn(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

After 5 consecutive failures, the circuit opens and extraction pauses for 5 minutes before trying again. This prevents a broken endpoint from exhausting your rate limit budget while also giving the service time to recover.

The circuit breaker is most valuable in combination with proper alerting. When the circuit opens, fire an alert so a human knows something needs attention. Don’t silently pause — the whole point is to surface problems, not bury them.
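One way to pair the two is a callback that fires exactly once on the closed-to-open transition. A stripped-down standalone sketch (the `on_open` callback is a stand-in for whatever notification channel you use):

```python
class AlertingCircuitBreaker:
    """Minimal breaker that invokes a callback once when it trips open."""

    def __init__(self, on_open, failure_threshold=5):
        self.on_open = on_open            # e.g. post to Slack / PagerDuty
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.is_open = False

    def record_failure(self):
        self.failure_count += 1
        if self.failure_count >= self.failure_threshold and not self.is_open:
            self.is_open = True
            self.on_open(self.failure_count)  # fires only on the transition

    def record_success(self):
        self.failure_count = 0
        self.is_open = False
```

Firing only on the transition matters: an alert per failed call during an outage is noise, one alert per circuit-open event is a signal.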

Raw Data Storage

Store raw API responses before transformation. This sounds expensive but isn’t — raw JSON compressed in BigQuery storage is cheap. The value is operational: when something goes wrong in your transformation logic, or when Meta changes a field you hadn’t anticipated, you can debug without re-pulling from the API.

-- Raw storage table structure
CREATE TABLE raw_meta_ads.api_responses (
  ingestion_timestamp TIMESTAMP,
  ad_account_id STRING,
  report_date DATE,
  api_version STRING,
  response_json JSON  -- or STRING if you prefer
)
PARTITION BY report_date
OPTIONS (
  require_partition_filter = false
);

Partition by report date so queries against specific date ranges don’t scan the entire table. Keep raw responses for 90 days — enough to investigate any issue that a normal lookback window might miss.

When you’re investigating a discrepancy, the raw table lets you inspect exactly what the API returned before your transformation touched it. This eliminates an entire category of debugging questions (“is the bug in extraction or transformation?”).

What to Monitor

A minimal monitoring setup for a Meta pipeline:

Check | Frequency | Alert threshold
Token validity (if using user tokens) | Daily | Expiry within 7 days
Spend reconciliation vs. Ads Manager | Weekly | >10% variance
Row count for yesterday | Daily | >20% drop from trailing 7-day average
API error rate | Per run | >5 consecutive failures
API version end-of-life | Monthly | <6 months to EOL

The row count check is a simple proxy for “is data still flowing.” A 20% drop from the trailing average is often the first signal of a silent failure — the pipeline is running but returning less data than expected, which means the API is misbehaving, an account lost access, or a rate limit is being hit silently.
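The check itself reduces to one comparison. A sketch with illustrative function names, using the 20% threshold from the table:

```python
from typing import List


def row_count_drop_pct(yesterday: int, trailing_counts: List[int]) -> float:
    """Fractional drop of yesterday's row count vs. the trailing average.
    Negative values mean yesterday was above average."""
    avg = sum(trailing_counts) / len(trailing_counts)
    if avg == 0:
        return 0.0
    return (avg - yesterday) / avg


def row_count_alert(yesterday: int, trailing_counts: List[int],
                    threshold: float = 0.20) -> bool:
    """True when yesterday's count dropped more than the threshold."""
    return row_count_drop_pct(yesterday, trailing_counts) > threshold
```

The same comparison works as a scheduled SQL query; the Python form is convenient when the check runs inside the extraction job itself.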

These five checks cover the vast majority of real production failures. More sophisticated monitoring is possible — schema drift detection, coverage checks across accounts, completeness validation by campaign — but start here and add complexity only when the basics are stable.