
Ad Pipeline Engineering Challenges

The operational challenges of maintaining advertising data pipelines — API rate limits, schema changes, attribution window normalization, currency handling, and privacy compliance

Planted
google ads · data engineering · etl

Keeping advertising data flowing reliably across multiple platforms, API versions, and regulatory regimes involves a distinct set of engineering challenges beyond initial extraction.

API Rate Limits

Every platform throttles differently. Building backoff logic that handles all of them requires platform-specific implementation.

Meta uses a rolling 1-hour window that scales with your number of active ads. More ads means more API budget, but the relationship isn’t linear and the exact formula isn’t fully documented. Hit the limit and you’re locked out until the window rolls forward.

Google has quotas per developer token tier. Basic access has tight limits; Standard access opens up significantly. The tiering means your pipeline’s maximum throughput depends on your API access level.

LinkedIn’s limits are unpublished. You have to check the Developer Portal’s Analytics tab and hope for the best. There’s no programmatic way to know how close you are to a limit until you hit it.

Every extraction pipeline requires exponential backoff with jitter, per-platform rate limit tracking, and alerting when approaching limits. Managed extraction tools handle this; custom pipelines must implement it directly.
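A minimal sketch of that retry pattern, using full jitter. `RateLimitError` and `fetch` are hypothetical stand-ins for your platform client's throttle exception and request function:

```python
import random
import time


class RateLimitError(Exception):
    """Raised by a platform client when the API returns a throttle response."""


def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fetch(), retrying on RateLimitError with exponential backoff + full jitter."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep a random amount up to the exponential cap,
            # so parallel workers don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

In practice you would layer per-platform rate-limit tracking and alerting on top of this; the backoff is just the last line of defense.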

Schema Changes

Advertising APIs break pipelines regularly, creating a constant maintenance cost.

Meta’s June 2025 attribution overhaul changed how off-Meta conversions are attributed. On-Meta events now attribute to impression time; off-Meta events attribute to conversion time, with no opt-out. Any pipeline comparing conversions across event types required updating; historical analysis crossing the June 2025 boundary requires accounting for the methodology change.

Google’s API v14 to v16 migration (June 2024) renamed columns from flat names to metrics_* and segments_* prefixes. Custom pipelines with hardcoded column names broke; Fivetran-managed pipelines were updated automatically.

If you’re running custom pipelines, you need to:

  • Track API changelogs for every platform you use
  • Plan migrations months in advance (not all changes are backward-compatible)
  • Maintain version-aware extraction logic that can handle transition periods
  • Test pipeline changes against production-like data before deploying
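Version-aware extraction logic can be as simple as a rename map applied at ingest time. The column pairs below are illustrative examples of the flat-to-prefixed rename pattern, not a verified or exhaustive list:

```python
# Illustrative mapping from flat column names to the metrics_* / segments_*
# prefixed names; real migrations need the full list from the API changelog.
COLUMN_RENAMES = {
    "clicks": "metrics_clicks",
    "impressions": "metrics_impressions",
    "cost_micros": "metrics_cost_micros",
    "date": "segments_date",
}


def normalize_columns(row: dict, api_version: int) -> dict:
    """Emit prefixed column names regardless of which API version produced the row."""
    if api_version >= 16:
        return row  # already uses metrics_* / segments_* names
    return {COLUMN_RENAMES.get(k, k): v for k, v in row.items()}
```

Keeping the map in one place means the transition period (some rows extracted under the old version, some under the new) produces a single consistent schema downstream.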

Attribution Window Normalization

Comparing “conversions” across platforms without normalizing attribution windows is comparing different things.

| Platform   | Default Click Window | Default View Window |
|------------|----------------------|---------------------|
| Google Ads | 30 days              | None (search)       |
| Meta       | 7 days               | 1 day               |
| LinkedIn   | 30 days              | 7 days              |

A Google Ads “conversion” includes clicks from up to 30 days ago. A Meta “conversion” only counts clicks from the last 7 days. Comparing these numbers side by side without normalization is misleading — Google will appear to drive more conversions partly because it’s looking at a longer window.

There are three warehouse-based approaches:

  1. Normalize all platforms to the same attribution window (e.g., 7-day click only) by re-requesting data with matching parameters where the API supports it
  2. Use your own attribution model built from click-level data, bypassing platform attribution entirely
  3. Use blended ROAS (total revenue / total spend) which sidesteps the attribution disagreement altogether
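Option 3 is the simplest to implement, since it needs no click-level data at all. A minimal sketch (the platform names are placeholders):

```python
def blended_roas(total_revenue: float, spend_by_platform: dict) -> float:
    """Blended ROAS: total revenue over total ad spend across all platforms.

    Sidesteps attribution disagreement entirely by never asking which
    platform 'caused' each conversion.
    """
    total_spend = sum(spend_by_platform.values())
    if total_spend == 0:
        raise ValueError("no spend recorded")
    return total_revenue / total_spend
```

The trade-off is granularity: blended ROAS tells you whether advertising overall is paying off, not which platform to shift budget toward.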

Meta’s attribution data also updates retroactively over 3-7 day windows. Yesterday’s numbers will change tomorrow. Any pipeline must either reprocess a rolling lookback window or accept that recent data is provisional.

Currency and Timezone Handling

Currency and timezone handling produce discrepancies between warehouse totals and platform UIs when not explicitly addressed.

Google reports cost in micros (millionths of a currency unit). A cost value of 1500000 means $1.50. Miss the /1,000,000 conversion and your cross-platform spend totals are off by six orders of magnitude. This normalization belongs in the base layer of your dbt project.
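The conversion itself is one division; the point is to apply it once, early, and never let raw micros leak into reporting. A Python sketch of the arithmetic a dbt base model would do in SQL:

```python
def micros_to_currency(cost_micros: int) -> float:
    """Convert Google Ads micros (millionths of a currency unit) to whole units."""
    return cost_micros / 1_000_000
```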

Meta defaults to ad account currency. If you have ad accounts in multiple currencies, cross-account reporting requires explicit currency standardization.

Daily totals differ between warehouse and platform UIs due to timezone pre-aggregation differences. Platforms aggregate to their own timezone (often the ad account’s timezone), while your warehouse likely stores everything in UTC. A campaign that spends $100 between 10pm and 2am in the ad account’s timezone will show that spend split across two different days in UTC. This is a well-documented challenge — Fivetran’s dbt_ad_reporting DECISIONLOG explicitly discusses the variance this introduces.
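To see the day split concretely (the America/New_York account timezone is an assumed example, not from the source):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A spend event at 11pm in the ad account's timezone lands on the *next*
# calendar day once stored in UTC (America/New_York is UTC-4 on this date).
local = datetime(2025, 6, 10, 23, 0, tzinfo=ZoneInfo("America/New_York"))
utc = local.astimezone(ZoneInfo("UTC"))

local_day = local.date().isoformat()  # the day the platform UI reports
utc_day = utc.date().isoformat()      # the day your warehouse reports
```

The platform aggregates that spend into June 10; a UTC warehouse puts it on June 11. Neither is wrong; they are answering differently-phrased questions.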

Allow 1-3% variance between your warehouse totals and platform UI totals for timezone-related discrepancies. Anything larger signals a pipeline bug.

Privacy Regulation

Building compliant pipelines is a requirement. Key regulatory developments:

Eight new US state privacy laws took effect in 2025. The EU AI Act begins enforcement in August 2026. Google Consent Mode v2 is now an industry standard requirement for any site collecting data from EU users.

Server-side tracking has become the primary response to browser-level restrictions. 67% of B2B companies have already migrated to server-side tracking, recovering 20-40% of attribution data lost to browser restrictions (ad blockers, ITP, cookie restrictions). Companies implementing server-side tracking report 15-25% improvement in reported conversion rates within the first quarter.

For data engineers, privacy compliance means:

  • Ensuring consent signals propagate through your pipeline (not just collected but respected during transformation)
  • Handling data deletion requests that need to cascade through raw, intermediate, and mart layers
  • Auditing which user-level data your pipeline stores and for how long
  • Understanding that the attribution data you receive is already filtered by consent — your warehouse numbers reflect only consented users, not total activity
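The first bullet, consent propagation, can be enforced with a filter at the boundary between raw and transformed layers. A minimal sketch; the `ad_user_data_consent` field name is illustrative (Consent Mode v2 distinguishes `ad_user_data` and `ad_personalization` signals):

```python
def filter_consented(rows: list) -> list:
    """Drop events whose consent signal was withheld, so downstream
    transformation layers only ever see consented user-level data.

    Treats missing or ambiguous consent as non-consent (fail closed).
    """
    return [r for r in rows if r.get("ad_user_data_consent") is True]
```

Running this before transformation, rather than at query time, means a consent signal that was merely *collected* cannot silently be ignored by a downstream model.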

As browser restrictions tighten and third-party cookies disappear, server-side collection becomes the primary path to accurate attribution data, shifting implementation complexity from browser to server and from marketing teams to data engineering teams.