For years, the advice was simple: use Fivetran. Don’t waste engineering time building connectors when you could pay for reliability and move on to higher-value work. The math was straightforward: 50 to 100 hours per custom connector, 44% of engineer time lost to pipeline maintenance, versus a predictable monthly bill.
That advice is breaking down. Pricing changes, AI-assisted development, and mature open-source alternatives have fundamentally shifted the calculation. Managed solutions are convenient. The real question is whether that convenience still justifies the cost.
The Old Calculus Is Dead
The traditional “buy” argument rested on a Wakefield Research finding: data engineers spend 44% of their time building and maintaining pipelines, costing companies approximately $520,000 per year. Custom connectors take 50-100 hours each to build. When Fivetran costs a few thousand dollars monthly and your engineers cost much more, buying wins.
Then came March 2025.
Fivetran shifted from account-wide to per-connector Monthly Active Row (MAR) pricing, eliminating bulk discounts that made the service affordable at scale. Reddit users report 4-8x cost increases. One user described going from $20/month to $2,000/month as their data volume grew. 35% of recent G2 reviewers now cite cost as their primary concern.
Marketing data is particularly brutal. MAR pricing charges you for every row that changes, and marketing data changes constantly. Ad metrics update retroactively, attribution windows shift, and campaign performance data refreshes daily. What looked like a reasonable expense becomes a runaway cost center.
The hidden cost of “buying” has become very visible.
What Actually Changed
Three developments have converged to flip the economics.
Fivetran’s pricing became unpredictable. The March 2025 changes weren’t a simple price increase. They fundamentally changed how costs scale. Per-connector MAR tiering means adding connectors no longer benefits from volume discounts. Teams with many connections see 70% cost increases. The minimum annual contract sits at $12,000, and that’s before your data actually flows.
AI development velocity has been measured. The productivity gains aren’t hypothetical anymore. A controlled experiment published on arXiv found developers completed an HTTP server implementation 55.8% faster with GitHub Copilot (1 hour 11 minutes versus 2 hours 41 minutes). A Microsoft and Accenture field experiment showed 12-21% more pull requests per week. GitHub’s own research found developers were 56% more likely to pass all unit tests with AI assistance.
dlt reached production maturity. The Python-native data loading library hit 3 million monthly downloads. In September 2024 alone, users created 50,000 custom connectors, a 20× increase from January. The library is now at version 1.19, past the 1.0 stability milestone, with production users including Artsy and PostHog.
These three factors compound. Building pipelines is cheaper because AI accelerates development. dlt provides the framework that makes AI-assisted pipeline development practical. And the baseline cost of managed solutions keeps climbing.
What AI Handles Well (And What It Doesn’t)
AI assistance isn’t magic. Understanding where it helps (and where it fails) determines whether the “build” option actually delivers.
AI excels at the tedious parts. Boilerplate code, API connector scaffolding, ETL structure, configuration files, SQL generation, and test creation. Pattern-based code where you’re implementing something similar to thousands of existing examples. A dlt user described completing an entire pipeline “in five minutes using the library’s documentation.” The library’s LLM-friendly documentation makes this workflow practical.
AI struggles with the parts that matter most. Complex business logic. Edge cases your API doesn’t document. System architecture decisions. Security (29.1% of AI-generated Python code contains security weaknesses according to one study). Performance optimization for high-volume scenarios. The judgment calls that distinguish working code from production-ready code.
The maintenance question is more nuanced. GitClear research projects code churn doubling in 2024 versus the pre-AI baseline. More code added, more code copy-pasted, less refactoring. AI often reproduces outdated patterns. But counterexamples exist: Amazon Q Developer reduced Java upgrade times from 50 developer-days to hours for Kyndryl, with estimated savings equivalent to 4,500 developer-years of work.
AI accelerates the buildout phase dramatically, but it doesn’t eliminate the need for experienced engineers to guide architecture and catch issues. For data pipeline development specifically (where patterns are well-established and dlt provides the framework), AI assistance delivers real gains.
The dlt + AI Workflow in Practice
dlt’s design philosophy maps well to AI-assisted development. It’s Python-native, meaning you write standard Python rather than YAML configurations or proprietary DSLs. It requires no backends: pip install and run. Schemas are inferred automatically. Incremental loading, the part that usually requires careful state management, is handled declaratively.
A typical marketing API pipeline targeting BigQuery takes about 30 lines:
```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Define a marketing API source with pagination
source = rest_api_source({
    "client": {
        "base_url": "https://api.marketing-platform.com/v1",
        "auth": {"type": "bearer", "token": dlt.secrets.value},
    },
    "resources": [
        {
            "name": "campaigns",
            "endpoint": {
                "path": "campaigns",
                "paginator": {"type": "offset", "limit": 100},
            },
            "write_disposition": "merge",
            "primary_key": "id",
        }
    ],
})

# Create pipeline targeting BigQuery
pipeline = dlt.pipeline(
    pipeline_name="marketing_data",
    destination="bigquery",
    dataset_name="marketing",
)

# Run it
load_info = pipeline.run(source)
```
This covers pagination, authentication, incremental loading via merge disposition, and BigQuery-specific optimizations. The dlt documentation is structured for LLM consumption, so AI assistants can help generate these configurations from API documentation.
For BigQuery specifically, dlt supports GCS staging for large loads, partitioning and clustering via bigquery_adapter(), and streaming inserts for low-latency scenarios. Data lands in datasets with tables named after resources. Nested JSON flattens automatically into child tables with configurable nesting depth.
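A minimal sketch of how those hints attach to a resource, using a toy campaigns resource in place of the REST API source above (column names are illustrative, and the adapter's exact keyword arguments are worth confirming against your installed dlt version):

```python
import dlt
from datetime import date
from dlt.destinations.adapters import bigquery_adapter

@dlt.resource(primary_key="id", write_disposition="merge")
def campaigns():
    # Placeholder rows; in practice this is the rest_api_source resource above.
    yield [{"id": 1, "report_date": date(2025, 1, 1), "campaign_id": "abc", "spend": 12.5}]

# Partition by the date column and cluster on campaign_id so BigQuery prunes
# partitions and co-locates rows for the most common query patterns.
hinted = bigquery_adapter(campaigns, partition="report_date", cluster="campaign_id")
# For low-latency scenarios, insert_api="streaming" switches to streaming inserts.

pipeline = dlt.pipeline(
    pipeline_name="marketing_data",
    destination="bigquery",
    dataset_name="marketing",
)
pipeline.run(hinted)
```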
The production results back this up. Artsy replaced a 10-year-old Ruby pipeline with dlt, cutting load times from 2.5 hours to under 30 minutes; some pipelines improved by 98%, with cost savings of 96% or more. One user reported a 182x reduction in monthly ETL costs after dropping Fivetran for dlt.
When Buying Still Makes Sense
The argument isn’t that managed solutions are useless. For certain scenarios, they remain the right choice.
Compliance-heavy environments. SOC 2 Type II, HIPAA, and GDPR compliance are built into Fivetran and Airbyte Enterprise. Building equivalent audit trails, access controls, and security infrastructure yourself takes significant effort. If your organization requires these certifications and lacks security engineering capacity, the premium may be justified.
Non-technical data teams. If your team lacks Python proficiency and the organization won’t invest in developing it, code-first tools aren’t practical. No-code platforms serve teams that need data without engineering capacity.
Extreme connector breadth. Fivetran offers 700+ connectors. If you need reliable integrations with dozens of SaaS tools you’d never build yourself, the coverage matters. Some community Airbyte connectors have reliability issues, and dlt’s verified connector list sits at around 60, plus its REST API builder. (For a detailed breakdown, see my Fivetran vs Airbyte vs dlt comparison.)
Time-to-value urgency. Bought solutions deploy in days. Built solutions, even with AI assistance, require development cycles measured in weeks for production readiness. If you need data flowing next week for a critical decision, managed wins.
The question is whether these scenarios describe your situation. For most data teams loading marketing data to BigQuery, they don’t.
The Hybrid Reality
The binary framing of “build versus buy” obscures the practical answer: do both, strategically.
Use Fivetran or Airbyte for stable, standard sources with infrequent schema changes. ERP systems, CRM platforms with well-documented APIs, sources where the connector maintenance burden is genuinely low. Accept the cost for the genuine reduction in operational overhead.
Build with dlt + AI for:
- High-MAR sources where pricing scales painfully. Marketing platforms, ad networks, anything with granular row-level data that updates frequently.
- Custom integrations your managed provider doesn’t support well or at all. dlt’s REST API framework makes these straightforward to build.
- Sources where you need control over exactly what data gets extracted, at what frequency, with what transformations.
The $520,000 annual pipeline maintenance cost cited in traditional analyses becomes an investment when you’re saving $100,000+ in MAR fees. The 50-100 hours per connector estimate predates AI assistance that cuts that time by half or more for standard API patterns.
Making the Switch
If you’re paying significant Fivetran bills for marketing data, here’s a practical path forward.
Start with your highest-MAR connector. Identify the source that’s most expensive relative to its business value. Marketing platforms are usually the answer: Google Ads, Meta Ads, TikTok Ads. These have high update frequency, granular data, and well-documented APIs that dlt + AI can handle.
Use dlt’s BigQuery-specific features. GCS staging for large loads avoids BigQuery streaming insert costs. Partition by date for marketing data. Cluster on campaign or ad group IDs for query performance. These optimizations are simple configuration, not custom engineering.
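A rough sketch of the staging setup, assuming a placeholder bucket (credentials and the bucket URL live in dlt's config/secrets files or environment variables):

```python
import dlt

# In .dlt/config.toml (or the equivalent env var), point the filesystem
# staging destination at a GCS bucket -- the bucket name is a placeholder:
#   [destination.filesystem]
#   bucket_url = "gs://your-staging-bucket"

# Routing loads through GCS staging means files land in the bucket first and
# are picked up by BigQuery load jobs instead of streaming inserts.
pipeline = dlt.pipeline(
    pipeline_name="marketing_data",
    destination="bigquery",
    staging="filesystem",
    dataset_name="marketing",
)
```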
Measure honestly. Track actual development time, not estimates. Include the time to understand the API, handle edge cases, and get to production. Compare against what you’re currently paying in MAR fees. The math usually works, but verify it for your specific situation.
Build the next connector faster. Each pipeline you build leaves behind patterns and reusable components: authentication handling, error management, deployment scripts. The second connector takes less time than the first. The fifth takes a fraction.
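For instance, a small hypothetical helper that freezes the auth and pagination conventions from your first connector, so the next source is mostly configuration (the URL and secret key below are placeholders):

```python
import dlt
from dlt.sources.rest_api import rest_api_source

def marketing_source(base_url, resources, token):
    """Reusable wrapper: bearer auth, offset pagination, merge loading --
    the conventions settled on while building the first connector."""
    return rest_api_source({
        "client": {
            "base_url": base_url,
            "auth": {"type": "bearer", "token": token},
        },
        "resources": [
            {
                "name": name,
                "endpoint": {"path": path, "paginator": {"type": "offset", "limit": 100}},
                "write_disposition": "merge",
                "primary_key": "id",
            }
            for name, path in resources.items()
        ],
    })

# The second connector becomes a few lines of configuration.
ads_source = marketing_source(
    "https://api.another-platform.com/v2",
    {"ad_groups": "ad-groups", "ads": "ads"},
    token=dlt.secrets["sources.ads_platform.api_token"],
)
```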
The framework for deciding: if your monthly MAR cost for a source exceeds what a senior engineer costs for a day of work, building probably wins. If you have Python proficiency on the team, AI assistant access, and tolerance for managing your own infrastructure, the economics have shifted in your favor.
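To make that rule concrete, here is a back-of-envelope check with placeholder numbers (every figure is an assumption to swap for your own):

```python
# All numbers are placeholders -- plug in your own.
monthly_mar_cost = 2_000        # MAR fees for this one connector, per month
engineer_day_cost = 800         # fully loaded cost of a senior-engineer day
build_days = 3                  # AI-assisted build estimate for a standard API
maintain_days_per_month = 0.5   # ongoing upkeep once the pipeline is stable

build_cost = build_days * engineer_day_cost
monthly_savings = monthly_mar_cost - maintain_days_per_month * engineer_day_cost
print(f"Payback in ~{build_cost / monthly_savings:.1f} months")  # ~1.5 with these inputs
```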
The advice used to be simple. It isn’t anymore.
Managed ELT was the right answer when building took months and maintenance was a full-time job. Those assumptions no longer hold. AI assistance has compressed development timelines, dlt has standardized the patterns, and Fivetran’s pricing has made the cost of convenience painfully clear.
For Python-proficient data teams loading marketing data to BigQuery, dlt + AI assistance is now the faster, cheaper, more maintainable path. The 50-100 hours per connector? Call it 10-20 with AI assistance and a well-documented library. The maintenance burden? dlt handles schema evolution automatically. The $520,000 annual pipeline cost? That’s the budget you’re reclaiming.
The economics have shifted and the tools have matured. The only question is how much of your pipeline infrastructure you’re ready to own.