Fivetran, Airbyte, and dlt are the three dominant data ingestion tools in 2026. All move data from sources into a warehouse, but they take different architectural approaches that determine when each is the appropriate choice.
Fivetran: Fully Managed ELT
Fivetran pioneered the fully-managed ELT model. You configure connectors through a web UI, and Fivetran handles everything: extraction, schema detection, incremental loading, and delivery to your warehouse. No infrastructure to manage, no code to write.
The architecture is opinionated by design. Fivetran’s connectors are built and maintained by their own engineers. When Salesforce changes its API, when Google Ads releases a new API version, when Meta adjusts attribution windows — Fivetran’s team updates their connector. You don’t need to think about it. That’s the product.
The trade-offs are real. You have no visibility into how the connector works, and limited ability to modify its behavior. You get Fivetran’s schema and Fivetran’s sync schedule. Want to extract a field they don’t include? Want to apply transformations at extraction time? That’s not how Fivetran works. You work within their model or you don’t use their connector.
The business model follows the architecture: you pay per monthly active row (MAR), which means your costs scale with data volume in ways that can be difficult to predict. The March 2025 pricing change eliminated bulk discounts and introduced per-connector MAR tiering, which changed the economics significantly for teams with many connections or high-volume marketing data.
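To see why tiered, per-connector MAR pricing is hard to predict, consider a small cost sketch. The tier boundaries and rates below are hypothetical illustrations for the arithmetic only, not Fivetran's actual price list:

```python
# HYPOTHETICAL tiered rates, for illustration only -- not Fivetran's prices.
# Each tuple is (cumulative MAR cap, rate in dollars per 1,000 rows).
HYPOTHETICAL_TIERS = [
    (1_000_000, 1.00),     # first 1M monthly active rows
    (10_000_000, 0.60),    # next 9M rows at a lower marginal rate
    (float("inf"), 0.35),  # everything above 10M rows
]

def monthly_cost(mar: int) -> float:
    """Cost for one connector's monthly active rows under tiered rates."""
    cost, prev_cap = 0.0, 0
    for cap, rate_per_1k in HYPOTHETICAL_TIERS:
        rows_in_tier = max(0, min(mar, cap) - prev_cap)
        cost += rows_in_tier / 1_000 * rate_per_1k
        prev_cap = cap
        if mar <= cap:
            break
    return cost

# Doubling volume does not double cost, and per-connector tiering means
# ten small connectors can cost more than one connector moving the same rows:
print(monthly_cost(2_000_000))     # one 2M-row connector
print(10 * monthly_cost(200_000))  # ten 200k-row connectors, same total rows
```

Under any tiered scheme like this, small connectors never reach the cheaper tiers, which is why teams with many low-volume connections felt the per-connector change most.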
Fivetran makes most sense when reliability and zero maintenance matter more than cost and control. Their SLA is 99.9% uptime with automatic upgrades. When you need syncing from multiple SaaS tools to just work, it delivers.
Airbyte: Open-Source Challenger
Airbyte started as an open-source alternative with a connector-based architecture that looks familiar to Fivetran users. You get a web UI, connector configurations, and scheduled syncs — but you can self-host it on Kubernetes or use their cloud offering.
The open-source model means two categories of connectors exist side by side: official connectors maintained by Airbyte’s engineering team, and community-contributed connectors maintained by whoever wrote them. That’s both the strength and the weakness.
Architecturally, Airbyte is more heavyweight than it initially appears. Each connector runs in its own Docker container. Airbyte itself requires a Kubernetes cluster with its own operational requirements — external PostgreSQL, object storage for logs, and the operational expertise to keep the cluster healthy. The UI abstracts this complexity, but the complexity is there.
The self-hosted version keeps data within your infrastructure, which matters for compliance-sensitive organizations. The cloud version trades operational control for managed infrastructure.
Airbyte introduced capacity-based pricing in February 2025, replacing credit-based pricing with a volume model: $15 per million rows for API sources, $10 per GB for database and file sources. This is generally more predictable than Fivetran’s MAR model for most workloads, but self-hosting adds infrastructure costs that aren’t always obvious upfront. See Airbyte Pricing and Self-Hosting Costs for the full picture.
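The capacity model reduces to simple linear arithmetic, which is what makes it predictable. A sketch using the rates quoted above (check Airbyte's pricing page for current numbers):

```python
# Rates as described in the article: $15 per million rows for API sources,
# $10 per GB for database and file sources.
API_RATE_PER_MILLION_ROWS = 15.0
DB_RATE_PER_GB = 10.0

def airbyte_monthly_cost(api_rows: int, db_gb: float) -> float:
    """Estimate a month's bill from API row volume and database/file GB moved."""
    return api_rows / 1_000_000 * API_RATE_PER_MILLION_ROWS + db_gb * DB_RATE_PER_GB

# e.g. 4M API rows plus 50 GB of database syncs in a month:
estimate = airbyte_monthly_cost(api_rows=4_000_000, db_gb=50)
print(f"${estimate:,.2f}")
```

Because the bill is linear in volume, forecasting next month's cost is a multiplication, not a guess about how many rows will count as "active."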
dlt: Python-Native Library
dlt (data load tool) takes a different path entirely. It’s a Python library you pip install. No UI, no containers, no orchestration servers. You write Python scripts that define your pipelines, and those scripts run anywhere Python runs — your laptop, an Airflow DAG, a Cloud Function, or a GitHub Action.
The library handles the hard parts: schema inference, incremental loading, type coercion, nested JSON normalization. You handle the pipeline logic. A complete pipeline targeting BigQuery can be 20-30 lines of Python:
```python
import dlt
from dlt.sources.rest_api import rest_api_source

source = rest_api_source({
    "client": {
        "base_url": "https://api.example.com/v1",
        "auth": {"type": "bearer", "token": dlt.secrets.value},
    },
    "resources": [
        {
            "name": "campaigns",
            "endpoint": {"path": "campaigns"},
            "write_disposition": "merge",
            "primary_key": "id",
        }
    ],
})

pipeline = dlt.pipeline(
    pipeline_name="campaigns",
    destination="bigquery",
    dataset_name="marketing",
)

load_info = pipeline.run(source)
```

The architectural difference is total. Fivetran and Airbyte are platforms with connectors; dlt is a library with a framework. When you use Fivetran, you're configuring their system. When you use dlt, you're writing code that uses their framework. That distinction determines what you can and can't control.
dlt costs nothing beyond the infrastructure you choose to run it on. It’s Apache 2.0 licensed open source. Your costs are compute (often pennies for serverless Cloud Functions) and your destination warehouse. The deployment flexibility is real — the same pipeline runs locally for development and in production with no code changes.
The trade-off is that dlt requires Python proficiency and self-managed infrastructure. There’s no monitoring dashboard, no automatic alerting, no dedicated support team. If a source API changes, you handle the update. If a run fails, you debug it.
The Spectrum This Creates
The three tools form a spectrum:
| | Fivetran | Airbyte | dlt |
|---|---|---|---|
| Setup | Web UI, minutes | Web UI + Kubernetes, days | pip install, minutes |
| Maintenance | Near zero | Low (cloud) / High (self-hosted) | You own it |
| Cost | MAR-based, expensive at scale | Capacity-based, moderate | Infrastructure only |
| Control | Low | Medium | Full |
| Python required | No | No | Yes |
| Monitoring | Built-in | Built-in (cloud) | External |
No position on this spectrum is objectively better. Teams with non-technical data owners, enterprise compliance requirements, or zero tolerance for pipeline failures lean toward Fivetran. Python-proficient teams with budget constraints or custom source requirements lean toward dlt. Airbyte serves teams in the middle — cost-conscious but not ready to go fully code-first.
Most mature data teams end up with a mixture, using different tools for different sources based on volume, connector availability, and acceptable operational overhead.