
n8n RSS-to-Notion Workflow

How to build an automated RSS reader that fetches, cleans, and stores articles in Notion using n8n, Jina AI, and ChatGPT.


n8n is an open-source workflow automation tool that lets you connect different services through visual workflows — a self-hosted alternative to Zapier or Make, but with more flexibility and no per-task pricing. You define workflows as nodes connected by edges, where each node performs a specific operation: making HTTP requests, transforming data with JavaScript, calling APIs.

This hub documents a 14-node n8n workflow that turns RSS feeds into a personal knowledge base in Notion. The workflow runs daily at 5 AM, requires no manual intervention, and costs about $2/month in LLM API calls.

The problem it solves

RSS feeds are one of the best ways to aggregate content from blogs, newsletters, and news sites — but raw feeds come with friction:

  • Content fragmentation: you need to check multiple readers across multiple apps
  • HTML noise: navigation menus, cookie banners, newsletter CTAs, footer content all end up mixed in with the article
  • No deduplication: the same article appears every time you refresh the feed
  • Manual effort: saving interesting articles requires copy-pasting

The workflow handles all of this automatically. Fetch, deduplicate, extract, clean, store.

Architecture overview

The 14 nodes are organized into four stages:

Trigger → Fetch RSS Sources (Notion DB) → Get RSS Feed (HTTP) → Parse XML → Split Items
Filter Existing Articles (Merge/anti-join) → Create Notion Pages → Extract Page IDs
Fetch Article Content (Jina AI) → Build ChatGPT Prompt → Call OpenAI API
Markdown to Notion Blocks (JS) → Append Blocks to Notion Page

Each stage is documented in a dedicated note:

Stage 1: Triggering

Two trigger options run in parallel in the workflow:

Manual trigger — fires on When clicking 'Execute workflow'. Useful for testing, initial bulk imports, or forcing a run outside the schedule.

Scheduled trigger — runs daily at 5:00 AM using a cron-like syntax (triggerAtHour: 5). Early morning timing means yesterday’s content is captured before the workday starts, and API calls happen during off-peak hours.
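For reference, the Schedule Trigger node's parameters look roughly like this (a sketch of n8n's scheduleTrigger node JSON, not exported verbatim from the workflow):

```json
{
  "parameters": {
    "rule": {
      "interval": [
        { "field": "days", "triggerAtHour": 5 }
      ]
    }
  },
  "type": "n8n-nodes-base.scheduleTrigger"
}
```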

Stage 2: RSS source management

RSS source URLs aren’t hardcoded in the workflow — they live in a Notion database. The Get many sources from monitoring node queries that database, filtering for type = "RSS". This means you can add or remove feeds without touching the workflow itself.

Each source record needs an rss_link property. The same database could hold other content types (podcasts, newsletters) using different type values — the filter keeps them separate.
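That filter corresponds to a Notion database query body like the following — a hedged sketch assuming type is a select property (the node builds this internally; the helper name is hypothetical):

```javascript
// Hypothetical helper: builds the body for Notion's database query endpoint,
// keeping only source records whose "type" select equals the given value.
function buildSourceQuery(contentType) {
  return {
    filter: {
      property: "type",
      select: { equals: contentType },
    },
  };
}

const query = buildSourceQuery("RSS");
console.log(query.filter.select.equals); // "RSS"
```

Because the content type is a parameter, the same helper would cover the type = "github" or type = "slack" rows mentioned at the end of this note.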

The Fetch RSS Feed node makes an HTTP GET to each URL with explicit headers:

{
"User-Agent": "Mozilla/5.0",
"Accept": "application/rss+xml, application/xml;q=0.9, */*;q=0.8"
}

Some servers block headless requests without a User-Agent. The response format is text (raw XML) because the next node handles parsing.

The XML to JSON node converts the RSS structure into JSON. Standard RSS feeds look like this before parsing:

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<item>
<title>Article Title</title>
<link>https://example.com/article</link>
<pubDate>Thu, 14 Nov 2024 10:00:00 GMT</pubDate>
<dc:creator>Author Name</dc:creator>
</item>
</channel>
</rss>

The Split Out RSS Feed node takes the array at rss.channel.item and creates one output item per article, so each one can be processed independently.
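Outside n8n, the split step is a few lines of JavaScript. One subtlety worth noting: a feed with a single <item> typically parses to an object rather than an array, so a robust version normalizes first (a sketch of the idea — the built-in Split Out node covers the common case):

```javascript
// Emit one element per article from the parsed feed JSON.
function splitItems(feed) {
  const items = feed.rss.channel.item;
  return Array.isArray(items) ? items : [items]; // normalize single-item feeds
}

// Shape assumed from the XML example above
const feed = {
  rss: { channel: { item: [
    { title: "Article Title", link: "https://example.com/article" },
  ] } },
};
console.log(splitItems(feed).length); // 1
```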

Deduplication happens next — see RSS Feed Deduplication in n8n for how the Merge node anti-join works.
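In plain JavaScript, the anti-join boils down to a set-membership filter — a sketch of the idea, not the Merge node's implementation, assuming articles are keyed by their link:

```javascript
// Keep only feed items whose link is not already stored in Notion.
function antiJoin(feedItems, existingUrls) {
  const seen = new Set(existingUrls);
  return feedItems.filter((item) => !seen.has(item.link));
}

const items = [
  { link: "https://example.com/a" },
  { link: "https://example.com/b" },
];
const fresh = antiJoin(items, ["https://example.com/a"]);
console.log(fresh.length); // 1 — only the /b article survives
```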

Stage 3: Content extraction and cleaning

For each new article that passes the deduplication filter, the workflow:

  1. Creates a placeholder Notion page with metadata (title, author, pubDate, source, URL, type: “RSS”)
  2. Captures the new page ID
  3. Fetches the article’s full content via Jina AI
  4. Cleans the content with ChatGPT
  5. Converts cleaned markdown to Notion blocks
  6. Appends the blocks to the placeholder page

Jina AI for content extraction

Jina AI’s Reader API is designed specifically to extract clean content from web pages. You pass it a URL, it returns the article as markdown — stripping navigation, ads, and site chrome automatically.

The node is configured with retry logic:

retryOnFail: true
maxTries: 5
waitBetweenTries: 5000 # 5 seconds between retries

Web scraping is inherently flaky. Timeouts, rate limits, and transient errors are common. Five retries with backoff handles most cases without manual intervention.
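The same policy is easy to express in plain JavaScript. The sketch below mirrors the node's settings and assumes Jina Reader's URL-prefix convention (GET https://r.jina.ai/&lt;article-url&gt; returns the page as markdown):

```javascript
// Retry a fetch up to `tries` times, waiting `waitMs` between attempts —
// mirroring retryOnFail / maxTries: 5 / waitBetweenTries: 5000.
async function fetchWithRetry(url, tries = 5, waitMs = 5000) {
  for (let attempt = 1; attempt <= tries; attempt++) {
    try {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.text();
    } catch (err) {
      if (attempt === tries) throw err; // out of retries: surface the error
      await new Promise((r) => setTimeout(r, waitMs));
    }
  }
}

// Usage (hypothetical URL):
// const md = await fetchWithRetry("https://r.jina.ai/https://example.com/article");
```

Note the wait here is a flat 5 seconds between attempts, matching the node config, rather than exponential backoff.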

Jina returns markdown, which is the ideal intermediate format: structured enough to parse programmatically, human-readable, and easy to pass to an LLM for cleaning. See LLM as Content Cleaner for what happens next.

Notion page creation

The Create a database page node creates a page with metadata only — no content yet:

  • Title — {{ $json.title }} from the RSS item
  • Author — falls back through dc:creator, author, or "no author"
  • Published At — {{ $json.pubDate }}
  • RSS feed name — reference to the source record
  • content_url — {{ $json.link }}
  • Type — "RSS" (for filtering later)
  • Icon — 📰

The page ID from Notion’s response is captured in a Set notion_page_id node. This ID is used in the final step to append content.

Stage 4: Writing to Notion

After ChatGPT cleans the markdown, it must be converted to Notion’s block format before it can be appended. Notion doesn’t accept raw markdown — it requires a JSON array of “block” objects. See Markdown-to-Notion Blocks Parser for how this works.
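As a flavor of what that conversion involves, here is a deliberately minimal sketch that handles only level-2 headings and paragraphs (the real parser covers many more block types):

```javascript
// Convert a markdown string into Notion block objects (headings + paragraphs only).
function markdownToBlocks(markdown) {
  return markdown
    .split("\n")
    .filter((line) => line.trim() !== "") // drop blank lines
    .map((line) =>
      line.startsWith("## ")
        ? {
            object: "block",
            type: "heading_2",
            heading_2: { rich_text: [{ type: "text", text: { content: line.slice(3) } }] },
          }
        : {
            object: "block",
            type: "paragraph",
            paragraph: { rich_text: [{ type: "text", text: { content: line } }] },
          }
    );
}

const blocks = markdownToBlocks("## Intro\nFirst paragraph.");
console.log(blocks[0].type); // "heading_2"
```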

The final HTTP Request node makes a PATCH to Notion’s append endpoint:

PATCH https://api.notion.com/v1/blocks/{{ notion_page_id }}/children

with the array of blocks as the body. This node also has retry logic for transient API failures.
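Outside n8n, that call looks roughly like this — treat the exact header set as an assumption (the Notion-Version value shown is one the Notion docs have used):

```javascript
// Build the PATCH request that appends block children to a Notion page.
function buildAppendRequest(pageId, blocks, token) {
  return {
    url: `https://api.notion.com/v1/blocks/${pageId}/children`,
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${token}`,
      "Notion-Version": "2022-06-28",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ children: blocks }),
  };
}

// Usage (hypothetical):
// const req = buildAppendRequest(notionPageId, blocks, process.env.NOTION_TOKEN);
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```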

The result: a Notion page with metadata properties and full article content, cleanly formatted.

Cost model

Running this workflow daily at ~20 articles/day:

  • n8n (self-hosted on railway.app) — $5/month
  • Jina AI Reader — free tier (1M tokens) handles ~60-150K tokens/day
  • OpenAI GPT-4o-mini — ~$2/month at 20 articles/day
  • Notion — free tier is fine for personal use
  • Total — ~$7/month

Most paid RSS readers cost $5–15/month without providing a searchable Notion database with cleaned content.

The LLM cost is low because gpt-4o-mini is cheap and the task is well-defined (clean this markdown). Input tokens (system prompt + article) run 3-8K per article; output is 2-5K. At 600 articles/month that’s roughly $0.54 input + $1.44 output.
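The arithmetic, using gpt-4o-mini's commonly quoted prices ($0.15 per 1M input tokens, $0.60 per 1M output — verify against current pricing):

```javascript
// Monthly LLM cost estimate at 20 articles/day.
const articles = 600;                 // 20/day × 30 days
const inputTokens = articles * 6000;  // midpoint of the 3-8K input range
const outputTokens = articles * 4000; // midpoint of the 2-5K output range
const cost = (inputTokens / 1e6) * 0.15 + (outputTokens / 1e6) * 0.6;
console.log(cost.toFixed(2)); // "1.98" — about $2/month
```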

Extending the pattern

The same architecture handles more than RSS. The core pattern — fetch from a source, deduplicate, extract content, clean with LLM, store structured — applies to:

  • GitHub repository updates (new commits, releases, issues)
  • Slack messages with specific keywords
  • Jira ticket descriptions
  • Documentation site changes
  • Industry report releases

The Notion source database approach (filtering by type = "RSS") already anticipates this: you could add rows with type = "github" or type = "slack" and route them through different sub-workflows.