n8n is an open-source workflow automation tool that lets you connect different services through visual workflows — a self-hosted alternative to Zapier or Make, but with more flexibility and no per-task pricing. You define workflows as nodes connected by edges, where each node performs a specific operation: making HTTP requests, transforming data with JavaScript, calling APIs.
This hub documents a 14-node n8n workflow that turns RSS feeds into a personal knowledge base in Notion. The workflow runs daily at 5 AM, requires no manual intervention, and costs about $2/month in LLM API calls.
The problem it solves
RSS feeds are one of the best ways to aggregate content from blogs, newsletters, and news sites — but raw feeds come with friction:
- Content fragmentation: keeping up means checking multiple feeds across multiple readers and apps
- HTML noise: navigation menus, cookie banners, newsletter CTAs, footer content all end up mixed in with the article
- No deduplication: the same article appears every time you refresh the feed
- Manual effort: saving interesting articles requires copy-pasting
The workflow handles all of this automatically. Fetch, deduplicate, extract, clean, store.
Architecture overview
The 14 nodes organize into four stages:
```
Trigger → Fetch RSS Sources (Notion DB) → Get RSS Feed (HTTP) → Parse XML → Split Items
    ↓
Filter Existing Articles (Merge/anti-join) → Create Notion Pages → Extract Page IDs
    ↓
Fetch Article Content (Jina AI) → Build ChatGPT Prompt → Call OpenAI API
    ↓
Markdown to Notion Blocks (JS) → Append Blocks to Notion Page
```

Each stage is documented in a dedicated note:
- RSS Feed Deduplication in n8n — the Merge node anti-join pattern that prevents duplicate Notion pages
- LLM as Content Cleaner — using GPT-4o-mini to strip HTML noise and normalize markdown
- Markdown-to-Notion Blocks Parser — the custom JavaScript parser that converts markdown to Notion’s block API format
Stage 1: Triggering
The workflow has two trigger options wired in parallel:
Manual trigger — the When clicking 'Execute workflow' node. Useful for testing, initial bulk imports, or forcing a run outside the schedule.
Scheduled trigger — runs daily at 5:00 AM using a cron-like syntax (triggerAtHour: 5). Early morning timing means yesterday’s content is captured before the workday starts, and API calls happen during off-peak hours.
Stage 2: RSS source management
RSS source URLs aren't hardcoded in the workflow — they live in a Notion database. The Get many sources from monitoring node queries that database, filtering for type = "RSS". This means you can add or remove feeds without touching the workflow itself.
Each source record needs an rss_link property. The same database could hold other content types (podcasts, newsletters) using different type values — the filter keeps them separate.
The Fetch RSS Feed node makes an HTTP GET to each URL with explicit headers:
```json
{
  "User-Agent": "Mozilla/5.0",
  "Accept": "application/rss+xml, application/xml;q=0.9, */*;q=0.8"
}
```

Some servers block headless requests without a User-Agent. The response format is text (raw XML) because the next node handles parsing.
The XML to JSON node converts the RSS structure into JSON. Standard RSS feeds look like this before parsing:
```xml
<rss>
  <channel>
    <item>
      <title>Article Title</title>
      <link>https://example.com/article</link>
      <pubDate>Thu, 14 Nov 2024 10:00:00 GMT</pubDate>
      <dc:creator>Author Name</dc:creator>
    </item>
  </channel>
</rss>
```

The Split Out RSS Feed node takes the array at rss.channel.item and creates one output item per article, so each one can be processed independently.
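The split behavior is easy to picture in plain JavaScript. This is a sketch, not the node's actual implementation; the function name and sample data are illustrative:

```javascript
// Sketch of what the Split Out RSS Feed node does: one input object
// containing an array becomes N independent items downstream.
function splitItems(parsedFeed) {
  const items = parsedFeed.rss.channel.item;
  // Feeds with a single entry may parse as an object rather than an array.
  return Array.isArray(items) ? items : [items];
}

const feed = {
  rss: { channel: { item: [
    { title: "A", link: "https://example.com/a" },
    { title: "B", link: "https://example.com/b" },
  ] } },
};
console.log(splitItems(feed).length); // 2
```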
Deduplication happens next — see RSS Feed Deduplication in n8n for how the Merge node anti-join works.
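The intent of the anti-join can be sketched in plain JavaScript (the workflow itself uses the Merge node, as the linked note explains; function and field names here are illustrative): keep only the feed items whose URL is not already stored on an existing page.

```javascript
// Sketch of the dedup anti-join: drop feed items whose link already
// appears as content_url on an existing Notion page.
function antiJoin(feedItems, existingPages) {
  const known = new Set(existingPages.map((page) => page.content_url));
  return feedItems.filter((item) => !known.has(item.link));
}

const fresh = antiJoin(
  [{ link: "https://example.com/a" }, { link: "https://example.com/b" }],
  [{ content_url: "https://example.com/a" }]
);
// fresh contains only the /b item
```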
Stage 3: Content extraction and cleaning
For each new article that passes the deduplication filter, the workflow:
- Creates a placeholder Notion page with metadata (title, author, pubDate, source, URL, type: “RSS”)
- Captures the new page ID
- Fetches the article’s full content via Jina AI
- Cleans the content with ChatGPT
- Converts cleaned markdown to Notion blocks
- Appends the blocks to the placeholder page
Jina AI for content extraction
Jina AI’s Reader API is designed specifically to extract clean content from web pages. You pass it a URL, it returns the article as markdown — stripping navigation, ads, and site chrome automatically.
The node is configured with retry logic:
```yaml
retryOnFail: true
maxTries: 5
waitBetweenTries: 5000  # 5 seconds between retries
```

Web scraping is inherently flaky. Timeouts, rate limits, and transient errors are common. Five retries with backoff handles most cases without manual intervention.
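The retry semantics are roughly equivalent to a small wrapper like this (a sketch; n8n implements this internally, and only the attempt count and delay mirror the settings above):

```javascript
// Sketch of retryOnFail semantics: up to maxTries attempts,
// pausing waitMs milliseconds between failures.
async function withRetries(fn, maxTries = 5, waitMs = 5000) {
  let lastError;
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxTries) {
        await new Promise((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  throw lastError; // all attempts exhausted
}
```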
Jina returns markdown, which is the ideal intermediate format: structured enough to parse programmatically, human-readable, and easy to pass to an LLM for cleaning. See LLM as Content Cleaner for what happens next.
Notion page creation
The Create a database page node creates a page with metadata only — no content yet:
| Property | Value |
|---|---|
| Title | {{ $json.title }} from the RSS item |
| Author | Falls back through dc:creator, author, or “no author” |
| Published At | {{ $json.pubDate }} |
| RSS feed name | Reference to the source record |
| content_url | {{ $json.link }} |
| Type | "RSS" (for filtering later) |
| Icon | 📰 |
The page ID from Notion’s response is captured in a Set notion_page_id node. This ID is used in the final step to append content.
Stage 4: Writing to Notion
After the markdown is cleaned by ChatGPT, it needs to be converted from markdown to Notion’s block format before it can be appended. Notion doesn’t accept raw markdown — it requires a JSON array of “block” objects. See Markdown-to-Notion Blocks Parser for how this works.
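To make the shape of that conversion concrete, here is a toy version handling only headings and paragraphs (the actual parser, covered in the linked note, handles much more of markdown; the function name is illustrative):

```javascript
// Toy markdown-to-Notion-blocks converter: headings and paragraphs only.
function toBlocks(markdown) {
  return markdown
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const heading = line.match(/^(#{1,3})\s+(.*)$/);
      const type = heading ? `heading_${heading[1].length}` : "paragraph";
      const text = heading ? heading[2] : line;
      return {
        object: "block",
        type,
        [type]: { rich_text: [{ type: "text", text: { content: text } }] },
      };
    });
}

const blocks = toBlocks("# Title\n\nBody text");
// → [{ type: "heading_1", … }, { type: "paragraph", … }]
```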
The final HTTP Request node makes a PATCH to Notion’s append endpoint:
```
PATCH https://api.notion.com/v1/blocks/{{ notion_page_id }}/children
```

with the array of blocks as the body. This node also has retry logic for transient API failures.
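One practical detail: Notion's append endpoint accepts at most 100 blocks per request, so long articles need to be appended in batches. A minimal chunking helper (a sketch; the function name is illustrative):

```javascript
// Split a block array into request-sized batches; Notion's append
// endpoint accepts at most 100 blocks per call.
function chunkBlocks(blocks, size = 100) {
  const chunks = [];
  for (let i = 0; i < blocks.length; i += size) {
    chunks.push(blocks.slice(i, i + size));
  }
  return chunks;
}

// A 250-block article becomes three PATCH requests: 100 + 100 + 50.
const batches = chunkBlocks(new Array(250).fill({ object: "block" }));
```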
The result: a Notion page with metadata properties and full article content, cleanly formatted.
Cost model
Running this workflow daily at ~20 articles/day:
| Service | Cost |
|---|---|
| n8n (self-hosted on railway.app) | $5/month |
| Jina AI Reader | Free tier (1M tokens) handles ~60-150K tokens/day |
| OpenAI GPT-4o-mini | ~$2/month at 20 articles/day |
| Notion | Free tier is fine for personal use |
| Total | ~$7/month |
Most paid RSS readers cost $5–15/month without providing a searchable Notion database with cleaned content.
The LLM cost is low because gpt-4o-mini is cheap and the task is well-defined (clean this markdown). Input tokens (system prompt + article) run 3-8K per article; output is 2-5K. At 600 articles/month that’s roughly $0.54 input + $1.44 output.
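The arithmetic behind those figures, using the per-million-token prices implied by the document's numbers (~$0.15/M input, ~$0.60/M output) and the midpoint-ish token counts above:

```javascript
// Monthly LLM cost: 600 articles at ~6K input and ~4K output tokens each.
const articles = 600;
const inputTokens = articles * 6000;   // 3.6M tokens
const outputTokens = articles * 4000;  // 2.4M tokens
const cost =
  (inputTokens / 1e6) * 0.15 +   // $0.54 input
  (outputTokens / 1e6) * 0.60;   // $1.44 output
console.log(cost.toFixed(2)); // "1.98"
```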
Extending the pattern
The same architecture handles more than RSS. The core pattern — fetch from a source, deduplicate, extract content, clean with LLM, store structured — applies to:
- GitHub repository updates (new commits, releases, issues)
- Slack messages with specific keywords
- Jira ticket descriptions
- Documentation site changes
- Industry report releases
The Notion source database approach (filtering by type = "RSS") already anticipates this: you could add rows with type = "github" or type = "slack" and route each through a different sub-workflow.
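The routing itself is trivial; a sketch of the dispatch logic (in n8n this would typically be a Switch node, and the pipeline names here are illustrative):

```javascript
// Sketch: route a source record to a sub-workflow based on its type.
function routeSource(source) {
  switch (source.type) {
    case "RSS":    return "rss-pipeline";
    case "github": return "github-pipeline";
    case "slack":  return "slack-pipeline";
    default:       return "unhandled";
  }
}

console.log(routeSource({ type: "RSS" })); // "rss-pipeline"
```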
Related
- RSS Feed Deduplication in n8n
- LLM as Content Cleaner
- Markdown-to-Notion Blocks Parser
- Source article: n8n-rss-to-notion