Markdown-to-Notion Blocks Parser

Notion doesn’t accept raw markdown. If you want to programmatically add content to a Notion page, you need to convert that content into an array of “block” objects in Notion’s specific JSON format. There’s no native markdown import endpoint.

This creates a real problem for any workflow that generates or extracts markdown and wants to store it in Notion. The n8n RSS-to-Notion Workflow runs into this after the LLM cleaning step: it has clean markdown, but Notion won’t accept it directly.

The solution is a custom JavaScript parser — about 400 lines of code — that converts markdown to Notion blocks. This note explains how Notion’s block format works, what the parser does, and what edge cases it has to handle.

How Notion’s block API works

Every piece of content in Notion is a “block.” A block has a type (paragraph, heading_1, bulleted_list_item, etc.) and type-specific content stored under a key with the same name as the type. Most block types have a rich_text array that contains the actual text content as formatted segments.

A minimal paragraph block looks like this:

{
  "type": "paragraph",
  "paragraph": {
    "rich_text": [
      {
        "type": "text",
        "text": {
          "content": "Hello world"
        },
        "annotations": {
          "bold": false,
          "italic": false,
          "strikethrough": false,
          "underline": false,
          "code": false,
          "color": "default"
        }
      }
    ]
  }
}

A paragraph with a mix of bold text and a link requires multiple rich_text segments:

{
  "type": "paragraph",
  "paragraph": {
    "rich_text": [
      {
        "type": "text",
        "text": { "content": "See the " },
        "annotations": { "bold": false, "italic": false, ... }
      },
      {
        "type": "text",
        "text": {
          "content": "Notion API docs",
          "link": { "url": "https://developers.notion.com" }
        },
        "annotations": { "bold": true, "italic": false, ... }
      }
    ]
  }
}

Every formatting change — bold on, bold off, link start, link end — requires a new segment. A heavily formatted paragraph can produce a dozen rich_text objects.
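Writing these objects by hand gets tedious quickly, so a small helper is worth having. The following is a minimal sketch, not the parser's actual code: textSegment and paragraphBlock are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helpers; names are illustrative and not taken from the original parser.
const DEFAULT_ANNOTATIONS = {
  bold: false,
  italic: false,
  strikethrough: false,
  underline: false,
  code: false,
  color: "default",
};

// Build one rich_text segment. `overrides` flips individual annotations;
// an optional `url` turns the segment into a link.
function textSegment(content, overrides = {}, url = null) {
  const text = { content };
  if (url) text.link = { url };
  return {
    type: "text",
    text,
    annotations: { ...DEFAULT_ANNOTATIONS, ...overrides },
  };
}

// Wrap an array of segments in a paragraph block.
function paragraphBlock(segments) {
  return { type: "paragraph", paragraph: { rich_text: segments } };
}

// Rebuilds the bold-link example above.
const exampleBlock = paragraphBlock([
  textSegment("See the "),
  textSegment("Notion API docs", { bold: true }, "https://developers.notion.com"),
]);
```

Spreading the defaults first means every segment carries the full annotation set, and callers only name the flags they change.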

Block types the parser handles

The markdown-to-Notion parser needs to cover every block-level element that might appear in a cleaned article:

Markdown	Notion block type
`# Heading`	`heading_1`
`## Heading`	`heading_2`
`### Heading`	`heading_3`
`#### H4+`	`paragraph` with bold annotation
Regular paragraph	`paragraph`
`- List item`	`bulleted_list_item`
`1. List item`	`numbered_list_item`
`> Quoted text`	`quote`
```code block```	`code`
`![alt](url)`	`image`
`---` or `***`	`divider`

Notion only supports three heading levels. Articles sometimes have H4-H6 headings (especially technical content with deeply nested sections). The parser converts these to bold paragraphs — it’s not perfect, but it preserves the content and doesn’t break the API call.
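The mapping in the table can be sketched as a single-line classifier. This is an illustration under assumptions: classifyLine is a hypothetical name, the return shape is invented for the example, and fenced code blocks are left out because they span multiple lines and need a stateful loop.

```javascript
// Hypothetical classifier for single-line constructs only.
// Returns a small descriptor, not a finished Notion block.
function classifyLine(line) {
  const trimmed = line.trim();
  let m;

  // Horizontal rules, including spaced variants like "* * *"
  if (/^(?:(?:\*\s*){3,}|(?:-\s*){3,})$/.test(trimmed)) {
    return { type: "divider" };
  }

  // Headings: H1-H3 map directly, H4-H6 fall back to a bold paragraph
  if ((m = /^(#{1,6})\s+(.*)$/.exec(trimmed))) {
    const level = m[1].length;
    return level <= 3
      ? { type: `heading_${level}`, text: m[2] }
      : { type: "paragraph", text: m[2], bold: true };
  }

  // Standalone image: ![alt](url)
  if ((m = /^!\[([^\]]*)\]\(([^)]+)\)$/.exec(trimmed))) {
    return { type: "image", alt: m[1], url: m[2] };
  }

  if ((m = /^[-*+]\s+(.*)$/.exec(trimmed))) {
    return { type: "bulleted_list_item", text: m[1] };
  }
  if ((m = /^\d+\.\s+(.*)$/.exec(trimmed))) {
    return { type: "numbered_list_item", text: m[1] };
  }
  if ((m = /^>\s?(.*)$/.exec(trimmed))) {
    return { type: "quote", text: m[1] };
  }

  return { type: "paragraph", text: trimmed };
}
```

The divider check runs first so that a line like `* * *` is not mistaken for a bullet whose text is `* *`.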

Inline formatting parsing

Within each block, the parser needs to identify inline formatting spans and split the text accordingly. The inline patterns to detect:

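// Bold: **text** or __text__ (content must be non-empty)
const boldPattern = /\*\*(.+?)\*\*|__(.+?)__/g;

// Italic: *text* or _text_
// Lookarounds stop this from matching inside ** / __ markers or snake_case words.
const italicPattern = /(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)|(?<!\w)_(?!_)(.+?)(?<!_)_(?!\w)/g;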

// Inline code: `code`
const codePattern = /`([^`]+)`/g;

// Link: [text](url)
const linkPattern = /\[([^\]]+)\]\(([^)]+)\)/g;

The parser walks through each line, matches these patterns, and builds the rich_text array by slicing the string at each formatting boundary. Text between formatted spans becomes a plain segment; formatted spans become segments with the appropriate annotations set.

Nested formatting (bold italic, bold link) requires combining annotations from multiple patterns. A segment inside both ** and _ markers gets bold: true, italic: true.
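One way to get nesting right is to recurse: match the outermost span, then parse its contents with the inherited annotations. The sketch below is an assumption about how this could be done, not the parser's real code. parseInline and richSegment are hypothetical names, and the single combined regex is a simplification of the four patterns above.

```javascript
// Alternatives in priority order: code, link, bold (** / __), italic (* / _).
// The (?<!\w) / (?!\w) guards keep snake_case words from reading as emphasis.
const INLINE =
  /`([^`]+)`|\[([^\]]+)\]\(([^)]+)\)|\*\*(.+?)\*\*|(?<!\w)__(.+?)__(?!\w)|\*(.+?)\*|(?<!\w)_(.+?)_(?!\w)/g;

// Hypothetical segment builder with Notion's default annotations.
function richSegment(content, annotations, url) {
  return {
    type: "text",
    text: url ? { content, link: { url } } : { content },
    annotations: {
      bold: false, italic: false, strikethrough: false,
      underline: false, code: false, color: "default",
      ...annotations,
    },
  };
}

// Recursively split `text` into rich_text segments, carrying the
// annotations and link URL inherited from enclosing spans.
function parseInline(text, annotations = {}, url = null) {
  const out = [];
  let cursor = 0;
  for (const m of text.matchAll(INLINE)) {
    // Plain text between the previous span and this one
    if (m.index > cursor) {
      out.push(richSegment(text.slice(cursor, m.index), annotations, url));
    }
    const [, code, linkText, linkUrl, bold1, bold2, italic1, italic2] = m;
    if (code !== undefined) {
      // Inline code is literal: no further parsing inside it
      out.push(richSegment(code, { ...annotations, code: true }, url));
    } else if (linkText !== undefined) {
      out.push(...parseInline(linkText, annotations, linkUrl));
    } else if (bold1 !== undefined || bold2 !== undefined) {
      out.push(...parseInline(bold1 ?? bold2, { ...annotations, bold: true }, url));
    } else {
      out.push(...parseInline(italic1 ?? italic2, { ...annotations, italic: true }, url));
    }
    cursor = m.index + m[0].length;
  }
  if (cursor < text.length) {
    out.push(richSegment(text.slice(cursor), annotations, url));
  }
  return out;
}
```

Calling parseInline("**bold _both_**") yields two segments: "bold " with bold set, and "both" with bold and italic set. Unclosed markers and triple-asterisk spans are not handled in this sketch.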

The 2000-character limit

Notion has a hard limit of 2000 characters per rich_text element. This matters for long paragraphs — technical articles sometimes have dense paragraphs that exceed this.

The parser includes a splitLongBlocks function that runs after the initial conversion:

function splitLongBlocks(blocks) {
  const result = [];
  for (const block of blocks) {
    if (!block[block.type]?.rich_text) {
      result.push(block);
      continue;
    }

    // Total characters across all segments. The API limit applies per segment,
    // so checking the block total is a conservative shortcut.
    const totalChars = block[block.type].rich_text
      .reduce((sum, segment) => sum + (segment.text?.content.length ?? 0), 0);

    if (totalChars <= 2000) {
      result.push(block);
      continue;
    }

    // Split into chunks, respecting segment boundaries where possible
    // Falls back to splitting mid-segment for code blocks
    result.push(...splitBlock(block, 2000));
  }
  return result;
}

The split logic tries to break at segment boundaries first (between rich_text elements). If a single segment is itself longer than 2000 characters (common in code blocks), it splits the content string at 2000 characters and creates a new segment.
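The splitBlock helper called above is not shown in this note. A minimal sketch of the described behavior might look like the following; it is an assumption about the logic, not the original implementation, and it assumes every segment is a text segment.

```javascript
// Hypothetical sketch of splitBlock: returns one or more blocks of the same
// type, each holding at most `limit` characters of text.
function splitBlock(block, limit = 2000) {
  const type = block.type;

  // Step 1: cut any oversized segment into limit-sized pieces,
  // copying its annotations and link onto every piece.
  const pieces = [];
  for (const seg of block[type].rich_text) {
    const content = seg.text.content;
    for (let i = 0; i < content.length; i += limit) {
      pieces.push({
        ...seg,
        text: { ...seg.text, content: content.slice(i, i + limit) },
      });
    }
  }

  // Step 2: pack pieces greedily into blocks, starting a new block
  // whenever the next piece would push the total past the limit.
  const blocks = [];
  let current = [];
  let size = 0;
  const flush = () => {
    // Spreading block[type] preserves extra fields such as a code block's language
    blocks.push({ ...block, [type]: { ...block[type], rich_text: current } });
    current = [];
    size = 0;
  };
  for (const piece of pieces) {
    const len = piece.text.content.length;
    if (current.length > 0 && size + len > limit) flush();
    current.push(piece);
    size += len;
  }
  if (current.length > 0) flush();
  return blocks;
}
```

Note that slice counts UTF-16 code units, so a cut can land in the middle of an emoji or other surrogate pair. A production version would want to back off by one unit in that case.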

The 100-block request limit

Notion’s “append block children” API accepts a maximum of 100 blocks per request. Longer articles can easily produce more blocks than that.

The workflow handles this by taking the first 100 blocks. This is a pragmatic tradeoff: most articles fit within 100 blocks, and the content that gets cut is usually the tail end of longer pieces — often conclusion sections and “related” links that add less value anyway.

A more sophisticated approach would chunk into multiple API calls: blocks 1-100, then 101-200, etc. The current implementation keeps it simple and that simplicity is defensible for a personal knowledge base.
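For reference, the chunked alternative is not much code. This sketch assumes a caller-supplied async function, here called appendChildren, that performs one "append block children" request; both that name and appendAll are hypothetical and the actual HTTP call is left out.

```javascript
// Split an array of blocks into groups of at most `size`.
function chunkBlocks(blocks, size = 100) {
  const chunks = [];
  for (let i = 0; i < blocks.length; i += size) {
    chunks.push(blocks.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical sender. `appendChildren` is an assumption: any async function
// that sends one batch of blocks to Notion.
async function appendAll(blocks, appendChildren) {
  for (const chunk of chunkBlocks(blocks)) {
    // Awaiting each call in turn keeps the blocks in document order
    await appendChildren(chunk);
  }
}
```

The requests have to run sequentially rather than in parallel, since each one appends to the end of the page and the order of arrival determines the order of the content.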

Escaped characters and edge cases

Markdown scraped from the web is messier than markdown you write yourself. A few specific edge cases the parser handles:

Escaped underscores: \_like\_this\_ appears in scraped content when the original HTML had underscores that a markdown converter escaped defensively. The parser unescapes these before processing.

Empty blockquotes: A > line with no content produces an empty quote block in Notion that looks awkward. Empty quotes are skipped.

Linked images: [![alt](image-url)](link-url) — an image wrapped in a link — appears occasionally in scraped content. The parser unwraps these to just the image block, since Notion doesn’t support linked images anyway.

Stray formatting characters: Bad markdown converters sometimes emit lines containing only * * * or --- in various combinations. These get converted to divider blocks rather than triggering parsing errors.
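These four cases can be handled in one pre-processing pass over each line, before block classification. The following is a sketch under assumptions: normalizeLine is a hypothetical name, and returning null to mean "skip this line" is a convention invented for the example.

```javascript
// Hypothetical per-line cleanup. Returns the cleaned line,
// or null when the line should be dropped entirely.
function normalizeLine(line) {
  let out = line;

  // Linked image: [![alt](image-url)](link-url) -> ![alt](image-url)
  out = out.replace(/\[(!\[[^\]]*\]\([^)]+\))\]\([^)]+\)/g, "$1");

  // Defensively escaped underscores: \_ -> _
  out = out.replace(/\\_/g, "_");

  // Empty blockquote: ">" followed only by whitespace
  if (/^\s*>\s*$/.test(out)) return null;

  // Lines made up only of * or - (optionally spaced) become a plain rule
  if (/^\s*(?:(?:\*\s*){3,}|(?:-\s*){3,})$/.test(out)) return "---";

  return out;
}
```

One caveat with unescaping up front: an underscore that was escaped at a word edge can afterwards be read as an emphasis marker by the inline parser, so the inline patterns need to be conservative about where they accept _ as a delimiter.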

Output structure

The parser returns not just the blocks but debugging metadata:

{
  children: blocksToSend,    // Array of Notion block objects (max 100)
  meta: {
    model: "gpt-4o-mini",    // Passed through from the LLM step
    created: "...",           // Timestamp
    total_blocks: 45,         // Blocks before splitting long ones
    split_blocks: 47,         // Blocks after splitting (some long ones became 2)
    blocks_sent: 47           // What actually goes to Notion
  }
}

The metadata is useful when debugging why an article looks wrong in Notion: you can see whether blocks were cut at the 100-block limit (blocks_sent lower than split_blocks) or whether splitting happened unexpectedly (large gap between total_blocks and split_blocks).
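Assembling that structure, and reading it back when something looks off, could be sketched like this. buildOutput and describeMeta are hypothetical names, and the ISO timestamp for created is an assumption, since the note does not specify the format.

```javascript
// Hypothetical assembly of the return value.
// `parsedBlocks` is the list before splitLongBlocks, `splitBlocks` the list after.
function buildOutput(parsedBlocks, splitBlocks, model, limit = 100) {
  const blocksToSend = splitBlocks.slice(0, limit);
  return {
    children: blocksToSend,
    meta: {
      model,
      created: new Date().toISOString(), // assumed format
      total_blocks: parsedBlocks.length,
      split_blocks: splitBlocks.length,
      blocks_sent: blocksToSend.length,
    },
  };
}

// Turn the counters into human-readable notes for debugging.
function describeMeta(meta) {
  const notes = [];
  if (meta.split_blocks > meta.total_blocks) {
    notes.push(`${meta.split_blocks - meta.total_blocks} extra block(s) created by splitting`);
  }
  if (meta.blocks_sent < meta.split_blocks) {
    notes.push(`${meta.split_blocks - meta.blocks_sent} block(s) dropped at the request limit`);
  }
  return notes;
}
```

With the numbers from the example above (45, 47, 47), describeMeta reports two extra blocks from splitting and no truncation.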

Why not use an existing library?

Libraries exist for markdown-to-Notion conversion (like @tryfabric/martian for Node.js). The custom parser exists because:

The workflow runs in n8n’s JavaScript sandbox, which has a limited set of available modules
A custom parser can handle the specific edge cases in scraped content that a general library might not
The output format can be tuned to match what this specific workflow needs

If you’re building a standalone application outside n8n, using an established library is probably the better choice. If you’re in an n8n Code node, you’ll likely need something like this.

n8n RSS-to-Notion Workflow — the full workflow this parser is part of
LLM as Content Cleaner — the step that produces the clean markdown this parser consumes