Rendering JavaScript Pages When Scraping: A Practical Guide

Why the raw HTML is empty, how headless browsers fill it in, and how to get fully rendered content as clean markdown or JSON without running a browser fleet yourself.

By the ClawEngine team

June 2026 · 9 min read

Rendering JavaScript pages when scraping is the difference between empty and complete data

If you have ever scraped a modern website and gotten back a nearly empty page, you have met the JavaScript rendering problem. The server sends a thin HTML shell, and the real content (products, prices, articles, comments) gets built in the browser after the page loads. A plain HTTP fetch never runs that JavaScript, so it never sees the content. This guide explains why that happens and how to capture fully rendered pages cleanly, on public and permitted data only.

Why the raw HTML is empty

Single-page applications built with React, Vue, Angular and similar frameworks ship a minimal document and a bundle of JavaScript. When a browser loads the page, that script fetches data from an API, builds the DOM, and paints the content you actually see. A library like requests or a simple curl does none of this; it just downloads the initial response. So you get the skeleton: a root div, some script tags, and none of the data.

# raw fetch of a JS-rendered page: almost nothing useful
curl https://app.example.com/products
# <div id="root"></div> ... content loads later via JS

How headless browsers fill in the content

To scrape a JavaScript page you need to render it the way a real browser would. A headless browser, such as Chromium driven by Playwright or Puppeteer, loads the page, executes the scripts, waits for the data to arrive, and then exposes the finished DOM. From there you can read the rendered HTML and extract content that simply did not exist in the raw response. The catch is that running headless browsers at scale is real work: memory pressure, crashes, concurrency limits and timeouts all become your problem.
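The load, execute, wait, read sequence can be sketched with Playwright's synchronous Python API. This is a minimal illustration under stated assumptions, not a production renderer: it assumes the `playwright` package and its Chromium build are installed, and the `.product-card` selector in the usage comment is hypothetical.

```python
def render_html(url, wait_selector=None, timeout_ms=15000):
    """Load `url` in headless Chromium and return the rendered DOM as HTML."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            if wait_selector:
                # Wait for an element that only exists once the data has arrived.
                page.wait_for_selector(wait_selector, timeout=timeout_ms)
            else:
                # Fallback: wait until the network has gone quiet.
                page.wait_for_load_state("networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            # Always release the browser, even when a wait times out.
            browser.close()


# Usage (hypothetical selector):
# html = render_html("https://app.example.com/products", wait_selector=".product-card")
```

Waiting on a specific element is usually the more dependable signal; the network-idle fallback is there for pages where no single element marks completion.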

The hard parts of doing it yourself

Knowing when to read. Read too early and the content is still loading; too late and you waste time. You need explicit waits, either on network idle or on a specific element that signals the data has arrived.
Lazy loading and infinite scroll. Some content only appears after scrolling. Capturing it means scripting realistic interaction.
Resource cost. Each rendered page spins up a browser context. Hundreds of concurrent renders demand serious infrastructure.
Stability. Browsers leak memory and crash. A production fleet needs supervision and restarts.
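The resource and stability concerns above come down to two controls: a cap on how many renders run at once and a timeout on each one. A minimal sketch using only the standard library follows; `render` stands for any coroutine that renders one URL, and the limits shown are illustrative assumptions rather than recommended values.

```python
import asyncio


async def render_all(urls, render, max_concurrent=5, timeout_s=30.0):
    """Run `render(url)` for every URL with bounded concurrency and a per-page timeout.

    Returns {url: result}. A failed render is stored as its exception instance
    so one bad page does not abort the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    results = {}

    async def worker(url):
        async with semaphore:  # at most max_concurrent renders in flight
            try:
                results[url] = await asyncio.wait_for(render(url), timeout_s)
            except Exception as exc:  # timeouts and render crashes land here
                results[url] = exc

    await asyncio.gather(*(worker(u) for u in urls))
    return results
```

Recording failures per URL makes it straightforward to retry only the pages that timed out or crashed.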

Render with one API call instead

A managed web scraping API runs the headless browser for you and returns fully rendered content as clean markdown or JSON. You ask it to render, it waits for the page to settle, strips the boilerplate, and hands back the data, with no fleet to operate. This keeps your pipeline simple: one request in, clean content out.

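# render the page, wait for content, return clean markdown
# (-d alone sends a form-encoded Content-Type, so declare the JSON body explicitly)
curl https://api.clawengine.ai/v1/extract \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://app.example.com/products","render":true,"format":"markdown"}'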
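The same call can be made from Python with the standard library. The endpoint and the `url`, `render` and `format` fields are taken from the curl example; the JSON `Content-Type` header and the assumption that the response body is returned as text are mine, so treat this as a sketch rather than the documented client.

```python
import json
import urllib.request

API_URL = "https://api.clawengine.ai/v1/extract"  # endpoint from the curl example


def build_extract_request(page_url, api_key, render=True, fmt="markdown"):
    """Build the POST request without sending it."""
    payload = {"url": page_url, "render": render, "format": fmt}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: the API expects JSON
        },
        method="POST",
    )


def extract(page_url, api_key, timeout_s=60):
    """Send the request and return the raw response body as text."""
    request = build_extract_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes the payload easy to inspect and test without touching the network.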

Render only when you need to

Rendering is more expensive than a plain fetch, so use it deliberately. Many pages are server-rendered and need no browser at all; a quick check of the raw HTML tells you whether the content is already there. Reserve full rendering for genuine single-page apps and content that depends on client-side data. A good API lets you toggle rendering per request so you pay for it only where it earns its keep.
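One way to run that quick check is to measure how much visible text the raw HTML contains once scripts, styles and similar non-content tags are ignored. The sketch below uses only the standard library; the 200-character threshold is an arbitrary assumption to tune per site, and the function names are illustrative.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that is not inside script, style, noscript, template or title."""

    SKIP = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text_length(html):
    """Return the number of characters of visible text in `html`."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks))


def needs_rendering(html, min_chars=200):
    """Heuristic: very little visible text suggests an empty client-side shell."""
    return visible_text_length(html) < min_chars
```

A page that fails this check on a plain fetch is a candidate for rendering; one that passes can usually stay on the cheaper path.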

Stay polite and in bounds while rendering

Rendering does not change the rules. Crawl public and permitted pages only, respect robots.txt and Terms of Service, and honor crawl-delay so you do not overload a host. Rendering is a technique for reading content that a browser would legitimately display to any visitor, not a means to reach anything gated behind authentication or access controls. Keep it lawful and considerate.
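The robots.txt and crawl-delay checks can be sketched with Python's built-in `urllib.robotparser`. The `example-bot` user agent and the one-second default delay in the usage comment are assumptions for illustration. Terms of Service still need a human review that no parser can replace.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_policy(parser, user_agent, url, default_delay_s=1.0):
    """Return (allowed, delay_seconds) for this user agent and URL."""
    allowed = parser.can_fetch(user_agent, url)
    declared = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    delay = float(declared) if declared is not None else default_delay_s
    return allowed, delay


# Usage:
# rules = load_rules(robots_text)
# allowed, delay = fetch_policy(rules, "example-bot", "https://app.example.com/products")
# Skip the page when `allowed` is False; otherwise sleep `delay` seconds between requests.
```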

Get complete pages without the browser fleet

JavaScript rendering is the single biggest reason naive scrapers return empty results, and running headless browsers yourself is a project in its own right. ClawEngine renders pages on demand and returns clean, complete markdown or JSON, on public and permitted data only, so you get the whole page without operating any infrastructure. Read how it works or learn about structured extraction for RAG.
