How it works
How does a web scraping API work?
Send a URL, get back clean data. Under the hood, ClawEngine runs four steps in one API call: crawl the page, render the JavaScript, extract the fields you want, and return LLM-ready markdown or JSON. No proxies to rotate, no headless browsers to babysit.
robots.txt respected · public data only
The pipeline
From a URL to LLM-ready data in four steps
One request flows through the whole crawl. Here is exactly what happens between the API call and the clean data that comes back.
Fetch the page
You send a URL or a domain. ClawEngine fetches the page and, for a crawl, follows the links you allow. Before fetching, it reads the site's robots.txt, skips disallowed paths and honors any Crawl-delay directive. It also respects the site's Terms of Service throughout the crawl.
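To make the robots.txt step concrete, here is a minimal sketch of how a polite crawler can apply Disallow rules and a Crawl-delay using Python's standard `urllib.robotparser`. It is not ClawEngine's implementation. The robots.txt content, the `ExampleBot` user agent and the helper names `build_policy` and `plan_fetches` are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetches(policy: RobotFileParser, user_agent: str,
                 urls: list[str]) -> list[tuple[str, float]]:
    """Return (url, delay_seconds) pairs for the URLs robots.txt allows."""
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = policy.crawl_delay(user_agent) or 0
    return [(url, float(delay)) for url in urls if policy.can_fetch(user_agent, url)]
```

With this policy, `/private/` URLs are dropped from the plan, and every remaining fetch carries the 2-second delay the site asked for.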
Run the JavaScript
The page loads in a managed headless browser, so client-side scripts run and dynamic tables and infinite-scroll content load and settle. What you extract is the fully rendered page, not an empty shell.
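Letting infinite scroll "settle" usually means scrolling until the page stops growing. The sketch below isolates that loop. `scroll_until_settled` and the `FakeFeed` stand-in are hypothetical. In a real headless browser, the callbacks would wrap calls such as Playwright's `page.evaluate("document.body.scrollHeight")`, and ClawEngine's own settling logic may differ.

```python
from typing import Callable


def scroll_until_settled(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    wait: Callable[[], None],
    stable_rounds: int = 2,
    max_rounds: int = 50,
) -> int:
    """Scroll until the page height is unchanged for `stable_rounds` checks in a row."""
    last_height = get_height()
    unchanged = 0
    for _ in range(max_rounds):  # hard cap so an endless feed still terminates
        scroll_to_bottom()
        wait()  # give client-side scripts time to fetch and render the next batch
        height = get_height()
        if height == last_height:
            unchanged += 1
            if unchanged >= stable_rounds:
                break
        else:
            unchanged = 0
            last_height = height
    return last_height


class FakeFeed:
    """Hypothetical stand-in for a page: each scroll loads one batch until none remain."""

    def __init__(self, batches: int) -> None:
        self.height = 1000
        self.batches = batches

    def get_height(self) -> int:
        return self.height

    def scroll_to_bottom(self) -> None:
        if self.batches > 0:
            self.batches -= 1
            self.height += 500


feed = FakeFeed(batches=3)
final_height = scroll_until_settled(feed.get_height, feed.scroll_to_bottom, wait=lambda: None)
```

Requiring several unchanged checks in a row, rather than just one, gives slow network requests a chance to land before the page is treated as fully loaded.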
Pull content and fields
The engine strips navigation, ads and footers, then extracts the main content plus any fields you defined with a schema. Selectors stay on our side, so brittle parsing is not your problem.
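The boilerplate-stripping idea can be sketched with Python's standard `html.parser`. This is a deliberately simple heuristic, not ClawEngine's extractor. It drops text inside structural chrome tags (`nav`, `header`, `footer`, `aside`) and inside `script` and `style`. Real engines also use class, id and layout signals to catch ads, and this sketch omits those. `MainTextExtractor` and `extract_main_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Tags whose contents this sketch treats as page chrome rather than main content.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, skipping anything nested inside boilerplate tags."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a boilerplate element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```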
Return LLM-ready data
You get back clean markdown, structured JSON with title, links and metadata, or schema-typed records. Every response has the same shape, ready to chunk, embed and feed to a RAG pipeline or an agent.
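As one illustration of the "ready to chunk" step on the client side, here is a minimal sketch that splits returned markdown at headings and caps each chunk at a character budget before embedding. The function name and the 1,000-character default are assumptions, not part of the API. The sketch also does not recognize fenced code blocks, so a `# comment` line inside one would start a new chunk.

```python
import re

# ATX headings: 1 to 6 '#' characters followed by a space or tab (CommonMark style).
HEADING = re.compile(r"#{1,6}[ \t]")


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split markdown into heading-led sections, then cap each piece at max_chars."""
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A heading closes the previous section and starts a new one.
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks: list[str] = []
    for section in sections:
        # Hard-split oversized sections; an empty section produces no chunks.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```

Splitting on headings first keeps each chunk topically coherent. The hard character cap is a crude fallback, and a token-based limit would match embedding models more closely.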
One call
All four steps in a single request
You do not have to chain a fetcher, a renderer and a parser yourself. One POST to the API runs the whole crawl and returns clean data, with a compliance line on every result.
- Send a URL and an output format: markdown, JSON or schema-typed fields
- Crawling, proxy handling and headless rendering are fully managed
- Boilerplate is stripped, so the response is ready to embed
- Define a schema once and get typed records back from every page
- robots.txt respected, public data only, on every response
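To show what "schema-typed" means in practice, here is a hedged sketch of coercing raw extracted strings into the types a schema declares. The schema format (a mapping from field name to Python type), `PRODUCT_SCHEMA` and `to_typed_record` are hypothetical. ClawEngine's actual schema syntax is not documented on this page.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
PRODUCT_SCHEMA: dict[str, type] = {"name": str, "price": float, "in_stock": bool}


def to_typed_record(raw: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Coerce raw extracted values into the types the schema declares."""
    record: dict[str, Any] = {}
    for field, expected in schema.items():
        if field not in raw:
            raise KeyError(f"missing field: {field}")
        value = raw[field]
        if expected is bool and isinstance(value, str):
            # bool("false") is True in Python, so parse common spellings explicitly.
            value = value.strip().lower() in {"true", "yes", "1", "in stock"}
        else:
            value = expected(value)
        record[field] = value
    return record
```

Declaring the schema once means every page yields records with identical keys and types, and a missing field fails loudly instead of silently producing a partial record.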
```bash
# one call does crawl, render and extract
curl https://api.clawengine.ai/v1/extract \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"example.com/docs","format":"markdown"}'
```

The response:

```json
{
  "title": "Quickstart",
  "markdown": "# Quickstart\n\nInstall...",
  "links": ["/api", "/sdks"],
  "rendered": true
}
```
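For Python users, the curl call above can be reproduced with the standard library alone. This is a hedged illustration, not an official ClawEngine SDK. It assumes the endpoint, headers and body shown in the curl example, and it assumes the response carries the four fields shown there. `build_extract_request`, `parse_extract_response` and `ExtractResult` are hypothetical names.

```python
import json
import urllib.request
from dataclasses import dataclass

API_URL = "https://api.clawengine.ai/v1/extract"  # endpoint from the curl example


def build_extract_request(api_key: str, url: str, fmt: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) the same POST the curl example makes."""
    body = json.dumps({"url": url, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


@dataclass
class ExtractResult:
    # Fields assumed from the example response; the full response schema may differ.
    title: str
    markdown: str
    links: list[str]
    rendered: bool


def parse_extract_response(payload: str) -> ExtractResult:
    data = json.loads(payload)
    return ExtractResult(
        title=data["title"],
        markdown=data["markdown"],
        links=list(data.get("links", [])),
        rendered=bool(data.get("rendered", False)),
    )

# Sending it requires a real key and network access:
# with urllib.request.urlopen(build_extract_request(key, "example.com/docs")) as resp:
#     result = parse_extract_response(resp.read().decode("utf-8"))
```

Keeping request building separate from sending makes the request easy to inspect and test offline.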
Keep reading
Go deeper on the web scraping API
- Structured extraction, JS rendering, crawl at scale, markdown and JSON, SDKs and webhooks.
- Endpoints, authentication and code samples in curl, Python and Node.
- Public, permitted data only, with robots.txt and Terms of Service respected by default.
Send a URL, get clean data back
One API call crawls the page, renders the JavaScript and extracts LLM-ready markdown or JSON. Public, permitted data only.