API reference

Reference documentation for the ClawEngine web scraping API

A single REST API to fetch a page or crawl a whole site, render its JavaScript, and extract clean markdown or typed JSON. Authenticate with a bearer key, call one of three endpoints, and get LLM-ready data back. This page covers everything you need to make your first request.


Introduction

The ClawEngine API is organized around predictable, resource-oriented REST. Every request is sent over HTTPS, accepts and returns JSON, and authenticates with a bearer key. The base URL for all endpoints is:

https://api.clawengine.ai/v1

Authentication

Authenticate every request by passing your secret key as a bearer token in the Authorization header. Keep the key secret: store it on your server and never ship it in client-side code. You get a key when you create an account.

Authorization header

Authorization: Bearer $CLAWENGINE_API_KEY
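The sketch below shows one way to keep the key server-side. It assumes the key lives in the `CLAWENGINE_API_KEY` environment variable used throughout this page. The helper name `auth_headers` is hypothetical. It only builds the request headers and fails early if the key is missing, so no request goes out without credentials.

```python
import os
from typing import Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers for the ClawEngine API (hypothetical helper).

    Reads the secret key from CLAWENGINE_API_KEY in the process environment,
    or from `env` if given (handy for tests), and raises if it is missing.
    """
    source = os.environ if env is None else env
    key = source.get("CLAWENGINE_API_KEY")
    if not key:
        raise RuntimeError("CLAWENGINE_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


# Usage with an explicit mapping and a placeholder key instead of the real environment.
print(auth_headers({"CLAWENGINE_API_KEY": "placeholder-key"})["Authorization"])
```

Reading the key from the environment keeps it out of source control and out of any code that could end up in a browser bundle.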

ClawEngine works only on public, permitted data, and respects robots.txt and each site's Terms of Service on every request.

POST /v1/extract

Extract a single URL. ClawEngine fetches the page, renders its JavaScript, strips boilerplate, and returns clean markdown, structured JSON, or fields typed to a schema you provide.

Request

curl https://api.clawengine.ai/v1/extract \
  -H "Authorization: Bearer $CLAWENGINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products/atlas",
    "format": "json"
  }'

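import os

import requests

API_KEY = os.environ["CLAWENGINE_API_KEY"]  # keep the key server-side

r = requests.post(
    "https://api.clawengine.ai/v1/extract",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://example.com/products/atlas",
          "format": "json"},  # json= also sets Content-Type: application/json
    timeout=60,
)
r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = r.json()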

const res = await fetch("https://api.clawengine.ai/v1/extract", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CLAWENGINE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com/products/atlas", format: "json" }),
});
const data = await res.json();

Response · 200 OK

{
  "url": "https://example.com/products/atlas",
  "title": "Atlas Field Notebook",
  "markdown": "# Atlas Field Notebook\n\nDurable...",
  "data": { "name": "Atlas Field Notebook", "price": 24.00 },
  "links": [ "/products", "/cart" ],
  "metadata": { "rendered": true, "robotsTxt": "respected" }
}
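The sketch below reads the fields from the sample response into a typed object. `ExtractResult` and `parse_extract` are hypothetical names. The field names come from the sample above. The code treats every field except `url` as optional, which is an assumption: this reference does not say which fields are always present.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExtractResult:
    """Typed view of a POST /v1/extract response body (hypothetical class)."""
    url: str
    title: Optional[str] = None
    markdown: Optional[str] = None
    data: Dict[str, Any] = field(default_factory=dict)
    links: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_extract(body: Dict[str, Any]) -> ExtractResult:
    # `url` is required here; missing optional fields fall back to empty values.
    return ExtractResult(
        url=body["url"],
        title=body.get("title"),
        markdown=body.get("markdown"),
        data=body.get("data") or {},
        links=body.get("links") or [],
        metadata=body.get("metadata") or {},
    )


sample = {
    "url": "https://example.com/products/atlas",
    "title": "Atlas Field Notebook",
    "markdown": "# Atlas Field Notebook\n\nDurable...",
    "data": {"name": "Atlas Field Notebook", "price": 24.00},
    "links": ["/products", "/cart"],
    "metadata": {"rendered": True, "robotsTxt": "respected"},
}
result = parse_extract(sample)
print(result.data["price"], result.metadata["rendered"])  # 24.0 True
```

Because missing fields fall back to empty values, the same parser also accepts leaner page objects that carry only `url`, `title` and `markdown`.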

Body parameters

Field	Type	Description
url	string	The public page to extract. Required.
format	string	One of `markdown`, `json` or `structured`. Defaults to `markdown`.
schema	object	Optional. A field-to-type map for structured extraction.
render	boolean	Render JavaScript before extraction. Defaults to `true`.
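The parameters above map onto a request body as sketched below. `build_extract_body` is a hypothetical helper. It checks `format` against the three documented values. It sends `format` and `render` only when they differ from their documented defaults. The schema type names (`"string"`, `"number"`) are assumptions, since this reference does not list the accepted types.

```python
from typing import Any, Dict, Optional

FORMATS = {"markdown", "json", "structured"}  # documented values for `format`


def build_extract_body(
    url: str,
    fmt: str = "markdown",
    schema: Optional[Dict[str, str]] = None,
    render: bool = True,
) -> Dict[str, Any]:
    """Build a POST /v1/extract body from the documented parameters (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    body: Dict[str, Any] = {"url": url}
    # Omit fields that match the documented defaults (markdown, render=true).
    if fmt != "markdown":
        body["format"] = fmt
    if schema is not None:
        body["schema"] = schema
    if not render:
        body["render"] = False
    return body


# Structured extraction with a field-to-type map (the type names are assumed).
print(build_extract_body(
    "https://example.com/products/atlas",
    fmt="structured",
    schema={"name": "string", "price": "number"},
))
```

Rejecting an unknown `format` on the client gives a clearer error than waiting for the API to reject the request.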

POST /v1/crawl

Crawl a whole site. ClawEngine starts at the URL you give, follows the links you allow, and extracts every page. Crawls run asynchronously: this endpoint returns a crawl id you poll for results, or you can register a webhook to receive pages as they complete.

Request

curl https://api.clawengine.ai/v1/crawl \
  -H "Authorization: Bearer $CLAWENGINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/docs",
    "limit": 500,
    "format": "markdown",
    "webhook": "https://yourapp.com/hooks/clawengine"
  }'

Response · 202 Accepted

{
  "id": "crawl_8f2a1c9e",
  "status": "queued",
  "url": "https://example.com/docs"
}
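Below is the same crawl request from Python, sketched in the style of the extract example. `start_crawl` is a hypothetical helper that returns the crawl `id` from the 202 response, ready for polling `GET /v1/crawl/{id}`. The session is passed in rather than created inside, so a test double can stand in for the network.

```python
from typing import Any, Dict, Optional

API_BASE = "https://api.clawengine.ai/v1"


def start_crawl(
    session: Any,
    url: str,
    limit: int,
    fmt: str = "markdown",
    webhook: Optional[str] = None,
) -> str:
    """Start an asynchronous crawl and return its id (hypothetical helper).

    `session` is expected to be a requests.Session (or anything with a
    compatible .post()) whose headers already carry the bearer key.
    """
    payload: Dict[str, Any] = {"url": url, "limit": limit, "format": fmt}
    if webhook:
        payload["webhook"] = webhook  # completed pages are then posted to this URL
    resp = session.post(f"{API_BASE}/crawl", json=payload, timeout=30)
    resp.raise_for_status()  # the documented success status is 202 Accepted
    return resp.json()["id"]
```

In practice you would create a `requests.Session()` and set `session.headers["Authorization"] = f"Bearer {key}"`. Then call `start_crawl(session, "https://example.com/docs", limit=500, webhook="https://yourapp.com/hooks/clawengine")`.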

GET /v1/crawl/{id}

Check the status of a crawl and collect its pages. Poll this endpoint with the `id` returned by the crawl request until `status` is `completed`. Each page comes back in the same shape as a single extract.

Request

curl https://api.clawengine.ai/v1/crawl/crawl_8f2a1c9e \
  -H "Authorization: Bearer $CLAWENGINE_API_KEY"

Response · 200 OK

{
  "id": "crawl_8f2a1c9e",
  "status": "completed",
  "total": 128,
  "pages": [
    { "url": "https://example.com/docs/quickstart",
      "title": "Quickstart",
      "markdown": "# Quickstart..." }
  ]
}
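Here is a polling loop for this endpoint, sketched under several assumptions. `wait_for_crawl` and its interval and timeout values are hypothetical. `fetch` is any callable that performs `GET /v1/crawl/{id}` and returns the parsed JSON. `"failed"` is an assumed terminal status, since this reference only shows `queued` and `completed`. Injecting `fetch`, `sleep` and `clock` keeps the loop testable without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    fetch: Callable[[str], Dict[str, Any]],
    crawl_id: str,
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll a crawl until its status is "completed" (hypothetical helper).

    The wait between polls doubles up to `max_interval`, so long crawls
    are not polled aggressively, and the loop gives up after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        body = fetch(crawl_id)
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":  # assumed terminal status, not documented here
            raise RuntimeError(f"crawl {crawl_id} failed")
        if clock() + interval > deadline:
            raise TimeoutError(f"crawl {crawl_id} still {status!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

With `requests`, `fetch` could be `lambda cid: session.get(f"https://api.clawengine.ai/v1/crawl/{cid}", timeout=30).json()`. If you registered a webhook on the crawl, you can skip polling and process pages as they arrive.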

SDKs

Official Python and Node SDKs wrap every endpoint with typed responses, retries and pagination, so you can extract a page or crawl a site in a couple of lines. The REST API is always available if you prefer raw HTTP from any language.

pip install clawengine
npm i clawengine

Webhooks

Register a webhook URL on a crawl and ClawEngine posts pages to your endpoint as they complete, so you never hold a long connection open. Each delivery is signed, so you can verify it came from us before you ingest the data.

POST /hooks/clawengine
X-Clawengine-Signature: t=...,v1=...
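Below is a verification sketch. This reference shows only the header shape. Everything else is an assumption modeled on the common `t=timestamp,v1=signature` convention:

- `v1` is a hex HMAC-SHA256 of `"{t}.{raw_body}"`, keyed with a per-webhook signing secret.
- Deliveries older than a few minutes should be rejected.

Confirm the exact scheme before relying on this. `verify_signature` is a hypothetical name.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature(
    header: str,
    raw_body: bytes,
    secret: bytes,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Check an X-Clawengine-Signature header (assumed HMAC-SHA256 scheme)."""
    try:
        parts = dict(item.strip().split("=", 1) for item in header.split(","))
        t_raw = parts["t"]
        timestamp = int(t_raw)
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance:
        return False  # stale or future-dated: possible replay
    signed_payload = f"{t_raw}.".encode() + raw_body
    expected = hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Compute the HMAC over the raw request bytes before any JSON parsing. Re-serializing the payload can change whitespace or key order and break the signature.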

Compliance by default. Every request reads robots.txt first, honors crawl-delay, and respects site Terms of Service. ClawEngine is for public, permitted data only. You are responsible for the URLs you crawl. Read the compliance policy.
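ClawEngine performs these checks itself. Because you are responsible for the URLs you submit, you may still want to pre-screen them. The sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt content and the `MyBot` user agent are placeholders. Parsing a local string avoids a network fetch; against a live site you would call `set_url()` and `read()` instead.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content; normally fetched from the site's /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ["https://example.com/docs/quickstart", "https://example.com/private/admin"]:
    # True if the rules allow this user agent to fetch the URL.
    print(url, parser.can_fetch("MyBot", url))

# Seconds to wait between requests, or None if the site sets no Crawl-delay.
print(parser.crawl_delay("MyBot"))
```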

Make your first API call

Get a key, send a URL, and get LLM-ready markdown or JSON back from one request. Public, permitted data only.
