Getting Started

Tadpole is a lightweight DSL and scraper engine. The DSL is built on KDL, and the CLI is the primary way to execute your .kdl scripts.

Why?

Tadpole aims to simplify web scraping and automation through:

Abstraction: High-level composed actions simulate realistic human behavior (Bézier curves, easing).
Zero Config: Import and share scraper modules directly via Git, bypassing npm/registry overhead.
Reusability: Actions and evaluators can be composed through slots to create more complex workflows.

You can install the Tadpole CLI globally using your preferred package manager:

pnpm add -g @tadpolehq/cli

npm install -g @tadpolehq/cli

yarn global add @tadpolehq/cli

Create a file named hello.kdl. We’ll use a simple script to grab “Today’s featured article” from the Wikipedia main page.

main {
  new_page {
    goto "https://en.wikipedia.org"
    extract data {
      article {
        $ "#mp-tfa"
        text
      }
    }
  }
}

Here is what each node in the script does:

main: The execution root of the script.
new_page: Creates a new browser tab and opens a dedicated Chrome DevTools Protocol (CDP) session for it.
goto: Navigates to a URL. It automatically waits for the load event before proceeding.
extract: Transforms page content into a JSON object. By default, it starts at the document level.
$: Scopes the extraction to a specific CSS selector (in this case, #mp-tfa).
text: Pulls the innerText from the currently selected node and assigns it to the property name (in this case, article).
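
As a hedged sketch of how these nodes compose, the script below adds a second property to the same extract block. It uses only the nodes described above. That extract accepts more than one property block is an assumption, not something confirmed here, and on_this_day is a hypothetical property name. The #mp-otd selector targets the “On this day” box on the Wikipedia main page.

```kdl
main {
  new_page {
    goto "https://en.wikipedia.org"
    extract data {
      // Same property as in hello.kdl
      article {
        $ "#mp-tfa"
        text
      }
      // Assumption: a sibling property block adds a second key.
      // "on_this_day" is a hypothetical name chosen for illustration.
      on_this_day {
        $ "#mp-otd"
        text
      }
    }
  }
}
```

If the assumption holds, the resulting data object would carry both an article key and an on_this_day key.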

Run the script with the CLI:

tadpole run hello.kdl --auto --headless

Example output:

{
  "data": {
    "article": "Opifex fuscus is a species of mosquito that is endemic..."
  }
}