Getting Started
Tadpole is a lightweight DSL and scraper engine. The DSL is powered by KDL, and the CLI is the primary way to execute your .kdl scripts.
Tadpole aims to simplify web scraping and automation through:
- Abstraction: Simulating realistic human behavior (bezier curves, easing) through high-level composed actions.
- Zero Config: Import and share scraper modules directly via Git, bypassing npm/registry overhead.
- Reusability: Actions and evaluators can be composed through slots to create more complex workflows.
Requirements
- Node.js
- A modern version of Chrome or Chromium (Tadpole uses CDP for browser automation)
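A quick way to check both requirements from a terminal is sketched below. This is not part of Tadpole: the helper `first_available` is a hypothetical name, and the Chrome/Chromium binary names are common Linux defaults assumed for illustration. On macOS and Windows the browser is usually not on PATH, so a "not found" message there does not necessarily mean it is missing.

```shell
#!/bin/sh
# Print the first command name from the arguments that exists on PATH.
# Returns non-zero (and prints nothing) if none is found.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Node.js is listed as a requirement.
if node_bin=$(first_available node); then
  echo "node: $("$node_bin" --version)"
else
  echo "node: not found on PATH"
fi

# Assumed binary names (not defined by Tadpole); adjust for your platform.
if chrome_bin=$(first_available google-chrome google-chrome-stable chromium chromium-browser); then
  echo "chrome: $chrome_bin"
else
  echo "chrome: no known binary name found on PATH"
fi
```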
Installation
You can install the Tadpole CLI globally using your preferred package manager:
    pnpm add -g @tadpolehq/cli
    npm install -g @tadpolehq/cli
    yarn global add @tadpolehq/cli
Your First Script
Create a file named hello.kdl. We’ll use a simple script to grab the “Article of the Day” from Wikipedia.
    main {
      new_page {
        goto "https://en.wikipedia.org"
        extract data {
          article {
            $ "#mp-tfa"
            text
          }
        }
      }
    }
Understanding the Script
- main: The execution root of the script.
- new_page: Creates a new browser tab and initiates a unique CDP session.
- goto: Navigates to a URL. It automatically waits for the load event before proceeding.
- extract: Transforms page content into a JSON object. By default, it starts at the document level.
- $: Scopes the extraction to a specific CSS selector (in this case, #mp-tfa).
- text: Pulls the innerText from the currently selected node and assigns it to the property name (in this case, article).
Running
    tadpole run hello.kdl --auto --headless
The Result
    {
      "data": {
        "article": "Opifex fuscus is a species of mosquito that is endemic..."
      }
    }
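The shape of the result mirrors the script: the name given to extract (data) becomes the outer key, and the article property sits inside it. The sketch below reads that field back with Node.js, which is already a requirement. It assumes the result has been saved to a file; `result.json` and the helper `json_field` are hypothetical names for illustration, and how Tadpole itself emits its output is not covered here.

```shell
#!/bin/sh
# Hypothetical setup: store the sample result in a file so it can be read back.
cat > result.json <<'EOF'
{ "data": { "article": "Opifex fuscus is a species of mosquito that is endemic..." } }
EOF

# Print a nested field from a JSON file using Node.js.
# $1 = file path, $2 = dot-separated key path (for example data.article)
json_field() {
  node -e '
    const fs = require("fs");
    const [file, path] = process.argv.slice(1);
    let value = JSON.parse(fs.readFileSync(file, "utf8"));
    for (const key of path.split(".")) value = value[key];
    console.log(value);
  ' "$1" "$2"
}

# Walks data -> article and prints the extracted text.
json_field result.json data.article
```

Using a key path keeps the helper independent of any one script, so renaming the extract target or the property only changes the second argument.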