Getting Started
Tadpole is a lightweight DSL and scraper engine. The DSL is built on KDL, and the CLI is the primary way to execute your .kdl scripts.
Requirements
- Node.js
- A modern version of Chrome or Chromium (Tadpole uses the Chrome DevTools Protocol, CDP, for browser automation)
Installation
You can install the Tadpole CLI globally using your preferred package manager:
    pnpm add -g @tadpolehq/cli
    npm install -g @tadpolehq/cli
    yarn global add @tadpolehq/cli
Your First Script
Create a file named hello.kdl. We’ll use a simple script to grab the “Article of the Day” from Wikipedia.
    main {
      new_page {
        goto "https://en.wikipedia.org"
        extract data {
          article {
            $ "#mp-tfa"
            text
          }
        }
      }
    }
Understanding the Script
- main: The execution root of the script.
- new_page: Creates a new browser tab and initiates a unique CDP session.
- goto: Navigates to a URL. It automatically waits for the load event before proceeding.
- extract: Transforms page content into a JSON object. By default, it starts at the document level.
- $: Scopes the extraction to a specific CSS selector (in this case, #mp-tfa).
- text: Pulls the innerText from the currently selected node and assigns it to the property name (in this case, article).
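As a rough mental model of how extract, $, and text combine into a nested JSON object, the TypeScript sketch below reduces a page to a lookup from CSS selector to text. It is not Tadpole's implementation or API: FakePage, TextField, and this extract function are hypothetical names invented for illustration, and returning null for an unmatched selector is an assumption, not documented behavior.

```typescript
// Hypothetical model only: this is NOT Tadpole's implementation or API.
// A "page" is reduced to a lookup from CSS selector to that element's innerText.
type FakePage = Map<string, string>;

// One extracted property: a name plus the selector it is scoped to.
interface TextField {
  name: string;     // property name in the output, e.g. "article"
  selector: string; // plays the role of `$ "<selector>"` followed by `text`
}

// Builds { [rootName]: { [field.name]: text } }, mirroring the nesting of
// `extract data { article { ... } }` in the script.
function extract(
  page: FakePage,
  rootName: string,
  fields: TextField[],
): Record<string, Record<string, string | null>> {
  const props: Record<string, string | null> = {};
  for (const field of fields) {
    const text = page.get(field.selector);
    // Assumption: an unmatched selector yields null in this sketch.
    props[field.name] = text === undefined ? null : text;
  }
  return { [rootName]: props };
}

const page: FakePage = new Map([["#mp-tfa", "Example article text"]]);
const result = extract(page, "data", [{ name: "article", selector: "#mp-tfa" }]);
```

Here the outer key comes from the name given to extract and the inner key from the property block, which is why the script's output is nested under data.article.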
Running
    tadpole run hello.kdl --auto --headless
The Result
    {
      "data": {
        "article": "Opifex fuscus is a species of mosquito that is endemic..."
      }
    }
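If you consume this result from your own code, it can help to validate its shape before using it. The TypeScript sketch below assumes the JSON text has already been captured into a string. How Tadpole delivers its output is not covered here, and HelloResult and parseHelloResult are hypothetical names used only for this example.

```typescript
// Shape of the result produced by hello.kdl, as shown above.
interface HelloResult {
  data: { article: string };
}

// Parses raw JSON text and checks the expected shape before returning it.
function parseHelloResult(raw: string): HelloResult {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const data = (parsed as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) {
    throw new Error('missing "data" object');
  }
  const article = (data as { article?: unknown }).article;
  if (typeof article !== "string") {
    throw new Error('missing "data.article" string');
  }
  return { data: { article } };
}

// Placeholder input standing in for captured Tadpole output.
const raw = '{ "data": { "article": "Example article text" } }';
const hello = parseHelloResult(raw);
```

Validating up front keeps a changed selector or an empty page from surfacing later as an undefined value deep in your pipeline.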