Getting Started with trueparse
Introduction
trueparse is a modern web scraping API that extracts clean, structured data from any website. Unlike traditional scrapers that break when websites change, trueparse uses AI-powered parsing to reliably extract content in multiple formats: HTML, Markdown, images, and custom JSON schemas.
Scrape
Scrape data from any website and parse it into HTML, Markdown, and images.
Crawl
Crawl a website and extract all links, images, and other assets.
Extract data
Extract custom JSON data with a guaranteed structure, using natural language prompts or schemas.
Schedules
Schedule regular scrapes and crawls and receive data updates via webhooks.
Capabilities
- Dynamic Content Rendering: Fully managed virtual browsers that render dynamic JavaScript content
- Universal Parsing: Effortlessly parse websites, PDFs, images and more into clean structured data
- AI-Powered Data Extraction: Extract JSON data using natural language or custom schemas, with no manual HTML selectors to maintain
- Advanced Stealth Mode: Stay undetected with anti-bot evasion and automatic proxy management
- Parallel Web Crawling: Crawl entire websites and scrape multiple pages in parallel for maximum performance
- Schedules & Webhooks: Automate your scraping tasks with schedules and deliver data directly via webhooks
Getting Started
1. Get Your API Key
First, you'll need an API key to authenticate your requests:
- Sign up for a trueparse account
- Navigate to the API Keys page in your dashboard
- Click "Create New Key" and give it a descriptive name
- Copy your API key and keep it secure; don't share it with anyone
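One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named `TRUEPARSE_API_KEY`; that name and the helper functions `load_api_key` and `auth_headers` are illustrative choices, not part of trueparse. Only the `Authorization: Bearer` header format comes from the examples on this page.

```python
import os

# TRUEPARSE_API_KEY is a hypothetical variable name chosen for this sketch.
ENV_VAR = "TRUEPARSE_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {ENV_VAR} environment variable first")
    return key


def auth_headers(api_key):
    """Build the request headers used in the examples on this page."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

With the variable exported in your shell, `auth_headers(load_api_key())` yields the headers without the key ever appearing in a committed file.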
2. Test in the Playground
Before writing code, try the API playground to:
- Test different websites and see the extracted data
- Experiment with output formats (HTML, Markdown, images)
- Generate code snippets for your preferred language
- Test data extraction with prompts or custom schemas
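When experimenting with custom schemas, it can help to write down the fields you expect and check results against them on your side. This page does not show how a schema is passed to trueparse, so the sketch below covers only a client-side check: a JSON Schema-style description and a small function that reports missing or mistyped fields. The product fields and the names `PRODUCT_SCHEMA` and `schema_problems` are made up for illustration, and the check handles only a few basic types.

```python
# A JSON Schema-style description of the fields we want from a product page.
# The field names are invented for this example.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["title", "price"],
}

# Map JSON Schema type names to the Python types that json.loads produces.
_PY_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
}


def schema_problems(record, schema):
    """Return a list of human-readable problems; empty means the record fits."""
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in record:
            continue
        expected = _PY_TYPES.get(spec.get("type"))
        if expected is None:
            continue  # Type not covered by this minimal check
        value = record[name]
        # bool is a subclass of int in Python, so reject it for "number".
        if spec["type"] == "number" and isinstance(value, bool):
            problems.append(f"wrong type for {name}: expected number")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems
```

For example, `schema_problems({"title": "Mug"}, PRODUCT_SCHEMA)` reports that `price` is missing. A full validator such as the `jsonschema` package would cover far more of the standard than this sketch.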
Your First API Call
Here's how to make your first request to extract content from a webpage. Replace <API_KEY> with your actual API key from the dashboard.
# cURL
curl -X POST https://api.trueparse.com/v0/scrape \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "outputs": ["markdown", "html"]
  }'
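# Python (requires the requests package)
import requests

url = "https://api.trueparse.com/v0/scrape"
headers = {
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json",
}
data = {
    "url": "https://example.com",
    "outputs": ["markdown", "html"],
}

# Send the scrape request; json= serializes the body as JSON.
response = requests.post(url, headers=headers, json=data, timeout=30)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
result = response.json()
print(result)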
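// JavaScript (fetch)
const url = "https://api.trueparse.com/v0/scrape";
const data = {
  url: "https://example.com",
  outputs: ["markdown", "html"]
};

fetch(url, {
  method: "POST",
  headers: {
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json"
  },
  body: JSON.stringify(data)
})
  .then(response => {
    // fetch does not reject on HTTP errors, so check the status explicitly.
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  })
  .then(result => console.log(result))
  .catch(error => console.error(error));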
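Once the first call works, the request above can be wrapped in a small reusable helper. The following is a minimal sketch using only the Python standard library, so it runs without installing `requests`. The endpoint, headers, and body fields are taken from the examples on this page. The function names, the 30-second timeout, and the choice to raise `RuntimeError` on HTTP errors are assumptions made for illustration. The shape of the response body is not documented here, so the helper simply returns the decoded JSON.

```python
import json
import urllib.error
import urllib.request

SCRAPE_URL = "https://api.trueparse.com/v0/scrape"


def build_scrape_request(api_key, page_url, outputs=("markdown", "html")):
    """Build (but do not send) the POST request from the examples above."""
    body = json.dumps({"url": page_url, "outputs": list(outputs)}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(api_key, page_url, outputs=("markdown", "html"), timeout=30):
    """Send the request and return the decoded JSON body.

    Raises RuntimeError with the status code on an HTTP error response.
    The timeout value is an arbitrary choice for this sketch.
    """
    request = build_scrape_request(api_key, page_url, outputs)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Scrape failed with HTTP {err.code}: {detail}") from err
```

Calling `scrape("<API_KEY>", "https://example.com")` sends the same request as the snippets above. Splitting request construction from sending keeps the first half easy to inspect or unit-test without network access.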
Next Steps
Ready to dive deeper? Check out these guides:
- Scraping - Scrape and extract data
- Crawling - Crawl entire websites
- Scheduling - Set up automated, recurring scrapes