Best Web Scraping APIs (2026)

APIScout Team

Scraping in 2026: Harder Than It Looks

A basic HTTP GET request returns the full content as raw HTML for most pages. For the rest (those built on React, Angular, or Vue), JavaScript must execute before any content appears. Anti-bot services (Cloudflare, DataDome, hCaptcha) block requests that look automated. IP reputation filters block data center IP ranges, and per-IP rate limits throttle whatever gets through. Rotating user agents is table stakes; modern bot detection fingerprints Canvas, WebGL, timing, and behavioral signals.

Web scraping APIs solve the infrastructure problem: they maintain residential proxy pools, run headless browsers at scale, handle CAPTCHA solving, and rotate fingerprints — you send a URL, they return the rendered HTML or structured data.

In 2026, four platforms serve different points in the scraping stack: ScrapingBee (simple API for developers), Bright Data (enterprise proxy and scraping infrastructure), Apify (actor-based scraping platform with pre-built scrapers), and Oxylabs (data center and residential proxies with a Web Scraper API).

TL;DR

ScrapingBee is the simplest entry point — one API call returns rendered HTML, no proxy management required, starting at $49/month. Bright Data has the most capable and expensive infrastructure — the largest residential proxy network, a scraping browser, and unlocker service for heavily protected sites. Apify is the right choice when you need pre-built scrapers for specific sites (Amazon, Google Maps, LinkedIn) or want to build and host custom scrapers. Oxylabs offers competitive proxy infrastructure and a structured Web Scraper API for SERP, e-commerce, and real estate data.

Key Takeaways

  • ScrapingBee charges $49/month for 150K API credits (1 basic request = 1 credit, 1 JS-rendered request = 5 credits, 1 premium proxy request = 25 credits).
  • Bright Data's Web Unlocker starts at $1.50/1,000 requests — the highest-reliability unlocker for heavily protected sites (Amazon, LinkedIn, Zillow).
  • Apify's free tier includes $5/month in compute credits, with paid plans starting at $49/month (25K compute units/month).
  • Oxylabs Web Scraper API starts at $49 for 17,500 results ($2.80/1,000) for SERP and e-commerce structured data extraction.
  • Residential proxies vs. data center proxies: Residential IPs appear to come from real users' ISPs — much harder to block but more expensive. Data center IPs are cheaper but blocked by sophisticated anti-bot systems.
  • Headless browser vs. plain HTTP: Headless browsers (Chrome) render JavaScript before returning HTML — required for SPA pages but 5-10x more expensive per request than plain HTTP.
  • Legal considerations: Web scraping legality varies by jurisdiction and target site's terms of service. Public data scraping is generally permitted; scraping behind login walls or accessing personal data has greater legal risk.

Pricing Comparison

| Platform | Free Tier | Paid Starting | Per 1,000 Requests |
| --- | --- | --- | --- |
| ScrapingBee | 1,000 credits trial | $49/month | ~$0.33-$24.50 (1 to 75 credits per request) |
| Bright Data Web Unlocker | Trial | $1.50/1,000 requests | $1.50 |
| Apify | $5 credits/month | $49/month | Varies by compute time |
| Oxylabs Web Scraper | No | $49/17.5K results | $2.80 |

ScrapingBee

Best for: Developers new to scraping, simple API integration, JavaScript rendering, no proxy management

ScrapingBee abstracts all scraping infrastructure behind a single REST API endpoint. You send a GET request with your target URL and parameters; ScrapingBee handles proxy rotation, browser rendering, and CAPTCHA solving, then returns HTML or JSON.

Pricing

| Plan | Cost | Credits/Month |
| --- | --- | --- |
| Freelance | $49/month | 150,000 |
| Startup | $99/month | 500,000 |
| Business | $249/month | 1,500,000 |
| Enterprise | $599/month | 5,000,000 |

Credit costs per request type:

  • Standard (no JS): 1 credit
  • JavaScript rendering: 5 credits
  • Premium proxies (residential): 25 credits
  • Stealth proxy (high-protection sites): 75 credits
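Because every request type draws from the same credit pool, the effective price per 1,000 requests depends on the mix. The sketch below works through that arithmetic for the Freelance plan, using only the plan price and credit weights quoted above; the function names are illustrative.

```python
# Freelance plan figures quoted above: $49/month for 150,000 credits.
PLAN_PRICE_USD = 49.0
PLAN_CREDITS = 150_000

# Credits consumed per request type, as listed above.
CREDITS_PER_REQUEST = {
    "standard": 1,
    "js_rendering": 5,
    "premium_proxy": 25,
    "stealth_proxy": 75,
}


def cost_per_thousand(credits_per_request: int) -> float:
    """Dollar cost of 1,000 requests at the given credit weight."""
    return PLAN_PRICE_USD / PLAN_CREDITS * credits_per_request * 1000


def requests_per_month(credits_per_request: int) -> int:
    """How many requests of this type the monthly credit pool covers."""
    return PLAN_CREDITS // credits_per_request


for name, credits in CREDITS_PER_REQUEST.items():
    print(
        f"{name:14s} ${cost_per_thousand(credits):6.2f} per 1,000  "
        f"({requests_per_month(credits):,} requests/month)"
    )
```

On these numbers, the same $49 covers 150,000 standard requests but only 2,000 stealth-proxy requests, so it pays to use the cheapest request type that works for each target.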

API Integration

import requests

# Basic HTML scraping
response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": "your-api-key",
        "url": "https://example.com/products",
        "render_js": "false",       # No JavaScript rendering needed
    },
)
html = response.text

# JavaScript-rendered page
response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": "your-api-key",
        "url": "https://react-app.example.com/products",
        "render_js": "true",        # Render JavaScript
        "wait": "2000",             # Wait 2 seconds for JS to execute
        "wait_for": "#product-list",# Wait for element to appear
        "premium_proxy": "false",
    },
)
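Since JavaScript rendering and premium proxies cost more credits, one possible pattern is to try the cheapest request type first and escalate only when the expected content is missing. The following is a minimal sketch of that idea, not a ScrapingBee feature: `fetch_with_escalation`, the `marker` check, and the injected `fetch` callable are all hypothetical names introduced here, and the tier list assumes the credit weights quoted earlier.

```python
from typing import Callable, Optional

# Request tiers ordered from cheapest to most expensive, using the
# parameter names from the examples above.
TIERS = [
    {"render_js": "false", "premium_proxy": "false"},  # 1 credit
    {"render_js": "true", "premium_proxy": "false"},   # 5 credits
    {"render_js": "true", "premium_proxy": "true"},    # 25 credits
]


def fetch_with_escalation(
    url: str,
    marker: str,
    fetch: Callable[[str, dict], str],
) -> Optional[str]:
    """Return the first response body containing `marker`, cheapest tier first.

    `fetch` is a hypothetical callable (url, params) -> HTML text. In practice
    it would wrap requests.get against the scraping API endpoint.
    """
    for params in TIERS:
        html = fetch(url, params)
        if marker in html:
            return html  # stop as soon as the expected content shows up
    return None  # every tier failed to produce the marker
```

Passing `fetch` in as a parameter keeps the escalation logic separate from the HTTP client, which also makes it easy to test with a stub. The `marker` would typically be a string that only appears once the page has rendered, such as the `id` of the product list.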

JavaScript Snippet Execution

# Execute custom JavaScript before capturing the page
response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": "your-api-key",
        "url": "https://example.com/modal-page",
        "render_js": "true",
        # Execute JS to close modals before capture
        "js_snippet": "document.querySelector('.cookie-modal')?.remove(); document.querySelector('.signup-overlay')?.remove();",
    },
)

When to Choose ScrapingBee

Developers building a scraper for the first time who want to skip proxy and browser management, applications that need occasional scraping of public pages (product pricing, news articles, public listings), or teams that want a simple pay-per-credit model without the complexity of proxy pool management.

Bright Data

Best for: Enterprise infrastructure, heavily protected sites, largest residential proxy network

Bright Data operates the largest commercial proxy network — 72M+ residential IPs across 195 countries. The platform serves enterprise data collection teams that need to bypass sophisticated anti-bot measures on the most protected sites on the internet (Amazon, LinkedIn, Zillow, Google).

Products and Pricing

| Product | Use Case | Starting Price |
| --- | --- | --- |
| Web Unlocker | Bypass anti-bot, CAPTCHA solving | $1.50/1,000 requests |
| Scraping Browser | Full browser automation | $1.50-$10/1,000 requests |
| Residential Proxies | IP rotation network | $8.50/GB |
| SERP API | Structured Google results | $1.50/1,000 results |
| Datasets | Pre-collected data | Custom |

Web Unlocker API

import requests

# Web Unlocker — handles anti-bot, CAPTCHA, fingerprinting automatically
proxies = {
    "http": "http://brd-customer-hl_ACCOUNT_ID-zone-unlocker:PASSWORD@brd.superproxy.io:22225",
    "https": "http://brd-customer-hl_ACCOUNT_ID-zone-unlocker:PASSWORD@brd.superproxy.io:22225",
}

# Point any HTTP client at Bright Data's proxy
response = requests.get(
    "https://www.amazon.com/dp/B09XHVJ6Z3",
    proxies=proxies,
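    # verify=False disables TLS certificate checks (the proxy presents its own cert).
    # Safer: pass the path to the provider's CA certificate via verify="..." instead.
    verify=False,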
)

html = response.text
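Hard-coding credentials in the proxy URL is fine for a demo but awkward in real projects. As a sketch, the helper below assembles the same URL shape from separate values and percent-encodes them; the function names and the `BRD_ACCOUNT_ID`, `BRD_ZONE`, and `BRD_PASSWORD` environment variable names are assumptions made for this example, while the host, port, and username format are copied from the snippet above.

```python
import os
from urllib.parse import quote


def build_proxy_url(
    account_id: str,
    zone: str,
    password: str,
    host: str = "brd.superproxy.io",
    port: int = 22225,
) -> str:
    """Assemble a proxy URL in the username format used in the example above."""
    username = f"brd-customer-hl_{account_id}-zone-{zone}"
    # Percent-encode credentials so characters like '@' or ':' in a password
    # cannot break the URL structure.
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"


def proxies_from_env() -> dict:
    """Build a requests-style proxies dict from (hypothetical) env variables."""
    url = build_proxy_url(
        os.environ["BRD_ACCOUNT_ID"],
        os.environ.get("BRD_ZONE", "unlocker"),
        os.environ["BRD_PASSWORD"],
    )
    return {"http": url, "https": url}
```

The resulting dict can be passed as `proxies=proxies_from_env()` to `requests.get`, exactly as in the example above.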

Scraping Browser (Playwright)

import asyncio

from playwright.async_api import async_playwright

# Placeholders: replace with your Bright Data account ID and zone password
ACCOUNT_ID = "ACCOUNT_ID"
PASSWORD = "PASSWORD"


async def main():
    async with async_playwright() as pw:
        # Connect Playwright to Bright Data's scraping browser
        # Bright Data handles fingerprinting, CAPTCHA, and anti-bot
        browser = await pw.chromium.connect_over_cdp(
            f"wss://brd-customer-hl_{ACCOUNT_ID}-zone-scraping_browser:{PASSWORD}@brd.superproxy.io:9222"
        )

        page = await browser.new_page()
        await page.goto("https://www.linkedin.com/jobs/", timeout=60000)

        # Interact with page as normal Playwright code
        jobs = await page.query_selector_all(".job-card-container")
        for job in jobs:
            title = await job.query_selector(".job-card-list__title")
            if title:  # skip cards that have no title element
                print(await title.inner_text())

        await browser.close()


# "async with" is only valid inside a coroutine, so run everything via asyncio
asyncio.run(main())

When to Choose Bright Data

Enterprise data collection teams scraping heavily protected sites (LinkedIn, Amazon, Zillow), organizations that need the highest-quality residential proxy coverage with 72M+ IPs, or teams building production-scale scrapers where reliability on protected sites justifies the premium pricing.

Apify

Best for: Pre-built scrapers for specific sites, actor marketplace, full-stack scraping platform

Apify is not just a proxy service — it's a full-stack web scraping platform. The Apify Store contains hundreds of pre-built "actors" (scrapers) for specific sites: Amazon products, Google Maps reviews, Instagram profiles, YouTube channels, LinkedIn companies, and hundreds more. You can run these actors without writing code, or build and host your own.

Pricing

| Plan | Cost | Compute Units/Month |
| --- | --- | --- |
| Free | $0 | $5 equivalent |
| Starter | $49/month | 25,000 CUs |
| Scale | $149/month | 100,000 CUs |
| Business | $499/month | 400,000 CUs |

Compute units are consumed based on memory × time. Apify defines one compute unit as 1 GB of memory used for one hour, so a 1 GB scraper running for 1 minute consumes 1/60 CU (about 0.017 CUs).

Pre-Built Actors (No-Code)

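# Run Amazon scraper on the Apify platform via the CLI
# (`apify run` only executes the actor in the current local directory;
#  `apify call` starts an actor on the platform)
npx apify-cli login --token YOUR_APIFY_TOKEN
npx apify-cli call apify/amazon-scraper \
  --input='{"search": "bluetooth headphones", "maxResults": 100}'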

# Or via API
curl -X POST "https://api.apify.com/v2/acts/apify~amazon-scraper/runs" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "search": "bluetooth headphones",
    "maxResults": 100
  }'

Custom Actor (Playwright)

// src/main.ts — Apify actor
import { Actor } from "apify";
import { PlaywrightCrawler } from "crawlee";

await Actor.init();

const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request, pushData }) {
    const title = await page.title();

    // Extract data from page
    const products = await page.$$eval(".product-item", (items) =>
      items.map((item) => ({
        name: item.querySelector(".name")?.textContent?.trim(),
        price: item.querySelector(".price")?.textContent?.trim(),
        url: item.querySelector("a")?.href,
      }))
    );

    await pushData({ url: request.url, title, products });
  },
});

await crawler.run(["https://example.com/products"]);

await Actor.exit();

# Deploy your custom actor to Apify cloud
npx apify-cli push

# Run the deployed actor on demand with 1,024 MB of memory
npx apify-cli call --memory=1024

When to Choose Apify

Teams that need pre-built scrapers for specific popular sites (Amazon, Google Maps, LinkedIn, Instagram) without writing custom code, developers who want a managed platform for hosting and scheduling scrapers, or organizations that need a marketplace of ready-to-use data extraction actors.

Oxylabs

Best for: SERP and e-commerce structured data, residential proxy network, data center proxies

Oxylabs competes with Bright Data at the enterprise level: it pairs a large residential proxy network with purpose-built Web Scraper APIs that return structured data (JSON) rather than raw HTML.

Products and Pricing

| Product | Starting Price | Output |
| --- | --- | --- |
| SERP API | $49/17,500 results | Structured JSON from Google |
| E-Commerce Scraper API | $49/17,500 results | Product data JSON |
| Web Scraper API | $49/17,500 results | General HTML or JSON |
| Residential Proxies | $9/GB | Raw proxy |
| Data Center Proxies | $1.30/GB | Raw proxy |

SERP API

import requests

# Oxylabs SERP API — structured Google results
payload = {
    "source": "google_search",
    "query": "python web scraping",
    "domain": "com",
    "geo_location": "United States",
    "locale": "en-us",
    "parse": True,   # Return structured JSON, not raw HTML
    "pages": 3,
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("YOUR_USERNAME", "YOUR_PASSWORD"),
    json=payload,
)

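response.raise_for_status()  # surface HTTP errors (bad credentials, quota) early
data = response.json()
# Note: this reads only the first entry in "results", even though 3 pages were requested
results = data["results"][0]["content"]["results"]["organic"]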
for result in results:
    print(f"{result['pos']}. {result['title']}: {result['url']}")
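The snippet above reads a single entry from `results`. When several pages are requested, a small helper can gather organic results from every entry. This is a sketch under an assumption: it supposes each item in `data["results"]` follows the same `content -> results -> organic` path used above, and the `sample` response is hand-written for illustration, not real API output.

```python
def collect_organic(data: dict) -> list[dict]:
    """Gather organic results from every entry in a response.

    Assumes each item in data["results"] follows the access path used above
    (content -> results -> organic). Entries without that shape are skipped.
    """
    rows = []
    for page in data.get("results", []):
        content = page.get("content")
        if not isinstance(content, dict):
            continue  # e.g. a raw HTML string when parsing was not applied
        organic = (content.get("results") or {}).get("organic") or []
        for item in organic:
            rows.append(
                {
                    "pos": item.get("pos"),
                    "title": item.get("title"),
                    "url": item.get("url"),
                }
            )
    return rows


# Hypothetical, hand-written sample shaped like the response read above.
sample = {
    "results": [
        {"content": {"results": {"organic": [
            {"pos": 1, "title": "A", "url": "https://example.com/a"},
        ]}}},
        {"content": {"results": {"organic": [
            {"pos": 1, "title": "B", "url": "https://example.com/b"},
        ]}}},
        {"content": "<html>unparsed</html>"},
    ]
}

for row in collect_organic(sample):
    print(row["pos"], row["title"], row["url"])
```

Using `.get` throughout means a missing key yields an empty list rather than a `KeyError`, which is usually preferable in a batch job that should keep going when one page fails to parse.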

E-Commerce Scraper

import requests

# Extract structured product data from Amazon
payload = {
    "source": "amazon_product",
    "query": "B09XHVJ6Z3",  # ASIN
    "parse": True,
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("USERNAME", "PASSWORD"),
    json=payload,
)

product = response.json()["results"][0]["content"]
print(f"Title: {product['title']}")
print(f"Price: {product['price']}")
print(f"Rating: {product['rating']}")
print(f"Reviews: {product['reviews_count']}")

When to Choose Oxylabs

Teams that need structured data extraction (JSON output) from e-commerce and SERP sources without parsing HTML, organizations that need enterprise-grade proxy infrastructure as a Bright Data alternative, or projects where Oxylabs' residential proxy pricing ($9/GB, slightly above Bright Data's $8.50/GB) is an acceptable trade for its structured-data APIs.

Choosing the Right Tool

| Scenario | Recommended |
| --- | --- |
| First-time scraper, simple pages | ScrapingBee |
| Pre-built Amazon/Google scraper | Apify |
| Custom scraper on managed infrastructure | Apify |
| Heavily protected sites (LinkedIn) | Bright Data Web Unlocker |
| Largest residential proxy pool | Bright Data |
| SERP data extraction (structured) | Oxylabs SERP API |
| E-commerce price monitoring | Oxylabs or ScrapingBee |
| Enterprise volume, full control | Bright Data |
| No-code scraping | Apify (pre-built actors) |

Self-Hosted Alternative

For teams with engineering capacity:

// Crawlee + Playwright — self-hosted scraper
import { PlaywrightCrawler } from "crawlee";

const crawler = new PlaywrightCrawler({
  // Crawlee handles browser pool, request queue, retry logic
  maxConcurrency: 10,

  async requestHandler({ page, request, enqueueLinks, pushData }) {
    const title = await page.title();
    const data = await page.evaluate(() => {
      return Array.from(document.querySelectorAll(".product")).map(p => ({
        name: p.querySelector(".name")?.textContent,
        price: p.querySelector(".price")?.textContent,
      }));
    });

    await pushData({ url: request.url, title, data });
    await enqueueLinks({ selector: "a.pagination-next" }); // Follow pagination
  },
});

await crawler.run(["https://example.com/products"]);

Self-hosted costs: VPS for browser compute + residential proxy subscription. Viable for teams with engineering capacity; Crawlee (by Apify) is open-source.

Anti-Bot Technology and How Scraping APIs Bypass It

Modern bot protection uses behavioral fingerprinting — not just checking request headers, but analyzing how the browser behaves over time. TLS fingerprinting identifies headless browsers by their TLS handshake parameters (cipher suites, extension order) that differ from real browser TLS signatures. Canvas fingerprinting generates a canvas element and measures rendering — headless Chrome without GPU rendering produces different canvas hashes than Chrome running on a real machine. WebGL fingerprinting and audio fingerprinting operate similarly.
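To make the TLS point concrete, here is a sketch in the style of JA3, one publicly documented way of hashing ClientHello parameters. Nothing in this article says which method a given anti-bot vendor uses, so treat JA3 as an illustrative assumption; the numeric values below are made up for the example and not captured from any real client.

```python
import hashlib


def ja3_style_hash(
    tls_version: int,
    ciphers: list[int],
    extensions: list[int],
    curves: list[int],
    point_formats: list[int],
) -> str:
    """MD5 over ClientHello fields, joined the way JA3 describes.

    Values inside a field are joined with '-', fields with ','.
    Order is preserved, so the same items in a different order hash differently.
    """
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()


# Illustrative values only, not captured from any real client.
client_a = ja3_style_hash(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
# Same cipher suites and extensions, but the extensions are sent in another order.
client_b = ja3_style_hash(771, [4865, 4866, 4867], [0, 10, 11, 23, 65281], [29, 23, 24], [0])

print(client_a)
print(client_b)
print("same fingerprint:", client_a == client_b)
```

The two clients advertise identical capabilities, yet the hashes differ because only the extension order changed. That is why swapping the User-Agent header alone does not make an HTTP library look like a browser at the TLS layer.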

Residential proxy networks (Bright Data, Oxylabs, SmartProxy) route requests through real consumer IP addresses. This sidesteps IP-based blocking because the request appears to come from a genuine home internet connection, not a datacenter. The proxy provider maintains millions of IPs that rotate per request, preventing rate limiting by IP. The trade-off: residential proxies are slower than datacenter proxies (40-200ms additional latency from the residential hop), less reliable (residential IPs can disconnect mid-request), and more expensive ($8-15/GB vs $1-3/GB for datacenter proxies).
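Because these proxies are billed per gigabyte, the price gap is easiest to see as a bandwidth estimate. The sketch below applies the per-GB ranges quoted above to a hypothetical workload; the page count and the 2 MB average page size are assumptions chosen for illustration and vary widely by target site.

```python
def bandwidth_cost_usd(pages: int, avg_page_mb: float, price_per_gb: float) -> float:
    """Cost of transferring `pages` pages of `avg_page_mb` megabytes each."""
    gigabytes = pages * avg_page_mb / 1024
    return gigabytes * price_per_gb


PAGES = 100_000     # assumption: monthly page volume
AVG_PAGE_MB = 2.0   # assumption: depends heavily on the target site

# Per-GB price ranges quoted in the paragraph above.
residential = (
    bandwidth_cost_usd(PAGES, AVG_PAGE_MB, 8),
    bandwidth_cost_usd(PAGES, AVG_PAGE_MB, 15),
)
datacenter = (
    bandwidth_cost_usd(PAGES, AVG_PAGE_MB, 1),
    bandwidth_cost_usd(PAGES, AVG_PAGE_MB, 3),
)

print(f"residential: ${residential[0]:,.0f} to ${residential[1]:,.0f}")
print(f"datacenter:  ${datacenter[0]:,.0f} to ${datacenter[1]:,.0f}")
```

Under these assumptions, blocking images, fonts, and other heavy assets in the browser is one of the more effective cost levers, since it reduces the bytes that pass through the residential hop.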

Browser fingerprint spoofing is what Bright Data's Scraping Browser and Browserbase's managed Chrome implement. They run patched Chromium builds with GPU emulation, realistic TLS fingerprints, WebGL spoofing, and timing behavior that matches real user browsing patterns. These stealth browsers pass most fingerprinting checks that would identify standard Playwright or Puppeteer instances.

CAPTCHA solving is the remaining blocker: hCaptcha, reCAPTCHA, and Cloudflare challenges can appear in the middle of a scraping session. ScrapingBee includes automatic CAPTCHA handling; Bright Data's Scraping Browser handles most Cloudflare challenges natively. Third-party CAPTCHA solving services (2Captcha, Anti-Captcha) provide human-solved responses for challenges that automated bypass can't handle, typically at $1-2 per 1,000 solved CAPTCHAs with 30-120 second response times.

Legal Considerations

Web scraping occupies legally complex territory that varies by jurisdiction, data type, and how the scraped data is used. The relevant legal frameworks include the Computer Fraud and Abuse Act (CFAA) in the US, GDPR in Europe, and various national copyright and database rights laws.

In hiQ v. LinkedIn, the US Ninth Circuit held that scraping publicly accessible data (not behind a login) likely does not violate the CFAA, so LinkedIn's cease-and-desist letters did not by themselves create a legal prohibition on scraping public profiles. However, the ruling is narrow and fact-specific; scraping behind login walls, circumventing explicit technical measures (CAPTCHAs, robots.txt enforcement), or accessing non-public data carries different legal risks.

GDPR imposes obligations on any personal data scraped from websites accessible to EU residents, regardless of where the scraping operator is located. Email addresses, names, and other personal identifiers scraped from public pages are personal data under GDPR — their collection and processing requires a lawful basis. Legitimate interest (business contact data for direct marketing) is sometimes asserted, but enforcement actions against data brokers have established that indiscriminate personal data scraping without notice to data subjects is high-risk under GDPR.

The robots.txt file is a convention, not a legally binding document; ignoring it is not in itself a CFAA violation under the reasoning in hiQ. However, courts have considered robots.txt in breach-of-contract cases where a website's Terms of Service prohibit scraping and the scraper agreed to the ToS (for example, by logging in). If your scraping involves clicking through a ToS agreement, that agreement becomes contractually relevant.
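For scrapers that choose to honor robots.txt, Python's standard library includes a parser. The sketch below checks URLs against an inline, hypothetical robots.txt; the `my-scraper` user agent and the rules themselves are made up for the example, and a real crawler would load the file from the target site instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. In practice, fetch it from the target site,
# for example with parser.set_url(...) followed by parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Allow: /products/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, user_agent: str = "my-scraper") -> bool:
    """True if the parsed rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://example.com/products/1"))
print(is_allowed("https://example.com/account/settings"))
print(parser.crawl_delay("my-scraper"))
```

Running this check before enqueueing a URL keeps the policy in one place, and the `Crawl-delay` value, when present, gives a site-suggested pause between requests.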

The practical guidance: scraping publicly available non-personal data (product prices, job listings, public business information) for competitive intelligence is generally lower-risk than scraping personal profiles or bypassing authentication. Consult legal counsel for commercial scraping operations, particularly when data will be resold or published.

Verdict

ScrapingBee is the right starting point for most development teams — one API, clear credit pricing, JavaScript rendering included, no proxy management.

Bright Data is the enterprise choice when you need to reliably scrape the most protected sites at scale. The 72M+ residential IP pool and the Scraping Browser give you capabilities that smaller networks can't match.

Apify is the choice when you want the scraping work already done — the actor marketplace provides production-ready scrapers for the most common sites, deployable without writing any code.

Oxylabs is the competitive alternative to Bright Data for e-commerce and SERP data, with structured JSON output APIs that skip the HTML parsing step.


