Guide
How to Avoid Getting Blocked While Scraping
A comprehensive guide to avoiding blocks and bans while web scraping, covering proxy rotation, headers, rate limiting, and anti-detection techniques.
Getting blocked is the most common frustration in web scraping. Websites use sophisticated anti-bot systems to detect and block scrapers. Here are proven techniques to avoid detection.
Why You Get Blocked
Websites detect scrapers through:
- IP reputation: requests from known datacenter ranges or previously flagged IPs
- Request patterns: too fast, too regular, or an unnatural navigation order
- Browser fingerprinting: missing or inconsistent browser signals
- Header analysis: missing or suspicious HTTP headers (see the example after this list)
- CAPTCHAs: challenge-response tests served to suspected bots
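To see what header analysis catches, inspect what the requests library sends when you don't override anything; the short sketch below simply prints the library's defaults (exact values vary by version, but the User-Agent always identifies the client as a script):

import requests

# Default headers sent by requests when none are supplied.
# The User-Agent ("python-requests/<version>") is an immediate giveaway to header analysis.
print(requests.utils.default_headers())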
Technique 1: Rotate Proxies
Never scrape from a single IP address:
import requests
import random

# Proxy pool to rotate through (hosts and credentials are placeholders)
proxies = [
    "http://user:pass@proxy1:8080",
    "http://user:pass@proxy2:8080",
    "http://user:pass@proxy3:8080",
]

def scrape_with_rotation(url):
    # Pick a different proxy for each request
    proxy = random.choice(proxies)
    return requests.get(url, proxies={"http": proxy, "https": proxy})
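A slightly more defensive variant retries through a different proxy when one fails instead of giving up on the first error. This sketch assumes the proxies list defined above; the attempt count and timeout are arbitrary choices:

def scrape_with_retry(url, max_attempts=3):
    # Try up to max_attempts proxies, skipping any that error or time out
    for proxy in random.sample(proxies, min(max_attempts, len(proxies))):
        try:
            return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
        except requests.RequestException:
            continue  # move on to the next proxy
    raise RuntimeError(f"All proxy attempts failed for {url}")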
Or let ScraperAPI handle rotation automatically:
import requests
response = requests.get("https://api.scraperapi.com", params={
"api_key": "YOUR_KEY",
"url": "https://example.com/target"
})
# Proxies are rotated automatically
Technique 2: Use Realistic Headers
The default python-requests User-Agent is an instant giveaway, so send a browser-like set of headers and rotate the User-Agent between requests:
import requests
import random
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
]

headers = {
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}
response = requests.get("https://example.com", headers=headers)
Technique 3: Add Random Delays
import time
import random

def human_delay():
    """Simulate human-like browsing delays."""
    # 2-5 second base delay plus a little Gaussian jitter, never below 1 second
    delay = random.uniform(2, 5) + random.gauss(0, 0.5)
    time.sleep(max(1, delay))

# urls and headers are defined as in the earlier examples
for url in urls:
    response = requests.get(url, headers=headers)
    human_delay()
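Delays also matter once a site starts pushing back. A common companion to random pacing is exponential backoff that honours HTTP 429/503 responses; a minimal sketch, assuming the headers dict from Technique 2 and that any Retry-After header is given in seconds:

def get_with_backoff(url, max_retries=5):
    # Retry on rate-limit responses, waiting longer each time
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code not in (429, 503):
            return response
        wait = int(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    return response  # hand back the last response after exhausting retries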
Technique 4: Handle CAPTCHAs
ScraperAPI and ScrapingAnt handle CAPTCHAs automatically:
# ScraperAPI solves CAPTCHAs automatically
response = requests.get("https://api.scraperapi.com", params={
"api_key": "YOUR_KEY",
"url": "https://protected-site.com/page",
"render": "true"
})
# CAPTCHA solved, content returned
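If you are scraping without a managed API, it still helps to detect when a CAPTCHA page has replaced the real content so you can back off instead of parsing a challenge page. The check below is a rough heuristic, not a reliable detector:

def looks_like_captcha(response):
    # Heuristic: challenge pages often return 403/429 or mention a CAPTCHA vendor in the body
    if response.status_code in (403, 429):
        return True
    body = response.text.lower()
    return any(marker in body for marker in ("captcha", "are you a human", "unusual traffic"))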
Technique 5: Manage Sessions and Cookies
Real users carry cookies from page to page; a requests.Session stores them for you and reuses connections between requests:
session = requests.Session()
# Visit the homepage first (like a real user)
session.get("https://example.com", headers=headers)
time.sleep(2)
# Then navigate to your target
response = session.get("https://example.com/products", headers=headers)
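Cookies set on the homepage visit are reused automatically on the second request; adding a Referer makes the navigation look like a normal click-through. A small continuation of the snippet above:

# Cookies from the homepage visit are already stored on the session
print(session.cookies.get_dict())

# A Referer header makes the product-page request look like a click from the homepage
response = session.get(
    "https://example.com/products",
    headers={**headers, "Referer": "https://example.com"},
)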
Technique 6: Render JavaScript
Many anti-bot systems check for JavaScript execution:
# ScrapingAnt renders JavaScript on every request
response = requests.get("https://api.scrapingant.com/v2/general", params={
"x-api-key": "YOUR_KEY",
"url": "https://protected-site.com",
"browser": "true"
})
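If you would rather render JavaScript yourself, a headless browser such as Playwright (listed in the comparison below) does the same job with more setup. A minimal sketch, assuming Playwright and its Chromium build are installed (pip install playwright, then playwright install chromium):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://protected-site.com", wait_until="networkidle")
    html = page.content()  # fully rendered HTML after JavaScript has run
    browser.close()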
The Simplest Approach
Instead of implementing all these techniques yourself, use a scraping API that handles everything:
| Technique | DIY | ScraperAPI/ScrapingAnt |
|---|---|---|
| Proxy rotation | Manual setup | Automatic |
| Header management | Manual | Automatic |
| CAPTCHA solving | Third-party service | Built-in |
| JS rendering | Playwright/Selenium | One parameter |
| Rate limiting | Manual | Built-in |
Verdict
The most reliable way to avoid getting blocked is to use a managed scraping API like ScraperAPI or ScrapingAnt. They implement all anti-detection techniques automatically, achieving 95-99% success rates even on heavily protected sites. For DIY scraping, combine proxy rotation, realistic headers, random delays, and JavaScript rendering for the best results.