Handling Pagination in Web Scraping
Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.
Python Scraping · #3 · beginner · 2 min read
Most websites split content across multiple pages. To get the complete dataset, your scraper needs to follow pagination links automatically.
Strategy 1: Following "Next" Links
The simplest approach is to look for a "Next" button and follow it until there are no more pages.
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = "https://quotes.toscrape.com"
url = base_url + "/page/1/"
all_quotes = []

while url:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")

    for quote in soup.select("div.quote"):
        text = quote.select_one("span.text").get_text()
        author = quote.select_one("small.author").get_text()
        all_quotes.append({"text": text, "author": author})

    # Find the "Next" button; it is absent on the last page
    next_btn = soup.select_one("li.next > a")
    # urljoin resolves both relative and absolute hrefs against the current URL
    url = urljoin(url, next_btn["href"]) if next_btn else None

    time.sleep(1)  # Be polite

print(f"Scraped {len(all_quotes)} quotes across multiple pages.")
Strategy 2: Page Number Iteration
When you know the URL pattern, you can iterate through page numbers directly.
import time

import requests
from bs4 import BeautifulSoup

for page_num in range(1, 11):
    url = f"https://quotes.toscrape.com/page/{page_num}/"
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        break  # Page missing or request refused, stop

    soup = BeautifulSoup(response.text, "html.parser")
    quotes = soup.select("div.quote")
    if not quotes:
        break  # No more content, stop

    for quote in quotes:
        print(quote.select_one("span.text").get_text()[:60])

    time.sleep(1)  # Be polite
Strategy 3: Offset-Based APIs
Some sites use offset or cursor parameters in their API calls. Inspect the Network tab in DevTools to discover these patterns.
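import requests

offset = 0
limit = 20

while True:
    resp = requests.get(
        "https://api.example.com/items",
        params={"offset": offset, "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()  # Fail loudly on HTTP errors
    data = resp.json()
    if not data["results"]:
        break  # Empty page means we are past the last item

    for item in data["results"]:
        print(item["name"])

    offset += limit  # Advance to the next window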
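Cursor-based APIs follow the same loop shape, except that each response tells you which token to send next. The sketch below is a minimal illustration, not any particular site's API: the function name iter_cursor_pages, the "next_cursor" field, and the fake pages are all assumptions made up for the example. It takes the page-fetching step as a function so the loop can be tried without a network connection.

```python
from typing import Callable, Iterator, Optional


def iter_cursor_pages(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield items from a cursor-paginated source until no cursor is returned."""
    cursor = None  # None means "first page"
    for _ in range(max_pages):  # Hard cap guards against a cursor that never ends
        data = fetch_page(cursor)
        yield from data["results"]
        cursor = data.get("next_cursor")  # Hypothetical field name
        if not cursor:
            break


# Stand-in for a real API: three fake pages keyed by cursor value.
FAKE_PAGES = {
    None: {"results": [{"name": "a"}, {"name": "b"}], "next_cursor": "p2"},
    "p2": {"results": [{"name": "c"}], "next_cursor": "p3"},
    "p3": {"results": [{"name": "d"}], "next_cursor": None},
}

names = [item["name"] for item in iter_cursor_pages(FAKE_PAGES.__getitem__)]
print(names)
```

Against a real API, fetch_page would typically wrap requests.get and pass the cursor as a query parameter; the parameter and field names vary by site, so check the Network tab for the ones your target uses.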
Tips for Paginated Scraping
- Always add delays between page requests to avoid getting blocked.
- Check for duplicates; some sites repeat items across pages.
- Use a proxy service like ScraperAPI or ScrapingAnt if you hit rate limits while scraping many pages.
- Save progress incrementally so you can resume if your scraper crashes mid-run.
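The duplicate check and the incremental saving can share one small mechanism. The sketch below is one possible approach, assuming a JSON Lines file (one JSON object per line) as the progress store and the quote text as the unique key; the helper names load_seen and save_new_items, the file name, and the sample data are hypothetical.

```python
import json
import os
import tempfile


def load_seen(path: str, key: str) -> set:
    """Return the key values already saved in a JSON Lines file (empty if absent)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)[key] for line in f if line.strip()}


def save_new_items(path: str, items: list, key: str, seen: set) -> int:
    """Append items whose key has not been seen yet; return how many were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        for item in items:
            if item[key] in seen:
                continue  # Duplicate from an earlier page or an earlier run
            f.write(json.dumps(item) + "\n")
            seen.add(item[key])
            written += 1
    return written


# Demo with made-up data: page 2 repeats one item from page 1.
path = os.path.join(tempfile.mkdtemp(), "quotes.jsonl")
page_1 = [{"text": "A", "author": "X"}, {"text": "B", "author": "Y"}]
page_2 = [{"text": "B", "author": "Y"}, {"text": "C", "author": "Z"}]

seen = load_seen(path, key="text")
counts = [save_new_items(path, page, "text", seen) for page in (page_1, page_2)]
print(counts)

# Simulate a restart: the seen set is rebuilt from the file on disk.
resumed = load_seen(path, key="text")
print(sorted(resumed))
```

Calling save_new_items once per scraped page means a crash loses at most the page in flight, and reloading the seen set on startup keeps a resumed run from writing the same items twice.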
Next Steps
- Store your paginated results in CSV or JSON format
- Handle errors and retries for more robust multi-page scraping
- Explore concurrent scraping to speed up pagination