Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

Handling Pagination in Web Scraping

Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.

Python Scraping · #3 · Beginner · 2 min read

Many websites split long listings across multiple pages. To get the complete dataset, your scraper needs to move through those pages automatically.

Strategy 1: Following "Next" Links

The simplest approach is to look for a "Next" button and follow it until there are no more pages.

import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = "https://quotes.toscrape.com"
url = base_url + "/page/1/"
all_quotes = []

while url:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Stop on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")

    for quote in soup.select("div.quote"):
        text = quote.select_one("span.text").get_text()
        author = quote.select_one("small.author").get_text()
        all_quotes.append({"text": text, "author": author})

    # Find the "Next" link; the last page has none, which ends the loop
    next_btn = soup.select_one("li.next > a")
    # urljoin resolves relative hrefs such as "/page/2/" against the current URL
    url = urljoin(url, next_btn["href"]) if next_btn else None

    time.sleep(1)  # Be polite: pause between requests

print(f"Scraped {len(all_quotes)} quotes across multiple pages.")
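Following "Next" links can loop forever if a site's last page links back to an earlier one, or if the same URL keeps reappearing. One way to guard against this is to remember which URLs you have visited and cap the total number of pages. The sketch below assumes you supply your own `fetch_html` and `find_next_href` functions (hypothetical names, not part of any library) so the loop logic stays separate from the network code.

```python
from typing import Callable, Iterator, Optional, Set, Tuple
from urllib.parse import urljoin


def follow_next_links(
    start_url: str,
    fetch_html: Callable[[str], str],
    find_next_href: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) for each page reached by following "Next" links.

    Stops when a page has no next link, when a URL repeats (a loop),
    or after max_pages pages as a safety cap.
    """
    seen: Set[str] = set()
    url: Optional[str] = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch_html(url)
        yield url, html
        href = find_next_href(html)
        # Resolve relative hrefs against the page they were found on
        url = urljoin(url, href) if href else None
```

With the Strategy 1 code, `fetch_html` could wrap `requests.get(url, timeout=10).text`, and `find_next_href` could return the `href` of `soup.select_one("li.next > a")`, or `None` when that link is missing. Keeping the requests outside the loop also makes it easy to test the stopping rules with fake pages.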

Strategy 2: Page Number Iteration

When you know the URL pattern, you can iterate through page numbers directly.

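import time

import requests
from bs4 import BeautifulSoup

# range(1, 11) covers pages 1 through 10; the checks below stop early if a page fails or is empty
for page_num in range(1, 11):
    url = f"https://quotes.toscrape.com/page/{page_num}/"
    response = requests.get(url, timeout=10)

    if response.status_code != 200:
        break  # Page missing or request failed, stop

    soup = BeautifulSoup(response.text, "html.parser")
    quotes = soup.select("div.quote")

    if not quotes:
        break  # No more content, stop

    for quote in quotes:
        print(quote.select_one("span.text").get_text()[:60])  # First 60 characters

    time.sleep(1)  # Be polite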
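The f-string above works because quotes.toscrape.com puts the page number in the URL path. Many other sites pass it as a query parameter instead, such as `?page=2`. The helper below is a sketch that sets that parameter while keeping any other query parameters (search terms, filters) intact. The default parameter name `page` is an assumption; check which name the target site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page_param(url: str, page: int, param: str = "page") -> str:
    """Return `url` with its page query parameter set to `page`."""
    parts = urlsplit(url)
    # Keep every existing parameter except the old page value
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    pairs.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical search URL, used only to show the generated page URLs
for page_num in range(1, 4):
    print(with_page_param("https://example.com/search?q=books", page_num))
```

You can drop `with_page_param(...)` into the Strategy 2 loop in place of the f-string; the same empty-page and status-code checks still decide when to stop.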

Strategy 3: Offset-Based APIs

Some sites load their data from a JSON API that pages through results with offset or cursor parameters. Inspect the Network tab in your browser's DevTools while clicking through pages to discover these requests.

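import time

import requests

offset = 0
limit = 20

while True:
    resp = requests.get(
        "https://api.example.com/items",  # Placeholder: use the endpoint you found in DevTools
        params={"offset": offset, "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    if not data["results"]:
        break  # An empty page means we've gone past the last item

    for item in data["results"]:
        print(item["name"])

    offset += limit  # The next request starts where this page ended
    time.sleep(1)  # Be polite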
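Cursor-based APIs work differently: each response includes a token that you send back to request the next page, so you cannot compute later pages in advance the way you can with offsets. The sketch below assumes a hypothetical response shape, `{"results": [...], "next_cursor": "..."}`, where `next_cursor` is empty or missing on the last page. Real APIs name these fields differently, so check the actual responses in DevTools.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_cursor_items(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield items from a cursor-paginated API.

    fetch_page(cursor) returns one decoded JSON page; cursor is None for
    the first request. Field names "results" and "next_cursor" are
    assumptions about the API, not a standard.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):  # Safety cap in case the API never stops
        page = fetch_page(cursor)
        yield from page.get("results") or []
        cursor = page.get("next_cursor")
        if not cursor:
            break  # No cursor means this was the last page
```

To call a real API, `fetch_page` could be something like `lambda cursor: requests.get(endpoint, params={"cursor": cursor} if cursor else {}, timeout=10).json()`, where `endpoint` and the `cursor` parameter name are placeholders for whatever the site actually uses.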

Tips for Paginated Scraping

  • Always add delays between page requests to avoid getting blocked.
  • Check for duplicates; some sites repeat items across pages.
  • Use a proxy service like ScraperAPI or ScrapingAnt if you hit rate limits while scraping many pages.
  • Save progress incrementally so you can resume if your scraper crashes mid-run.
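The duplicate and save-progress tips fit together: if you append each page's results to a file as you go and skip items you have already saved, a crashed run can simply be restarted. The sketch below does this with a JSON Lines file (one JSON object per line). It assumes each item has a field that uniquely identifies it; for the quotes example, using the quote's `"text"` as that key is an assumption.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Set


def load_seen_keys(path: Path, key: str) -> Set[str]:
    """Read the keys of items saved by a previous (possibly crashed) run."""
    seen: Set[str] = set()
    if not path.exists():
        return seen
    with path.open(encoding="utf-8") as f:
        for line in f:
            try:
                seen.add(json.loads(line)[key])
            except json.JSONDecodeError:
                continue  # Skip lines that aren't valid JSON, e.g. a partly written last line
    return seen


def save_new_items(path: Path, items: Iterable[Dict[str, str]], key: str, seen: Set[str]) -> int:
    """Append items whose key hasn't been seen yet; return how many were written."""
    written = 0
    with path.open("a", encoding="utf-8") as f:
        for item in items:
            if item[key] in seen:
                continue  # Duplicate from an overlapping page or an earlier run
            seen.add(item[key])
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
            written += 1
    return written
```

In the Strategy 1 loop, you could call `load_seen_keys` once before the loop and pass each page's list of quote dicts to `save_new_items`. This skips duplicate items but still re-requests pages after a restart; storing the last completed page URL alongside the data would let you skip those too.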

Next Steps

  • Store your paginated results in CSV or JSON format
  • Handle errors and retries for more robust multi-page scraping
  • Explore concurrent scraping to speed up pagination