Handling Pagination in Web Scraping - Python Scraping

Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.

Many websites split long listings across multiple pages. To collect the complete dataset, your scraper needs to follow the pagination automatically.

Strategy 1: Following "Next" Links

The simplest approach is to look for a "Next" button and follow it until there are no more pages.

import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/page/1/"
all_quotes = []
pages = 0

while url:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")
    pages += 1

    for quote in soup.select("div.quote"):
        text = quote.select_one("span.text").get_text()
        author = quote.select_one("small.author").get_text()
        all_quotes.append({"text": text, "author": author})

    # Find the "Next" link; it is absent on the last page
    next_link = soup.select_one("li.next > a")
    # urljoin resolves the href against the current page URL
    url = urljoin(url, next_link["href"]) if next_link else None

    if url:
        time.sleep(1)  # Be polite: pause before the next request

print(f"Scraped {len(all_quotes)} quotes across {pages} pages.")
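The href on a "Next" link is not always root-relative like /page/2/. Some sites use page-relative paths, query-only links, or full URLs. The sketch below shows how the standard library's urljoin resolves each form against the current page URL. The example.com URLs are made up for illustration.

```python
from urllib.parse import urljoin

# (current page URL, href found on the "Next" link)
# The example.com entries are hypothetical; only the first matches the demo site.
cases = [
    ("https://quotes.toscrape.com/page/1/", "/page/2/"),         # root-relative
    ("https://example.com/catalog/page-1.html", "page-2.html"),  # page-relative
    ("https://example.com/list?page=1", "?page=2"),              # query-only
    ("https://example.com/list", "https://example.com/list/2"),  # already absolute
]

resolved = [urljoin(current, href) for current, href in cases]
for next_url in resolved:
    print(next_url)
```

Plain string concatenation with a base URL only handles the first case correctly, which is why resolving against the current URL tends to be the safer default.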

Strategy 2: Page Number Iteration

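When the page number is part of the URL, you can iterate through page numbers directly. Because you rarely know the page count in advance, stop as soon as a request fails or a page comes back empty.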

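import time

import requests
from bs4 import BeautifulSoup

MAX_PAGES = 10  # Upper bound; the loop stops earlier if a page is empty

for page_num in range(1, MAX_PAGES + 1):
    url = f"https://quotes.toscrape.com/page/{page_num}/"
    response = requests.get(url, timeout=10)

    if response.status_code != 200:
        break  # Missing page or server error, stop

    soup = BeautifulSoup(response.text, "html.parser")
    quotes = soup.select("div.quote")

    if not quotes:
        break  # Some sites return 200 with an empty page past the end

    for quote in quotes:
        print(quote.select_one("span.text").get_text()[:60])

    time.sleep(1)  # Be polite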

Strategy 3: Offset-Based APIs

Some sites load their data from a JSON API that pages with offset or cursor parameters. Inspect the Network tab in your browser's DevTools to find these requests and their parameters.

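import time

import requests

offset = 0
limit = 20

while True:
    resp = requests.get(
        "https://api.example.com/items",  # Placeholder endpoint
        params={"offset": offset, "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()  # Fail loudly on HTTP errors
    data = resp.json()

    results = data.get("results", [])
    if not results:
        break  # Empty page: no more items

    for item in results:
        print(item["name"])

    offset += limit
    time.sleep(1)  # Be polite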
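
Cursor-based APIs work a little differently: instead of computing the next offset yourself, each response hands you a token to send with the next request. Here is a minimal sketch of that loop. The field names "results" and "next_cursor", the helper names, and the fake data are all assumptions for illustration; real APIs name these differently, so check the responses in DevTools.

```python
from typing import Callable, Iterator, Optional


def iter_cursor_pages(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield items page by page until the API stops returning a cursor.

    fetch_page(cursor) is expected to return a dict shaped like
    {"results": [...], "next_cursor": "..." or None} (assumed shape).
    """
    cursor = None  # None requests the first page
    for _ in range(max_pages):  # Hard cap guards against an endless cursor chain
        data = fetch_page(cursor)
        yield from data.get("results", [])
        cursor = data.get("next_cursor")
        if not cursor:
            break


# Stand-in for a real API so the sketch runs without network access.
FAKE_PAGES = {
    None: {"results": [{"name": "a"}, {"name": "b"}], "next_cursor": "c1"},
    "c1": {"results": [{"name": "c"}], "next_cursor": "c2"},
    "c2": {"results": [{"name": "d"}], "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return FAKE_PAGES[cursor]


names = [item["name"] for item in iter_cursor_pages(fake_fetch)]
print(names)
```

Against a live API, fetch_page would typically wrap a requests.get call that passes the cursor as a query parameter and returns resp.json(). Keeping the HTTP call behind a function like this also makes the pagination loop easy to test.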

Tips for Paginated Scraping

Always add delays between page requests to avoid getting blocked.
Check for duplicates, because some sites repeat items across pages.
Use a proxy service like ScraperAPI or ScrapingAnt if you hit rate limits while scraping many pages.
Save progress incrementally so you can resume if your scraper crashes mid-run.
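
Two of the tips above, checking for duplicates and saving progress incrementally, can be combined in one small helper. The sketch below appends each page's new items to a JSON Lines file and rebuilds the set of already-seen items from that file on startup. The function names, the file name, and the choice of (text, author) as the identity of an item are assumptions for illustration; pick whatever key is unique on your target site.

```python
import json
import os
import tempfile


def load_seen(path: str) -> set:
    """Return the keys already saved in a JSON Lines file (empty set if none)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {(row["text"], row["author"]) for row in map(json.loads, f)}


def save_new_items(path: str, items: list, seen: set) -> int:
    """Append items whose key is not in `seen`; return how many were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        for item in items:
            key = (item["text"], item["author"])
            if key in seen:
                continue  # Duplicate from an earlier page or an earlier run
            seen.add(key)
            f.write(json.dumps(item) + "\n")
            written += 1
    return written


# Demo with made-up data: page 2 repeats one item from page 1.
out_path = os.path.join(tempfile.mkdtemp(), "quotes.jsonl")
page_1 = [{"text": "A", "author": "X"}, {"text": "B", "author": "Y"}]
page_2 = [{"text": "B", "author": "Y"}, {"text": "C", "author": "Z"}]

seen = load_seen(out_path)
first = save_new_items(out_path, page_1, seen)
second = save_new_items(out_path, page_2, seen)

# Simulate a restart: the seen set is rebuilt from what is on disk.
seen_after_restart = load_seen(out_path)
print(first, second, len(seen_after_restart))
```

This only covers the data side of resuming. To avoid re-requesting pages after a crash, you would also need to record the last page URL or offset you finished, which is not shown here.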

Next Steps

Store your paginated results in CSV or JSON format
Handle errors and retries for more robust multi-page scraping
Explore concurrent scraping to speed up pagination