
Python Scraping

Web scraping with Python using Requests, BeautifulSoup, Scrapy, and HTTPX

27 articles

#1

Getting Started with Web Scraping in Python

Learn the basics of web scraping with Python using the Requests library and BeautifulSoup. Your first scraper in 10 minutes.

beginner
beautifulsoup, data-extraction

#2

CSS Selectors for Web Scraping

Master CSS selectors to extract exactly the data you need. Classes, IDs, attributes, and advanced selector patterns.

beginner
beautifulsoup, data-extraction

#3

Handling Pagination in Web Scraping

Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.

beginner
beautifulsoup, data-extraction, pagination

#4

Scraping with Scrapy Framework - Getting Started

Get started with Scrapy, a full-featured Python web scraping framework. Install Scrapy, create a project, and run your first spider.

beginner
scrapy, data-extraction

#5

Scrapy Spiders and Items

Define structured data with Scrapy Items and build advanced spiders with CrawlSpider, SitemapSpider, and custom parsing logic.

intermediate
scrapy, data-extraction

#6

Scrapy Middleware and Pipelines

Customize Scrapy's request/response flow with middleware and process scraped data using item pipelines for validation, cleaning, and storage.

intermediate
scrapy, data-extraction

#7

Async Scraping with HTTPX and asyncio

Speed up your scrapers with async Python. Use HTTPX and asyncio to make concurrent HTTP requests and scrape pages in parallel.

intermediate
httpx, asyncio, data-extraction

#8

Scraping with aiohttp

Use aiohttp for high-performance async web scraping in Python. Learn session management, connection pooling, and concurrent page fetching.

intermediate
aiohttp, asyncio, data-extraction

#9

Storing Scraped Data in CSV and JSON

Save your scraped data to CSV and JSON files using Python's built-in modules. Learn best practices for data export, encoding, and file organization.

beginner
data-extraction, csv, json

#10

Storing Scraped Data in Databases (SQLite, PostgreSQL)

Store scraped data in SQLite and PostgreSQL databases. Learn schema design, upserts, and best practices for persistent scraping data storage.

intermediate
data-extraction, sqlite, postgresql, databases

#11

Error Handling and Retries in Scrapers

Build robust scrapers with proper error handling, automatic retries, exponential backoff, and graceful failure recovery.

intermediate
error-handling, retries, data-extraction

#12

Scraping Behind Login/Authentication

Scrape websites that require login. Handle form-based authentication, session tokens, and authenticated API requests with Python.

intermediate
authentication, sessions, data-extraction

#13

Handling Cookies and Sessions

Master cookie management and persistent sessions in Python web scraping. Handle session cookies, cookie jars, and cross-request state.

intermediate
sessions, cookies, data-extraction

#14

Scraping Dynamic Content Without a Browser

Extract data from JavaScript-heavy websites without using a browser. Discover hidden APIs, intercept XHR requests, and parse JSON responses.

intermediate
api-scraping, data-extraction, dynamic-content

#15

Using ScraperAPI with Python

Integrate ScraperAPI into your Python scrapers for automatic proxy rotation, CAPTCHA solving, and JavaScript rendering. Complete guide with code examples.

beginner
scraperapi, proxy-rotation, data-extraction

#16

Using ScrapingAnt with Python

Integrate ScrapingAnt into your Python scrapers for headless browser rendering, proxy rotation, and anti-bot bypass. Complete tutorial with examples.

beginner
scrapingant, proxy-rotation, data-extraction

#17

Web Scraping with lxml and XPath

Use lxml and XPath expressions for fast, powerful HTML parsing. Learn XPath syntax, axes, and functions for precise data extraction.

intermediate
lxml, xpath, data-extraction

#18

Extracting Data from HTML Tables

Scrape HTML tables from websites using BeautifulSoup and pandas. Handle complex tables with rowspan, colspan, and nested elements.

beginner
beautifulsoup, pandas, data-extraction

#19

Scraping Images and Files

Download images, PDFs, and other files while web scraping. Learn URL resolution, streaming downloads, and file organization best practices.

intermediate
beautifulsoup, data-extraction, file-download

#20

Building a Price Monitoring Scraper

Build a complete price monitoring scraper that tracks product prices over time, detects price drops, and sends alerts. A real-world scraping project.

intermediate
beautifulsoup, data-extraction, project

#21

Scraping Multiple Pages Concurrently

Speed up scraping with concurrent requests using threading, multiprocessing, and asyncio. Learn to balance speed with politeness.

intermediate
concurrency, asyncio, threading, data-extraction

#22

Scraping with Python and Regex

Use Python regular expressions to extract emails, phone numbers, prices, URLs, and other patterns from scraped web pages.

intermediate
regex, data-extraction

#23

Handling Different Encodings (UTF-8, ISO-8859)

Handle character encoding issues in web scraping. Detect, convert, and fix UTF-8, ISO-8859, and other encodings to avoid garbled text.

intermediate
encoding, data-extraction

#24

Scraping XML and RSS Feeds

Parse XML documents and RSS/Atom feeds with Python. Extract structured data from feeds using feedparser, lxml, and the xml.etree module.

beginner
xml, rss, data-extraction

#25

Building a News Aggregator Scraper

Build a complete news aggregator that collects articles from multiple sources using RSS feeds and web scraping. Deduplicate, categorize, and store results.

intermediate
beautifulsoup, rss, data-extraction, project

#26

Scraping with Zyte API

Use Zyte API (from Zyte, formerly Scrapinghub) for intelligent web scraping with automatic extraction, browser rendering, and anti-bot bypass.

intermediate
zyte, proxy-rotation, data-extraction

#27

Web Scraping Best Practices and Patterns

Master web scraping best practices: respectful scraping, anti-detection, data quality, error recovery, project architecture, and legal considerations.

advanced
best-practices, data-extraction, architecture