Python Scraping

Web scraping with Python using Requests, BeautifulSoup, Scrapy, and HTTPX

27 articles

Getting Started with Web Scraping in Python

Learn the basics of web scraping with Python using the Requests library and BeautifulSoup. Your first scraper in 10 minutes.

beginner

beautifulsoup, data-extraction

CSS Selectors for Web Scraping

Master CSS selectors to extract exactly the data you need. Classes, IDs, attributes, and advanced selector patterns.

beginner

beautifulsoup, data-extraction

Handling Pagination in Web Scraping

Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.

beginner

beautifulsoup, data-extraction, pagination

Scraping with Scrapy Framework - Getting Started

Get started with Scrapy, a full-featured Python web scraping framework. Install Scrapy, create a project, and run your first spider.

beginner

scrapy, data-extraction

Scrapy Spiders and Items

Define structured data with Scrapy Items and build advanced spiders with CrawlSpider, SitemapSpider, and custom parsing logic.

intermediate

scrapy, data-extraction

Scrapy Middleware and Pipelines

Customize Scrapy's request/response flow with middleware and process scraped data using item pipelines for validation, cleaning, and storage.

intermediate

scrapy, data-extraction

Async Scraping with HTTPX and asyncio

Speed up your scrapers with async Python. Use HTTPX and asyncio to make concurrent HTTP requests and scrape pages in parallel.

intermediate

httpx, asyncio, data-extraction

Scraping with aiohttp

Use aiohttp for high-performance async web scraping in Python. Learn session management, connection pooling, and concurrent page fetching.

intermediate

aiohttp, asyncio, data-extraction

Storing Scraped Data in CSV and JSON

Save your scraped data to CSV and JSON files using Python's built-in modules. Learn best practices for data export, encoding, and file organization.

beginner

data-extraction, csv, json
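
As a rough illustration of the export step this article covers, the sketch below writes a few records to CSV and JSON text using only the standard library. The record fields (`title`, `price`) and the helper names `to_csv` and `to_json` are assumptions made for this example, not taken from the article itself.

```python
import csv
import io
import json

# Hypothetical scraped records; the field names are illustrative only.
rows = [
    {"title": "Café menu", "price": "4.50"},
    {"title": "Tea, green", "price": "3.00"},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records to JSON, keeping non-ASCII characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


csv_text = to_csv(rows)
json_text = to_json(rows)
```

When writing to disk instead of a string, opening the file with `encoding="utf-8"` and `newline=""` avoids the most common encoding and blank-line problems with the `csv` module.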

#10

Storing Scraped Data in Databases (SQLite, PostgreSQL)

Store scraped data in SQLite and PostgreSQL databases. Learn schema design, upserts, and best practices for persistent scraping data storage.

intermediate

data-extraction, sqlite, postgresql, databases
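
To give a sense of what an upsert looks like, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `products` table, its columns, and the `upsert_product` helper are hypothetical names chosen for this example, and the `ON CONFLICT ... DO UPDATE` clause assumes SQLite 3.24 or newer.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        url   TEXT PRIMARY KEY,  -- natural key: one row per scraped page
        title TEXT NOT NULL,
        price REAL
    )
    """
)


def upsert_product(conn, url, title, price):
    """Insert a row, or update title and price if the URL already exists."""
    conn.execute(
        """
        INSERT INTO products (url, title, price) VALUES (?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            title = excluded.title,
            price = excluded.price
        """,
        (url, title, price),
    )
    conn.commit()


# Scraping the same page twice keeps a single, up-to-date row.
upsert_product(conn, "https://example.com/p/1", "Widget", 9.99)
upsert_product(conn, "https://example.com/p/1", "Widget", 8.49)
stored = conn.execute("SELECT url, title, price FROM products").fetchall()
```

Keying the table on the page URL is one possible design choice; it makes repeated scrapes idempotent at the cost of losing price history, which would need a separate table.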

#11

Error Handling and Retries in Scrapers

Build robust scrapers with proper error handling, automatic retries, exponential backoff, and graceful failure recovery.

intermediate

error-handling, retries, data-extraction
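
The retry-with-exponential-backoff idea can be sketched in a few lines of standard-library Python. The function name `retry_with_backoff`, its parameters, and the `flaky_fetch` stand-in are assumptions for illustration; a real scraper would likely catch only network-related exceptions rather than every `Exception`.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**(attempt - 1) plus jitter, then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            delay = base_delay * 2 ** (attempt - 1)
            delay += random.uniform(0, delay / 2)  # jitter spreads out retries
            sleep(delay)


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the delays instead of sleeping so the demo runs instantly.
delays = []
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
```

Passing `sleep` as a parameter is only a convenience for testing; in normal use the default `time.sleep` applies.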

#12

Scraping Behind Login/Authentication

Scrape websites that require login. Handle form-based authentication, session tokens, and authenticated API requests with Python.

intermediate

authentication, sessions, data-extraction

#13

Handling Cookies and Sessions

Master cookie management and persistent sessions in Python web scraping. Handle session cookies, cookie jars, and cross-request state.

intermediate

sessions, cookies, data-extraction

#14

Scraping Dynamic Content Without a Browser

Extract data from JavaScript-heavy websites without using a browser. Discover hidden APIs, intercept XHR requests, and parse JSON responses.

intermediate

api-scraping, data-extraction, dynamic-content

#15

Using ScraperAPI with Python

Integrate ScraperAPI into your Python scrapers for automatic proxy rotation, CAPTCHA solving, and JavaScript rendering. Complete guide with code examples.

beginner

scraperapi, proxy-rotation, data-extraction

#16

Using ScrapingAnt with Python

Integrate ScrapingAnt into your Python scrapers for headless browser rendering, proxy rotation, and anti-bot bypass. Complete tutorial with examples.

beginner

scrapingant, proxy-rotation, data-extraction

#17

Web Scraping with lxml and XPath

Use lxml and XPath expressions for fast, powerful HTML parsing. Learn XPath syntax, axes, and functions for precise data extraction.

intermediate

lxml, xpath, data-extraction

#18

Extracting Data from HTML Tables

Scrape HTML tables from websites using BeautifulSoup and pandas. Handle complex tables with rowspan, colspan, and nested elements.

beginner

beautifulsoup, pandas, data-extraction

#19

Scraping Images and Files

Download images, PDFs, and other files while web scraping. Learn URL resolution, streaming downloads, and file organization best practices.

intermediate

beautifulsoup, data-extraction, file-download

#20

Building a Price Monitoring Scraper

Build a complete price monitoring scraper that tracks product prices over time, detects price drops, and sends alerts. A real-world scraping project.

intermediate

beautifulsoup, data-extraction, project

#21

Scraping Multiple Pages Concurrently

Speed up scraping with concurrent requests using threading, multiprocessing, and asyncio. Learn to balance speed with politeness.

intermediate

concurrency, asyncio, threading, data-extraction

#22

Scraping with Python and Regex

Use Python regular expressions to extract emails, phone numbers, prices, URLs, and other patterns from scraped web pages.

intermediate

regex, data-extraction

#23

Handling Different Encodings (UTF-8, ISO-8859)

Handle character encoding issues in web scraping. Detect, convert, and fix UTF-8, ISO-8859, and other encodings to avoid garbled text.

intermediate

encoding, data-extraction

#24

Scraping XML and RSS Feeds

Parse XML documents and RSS/Atom feeds with Python. Extract structured data from feeds using feedparser, lxml, and the xml.etree module.

beginner

xml, rss, data-extraction
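
For a quick feel of feed parsing with the standard library, the sketch below extracts item titles and links from a small RSS 2.0 document using `xml.etree.ElementTree`. The sample feed and the `parse_rss_items` helper are made up for this example; real feeds often need namespace handling and more defensive checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for demonstration.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                # findtext returns the default when the child is missing
                "title": item.findtext("title", default="").strip(),
                "link": item.findtext("link", default="").strip(),
            }
        )
    return items


items = parse_rss_items(RSS_SAMPLE)
```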

#25

Building a News Aggregator Scraper

Build a complete news aggregator that collects articles from multiple sources using RSS feeds and web scraping. Deduplicate, categorize, and store results.

intermediate

beautifulsoup, rss, data-extraction, project

#26

Scraping with Zyte API

Use Zyte API, from Zyte (formerly Scrapinghub), for intelligent web scraping with automatic extraction, browser rendering, and anti-bot bypass.

intermediate

zyte, proxy-rotation, data-extraction

#27

Web Scraping Best Practices and Patterns

Master web scraping best practices: respectful scraping, anti-detection, data quality, error recovery, project architecture, and legal considerations.

advanced

best-practices, data-extraction, architecture