Python Scraping
Web scraping with Python using Requests, BeautifulSoup, Scrapy, and HTTPX
27 articles
#1
Getting Started with Web Scraping in Python
Learn the basics of web scraping with Python using the Requests library and BeautifulSoup. Your first scraper in 10 minutes.
#2
CSS Selectors for Web Scraping
Master CSS selectors to extract exactly the data you need. Classes, IDs, attributes, and advanced selector patterns.
#3
Handling Pagination in Web Scraping
Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.
#4
Scraping with Scrapy Framework - Getting Started
Get started with Scrapy, a full-featured Python web scraping framework. Install Scrapy, create a project, and run your first spider.
#5
Scrapy Spiders and Items
Define structured data with Scrapy Items and build advanced spiders with CrawlSpider, SitemapSpider, and custom parsing logic.
#6
Scrapy Middleware and Pipelines
Customize Scrapy's request/response flow with middleware and process scraped data using item pipelines for validation, cleaning, and storage.
#7
Async Scraping with HTTPX and asyncio
Speed up your scrapers with async Python. Use HTTPX and asyncio to issue HTTP requests concurrently and fetch many pages at once.
#8
Scraping with aiohttp
Use aiohttp for high-performance async web scraping in Python. Learn session management, connection pooling, and concurrent page fetching.
#9
Storing Scraped Data in CSV and JSON
Save your scraped data to CSV and JSON files using Python's standard-library csv and json modules. Learn best practices for data export, encoding, and file organization.
#10
Storing Scraped Data in Databases (SQLite, PostgreSQL)
Store scraped data in SQLite and PostgreSQL databases. Learn schema design, upserts, and best practices for persistent scraping data storage.
#11
Error Handling and Retries in Scrapers
Build robust scrapers with proper error handling, automatic retries, exponential backoff, and graceful failure recovery.
#12
Scraping Behind Login/Authentication
Scrape websites that require login. Handle form-based authentication, session tokens, and authenticated API requests with Python.
#13
Handling Cookies and Sessions
Master cookie management and persistent sessions in Python web scraping. Handle session cookies, cookie jars, and cross-request state.
#14
Scraping Dynamic Content Without a Browser
Extract data from JavaScript-heavy websites without using a browser. Discover hidden APIs, intercept XHR requests, and parse JSON responses.
#15
Using ScraperAPI with Python
Integrate ScraperAPI into your Python scrapers for automatic proxy rotation, CAPTCHA solving, and JavaScript rendering. Complete guide with code examples.
#16
Using ScrapingAnt with Python
Integrate ScrapingAnt into your Python scrapers for headless browser rendering, proxy rotation, and anti-bot bypass. Complete tutorial with examples.
#17
Web Scraping with lxml and XPath
Use lxml and XPath expressions for fast, powerful HTML parsing. Learn XPath syntax, axes, and functions for precise data extraction.
#18
Extracting Data from HTML Tables
Scrape HTML tables from websites using BeautifulSoup and pandas. Handle complex tables with rowspan, colspan, and nested elements.
#19
Scraping Images and Files
Download images, PDFs, and other files while web scraping. Learn URL resolution, streaming downloads, and file organization best practices.
#20
Building a Price Monitoring Scraper
Build a complete price monitoring scraper that tracks product prices over time, detects price drops, and sends alerts. A real-world scraping project.
#21
Scraping Multiple Pages Concurrently
Speed up scraping with concurrent requests using threading, multiprocessing, and asyncio. Learn to balance speed with politeness.
#22
Scraping with Python and Regex
Use Python regular expressions to extract emails, phone numbers, prices, URLs, and other patterns from scraped web pages.
#23
Handling Different Encodings (UTF-8, ISO-8859)
Handle character encoding issues in web scraping. Detect, convert, and fix UTF-8, ISO-8859, and other encodings to avoid garbled text.
#24
Scraping XML and RSS Feeds
Parse XML documents and RSS/Atom feeds with Python. Extract structured data from feeds using feedparser, lxml, and the xml.etree.ElementTree module.
#25
Building a News Aggregator Scraper
Build a complete news aggregator that collects articles from multiple sources using RSS feeds and web scraping. Deduplicate, categorize, and store results.
#26
Scraping with Zyte API
Use Zyte API, from Zyte (formerly Scrapinghub), for web scraping with automatic extraction, browser rendering, and anti-bot bypass.
#27
Web Scraping Best Practices and Patterns
Master web scraping best practices: respectful scraping, anti-detection, data quality, error recovery, project architecture, and legal considerations.