Guide
How to Scrape YouTube Video Data
A practical guide to extracting YouTube video data including titles, views, descriptions, and comments using Python and scraping APIs.
YouTube is the world's largest video platform, and its data is valuable for content research, competitive analysis, and trend tracking. Here is how to extract YouTube data effectively.
Method 1: YouTube Data API (Official)
Google's YouTube Data API v3 is the best starting point for legitimate use cases:
```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"

# Search for videos
response = requests.get("https://www.googleapis.com/youtube/v3/search", params={
    "part": "snippet",
    "q": "web scraping tutorial",
    "type": "video",
    "maxResults": 10,
    "key": API_KEY,
})

for item in response.json()["items"]:
    video_id = item["id"]["videoId"]
    title = item["snippet"]["title"]
    print(f"{title} - https://youtube.com/watch?v={video_id}")
```
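The search endpoint returns at most 50 results per request, but each response includes a `nextPageToken` you can pass back as `pageToken` to fetch the next page. A sketch of a paginated search helper (`search_videos` and `max_pages` are hypothetical names, not part of the API):

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"

def search_videos(query, max_pages=3):
    """Follow nextPageToken to collect search results across pages."""
    items = []
    token = None
    for _ in range(max_pages):
        params = {
            "part": "snippet",
            "q": query,
            "type": "video",
            "maxResults": 50,
            "key": API_KEY,
        }
        if token:
            params["pageToken"] = token
        data = requests.get(
            "https://www.googleapis.com/youtube/v3/search", params=params
        ).json()
        items.extend(data.get("items", []))
        token = data.get("nextPageToken")
        if not token:  # no more pages available
            break
    return items
```

Keep an eye on quota here: each page is a separate search request, so deep pagination gets expensive fast.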
Get detailed video statistics:
```python
def get_video_stats(video_ids):
    response = requests.get("https://www.googleapis.com/youtube/v3/videos", params={
        "part": "statistics,snippet,contentDetails",
        "id": ",".join(video_ids),
        "key": API_KEY,
    })
    for video in response.json()["items"]:
        stats = video["statistics"]
        print(f"Title: {video['snippet']['title']}")
        print(f"Views: {stats.get('viewCount', 'N/A')}")
        print(f"Likes: {stats.get('likeCount', 'N/A')}")
        print(f"Comments: {stats.get('commentCount', 'N/A')}")
        print("---")

get_video_stats(["dQw4w9WgXcQ", "jNQXAC9IVRw"])
```
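The API also covers comments, which this guide's intro promises: the `commentThreads` endpoint returns top-level comments for a video. A sketch (`get_top_comments` is a hypothetical helper name; the endpoint, `part`, `videoId`, and `order` parameters are from the official API):

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"

def get_top_comments(video_id, limit=20):
    """Fetch top-level comments for a video via the commentThreads endpoint."""
    response = requests.get(
        "https://www.googleapis.com/youtube/v3/commentThreads",
        params={
            "part": "snippet",
            "videoId": video_id,
            "maxResults": limit,
            "order": "relevance",
            "key": API_KEY,
        },
    )
    comments = []
    for item in response.json().get("items", []):
        top = item["snippet"]["topLevelComment"]["snippet"]
        comments.append({
            "author": top["authorDisplayName"],
            "text": top["textDisplay"],
            "likes": top["likeCount"],
        })
    return comments
```

Note that comments can be disabled on a video, in which case the API returns an error rather than an empty list, so wrap this in error handling for production use.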
Method 2: Scraping Beyond API Limits
The YouTube Data API enforces a daily quota (10,000 units per day by default), and a single search request costs 100 units, so the budget runs out quickly at scale. For larger-scale data collection, you can supplement the API with scraping:
```python
import json

import requests
from bs4 import BeautifulSoup

# Use ScraperAPI to fetch the video page
response = requests.get("https://api.scraperapi.com", params={
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "render": "true",
})

soup = BeautifulSoup(response.text, "html.parser")

# YouTube embeds page data as JSON in a script tag: `var ytInitialData = {...};`
data = None
for script in soup.find_all("script"):
    text = script.string or ""
    if "ytInitialData" in text:
        json_text = text.split("ytInitialData = ", 1)[1]
        json_text = json_text.rsplit(";", 1)[0]  # drop the trailing semicolon
        data = json.loads(json_text)
        # Navigate the nested JSON structure for video data
        break
```
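`ytInitialData` is a deeply nested structure whose exact layout changes without notice, so hardcoded key paths break often. A more resilient approach is a generic recursive search that yields every value stored under a given key, wherever it appears (`find_key` is a hypothetical helper, not part of any library):

```python
def find_key(obj, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)
```

With the parsed `data` from above, something like `next(find_key(data, "videoId"), None)` pulls out the first video ID without depending on the full key path.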
Method 3: Using yt-dlp for Metadata
The open-source yt-dlp tool can extract metadata without downloading videos:
```python
import subprocess
import json

def get_video_info(url):
    """Extract metadata with yt-dlp without downloading the video."""
    result = subprocess.run(
        ["yt-dlp", "--dump-json", "--no-download", url],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    return {
        "title": data["title"],
        "views": data["view_count"],
        "likes": data.get("like_count"),
        "duration": data["duration"],
        "upload_date": data["upload_date"],
        "channel": data["channel"],
    }

info = get_video_info("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(json.dumps(info, indent=2))
Best Practices
- Start with the official API; it is free up to the quota limits
- Cache aggressively; video metadata changes infrequently
- Use ScraperAPI or ScrapingAnt when you need to go beyond API limits
- Respect YouTube's Terms of Service, especially for commercial use
- Consider yt-dlp for metadata extraction without API keys
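To make the caching advice concrete, here is a minimal on-disk cache sketch: results are stored as JSON files named by video ID and reused until a TTL expires. `CACHE_DIR`, `cached_fetch`, and `fetch_fn` are all hypothetical names for illustration:

```python
import json
import os
import time

CACHE_DIR = "yt_cache"  # hypothetical on-disk cache location

def cached_fetch(video_id, fetch_fn, ttl=86400):
    """Return cached metadata if younger than `ttl` seconds;
    otherwise call `fetch_fn(video_id)` and cache the result."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{video_id}.json")
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < ttl:
        with open(path) as f:
            return json.load(f)
    data = fetch_fn(video_id)  # e.g. a wrapper around the API or yt-dlp
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

Any of the fetch functions in this guide can be dropped in as `fetch_fn`, which stretches both your API quota and your scraping budget.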
Verdict
The YouTube Data API should be your first choice for video data. When you hit quota limits or need data not available through the API, supplement with scraping via ScraperAPI or ScrapingAnt. The yt-dlp tool is an excellent open-source alternative for metadata extraction.