Guide
How to Scrape YouTube Video Data
A practical guide to extracting YouTube video data including titles, views, descriptions, and comments using Python and scraping APIs.
YouTube is the world's largest video platform, and its data is valuable for content research, competitive analysis, and trend tracking. Here is how to extract YouTube data effectively.
Method 1: YouTube Data API (Official)
Google's YouTube Data API v3 is the best starting point for legitimate use cases:
```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"

# Search for videos
response = requests.get("https://www.googleapis.com/youtube/v3/search", params={
    "part": "snippet",
    "q": "web scraping tutorial",
    "type": "video",
    "maxResults": 10,
    "key": API_KEY,
})

for item in response.json()["items"]:
    video_id = item["id"]["videoId"]
    title = item["snippet"]["title"]
    print(f"{title} - https://youtube.com/watch?v={video_id}")
```
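The search endpoint returns at most 50 results per request, but each response includes a `nextPageToken` you can pass back as `pageToken` to fetch the next page. A sketch of a paginated search helper (`search_videos` and `max_pages` are hypothetical names, not part of the API):

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"

def search_videos(query, max_pages=3):
    """Follow nextPageToken to collect search results across pages."""
    items = []
    token = None
    for _ in range(max_pages):
        params = {
            "part": "snippet",
            "q": query,
            "type": "video",
            "maxResults": 50,
            "key": API_KEY,
        }
        if token:
            params["pageToken"] = token
        data = requests.get(
            "https://www.googleapis.com/youtube/v3/search", params=params
        ).json()
        items.extend(data.get("items", []))
        token = data.get("nextPageToken")
        if not token:  # no more pages available
            break
    return items
```

Keep an eye on quota here: each page is a separate search request, so deep pagination gets expensive fast.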
Get detailed video statistics:
```python
def get_video_stats(video_ids):
    response = requests.get("https://www.googleapis.com/youtube/v3/videos", params={
        "part": "statistics,snippet,contentDetails",
        "id": ",".join(video_ids),
        "key": API_KEY,
    })
    for video in response.json()["items"]:
        stats = video["statistics"]
        print(f"Title: {video['snippet']['title']}")
        print(f"Views: {stats.get('viewCount', 'N/A')}")
        print(f"Likes: {stats.get('likeCount', 'N/A')}")
        print(f"Comments: {stats.get('commentCount', 'N/A')}")
        print("---")

get_video_stats(["dQw4w9WgXcQ", "jNQXAC9IVRw"])
```
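The API also covers comments, which this guide's intro promises: the `commentThreads` endpoint returns top-level comments for a video. A sketch (`get_top_comments` is a hypothetical helper name; the endpoint, `part`, `videoId`, and `order` parameters are from the official API):

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"

def get_top_comments(video_id, limit=20):
    """Fetch top-level comments for a video via the commentThreads endpoint."""
    response = requests.get(
        "https://www.googleapis.com/youtube/v3/commentThreads",
        params={
            "part": "snippet",
            "videoId": video_id,
            "maxResults": limit,
            "order": "relevance",
            "key": API_KEY,
        },
    )
    comments = []
    for item in response.json().get("items", []):
        top = item["snippet"]["topLevelComment"]["snippet"]
        comments.append({
            "author": top["authorDisplayName"],
            "text": top["textDisplay"],
            "likes": top["likeCount"],
        })
    return comments
```

Note that comments can be disabled on a video, in which case the API returns an error rather than an empty list, so wrap this in error handling for production use.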
Method 2: Scraping Beyond API Limits
The YouTube Data API enforces a daily quota (10,000 units per day by default), and a single search request costs 100 units, so the budget runs out quickly at scale. For larger-scale data collection, you can supplement the API with scraping:
```python
import json

import requests
from bs4 import BeautifulSoup

# Use ScraperAPI to fetch the video page
response = requests.get("https://api.scraperapi.com", params={
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "render": "true",
})

soup = BeautifulSoup(response.text, "html.parser")

# YouTube embeds page data as JSON in a script tag: `var ytInitialData = {...};`
data = None
for script in soup.find_all("script"):
    text = script.string or ""
    if "ytInitialData" in text:
        json_text = text.split("ytInitialData = ", 1)[1]
        json_text = json_text.rsplit(";", 1)[0]  # drop the trailing semicolon
        data = json.loads(json_text)
        # Navigate the nested JSON structure for video data
        break
```
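`ytInitialData` is a deeply nested structure whose exact layout changes without notice, so hardcoded key paths break often. A more resilient approach is a generic recursive search that yields every value stored under a given key, wherever it appears (`find_key` is a hypothetical helper, not part of any library):

```python
def find_key(obj, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)
```

With the parsed `data` from above, something like `next(find_key(data, "videoId"), None)` pulls out the first video ID without depending on the full key path.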
Method 3: Using yt-dlp for Metadata
The open-source yt-dlp tool can extract metadata without downloading videos:
```python
import subprocess
import json

def get_video_info(url):
    """Extract metadata with yt-dlp without downloading the video."""
    result = subprocess.run(
        ["yt-dlp", "--dump-json", "--no-download", url],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    return {
        "title": data["title"],
        "views": data["view_count"],
        "likes": data.get("like_count"),
        "duration": data["duration"],
        "upload_date": data["upload_date"],
        "channel": data["channel"],
    }

info = get_video_info("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(json.dumps(info, indent=2))
Best Practices
- Start with the official API; it is free up to the quota limits
- Cache aggressively; video metadata changes infrequently
- Use ScraperAPI or ScrapingAnt when you need to go beyond API limits
- Respect YouTube's Terms of Service, especially for commercial use
- Consider yt-dlp for metadata extraction without API keys
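To make the caching advice concrete, here is a minimal on-disk cache sketch: results are stored as JSON files named by video ID and reused until a TTL expires. `CACHE_DIR`, `cached_fetch`, and `fetch_fn` are all hypothetical names for illustration:

```python
import json
import os
import time

CACHE_DIR = "yt_cache"  # hypothetical on-disk cache location

def cached_fetch(video_id, fetch_fn, ttl=86400):
    """Return cached metadata if younger than `ttl` seconds;
    otherwise call `fetch_fn(video_id)` and cache the result."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{video_id}.json")
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < ttl:
        with open(path) as f:
            return json.load(f)
    data = fetch_fn(video_id)  # e.g. a wrapper around the API or yt-dlp
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

Any of the fetch functions in this guide can be dropped in as `fetch_fn`, which stretches both your API quota and your scraping budget.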
Verdict
The YouTube Data API should be your first choice for video data. When you hit quota limits or need data not available through the API, supplement with scraping via ScraperAPI or ScrapingAnt. The yt-dlp tool is an excellent open-source alternative for metadata extraction.