Scrape movie runtimes and genres

To scrape movie runtimes and genres from websites, you’d typically use Python with requests and BeautifulSoup, or Selenium for JavaScript-rendered pages. Here’s a basic example that uses requests and BeautifulSoup to scrape a single IMDb title page:

Example: Scraping IMDb Movie Runtimes and Genres

python
import requests
from bs4 import BeautifulSoup

# Example IMDb movie URL (you can iterate over a list of URLs)
url = "https://www.imdb.com/title/tt1375666/"  # Inception

headers = {
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": "Mozilla/5.0"
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail early on HTTP errors such as 403 or 404
soup = BeautifulSoup(response.text, 'html.parser')

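# Extract runtime (fall back to 'N/A' if the list item or its inner div is missing)
runtime = soup.find('li', attrs={'data-testid': 'title-techspec_runtime'})
runtime_div = runtime.find('div') if runtime else None
runtime_text = runtime_div.get_text(strip=True) if runtime_div else 'N/A'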

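# Extract genres from links to genre searches
genre_section = soup.find_all('a', attrs={'href': lambda x: x and '/search/title/?genres=' in x})
# dict.fromkeys drops repeated genre links while keeping their order
genres = list(dict.fromkeys(genre.get_text(strip=True) for genre in genre_section))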

print("Runtime:", runtime_text)
print("Genres:", genres)
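
The scraped runtime is plain text, so comparing or sorting movies by length usually means converting it to a number first. Below is a minimal sketch of that step. It assumes the text looks like "2h 28m" or "2 hours 28 minutes", which may not match what IMDb serves for every title or locale, and runtime_to_minutes is a hypothetical helper name, not part of any library.

```python
import re

def runtime_to_minutes(text):
    """Convert a runtime string to total minutes, or None if it cannot be parsed.

    Assumed formats (not guaranteed by IMDb): '2h 28m', '2 hours 28 minutes', '45m'.
    """
    # A number followed by a word starting with "h" is taken as hours,
    # a number followed by a word starting with "m" as minutes.
    hours = re.search(r"(\d+)\s*h", text, re.IGNORECASE)
    minutes = re.search(r"(\d+)\s*m", text, re.IGNORECASE)
    if not hours and not minutes:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total

print(runtime_to_minutes("2 hours 28 minutes"))
print(runtime_to_minutes("N/A"))
```

Returning None for unparseable text, such as the 'N/A' fallback, keeps missing runtimes distinct from a genuine zero.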

Notes:

IMDb changes its page markup from time to time, so inspect the current HTML before relying on any selector.
To scrape many movies, loop over a list of URLs or build each URL from its IMDb ID (for example, tt1375666).
Consider using IMDbPY (since renamed Cinemagoer), a dedicated Python package, for more structured access.
Avoid overloading IMDb’s servers: add delays between requests and respect robots.txt.
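
The notes on looping over IMDb IDs, adding delays, and respecting robots.txt can be combined into one small loop. This is a sketch under stated assumptions, not a definitive bulk scraper: scrape_many, build_url, and robots_checker are hypothetical names, the two-second delay is an arbitrary default, and scrape_one stands for whatever single-page function you already have (for instance, the code above wrapped in a function). Only the standard library is used here.

```python
import time
from urllib import robotparser

BASE_URL = "https://www.imdb.com/title/{imdb_id}/"

def build_url(imdb_id):
    """Build a title-page URL from an IMDb ID such as 'tt1375666'."""
    return BASE_URL.format(imdb_id=imdb_id)

def robots_checker(robots_url, user_agent="*"):
    """Return a function url -> bool based on robots.txt (performs one HTTP request)."""
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()
    return lambda url: parser.can_fetch(user_agent, url)

def scrape_many(imdb_ids, scrape_one, allowed=lambda url: True,
                delay_seconds=2.0, sleep=time.sleep):
    """Call scrape_one(url) for each ID, pausing between requests.

    URLs rejected by `allowed` are skipped and recorded as None.
    """
    results = {}
    made_request = False
    for imdb_id in imdb_ids:
        url = build_url(imdb_id)
        if not allowed(url):
            results[imdb_id] = None
            continue
        if made_request:
            sleep(delay_seconds)  # pause before every request after the first
        results[imdb_id] = scrape_one(url)
        made_request = True
    return results

# Possible usage (hypothetical scrape_one; makes real network requests):
# allowed = robots_checker("https://www.imdb.com/robots.txt")
# data = scrape_many(["tt1375666"], scrape_one, allowed=allowed)
```

Passing the sleep function and the robots check in as parameters lets you exercise the loop without touching the network.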

Let me know if you want a bulk scraper, headless browser integration (Selenium), or a version that targets a different site such as TMDb or Rotten Tomatoes.
