Scrape book titles from Goodreads shelves

To scrape book titles from Goodreads shelves, you can use Python with the requests and BeautifulSoup libraries, and optionally pandas to organize the results. Goodreads may block automated scraping, so check its robots.txt and terms of use before you start, and keep your request rate low.
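To check the robots.txt rules in code, you can use Python's standard `urllib.robotparser` module. The sketch below parses a small set of hypothetical rules, not Goodreads' actual file, so it runs without network access. The helper name `is_allowed` is also hypothetical. For the real rules, call `parser.set_url("https://www.goodreads.com/robots.txt")` followed by `parser.read()` instead of `parse()`.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "Mozilla/5.0") -> bool:
    """Return True if the parsed robots.txt rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


# Hypothetical rules used only for this demonstration; they are NOT
# the contents of Goodreads' real robots.txt.
sample_rules = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(sample_rules)  # load rules from a list of lines instead of fetching them

print(is_allowed(parser, "https://www.goodreads.com/shelf/show/fantasy"))  # True
print(is_allowed(parser, "https://www.goodreads.com/private/page"))  # False
```

If `can_fetch()` returns False for a shelf URL under the real rules, skip that URL rather than requesting it.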

Here’s a basic example to scrape book titles from a public Goodreads shelf:

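```python
import time

import requests
from bs4 import BeautifulSoup


def scrape_goodreads_shelf(shelf_url, max_pages=1, delay=2.0):
    """Return the book titles found on the first max_pages pages of a shelf."""
    headers = {
        # Some sites reject requests that lack a browser-like User-Agent.
        "User-Agent": "Mozilla/5.0"
    }

    book_titles = []

    for page in range(1, max_pages + 1):
        # Let requests build the query string; the timeout stops a stalled request from hanging forever.
        response = requests.get(shelf_url, params={"page": page}, headers=headers, timeout=10)
        if response.status_code != 200:
            print(f"Failed to retrieve page {page} (HTTP {response.status_code})")
            continue

        soup = BeautifulSoup(response.text, "html.parser")

        # Title links carry the class "bookTitle". get_text() returns the title
        # whether it sits directly in the link or in a nested <span>.
        for link in soup.select("a.bookTitle"):
            title = link.get_text(strip=True)
            if title:
                book_titles.append(title)

        # Pause between pages to keep the request rate low.
        if page < max_pages:
            time.sleep(delay)

    return book_titles


if __name__ == "__main__":
    shelf_url = "https://www.goodreads.com/shelf/show/fantasy"  # Change to any public shelf
    titles = scrape_goodreads_shelf(shelf_url, max_pages=3)

    for i, title in enumerate(titles, 1):
        print(f"{i}. {title}")
```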
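If you want to test the title selector without sending any requests, you can run the parsing step against a saved page or a small HTML snippet. This sketch selects `a.bookTitle` and uses `get_text()`, which returns the title whether it is written directly inside the link or inside a nested `<span>`. The function name `extract_titles` and the sample markup are hypothetical; the markup is not copied from Goodreads, whose real HTML may differ or change.

```python
from bs4 import BeautifulSoup


def extract_titles(html: str) -> list[str]:
    """Return the text of every a.bookTitle link in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for link in soup.select("a.bookTitle"):
        # get_text() also collects text from nested tags such as <span>.
        title = link.get_text(strip=True)
        if title:
            titles.append(title)
    return titles


# Hypothetical markup for illustration only (not real Goodreads HTML).
sample_html = """
<div class="elementList">
  <a class="bookTitle" href="/book/show/1"><span>The Hobbit</span></a>
</div>
<div class="elementList">
  <a class="bookTitle" href="/book/show/2">A Wizard of Earthsea</a>
</div>
"""

print(extract_titles(sample_html))  # ['The Hobbit', 'A Wizard of Earthsea']
```

To check against the live site, save a shelf page from your browser and pass its contents to `extract_titles()`. If the list comes back empty, the markup has likely changed and the selector needs updating.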

Notes:

Set shelf_url to the public Goodreads shelf you want to scrape.
Increase max_pages to scrape more pages.
Avoid sending many requests in a short time, which can get your IP blocked; pause between requests with time.sleep().
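A fixed `time.sleep()` also waits during time already spent downloading and parsing. As an alternative, the hypothetical `Throttle` class below sleeps only for whatever remains of a minimum interval since the previous request. The 0.2-second interval is an arbitrary demo value chosen so the example runs quickly. It is not a documented Goodreads limit, and for real scraping a longer interval is the safer choice.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait() (hypothetical helper)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous wait() call

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return the seconds slept."""
        now = time.monotonic()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


throttle = Throttle(min_interval=0.2)  # demo value only
start = time.monotonic()
for page in range(1, 4):
    throttle.wait()  # call this right before each requests.get(...)
    print(f"fetching page {page}")
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.2f}s")
```

The first call returns immediately and each later call waits for the rest of the interval, so three pages take at least two intervals in total.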

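To keep the scraped titles, you can write them to a CSV file. This sketch uses Python's built-in `csv` module, which quotes titles containing commas or quotation marks correctly. The function name `save_titles_csv`, the file name, and the sample titles are hypothetical. In practice you would pass the list returned by `scrape_goodreads_shelf()`.

```python
import csv


def save_titles_csv(titles: list[str], path: str) -> None:
    """Write titles to a CSV file with a header row and a 1-based position column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["position", "title"])
        for position, title in enumerate(titles, 1):
            writer.writerow([position, title])


# Hypothetical sample titles, including one that needs CSV quoting.
titles = ["The Hobbit", "Mistborn: The Final Empire", 'A title with "quotes", and commas']
save_titles_csv(titles, "goodreads_titles.csv")

# Read the file back to confirm the round trip.
with open("goodreads_titles.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

for row in rows:
    print(row)
```

If you already use pandas, `pandas.DataFrame({"title": titles}).to_csv("goodreads_titles.csv", index=False)` writes a similar file without the position column.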
