The Palos Publishing Company

Scrape Goodreads book reviews

Computer Science

Before you scrape Goodreads book reviews, be aware that Goodreads' terms of service prohibit unauthorized scraping. Instead of scraping, consider the Goodreads API (though access is now very limited) or other ethical alternatives.

However, if you’re looking to programmatically extract book reviews for research or personal use and you understand the legal and ethical implications, here’s a technical overview of how it can be done:

Method 1: Using Goodreads API (If You Have Access)

Goodreads used to offer a public API, but it stopped issuing new developer keys in December 2020 and has been retiring the API since. Some older keys may still work. If you have one:

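```python
import xml.etree.ElementTree as ET

import requests

api_key = 'YOUR_API_KEY'
book_id = 'BOOK_ID'  # The number in the book's URL, e.g. 4671 in /book/show/4671.The_Great_Gatsby

url = f"https://www.goodreads.com/book/show/{book_id}.xml"
response = requests.get(url, params={'key': api_key}, timeout=10)
response.raise_for_status()  # fail loudly on an invalid or revoked key

# The response is XML; this assumes the legacy <GoodreadsResponse><book>...</book> layout.
root = ET.fromstring(response.content)
print(root.findtext('book/title'))
print(root.findtext('book/reviews_widget'))  # embed HTML for the reviews widget, not plain review text
```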

Note: Goodreads is owned by Amazon and has not released a public replacement API. Any newer API access is private and may require approval.

Method 2: Web Scraping with Python (Use Ethically & Legally)

Tools:

requests
BeautifulSoup
Selenium (for dynamic pages)
fake_useragent and time.sleep (random delays between requests) to avoid being blocked

Sample Code:

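```python
import random
import time

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0'
}

def get_reviews(book_url, max_pages=5):
    """Return review texts from up to max_pages pages of a Goodreads book page.

    The CSS selector and the ?page= parameter reflect Goodreads' older page
    layout. The site's markup changes over time, so check both against the
    live HTML before relying on them.
    """
    reviews = []
    for page in range(1, max_pages + 1):
        response = requests.get(book_url, params={'page': page},
                                headers=headers, timeout=10)
        response.raise_for_status()  # stop on 403/404/429 instead of parsing an error page
        soup = BeautifulSoup(response.text, 'html.parser')

        found = 0
        for readable in soup.select('div.reviewText span.readable'):
            # In the older layout, long reviews had a truncated span followed by a
            # hidden span with the full text; taking the last direct child span
            # avoids collecting each review twice.
            spans = readable.find_all('span', recursive=False)
            text = spans[-1].get_text(' ', strip=True) if spans else ''
            if text:
                reviews.append(text)
                found += 1
        if not found:
            break  # no more reviews, or the selector no longer matches

        time.sleep(random.uniform(1, 3))  # Be polite to the server
    return reviews

book_url = 'https://www.goodreads.com/book/show/4671.The_Great_Gatsby'
reviews = get_reviews(book_url)

for i, review in enumerate(reviews[:10], 1):
    print(f"Review {i}: {review}")
```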

Tips to Stay Within Legal/Ethical Boundaries

Use official APIs whenever possible.
Do not overload the website with requests.
Cache data to avoid repeated scraping.
Include attribution and use reviews only for permitted purposes (e.g., educational use, personal research).
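The first three tips can be combined in one small helper. The sketch below is a hypothetical PoliteFetcher class, not part of any library. It does three things:

- checks robots.txt with Python's standard urllib.robotparser,
- waits a minimum number of seconds between live requests,
- caches every page on disk, so a repeat run never hits the server for the same URL.

The fetch function is passed in, so any HTTP client works. The cache directory name, the "research-bot" user agent, and the 2-second default delay are illustrative assumptions.

```python
import hashlib
import time
import urllib.robotparser
from pathlib import Path


class PoliteFetcher:
    """Hypothetical helper: honors robots.txt, spaces out requests, caches pages."""

    def __init__(self, robots_lines, fetch, cache_dir="page_cache",
                 user_agent="research-bot", min_delay=2.0):
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_lines)      # robots.txt content, one line per item
        self.fetch = fetch                   # callable: url -> page text
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.user_agent = user_agent
        self.min_delay = min_delay           # seconds between live requests
        self._last_request = None

    def _cache_path(self, url):
        # One file per URL, named by a SHA-256 hash of the URL.
        return self.cache_dir / (hashlib.sha256(url.encode()).hexdigest() + ".html")

    def get(self, url):
        path = self._cache_path(url)
        if path.exists():                    # cached: no network request at all
            return path.read_text(encoding="utf-8")
        if not self.robots.can_fetch(self.user_agent, url):
            raise PermissionError(f"robots.txt disallows {url}")
        if self._last_request is not None:   # enforce the minimum gap
            wait = self.min_delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        html = self.fetch(url)
        self._last_request = time.monotonic()
        path.write_text(html, encoding="utf-8")
        return html
```

In practice you would read the site's robots.txt once and wrap your HTTP client, for example:

- robots_lines = requests.get("https://www.goodreads.com/robots.txt", timeout=10).text.splitlines()
- fetch = lambda url: requests.get(url, timeout=10).text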

Alternatives to Goodreads Scraping

LibraryThing API
Open Library API
Google Books API
Book-related subreddits (e.g., r/books) via Reddit API
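Of these, the Google Books API can be queried directly, with no scraping involved. Below is a minimal sketch using only the standard library. It searches the public volumes endpoint and keeps each match's title, authors, and rating summary. The helper names are hypothetical.

A few things to keep in mind:

- Google Books exposes average ratings and rating counts, not full review text.
- Basic searches work without an API key, though Google applies usage quotas.

```python
import json
import urllib.parse
import urllib.request

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query, max_results=5):
    # q and maxResults are query parameters of the volumes endpoint.
    params = urllib.parse.urlencode({"q": query, "maxResults": max_results})
    return f"{GOOGLE_BOOKS_URL}?{params}"


def parse_volumes(data):
    """Extract title, authors and rating summary from a volumes response (a dict)."""
    books = []
    for item in data.get("items", []):       # "items" is absent when nothing matches
        info = item.get("volumeInfo", {})
        books.append({
            "title": info.get("title"),
            "authors": info.get("authors", []),
            "average_rating": info.get("averageRating"),  # missing for unrated books
            "ratings_count": info.get("ratingsCount", 0),
        })
    return books


def search_books(query, max_results=5):
    # Performs a live HTTP request; parsing is kept separate so it can be tested offline.
    with urllib.request.urlopen(build_search_url(query, max_results), timeout=10) as resp:
        return parse_volumes(json.load(resp))
```

For example, search_books("The Great Gatsby") returns a list of dictionaries that can be printed or stored in the same way as scraped reviews.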

For further analysis, the collected reviews can be stored in a CSV file or a database.
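Storing reviews in a CSV file or a SQLite database takes only a few lines with Python's standard csv and sqlite3 modules. Below is a minimal sketch of both options, assuming the reviews are the list of strings returned by get_reviews() above. The function names, column names, and the reviews table schema are illustrative assumptions. The book URL is kept alongside each review to support attribution.

```python
import csv
import sqlite3


def save_reviews_csv(reviews, path, book_url=""):
    """Write one row per review; the csv module quotes commas and newlines."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["review_number", "book_url", "review_text"])
        for i, text in enumerate(reviews, 1):
            writer.writerow([i, book_url, text])


def load_reviews_csv(path):
    """Read the review texts back in their original order."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["review_text"] for row in csv.DictReader(f)]


def save_reviews_sqlite(reviews, db_path, book_url=""):
    """Append reviews to a 'reviews' table, creating it if needed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS reviews ("
                "id INTEGER PRIMARY KEY, book_url TEXT, review_text TEXT)"
            )
            conn.executemany(
                "INSERT INTO reviews (book_url, review_text) VALUES (?, ?)",
                [(book_url, text) for text in reviews],
            )
    finally:
        conn.close()
```

CSV is convenient for a one-off export to a spreadsheet. SQLite is the better fit when scraping runs repeatedly and results should accumulate in one place.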
