The Palos Publishing Company


Scrape Goodreads book reviews

Before scraping Goodreads book reviews, be aware that the Goodreads terms of service prohibit unauthorized scraping. Where possible, use the Goodreads API (access is limited) or another ethical alternative instead.

However, if you’re looking to programmatically extract book reviews for research or personal use and you understand the legal and ethical implications, here’s a technical overview of how it can be done:


Method 1: Using Goodreads API (If You Have Access)

Goodreads previously offered a public API (deprecated in 2020), but some users still have keys. If you have one:

```python
import requests

api_key = 'YOUR_API_KEY'
book_id = 'BOOK_ID'  # You can get this from the book's URL

url = f"https://www.goodreads.com/book/show/{book_id}.xml?key={api_key}"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error (e.g. a rejected key)

print(response.text)  # XML format; use xml.etree.ElementTree to parse
```

Note: Goodreads, which is owned by Amazon, stopped issuing new API keys in December 2020, so this method is only an option if you already hold a key.
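The XML returned by the request above can be parsed with the standard library, as the code comment suggests. The following is a minimal sketch: the sample document, its values, and its element names (`book`, `id`, `title`, `average_rating`) are assumptions made for illustration, so check them against a real response before relying on them. `parse_book` is a hypothetical helper, not part of any API.

```python
import xml.etree.ElementTree as ET

# Made-up sample for illustration; the real response layout may differ.
SAMPLE_XML = """<GoodreadsResponse>
  <book>
    <id>123</id>
    <title>Example Book</title>
    <average_rating>4.20</average_rating>
  </book>
</GoodreadsResponse>"""


def parse_book(xml_text):
    """Return selected fields of the first <book> element as a dict."""
    root = ET.fromstring(xml_text)
    book = root.find("book")
    if book is None:
        return {}  # No <book> element, e.g. an error response
    return {
        "id": book.findtext("id"),
        "title": book.findtext("title"),
        "average_rating": book.findtext("average_rating"),
    }


if __name__ == "__main__":
    print(parse_book(SAMPLE_XML))
```

In practice, `parse_book(response.text)` would replace the `print(response.text)` call, assuming the response uses the element names shown here.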


Method 2: Web Scraping with Python (Use Ethically & Legally)

Tools:

  • requests

  • BeautifulSoup

  • Selenium (for dynamic pages)

  • fake_useragent (randomized User-Agent headers) and time.sleep (pauses between requests) to reduce the chance of being blocked

Sample Code:

```python
import random
import time

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}


def get_reviews(book_url, max_pages=5):
    """Collect review texts from up to max_pages pages of a book's reviews."""
    reviews = []
    for page in range(1, max_pages + 1):
        url = f"{book_url}?page={page}"
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # Stop on HTTP errors such as 403 or 404
        soup = BeautifulSoup(response.text, 'html.parser')

        # This selector depends on the page markup and may need updating
        # if Goodreads changes its layout.
        review_blocks = soup.select('div.reviewText span.readable span')
        for block in review_blocks:
            text = block.get_text(strip=True)
            if text:
                reviews.append(text)

        time.sleep(random.uniform(1, 3))  # Be polite to the server
    return reviews


book_url = 'https://www.goodreads.com/book/show/4671.The_Great_Gatsby'
reviews = get_reviews(book_url)

for i, review in enumerate(reviews[:10], 1):
    print(f"Review {i}: {review}")
```

Tips to Stay Within Legal/Ethical Boundaries

  • Use official APIs whenever possible.

  • Do not overload the website with requests.

  • Cache data to avoid repeated scraping.

  • Include attribution and use reviews only for permitted purposes (e.g., educational use, personal research).
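The tips on caching and on not overloading the site can be sketched as a small on-disk cache keyed by a hash of the URL. Everything here is illustrative: `cached_fetch`, the `cache` directory, and the `fetch` callable (which would wrap `requests.get` in practice) are names chosen for this example, not part of any library.

```python
import hashlib
import time
from pathlib import Path


def cached_fetch(url, fetch, cache_dir="cache", delay=1.0):
    """Return the page text for `url`, calling `fetch(url)` only on a cache miss.

    `fetch` is any callable that takes a URL and returns the page text.
    After a real fetch the function sleeps for `delay` seconds so that
    consecutive misses do not hit the server back to back.
    """
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    # Hash the URL so it becomes a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    entry = cache_path / f"{key}.html"

    if entry.exists():
        return entry.read_text(encoding="utf-8")  # Cache hit: no request made

    text = fetch(url)
    entry.write_text(text, encoding="utf-8")
    time.sleep(delay)
    return text
```

A possible call, assuming `requests` is installed, is `cached_fetch(url, lambda u: requests.get(u, timeout=10).text)`. Passing the fetcher in as an argument keeps the cache logic separate from the HTTP library and makes it easy to try out without touching the network.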


Alternatives to Goodreads Scraping

  1. LibraryThing API

  2. Open Library API

  3. Google Books API

  4. Book-related subreddits (e.g., r/books) via Reddit API
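As one example of the alternatives listed, Open Library exposes a public search endpoint that returns JSON. The sketch below builds a search URL and pulls a few fields out of a decoded response using only the standard library. The helper names are hypothetical, and the query parameters (`title`, `limit`) and response fields (`docs`, `title`, `author_name`) are assumptions to verify against the Open Library documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://openlibrary.org/search.json"


def build_search_url(title, limit=5):
    """Build a search URL for the given book title."""
    query = urlencode({"title": title, "limit": limit})
    return f"{SEARCH_URL}?{query}"


def summarize(payload):
    """Extract (title, first author) pairs from a decoded search response."""
    results = []
    for doc in payload.get("docs", []):
        # Assumed field: a list of author names, possibly missing.
        authors = doc.get("author_name") or ["unknown"]
        results.append((doc.get("title"), authors[0]))
    return results


def search_books(title, limit=5):
    """Query the endpoint over the network and summarize the result."""
    with urlopen(build_search_url(title, limit), timeout=10) as response:
        return summarize(json.load(response))
```

Calling `search_books("The Great Gatsby")` performs a real network request, so the same politeness rules apply: keep request rates low and cache results where possible.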

