The Palos Publishing Company


Scrape user reviews from eCommerce sites

Scraping user reviews from eCommerce sites means programmatically collecting the customer feedback posted on product pages. This guide covers how to collect such data ethically and efficiently while staying within legal and technical constraints:


1. Understand Legal and Ethical Considerations

  • Check the site’s robots.txt: Respect rules set for web crawlers (e.g., /robots.txt on the site).

  • Review Terms of Service: Many eCommerce platforms (like Amazon or eBay) prohibit scraping. Violating these terms can result in IP bans or legal action.

  • Use Public APIs if available: Sites like Best Buy, Walmart, and eBay offer APIs to access product reviews legally.
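The robots.txt check above can be automated with Python's standard library. A minimal sketch, using a made-up robots.txt and bot name for illustration (in practice, fetch the real file from the target site):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt (normally fetched from https://www.example.com/robots.txt)
rules = """\
User-agent: *
Disallow: /checkout/
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Ask whether a given path may be crawled by our (hypothetical) bot
print(rp.can_fetch('MyReviewBot', 'https://www.example.com/product-reviews'))  # True
print(rp.can_fetch('MyReviewBot', 'https://www.example.com/checkout/cart'))    # False
```

`RobotFileParser.set_url()` plus `read()` does the fetch-and-parse in one step against a live site.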


2. Choose Target Platforms Carefully

Some popular eCommerce sites:

  • Amazon – Strict anti-scraping rules, no public API for reviews.

  • eBay – Offers official APIs.

  • Walmart – Has a developer program.

  • Best Buy – Provides public APIs.

  • Newegg – Easier to scrape, less aggressive anti-bot measures.

  • AliExpress – Some third-party services aggregate reviews.


3. Select a Scraping Tool/Library

For Python users, the most popular tools include:

  • requests – For making HTTP calls.

  • BeautifulSoup – For parsing HTML.

  • Selenium – For scraping dynamic JavaScript-based content.

  • Scrapy – A powerful framework for complex scraping projects.

  • Playwright – Similar to Selenium, but faster and more modern.


4. Sample Python Script to Scrape Reviews (Using BeautifulSoup)

```python
import requests
from bs4 import BeautifulSoup

def scrape_reviews(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    }
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print("Failed to retrieve page")
        return []  # return an empty list so callers can still iterate

    soup = BeautifulSoup(response.text, 'html.parser')
    reviews = []
    for review_block in soup.select('.review'):  # Update CSS class to match the site
        reviewer = review_block.select_one('.reviewer-name').get_text(strip=True)
        rating = review_block.select_one('.rating').get('data-rating')
        text = review_block.select_one('.review-text').get_text(strip=True)
        reviews.append({
            'reviewer': reviewer,
            'rating': rating,
            'text': text
        })
    return reviews

# Example usage:
url = 'https://www.example.com/product-page-with-reviews'
scraped_data = scrape_reviews(url)
for review in scraped_data:
    print(review)
```

Note: Update the CSS selectors (.review, .reviewer-name, .rating, .review-text) based on the site’s HTML structure.


5. Handling JavaScript-Rendered Content

For sites like Amazon or AliExpress:

  • Use Selenium or Playwright to simulate browser behavior.

  • Wait for elements to load with proper delays or WebDriverWait.

Example using Selenium:

```python
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get('https://www.example.com/product-reviews')
time.sleep(5)  # Wait for the page to load

soup = BeautifulSoup(driver.page_source, 'html.parser')
reviews = soup.select('.review-text')
for r in reviews:
    print(r.text.strip())

driver.quit()
```

6. Store and Analyze Reviews

Store the scraped data in:

  • CSV files

  • Databases like MySQL or MongoDB

  • Pandas DataFrames for analysis
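For example, the dictionaries produced by a scraper can be written to CSV with the standard library alone (the sample rows and field names below mirror the earlier BeautifulSoup example):

```python
import csv

# Sample rows in the shape produced by a review scraper
reviews = [
    {'reviewer': 'Alice', 'rating': '5', 'text': 'Great product!'},
    {'reviewer': 'Bob', 'rating': '2', 'text': 'Stopped working after a week.'},
]

with open('reviews.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['reviewer', 'rating', 'text'])
    writer.writeheader()      # column header row
    writer.writerows(reviews) # one row per review dict
```

The same file loads straight into a Pandas DataFrame with `pd.read_csv('reviews.csv')` for analysis.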

Basic sentiment analysis can be done using:

  • TextBlob

  • VADER from NLTK

  • transformers (BERT-based models)


7. Rate Limiting and Anti-Bot Measures

  • Add delays (time.sleep) between requests.

  • Rotate User Agents using libraries like fake_useragent.

  • Use proxy rotation with services like:

    • ScraperAPI

    • Bright Data

    • Smartproxy
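The first two points can be sketched with the standard library (the User-Agent strings below are an illustrative hand-picked pool; fake_useragent can generate fresher ones dynamically):

```python
import random
import time

# Illustrative pool of User-Agent strings to rotate through
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
    'Mozilla/5.0 (X11; Linux x86_64)',
]

def polite_headers():
    """Pick a random User-Agent for each request."""
    return {'User-Agent': random.choice(USER_AGENTS)}

def throttle(min_delay=1.0, max_delay=3.0):
    """Sleep a randomized interval between requests to avoid hammering the server."""
    time.sleep(random.uniform(min_delay, max_delay))

# Usage with requests:
#   throttle()
#   response = requests.get(url, headers=polite_headers())
```

Randomizing the delay, rather than sleeping a fixed interval, makes the request pattern look less mechanical to anti-bot systems.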


8. Alternative: Use Third-party Review Aggregators

If scraping is not viable due to legal or technical limits, use platforms like:

  • ReviewMeta (for Amazon)

  • Trustpilot API

  • Appbot.io or G2.com (for app/product reviews)


9. API-Based Review Retrieval Example (eBay)

```python
import requests

# Browse API endpoint: getItemsByItemGroup
url = "https://api.ebay.com/buy/browse/v1/item/get_items_by_item_group"
headers = {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "Content-Type": "application/json"
}
params = {
    "item_group_id": "v1|1234567890|0"
}

response = requests.get(url, headers=headers, params=params)
print(response.json())
```

10. Final Thoughts

  • Scrape responsibly to avoid blocking or legal issues.

  • Always prefer APIs when available.

  • Keep scrapers updated to handle HTML structure changes.

