The Palos Publishing Company

Scrape upvotes from discussion forums

How you scrape upvotes from a discussion forum depends on the platform’s structure, its terms of service, and whether it offers an API. This guide covers Reddit, Hacker News, and Disqus-based forums, along with a general strategy for forums that don’t offer public APIs.


1. Reddit

Method: Using Reddit API (Preferred)

Steps:

  1. Register an application on Reddit to get a client ID and secret.

  2. Use Python with praw (Python Reddit API Wrapper).

python
import praw

# Credentials come from the application registered in step 1.
reddit = praw.Reddit(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    user_agent='YOUR_APP_NAME',
)

# Look up a single post by its URL.
submission = reddit.submission(url='https://www.reddit.com/r/Python/comments/example_post')

print(f"Title: {submission.title}")
# score is the net total: upvotes minus downvotes.
print(f"Score (Upvotes - Downvotes): {submission.score}")

Pros:

  • Easy and clean data access

  • Stays within Reddit’s API rate limits and terms of service

Cons:

  • Limited by rate limits

  • Scores are approximate because Reddit fuzzes vote counts, and exact upvote and downvote totals are not exposed
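Because only the net score is reported, separate upvote and downvote totals have to be estimated. The sketch below is one rough way to do that, assuming you also read the submission's `upvote_ratio` attribute from PRAW. The helper name `estimate_votes` is hypothetical, and since Reddit fuzzes and rounds both inputs, the result is only an approximation.

```python
def estimate_votes(score, upvote_ratio):
    """Estimate (upvotes, downvotes) from a net score and an upvote ratio.

    Relies on two relations:
        score        = ups - downs
        upvote_ratio = ups / (ups + downs)
    Returns None when the ratio is 0.5, because the total number of
    votes cannot be recovered from a net score of zero.
    """
    denominator = 2 * upvote_ratio - 1
    if abs(denominator) < 1e-9:
        return None
    total = round(score / denominator)
    ups = round(total * upvote_ratio)
    return ups, total - ups


# With PRAW this would be called as:
#   estimate_votes(submission.score, submission.upvote_ratio)
print(estimate_votes(80, 0.9))  # prints (90, 10)
```

Treat the numbers as a ballpark figure for analytics, not as exact counts.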


2. Hacker News

Method: Using Hacker News API

Hacker News provides a straightforward REST API via Firebase.

Endpoint Example:

bash
https://hacker-news.firebaseio.com/v0/item/<item_id>.json

Python Example:

python
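import requests

item_id = '123456'  # replace with actual ID
url = f'https://hacker-news.firebaseio.com/v0/item/{item_id}.json'

response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on HTTP errors
data = response.json()

# The API returns null for unknown IDs, and comments carry no title or score.
if data is None:
    print(f"No item found for ID {item_id}")
else:
    print(f"Title: {data.get('title', '(no title)')}")
    print(f"Upvotes: {data.get('score', 'n/a')}")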

Pros:

  • Easy to use

  • Direct access to upvote counts

Cons:

  • No built-in search or filtering

  • Item IDs must be discovered separately, for example from the API’s list endpoints such as /v0/topstories.json
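ID discovery can be handled through the same API. The sketch below reads the `/v0/topstories.json` list endpoint and then fetches each item's score. It uses only the standard library so it runs without extra packages; the function names `fetch_json` and `top_story_scores` are hypothetical, and the `fetch` parameter exists only so the network call can be swapped out.

```python
import json
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def top_story_scores(limit=5, fetch=fetch_json):
    """Return (title, score) pairs for the first `limit` top stories."""
    story_ids = fetch(f"{HN_API}/topstories.json")[:limit]
    results = []
    for story_id in story_ids:
        item = fetch(f"{HN_API}/item/{story_id}.json")
        # Deleted or missing items come back as null (None in Python).
        if not item:
            continue
        results.append((item.get("title", ""), item.get("score", 0)))
    return results


# Example call (performs live HTTP requests):
#   for title, score in top_story_scores(limit=10):
#       print(f"{score:>5}  {title}")
```

Each story costs one extra request, so keep `limit` small or add a short delay between calls when collecting many items.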


3. Disqus-Based Forums

Method: Scraping HTML (No Official API for Upvotes)

Disqus loads its comment thread with JavaScript, typically inside an iframe, so a plain HTTP request to the host page will not contain the vote counts. You’ll need to use a headless browser or parse pre-rendered pages.

Tools:

  • BeautifulSoup + requests-html

  • Selenium (for JS-rendered content)

Example (Selenium):

python
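from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://example.com/article")

# Comments live inside the Disqus iframe; this selector is an assumption, so verify it in dev tools.
frame = (By.CSS_SELECTOR, "iframe[src*='disqus.com/embed']")
WebDriverWait(driver, 15).until(EC.frame_to_be_available_and_switch_to_it(frame))

soup = BeautifulSoup(driver.page_source, 'html.parser')
upvotes = soup.find_all("span", class_="vote-count")  # Update class accordingly
for vote in upvotes:
    print(vote.text)

driver.quit()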

Pros:

  • Can access visual upvote counts on comment sections

Cons:

  • Prone to layout changes

  • Requires constant maintenance


4. Generic Forums (No API)

If the forum has no public API and upvote counts are displayed as part of the HTML:

Method: Web Scraping with BeautifulSoup

python
import requests
from bs4 import BeautifulSoup

url = "https://exampleforum.com/topic/123"
headers = {'User-Agent': 'Mozilla/5.0'}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page

soup = BeautifulSoup(response.text, 'html.parser')
votes = soup.find_all("span", class_="upvote-count")  # update class based on inspection
for vote in votes:
    print(vote.text.strip())

Tip: Use browser dev tools to inspect the vote count element and its class/ID.
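Vote counts scraped from HTML arrive as text, and some forums abbreviate them (for example "1.2k") or add thousands separators. The helper below is a minimal sketch for turning such strings into integers. The name `parse_vote_count` and the set of supported suffixes are assumptions, so adjust them to whatever the target forum actually displays.

```python
import re

# Multipliers for abbreviated counts; extend if the forum uses other suffixes.
_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_vote_count(text):
    """Convert a displayed vote count such as '1.2k' or '3,456' to an int.

    Returns None when the text does not look like a number.
    """
    cleaned = text.strip().lower().replace(",", "")
    match = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)([km]?)", cleaned)
    if match is None:
        return None
    number, suffix = match.groups()
    return round(float(number) * _SUFFIXES.get(suffix, 1))


for raw in ["15", " 3,456 ", "1.2k", "Vote"]:
    print(repr(raw), "->", parse_vote_count(raw))
```

Abbreviated values lose precision on the page itself, so "1.2k" can only be recovered as 1200, not as the exact count.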


5. Automation Tools & Libraries

  • Scrapy: Great for crawling multiple pages and scraping structured data.

  • Selenium/Playwright: Best for dynamic JavaScript-heavy forums.

  • Puppeteer (Node.js): Chrome automation for advanced scraping.

  • Proxy Rotators: For bypassing IP bans when scraping aggressively.


6. Respect Robots.txt and Terms of Service

Always check /robots.txt of the target site and its TOS. Unauthorized scraping may result in IP bans or legal issues.
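The robots.txt check can be automated before any page is requested. The sketch below uses Python's standard `urllib.robotparser` module; the helper name `is_allowed` and the sample rules are hypothetical and only illustrate the idea.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules for illustration; a real site's file will differ.
sample_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "https://exampleforum.com/topic/123"))
print(is_allowed(sample_rules, "https://exampleforum.com/private/votes"))
```

To check a live site, `RobotFileParser` can also download the file itself via `set_url(...)` followed by `read()`. Passing this check does not replace reading the site's terms of service.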


Summary Table

| Platform | Method | API Available | Vote Access Method | Best Tool |
|---|---|---|---|---|
| Reddit | Reddit API | Yes | submission.score | praw |
| Hacker News | Firebase API | Yes | data['score'] | requests |
| Disqus Forums | HTML Parsing | No | Parse DOM | Selenium, BS4 |
| Custom Forums | HTML Scraping | No | Parse static/dynamic | Scrapy, Playwright |

By selecting the appropriate method based on the forum type, structure, and your use case, you can effectively extract upvote data for analytics, trend detection, or content monitoring.
