Scrape top Reddit posts by topic

You can collect the top Reddit posts for a topic through the official Reddit API, either by calling it directly or through a Python library such as PRAW (Python Reddit API Wrapper). Here is a step-by-step guide using PRAW, where each topic corresponds to a subreddit:

Step 1: Create a Reddit App

Go to https://www.reddit.com/prefs/apps
Click “Create App” or “Create Another App”
Fill out the form:
- name: your app’s name
- type: script
- description: optional
- redirect uri: http://localhost:8080
Note down the client ID (the string shown under the app name) and the client secret; the script in Step 3 needs both
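
Once you have the client ID and secret, it is usually safer to keep them out of the script itself. The following is a minimal sketch, assuming the credentials are stored in environment variables; the variable names (REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT) and the helper load_reddit_credentials are hypothetical and not part of PRAW:

```python
import os


def load_reddit_credentials(env=os.environ):
    """Build the keyword arguments for praw.Reddit from environment variables.

    The variable names below are an assumption for this sketch;
    rename them to whatever fits your setup.
    """
    required = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a clear message if anything is unset or empty
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in required.items()}
```

With this in place, the Reddit instance in Step 3 could be created as praw.Reddit(**load_reddit_credentials()) instead of hardcoding the values.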

Step 2: Install PRAW

bash
pip install praw

Step 3: Python Script to Scrape Top Posts by Topic (Subreddit)

python
import praw

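# Initialize a read-only Reddit instance (enough for reading public posts)
reddit = praw.Reddit(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    # Reddit asks for a unique, descriptive user agent,
    # e.g. 'script:top-posts:v1.0 (by u/your_username)'
    user_agent='your_user_agent'
)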

def get_top_posts(subreddit_name, limit=10, time_filter='week'):
    """Return the top posts of a subreddit as a list of dictionaries."""
    subreddit = reddit.subreddit(subreddit_name)
    # top() is lazy: posts are fetched from the API as the loop iterates
    top_posts = subreddit.top(limit=limit, time_filter=time_filter)

    posts_data = []
    for post in top_posts:
        posts_data.append({
            'title': post.title,
            'score': post.score,
            'url': post.url,  # the link the post points to (external or Reddit)
            'comments': post.num_comments,
            'permalink': f"https://www.reddit.com{post.permalink}"
        })
    return posts_data

# Example usage
topic = 'technology'  # or 'AskReddit', 'science', etc.
top_posts = get_top_posts(topic, limit=5)

for i, post in enumerate(top_posts, 1):
    print(f"{i}. {post['title']} ({post['score']} points)")
    print(f"   {post['permalink']}")
    print()

Notes:

A "topic" here means a subreddit: pass any subreddit name as subreddit_name to match the topic you're interested in, and use limit to control how many posts are returned.
time_filter options: 'hour', 'day', 'week', 'month', 'year', 'all'.
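
If a topic does not map cleanly onto a single subreddit, a keyword search sorted by score is one possible alternative. The sketch below is an assumption rather than part of the original script: search_top_posts and VALID_TIME_FILTERS are hypothetical names, and the function expects the reddit instance created in Step 3 to be passed in. It relies on PRAW's subreddit.search() with sort='top', and searches r/all by default:

```python
VALID_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def search_top_posts(reddit, query, subreddit_name="all", limit=10, time_filter="week"):
    """Search a subreddit for a keyword and return the matches, highest score first."""
    # Reject unsupported values before any API request is made
    if time_filter not in VALID_TIME_FILTERS:
        raise ValueError(f"time_filter must be one of {sorted(VALID_TIME_FILTERS)}")

    subreddit = reddit.subreddit(subreddit_name)
    results = subreddit.search(query, sort="top", time_filter=time_filter, limit=limit)
    return [
        {
            "title": post.title,
            "score": post.score,
            "subreddit": post.subreddit.display_name,
            "permalink": f"https://www.reddit.com{post.permalink}",
        }
        for post in results
    ]
```

For example, search_top_posts(reddit, 'machine learning', limit=5) would look across all of Reddit, while passing subreddit_name='technology' restricts the search to one community.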

Optional: Export to CSV or JSON

You can extend the script to write data to a file:

python
import json

# Write explicitly as UTF-8, matching the CSV example
with open('top_posts.json', 'w', encoding='utf-8') as f:
    json.dump(top_posts, f, indent=2)

Or CSV:

python
import csv

with open('top_posts.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=top_posts[0].keys())
    writer.writeheader()
    writer.writerows(top_posts)
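
Note that top_posts[0] raises an IndexError when no posts come back (for example, an empty subreddit or a very short time filter). A slightly more defensive variant, sketched here under the assumption that the dictionaries have the keys produced by get_top_posts, fixes the column names up front; save_posts_csv and FIELDNAMES are hypothetical names:

```python
import csv

# Column order matches the dictionaries built by get_top_posts
FIELDNAMES = ["title", "score", "url", "comments", "permalink"]


def save_posts_csv(posts, path):
    """Write post dictionaries to a CSV file and return the number of rows written."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops any keys that are not listed in FIELDNAMES
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()  # the header is written even when posts is empty
        writer.writerows(posts)
    return len(posts)
```

Calling save_posts_csv(top_posts, 'top_posts.csv') then produces a header-only file instead of an error when the list is empty.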

A version without API access (plain web scraping) is also possible, but keep in mind that Reddit's content policies discourage unauthorized scraping, so the API route shown above is the safer choice.
