To scrape GitHub user activity, you have several options depending on the level of detail needed and whether you’re building a script, web scraper, or using an API. Here’s a breakdown of the methods and a working example using GitHub’s public API:
✅ Method 1: Use the GitHub REST API (Recommended)
GitHub provides a powerful REST API to access user activity data without scraping HTML. You can get:
- Public events (pushes, PRs, issues, etc.)
- Repositories created/starred
- Contributions
- Followers and following
- Gists
🔧 Example: Get Public Events from a User
The endpoint is `GET /users/USERNAME/events/public` — replace USERNAME with the GitHub handle. It returns the user's latest public events, paginated, capped at 300 events from the past 90 days.
Python Script Example
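A minimal sketch using only the standard library. The `summarize` helper and the `activity-scraper` User-Agent string are illustrative choices, not part of GitHub's API:

```python
import json
import urllib.request

API = "https://api.github.com"

def fetch_public_events(username, per_page=30, page=1):
    """Fetch one page of a user's public events from the REST API."""
    url = f"{API}/users/{username}/events/public?per_page={per_page}&page={page}"
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "User-Agent": "activity-scraper",  # GitHub requires some User-Agent
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def summarize(events):
    """Reduce raw event dicts to (type, repo, timestamp) tuples."""
    return [(e["type"], e["repo"]["name"], e["created_at"]) for e in events]

# Usage (requires network):
# for etype, repo, ts in summarize(fetch_public_events("octocat")):
#     print(f"{ts}  {etype:<20} {repo}")
```

Event types include `PushEvent`, `PullRequestEvent`, `IssuesEvent`, `WatchEvent`, and so on; filter on the `type` field for the activity you care about.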
🛠️ Method 2: Scrape GitHub HTML Pages (Not Recommended)
If you must scrape HTML (e.g., for contribution graph or pinned repos):
⚠️ GitHub may block or throttle you for scraping HTML. Always respect their robots.txt and rate limits.
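If you do go this route, here is a sketch of parsing the contribution calendar with the standard library's `html.parser`. It assumes the page at `https://github.com/users/USERNAME/contributions` still marks each day as a `<td>` carrying `data-date` and `data-level` attributes — true at the time of writing, but GitHub's markup changes without notice, so treat this as fragile:

```python
from html.parser import HTMLParser

class ContributionParser(HTMLParser):
    """Collect (date, level) pairs from calendar day cells.

    Assumes each day cell is a <td> with data-date and data-level
    attributes (an assumption about GitHub's current markup).
    """
    def __init__(self):
        super().__init__()
        self.days = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "td" and "data-date" in a and "data-level" in a:
            self.days.append((a["data-date"], int(a["data-level"])))

def parse_contributions(html):
    parser = ContributionParser()
    parser.feed(html)
    return sorted(parser.days)

# Usage (requires network):
# import urllib.request
# html = urllib.request.urlopen(
#     "https://github.com/users/octocat/contributions").read().decode()
# print(parse_contributions(html)[-7:])  # last week of activity levels
```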
🧠 Additional API Endpoints to Explore
- Repos Created: `GET /users/:username/repos`
- Starred Repos: `GET /users/:username/starred`
- Gists: `GET /users/:username/gists`
- Followers / Following: `GET /users/:username/followers`
Full documentation: https://docs.github.com/en/rest
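All of these list endpoints are paginated (`per_page` max 100). A generic pager, sketched with the standard library — the `fetch_all` name is mine, not GitHub's:

```python
import json
import urllib.request

API = "https://api.github.com"

def page_url(path, page, per_page=100):
    """URL for one page of a list endpoint, e.g. users/octocat/repos."""
    return f"{API}/{path}?per_page={per_page}&page={page}"

def fetch_all(path, per_page=100):
    """Walk pages until a short page signals the last one (requires network)."""
    results, page = [], 1
    while True:
        req = urllib.request.Request(
            page_url(path, page, per_page),
            headers={"User-Agent": "activity-scraper"})
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        results.extend(batch)
        if len(batch) < per_page:  # short page = no more results
            return results
        page += 1

# Usage (requires network):
# starred = fetch_all("users/octocat/starred")
```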
🔐 Authentication for Higher Rate Limits
Unauthenticated requests are limited to about 60 per hour per IP; authenticated requests get 5,000 per hour. Create a personal access token (Settings → Developer settings → Personal access tokens) and send it in the `Authorization` header. In Python:
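A small helper that attaches the token GitHub expects. The `GITHUB_TOKEN` environment variable name is a convention I'm assuming here, not something the API mandates:

```python
import os
import urllib.request

def authed_request(url, token=None):
    """Build a Request with GitHub's Accept header and, if available,
    a Bearer token read from the GITHUB_TOKEN env var (assumed name)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "activity-scraper",
    }
    token = token or os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)

# Usage (requires network):
# with urllib.request.urlopen(
#         authed_request("https://api.github.com/users/octocat/events/public")) as resp:
#     print(resp.headers.get("X-RateLimit-Remaining"))
```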
Let me know if you want code that stores this data in a database or formats it for reporting.