The Palos Publishing Company

Scrape GitHub user activity

There are several ways to collect GitHub user activity, depending on how much detail you need and whether you call the API or scrape HTML pages. Here’s a breakdown of the methods, with working examples that use GitHub’s public API:


Method 1: Use the GitHub REST API (Recommended)

GitHub provides a powerful REST API to access user activity data without scraping HTML. You can get:

  • Public events (pushes, PRs, issues, etc.)

  • Repositories created/starred

  • Contributions

  • Followers and following

  • Gists

🔧 Example: Get Public Events from a User

bash
curl https://api.github.com/users/USERNAME/events/public

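Replace USERNAME with the GitHub handle. This returns the user’s most recent public events as JSON. The response is paginated (30 events per page by default), and the API exposes at most 300 recent events in total.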

Python Script Example

python
import requests


def get_github_user_activity(username):
    """Print the type, timestamp, and repository of a user's recent public events."""
    url = f"https://api.github.com/users/{username}/events/public"
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        data = response.json()
        for event in data:
            print(f"{event['type']} at {event['created_at']}")
            print(f"Repo: {event['repo']['name']}\n")
    else:
        print("Failed to fetch data:", response.status_code)


get_github_user_activity("torvalds")  # Replace with any username
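The script above reads only the first page of results. A minimal sketch of walking through the later pages follows. It assumes the endpoint accepts a `page` query parameter and returns an empty list once the pages run out. The names `iter_public_events` and `fetch_page_from_github` are illustrative, not part of any library, and the fetcher is passed in as an argument so the loop can be tried without network access.

```python
def iter_public_events(username, fetch_page, max_pages=10):
    """Yield events page by page until an empty page or max_pages is reached."""
    for page in range(1, max_pages + 1):
        events = fetch_page(username, page)
        if not events:
            break  # no more results
        yield from events


def fetch_page_from_github(username, page):
    """Fetch one page of public events (assumes the `requests` package is installed)."""
    import requests  # imported lazily so the loop above works without it

    url = f"https://api.github.com/users/{username}/events/public"
    response = requests.get(url, params={"page": page}, timeout=10)
    response.raise_for_status()
    return response.json()
```

Usage: `for event in iter_public_events("torvalds", fetch_page_from_github): print(event["type"])`. The default of 10 pages matches the 300-event cap at 30 events per page.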

🛠️ Method 2: Scrape GitHub HTML Pages (Not Recommended)

If you must scrape HTML (e.g., for the contribution graph or pinned repos):

python
import requests
from bs4 import BeautifulSoup


def scrape_user_profile(username):
    """Print the contribution summary heading from a user's profile page."""
    url = f"https://github.com/{username}"
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        # These CSS classes come from GitHub's page markup and may change without notice.
        contribution_summary = soup.find("h2", class_="f4 text-normal mb-2")
        if contribution_summary:
            print(contribution_summary.get_text(strip=True))
    else:
        print("Profile not accessible:", response.status_code)


scrape_user_profile("torvalds")

⚠️ GitHub may block or throttle you for scraping HTML. Always respect their robots.txt and rate limits.
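One way to check robots.txt rules before fetching a page is Python’s standard `urllib.robotparser` module. The sketch below parses rules supplied as text. `SAMPLE_RULES` is a made-up example for illustration, not GitHub’s actual robots.txt, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; fetch the real file from the site you scrape.
SAMPLE_RULES = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In a real scraper you would download the site’s live robots.txt (for example with `RobotFileParser.set_url` and `read`) and call the check before each request.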


🧠 Additional API Endpoints to Explore

  • Repos Created:
    GET /users/:username/repos

  • Starred Repos:
    GET /users/:username/starred

  • Gists:
    GET /users/:username/gists

  • Followers / Following:
    GET /users/:username/followers
    GET /users/:username/following

Full documentation: https://docs.github.com/en/rest
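To call these endpoints from a script, a small helper can build the full URL from a username and a resource name. This is only a sketch: `API_ROOT`, `USER_RESOURCES`, and `user_endpoint_url` are names chosen for this example.

```python
from urllib.parse import quote

API_ROOT = "https://api.github.com"

# Resource names taken from the endpoint list above.
USER_RESOURCES = ("repos", "starred", "gists", "followers", "following")


def user_endpoint_url(username, resource):
    """Build the REST API URL for one of a user's public resources."""
    if resource not in USER_RESOURCES:
        raise ValueError(f"Unknown resource: {resource!r}")
    # Percent-encode the username so unexpected characters cannot alter the path.
    return f"{API_ROOT}/users/{quote(username, safe='')}/{resource}"
```

The returned URL can be passed straight to `requests.get`, in the same way as the events URL earlier.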


🔐 Authentication for Higher Rate Limits

Unauthenticated requests are limited to 60 requests per hour per IP address. Authenticating with a personal access token raises the limit to 5,000 requests per hour:

bash
curl -u USERNAME:TOKEN https://api.github.com/users/USERNAME/events/public

Or in Python:

python
auth = ("your_username", "your_token")
requests.get(url, auth=auth)
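GitHub reports the current quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, where the reset value is a Unix timestamp. The sketch below shows one way a script might use them to decide how long to pause. `rate_limit_status` is a hypothetical helper, and treating missing headers as "no wait needed" is an assumption made for this example.

```python
import time


def rate_limit_status(headers, now=None):
    """Return (remaining_requests, seconds_to_wait) from GitHub rate-limit headers.

    `headers` can be `response.headers` from requests (case-insensitive) or a
    plain dict that uses the exact header names below.
    """
    now = time.time() if now is None else now
    # Assumption: if the headers are absent, behave as if quota is available.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    wait = max(0, reset_at - now) if remaining == 0 else 0
    return remaining, wait
```

After each request you could call `remaining, wait = rate_limit_status(response.headers)` and `time.sleep(wait)` when the quota is exhausted.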

