To scrape GitHub user activity, you have several options depending on the level of detail needed and whether you’re building a script, web scraper, or using an API. Here’s a breakdown of the methods and a working example using GitHub’s public API:
✅ Method 1: Use the GitHub REST API (Recommended)
GitHub provides a powerful REST API to access user activity data without scraping HTML. You can get:
- Public events (pushes, PRs, issues, etc.)
- Repositories created/starred
- Contributions
- Followers and following
- Gists
🔧 Example: Get Public Events from a User
The endpoint is `GET /users/USERNAME/events/public` — replace USERNAME with the GitHub handle. It returns the user's latest public events, paginated, capped at 300 events from the past 90 days.
Python Script Example
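A minimal sketch using only the standard library. The `summarize` helper and the `activity-scraper` User-Agent string are illustrative choices, not part of GitHub's API:

```python
import json
import urllib.request

API = "https://api.github.com"

def fetch_public_events(username, per_page=30, page=1):
    """Fetch one page of a user's public events from the REST API."""
    url = f"{API}/users/{username}/events/public?per_page={per_page}&page={page}"
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "User-Agent": "activity-scraper",  # GitHub requires some User-Agent
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def summarize(events):
    """Reduce raw event dicts to (type, repo, timestamp) tuples."""
    return [(e["type"], e["repo"]["name"], e["created_at"]) for e in events]

# Usage (requires network):
# for etype, repo, ts in summarize(fetch_public_events("octocat")):
#     print(f"{ts}  {etype:<20} {repo}")
```

Event types include `PushEvent`, `PullRequestEvent`, `IssuesEvent`, `WatchEvent`, and so on; filter on the `type` field for the activity you care about.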
🛠️ Method 2: Scrape GitHub HTML Pages (Not Recommended)
If you must scrape HTML (e.g., for contribution graph or pinned repos):
⚠️ GitHub may block or throttle you for scraping HTML. Always respect their robots.txt and rate limits.
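If you do go this route, here is a sketch of parsing the contribution calendar with the standard library's `html.parser`. It assumes the page at `https://github.com/users/USERNAME/contributions` still marks each day as a `<td>` carrying `data-date` and `data-level` attributes — true at the time of writing, but GitHub's markup changes without notice, so treat this as fragile:

```python
from html.parser import HTMLParser

class ContributionParser(HTMLParser):
    """Collect (date, level) pairs from calendar day cells.

    Assumes each day cell is a <td> with data-date and data-level
    attributes (an assumption about GitHub's current markup).
    """
    def __init__(self):
        super().__init__()
        self.days = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "td" and "data-date" in a and "data-level" in a:
            self.days.append((a["data-date"], int(a["data-level"])))

def parse_contributions(html):
    parser = ContributionParser()
    parser.feed(html)
    return sorted(parser.days)

# Usage (requires network):
# import urllib.request
# html = urllib.request.urlopen(
#     "https://github.com/users/octocat/contributions").read().decode()
# print(parse_contributions(html)[-7:])  # last week of activity levels
```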
🧠 Additional API Endpoints to Explore
- Repos Created: `GET /users/:username/repos`
- Starred Repos: `GET /users/:username/starred`
- Gists: `GET /users/:username/gists`
- Followers / Following: `GET /users/:username/followers`
Full documentation: https://docs.github.com/en/rest
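All of these list endpoints are paginated (`per_page` max 100). A generic pager, sketched with the standard library — the `fetch_all` name is mine, not GitHub's:

```python
import json
import urllib.request

API = "https://api.github.com"

def page_url(path, page, per_page=100):
    """URL for one page of a list endpoint, e.g. users/octocat/repos."""
    return f"{API}/{path}?per_page={per_page}&page={page}"

def fetch_all(path, per_page=100):
    """Walk pages until a short page signals the last one (requires network)."""
    results, page = [], 1
    while True:
        req = urllib.request.Request(
            page_url(path, page, per_page),
            headers={"User-Agent": "activity-scraper"})
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        results.extend(batch)
        if len(batch) < per_page:  # short page = no more results
            return results
        page += 1

# Usage (requires network):
# starred = fetch_all("users/octocat/starred")
```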
🔐 Authentication for Higher Rate Limits
Unauthenticated requests are limited to about 60 per hour per IP; authenticated requests get 5,000 per hour. Create a personal access token (Settings → Developer settings → Personal access tokens) and send it in the `Authorization` header. In Python:
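A small helper that attaches the token GitHub expects. The `GITHUB_TOKEN` environment variable name is a convention I'm assuming here, not something the API mandates:

```python
import os
import urllib.request

def authed_request(url, token=None):
    """Build a Request with GitHub's Accept header and, if available,
    a Bearer token read from the GITHUB_TOKEN env var (assumed name)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "activity-scraper",
    }
    token = token or os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)

# Usage (requires network):
# with urllib.request.urlopen(
#         authed_request("https://api.github.com/users/octocat/events/public")) as resp:
#     print(resp.headers.get("X-RateLimit-Remaining"))
```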
Let me know if you want code that stores this data in a database or formats it for reporting.