Categories We Write About

Archive GitHub activity logs

Archiving GitHub activity logs involves saving your history of actions such as commits, pull requests, issues, comments, and other interactions on repositories. This is useful for personal records, audits, or project documentation. Here’s a detailed guide on how to archive your GitHub activity logs effectively:

1. Using GitHub API to Export Activity Logs

GitHub provides a REST API and a GraphQL API that let you fetch your activity data programmatically.

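  • Get your events: The endpoint /users/{username}/events returns events performed by the user. Unauthenticated requests see public events only; if you authenticate as that user, private events are included as well. Use /users/{username}/events/public to get public events only.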

  • Get repository events: /repos/{owner}/{repo}/events returns events for a specific repository.

Example: Fetch user public events using curl

bash
curl -H "Authorization: token YOUR_GITHUB_TOKEN" "https://api.github.com/users/yourusername/events"

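This returns a JSON array of recent events such as PushEvent, PullRequestEvent, and IssueCommentEvent.

To retrieve more events, page through the results by adding ?page=2, ?page=3, and so on to the URL. Note that the events endpoints only reach back a limited distance (GitHub documents a cap of 300 events per timeline), so they suit ongoing archival better than recovering old history.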
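
Instead of guessing page numbers, a script can follow the Link response header that GitHub sends on paginated endpoints. The following is a minimal sketch, assuming the header uses the usual `<url>; rel="next"` form; `next_page_url` is a hypothetical helper name, not part of any GitHub library.

```python
import re
from typing import Optional

# Matches one `<url>; rel="name"` entry of a Link header.
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None if there is none."""
    if not link_header:
        return None
    for url, rel in _LINK_ENTRY.findall(link_header):
        if rel == "next":
            return url
    return None


if __name__ == "__main__":
    # Illustrative header value, not real API output.
    sample = (
        '<https://api.github.com/users/yourusername/events?page=2>; rel="next", '
        '<https://api.github.com/users/yourusername/events?page=3>; rel="last"'
    )
    print(next_page_url(sample))
```

With the requests library, the header value is available as response.headers.get("Link"), and requests also exposes a parsed version as response.links. Looping until the helper returns None avoids requesting pages that do not exist.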


2. Using GitHub Archive Project

GitHub Archive (https://www.githubarchive.org/, now published as GH Archive at https://www.gharchive.org/) continuously records public GitHub activity and provides datasets you can query via BigQuery or download.

  • Useful for large-scale or historical analysis.

  • You can extract activity logs over long periods, but only public activity is covered.
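
As a rough illustration of working with a downloaded dump, the sketch below filters one hourly file down to a single user's events. It assumes the dumps are gzipped, newline-delimited JSON published under URLs of the form https://data.gharchive.org/YYYY-MM-DD-H.json.gz, and that each event stores the user under actor.login; check both against the project's own documentation. The function names are hypothetical.

```python
import gzip
import json
from datetime import date
from typing import Iterator


def hourly_url(day: date, hour: int) -> str:
    """Build the assumed URL of one hourly dump (hour is 0-23, not zero-padded)."""
    return f"https://data.gharchive.org/{day.isoformat()}-{hour}.json.gz"


def events_for_actor(gz_bytes: bytes, login: str) -> Iterator[dict]:
    """Yield the events in one gzipped NDJSON dump that belong to `login`."""
    for line in gzip.decompress(gz_bytes).splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        actor = event.get("actor")
        # Assumed layout: actor is an object with a "login" key.
        if isinstance(actor, dict) and actor.get("login") == login:
            yield event
```

Because the function takes raw bytes, the download step stays separate: fetch hourly_url(...) with any HTTP client, then pass the response body in.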


3. Exporting Specific Logs Manually

  • Commits: Clone the repository and use git log to export commits locally.

  • Pull Requests & Issues: Use the GitHub API or a tool such as the gh CLI to export PR and issue data.

Example using gh CLI:

bash
gh pr list --repo owner/repo --state all --limit 100 --json number,title,author,createdAt,state > prs.json
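
For the commits side, a local clone can be exported to JSON by wrapping git log. This is a minimal sketch, assuming git is on the PATH and the repository is already cloned; the helper names and the choice of fields (hash, author, date, subject) are illustrative.

```python
import json
import subprocess
from typing import Dict, List

# %x1f (unit separator) splits fields, %x1e (record separator) ends each commit.
GIT_FORMAT = "%H%x1f%an%x1f%aI%x1f%s%x1e"
FIELDS = ("hash", "author", "date", "subject")


def parse_git_log(raw: str) -> List[Dict[str, str]]:
    """Turn `git log` output produced with GIT_FORMAT into a list of dicts."""
    commits = []
    for record in raw.split("\x1e"):
        record = record.strip("\n")
        if not record:
            continue
        commits.append(dict(zip(FIELDS, record.split("\x1f"))))
    return commits


def export_commits(repo_path: str, out_path: str) -> int:
    """Write the commit history of a local clone to a JSON file; return the count."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{GIT_FORMAT}"],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    commits = parse_git_log(result.stdout)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(commits, f, indent=2)
    return len(commits)
```

Calling export_commits("path/to/clone", "commits.json") would then produce a file comparable to prs.json above; both paths are placeholders.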

4. Automating Activity Log Archival

Set up a script to regularly fetch and save activity logs via API calls.

Sample Python script to fetch user events:

python
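import json

import requests

token = 'YOUR_GITHUB_TOKEN'
username = 'yourusername'
headers = {'Authorization': f'token {token}'}

events = []
page = 1
while True:
    url = f'https://api.github.com/users/{username}/events?page={page}&per_page=100'
    response = requests.get(url, headers=headers, timeout=30)
    # Stop on any error; GitHub rejects pages beyond its event pagination limit
    if response.status_code != 200:
        print(f'Stopped at page {page}: HTTP {response.status_code}')
        break
    data = response.json()
    if not data:
        break
    events.extend(data)
    page += 1

# Save everything that was fetched as pretty-printed JSON
with open('github_activity.json', 'w') as f:
    json.dump(events, f, indent=2)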

Run this script periodically (e.g., daily, weekly) using a scheduler like cron.
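
Because consecutive scheduled runs fetch overlapping events, it helps to merge each new batch into the existing archive rather than overwrite it. The sketch below is one way to do that, assuming every event carries a unique id field and a created_at timestamp, as the events API returns; `merge_events` and `update_archive` are hypothetical helper names.

```python
import json
import os
from typing import Dict, List


def merge_events(existing: List[dict], fetched: List[dict]) -> List[dict]:
    """Combine two event lists, keeping one copy per event id, newest first."""
    by_id: Dict[str, dict] = {event["id"]: event for event in existing}
    for event in fetched:
        by_id[event["id"]] = event
    # Timestamps in the same ISO 8601 UTC format sort correctly as strings.
    return sorted(by_id.values(), key=lambda e: e.get("created_at", ""), reverse=True)


def update_archive(path: str, fetched: List[dict]) -> int:
    """Merge `fetched` into the JSON archive at `path`; return the new total."""
    existing: List[dict] = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = json.load(f)
    merged = merge_events(existing, fetched)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2)
    return len(merged)
```

A fetch script could end with update_archive('github_activity.json', events), so that each run only adds events that are not yet in the file.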


5. Using Third-party Tools

  • GHTorrent: A project that continuously collects data from GitHub and offers it for research.

  • GitHub Archive Extractor tools: Various open-source tools to download and parse GitHub activity.


6. Considerations

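  • Rate Limits: The GitHub REST API is rate limited, typically to 5,000 requests per hour for authenticated requests and 60 per hour for unauthenticated ones.

  • Private Repositories: API endpoints expose private activity only when the request is authenticated with sufficient permissions; otherwise you get public data only.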

  • Data Size: Activity logs can get large; consider compressing or filtering by date/event type.
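
The rate-limit and data-size points can be sketched in code as follows. This assumes the API reports quota through the X-RateLimit-Remaining and X-RateLimit-Reset response headers (the latter as a Unix timestamp) and that events carry type and created_at fields; the helper names are hypothetical.

```python
import gzip
import json
import time
from typing import Iterable, List, Mapping, Optional, Set


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0 if quota remains).

    Header names are matched exactly here; the headers object from the
    requests library is case-insensitive, a plain dict is not.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


def filter_events(events: Iterable[dict], types: Set[str], since: str) -> List[dict]:
    """Keep events of the given types created at or after `since`.

    `since` must use the same ISO 8601 UTC format as created_at,
    for example "2024-01-01T00:00:00Z", so string comparison is valid.
    """
    return [
        e for e in events
        if e.get("type") in types and e.get("created_at", "") >= since
    ]


def write_gzip_json(path: str, events: List[dict]) -> None:
    """Write events as gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(events, f)
```

In a fetch loop, time.sleep(seconds_until_reset(response.headers)) before the next request keeps the script inside its quota, and filtering before writing keeps the archive small.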


Archiving your GitHub activity logs enables better record-keeping and analysis of your development history. By leveraging the GitHub API, third-party datasets, or manual export tools, you can create an automated and comprehensive archive tailored to your needs.
