The Palos Publishing Company

Scrape public GitHub issues by topic

To scrape public GitHub issues by topic, you can use the GitHub REST API. Below is a Python script using requests to fetch issues related to a specific topic from public repositories.

⚙️ Prerequisites:

  • Python 3.x

  • requests library (pip install requests)

  • A GitHub Personal Access Token (optional but recommended to avoid rate limiting)

✅ Python Script to Scrape GitHub Issues by Topic

```python
import requests
import time

# Optional: Set your GitHub token here to increase rate limits
GITHUB_TOKEN = 'your_github_token_here'  # Replace with your token or set to None
HEADERS = {'Authorization': f'token {GITHUB_TOKEN}'} if GITHUB_TOKEN else {}

# Define the topic (GitHub "topics" relate to repositories, not issues)
TOPIC = "machine-learning"  # You can change this to any topic
MAX_REPOS = 10              # Number of repositories to fetch
MAX_ISSUES_PER_REPO = 10    # Number of issues to fetch per repository

def search_repositories_by_topic(topic, max_repos):
    url = (
        "https://api.github.com/search/repositories"
        f"?q=topic:{topic}&sort=stars&order=desc&per_page={max_repos}"
    )
    response = requests.get(url, headers=HEADERS)
    response.raise_for_status()
    return response.json().get("items", [])

def get_issues(repo_full_name, max_issues):
    issues_url = (
        f"https://api.github.com/repos/{repo_full_name}/issues"
        f"?state=open&per_page={max_issues}"
    )
    response = requests.get(issues_url, headers=HEADERS)
    response.raise_for_status()
    return response.json()

def scrape_github_issues(topic):
    repos = search_repositories_by_topic(topic, MAX_REPOS)
    all_issues = []
    for repo in repos:
        repo_name = repo['full_name']
        print(f"Fetching issues for: {repo_name}")
        try:
            issues = get_issues(repo_name, MAX_ISSUES_PER_REPO)
            for issue in issues:
                if 'pull_request' not in issue:  # Exclude pull requests
                    all_issues.append({
                        "repository": repo_name,
                        "issue_title": issue["title"],
                        "issue_url": issue["html_url"],
                        "created_at": issue["created_at"],
                        "user": issue["user"]["login"]
                    })
            time.sleep(1)  # To avoid hitting rate limits
        except Exception as e:
            print(f"Failed to fetch issues for {repo_name}: {e}")
    return all_issues

# Run the scraper
if __name__ == "__main__":
    issues = scrape_github_issues(TOPIC)
    for i, issue in enumerate(issues, start=1):
        print(f"{i}. [{issue['issue_title']}]({issue['issue_url']}) - "
              f"{issue['repository']} (by {issue['user']})")
```

🔍 Notes:

  • GitHub’s search API only supports repository-level topics. Issues themselves don’t have topics.

  • The script fetches repositories tagged with the specified topic, then scrapes open issues from each.

  • You can customize MAX_REPOS and MAX_ISSUES_PER_REPO as needed.
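Because issues don't carry topics, GitHub's issue-search endpoint (`/search/issues`) is worth knowing as an alternative: it matches free-text keywords against issue titles and bodies across all public repositories. A minimal sketch of that approach (the `build_issue_search_url` helper name and its parameters are illustrative, not part of the script above):

```python
import requests

def build_issue_search_url(keyword, per_page=10):
    """Build a GitHub issue-search URL for a free-text keyword.

    Unlike repositories, issues cannot be filtered by topic,
    so we search issue text instead."""
    return (
        "https://api.github.com/search/issues"
        f"?q={keyword}+type:issue+state:open&per_page={per_page}"
    )

def search_issues_by_keyword(keyword, headers=None, per_page=10):
    """Return open issues whose title or body matches the keyword."""
    url = build_issue_search_url(keyword, per_page)
    response = requests.get(url, headers=headers or {})
    response.raise_for_status()
    return response.json().get("items", [])
```

Note that the search API has a stricter rate limit than the regular REST endpoints, so an authenticated token matters even more here.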

Let me know if you want this data exported to CSV or filtered by keywords within the issues.
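For CSV export, here is a minimal sketch using Python's standard `csv` module. It assumes the list of dicts produced by `scrape_github_issues` above; the `export_issues_to_csv` name and the default output path are illustrative:

```python
import csv

def export_issues_to_csv(issues, path="github_issues.csv"):
    """Write scraped issue dicts to a CSV file.

    Assumes each dict has the keys built by scrape_github_issues:
    repository, issue_title, issue_url, created_at, user."""
    fieldnames = ["repository", "issue_title", "issue_url", "created_at", "user"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()   # Header row with the field names
        writer.writerows(issues)
```

Call it right after scraping, e.g. `export_issues_to_csv(issues)`, and the file opens cleanly in any spreadsheet tool.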
