Scrape GitHub README badges

Scraping GitHub README badges involves extracting the badge images (typically displayed using Markdown syntax) from the README.md file of a GitHub repository. These badges are commonly used to display project status, test coverage, license, dependencies, etc.

Here’s a step-by-step Python script using requests and a regular expression to scrape README badges from a public GitHub repository:

python
import re

import requests

def extract_readme_badges(repo_url):
    # Convert GitHub URL to raw README.md URL
    if repo_url.endswith('/'):
        repo_url = repo_url[:-1]
    user_repo = '/'.join(repo_url.split('/')[-2:])
    raw_url = f"https://raw.githubusercontent.com/{user_repo}/master/README.md"

    # Try the master branch first, then fall back to main if that fails
    response = requests.get(raw_url, timeout=10)
    if response.status_code != 200:
        raw_url = f"https://raw.githubusercontent.com/{user_repo}/main/README.md"
        response = requests.get(raw_url, timeout=10)

    if response.status_code != 200:
        print("Could not retrieve README.md")
        return []

    readme_content = response.text

    # Regex pattern to match Markdown image syntax: ![alt](url)
    # The brackets and parentheses are escaped so they match literally.
    badge_pattern = r'!\[.*?\]\((.*?)\)'
    badges = re.findall(badge_pattern, readme_content)

    # Filter URLs that look like badge images (from shields.io, etc.)
    badge_urls = [url for url in badges if 'badge' in url or 'shields.io' in url]

    return badge_urls

# Example usage
repo = "https://github.com/facebook/react"
badges = extract_readme_badges(repo)
for badge in badges:
    print(badge)

How it Works:

Extracts the README.md from the master or main branch.
Parses Markdown image links using regex.
Filters likely badge URLs (commonly from shields.io, badgen.net, etc.).
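
The parsing and filtering steps can be tried in isolation, without any network access. The sketch below runs a Markdown image pattern against an inline sample string; the sample README text and the helper name find_badge_urls are invented for illustration and are not part of the script above.

```python
import re

# Hypothetical README excerpt, invented for this example
SAMPLE_README = """\
# Demo project
[![Build](https://img.shields.io/badge/build-passing-brightgreen)](https://example.com/ci)
![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Screenshot](docs/screenshot.png)
"""

# Matches ![alt](url); brackets and parentheses are escaped so they match literally.
# The capture group stops at whitespace, so an optional "title" is not included.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(([^)\s]+)[^)]*\)')


def find_badge_urls(markdown_text):
    """Return image URLs that look like badges (a simple keyword heuristic)."""
    urls = IMAGE_PATTERN.findall(markdown_text)
    return [u for u in urls if 'badge' in u or 'shields.io' in u]


if __name__ == "__main__":
    for url in find_badge_urls(SAMPLE_README):
        print(url)
```

The screenshot image is matched by the pattern but dropped by the filter, since its URL contains neither keyword.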

Output Example:

text
https://img.shields.io/badge/build-passing-brightgreen
https://img.shields.io/npm/v/react.svg

You can extend this to:

Parse badges inside HTML (<img> tags).
Identify badge types using URL patterns.
Display badge alt texts and links.
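
A rough sketch of the first two extensions, plus alt texts, might look like the following. It uses only the standard library (re and html.parser); the names ImgCollector, classify_badge, and collect_badges, as well as the keyword rules, are assumptions made up for this example and are far from exhaustive. Extracting the link that a badge points to is not covered here.

```python
import re
from html.parser import HTMLParser

# Matches ![alt](url) and captures both the alt text and the URL
MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)[^)]*\)')


class ImgCollector(HTMLParser):
    """Collects (alt, src) pairs from <img> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            src = attr_map.get("src")
            if src:
                self.images.append((attr_map.get("alt") or "", src))


def classify_badge(url):
    """Guess a badge type from URL keywords (hypothetical, incomplete rules)."""
    lowered = url.lower()
    rules = [
        ("license", "license"),
        ("coverage", "coverage"),
        ("codecov", "coverage"),
        ("build", "build"),
        ("workflow", "build"),
        ("npm", "package version"),
        ("pypi", "package version"),
    ]
    for keyword, label in rules:
        if keyword in lowered:
            return label
    return "unknown"


def collect_badges(readme_text):
    """Return alt text, URL, and guessed type for each badge-like image."""
    images = list(MD_IMAGE.findall(readme_text))  # Markdown images
    parser = ImgCollector()
    parser.feed(readme_text)                      # HTML <img> tags
    images.extend(parser.images)
    return [
        {"alt": alt, "url": url, "type": classify_badge(url)}
        for alt, url in images
        if "badge" in url or "shields.io" in url
    ]
```

Calling collect_badges(readme_content) on the text fetched by the script above would return one dictionary per badge, with Markdown images listed before HTML ones.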

Let me know if you need this in another language or want to scan multiple repos.
