Scraping GitHub README badges involves extracting the badge images (typically displayed using Markdown syntax) from the README.md file of a GitHub repository. These badges are commonly used to display project status, test coverage, license, dependencies, etc.
Here’s a step-by-step Python script using requests and BeautifulSoup to scrape README badges from a public GitHub repository:
How it Works:
- Extracts the README.md from the master or main branch.
- Parses Markdown image links using regex.
- Filters likely badge URLs (commonly from shields.io, badgen.net, etc.).
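The three steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names `fetch_readme`, `extract_badges`, `BADGE_HOSTS`, and `IMAGE_RE` are hypothetical, the host list is an assumption you would extend, and the raw-file URL assumes the README is named exactly `README.md` at the repository root.

```python
import re

# Assumed badge hosts; extend this tuple for other badge services.
BADGE_HOSTS = ("shields.io", "badgen.net")

# Matches Markdown images of the form ![alt](url) or ![alt](url "title").
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def fetch_readme(owner, repo, branches=("main", "master")):
    """Return the raw README.md text from the first branch that has one."""
    # Third-party dependency; imported here so the parsing helper
    # below can be used without it.
    import requests

    for branch in branches:
        url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response.text
    raise RuntimeError(f"No README.md found for {owner}/{repo}")


def extract_badges(markdown):
    """Return (alt_text, url) pairs for images served by a known badge host."""
    return [
        (alt, url)
        for alt, url in IMAGE_RE.findall(markdown)
        if any(host in url for host in BADGE_HOSTS)
    ]
```

With these helpers, `extract_badges(fetch_readme(owner, repo))` would return the badge pairs for a public repository. Matching on a substring of the URL is a deliberately loose filter, so it may also pick up non-badge images that happen to be hosted on the same domains.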
Output Example:
You can extend this to:
- Parse badges inside HTML (<img> tags).
- Identify badge types using URL patterns.
- Display badge alt texts and links.
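A rough sketch of those extensions is shown below. It uses the standard library's `html.parser` rather than BeautifulSoup to stay dependency-free. The keyword table `BADGE_TYPES` and the names `classify_badge`, `ImgBadgeParser`, and `extract_html_badges` are hypothetical, and the keyword-to-type mapping is only an assumption about common badge URLs.

```python
from html.parser import HTMLParser

# Hypothetical URL keywords mapped to badge types; adjust as needed.
BADGE_TYPES = {
    "license": "license",
    "coverage": "coverage",
    "codecov": "coverage",
    "build": "build",
    "workflow": "build",
    "npm": "package",
    "pypi": "package",
}


def classify_badge(url):
    """Guess a badge type from the first keyword found in its URL."""
    lowered = url.lower()
    for keyword, badge_type in BADGE_TYPES.items():
        if keyword in lowered:
            return badge_type
    return "unknown"


class ImgBadgeParser(HTMLParser):
    """Collect <img> tags along with the href of an enclosing <a>, if any."""

    def __init__(self):
        super().__init__()
        self.badges = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "img" and attrs.get("src"):
            self.badges.append({
                "alt": attrs.get("alt") or "",
                "src": attrs["src"],
                "link": self._href,
                "type": classify_badge(attrs["src"]),
            })

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def extract_html_badges(text):
    """Return one dict per <img> tag found in the given README text."""
    parser = ImgBadgeParser()
    parser.feed(text)
    return parser.badges
```

Note that this sketch collects every `<img>` tag, not only badges, so in practice you would likely combine it with a host filter such as the shields.io and badgen.net check described earlier.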
Let me know if you need this in another language or want to scan multiple repos.