Scrape news headlines from multiple sources

To scrape news headlines from multiple sources, you’ll typically use Python with requests and BeautifulSoup, or a higher-level library such as Newspaper3k. Here’s a basic script that scrapes headlines from three news websites (CNN, BBC, and Reuters):

python
import requests
from bs4 import BeautifulSoup

# List of URLs to scrape headlines from
news_sites = {
    "CNN": "https://edition.cnn.com",
    "BBC": "https://www.bbc.com/news",
    "Reuters": "https://www.reuters.com",
}

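def get_headlines(url, tag, class_name):
    """Return the text of up to 10 <tag class="class_name"> elements on the page."""
    try:
        # Time out rather than hang on a slow site, and treat HTTP error codes as failures
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        headlines = soup.find_all(tag, class_=class_name)
        return [headline.get_text(strip=True) for headline in headlines[:10]]
    except requests.RequestException as e:
        return [f"Error fetching headlines from {url}: {e}"]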

# Site-specific tag/class pairs (these depend on each site's current HTML and may be out of date)
selectors = {
    "CNN": ("h3", "cd__headline"),
    "BBC": ("h3", "gs-c-promo-heading__title"),
    "Reuters": ("h2", "story-title"),
}

# Reuse the URLs from news_sites instead of repeating them
headlines_data = {
    source: get_headlines(url, *selectors[source])
    for source, url in news_sites.items()
}

# Display results
for source, headlines in headlines_data.items():
    print(f"\n{source} Headlines:")
    for i, headline in enumerate(headlines, 1):
        print(f"{i}. {headline}")
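
Because the tag and class names are hard-coded, it can help to check a selector against a saved copy of a page before using it in a live run. The sketch below is illustrative only: the HTML snippet, the class names, and the `extract_headlines` and `check_selector` helpers are made up for this example and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a saved copy of a news front page
SAMPLE_HTML = """
<html><body>
  <h3 class="headline">First story</h3>
  <h3 class="headline">Second story</h3>
  <h2 class="other">Not a headline</h2>
</body></html>
"""

def extract_headlines(html, tag, class_name, limit=10):
    """Return the text of up to `limit` elements matching tag + class."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all(tag, class_=class_name)
    return [el.get_text(strip=True) for el in matches[:limit]]

def check_selector(html, tag, class_name):
    """Print a warning when a tag/class pair matches nothing, then return the matches."""
    found = extract_headlines(html, tag, class_name)
    if not found:
        print(f"No <{tag} class='{class_name}'> elements found; the selector may be stale")
    return found

print(check_selector(SAMPLE_HTML, "h3", "headline"))   # selector that matches
print(check_selector(SAMPLE_HTML, "h3", "old-class"))  # selector that no longer matches
```

An empty result is the usual symptom of a site changing its markup, so checking for it explicitly is more informative than printing nothing.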

Important Notes:

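The tag and class names above depend on each site's current HTML structure. Update them when a site changes its markup; otherwise the script finds no matching elements and returns empty lists.
Some sites load their content with JavaScript, which requests does not execute. In such cases, use a browser automation tool like Selenium or Playwright.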
For large-scale or frequent scraping, consider using APIs (e.g., NewsAPI.org) instead.
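
As a rough illustration of the API route, the sketch below requests top headlines from NewsAPI.org using only the standard library (requests would work equally well). It assumes the `top-headlines` endpoint, the `country`, `pageSize`, and `apiKey` query parameters, and a JSON response with an `articles` list whose items have a `title`; check these against the current NewsAPI documentation. The `NEWSAPI_KEY` environment variable and the helper names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the NewsAPI.org documentation
NEWSAPI_URL = "https://newsapi.org/v2/top-headlines"

def build_url(api_key, country="us", page_size=10):
    """Build the request URL with its query parameters."""
    query = urllib.parse.urlencode(
        {"country": country, "pageSize": page_size, "apiKey": api_key}
    )
    return f"{NEWSAPI_URL}?{query}"

def extract_titles(payload):
    """Pull headline strings out of a decoded JSON response.

    Assumes an "articles" list whose items carry a "title" field.
    """
    return [a["title"] for a in payload.get("articles", []) if a.get("title")]

def fetch_top_headlines(api_key, country="us"):
    """Fetch headline titles; raises on network or HTTP errors."""
    request = urllib.request.Request(
        build_url(api_key, country),
        headers={"User-Agent": "headline-demo/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_titles(json.load(response))

if __name__ == "__main__":
    # NEWSAPI_KEY is a hypothetical environment variable name for this sketch
    key = os.environ.get("NEWSAPI_KEY")
    if key:
        for i, title in enumerate(fetch_top_headlines(key), 1):
            print(f"{i}. {title}")
    else:
        print("Set NEWSAPI_KEY to run the live request")
```

Keeping `extract_titles` separate from the network call makes the parsing step easy to check against a saved response without spending API quota.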
