The Palos Publishing Company


Scrape news headlines from multiple sources

To scrape news headlines from multiple sources, a common approach is Python with requests to download each page and BeautifulSoup to parse the HTML; Newspaper3k is an alternative built specifically for news articles. Here’s a basic Python script that scrapes up to ten headlines each from a few popular news websites:

```python
import requests
from bs4 import BeautifulSoup

# Sites to scrape: name -> (URL, headline tag, headline CSS class).
# The tag/class pairs depend on each site's HTML and may need updating.
news_sites = {
    "CNN": ("https://edition.cnn.com", "h3", "cd__headline"),
    "BBC": ("https://www.bbc.com/news", "h3", "gs-c-promo-heading__title"),
    "Reuters": ("https://www.reuters.com", "h2", "story-title"),
}


def get_headlines(url, tag, class_name, limit=10):
    """Return up to `limit` headline strings, or a one-item error list."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat HTTP 4xx/5xx as errors
        soup = BeautifulSoup(response.content, "html.parser")
        headlines = soup.find_all(tag, class_=class_name)
        return [headline.get_text(strip=True) for headline in headlines[:limit]]
    except requests.RequestException as e:
        return [f"Error fetching headlines from {url}: {e}"]


# Site-specific scraping logic (varies with each site's HTML structure)
headlines_data = {
    source: get_headlines(url, tag, class_name)
    for source, (url, tag, class_name) in news_sites.items()
}

# Display results
for source, headlines in headlines_data.items():
    print(f"\n{source} Headlines:")
    for i, headline in enumerate(headlines, 1):
        print(f"{i}. {headline}")
```
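The tag and class names passed to get_headlines are tied to each site's markup, so they are the part of the script most likely to go stale. Below is a minimal, standard-library-only sketch for finding replacement values: it tallies which classes appear on heading tags in a saved copy of a page. HeadingClassCounter, count_heading_classes, and the sample_html fragment are hypothetical names and data introduced here for illustration; they are not part of the script above.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadingClassCounter(HTMLParser):
    """Tally (tag, class) pairs seen on heading tags in an HTML document."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in self.HEADING_TAGS:
            return
        # A class attribute may hold several space-separated names.
        class_attr = dict(attrs).get("class") or ""
        for class_name in class_attr.split():
            self.counts[(tag, class_name)] += 1


def count_heading_classes(html):
    """Return a Counter of (tag, class_name) pairs found in `html`."""
    parser = HeadingClassCounter()
    parser.feed(html)
    return parser.counts


# Hypothetical page fragment, standing in for HTML saved from a real site.
sample_html = """
<h1 class="site-title">Example News</h1>
<h3 class="card__headline featured">Story one</h3>
<h3 class="card__headline">Story two</h3>
<h3 class="card__headline">Story three</h3>
"""

# Show the most frequent heading tag/class pairs first.
for (tag, class_name), n in count_heading_classes(sample_html).most_common(3):
    print(f"{tag}.{class_name}: {n}")
```

The most frequent pair is usually a reasonable first candidate for the tag and class_name arguments, though it is worth confirming against the live page in the browser's inspector.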

Important Notes:

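  • The tag and class names in the script are tied to each site's HTML at the time of writing. When a site changes its markup, find_all returns an empty list and the selectors need updating.

  • Some sites load their headlines with JavaScript. Because requests only downloads the initial HTML, those headlines won't appear in the response; in such cases, use a browser-automation tool like Selenium or Playwright.

  • For large-scale or frequent scraping, consider using an API (e.g., NewsAPI.org) instead of parsing HTML.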
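As a sketch of the API route mentioned in the last note, the example below requests top headlines from NewsAPI.org using only the standard library. It assumes the v2 top-headlines endpoint, its country, pageSize, and apiKey parameters, and a JSON response with status and articles fields; check the current NewsAPI documentation before relying on these. build_url, extract_titles, fetch_top_headlines, and sample_payload are illustrative names, and the sample data is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed NewsAPI.org endpoint; an API key from newsapi.org is required.
API_URL = "https://newsapi.org/v2/top-headlines"


def build_url(api_key, country="us", page_size=10):
    """Build the request URL for one country's top headlines."""
    query = urlencode(
        {"country": country, "pageSize": page_size, "apiKey": api_key}
    )
    return f"{API_URL}?{query}"


def extract_titles(payload):
    """Pull (source name, title) pairs out of a decoded JSON response."""
    if payload.get("status") != "ok":
        message = payload.get("message", "unknown error")
        raise ValueError(f"API error: {message}")
    return [
        (article["source"]["name"], article["title"])
        for article in payload.get("articles", [])
    ]


def fetch_top_headlines(api_key, country="us"):
    """Fetch and parse top headlines (performs a network request)."""
    with urlopen(build_url(api_key, country), timeout=10) as response:
        return extract_titles(json.load(response))


# Made-up payload in the assumed response shape, so the parsing step
# can be tried without an API key or network access.
sample_payload = {
    "status": "ok",
    "totalResults": 2,
    "articles": [
        {"source": {"id": None, "name": "Example Times"}, "title": "First story"},
        {"source": {"id": None, "name": "Sample Post"}, "title": "Second story"},
    ],
}

for source, title in extract_titles(sample_payload):
    print(f"{source}: {title}")
```

With a real key, fetch_top_headlines("YOUR_API_KEY") would replace the sample payload. Keeping URL construction and response parsing in separate functions lets both be checked without touching the network.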

