The Palos Publishing Company

Scrape news articles on a specific topic

To scrape news articles on a specific topic, you can use Python with feedparser, newspaper3k, or requests together with BeautifulSoup. The three options below cover reading the Google News RSS feed for a topic, extracting the full text of an article from a known URL, and scraping headlines directly from a specific site.


Option 1: Scrape Google News RSS Feed

python
import feedparser
from urllib.parse import quote_plus


def scrape_google_news(topic, max_results=10):
    # URL-encode the topic so spaces and characters such as & or + survive in the query string
    query = quote_plus(topic)
    rss_url = f'https://news.google.com/rss/search?q={query}&hl=en-US&gl=US&ceid=US:en'
    feed = feedparser.parse(rss_url)

    articles = []
    for entry in feed.entries[:max_results]:
        articles.append({
            'title': entry.title,
            'link': entry.link,
            # Not every entry carries these fields, so fall back to an empty string
            'published': entry.get('published', ''),
            'summary': entry.get('summary', ''),
        })
    return articles


# Example usage
news = scrape_google_news("Artificial Intelligence")
for article in news:
    print(f"{article['title']} - {article['link']}")
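To make the fields used in Option 1 less opaque, here is a minimal offline sketch that pulls the same kind of data out of an inline RSS sample using only the standard library. The sample feed, its URL, and the function name `parse_rss_items` are made up for illustration; feedparser exposes an item's `pubDate` as `published` and its `description` as `summary`, which is the mapping imitated here.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample, shaped like a typical news feed
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Example headline about AI</title>
      <link>https://example.com/article-about-ai</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
      <description>Short summary text.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text, max_results=10):
    """Return title, link, published, and summary for each <item> in an RSS string."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.findall("./channel/item")[:max_results]:
        articles.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "summary": item.findtext("description", default=""),
        })
    return articles


articles = parse_rss_items(SAMPLE_RSS)
for article in articles:
    print(f"{article['title']} - {article['link']}")
```

This is only meant to show the shape of the data; for real feeds, feedparser handles encodings, Atom feeds, and malformed XML that this sketch does not.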

Option 2: Use newspaper3k to Extract Full Articles

python
from newspaper import Article


def extract_full_article(url):
    article = Article(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract the title and body text
    return {
        'title': article.title,
        'text': article.text,
    }


# Example: extract full content of one URL
url = 'https://example.com/article-about-ai'
content = extract_full_article(url)
print(content['title'])
print(content['text'][:500])  # Print first 500 characters
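When Option 2 is applied to a batch of URLs, a single failed download would otherwise stop the whole run. One possible way to handle that is sketched below. The helper `extract_many` and the stand-in `fake_extractor` are hypothetical names introduced here so the example runs without network access; in practice you would pass `extract_full_article` as the `extractor` argument.

```python
def extract_many(urls, extractor):
    """Run extractor on each URL, collecting successes and failures separately."""
    results, failures = [], []
    for url in urls:
        try:
            results.append({"url": url, **extractor(url)})
        except Exception as exc:  # broad on purpose: one bad URL should not stop the batch
            failures.append((url, str(exc)))
    return results, failures


def fake_extractor(url):
    """Stand-in for extract_full_article so the sketch runs offline."""
    if "broken" in url:
        raise ValueError("download failed")
    return {"title": "Example title", "text": "Example body text."}


results, failures = extract_many(
    ["https://example.com/article-about-ai", "https://example.com/broken-link"],
    fake_extractor,
)
print(f"{len(results)} extracted, {len(failures)} failed")
```

Keeping the failures in their own list makes it easy to retry them later or log them for inspection.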

Option 3: Scrape Specific Website with requests + BeautifulSoup

python
import requests
from bs4 import BeautifulSoup


def scrape_website(url, article_selector):
    # A timeout keeps the script from hanging on an unresponsive server
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    soup = BeautifulSoup(response.content, 'html.parser')
    articles = soup.select(article_selector)
    return [article.get_text(strip=True) for article in articles]


# Example usage for scraping headlines
headlines = scrape_website("https://www.bbc.com/news", "h3")
for headline in headlines[:10]:
    print(headline)
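Option 3 returns every headline that matches the selector, whatever its subject. To narrow the result to a specific topic, one simple approach is a keyword filter over the scraped text, sketched below. The function name `filter_by_topic` and the sample headlines are invented for illustration, and whole-word matching is an assumption about what counts as "on topic".

```python
import re


def filter_by_topic(headlines, topic):
    """Keep headlines containing every word of the topic, dropping duplicates."""
    topic_words = set(re.findall(r"\w+", topic.lower()))
    seen = set()
    matches = []
    for headline in headlines:
        normalized = " ".join(headline.split())  # collapse stray whitespace
        key = normalized.lower()
        if key in seen:
            continue  # skip case-insensitive duplicates
        # Compare whole words so "ai" does not match inside "rain"
        if topic_words <= set(re.findall(r"\w+", key)):
            seen.add(key)
            matches.append(normalized)
    return matches


sample_headlines = [
    "AI chip demand rises",
    "Election results announced",
    "ai chip demand rises",
    "New AI chip unveiled",
]
print(filter_by_topic(sample_headlines, "AI chip"))
# → ['AI chip demand rises', 'New AI chip unveiled']
```

A keyword filter like this is crude; it misses synonyms and related terms, so treat it as a first pass rather than a relevance ranking.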

Notes:

  • Send a descriptive User-Agent header with your requests; some sites block or throttle the default one.

  • Check each site’s robots.txt before scraping and honor its rules.

  • Use selenium if content is loaded dynamically with JavaScript.

  • You can automate daily scraping and storage with cron jobs or apscheduler.
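The first two notes can be combined into a small helper, sketched here under some assumptions: the User-Agent string, the sample robots.txt, and the function names `is_allowed` and `polite_get` are all hypothetical. The robots.txt check uses the standard library's `urllib.robotparser` on text you have already downloaded, so the example itself runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identifier; use a string that describes your own scraper
USER_AGENT = "example-news-scraper/0.1"
HEADERS = {"User-Agent": USER_AGENT}

# Hypothetical robots.txt content for illustration
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_get(url, robots_txt, timeout=10):
    """Fetch url with a User-Agent header, but only if robots.txt allows it."""
    if not is_allowed(robots_txt, url):
        return None
    import requests  # imported lazily so the robots.txt check works without it
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


print(is_allowed(SAMPLE_ROBOTS, "https://example.com/news/ai"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/drafts"))
```

In a real run you would first download the site's robots.txt (typically at `/robots.txt` on the same host) and pass its text to `polite_get` along with each URL.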

Let me know your specific topic and preferred source, and I can tailor the code to your needs.
