
Scrape news headlines into an RSS feed

To scrape news headlines into an RSS feed, you need to:

  1. Scrape the News Website – Extract headlines and links from a web page.

  2. Format as RSS XML – Convert the scraped data into RSS-compliant XML.

  3. Host the RSS Feed – Serve the XML so it can be subscribed to in RSS readers.

Here’s a step-by-step Python example that uses requests and BeautifulSoup to scrape the headlines and writes the result to an RSS XML file:

Prerequisites

Install dependencies:

bash
pip install requests beautifulsoup4

Python Script to Scrape and Generate RSS

python
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urljoin
from xml.sax.saxutils import escape


def scrape_headlines(url):
    # A timeout keeps the script from hanging; raise_for_status surfaces HTTP errors
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')

    # Modify selector based on the website structure
    headlines = soup.select('h2 a[href]')  # Example: BBC might use 'h3 a'

    news_items = []
    for h in headlines[:10]:  # Limit to top 10
        title = h.get_text(strip=True)
        # urljoin resolves relative and root-relative links against the page URL
        link = urljoin(url, h['href'])
        news_items.append({'title': title, 'link': link})
    return news_items


def generate_rss(news_items, feed_title="News Feed", feed_link="http://example.com",
                 feed_description="Latest Headlines"):
    # RFC 822 date as RSS 2.0 expects; this is the scrape time, not the article's publish time
    pub_date = format_datetime(datetime.now(timezone.utc))
    rss_items = ""
    for item in news_items:
        # Escape &, < and > so headlines and URLs cannot break the XML
        title = escape(item['title'])
        link = escape(item['link'])
        rss_items += f"""
    <item>
      <title>{title}</title>
      <link>{link}</link>
      <guid>{link}</guid>
      <pubDate>{pub_date}</pubDate>
    </item>"""

    rss_feed = f"""<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>{escape(feed_title)}</title>
    <link>{escape(feed_link)}</link>
    <description>{escape(feed_description)}</description>{rss_items}
  </channel>
</rss>"""
    return rss_feed


# Example usage
url = 'https://www.bbc.com/news'  # Replace with your source
news = scrape_headlines(url)
rss = generate_rss(news)

# Save to a file or serve via Flask
with open('news_feed.xml', 'w', encoding='utf-8') as f:
    f.write(rss)
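As an alternative to assembling the XML by string formatting, the feed can be built with the standard library's xml.etree.ElementTree, which escapes characters such as & and < in headlines automatically. The following is a minimal sketch rather than part of the original script: the function name build_rss is hypothetical, and it assumes each item is a dict with 'title' and 'link' keys, as returned by scrape_headlines.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(news_items, feed_title="News Feed",
              feed_link="http://example.com",
              feed_description="Latest Headlines"):
    """Build an RSS 2.0 document; ElementTree escapes text content itself."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_description

    # One timestamp for the whole feed (the scrape time, in RFC 822 format)
    pub_date = format_datetime(datetime.now(timezone.utc))
    for item in news_items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "guid").text = item["link"]
        ET.SubElement(node, "pubDate").text = pub_date

    # Prepend the XML declaration by hand so the output is the same on every platform
    body = ET.tostring(rss, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical sample item containing characters that must be escaped
xml = build_rss([{"title": "Cats & Dogs <live>",
                  "link": "https://example.com/a?x=1&y=2"}])
print(xml)
```

Building a tree first costs a few more lines, but it removes the need to remember to escape every field by hand.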

Optional: Host the RSS Feed

Use Flask to serve it (install it first with pip install flask):

python
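from flask import Flask, Response

# Assumes scrape_headlines and generate_rss from the script above are defined in this file

app = Flask(__name__)


@app.route('/rss')
def rss_feed():
    # The site is scraped again on every request to /rss
    news = scrape_headlines('https://www.bbc.com/news')
    rss = generate_rss(news)
    return Response(rss, mimetype='application/rss+xml')


if __name__ == '__main__':
    app.run(port=5000)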
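Because the route above scrapes the news site each time a reader polls /rss, it may be worth reusing the generated feed for a few minutes. The sketch below shows one simple way to do that with a time-based cache. The helper make_cached and the stand-in fake_build_feed are hypothetical names introduced for illustration; in the Flask app, the wrapped function would be something like lambda: generate_rss(scrape_headlines('https://www.bbc.com/news')).

```python
import time


def make_cached(fetch, ttl_seconds=300, clock=time.monotonic):
    """Wrap a zero-argument function so its result is reused for ttl_seconds."""
    state = {"value": None, "expires": None}

    def cached():
        now = clock()
        # Refresh on the first call, or once the stored result has expired
        if state["expires"] is None or now >= state["expires"]:
            state["value"] = fetch()
            state["expires"] = now + ttl_seconds
        return state["value"]

    return cached


# Stand-in for the real scrape-and-generate step, so the sketch runs offline
calls = []


def fake_build_feed():
    calls.append(1)
    return f"<rss>build {len(calls)}</rss>"


get_feed = make_cached(fake_build_feed, ttl_seconds=300)
first = get_feed()
second = get_feed()  # served from the cache; fake_build_feed is not called again
print(first == second, len(calls))
```

This cache lives in the memory of a single process and takes no lock, so two simultaneous requests may occasionally both refresh it; for a small personal feed that is usually acceptable.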

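You now have an RSS feed built from scraped headlines. The saved news_feed.xml file changes only when the script is re-run, while the Flask route scrapes the site again on every request, so subscribers always receive the current headlines.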
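Before subscribing to the feed in a reader, it can help to confirm that the output is well-formed XML and contains the elements RSS 2.0 requires (a channel with title, link, and description, and items with at least a title or a description). The following is a rough sketch of such a check; check_rss is a hypothetical helper, not a full validator, and the sample string stands in for the contents of news_feed.xml.

```python
import xml.etree.ElementTree as ET


def check_rss(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "rss":
        problems.append("root element is not <rss>")
    channel = root.find("channel")
    if channel is None:
        return problems + ["missing <channel>"]
    for required in ("title", "link", "description"):
        if not channel.findtext(required):
            problems.append(f"channel is missing <{required}>")
    for i, item in enumerate(channel.findall("item"), start=1):
        if not (item.findtext("title") or item.findtext("description")):
            problems.append(f"item {i} has neither <title> nor <description>")
    return problems


# Stand-in for the generated feed
sample = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<title>News Feed</title><link>http://example.com</link>
<description>Latest Headlines</description>
<item><title>Hello</title><link>http://example.com/1</link></item>
</channel></rss>"""

print(check_rss(sample))  # []

# An unescaped ampersand makes the document ill-formed, so one problem is reported
broken = "<rss><channel><title>A & B</title></channel></rss>"
print(check_rss(broken))
```

To check the saved file instead, read news_feed.xml with encoding='utf-8' and pass its contents to check_rss.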
