The Palos Publishing Company

Create a web scraper for real-time headlines

Category: Computer Science

Here’s a Python web scraper that uses requests to download a page and BeautifulSoup to parse it, returning the headlines shown on a news website at the moment of the request (CNN in this example). Change the URL and the HTML selectors to fit the site you want.

python
import requests
from bs4 import BeautifulSoup

def get_real_time_headlines(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
                      '(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
    }
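    try:
        # A timeout keeps the request from hanging indefinitely
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return []
    if response.status_code != 200:
        print(f"Failed to retrieve page: Status code {response.status_code}")
        return []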

    soup = BeautifulSoup(response.text, 'html.parser')

    # Example selector: <h3> tags with class 'cd__headline'.
    # Site markup changes over time, so check it against the live page first.
    headlines_html = soup.find_all('h3', class_='cd__headline')
    headlines = [headline.get_text(strip=True) for headline in headlines_html]

    return headlines

if __name__ == "__main__":
    url = "https://edition.cnn.com/world"  # Change to your preferred news site
    headlines = get_real_time_headlines(url)
    for i, headline in enumerate(headlines, start=1):
        print(f"{i}. {headline}")
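
To try out a selector without hitting a live site, the parsing step can be separated from the download. The following is a minimal sketch, not part of the original script: `extract_headlines` and `SAMPLE_HTML` are hypothetical names, and the sample markup is invented to mirror the `h3` / `cd__headline` pattern used above. It also drops duplicate headlines, which pages often repeat across sections.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h3 class="cd__headline"><a href="/a"> First story </a></h3>
  <h3 class="cd__headline"><a href="/b">Second story</a></h3>
  <h3 class="other">Not a headline</h3>
  <h3 class="cd__headline"><a href="/a"> First story </a></h3>
</body></html>
"""


def extract_headlines(html, tag="h3", css_class="cd__headline"):
    """Return unique, non-empty headline strings in page order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    headlines = []
    for node in soup.find_all(tag, class_=css_class):
        text = node.get_text(strip=True)
        # Skip empty nodes and headlines already collected
        if text and text not in seen:
            seen.add(text)
            headlines.append(text)
    return headlines


if __name__ == "__main__":
    for i, headline in enumerate(extract_headlines(SAMPLE_HTML), start=1):
        print(f"{i}. {headline}")
```

With the tag and class exposed as parameters, adapting to another site means passing different arguments rather than editing the function body.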

Notes:

Change url to the news website you want to scrape.
Adjust the tag and class in find_all based on the target site’s HTML structure.
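The User-Agent header makes the request look like it comes from a browser, which some sites require before returning the full page. It does not guarantee that you won't be blocked.
This is a simple scraper: it fetches the page once and only sees HTML present in the initial response. For continuously updated data, or for pages that render headlines with JavaScript, consider APIs or headless browsers.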
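
Since the script fetches the page only once, one way to approximate "real-time" output is to call it on a schedule and print only headlines that have not appeared before. The sketch below is an illustration under assumptions: `poll_new_headlines`, its parameters, and the stub fetcher are hypothetical names, and a real run would pass something like `lambda: get_real_time_headlines(url)` as `fetch`. Keep the interval generous so you respect the target site's terms and rate limits.

```python
import time


def poll_new_headlines(fetch, interval_seconds=60.0, max_polls=3, sleep=time.sleep):
    """Call fetch() up to max_polls times and yield only unseen headlines.

    fetch is any zero-argument callable returning a list of strings.
    sleep is injectable so the loop can be exercised without waiting.
    """
    seen = set()
    for poll in range(max_polls):
        for headline in fetch():
            if headline not in seen:
                seen.add(headline)
                yield headline
        # No need to wait after the final poll
        if poll < max_polls - 1:
            sleep(interval_seconds)


if __name__ == "__main__":
    # Stub fetcher standing in for get_real_time_headlines(url)
    batches = iter([["A", "B"], ["B", "C"], ["C"]])
    for headline in poll_new_headlines(lambda: next(batches), interval_seconds=0):
        print(headline)
```

The `seen` set grows for as long as the loop runs, so a long-lived poller would need to cap or expire it.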
