The Palos Publishing Company

Track daily temperature using web scraping

Tracking daily temperature using web scraping is a practical solution for individuals or businesses that need localized and regularly updated weather data. With access to online weather sources, a simple scraper can automate temperature data collection, storing it for analysis or display. Below is a complete guide on how to track daily temperature using web scraping, including tools, code examples, and precautions.


1. Understanding Web Scraping for Weather Data

Web scraping involves programmatically extracting data from websites. Weather forecasting sites often present temperature data in structured formats such as HTML tables, tags with specific classes/IDs, or even embedded in JSON. With the right tools, this data can be fetched, parsed, and stored for continuous monitoring.

Popular websites used for weather data scraping include:

  • Weather.com

  • AccuWeather

  • National Weather Service (weather.gov)

  • OpenWeatherMap (via API)

  • Time and Date (timeanddate.com/weather)

Ensure you read and respect the site’s terms of service before scraping.
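Many weather pages embed their readings as JSON inside `<script>` tags or behind lightweight API endpoints, which is often easier to parse than HTML. A minimal illustration with Python's standard library (the payload shown here is invented, not from any real site):

```python
import json

# Invented example payload; real weather pages often embed similar JSON
# inside <script> tags or serve it from an API endpoint.
payload = '{"current": {"temp_c": 21.5, "condition": "Clear"}}'
data = json.loads(payload)
print(data["current"]["temp_c"])  # 21.5
```

When such JSON exists, prefer it over HTML parsing: the structure is explicit and less likely to break when the page layout changes.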


2. Tools and Technologies Required

To implement a temperature-tracking system via web scraping, the following tools are recommended:

  • Python (programming language)

  • BeautifulSoup (HTML parsing library)

  • Requests (for making HTTP requests)

  • Pandas (for storing and processing data)

  • Schedule or Cron (for automation)

  • SQLite/MySQL/CSV (for storing scraped data)

Optional:

  • Selenium (for sites with JavaScript-rendered content)

  • LXML (faster HTML parser)
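Before writing any code, the libraries above can be installed with pip (package names assumed to be the standard PyPI ones):

```shell
pip install requests beautifulsoup4 pandas schedule lxml selenium
```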


3. Sample Python Script for Scraping Temperature

Here’s a basic example scraping current temperature from timeanddate.com:

python
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import csv

def get_temperature(city="new-york", country="usa"):
    url = f"https://www.timeanddate.com/weather/{country}/{city}"
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    temperature_element = soup.find("div", class_="h2")
    temperature = temperature_element.text.strip() if temperature_element else "N/A"
    return temperature

def save_temperature_data(city, country):
    temperature = get_temperature(city, country)
    now = datetime.now()
    data = {
        "datetime": now.strftime("%Y-%m-%d %H:%M:%S"),
        "temperature": temperature
    }
    filename = f"{city}_{country}_temperature.csv"
    with open(filename, mode='a', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=["datetime", "temperature"])
        if file.tell() == 0:
            writer.writeheader()
        writer.writerow(data)

# Example usage
save_temperature_data("new-york", "usa")

This script fetches the current temperature, logs the datetime, and appends it to a CSV file for later use.


4. Automating the Scraper

You can automate the script to run daily using:

  • Schedule (Python)

python
import schedule
import time

schedule.every().day.at("08:00").do(save_temperature_data, city="new-york", country="usa")

while True:
    schedule.run_pending()
    time.sleep(60)
  • Cron Jobs (Linux/Mac)

Add the following to crontab -e:

bash
0 8 * * * /usr/bin/python3 /path/to/your/script.py

5. Storing Data in a Database

If storing in a database is preferred for scalability:

python
import sqlite3
from datetime import datetime

def save_to_db(city, country, temperature):
    conn = sqlite3.connect("weather.db")
    c = conn.cursor()
    c.execute('''CREATE TABLE IF NOT EXISTS temperature_data (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    city TEXT,
                    country TEXT,
                    datetime TEXT,
                    temperature TEXT
                )''')
    c.execute("INSERT INTO temperature_data (city, country, datetime, temperature) VALUES (?, ?, ?, ?)",
              (city, country, datetime.now().strftime("%Y-%m-%d %H:%M:%S"), temperature))
    conn.commit()
    conn.close()

Combine this with the scraping logic to save directly to the database.
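A minimal sketch of that combined flow. To keep it self-contained, the database connection is passed in (so an in-memory database can stand in for `weather.db`) and a hard-coded reading replaces the live `get_temperature()` call:

```python
import sqlite3
from datetime import datetime

def save_to_db(conn, city, country, temperature):
    # Same schema as above; the connection is injected so callers can
    # point it at weather.db or at an in-memory database for testing.
    c = conn.cursor()
    c.execute('''CREATE TABLE IF NOT EXISTS temperature_data (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    city TEXT, country TEXT, datetime TEXT, temperature TEXT)''')
    c.execute("INSERT INTO temperature_data (city, country, datetime, temperature) VALUES (?, ?, ?, ?)",
              (city, country, datetime.now().strftime("%Y-%m-%d %H:%M:%S"), temperature))
    conn.commit()

conn = sqlite3.connect(":memory:")
save_to_db(conn, "new-york", "usa", "21°C")  # in practice: get_temperature(city, country)
row = conn.execute("SELECT city, temperature FROM temperature_data").fetchone()
print(row)  # ('new-york', '21°C')
```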


6. Visualizing and Analyzing Data

Using Pandas and Matplotlib, you can visualize the tracked temperatures:

python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("new-york_usa_temperature.csv")
df['datetime'] = pd.to_datetime(df['datetime'])
# Extract the numeric part of readings such as "21 °C"
df['temperature'] = df['temperature'].str.extract(r'([-+]?\d+)', expand=False).astype(float)

plt.figure(figsize=(10, 5))
plt.plot(df['datetime'], df['temperature'], marker='o')
plt.title('Daily Temperature Over Time')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.grid(True)
plt.tight_layout()
plt.show()

7. Dealing with JavaScript-rendered Sites

For websites that do not load weather data statically (i.e., they use JavaScript):

python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # options.headless is deprecated in Selenium 4
driver = webdriver.Chrome(service=Service("/path/to/chromedriver"), options=options)
driver.get("https://example-weather-site.com")
temperature = driver.find_element(By.CLASS_NAME, "temperature").text
driver.quit()

Selenium allows browser automation, emulating a real user session, which is useful when static scraping fails.


8. Legal and Ethical Considerations

  • Rate limiting: Avoid frequent requests that can overload servers.

  • Terms of Service: Always check the website’s policy to ensure you’re not violating scraping rules.

  • Respect robots.txt: This file indicates what parts of the site are off-limits for bots.
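Python's standard library can check robots.txt rules for you. The rules below are hypothetical, for illustration; a real site serves its own file at `https://<site>/robots.txt`:

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration only.
rules = """User-agent: *
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)
allowed = rp.can_fetch("MyWeatherBot", "https://example.com/weather/usa/new-york")
blocked = rp.can_fetch("MyWeatherBot", "https://example.com/private/data")
print(allowed, blocked)  # True False
```

For a live site, call `rp.set_url("https://<site>/robots.txt")` followed by `rp.read()` instead of `parse()`.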

If you require high-frequency or commercial-grade access, consider official weather APIs such as:

  • OpenWeatherMap API

  • WeatherStack API

  • Climacell/Tomorrow.io API

  • Visual Crossing Weather API
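As a hedged sketch, OpenWeatherMap's current-weather endpoint takes a city query and an API key; the snippet below only builds the request URL rather than sending it, since a real call requires signing up for a key:

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder; obtain a free key from openweathermap.org
params = {"q": "New York,US", "units": "metric", "appid": API_KEY}
url = "https://api.openweathermap.org/data/2.5/weather?" + urlencode(params)
print(url)
# With a valid key: requests.get(url).json()["main"]["temp"] gives the reading in °C
```

An API like this returns structured JSON with documented fields, which removes the fragility of HTML scraping entirely.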


9. Advantages of Daily Temperature Tracking

  • Monitor climate change patterns

  • Compare year-over-year temperature data

  • Support agricultural or logistics operations

  • Power personal weather dashboards

  • Integrate with smart home systems for automation


10. Conclusion

Web scraping for daily temperature tracking is a powerful technique when used responsibly. With minimal setup, a scraper can pull and store weather data for further analysis or real-time monitoring. As your data grows, integrating visualizations and predictive analytics can add deeper insights. Always respect the data source and consider using official APIs if your needs scale beyond occasional personal use.
