Monitor server health with Python

Monitoring server health with Python is an effective way to ensure your infrastructure runs smoothly and proactively detect issues before they escalate. By leveraging Python’s versatile libraries and scripting capabilities, you can track key performance indicators such as CPU usage, memory consumption, disk space, network activity, and process status. This article outlines practical approaches to build a server health monitoring system using Python, along with example code snippets to get started.

Key Metrics to Monitor

CPU Usage: High CPU utilization can indicate processes consuming excessive resources or potential overload.
Memory Usage: Monitoring RAM ensures applications have enough memory and detects leaks or spikes.
Disk Space: Tracking free space prevents failures caused by full disks, which can halt services or corrupt data.
Network Traffic: Monitors bandwidth and detects abnormal spikes which might indicate issues or attacks.
Process Monitoring: Ensures critical services are running and restarts them if necessary.
System Load: Measures average system load to understand overall server stress.
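
System load can be read with the standard library alone. The sketch below is a minimal illustration, assuming a Unix-like host (os.getloadavg() is not available on Windows); the helper names load_per_core and current_load_per_core are hypothetical. It divides the 1-, 5- and 15-minute load averages by the number of CPU cores, so a value near or above 1.0 suggests the machine has more runnable work than cores.

```python
import os

def load_per_core(load_averages, cpu_count):
    """Divide each load average by the core count (pure helper, easy to test)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(value / cpu_count for value in load_averages)

def current_load_per_core():
    """Read the 1-, 5- and 15-minute load averages for this host (Unix only)."""
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    return load_per_core(os.getloadavg(), cpus)

if __name__ == "__main__":
    try:
        one, five, fifteen = current_load_per_core()
        print(f"Load per core: 1m={one:.2f}, 5m={five:.2f}, 15m={fifteen:.2f}")
    except (AttributeError, OSError):
        # AttributeError: platform has no getloadavg(); OSError: value unobtainable
        print("Load average is not available on this platform")
```

Normalizing by core count makes one threshold reusable across machines of different sizes; the raw figures alone are hard to compare between a 2-core and a 32-core server.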

Essential Python Libraries for Server Monitoring

psutil: Cross-platform library for retrieving system information such as CPU, memory, disk, and network.
subprocess: To execute system commands when needed.
socket: For network-related health checks.
smtplib/email: To send alert notifications via email.
logging: To keep track of events and errors.
time/schedule: To run monitoring scripts at regular intervals (time is part of the standard library; schedule is a third-party package installed with pip).
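
As an illustration of a network-related health check with socket, the sketch below tests whether a TCP port accepts connections. The function name port_is_open and the 3-second default timeout are assumptions chosen for the example, not part of any library.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects; the with block
        # closes the socket as soon as the check is done
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

A call such as port_is_open("127.0.0.1", 80) only confirms that something is listening on the port. It does not confirm that the service behind it is answering requests correctly.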

Setting Up a Basic Server Health Monitor

Install the psutil library if not already installed:

bash
pip install psutil

Monitoring CPU and Memory Usage

python
import psutil

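def get_cpu_usage():
    # interval=1 blocks for one second and returns utilization over that window
    return psutil.cpu_percent(interval=1)

def get_memory_usage():
    # percent is the share of physical RAM that is not available to new processes
    mem = psutil.virtual_memory()
    return mem.percent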

cpu = get_cpu_usage()
memory = get_memory_usage()

print(f"CPU Usage: {cpu}%")
print(f"Memory Usage: {memory}%")

Checking Disk Usage

python
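import psutil

def get_disk_usage(partition="/"):
    # Reports on the filesystem containing this path; on Windows pass a drive such as "C:\\"
    disk = psutil.disk_usage(partition)
    return disk.percent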

disk_usage = get_disk_usage()
print(f"Disk Usage: {disk_usage}%")
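
Servers often have more than one filesystem worth watching. The sketch below checks several paths against a threshold. To keep it dependency-free it uses shutil.disk_usage from the standard library rather than psutil, so its percentages can differ slightly from psutil's figure; the function names and the 90% default are illustrative assumptions.

```python
import shutil

def disk_percent_used(path="/"):
    """Return the percentage in use on the filesystem that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # some virtual filesystems report a size of zero
    return round(usage.used / usage.total * 100, 1)

def full_disks(paths, threshold=90.0):
    """Return {path: percent} for every path at or above the threshold."""
    results = {}
    for path in paths:
        percent = disk_percent_used(path)
        if percent >= threshold:
            results[path] = percent
    return results
```

For example, full_disks(["/", "/var", "/home"]) would return only the mount points that need attention, assuming those paths exist on the host.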

Monitoring Network Usage

python
import psutil

def get_network_stats():
    # These counters are cumulative totals across all interfaces, not a rate
    net_io = psutil.net_io_counters()
    return net_io.bytes_sent, net_io.bytes_recv

sent, recv = get_network_stats()
print(f"Bytes Sent: {sent}, Bytes Received: {recv}")
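
Because psutil.net_io_counters() returns running totals, detecting a traffic spike requires comparing two readings taken some time apart. The sketch below shows one way to turn a pair of readings into bytes per second. NetSample, bytes_per_second and measure_network_rate are hypothetical names introduced for this example.

```python
import time
from typing import NamedTuple

class NetSample(NamedTuple):
    """Stand-in for a counter reading; psutil's result has the same two fields."""
    bytes_sent: int
    bytes_recv: int

def bytes_per_second(before, after, elapsed_seconds):
    """Convert two cumulative counter readings into send/receive rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    sent_rate = (after.bytes_sent - before.bytes_sent) / elapsed_seconds
    recv_rate = (after.bytes_recv - before.bytes_recv) / elapsed_seconds
    return sent_rate, recv_rate

def measure_network_rate(interval=1.0):
    """Sample psutil's counters twice, interval seconds apart."""
    import psutil  # third-party; imported here so the helper above has no dependency

    first = psutil.net_io_counters()
    start = time.monotonic()
    time.sleep(interval)
    second = psutil.net_io_counters()
    elapsed = time.monotonic() - start  # measured, since sleep() is not exact
    return bytes_per_second(first, second, elapsed)
```

Keeping the arithmetic in bytes_per_second separate from the sampling makes it possible to test the calculation with fixed numbers and to reuse it when readings come from a log instead of a live system.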

Monitoring Specific Processes

To check if a critical process (e.g., nginx) is running:

python
import psutil

def check_process(name):
    # Requesting only 'name' keeps the scan cheap; the match is exact and case-sensitive
    for proc in psutil.process_iter(['name']):
        if proc.info['name'] == name:
            return True
    return False

process_name = "nginx"
if check_process(process_name):
    print(f"{process_name} is running")
else:
    print(f"{process_name} is NOT running")
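
Detecting a stopped service is often paired with an attempt to restart it. The sketch below is one possible approach, assuming the host uses systemd and the script runs with permission to restart services; neither assumption comes from the article. The function name ensure_running is hypothetical, and the is_running argument is meant to receive a function like check_process above.

```python
import subprocess

def ensure_running(service, is_running, run=subprocess.run):
    """Restart service via systemctl when is_running(service) is False.

    Returns "running", "restarted" or "restart_failed".
    The run parameter exists so tests can substitute a fake command runner.
    """
    if is_running(service):
        return "running"
    try:
        # Assumption: systemd manages the service under this unit name
        result = run(["systemctl", "restart", service],
                     capture_output=True, text=True)
    except FileNotFoundError:
        return "restart_failed"  # systemctl is not installed on this host
    return "restarted" if result.returncode == 0 else "restart_failed"
```

A call might look like ensure_running("nginx", check_process). Automatic restarts can hide a recurring crash, so it is worth logging or alerting on every "restarted" result as well.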

Automating Alerts

You can configure email alerts when certain thresholds are exceeded.

Example: Send alert if CPU usage exceeds 80%.

python
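import os
import smtplib
from email.mime.text import MIMEText

def send_alert(subject, message, to_email):
    # Prefer environment variables over hardcoded credentials.
    # ALERT_EMAIL and ALERT_PASSWORD are example names, not a convention.
    from_email = os.environ.get("ALERT_EMAIL", "your_email@example.com")
    from_password = os.environ.get("ALERT_PASSWORD", "your_password")

    msg = MIMEText(message)
    msg['Subject'] = subject
    msg['From'] = from_email
    msg['To'] = to_email

    # The with block closes the connection even if login or sending fails
    with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server:
        server.login(from_email, from_password)
        server.sendmail(from_email, to_email, msg.as_string())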

cpu_threshold = 80
cpu = get_cpu_usage()

if cpu > cpu_threshold:
    send_alert("High CPU Usage Alert", f"CPU usage has reached {cpu}%", "admin@example.com")

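Note: For Gmail, you will generally need an app password, which requires 2-Step Verification on the account. Google has retired the older "less secure apps" setting, so a regular account password is unlikely to work for SMTP logins.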
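
The same pattern extends to several metrics at once. The sketch below compares a dictionary of readings against a dictionary of limits and returns one message per breach. The function name find_breaches and the limits shown are illustrative assumptions, and the readings are made-up numbers rather than real measurements.

```python
def find_breaches(metrics, thresholds):
    """Return an alert message for every metric strictly above its threshold."""
    messages = []
    for name, limit in thresholds.items():
        value = metrics.get(name)  # a metric with no reading is skipped
        if value is not None and value > limit:
            messages.append(f"{name} is at {value}% (threshold {limit}%)")
    return messages

if __name__ == "__main__":
    thresholds = {"cpu": 80, "memory": 85, "disk": 90}   # illustrative limits
    sample = {"cpu": 91.5, "memory": 40.2, "disk": 90}   # made-up readings
    for line in find_breaches(sample, thresholds):
        print(line)
```

Joining the returned messages into a single email body keeps the number of alerts down when several metrics cross their limits in the same check.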

Scheduling Periodic Checks

Use the schedule library to automate monitoring at intervals.

bash
pip install schedule

Example script to run checks every 5 minutes:

python
import schedule
import time

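def monitor():
    # Reuses the helper functions defined in the earlier snippets
    cpu = get_cpu_usage()
    memory = get_memory_usage()
    disk = get_disk_usage()
    print(f"CPU: {cpu}%, Memory: {memory}%, Disk: {disk}%")

schedule.every(5).minutes.do(monitor)

while True:
    # run_pending() runs any job that is due; the sleep avoids a busy loop
    schedule.run_pending()
    time.sleep(1)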
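
When checks run on a schedule, a sustained problem would trigger a new alert on every run. One common mitigation is a cooldown that suppresses repeat alerts for the same metric for a while. The sketch below is a minimal in-memory version; the class name AlertCooldown and the 30-minute default are assumptions, and the state is lost when the script restarts.

```python
import time

class AlertCooldown:
    """Allow at most one alert per key within cooldown_seconds."""

    def __init__(self, cooldown_seconds=1800, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock       # injectable so tests can control time
        self._last_sent = {}     # key -> clock reading of the last alert

    def should_send(self, key):
        """Return True and record the time if an alert for key may go out now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[key] = now
        return True
```

Inside a scheduled job, wrapping the alert call in a check such as cooldown.should_send("cpu") limits email for one long CPU spike to one message per cooldown window.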

Building a Dashboard or Logging

For more advanced monitoring, you can log data to a file or database, or visualize metrics using tools like:

Prometheus + Grafana (a Python script exposes metrics that Prometheus scrapes and Grafana charts)
Flask/Django web dashboards
Matplotlib/Plotly for local visualization
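
As a starting point for the logging option, the sketch below writes each set of readings as one timestamped JSON line to a rotating log file using only the standard library. The logger name server_health, the function names and the size limits are illustrative assumptions.

```python
import json
import logging
from logging.handlers import RotatingFileHandler

def build_metrics_logger(path, max_bytes=1_000_000, backups=3):
    """Create (or reuse) a logger that writes to a size-limited rotating file."""
    logger = logging.getLogger("server_health")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger

def log_metrics(logger, metrics):
    """Write one line of JSON; sorted keys keep the lines easy to compare."""
    logger.info(json.dumps(metrics, sort_keys=True))
```

One JSON object per line is easy to load later for plotting or to ship to another system, and rotation keeps the log from becoming a disk space problem of its own.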

Conclusion

Python offers a flexible and powerful toolkit for monitoring server health through simple scripts that check vital metrics, alert on anomalies, and help maintain uptime. By customizing and extending these basic examples, you can create a robust monitoring solution tailored to your infrastructure’s needs. Regular monitoring improves reliability, performance, and incident response, making it an essential practice for system administrators and DevOps teams.
