
Build a script to identify outdated links

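Here’s a Python script that identifies outdated or broken links in a list of URLs. It checks each link’s HTTP response and flags any that return a status code of 400 or higher, which covers client errors such as “Not Found” (404) and “Forbidden” (403) as well as server errors such as 500. URLs that cannot be reached at all (for example, because the domain no longer resolves or the request times out) are reported as errors.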

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# List of URLs to check; replace or expand with your own links
urls_to_check = [
    "https://example.com",
    "https://nonexistentdomain.com/page",
    "https://httpstat.us/404",
    "https://httpstat.us/500",
    "https://httpstat.us/200",
]

# Timeout in seconds for each request
TIMEOUT = 10


def check_url(url):
    """Return (url, status, result) for a single URL."""
    try:
        # HEAD fetches only the headers; follow redirects to the final target
        response = requests.head(url, timeout=TIMEOUT, allow_redirects=True)
        if response.status_code >= 400:
            return (url, response.status_code, 'Broken')
        else:
            return (url, response.status_code, 'OK')
    except requests.RequestException as e:
        # DNS failures, timeouts, SSL errors, refused connections, etc.
        return (url, 'N/A', f'Error: {e}')


def check_links(urls):
    start = time.time()
    # Check up to 10 URLs at a time; map() returns results in input order
    with ThreadPoolExecutor(max_workers=10) as executor:
        results = list(executor.map(check_url, urls))
    end = time.time()

    print(f"\nChecked {len(urls)} links in {round(end - start, 2)} seconds.\n")
    for url, status, result in results:
        print(f"{url} --> {status} [{result}]")


if __name__ == "__main__":
    check_links(urls_to_check)
```

How to Use:

  1. Replace urls_to_check with your list of URLs.

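  2. Install the requests library if you don’t already have it (pip install requests), then run the script using Python 3.

  3. The output will show each URL, its status code, and whether it’s OK, Broken, or failed with a connection error.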

Notes:

  • Uses requests.head() for faster checking without downloading full content. Use requests.get() if servers don’t support HEAD requests reliably.

  • ThreadPoolExecutor speeds up the process with parallel requests.

  • You can adapt it to read from a file or export broken URLs to a report.
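The first note suggests switching to requests.get() when a server doesn’t handle HEAD reliably. One way to sketch that is to try HEAD first and retry with GET only when HEAD is rejected. The function name check_url_with_fallback, the HEAD_UNSUPPORTED set, and the choice of 405 and 501 as the “HEAD not supported” signals are assumptions made for illustration; some servers answer HEAD with other codes, so adjust the set to what you see in practice.

```python
import requests

TIMEOUT = 10  # seconds, same value as the script above

# Assumption: these codes mean "HEAD is not supported here",
# not "the page is gone". Extend the set if your servers differ.
HEAD_UNSUPPORTED = {405, 501}


def check_url_with_fallback(url, session=requests):
    """Check a URL with HEAD, retrying with GET when HEAD is rejected.

    `session` may be the requests module itself or a requests.Session;
    both expose head() and get().
    """
    try:
        response = session.head(url, timeout=TIMEOUT, allow_redirects=True)
        if response.status_code in HEAD_UNSUPPORTED:
            # stream=True defers the body download; only the status is needed
            response = session.get(
                url, timeout=TIMEOUT, allow_redirects=True, stream=True
            )
            response.close()
        status = response.status_code
        return (url, status, "Broken" if status >= 400 else "OK")
    except requests.RequestException as exc:
        return (url, "N/A", f"Error: {exc}")
```

It returns the same (url, status, result) tuple as check_url, so it should work as a drop-in replacement in the executor.map() call.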
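The file input and report export mentioned in the last note could look roughly like the sketch below. The helper names load_urls and write_broken_report, the one-URL-per-line file layout, the “#” comment convention, and the CSV column names are all assumptions chosen for illustration, not part of the original script.

```python
import csv


def load_urls(path):
    """Read one URL per line, skipping blank lines and lines starting with '#'."""
    urls = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                urls.append(line)
    return urls


def write_broken_report(results, path):
    """Write every result whose label is not 'OK' to a CSV file.

    `results` is a list of (url, status, result) tuples, the shape
    returned by check_url. Returns the number of rows written.
    """
    broken = [row for row in results if row[2] != "OK"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "status", "result"])
        writer.writerows(broken)
    return len(broken)
```

To wire this in, check_links would need to return its results list instead of only printing it. You could then call load_urls("urls.txt") to build the input and write_broken_report(results, "broken_links.csv") to save the output, where both file names are placeholders.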

A natural next step is a version that crawls a site and checks its links automatically.
