The Palos Publishing Company

Categories We Write About

Extract titles from HTML pages


To extract titles from HTML pages, you can use several methods depending on your tools and programming language. Here’s a simple method using Python with BeautifulSoup (install it with pip install beautifulsoup4), a widely used HTML parsing library that exposes a page's <title> element directly as soup.title.

Python Script to Extract Titles from HTML Pages

```python
from bs4 import BeautifulSoup

# Example HTML content (it could also be read from a file or fetched from a URL)
html_content = """
<html>
<head>
    <title>Example Page Title</title>
</head>
<body>
    <h1>Main Heading</h1>
    <p>This is a sample page.</p>
</body>
</html>
"""

# Parse the HTML with Python's built-in parser backend
soup = BeautifulSoup(html_content, "html.parser")

# Extract the <title> text. get_text(strip=True) trims surrounding whitespace
# and, unlike .string, returns "" instead of None for an empty <title>.
title_text = soup.title.get_text(strip=True) if soup.title else ""
page_title = title_text or "No title found"

print("Page Title:", page_title)
```
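If adding a third-party dependency is not an option, Python's standard-library html.parser.HTMLParser can pull out the title instead. The sketch below is an alternative to BeautifulSoup rather than part of it; the class TitleParser and the function extract_title are hypothetical names, and it assumes the HTML has already been decoded to a string.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)  # text may arrive in several chunks

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace; a missing or empty title gives None.
        return " ".join("".join(self._parts).split()) or None


def extract_title(html: str) -> Optional[str]:
    """Return the text of the first <title> element, or None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

For example, extract_title(html_content) on the sample page above returns "Example Page Title". The trade-off is that this parser only captures the title, while BeautifulSoup builds a full tree you can reuse for other elements such as the <h1>.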

To Extract from Multiple HTML Files in a Folder

```python
import os
from bs4 import BeautifulSoup

folder_path = "path_to_html_files"

for filename in os.listdir(folder_path):
    # Match .html and .htm case-insensitively (e.g. INDEX.HTML)
    if filename.lower().endswith((".html", ".htm")):
        file_path = os.path.join(folder_path, filename)
        # errors="replace" keeps one badly encoded file from stopping the loop
        with open(file_path, "r", encoding="utf-8", errors="replace") as file:
            soup = BeautifulSoup(file, "html.parser")
        title_text = soup.title.get_text(strip=True) if soup.title else ""
        print(f"{filename}: {title_text or 'No title found'}")
```
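The loop above only scans the top level of the folder and prints its results. A minimal sketch that also walks subfolders and saves the titles to a CSV file could look like this. The function names bs4_title, collect_titles, and write_csv and the "file"/"title" column names are hypothetical choices; the extractor is passed in as a parameter so you can swap in a different title function, and it assumes the files are UTF-8 encoded.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Optional, Union


def bs4_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if it is missing or empty."""
    # Imported here so the scanning helpers below also work without bs4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True) or None


def collect_titles(
    folder: Union[str, Path],
    extract: Callable[[str], Optional[str]] = bs4_title,
) -> Dict[str, Optional[str]]:
    """Map every .html/.htm file under `folder`, including subfolders, to its title."""
    root = Path(folder)
    titles: Dict[str, Optional[str]] = {}
    for path in sorted(root.rglob("*")):
        # Match extensions case-insensitively so INDEX.HTM is not skipped.
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            # errors="replace" keeps one badly encoded file from aborting the scan.
            html = path.read_text(encoding="utf-8", errors="replace")
            titles[str(path.relative_to(root))] = extract(html)
    return titles


def write_csv(titles: Dict[str, Optional[str]], out_path: Union[str, Path]) -> None:
    """Write one (file, title) row per page; missing titles become empty cells."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "title"])
        for name, title in titles.items():
            writer.writerow([name, title or ""])
```

Usage: write_csv(collect_titles("path_to_html_files"), "titles.csv"). Keys are paths relative to the scanned folder, so files with the same name in different subfolders stay distinct.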

To Extract from URLs (Requires the requests Library)

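```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"
# A timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)
# Raise an error for 4xx/5xx responses instead of parsing an error page
response.raise_for_status()

# Passing raw bytes lets BeautifulSoup detect the page's encoding
soup = BeautifulSoup(response.content, "html.parser")
title_text = soup.title.get_text(strip=True) if soup.title else ""
print("Page Title:", title_text or "No title found")
```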
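When you have a list of URLs, one unreachable page should not stop the whole run. The sketch below swaps in the standard library's urllib.request in place of requests and records None for any URL that fails. The names fetch_titles and bs4_title_from_bytes and the User-Agent string are hypothetical, and the default extractor still assumes BeautifulSoup is installed.

```python
import http.client
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def bs4_title_from_bytes(body: bytes) -> Optional[str]:
    """Return the stripped <title> text, or None if it is missing or empty."""
    # Lazy import; given raw bytes, BeautifulSoup detects the encoding itself.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(body, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True) or None


def fetch_titles(
    urls: Iterable[str],
    extract: Callable[[bytes], Optional[str]] = bs4_title_from_bytes,
    timeout: float = 10.0,
) -> Dict[str, Optional[str]]:
    """Fetch each URL and map it to its title; failed requests map to None."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            # Hypothetical User-Agent; some sites block Python's default one.
            request = urllib.request.Request(
                url, headers={"User-Agent": "title-extractor/0.1"}
            )
            with urllib.request.urlopen(request, timeout=timeout) as response:
                body = response.read()
        except (OSError, ValueError, http.client.HTTPException):
            # URLError, HTTPError and timeouts are OSError subclasses;
            # ValueError covers strings that are not URLs at all.
            results[url] = None
            continue
        results[url] = extract(body)
    return results
```

Usage: fetch_titles(["https://example.com", "https://example.org"]) returns a dictionary keyed by URL, so failed fetches are easy to spot and retry later.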

The same approach can be adapted to other languages, such as JavaScript or PHP, and to environments such as browser extensions or WordPress.
