
Extract titles from HTML pages

To extract titles from HTML pages, you can use several methods depending on your tools and programming language. Here is a simple approach using Python with BeautifulSoup (installed with pip install beautifulsoup4), a widely used library that makes parsing HTML straightforward, even when the markup is not perfectly formed.


Python Script to Extract Titles from HTML Pages

python
from bs4 import BeautifulSoup

# Example: load HTML content (this can also come from a file or URL)
html_content = """
<html>
  <head>
    <title>Example Page Title</title>
  </head>
  <body>
    <h1>Main Heading</h1>
    <p>This is a sample page.</p>
  </body>
</html>
"""

# Parse the HTML
soup = BeautifulSoup(html_content, "html.parser")

# Extract the <title> text. soup.title is None when the tag is missing,
# and .string is None when the tag is empty, so check both.
page_title = soup.title.string if soup.title and soup.title.string else "No title found"
print("Page Title:", page_title)
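If installing BeautifulSoup is not an option, roughly the same result can be had with Python's standard library alone. The following is a minimal sketch, not part of the script above: the names TitleParser and extract_title are hypothetical, chosen for illustration, and it assumes the first <title> element on the page is the one you want.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any <title> after the first
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace; None when no non-empty title was seen
        text = " ".join("".join(self._chunks).split())
        return text or None


def extract_title(html, default="No title found"):
    """Return the page title from an HTML string, or `default` if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title or default


print(extract_title("<html><head><title>Example Page Title</title></head></html>"))
```

The standard-library parser is less forgiving of badly broken markup than BeautifulSoup, so this is best treated as a fallback for simple pages.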

To Extract from Multiple HTML Files in a Folder

python
import os
from bs4 import BeautifulSoup

folder_path = "path_to_html_files"

for filename in os.listdir(folder_path):
    # endswith accepts a tuple, so both extensions are checked in one call
    if filename.endswith((".html", ".htm")):
        with open(os.path.join(folder_path, filename), "r", encoding="utf-8") as file:
            soup = BeautifulSoup(file, "html.parser")
        # Fall back when the <title> tag is missing or empty
        title = soup.title.string if soup.title and soup.title.string else "No title found"
        print(f"{filename}: {title}")
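If the titles are needed by later code rather than just printed, the same loop can be wrapped in a function that returns them. This is a minimal sketch under a few assumptions: the function name collect_titles is hypothetical, files are read as bytes so that BeautifulSoup can detect each file's encoding instead of assuming UTF-8, and only the top level of the folder is scanned.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def collect_titles(folder_path, default="No title found"):
    """Return {filename: title} for .html/.htm files directly inside folder_path."""
    titles = {}
    for path in sorted(Path(folder_path).iterdir()):
        # Skip directories and anything without an HTML extension (case-insensitive)
        if not path.is_file() or path.suffix.lower() not in {".html", ".htm"}:
            continue
        # Passing bytes lets BeautifulSoup work out the encoding itself
        soup = BeautifulSoup(path.read_bytes(), "html.parser")
        text = soup.title.get_text(strip=True) if soup.title else ""
        titles[path.name] = text or default
    return titles
```

Calling collect_titles("path_to_html_files") returns a dictionary that can be looped over with .items() to print each filename and title, or passed on to whatever step comes next.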

To Extract from URLs (Optional)

python
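import requests
from bs4 import BeautifulSoup

url = "https://example.com"

# A timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)
# Raise an exception on 4xx/5xx responses instead of parsing an error page
response.raise_for_status()

soup = BeautifulSoup(response.content, "html.parser")
# Fall back when the <title> tag is missing or empty
title = soup.title.string if soup.title and soup.title.string else "No title found"
print("Page Title:", title)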

