Extract all links from web pages

To extract all links from web pages, you generally need to:

  1. Fetch the web page’s HTML content.

  2. Parse the HTML.

  3. Extract all <a> tags with their href attributes.

Here are some common methods for extracting links:


1. Using Python (with BeautifulSoup)

python
import requests
from bs4 import BeautifulSoup

url = 'https://example.com'  # Replace with your target URL
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

links = []
for a_tag in soup.find_all('a', href=True):
    links.append(a_tag['href'])

print(links)
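Note that BeautifulSoup returns each href exactly as it appears in the markup, so relative paths like /about come back unresolved. If you need absolute URLs, one option is to resolve each href against the page URL with urllib.parse.urljoin from the standard library; a minimal sketch, using the same placeholder URL:

python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://example.com'  # Replace with your target URL
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# urljoin resolves relative hrefs against the page URL and
# leaves already-absolute hrefs unchanged.
links = [urljoin(url, a_tag['href']) for a_tag in soup.find_all('a', href=True)]
print(links)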

2. Using JavaScript in Browser Console

Open the browser console (F12), then run:

javascript
const links = Array.from(document.querySelectorAll('a')).map(a => a.href);
console.log(links);
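Unlike the Python example, the href property of an anchor element gives the fully resolved absolute URL; if you want the raw attribute value exactly as written in the HTML, use a.getAttribute('href') instead.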

3. Using Command Line with wget and grep

bash
wget -qO- https://example.com | grep -oP '(?<=href=")[^"]*'
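This one-liner only matches double-quoted href values, and the -P (Perl-compatible regex) flag is available in GNU grep but not in the BSD grep that ships with macOS. It's handy for quick checks, but for anything beyond that, a real HTML parser like the Python example above is more reliable.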

