The Palos Publishing Company


Scrape job listings from multiple websites

Scraping job listings from multiple websites involves collecting structured data from job boards or company career pages. Here’s a general approach using Python and popular libraries like requests, BeautifulSoup, and optionally Selenium for dynamic sites:


Tools You’ll Need

  1. Python Libraries:

    • requests – to fetch HTML content

    • BeautifulSoup – to parse HTML

    • pandas – to structure the data (optional)

    • Selenium – for JavaScript-rendered pages

  2. Target Websites:

    • Examples: Indeed, Glassdoor, RemoteOK, WeWorkRemotely, Stack Overflow Jobs (discontinued, but archives exist), etc. Note that LinkedIn and Glassdoor prohibit scraping in their terms of service.


Basic Example for Static Sites

```python
import requests
from bs4 import BeautifulSoup

def scrape_indeed(query, location, pages=1):
    base_url = "https://www.indeed.com/jobs"
    job_listings = []
    for page in range(pages):
        # Indeed paginates results 10 at a time via the "start" parameter
        params = {"q": query, "l": location, "start": page * 10}
        response = requests.get(base_url, params=params)
        soup = BeautifulSoup(response.text, "html.parser")
        for div in soup.find_all("div", attrs={"class": "job_seen_beacon"}):
            title = div.find("h2", {"class": "jobTitle"})
            company = div.find("span", {"class": "companyName"})
            # Use a distinct name so we don't shadow the "location" parameter
            loc = div.find("div", {"class": "companyLocation"})
            summary = div.find("div", {"class": "job-snippet"})
            job_listings.append({
                "title": title.text.strip() if title else "",
                "company": company.text.strip() if company else "",
                "location": loc.text.strip() if loc else "",
                "summary": summary.text.strip() if summary else "",
            })
    return job_listings
```
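The parsing step can be exercised offline on a saved page, which is useful for testing selectors without hammering the live site. The snippet below is a minimal stand-in for an Indeed result card; the class names mirror the function above, but Indeed changes its markup frequently, so treat this purely as a parsing sketch:

```python
from bs4 import BeautifulSoup

# Minimal, made-up stand-in for one Indeed job card; real pages are far
# more complex and the class names change often.
html = """
<div class="job_seen_beacon">
  <h2 class="jobTitle">Python Developer</h2>
  <span class="companyName">Acme Corp</span>
  <div class="companyLocation">Remote</div>
  <div class="job-snippet">Build scrapers and pipelines.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
jobs = []
for div in soup.find_all("div", attrs={"class": "job_seen_beacon"}):
    title = div.find("h2", {"class": "jobTitle"})
    company = div.find("span", {"class": "companyName"})
    jobs.append({
        "title": title.text.strip() if title else "",
        "company": company.text.strip() if company else "",
    })

print(jobs)  # [{'title': 'Python Developer', 'company': 'Acme Corp'}]
```

Keeping the fetch and parse steps separate like this also makes it easy to cache raw HTML and re-parse it later when selectors break.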

For JavaScript-Heavy Sites (Use Selenium)

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager

def scrape_remoteok():
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.get("https://remoteok.io/remote-dev-jobs")
    jobs = []
    rows = driver.find_elements(By.XPATH, '//tr[@class="job"]')
    for row in rows:
        try:
            title = row.find_element(By.CLASS_NAME, "company_and_position").text
            company = row.find_element(By.CLASS_NAME, "companyLink").text
            location = row.find_element(By.CLASS_NAME, "location").text
            jobs.append({"title": title, "company": company, "location": location})
        except NoSuchElementException:
            # Skip rows that are missing any of the expected fields
            continue
    driver.quit()
    return jobs
```

Tips for Multi-Site Scraping

  • Throttle requests to avoid being blocked (e.g., use time.sleep).

  • Rotate user agents or use proxies for stealth.

  • Respect robots.txt – scraping some sites like LinkedIn or Glassdoor violates their terms.

  • Use official APIs where available (e.g., the Lever and Greenhouse job-board APIs; note that GitHub Jobs has been retired).
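The first three tips can be combined into a small stdlib-only helper. This is a sketch: the user-agent strings are illustrative placeholders, and the robots.txt rules shown are a made-up example:

```python
import random
import time
from urllib.robotparser import RobotFileParser

# Illustrative user-agent strings; rotating them makes traffic look less
# like a single bot (substitute real browser strings in practice).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def pick_user_agent():
    return random.choice(USER_AGENTS)

def allowed_by_robots(robots_lines, url, agent="*"):
    """Check robots.txt rules (given as a list of lines) before fetching a URL."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(agent, url)

# Example: a robots.txt that blocks /private but allows everything else.
robots = ["User-agent: *", "Disallow: /private"]
print(allowed_by_robots(robots, "https://example.com/jobs"))     # True
print(allowed_by_robots(robots, "https://example.com/private"))  # False

# Between requests, throttle with a small random delay, e.g.:
# time.sleep(random.uniform(1, 3))
```

In a real scraper you would fetch each site's live robots.txt with `RobotFileParser.set_url(...)` and `read()`; parsing a list of lines as above just keeps the example self-contained.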


Optional: Save to CSV

```python
import pandas as pd

def save_to_csv(jobs, filename="jobs.csv"):
    df = pd.DataFrame(jobs)
    df.to_csv(filename, index=False)
```
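When scraping multiple sites, a natural extension is to merge all results and drop duplicates before saving. A minimal sketch with pandas; the sample job dicts below are made up for illustration:

```python
import pandas as pd

# Hypothetical results from two different site scrapers; one job overlaps.
indeed_jobs = [
    {"title": "Python Developer", "company": "Acme", "location": "Remote"},
    {"title": "Data Engineer", "company": "Globex", "location": "NYC"},
]
remoteok_jobs = [
    {"title": "Python Developer", "company": "Acme", "location": "Remote"},
]

# Concatenate, de-duplicate on all columns, and save one combined CSV.
df = pd.concat([pd.DataFrame(indeed_jobs), pd.DataFrame(remoteok_jobs)])
df = df.drop_duplicates().reset_index(drop=True)
df.to_csv("all_jobs.csv", index=False)
print(len(df))  # 2
```

Exact-match de-duplication is crude; in practice you may want to normalize titles and company names first, since the same posting often differs slightly between boards.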

