The Palos Publishing Company

Scrape restaurant menus for dietary tags

Scraping restaurant menus for dietary tags (like vegan, gluten-free, nut-free, dairy-free, keto, etc.) involves several steps: identifying sources, extracting the menu text, and processing it to detect or infer dietary tags.

Here’s a Python-based approach using requests and BeautifulSoup, suitable for restaurant websites that serve static HTML menus. JavaScript-heavy platforms like Uber Eats or DoorDash render menus client-side and require headless browsers or APIs instead.


Step-by-Step Python Script

python
import re

import requests
from bs4 import BeautifulSoup

# Example: scraping a sample restaurant menu page
URL = "https://www.example.com/menu"
HEADERS = {'User-Agent': 'Mozilla/5.0'}

# Common dietary keywords, mapped to flexible regex patterns
# (so "gluten free" and "gluten-free" both match)
DIETARY_TAGS = {
    'vegan': r'\bvegan\b',
    'vegetarian': r'\bvegetarian\b',
    'gluten-free': r'gluten[- ]?free',
    'dairy-free': r'dairy[- ]?free',
    'nut-free': r'nut[- ]?free',
    'keto': r'\bketo\b',
    'paleo': r'\bpaleo\b',
    'halal': r'\bhalal\b',
    'kosher': r'\bkosher\b',
}

def fetch_html(url):
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text

def extract_menu_items(html):
    soup = BeautifulSoup(html, 'html.parser')
    menu_items = []
    # string=True restricts the search to leaf elements containing a single
    # text node, which avoids reporting the same text again in parent tags
    for item in soup.find_all(['p', 'li', 'div', 'span'], string=True):
        text = item.get_text(separator=' ', strip=True).lower()
        if len(text) > 20:  # filter short non-descriptive lines
            tags_found = [tag for tag, pattern in DIETARY_TAGS.items()
                          if re.search(pattern, text)]
            if tags_found:
                menu_items.append({
                    'description': text,
                    'tags': tags_found
                })
    return menu_items

def main():
    html = fetch_html(URL)
    menu_items = extract_menu_items(html)
    for item in menu_items:
        print(f"- {item['description']}")
        print(f"  Tags: {', '.join(item['tags'])}")

if __name__ == "__main__":
    main()

Output Example

text
- grilled tofu salad with quinoa, almonds, and tahini dressing (vegan, gluten-free)
  Tags: vegan, gluten-free
- chicken tikka masala with basmati rice (gluten-free)
  Tags: gluten-free

Notes

  • Tag Detection: The script uses regex to flexibly detect variations like “gluten free” or “gluten-free”.

  • Website Compatibility: For JavaScript-heavy pages (e.g., those built with React), use Selenium or Playwright.

  • Structured APIs: If available (e.g., Yelp, Zomato), they’re a better long-term solution.

  • Ethics/Legality: Always respect robots.txt and Terms of Service.
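On the robots.txt point, Python's standard library can check whether a path is crawlable before you fetch it. A minimal sketch (the robots.txt content below is a made-up example; in practice you would load the site's real file with `set_url()` and `read()`):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /admin/
Allow: /menu
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS)

# Check specific paths before scraping them
print(parser.can_fetch("Mozilla/5.0", "https://www.example.com/menu"))    # True
print(parser.can_fetch("Mozilla/5.0", "https://www.example.com/admin/"))  # False
```

Calling `can_fetch()` inside `fetch_html()` and skipping disallowed URLs keeps the scraper within the site's stated crawling rules.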

Would you like a version of this script that works with multiple restaurant URLs or saves the data to a CSV/JSON file?
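As a sketch of that extension (the output filename and sample data here are placeholders), the dictionaries returned by `extract_menu_items` can be written out with the standard library's csv module:

```python
import csv

def save_to_csv(menu_items, path):
    """Write scraped menu items (dicts with 'description' and 'tags') to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["description", "tags"])
        writer.writeheader()
        for item in menu_items:
            # Join the tag list so each row stores a single comma-separated string
            writer.writerow({"description": item["description"],
                             "tags": ", ".join(item["tags"])})

# Example with hand-written sample data:
sample = [{"description": "grilled tofu salad", "tags": ["vegan", "gluten-free"]}]
save_to_csv(sample, "menu_tags.csv")
```

For multiple restaurants, loop over a list of URLs, call `fetch_html` and `extract_menu_items` for each, and accumulate the results into one list before saving.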
