The Palos Publishing Company

Scrape restaurant menus for dietary tags

Scraping restaurant menus for dietary tags (like vegan, gluten-free, nut-free, dairy-free, keto, etc.) involves several steps: identifying sources, extracting the menu text, and processing it to detect or infer dietary tags.

Here’s a Python-based approach using requests and BeautifulSoup, suitable for restaurant websites that serve static HTML menus. JavaScript-heavy platforms like Uber Eats or DoorDash render menus client-side and require headless browsers or APIs instead.


Step-by-Step Python Script

python
import re

import requests
from bs4 import BeautifulSoup

# Example: scraping a sample restaurant menu page
URL = "https://www.example.com/menu"
HEADERS = {'User-Agent': 'Mozilla/5.0'}

# Common dietary keywords, mapped to flexible regex patterns
# (so "gluten free" and "gluten-free" both match)
DIETARY_TAGS = {
    'vegan': r'\bvegan\b',
    'vegetarian': r'\bvegetarian\b',
    'gluten-free': r'gluten[- ]?free',
    'dairy-free': r'dairy[- ]?free',
    'nut-free': r'nut[- ]?free',
    'keto': r'\bketo\b',
    'paleo': r'\bpaleo\b',
    'halal': r'\bhalal\b',
    'kosher': r'\bkosher\b',
}

def fetch_html(url):
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text

def extract_menu_items(html):
    soup = BeautifulSoup(html, 'html.parser')
    menu_items = []
    # string=True restricts the search to leaf elements containing a single
    # text node, which avoids reporting the same text again in parent tags
    for item in soup.find_all(['p', 'li', 'div', 'span'], string=True):
        text = item.get_text(separator=' ', strip=True).lower()
        if len(text) > 20:  # filter short non-descriptive lines
            tags_found = [tag for tag, pattern in DIETARY_TAGS.items()
                          if re.search(pattern, text)]
            if tags_found:
                menu_items.append({
                    'description': text,
                    'tags': tags_found
                })
    return menu_items

def main():
    html = fetch_html(URL)
    menu_items = extract_menu_items(html)
    for item in menu_items:
        print(f"- {item['description']}")
        print(f"  Tags: {', '.join(item['tags'])}")

if __name__ == "__main__":
    main()

Output Example

text
- grilled tofu salad with quinoa, almonds, and tahini dressing (vegan, gluten-free)
  Tags: vegan, gluten-free
- chicken tikka masala with basmati rice (gluten-free)
  Tags: gluten-free

Notes

  • Tag Detection: The script uses regex to flexibly detect variations like “gluten free” or “gluten-free”.

  • Website Compatibility: For JavaScript-heavy pages (e.g., those built with React), use Selenium or Playwright.

  • Structured APIs: If available (e.g., Yelp, Zomato), they’re a better long-term solution.

  • Ethics/Legality: Always respect robots.txt and Terms of Service.
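On the robots.txt point, Python's standard library can check whether a path is crawlable before you fetch it. A minimal sketch (the robots.txt content below is a made-up example; in practice you would load the site's real file with `set_url()` and `read()`):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /admin/
Allow: /menu
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS)

# Check specific paths before scraping them
print(parser.can_fetch("Mozilla/5.0", "https://www.example.com/menu"))    # True
print(parser.can_fetch("Mozilla/5.0", "https://www.example.com/admin/"))  # False
```

Calling `can_fetch()` inside `fetch_html()` and skipping disallowed URLs keeps the scraper within the site's stated crawling rules.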

Would you like a version of this script that works with multiple restaurant URLs or saves the data to a CSV/JSON file?
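As a sketch of that extension (the output filename and sample data here are placeholders), the dictionaries returned by `extract_menu_items` can be written out with the standard library's csv module:

```python
import csv

def save_to_csv(menu_items, path):
    """Write scraped menu items (dicts with 'description' and 'tags') to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["description", "tags"])
        writer.writeheader()
        for item in menu_items:
            # Join the tag list so each row stores a single comma-separated string
            writer.writerow({"description": item["description"],
                             "tags": ", ".join(item["tags"])})

# Example with hand-written sample data:
sample = [{"description": "grilled tofu salad", "tags": ["vegan", "gluten-free"]}]
save_to_csv(sample, "menu_tags.csv")
```

For multiple restaurants, loop over a list of URLs, call `fetch_html` and `extract_menu_items` for each, and accumulate the results into one list before saving.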
