Scrape data from interactive charts

Scraping data from interactive charts, such as those built with JavaScript libraries like Chart.js, Highcharts, D3.js, or Google Charts, is more involved than scraping static HTML tables or text, because the data usually arrives through JavaScript rather than sitting in the markup. Here’s a concise guide on how to approach it:

1. Understand the Chart Type and Data Source

Interactive charts often load data dynamically via:

JavaScript objects embedded in HTML.
External API calls (e.g., JSON or CSV fetched via fetch()).
Inline JavaScript code that initializes the chart.

Use browser dev tools (F12):

Inspect Network Tab: Filter by Fetch/XHR and look for requests that return JSON or CSV (often URLs ending in .json or .csv).
Inspect Source Code: Look for JavaScript variables like var data = [...].
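
To make the Network tab check repeatable, one option is to export the recorded traffic as a HAR file (a JSON log that most browser dev tools can save) and filter it for data-like responses. The sketch below assumes the standard HAR layout (log.entries[].request.url and response.content.mimeType); the function name and the sample entries are made up for illustration.

```python
DATA_MIME_HINTS = ("json", "csv")

def find_data_requests(har):
    """Return (url, mime_type) pairs for HAR entries that look like chart data."""
    hits = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        path = url.split("?", 1)[0].lower()  # ignore the query string
        mime_match = any(hint in mime.lower() for hint in DATA_MIME_HINTS)
        if mime_match or path.endswith((".json", ".csv")):
            hits.append((url, mime))
    return hits

# Hypothetical HAR fragment; a real one would be loaded with json.load()
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.com/index.html"},
     "response": {"content": {"mimeType": "text/html"}}},
    {"request": {"url": "https://example.com/api/series?range=1y"},
     "response": {"content": {"mimeType": "application/json"}}},
    {"request": {"url": "https://example.com/static/points.csv"},
     "response": {"content": {"mimeType": "text/plain"}}},
    {"request": {"url": "https://example.com/app.js"},
     "response": {"content": {"mimeType": "application/javascript"}}},
]}}

for url, mime in find_data_requests(sample_har):
    print(mime, url)
```

Matching on both MIME type and file extension is a heuristic; some sites serve JSON as text/plain or wrap it in a script response, so the filter may need loosening.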

2. Scrape via API or JSON Endpoint (Preferred Method)

If the chart pulls data from a network request:

python
import requests

url = 'https://example.com/data.json'  # Replace with the actual endpoint
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop on 4xx/5xx instead of parsing an error page
data = response.json()
print(data)

This is the cleanest and most reliable way to get structured data.
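
Once the JSON is in hand, it usually needs flattening before it is useful. As a rough sketch, the code below assumes a hypothetical payload shaped like {"series": [{"name": ..., "data": [[x, y], ...]}]}, which resembles what some charting libraries consume; a real endpoint will likely use different keys, so the field names here are placeholders.

```python
import csv
import io

def series_to_rows(payload):
    """Flatten an assumed {"series": [{"name", "data": [[x, y], ...]}]} payload."""
    rows = []
    for series in payload.get("series", []):
        for x, y in series.get("data", []):
            rows.append({"series": series.get("name", ""), "x": x, "y": y})
    return rows

def rows_to_csv(rows):
    """Serialize the flattened rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["series", "x", "y"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up payload standing in for response.json()
payload = {"series": [
    {"name": "visits", "data": [[2021, 120], [2022, 150]]},
    {"name": "signups", "data": [[2021, 30]]},
]}

rows = series_to_rows(payload)
print(rows_to_csv(rows))
```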

3. Scrape from JavaScript-Embedded Data

If the data is embedded in JavaScript:

python
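import json
import re
import requests

html = requests.get('https://example.com/chart', timeout=10).text
# Capture the array of objects assigned to chartData
match = re.search(r'var chartData\s*=\s*(\[\{.*?\}\]);', html, re.DOTALL)
if match:
    # json.loads needs strict JSON: double-quoted keys, no trailing commas
    data = json.loads(match.group(1))
    print(data)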

Adjust the regex pattern based on how the data is declared.
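
A regex has to guess where the literal ends, which gets fragile with nested arrays or closing brackets inside strings. An alternative sketch is to use the regex only to find the assignment and let json.JSONDecoder.raw_decode parse from that position, since it stops at the end of the first complete JSON value. This assumes the embedded literal is strict JSON; the helper name and sample HTML are illustrative.

```python
import json
import re

def extract_js_json(html, var_name):
    """Parse the JSON literal assigned to `var_name` in inline JavaScript.

    Returns None if the variable is missing or its value is not strict JSON.
    """
    pattern = r'\b(?:var|let|const)\s+' + re.escape(var_name) + r'\s*=\s*'
    match = re.search(pattern, html)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return value

sample_html = """
<script>
  var chartData = [{"label": "A", "points": [[1, 2], [3, 4]]}, {"label": "B", "points": []}];
  var other = 1;
  var loose = {key: 'single quotes'};
</script>
"""

print(extract_js_json(sample_html, "chartData"))
```

Object literals that are valid JavaScript but not valid JSON (unquoted keys, single quotes) return None here; those need a JavaScript-aware parser or a browser.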

4. Use Selenium for Fully Rendered Charts

When data is rendered dynamically and not available in page source or network:

python
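from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get('https://example.com/chart')

    # Wait up to 10 s for the page's script to create the chart object.
    # window.chartInstance is a placeholder; use the name the page defines.
    WebDriverWait(driver, 10).until(
        lambda d: d.execute_script("return typeof window.chartInstance !== 'undefined';")
    )

    # Read the chart's data object from the page's JavaScript context
    chart_data = driver.execute_script("return window.chartInstance.data;")
    print(chart_data)
finally:
    driver.quit()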

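This works when the page stores its chart object in a global variable. window.chartInstance is a placeholder name, so check the page's scripts for the real one. With Chart.js 3 or later, if the page exposes the Chart global, Chart.getChart(canvasElement) returns the chart attached to a canvas even when the page keeps no reference of its own.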
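
The object returned by execute_script arrives in Python as nested dicts and lists. For Chart.js, the data object has a labels array and a datasets array whose items carry label and data; the sketch below flattens that shape into rows. The helper name and sample values are illustrative, and datasets whose points are {x, y} objects would come through as dicts in the value column.

```python
def chartjs_to_rows(chart_data):
    """Flatten a Chart.js-style data dict into (dataset, label, value) tuples."""
    labels = chart_data.get("labels") or []
    rows = []
    for dataset in chart_data.get("datasets", []):
        name = dataset.get("label", "")
        for i, value in enumerate(dataset.get("data", [])):
            # Some datasets are shorter or longer than the labels array
            label = labels[i] if i < len(labels) else None
            rows.append((name, label, value))
    return rows

# Made-up data standing in for the value returned by execute_script
sample = {
    "labels": ["Jan", "Feb", "Mar"],
    "datasets": [
        {"label": "Sales", "data": [10, 20, 15]},
        {"label": "Returns", "data": [1, 3]},
    ],
}

for row in chartjs_to_rows(sample):
    print(row)
```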

5. Scraping SVG or Canvas-Based Charts

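For charts rendered as SVG (e.g., D3.js), the shapes live in the DOM. D3 usually draws them in the browser, so if requests returns HTML without the SVG elements, parse the rendered page (for example, Selenium's driver.page_source) instead: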

python
from bs4 import BeautifulSoup
import requests

html = requests.get('https://example.com/chart').text
soup = BeautifulSoup(html, 'html.parser')
svg_elements = soup.find_all('circle')  # or 'rect', 'path', etc.

for el in svg_elements:
    print(el.attrs)
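
SVG attributes such as cx and cy are pixel positions, not data values. To recover the values, the pixels have to be mapped back through the axis scale. The sketch below assumes linear axes and a calibration read from two known tick marks per axis; the sample SVG and calibration numbers are invented, and it uses the standard-library HTML parser only to stay dependency-free.

```python
from html.parser import HTMLParser

class CircleCollector(HTMLParser):
    """Collect (cx, cy) pixel coordinates from <circle> elements."""

    def __init__(self):
        super().__init__()
        self.points = []

    def handle_starttag(self, tag, attrs):
        if tag == "circle":
            a = dict(attrs)
            if "cx" in a and "cy" in a:
                self.points.append((float(a["cx"]), float(a["cy"])))

def invert_linear(pixel, pixel_a, pixel_b, value_a, value_b):
    """Map a pixel coordinate back to a data value on a linear axis."""
    return value_a + (pixel - pixel_a) * (value_b - value_a) / (pixel_b - pixel_a)

svg = (
    '<svg width="200" height="100">'
    '<circle cx="0" cy="100" r="3"/>'
    '<circle cx="100" cy="50" r="3"/>'
    '<circle cx="200" cy="0" r="3"/>'
    '</svg>'
)

collector = CircleCollector()
collector.feed(svg)

# Assumed calibration: x pixels 0..200 span 2020..2024,
# y pixels 100..0 span 0..50 (SVG y grows downward)
data = [
    (invert_linear(cx, 0, 200, 2020, 2024), invert_linear(cy, 100, 0, 0, 50))
    for cx, cy in collector.points
]
print(data)
```

Log or time axes need a different inverse, and shapes nested inside transformed groups carry offsets that this sketch ignores.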

Canvas-based charts (like Chart.js) draw pixels and do not expose data points in the DOM. Google Charts normally renders SVG, so the approach above applies to it. For canvas charts, you must rely on:

Reverse-engineering JS variables.
Using browser dev tools to tap into the chart’s data via console.

6. Optional Tools

Selenium + BeautifulSoup: For dynamic + static DOM scraping.
Playwright (or the older Pyppeteer): Browser automation with built-in waiting and network interception, which makes it easier to capture the request that feeds the chart.
F12 Console: Often the easiest way to inspect and copy chart data manually.
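
As an illustration of the network interception mentioned above, here is a minimal sketch using Playwright's synchronous API to wait for the response that carries the chart data while the page loads. The "/api/" URL fragment is a placeholder assumption, and both function names are made up; adapt the predicate to whatever the Network tab shows for the target page.

```python
def is_chart_data(response):
    """Predicate: True for responses that look like JSON chart data.

    The "/api/" check is a placeholder; match the real endpoint instead.
    """
    content_type = response.headers.get("content-type", "")
    return "json" in content_type.lower() and "/api/" in response.url

def capture_chart_json(page_url):
    """Load the page and return the first matching JSON response body."""
    # Imported here so the predicate above can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # Blocks until a response satisfies the predicate or the wait times out
            with page.expect_response(is_chart_data) as response_info:
                page.goto(page_url)
            return response_info.value.json()
        finally:
            browser.close()
```

Calling capture_chart_json('https://example.com/chart') would return the parsed JSON if a matching request fires during page load, and raise a timeout error otherwise.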

Key Tips

Look for JSON or JavaScript arrays in the page source.
Use the browser DevTools Network tab to trace data sources.
Use driver.execute_script() to read in-browser JavaScript variables.
Avoid OCR or image parsing unless the data cannot be reached any other way.

