Scrape Wikipedia tables

To scrape Wikipedia tables, you can use Python with the pandas library. Most Wikipedia tables are ordinary HTML tables, and pandas.read_html() parses every table on a page into a DataFrame. Below is a basic script:

```python
import pandas as pd

# Wikipedia URL with a table
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)'

# Read all tables from the page
tables = pd.read_html(url)

# Inspect the number of tables found
print(f"Found {len(tables)} tables.")

# Display the first table (or another based on index)
print(tables[0])  # Change index to select another table
```
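
Selecting by position breaks when the page gains or loses a table, and Wikimedia's User-Agent policy asks scripts to identify themselves, so a default client may be refused. The sketch below is one way to handle both, not the only one: it downloads the page with a descriptive User-Agent header and passes the HTML text to read_html with its match argument, which keeps only the tables containing the given text. The helper names fetch_html and tables_matching and the USER_AGENT value are hypothetical placeholders, not part of pandas.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd

# Hypothetical identifier; replace with your own project name and contact.
USER_AGENT = "wiki-table-example/0.1 (contact: you@example.com)"


def fetch_html(url: str) -> str:
    """Download a page, sending a descriptive User-Agent header."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def tables_matching(html: str, text: str) -> list[pd.DataFrame]:
    """Return only the tables whose contents match `text` (string or regex)."""
    # read_html raises ValueError when no table matches.
    return pd.read_html(StringIO(html), match=text)
```

For the GDP page above, tables_matching(fetch_html(url), "Country") would return the tables that mention Country; check the length of the result before indexing into it.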

Notes:

pandas.read_html() parses with lxml by default and falls back to BeautifulSoup with html5lib, so install the parsers if needed:
```bash
pip install pandas lxml beautifulsoup4 html5lib
```
Some pages have multiple tables; use indexing (tables[0], tables[1], etc.) to select the desired one.
Once you have the table, you can save it or manipulate it with pandas:

```python
# Save to CSV
tables[0].to_csv("gdp_table.csv", index=False)
```
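
Tables read from Wikipedia often carry footnote markers such as [1] or [a] in headers and cells, and multi-row headers arrive as a pandas MultiIndex, which makes for awkward CSV columns. As a minimal sketch of the manipulation step mentioned above, the hypothetical helper clean_wikipedia_table flattens the headers and strips bracketed markers. The regular expression is an assumption about how footnotes appear and may need adjusting for a given table.

```python
import re

import pandas as pd

# Assumed footnote format: anything in square brackets, e.g. "[1]" or "[n 2]".
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def _strip_footnotes(value):
    """Remove bracketed markers from strings; leave other values unchanged."""
    if isinstance(value, str):
        return FOOTNOTE.sub("", value).strip()
    return value


def clean_wikipedia_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with flat column names and footnote markers removed."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Join header levels, dropping labels repeated across levels.
        out.columns = [
            " ".join(dict.fromkeys(str(part) for part in col))
            for col in out.columns
        ]
    out.columns = [_strip_footnotes(str(name)) for name in out.columns]
    # Clean every cell column by column; non-string cells pass through.
    return out.apply(lambda column: column.map(_strip_footnotes))
```

With this in place, clean_wikipedia_table(tables[0]).to_csv("gdp_table.csv", index=False) writes the tidied table instead of the raw one.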

If you want help scraping a specific Wikipedia table, just share the URL or describe the table.
