Convert plain text logs into dataframes

Converting plain text logs into dataframes is a common task in data processing and analysis. The exact approach depends on the structure of your logs, but it typically involves parsing each line into named fields and then loading those fields into a dataframe (often using Python’s pandas library).

Here’s a general step-by-step guide and example to convert plain text logs into a pandas dataframe:

1. Understand the Log Format

Logs might look like this (example):

2025-05-18 10:15:23 INFO User login user_id=1234 ip=192.168.0.1
2025-05-18 10:16:05 ERROR Failed to load resource resource_id=5678
2025-05-18 10:18:45 WARN Disk space low disk_free=500MB

Each log line has:

Timestamp
Log level (INFO, ERROR, WARN)
Message
Optional key=value pairs
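
Before writing a parser, it can help to check how many lines actually follow this layout. The sketch below is one way to do that, using only the standard library. The function name `fits_format`, the `KNOWN_LEVELS` set, and the stack-trace sample line are illustrative assumptions, not part of any particular logging format.

```python
from datetime import datetime

# Assumed level names; extend this set to match your own logs.
KNOWN_LEVELS = {"DEBUG", "INFO", "WARN", "WARNING", "ERROR", "CRITICAL"}


def fits_format(line: str) -> bool:
    """Return True if the line starts with 'YYYY-MM-DD HH:MM:SS LEVEL ' plus some text."""
    parts = line.split(" ", 3)  # date, time, level, rest of the line
    if len(parts) < 4:
        return False
    date_part, time_part, level, _rest = parts
    try:
        datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return level in KNOWN_LEVELS


sample = [
    "2025-05-18 10:15:23 INFO User login user_id=1234 ip=192.168.0.1",
    "    at com.example.Foo.bar(Foo.java:42)",  # hypothetical continuation line
]
for line in sample:
    print(fits_format(line), line)
```

Lines that fail the check (stack traces, blank lines, wrapped messages) are the ones a single-line parser would otherwise drop or mis-parse, so it is worth deciding up front how to handle them.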

2. Parse Logs with Python

You can use regex or simple string operations to extract fields.

python
import pandas as pd
import re

log_lines = [
    "2025-05-18 10:15:23 INFO User login user_id=1234 ip=192.168.0.1",
    "2025-05-18 10:16:05 ERROR Failed to load resource resource_id=5678",
    "2025-05-18 10:18:45 WARN Disk space low disk_free=500MB"
]

# Define a regex pattern to capture timestamp, level, message, and trailing key=value pairs.
# \S+ matches a run of non-whitespace characters.
pattern = re.compile(r'^(?P<timestamp>\S+ \S+) (?P<level>\S+) (?P<message>.*?)(?P<kv_pairs>(?: \S+=\S+)*)$')

data = []
for line in log_lines:
    match = pattern.match(line)
    if match:
        d = match.groupdict()
        # Parse key-value pairs into a dict
        kv_str = d.pop('kv_pairs').strip()
        kv_dict = {}
        if kv_str:
            pairs = kv_str.split(' ')
            for pair in pairs:
                if '=' in pair:
                    k, v = pair.split('=', 1)
                    kv_dict[k] = v
        d.update(kv_dict)
        data.append(d)

df = pd.DataFrame(data)
print(df)
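
For comparison, here is a minimal sketch of the "simple string operations" route, assuming the same layout as the sample lines (date, time, level, then free text followed by optional key=value pairs). The helper name `parse_line` is hypothetical. Unlike a regex match, it treats any trailing token containing `=` as a pair, and it raises `ValueError` on lines with fewer than four space-separated parts.

```python
import pandas as pd

log_lines = [
    "2025-05-18 10:15:23 INFO User login user_id=1234 ip=192.168.0.1",
    "2025-05-18 10:16:05 ERROR Failed to load resource resource_id=5678",
    "2025-05-18 10:18:45 WARN Disk space low disk_free=500MB",
]


def parse_line(line: str) -> dict:
    """Split one log line into timestamp, level, message, and key=value fields."""
    date_part, time_part, level, rest = line.split(" ", 3)
    tokens = rest.split(" ")
    fields = {}
    # Walk backwards: trailing tokens containing '=' are treated as key=value pairs.
    while tokens and "=" in tokens[-1]:
        key, value = tokens.pop().split("=", 1)
        fields[key] = value
    row = {
        "timestamp": f"{date_part} {time_part}",
        "level": level,
        "message": " ".join(tokens),
    }
    # Reverse so the extra columns keep the order they had in the line.
    row.update(reversed(list(fields.items())))
    return row


df = pd.DataFrame([parse_line(line) for line in log_lines])
print(df)
```

String splitting is easier to read and debug than a long pattern, but a regex tends to cope better once the format has optional or variable-width parts.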

3. Resulting DataFrame

timestamp	level	message	user_id	ip	resource_id	disk_free
2025-05-18 10:15:23	INFO	User login	1234	192.168.0.1	NaN	NaN
2025-05-18 10:16:05	ERROR	Failed to load resource	NaN	NaN	5678	NaN
2025-05-18 10:18:45	WARN	Disk space low	NaN	NaN	NaN	500MB

4. Adjust to Your Logs

Modify the regex pattern based on your exact log format.
Extract other fields as needed.
Convert columns (e.g., timestamp) to datetime types using pd.to_datetime().
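
As a rough illustration of the type conversion step, the sketch below starts from a small dataframe of strings shaped like the parser output and converts three columns. The explicit `format` string, the nullable `Int64` dtype for `user_id`, and the categorical `level` column are choices made for this example, not requirements.

```python
import pandas as pd

# A small frame of strings, shaped like the parsed log output.
df = pd.DataFrame(
    {
        "timestamp": ["2025-05-18 10:15:23", "2025-05-18 10:16:05", "2025-05-18 10:18:45"],
        "level": ["INFO", "ERROR", "WARN"],
        "user_id": ["1234", None, None],
    }
)

# Parse timestamps; an explicit format avoids per-value format guessing.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")

# Turn numeric strings into numbers; unparseable values become missing.
# The nullable Int64 dtype keeps integers as integers despite the gaps.
df["user_id"] = pd.to_numeric(df["user_id"], errors="coerce").astype("Int64")

# A small, repeated set of labels is a natural fit for a categorical column.
df["level"] = df["level"].astype("category")

print(df.dtypes)
```

Once `timestamp` is a datetime column, time-based operations such as `df["timestamp"].dt.hour` or filtering by a time range become available.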

If you provide a sample of your logs, I can tailor the code specifically for your format.
