Categories We Write About

Extract bold and italic text from Word docs

To extract bold and italic text from Word documents (.docx files), you can use Python with the python-docx library. Here’s a sample script that walks through each paragraph's runs (spans of text that share the same formatting) and collects the text that is bold, italic, or both.

Python Script to Extract Bold and Italic Text

python
from docx import Document


def extract_bold_italic_text(file_path):
    doc = Document(file_path)
    bold_text = []
    italic_text = []
    bold_italic_text = []

    for para in doc.paragraphs:
        for run in para.runs:
            text = run.text.strip()
            if not text:
                continue
            if run.bold and run.italic:
                bold_italic_text.append(text)
            elif run.bold:
                bold_text.append(text)
            elif run.italic:
                italic_text.append(text)

    return bold_text, italic_text, bold_italic_text


# Example usage:
file_path = 'example.docx'  # Replace with your .docx file path
bold, italic, both = extract_bold_italic_text(file_path)

print("Bold Text:", bold)
print("Italic Text:", italic)
print("Bold & Italic Text:", both)
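
One caveat: python-docx reports run.bold and run.italic as None when the setting is not applied directly to the run but inherited from a style, so the script above skips text that is bold only because of, say, a heading style. Here is a minimal sketch of one way to resolve the inherited value. The helper name effective_flag is hypothetical, and the lookup order (the run itself, then its character style, then the paragraph style, each followed through its base styles) is a simplification of Word's full rules; document defaults and theme settings are ignored.

```python
def effective_flag(run, para, attr):
    """Best-effort lookup of a tri-state flag such as "bold" or "italic".

    Hypothetical helper: checks direct run formatting first, then the run's
    character style, then the paragraph style, following base styles upward.
    Returns False if nothing in the chain sets the flag.
    """
    value = getattr(run, attr)  # True, False, or None (inherited)
    if value is not None:
        return value
    for style in (run.style, para.style):
        while style is not None:
            value = getattr(style.font, attr)
            if value is not None:
                return value
            style = style.base_style  # None once the chain ends
    return False
```

To try it, you could replace run.bold with effective_flag(run, para, "bold") (and likewise for italic) inside the loop of the original script.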

Requirements

Install the required library:

bash
pip install python-docx

Output

  • bold is a list of the text runs that are bold but not italic

  • italic is a list of the text runs that are italic but not bold

  • both is a list of the text runs that are both bold and italic
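
Word often splits a visually continuous phrase into several runs, so each list can contain fragments of one phrase. As a rough sketch, consecutive runs with the same bold/italic state could be merged within each paragraph before they are collected. The names format_key and extract_merged and the use of itertools.groupby are illustrative choices, not part of the original script.

```python
from itertools import groupby


def format_key(run):
    # bool() treats None (inherited) like False, as the original script does
    return bool(run.bold), bool(run.italic)


def extract_merged(paragraphs):
    """Merge consecutive runs that share the same (bold, italic) state.

    Takes an iterable of paragraph objects, e.g. doc.paragraphs, and
    returns (bold, italic, both) lists like the original script.
    """
    bold, italic, both = [], [], []
    for para in paragraphs:
        for (is_bold, is_italic), runs in groupby(para.runs, key=format_key):
            # Join the raw run text first so spaces between runs survive
            text = "".join(run.text for run in runs).strip()
            if not text:
                continue
            if is_bold and is_italic:
                both.append(text)
            elif is_bold:
                bold.append(text)
            elif is_italic:
                italic.append(text)
    return bold, italic, both
```

With python-docx installed, this could be called as extract_merged(Document('example.docx').paragraphs).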

Note that this script scans only the paragraphs in the document body. Text inside tables, headers, and footers is not included, and footnotes are not read either.
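
For tables and headers, one possible adaptation is to gather the paragraphs from those containers and run them through the same loop over runs. The sketch below assumes python-docx's doc.tables, cell.paragraphs, cell.tables, and section.header / section.footer accessors; the function names are hypothetical, and footnotes are left out.

```python
def iter_table_paragraphs(table):
    """Yield paragraphs from every cell of a table, including nested tables."""
    for row in table.rows:
        for cell in row.cells:
            yield from cell.paragraphs
            for nested in cell.tables:
                yield from iter_table_paragraphs(nested)


def iter_all_paragraphs(doc):
    """Yield body, table, header, and footer paragraphs of a Document."""
    yield from doc.paragraphs
    for table in doc.tables:
        yield from iter_table_paragraphs(table)
    for section in doc.sections:
        for part in (section.header, section.footer):
            yield from part.paragraphs
            for table in part.tables:
                yield from iter_table_paragraphs(table)
```

In the original script, the outer loop would then become for para in iter_all_paragraphs(doc). Be aware that merged table cells, and headers shared between sections, may be visited more than once, so some text can appear repeatedly in the results.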
