Categories We Write About

Batch convert DOCX to plain text

Batch converting DOCX files to plain text can be done efficiently using several methods, depending on your preference for tools or programming languages. Here are some straightforward approaches:


Method 1: Using Python with python-docx

Python’s python-docx library can read DOCX files and extract their text. You can write a simple script to batch process all DOCX files in a folder and save the extracted text as plain .txt files.

Example script:

python
import os from docx import Document input_folder = 'path/to/docx_files' output_folder = 'path/to/output_txt_files' os.makedirs(output_folder, exist_ok=True) for filename in os.listdir(input_folder): if filename.endswith('.docx'): docx_path = os.path.join(input_folder, filename) doc = Document(docx_path) full_text = [] for para in doc.paragraphs: full_text.append(para.text) text_content = 'n'.join(full_text) txt_filename = filename.replace('.docx', '.txt') txt_path = os.path.join(output_folder, txt_filename) with open(txt_path, 'w', encoding='utf-8') as f: f.write(text_content)

Method 2: Using LibreOffice Command Line (Windows/Linux/macOS)

LibreOffice can convert DOCX files to TXT via command line, which is very handy for batch processing.

  1. Install LibreOffice.

  2. Use this command in the terminal or batch script:

bash
libreoffice --headless --convert-to txt *.docx --outdir /path/to/output_folder

This will convert all DOCX files in the current directory to TXT and place them in the specified output folder.


Method 3: Using PowerShell (Windows)

If you’re on Windows and have Microsoft Word installed, you can automate conversion using PowerShell with Word COM automation.

powershell
$word = New-Object -ComObject Word.Application $word.Visible = $false $inputFolder = "C:pathtodocx" $outputFolder = "C:pathtotxt" Get-ChildItem -Path $inputFolder -Filter *.docx | ForEach-Object { $doc = $word.Documents.Open($_.FullName) $txtPath = Join-Path $outputFolder ($_.BaseName + ".txt") $doc.SaveAs([ref] $txtPath, [ref] 2) # 2 = wdFormatText $doc.Close() } $word.Quit()

Choose the method based on your environment and preferences. The Python script is platform-independent and easy to customize, while LibreOffice and PowerShell solutions are excellent for quick command-line batch processing.

Share This Page:

Enter your email below to join The Palos Publishing Company Email List

We respect your email privacy

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories We Write About