The Palos Publishing Company

Build a Python script to check spelling in bulk documents

The following Python script checks spelling across the .txt, .docx, and .pdf files in a specified directory. It uses pyspellchecker for spell-checking, python-docx to read .docx files, and PyPDF2 to read .pdf files. Only files directly inside the directory are scanned; subfolders are not searched.

python
import os
import re

import PyPDF2
from docx import Document
from spellchecker import SpellChecker

SUPPORTED_EXTENSIONS = ('.txt', '.docx', '.pdf')


def extract_text_from_txt(file_path):
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
        return file.read()


def extract_text_from_docx(file_path):
    doc = Document(file_path)
    # Join paragraphs with real newlines so words at paragraph edges stay separate.
    return '\n'.join(para.text for para in doc.paragraphs)


def extract_text_from_pdf(file_path):
    text = ""
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        for page in reader.pages:
            # Pages without a text layer (e.g. scans) may yield nothing.
            text += (page.extract_text() or '') + '\n'
    return text


def get_text_from_file(file_path):
    lower_path = file_path.lower()  # match extensions case-insensitively
    if lower_path.endswith('.txt'):
        return extract_text_from_txt(file_path)
    elif lower_path.endswith('.docx'):
        return extract_text_from_docx(file_path)
    elif lower_path.endswith('.pdf'):
        return extract_text_from_pdf(file_path)
    return None


def spell_check_text(text, spell):
    # Normalize curly apostrophes, then keep only alphabetic tokens so that
    # punctuation attached to a word is not reported as a misspelling.
    text = text.replace('\u2019', "'")
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    misspelled = spell.unknown(words)
    # correction() may return None when no candidate is found.
    return {word: spell.correction(word) or "(no suggestion)" for word in misspelled}


def process_documents(directory):
    spell = SpellChecker()  # load the dictionary once and reuse it for every file
    report = {}
    for filename in sorted(os.listdir(directory)):
        if filename.lower().endswith(SUPPORTED_EXTENSIONS):
            file_path = os.path.join(directory, filename)
            print(f"Processing: {filename}")
            text = get_text_from_file(file_path)
            if text:
                errors = spell_check_text(text, spell)
                if errors:
                    report[filename] = errors
    return report


if __name__ == "__main__":
    folder_path = input("Enter the path to the folder containing documents: ").strip()
    results = process_documents(folder_path)
    if not results:
        print("No spelling errors found.")
    else:
        for file, errors in results.items():
            print(f"\nFile: {file}")
            for wrong, suggestion in errors.items():
                print(f"  {wrong} -> {suggestion}")

Installing the Required Libraries

Install the dependencies via pip before running:

bash
pip install pyspellchecker python-docx PyPDF2
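
Note that the pip package names differ from the names used in the import statements: pyspellchecker is imported as spellchecker, and python-docx as docx. As a quick sanity check before running the script, a small sketch like the following can report which packages are still missing. The helper name missing_modules is hypothetical and not part of any of these libraries.

```python
import importlib.util

# Import names used by the script, mapped to the pip package that provides each.
REQUIRED = {
    "spellchecker": "pyspellchecker",
    "docx": "python-docx",
    "PyPDF2": "PyPDF2",
}


def missing_modules(required):
    """Return the pip package names whose import name cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```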

Features:

  • Supports .txt, .docx, and .pdf files.

  • Extracts the text from each file and checks every word against an English dictionary.

  • Suggests a correction for each misspelled word.

  • Prints a report grouped by file.
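
How the text is split into words determines what gets reported. Splitting on whitespace alone leaves punctuation attached, so a token such as "document," would be looked up with its comma and flagged as misspelled. Here is a minimal sketch of the difference, using only the standard library. The function name tokenize and the regular expression are illustrative choices, not something pyspellchecker prescribes.

```python
import re

# One or more letters, optionally followed by apostrophe + letters ("don't").
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def tokenize(text):
    """Return alphabetic words, keeping internal apostrophes."""
    return WORD_PATTERN.findall(text)


sample = "Don't chek this document, please."
print(sample.split())    # ["Don't", 'chek', 'this', 'document,', 'please.']
print(tokenize(sample))  # ["Don't", 'chek', 'this', 'document', 'please']
```

With the regex tokenizer, only "chek" would remain as a candidate misspelling in this sample; the trailing comma and period no longer reach the dictionary lookup.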

To keep a record of the results, the report can be saved to a CSV or text file instead of being printed to the console.
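
One possible way to save the report as CSV is sketched below. It assumes the report is a dictionary that maps each file name to a dictionary of misspelled word and suggestion, which is the shape process_documents() returns. The function name save_report_csv, the column names, and the output file name are hypothetical.

```python
import csv


def save_report_csv(report, csv_path):
    """Write one row per misspelled word: file name, word, suggestion."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "word", "suggestion"])
        for filename, errors in report.items():
            for wrong, suggestion in errors.items():
                writer.writerow([filename, wrong, suggestion])


if __name__ == "__main__":
    # Made-up sample data in the same shape that process_documents() returns.
    sample_report = {"notes.txt": {"chek": "check", "speling": "spelling"}}
    save_report_csv(sample_report, "spelling_report.csv")
```

In the main script, the same function could be called with the value returned by process_documents(folder_path) after the console report is printed.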
