The Palos Publishing Company

Create a script to clean email signatures

Here is a Python script to clean email signatures from a block of email text. It identifies and removes common email signature patterns using heuristics such as delimiters (--, Thanks, Regards, etc.), excessive contact info, and short lines at the end of the email body:

```python
import re


def clean_email_signature(email_body: str) -> str:
    # Normalize line endings
    email_body = email_body.replace('\r\n', '\n').strip()

    # Common signature delimiters
    signature_delimiters = [
        r'^--\s*$',                            # "--" on its own line
        r'^__+$',                              # "____" line
        r'^thanks[.,\s]*$',                    # "Thanks", "Thanks."
        r'^regards[.,\s]*$',                   # "Regards", "Regards,"
        r'^best\s+(regards|wishes)[.,\s]*$',   # "Best regards"
        r'^cheers[.,\s]*$',                    # "Cheers"
        r'^sincerely[.,\s]*$',                 # "Sincerely"
    ]

    # Compile all delimiter regexes
    signature_patterns = [re.compile(p, re.IGNORECASE) for p in signature_delimiters]
    # Heuristic: phone/email/contact lines
    contact_pattern = re.compile(
        r'(phone|mobile|email|website|linkedin|twitter|facebook|contact)',
        re.IGNORECASE,
    )
    # Heuristic: name-like lines (one to three short capitalized words)
    name_pattern = re.compile(r'^([A-Z][a-z]+\s?){1,3}$')

    def is_delimiter(text: str) -> bool:
        return any(p.match(text) for p in signature_patterns)

    # Split the email into lines for analysis
    lines = email_body.split('\n')

    # Start from the bottom and work upward; the first delimiter found
    # marks the start of the signature, so drop it and everything below it.
    for i in range(len(lines) - 1, -1, -1):
        if is_delimiter(lines[i].strip()):
            lines = lines[:i]
            break

    # Trim trailing lines that are blank or still look like signature content.
    # Only the tail is checked, so body lines above it are left untouched.
    while lines:
        last = lines[-1].strip()
        if (not last or is_delimiter(last)
                or contact_pattern.search(last) or name_pattern.match(last)):
            lines.pop()
        else:
            break

    return '\n'.join(lines).strip()


# Example usage
if __name__ == "__main__":
    email_text = """
Hi John,

Thanks for the update. I’ll take a look and get back to you.

Best regards,
Sarah Thompson
Senior Developer
ACME Corp
Phone: 123-456-7890
Email: sarah@acme.com
"""
    print("Cleaned Email:\n")
    print(clean_email_signature(email_text))
```

Key Features:

  • Recognizes common signature delimiters and phrases.

  • Filters lines with contact info keywords.

  • Removes name-like lines (one to three capitalized words) often seen in signatures.

  • Processes text from the bottom up to detect trailing signatures.
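The bottom-up scan from the last bullet can also be sketched on its own. The helper below is a hypothetical variation, not part of the script above: the name `split_trailing_signature`, the `max_lines` window, and the combined `DELIMITER` pattern are all assumptions made for illustration. It returns the body and the removed signature separately, which makes it easier to check what the heuristic cut.

```python
import re
from typing import Tuple

# Assumed delimiter set for this sketch: "--", "____", and common sign-offs.
DELIMITER = re.compile(
    r'^(--\s*|__+|(thanks|regards|cheers|sincerely|best\s+(regards|wishes))[.,\s]*)$',
    re.IGNORECASE,
)


def split_trailing_signature(text: str, max_lines: int = 10) -> Tuple[str, str]:
    """Return (body, signature); signature is '' when no delimiter is found."""
    lines = text.replace('\r\n', '\n').strip().split('\n')
    # Only inspect the last `max_lines` lines, so a stray "--" early in a
    # long message is not mistaken for the start of a signature.
    lowest = max(len(lines) - max_lines, 0)
    for i in range(len(lines) - 1, lowest - 1, -1):
        if DELIMITER.match(lines[i].strip()):
            body = '\n'.join(lines[:i]).strip()
            signature = '\n'.join(lines[i:]).strip()
            return body, signature
    return '\n'.join(lines).strip(), ''
```

Calling `split_trailing_signature(email_text)` on a message that ends with `--` followed by a name and phone number should return the message text as the first element and the `--` block as the second. The `max_lines` default of 10 is an arbitrary choice and may need tuning for longer signatures.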

You can further refine this by incorporating ML-based approaches or integrating with libraries like talon or replyto if needed.
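As a rough starting point for an ML-based refinement, each line can be turned into a small feature vector that a classifier could later consume. The sketch below is only an assumption about what such features might look like: the function name `line_features`, the feature names, and the three regular expressions are illustrative, and no model is trained here.

```python
import re
from typing import Dict

# Illustrative patterns; real-world phone and email formats vary widely.
CONTACT_WORDS = re.compile(
    r'\b(phone|mobile|email|website|linkedin|twitter|facebook)\b', re.IGNORECASE
)
EMAIL_ADDRESS = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')
PHONE_NUMBER = re.compile(r'\+?\d[\d\s().-]{6,}\d')


def line_features(line: str, lines_from_end: int) -> Dict[str, float]:
    """Describe one line with numeric features for a signature classifier."""
    stripped = line.strip()
    return {
        # Signature lines tend to sit close to the end of the message.
        'lines_from_end': float(lines_from_end),
        # Signature lines are usually short.
        'length': float(len(stripped)),
        'has_contact_word': float(bool(CONTACT_WORDS.search(stripped))),
        'has_email_address': float(bool(EMAIL_ADDRESS.search(stripped))),
        'has_phone_number': float(bool(PHONE_NUMBER.search(stripped))),
    }
```

Features like these would still need labeled examples and a trained model before they improve on the regex heuristics; the sketch only shows the shape of the input.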
