The Palos Publishing Company


Build a newsletter cleanup script

Here is a Python script that performs basic newsletter cleanup. It assumes you have a raw text file of newsletter content with common issues such as extra whitespace, duplicated lines, unwanted promotional lines, or URLs with stray trailing punctuation. The script cleans up the text by:

  • Removing duplicate lines

  • Stripping leading and trailing whitespace and dropping blank lines

  • Removing lines containing specific unwanted keywords (e.g., “unsubscribe,” “advertisement”)

  • Stripping trailing punctuation from URLs

You can expand or customize this script depending on your exact newsletter content and cleanup needs.

python
import re


def load_newsletter(file_path):
    """Read the raw newsletter file and return its lines."""
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.readlines()


def remove_duplicates(lines):
    """Strip surrounding whitespace, then drop blank and repeated lines.

    The first occurrence of each line is kept, in its original order.
    """
    seen = set()
    cleaned_lines = []
    for line in lines:
        stripped = line.strip()
        if stripped and stripped not in seen:
            seen.add(stripped)
            cleaned_lines.append(stripped)
    return cleaned_lines


def remove_unwanted_sections(lines, keywords):
    """Drop lines containing any keyword (case-insensitive substring match)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [line for line in lines
            if not any(keyword in line.lower() for keyword in lowered)]


def fix_urls(lines):
    """Remove trailing punctuation from URLs."""
    # \S+ consumes everything up to the next whitespace character,
    # so punctuation that follows a URL is part of the match.
    url_pattern = re.compile(r'https?://\S+')

    def clean_url(match):
        return match.group(0).rstrip('.,;:')

    return [url_pattern.sub(clean_url, line) for line in lines]


def save_cleaned_newsletter(lines, output_path):
    """Write the cleaned lines to output_path, one per line."""
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(lines))
    print(f"Cleaned newsletter saved to {output_path}")


def main():
    input_file = 'newsletter_raw.txt'
    output_file = 'newsletter_cleaned.txt'
    unwanted_keywords = ['unsubscribe', 'advertisement', 'promo', 'sale', 'sponsored']

    lines = load_newsletter(input_file)
    lines = remove_duplicates(lines)
    lines = remove_unwanted_sections(lines, unwanted_keywords)
    lines = fix_urls(lines)
    save_cleaned_newsletter(lines, output_file)


if __name__ == '__main__':
    main()

How it works:

  • Place your raw newsletter content in newsletter_raw.txt.

  • The script strips leading and trailing whitespace from each line, then removes blank and duplicate lines.

  • Lines containing words like “unsubscribe,” “advertisement,” etc., are removed.

  • Trailing punctuation (periods, commas, semicolons, colons) is stripped from the end of URLs.

  • The cleaned content is saved to newsletter_cleaned.txt.
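To see the steps above without creating any files, here is a minimal sketch that applies the same operations, in the same order, to a small in-memory list. The sample lines and the clean helper are made up for illustration and are not part of the script.

```python
import re

# Hypothetical sample input, invented for this example.
raw_lines = [
    "  Welcome to the March issue!  \n",
    "\n",
    "Welcome to the March issue!\n",
    "Read more at https://example.com/march.\n",
    "Click here to unsubscribe\n",
]

unwanted_keywords = ["unsubscribe", "advertisement"]
url_pattern = re.compile(r"https?://\S+")


def clean(lines, keywords):
    """Illustrative helper: strip, dedupe, filter keywords, tidy URLs."""
    seen = set()
    result = []
    for line in lines:
        stripped = line.strip()
        # Strip whitespace; skip blank lines and lines already seen.
        if not stripped or stripped in seen:
            continue
        seen.add(stripped)
        # Skip lines containing an unwanted keyword (substring match).
        if any(k.lower() in stripped.lower() for k in keywords):
            continue
        # Strip trailing punctuation from each URL in the line.
        tidy = url_pattern.sub(lambda m: m.group(0).rstrip(".,;:"), stripped)
        result.append(tidy)
    return result


cleaned = clean(raw_lines, unwanted_keywords)
print(cleaned)
```

Note that the period after the URL is removed from the line entirely, not moved, so a sentence that ends with a link loses its final punctuation.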

You can customize the unwanted_keywords list or extend the cleanup logic to fit your exact needs.
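One possible extension: the keyword filter uses substring matching, so a keyword like "sale" also removes lines that mention "wholesale". The sketch below swaps in whole-word matching with a regular expression. The function names build_keyword_pattern and remove_unwanted_lines and the sample lines are hypothetical, and this is one way to do it, not the only one.

```python
import re


def build_keyword_pattern(keywords):
    """Compile a case-insensitive regex matching any keyword as a whole word."""
    escaped = (re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def remove_unwanted_lines(lines, keywords):
    """Hypothetical replacement for the substring-based keyword filter."""
    if not keywords:
        # An empty alternation would match at every word boundary.
        return list(lines)
    pattern = build_keyword_pattern(keywords)
    return [line for line in lines if not pattern.search(line)]


# Hypothetical sample lines, invented for this example.
sample = [
    "Our wholesale partners met last week",
    "Big SALE this weekend only",
    "Sponsored: try the new app",
    "Reader letters",
]

kept = remove_unwanted_lines(sample, ["sale", "sponsored"])
print(kept)
```

The trade-off is that whole-word matching no longer catches variants such as "sales" or "promotion" from the keyword "promo", so each variant you want removed has to be listed explicitly.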
