Build a tool to highlight repeated phrases

Here’s a simple Python tool that highlights repeated phrases in a text. It finds n-grams (runs of consecutive words, 2–5 words long by default) that appear more than once and wraps every occurrence in configurable markers: ** by default, which renders as bold in Markdown, or HTML tags for the web or ANSI color codes for terminal output if you pass those instead.
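
The n-gram idea fits in a few lines: slide a window of n words across the text and count each window. A minimal sketch, assuming plain whitespace splitting (so punctuation stays attached to words); the name `repeated_ngrams` is illustrative and not part of the tool itself:

```python
from collections import Counter


def repeated_ngrams(text, n, min_count=2):
    """Return {phrase: count} for every n-word window seen at least min_count times."""
    words = text.lower().split()
    # zip over n offset slices yields each window of n consecutive words once.
    windows = zip(*(words[i:] for i in range(n)))
    counts = Counter(' '.join(window) for window in windows)
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


text = "the cat sat on the cat mat the cat sat"
print(repeated_ngrams(text, 2))  # {'the cat': 3, 'cat sat': 2}
print(repeated_ngrams(text, 3))  # {'the cat sat': 2}
```

Because "the cat sat" repeats, its sub-phrases "the cat" and "cat sat" repeat too, which is why scanning several lengths at once reports overlapping phrases.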

Python Tool to Highlight Repeated Phrases

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep only word characters (plus inner apostrophes),
    # so "text." and "Text" both become "text".
    return re.findall(r"\w+(?:'\w+)*", text.lower())


def get_ngrams(words, n):
    # Every run of n consecutive words, joined by single spaces.
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]


def find_repeated_phrases(text, min_len=2, max_len=5, min_count=2):
    # Count all phrases of min_len..max_len words and keep the frequent ones.
    words = tokenize(text)
    phrase_counts = Counter()
    for n in range(min_len, max_len + 1):
        phrase_counts.update(get_ngrams(words, n))
    return {phrase for phrase, count in phrase_counts.items() if count >= min_count}


def highlight_phrases(text, phrases, marker_start='**', marker_end='**'):
    if not phrases:
        return text
    # Longest phrases first, so a longer phrase is preferred over a shorter one
    # starting at the same spot.
    sorted_phrases = sorted(phrases, key=len, reverse=True)
    # Allow any whitespace or punctuation between words, mirroring tokenize().
    alternatives = [r'\W+'.join(map(re.escape, p.split())) for p in sorted_phrases]
    # One combined pattern highlights everything in a single pass, so markers never nest.
    pattern = re.compile(r'\b(?:' + '|'.join(alternatives) + r')\b', re.IGNORECASE)
    return pattern.sub(lambda m: f'{marker_start}{m.group(0)}{marker_end}', text)


# Example usage
if __name__ == "__main__":
    sample_text = """
    This is an example text. This text has repeated phrases.
    Repeated phrases should be highlighted. This text is just a sample text.
    Repeated phrases like sample text should be easy to detect.
    """
    repeated = find_repeated_phrases(sample_text, min_len=2, max_len=4, min_count=2)
    highlighted = highlight_phrases(sample_text, repeated)
    print("Repeated Phrases Found:\n", sorted(repeated))
    print("\nHighlighted Text:\n", highlighted)
```
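
Highlighting is the step most likely to misbehave. Every sub-phrase of a repeated phrase is itself repeated, so if "this text has" is found, "this text" and "text has" are found too, and substituting phrases one at a time can re-wrap text that already sits inside an earlier highlight, leaving doubled or nested markers. A minimal sketch of a single-pass alternative, assuming the phrases are plain lowercase strings; `highlight_once` is a hypothetical name:

```python
import re


def highlight_once(text, phrases, start='**', end='**'):
    """Wrap every phrase occurrence in one regex pass so markers never nest."""
    if not phrases:
        return text
    # Python's re tries alternatives left to right, so listing longer phrases
    # first lets them win over their own prefixes at the same position.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(p) for p in ordered) + r')\b',
        re.IGNORECASE,
    )
    # A function replacement inserts the markers literally, even if they contain backslashes.
    return pattern.sub(lambda m: f'{start}{m.group(0)}{end}', text)


phrases = {'this text has', 'this text', 'text has'}
print(highlight_once('This text has a point. This text has another.', phrases))
# **This text has** a point. **This text has** another.
```

Once a match is replaced, scanning resumes after it, so a shorter phrase inside a longer match is never wrapped a second time.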

Features:

  • Identifies repeated phrases of 2 to 5 words by default.

  • Case-insensitive matching.

  • Wraps matches in ** markers by default; pass other marker_start and marker_end values (for example, HTML span tags) for web output.

  • Ignores single-word repetitions, because the shortest phrase checked is min_len words (2 by default).
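
Because the markers are ordinary strings, the same approach can produce HTML or colored terminal output. The sketch below assumes a `<span class="repeat">` wrapper (the class name is illustrative) and ANSI escape codes, which most modern terminals render but some older consoles do not. For HTML it escapes the surrounding text with the standard library's `html.escape`; escaping each piece after matching, rather than the whole text before, keeps phrases containing apostrophes matchable, since `html.escape` turns `'` into `&#x27;`. `wrap_phrases` is a hypothetical helper, not part of the original tool:

```python
import html
import re

ANSI_YELLOW = '\033[1;33m'  # bold yellow foreground (assumes an ANSI-capable terminal)
ANSI_RESET = '\033[0m'      # back to the terminal's default style


def wrap_phrases(text, phrases, start, end, escape=None):
    """Wrap each phrase occurrence in start/end markers, escaping text pieces if asked."""
    escape = escape or (lambda s: s)
    if not phrases:
        return escape(text)
    ordered = sorted(phrases, key=len, reverse=True)  # longer phrases tried first
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, ordered)) + r')\b', re.IGNORECASE)
    # With one capturing group, re.split alternates: plain, phrase, plain, phrase, ...
    parts = pattern.split(text)
    return ''.join(
        f'{start}{escape(part)}{end}' if i % 2 else escape(part)
        for i, part in enumerate(parts)
    )


sample = 'Tom & Jerry said hello there. Hello there, <friend>.'
phrases = {'hello there'}

# HTML: surrounding text is escaped, matches are wrapped in a span.
print(wrap_phrases(sample, phrases, '<span class="repeat">', '</span>', html.escape))
# Tom &amp; Jerry said <span class="repeat">hello there</span>. <span class="repeat">Hello there</span>, &lt;friend&gt;.

# Terminal: matches are shown in bold yellow.
print(wrap_phrases(sample, phrases, ANSI_YELLOW, ANSI_RESET))
```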

Let me know if you want a version that runs in-browser (JavaScript) or integrates with a web editor.

Enter your email below to join The Palos Publishing Company Email List
