Detecting duplicate code is a common step toward improving code quality and maintainability. You can do it with static analysis tools or with a script that searches for similar code. Here’s a basic Python script that uses the difflib module to find duplicate code by comparing the contents of files across a project.
This is a simple approach, which you can expand based on your needs.
Python Script to Detect Duplicate Code
How the Script Works:
- Reading Files: The script reads all the Python files in the specified directory.
- Comparing Code: It compares the contents of two files using the difflib module, which calculates how similar the two code snippets are.
- Similarity Threshold: It uses a similarity threshold to flag code snippets as duplicates. You can adjust this threshold to suit your needs. A threshold of 0.8 means 80% or more similarity.
- Duplicate Detection: If the similarity is greater than or equal to the threshold, the two files are considered to contain duplicate code.
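The steps above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the function names similarity and find_duplicates are illustrative, and scoring with difflib.SequenceMatcher.ratio() (with autojunk disabled, so that common characters in long files are not discarded) is an assumption about how the comparison is done.

```python
import difflib
import itertools
from pathlib import Path


def similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    # autojunk=False is an assumption: it keeps frequent characters in play,
    # which matters for files longer than 200 characters, at some speed cost.
    return difflib.SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()


def find_duplicates(directory: str, threshold: float = 0.8):
    """Return (file_a, file_b, score) for each pair scoring >= threshold."""
    # Reading files: collect every .py file under the directory.
    python_files = [p for p in sorted(Path(directory).rglob("*.py")) if p.is_file()]
    contents = {
        p: p.read_text(encoding="utf-8", errors="ignore") for p in python_files
    }

    duplicates = []
    # Comparing code: score every unordered pair of files exactly once.
    for file_a, file_b in itertools.combinations(python_files, 2):
        score = similarity(contents[file_a], contents[file_b])
        # Duplicate detection: keep pairs at or above the threshold.
        if score >= threshold:
            duplicates.append((file_a, file_b, score))
    return duplicates
```

Calling find_duplicates(".") from the project root and printing each tuple gives the file pairs with their scores. Every pair of files is compared, so the running time grows quadratically with the number of files.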
Steps to Use:
- Place this script in the root of your project or adjust the directory variable to point to your project directory.
- Run the script.
- The script will print out pairs of files with their similarity score if they contain duplicate code.
Customization:
- Threshold: You can adjust the threshold to be more or less strict. A higher threshold flags only closer matches, while a lower one catches looser similarities at the cost of more false positives.
- Language Support: This script is currently designed for Python files (.py). You can adapt it to other languages by changing the file extension filter in the python_files list comprehension.
- Function Granularity: This basic script compares whole files. If you want to compare individual functions, you’ll need to split the files into function-level snippets.
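One possible way to get function-level snippets is Python's built-in ast module. The sketch below is an assumption about how that could look, not part of the script described above: extract_functions and similar_functions are hypothetical helper names, and the labels they produce (file:function:line) are only for illustration.

```python
import ast
import difflib
import itertools


def extract_functions(source: str, filename: str = "<unknown>") -> dict:
    """Map 'file:function:line' labels to the source text of each function."""
    tree = ast.parse(source, filename=filename)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment (Python 3.8+) returns the exact text of the node.
            snippet = ast.get_source_segment(source, node)
            if snippet is not None:
                functions[f"{filename}:{node.name}:{node.lineno}"] = snippet
    return functions


def similar_functions(snippets: dict, threshold: float = 0.8) -> list:
    """Return (label_a, label_b, score) for function pairs at or above threshold."""
    results = []
    for (label_a, code_a), (label_b, code_b) in itertools.combinations(
        snippets.items(), 2
    ):
        score = difflib.SequenceMatcher(
            None, code_a, code_b, autojunk=False
        ).ratio()
        if score >= threshold:
            results.append((label_a, label_b, score))
    return results
```

To use it, merge the dictionaries returned by extract_functions for each file and pass the result to similar_functions. Note that ast.walk also visits nested functions, so an inner function is compared against its enclosing function as well; you may want to filter those pairs out.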
This approach provides a foundation for detecting duplicate code but can be improved with more advanced techniques such as abstract syntax tree (AST) analysis, tokenization, or integrating with dedicated tools like SonarQube or PMD for larger projects.
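As a small taste of the AST idea, the sketch below compares the structure of two snippets instead of their raw text, so differences in comments, blank lines, and spacing no longer affect the score. The names normalized and structural_similarity are hypothetical, and comparing ast.dump output with difflib is just one simple option, not how dedicated tools work internally.

```python
import ast
import difflib


def normalized(source: str) -> str:
    """Dump the syntax tree as text; comments and layout are not part of it."""
    # ast.dump omits line and column numbers by default.
    return ast.dump(ast.parse(source))


def structural_similarity(source_a: str, source_b: str) -> float:
    """Return a 0.0 to 1.0 ratio based on syntax trees rather than raw text."""
    return difflib.SequenceMatcher(
        None, normalized(source_a), normalized(source_b), autojunk=False
    ).ratio()
```

Two functions that differ only in comments or whitespace score 1.0 here, whereas a raw text comparison would score them lower. Renamed variables still lower the score, since identifier names are part of the dumped tree.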