Text and Image Plagiarism Detection
Abstract
This project introduces a web-based plagiarism
detection system developed using Django, capable of
identifying plagiarism in both textual documents and
digital images. The system employs Natural Language
Processing (NLP) techniques such as tokenization,
stopword removal, lemmatization, and stemming to clean
and process text. It then uses the Longest Common
Subsequence (LCS) algorithm to compute similarity
between documents. For image analysis, the system
leverages OpenCV to preprocess images and calculate
histogram-based features using a custom Five Module
Matching (FMM) algorithm. These histograms are
compared to detect visual similarities between images.
The platform includes user registration, login, and
file/image upload functionality. It stores user data in a
MySQL database and provides a visual representation of
matching results. The system is intended for academic and
content verification purposes, where it helps promote
originality and deter intellectual property theft.
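
The two comparison steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the stopword list, the function names, the choice to normalize the LCS length by the longer document, and the use of histogram intersection as the comparison measure are all hypothetical. Lemmatization, stemming, OpenCV preprocessing, and the FMM step are omitted, and images are represented as plain nested lists of grayscale values so the sketch needs only the standard library.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def text_similarity(doc_a, doc_b):
    """LCS length divided by the longer token list (assumed normalization)."""
    a, b = preprocess(doc_a), preprocess(doc_b)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def histogram(pixels, bins=8):
    """Normalized intensity histogram of a 2D list of 0-255 grayscale values."""
    counts = [0] * bins
    total = 0
    for row in pixels:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 means identical distributions."""
    return sum(min(p, q) for p, q in zip(h1, h2))


if __name__ == "__main__":
    print(text_similarity("The cat sat on the mat", "A cat sat on a mat"))
    dark = [[0, 0], [0, 0]]
    mixed = [[0, 0], [255, 255]]
    print(histogram_intersection(histogram(dark), histogram(mixed)))
```

Both scores fall between 0 and 1, so a single threshold could be applied to either one when deciding whether to flag a pair of files. Working on tokens rather than characters keeps the LCS table small and makes the score insensitive to spacing and punctuation.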
