Plagiarism detection by similarity join

Title: Plagiarism detection by similarity join
Author: Schellenberger, R.
Contributor: De Vries, A.P. (mentor); De Bruijn, S.T.J. (mentor)
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Media and Knowledge Engineering
Date: 2009-08-18

Abstract: Because the internet is so large and most of its content is public, it is very hard to find out where a piece of information originally came from. Many websites publish news articles, so people and organizations can easily lose track of where their articles are reused, with or without their permission. This paper presents a plagiarism detection algorithm that quickly compares online news articles with a collection of personal news articles and detects plagiarized passages with the same quality as a human. The algorithm uses a basic shingle index and, as a more advanced pre-filtering step, a Signature Tree to narrow down the documents that are viable matches for a query. The algorithm achieves 0.96 precision and 0.94 recall but is too resource intensive to be considered scalable. When only the pre-filtering step is used, it achieves 0.85 precision and recall, with a speedup of nearly one order of magnitude.

Subject: plagiarism detection; similarity join; shingles; signature tree; news articles
To reference this document use: http://resolver.tudelft.nl/uuid:a18b62c5-e73e-44fc-9336-83a78275f266
Embargo date: 2011-08-18
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2009 Schellenberger, R.
Files: PDF Plagiarism_detection_by_s ... y_join.pdf 2.96 MB
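
The abstract mentions a basic shingle index used to narrow down candidate documents. The following is a minimal sketch of that general idea, not the thesis's implementation: the function names (`shingles`, `build_index`, `candidates`), the shingle length `k=3`, the whitespace tokenization, and the containment threshold are all assumptions made for illustration. The Signature Tree step is not shown.

```python
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word k-grams) of text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def build_index(docs, k=3):
    """Build an inverted index: shingle -> set of ids of documents containing it.

    `docs` is a dict mapping a document id to its text (hypothetical layout).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sh in shingles(text, k):
            index[sh].add(doc_id)
    return index


def candidates(query, index, k=3, threshold=0.5):
    """Return {doc_id: containment} for documents sharing enough shingles.

    Containment is the fraction of the query's shingles that also occur
    in the document. The threshold value is an arbitrary assumption.
    """
    query_shingles = shingles(query, k)
    if not query_shingles:
        return {}
    hits = defaultdict(int)
    for sh in query_shingles:
        for doc_id in index.get(sh, ()):
            hits[doc_id] += 1
    total = len(query_shingles)
    return {d: n / total for d, n in hits.items() if n / total >= threshold}
```

In a sketch like this, only the documents returned by `candidates` would be passed on to a more expensive passage-level comparison, which is where a pre-filter saves time at the cost of some precision and recall.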