Broder, A. (1995). Some applications of Rabin's fingerprinting method. In Renato Capocelli, Alfredo De Santis, and Ugo Vaccaro, editors, Sequences II: Methods in Communications, Security, and Computer Science.
Broder, A., Glassman, S., Manasse, M., and Zweig, G. (1997). Syntactic clustering of the Web. In 6th International World Wide Web Conference (Apr. 1997), 393-404.
Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. In 34th Annual ACM Symposium on Theory of Computing.
Fetterly, D., Manasse, M., and Najork, M. (2003). On the evolution of clusters of near-duplicate Web pages. In 1st Latin American Web Congress.
Chowdhury, A., Frieder, O., Grossman, D., and McCabe, M. C. (2002). Collection statistics for fast duplicate document detection. ACM Transactions on Information Systems, 20(2):171-191.
Henzinger, M. (2006). Finding near-duplicate Web pages: A large-scale evaluation of algorithms. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, 421-428.
Kolcz, A., Chowdhury, A., and Alspector, J. (2004). Improved robustness of signature-based near-replica detection via lexicon randomization. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 605-610.
Theobald, M., Siddharth, J., and Paepcke, A. (2008). SpotSigs: Robust and efficient near duplicate detection in large web collections. In SIGIR '08, 563-570.
Manku, G. S., Jain, A., and Das Sarma, A. (2007). Detecting near-duplicates for web crawling. In WWW '07.