Web Page Deduplication: References

2019-12-03 15:19:14  Author: 老王

  [1] Broder, A. (1995). Some applications of Rabin's fingerprinting method. In Renato Capocelli, Alfredo De Santis, and Ugo Vaccaro, editors, Sequences II: Methods in Communications, Security, and Computer Science.

  [2] Broder, A., Glassman, S., Manasse, M., and Zweig, G. (1997). Syntactic clustering of the Web. In 6th International World Wide Web Conference (Apr. 1997), 393-404.

  [3] Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. In 34th Annual ACM Symposium on Theory of Computing.

  [4] Fetterly, D., Manasse, M., and Najork, M. (2003). On the evolution of clusters of near-duplicate Web pages. In 1st Latin American Web Congress.

  [5] Chowdhury, A., Frieder, O., Grossman, D., and McCabe, M. C. (2002). Collection statistics for fast duplicate document detection. ACM Transactions on Information Systems, 20(2):171-191.

  [6] Henzinger, M. (2006). Finding near-duplicate Web pages: A large-scale evaluation of algorithms. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, 421-428.

  [7] Kolcz, A., Chowdhury, A., and Alspector, J. (2004). Improved robustness of signature-based near-replica detection via lexicon randomization. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 605-610.

  [8] Theobald, M., Siddharth, J., and Paepcke, A. (2008). SpotSigs: Robust and efficient near duplicate detection in large web collections. In SIGIR'08, 563-570.

  [9] Manku, G. S., Jain, A., and Das Sarma, A. (2007). Detecting near-duplicates for web crawling. In WWW'07.
