Title: How to optimize the personalization vector to combat link spamming
Author: Tjan, J.
Contributor: van Gijzen, M.B. (mentor)
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Delft Institute of Applied Mathematics
Date: 2016-06-28

Abstract

Google uses the PageRank algorithm to rank web pages. The algorithm models the behavior of a random surfer, who either follows an outlink or goes to any page by entering a URL into an address bar. The latter is called teleportation. The probability that the surfer teleports to a given page is specified by the personalization vector. The PageRank algorithm returns a PageRank score for each page, and this score determines the position of the page: the higher the score, the higher the page appears on the list.

However, some people want to increase their PageRank score artificially. Link spamming is the name for adding and removing links between pages with the sole purpose of increasing the PageRank score. We want to find a method to lower the effect of link spamming. One way is to change the personalization vector: if we restrict the pages the random surfer can teleport to, we can avoid having the surfer teleport to a page that is suspected of link spamming. Optimizing the personalization vector is therefore one way to suppress the effect of link spamming.

In order to combat link spamming, we have looked at the role and influence of the personalization vector. We describe two different methods to optimize it. The first method generates a number of personalization vectors and, for each one, calculates the sum of the PageRank scores of the pages that are suspected of link spamming. The vector that gives the lowest sum is taken as an optimal personalization vector. The second method uses linear programming: we minimize the PageRank scores of the suspected pages and find the optimal personalization vector. The results were not always useful.
For that reason, we added two extra requirements. The first sets an upper limit, for every page, on the probability that the surfer teleports to it. The second suppresses the pages in the irreducible subsets. Once a surfer gets into an irreducible subset, it will never leave the subset by following outlinks.

To reference this document use: http://resolver.tudelft.nl/uuid:469e154c-5618-4507-9ac9-a3ffcf157e29
Part of collection: Student theses
Document type: bachelor thesis
Rights: (c) 2016 Tjan, J.
Files: Report_Jenny.pdf (PDF, 541.02 KB)
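
The first method described in the abstract (generate candidate personalization vectors, keep the one that gives the suspected pages the lowest total score) can be illustrated with a small sketch. This is not the thesis's implementation. The function names `pagerank`, `random_vector`, and `best_personalization`, the four-page example graph, the damping factor of 0.85, and the choice to redistribute the score of pages without outlinks according to the personalization vector are all assumptions made for illustration.

```python
import random


def pagerank(links, v, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank with personalization vector v.

    links[i] lists the pages that page i links to; v must sum to 1.
    alpha is the probability of following an outlink (assumed value).
    """
    n = len(links)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [0.0] * n
        dangling = 0.0
        for i, outs in enumerate(links):
            if outs:
                share = alpha * x[i] / len(outs)
                for j in outs:
                    new[j] += share
            else:
                # Assumption: a page without outlinks always teleports.
                dangling += x[i]
        # Teleportation: mass is spread according to v.
        jump = alpha * dangling + (1.0 - alpha)
        for j in range(n):
            new[j] += jump * v[j]
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return x


def random_vector(n, rng):
    """Random probability vector of length n."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [a / total for a in w]


def best_personalization(links, suspects, trials=200, seed=0):
    """Keep the random vector that gives the suspects the lowest total score."""
    rng = random.Random(seed)
    best_v, best_score = None, float("inf")
    for _ in range(trials):
        v = random_vector(len(links), rng)
        scores = pagerank(links, v)
        score = sum(scores[i] for i in suspects)
        if score < best_score:
            best_v, best_score = v, score
    return best_v, best_score


# Hypothetical graph: pages 2 and 3 only link to each other, so they form
# a subset that the surfer cannot leave by following outlinks.
links = [[1, 2], [0], [3], [2]]
suspects = [2, 3]

uniform = [0.25] * 4
trusted = [0.5, 0.5, 0.0, 0.0]  # never teleport to the suspected pages

uniform_score = sum(pagerank(links, uniform)[i] for i in suspects)
trusted_score = sum(pagerank(links, trusted)[i] for i in suspects)
best_v, best_score = best_personalization(links, suspects)
```

In this toy graph, removing the suspected pages from the personalization vector lowers their combined score compared with uniform teleportation, but does not bring it to zero: the surfer can still reach them through the outlink from page 0 and then stays there until the next teleportation. That is consistent with the abstract's remark that pages in irreducible subsets need separate treatment.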