Performance of near-duplicate detection algorithms for Crawljax