Cold start is coming: How to approximate the optimal set of initial prototypes for clustering sequence data online

Fucarev, Silviu

Author

Fucarev, Silviu (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Intelligent Systems)

Contributor

Nadeem, A. (mentor)
Verwer, S.E. (mentor)
Migut, M.A. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Computer Science and Engineering

Project

CSE3000 Research Project

Date

2021-07-01

Abstract

Data clustering is a classic topic in both academia and industry, and it is one of the most popular unsupervised classification techniques. It is fast and flexible, since it can accommodate any kind of data for which a suitable similarity metric exists. SeqClu is an online, prototype-based k-medoids clustering algorithm designed to handle large quantities of sequence data. Our main focus is the role that initialization plays in the performance of SeqClu. In this paper we show that greedy heuristics perform significantly better than k-medoids heuristics. We further show that greedy heuristics can be combined to achieve potentially better accuracy, provided that a suitable metric is chosen for selecting among their initialization results.
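The abstract does not name the specific greedy heuristics or the selection metric used in the thesis, so the following is only a minimal sketch of the general idea under our own assumptions: sequences are compared with dynamic time warping (DTW), two illustrative greedy initializers are used (a PAM BUILD-style greedy and farthest-first traversal), and the initialization results are combined by keeping the medoid set with the lowest total distance from each sequence to its nearest medoid. The functions `greedy_build`, `farthest_first`, `cost` and `best_initialization` are hypothetical and are not part of SeqClu.

```python
from typing import Callable, List, Sequence

Seq = Sequence[float]
Matrix = List[List[float]]


def dtw(a: Seq, b: Seq) -> float:
    """Dynamic time warping distance using absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def cost(dist: Matrix, medoids: List[int]) -> float:
    """Assumed selection metric: sum of distances from each point to its nearest medoid."""
    return sum(min(row[p] for p in medoids) for row in dist)


def greedy_build(dist: Matrix, k: int) -> List[int]:
    """PAM BUILD-style greedy: repeatedly add the point that lowers the total cost most."""
    medoids: List[int] = []
    for _ in range(k):
        candidates = [c for c in range(len(dist)) if c not in medoids]
        # min() keeps the first (lowest-index) candidate on ties.
        medoids.append(min(candidates, key=lambda c: cost(dist, medoids + [c])))
    return medoids


def farthest_first(dist: Matrix, k: int, start: int = 0) -> List[int]:
    """Greedy farthest-first traversal: add the point farthest from its nearest medoid."""
    medoids = [start]
    while len(medoids) < k:
        candidates = [c for c in range(len(dist)) if c not in medoids]
        medoids.append(max(candidates, key=lambda c: min(dist[c][p] for p in medoids)))
    return medoids


Heuristic = Callable[[Matrix, int], List[int]]


def best_initialization(
    data: List[Seq],
    k: int,
    heuristics: List[Heuristic],
    distance: Callable[[Seq, Seq], float] = dtw,
) -> List[int]:
    """Run every heuristic and keep the medoid set with the lowest cost."""
    dist = [[distance(a, b) for b in data] for a in data]
    results = [h(dist, k) for h in heuristics]
    return min(results, key=lambda meds: cost(dist, meds))


if __name__ == "__main__":
    # Six toy sequences forming three groups of two.
    data = [[0, 0, 1], [0, 1, 1], [5, 5, 6], [5, 6, 6], [10, 10, 11], [10, 11, 11]]
    print(best_initialization(data, 3, [greedy_build, farthest_first]))
```

On the toy data, the selected medoid set contains one sequence from each group. Precomputing all pairwise distances is quadratic in the number of sequences, so in an online setting a sketch like this would plausibly only be applied to a small initial batch of sequences before the online phase begins.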

Subject

Clustering algorithms
greedy heuristic
k-medoids
online clustering algorithms

To reference this document use:

http://resolver.tudelft.nl/uuid:59e50492-e027-4f04-9d86-f8c659851cc6

Part of collection

Student theses

Document type

bachelor thesis

Files

thesis.pdf (PDF, 2 MB)