Title: Automatic cell identification in single-cell RNA-sequencing data
Author: Michielsen, Lieke (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Reinders, M.J.T. (mentor); Mahfouz, A.M.E.T.A. (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Data Science and Technology
Date: 2020-01-30
Abstract: Since the revolution of single-cell RNA-sequencing, the number of available datasets has increased enormously. In these datasets, cell identification is mainly done manually, which is subjective and time-consuming. As a consequence, datasets are often annotated at different resolutions. This is not surprising, as cell types form a hierarchy, but it can be problematic for downstream analysis or comparison of datasets. Several supervised methods have already been developed to overcome the drawbacks of unsupervised learning. None of these, however, combines the information found in multiple datasets while preserving the definition of cell populations in each dataset, even though this consistency is necessary for downstream analysis. Furthermore, a supervised classifier should be able to detect new cell populations in an unlabeled dataset. Here, we introduce a hierarchical progressive learning pipeline with a one-class classifier to address these challenges. Using this pipeline, it is possible to construct a hierarchical classification tree by combining the information of multiple datasets. If datasets are annotated at different resolutions, their cell populations end up at different levels in the tree, so all definitions are preserved. Using a one-class classifier for each cell population also makes it possible to have a correctly working rejection option and to discover new cell populations. In this paper, we show that it is possible to construct a classification tree for simulated data and immune cells.
When comparing the pipeline with a one-class classifier to the pipeline with a linear classifier, we show that a one-class classifier can indeed improve the rejection option. Using a linear classifier, on the other hand, results in a higher accuracy. Choosing between a one-class and a linear classifier is therefore a trade-off between the ability to discover new cell populations and a higher classification performance.
Subject: Cell types, Transcriptomics, Machine learning
To reference this document use: http://resolver.tudelft.nl/uuid:a7a2a1f7-486e-47bb-afb8-7cdc891db795
Part of collection: Student theses
Document type: master thesis
Rights: © 2020 Lieke Michielsen
Files: Lieke_Michielsen_Master_Thesis.pdf (PDF, 18.41 MB)
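The idea described in the abstract, one one-class classifier per cell population arranged in a tree with a rejection option, can be illustrated with a small sketch. This is not the thesis implementation: the one-class model here is a deliberately simple stand-in (a centroid plus a distance threshold), and the names `Node`, `fit_node`, `predict`, the `slack` parameter, and the toy two-dimensional "expression" values are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cell population in the tree, with a toy one-class model."""
    name: str
    centroid: tuple = ()          # empty for the root, which has no model
    radius: float = math.inf
    children: list = field(default_factory=list)

    def accepts(self, x):
        # The root accepts everything; other nodes accept points
        # that lie within `radius` of their centroid.
        if not self.centroid:
            return True
        return math.dist(x, self.centroid) <= self.radius


def fit_node(name, samples, slack=1.2):
    """Fit the stand-in model: mean of the samples, and a radius equal to
    the largest training distance times a hypothetical slack factor."""
    dim = len(samples[0])
    centroid = tuple(sum(s[i] for s in samples) / len(samples) for i in range(dim))
    radius = slack * max(math.dist(s, centroid) for s in samples)
    return Node(name, centroid, radius)


def predict(root, x):
    """Descend while some child accepts x. Stop at an internal node if no
    child accepts; report 'rejected' if no top-level population accepts."""
    node = root
    while True:
        accepted = [c for c in node.children if c.accepts(x)]
        if not accepted:
            break
        # If several children accept, follow the closest one.
        node = min(accepted, key=lambda c: math.dist(x, c.centroid))
    return "rejected" if node is root else node.name


# Toy data: one dataset annotated at low resolution (T cell, B cell) and
# one at higher resolution (CD4+ and CD8+ T cells), combined in one tree.
t_cell = fit_node("T cell", [(-1.5, 0.0), (-0.5, 0.0), (0.5, 0.0), (1.5, 0.0)])
t_cell.children = [
    fit_node("CD4+ T cell", [(-1.5, 0.0), (-0.5, 0.0)]),
    fit_node("CD8+ T cell", [(0.5, 0.0), (1.5, 0.0)]),
]
b_cell = fit_node("B cell", [(9.5, 10.0), (10.5, 10.0)])
root = Node("root", children=[t_cell, b_cell])

for point in [(-1.1, 0.1), (0.0, 0.2), (10.0, 10.1), (5.0, 5.0)]:
    print(point, "->", predict(root, point))
```

In this sketch a point that fits the "T cell" model but neither subtype is labeled at the intermediate level, and a point that no top-level population accepts is rejected, which is how a previously unseen population could surface. A linear classifier that always assigns one of the known labels would have no comparable rejection behavior without an additional threshold.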