LCT-GAN - Improving the efficiency of tabular data synthesis via latent embeddings

Author: Velev, Viktor (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Distributed Systems)
Contributor: Chen, Lydia Y. (mentor); Zhao, Z. (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2022-07-14

Abstract: In the past decade, data-driven approaches have been at the core of many business and research models. In critical domains such as healthcare and banking, data privacy requirements are very stringent. Synthetic tabular data is an emerging solution to these privacy concerns, and Generative Adversarial Networks (GANs) are one of the emerging methods for synthesizing such data. However, in order to capture all relevant relationships between columns, tabular data needs to be numerically encoded. As columns might be of different types, this is a challenging task, as shown by recent approaches. Throughout this paper, we focus on the dimensionality explosion problem, in which encoding produces high-dimensional datasets and, with them, computational overhead and longer training times. We introduce a novel synthesis pipeline, LCT-GAN, an improvement on CTAB-GAN, the current state of the art in tabular data synthesis. Our approach addresses the dimensionality explosion problem by introducing a low-dimensional embedding step via an autoencoder prior to training. It is then combined with a novel conditional GAN architecture operating in latent space. After thorough evaluation, we observe that our solution achieves more than 30% improvement in certain statistical metrics in comparison to CTAB-GAN, accompanied by a 5-fold decrease in size and a 150-fold speedup in training time for a single epoch.
We show that tabular data can be embedded using autoencoders, and that GANs are able to learn complex relationships in the resulting latent space.

Subject: Tabular data; GAN; Latent Space; Autoencoder; Data synthesis
To reference this document use: http://resolver.tudelft.nl/uuid:e6b008b4-7fdb-49b3-a3c1-d5d22f96ea14
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2022 Viktor Velev
Files: Improving_the_efficiency_ ... ngs_6_.pdf (PDF, 528.04 KB); Final_Poster_11_.pdf (PDF, 551.34 KB)
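
As a rough illustration of the dimensionality explosion problem mentioned in the abstract, the sketch below counts how wide a single row becomes once categorical columns are expanded. It assumes a plain one-hot encoding, which the abstract does not specify; the schema, the column cardinalities, and the latent size of 16 are all hypothetical and are not taken from the thesis.

```python
def encoded_width(columns):
    """Return the row width after encoding.

    Assumes one slot per numeric column and one slot per category
    (one-hot) for each categorical column. This is an illustrative
    encoding, not necessarily the one used by CTAB-GAN or LCT-GAN.
    """
    width = 0
    for name, kind, cardinality in columns:
        if kind == "numeric":
            width += 1
        elif kind == "categorical":
            width += cardinality
        else:
            raise ValueError(f"unknown column kind for {name!r}: {kind!r}")
    return width


# Hypothetical table schema: (column name, kind, number of distinct values).
columns = [
    ("age", "numeric", 1),
    ("income", "numeric", 1),
    ("occupation", "categorical", 15),
    ("country", "categorical", 40),
    ("education", "categorical", 16),
]

raw_width = len(columns)                 # columns in the original table
wide_width = encoded_width(columns)      # columns after one-hot expansion
latent_dim = 16                          # hypothetical autoencoder bottleneck size

print(f"raw columns:     {raw_width}")
print(f"encoded columns: {wide_width}")
print(f"latent columns:  {latent_dim}")
```

In this toy schema, five original columns expand to 73 encoded columns, which is the input size a GAN would otherwise have to model directly. An autoencoder bottleneck, as described in the abstract, would let the GAN operate on the much smaller latent representation instead.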