Title: Building Interactive Text-to-SQL Systems
Author: Koops, Reinier (TU Delft Electrical Engineering, Mathematics and Computer Science; ING AI for FinTech Research; ING)
Contributor: Houben, G.J.P.M. (mentor); Gadiraju, Ujwal (mentor); Brons, J. (mentor); Lan, G. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2022-05-24

Abstract:
Natural Language Interfaces for Databases (NLIDBs) offer users a way to reason about data. Users do not need to know the data structure or its relations, or be familiar with a query language such as SQL; they only need natural language. This thesis focuses on a subset of NLIDBs, namely those that take plain English sentences as input and produce SQL queries as output.

Study 1 recruits participants from multiple origins (academia, a crowdsourcing platform, and the banking industry) without selecting them on their query language capabilities. Participants are then segmented by query language capability to distinguish non-experts from experts. Because SQL is a common way to retrieve information from databases, knowledge of SQL is taken as a proxy for a participant's skill level (SQL proficient or non-SQL proficient). To segment participants, we create an approach that automatically evaluates the near semantic equivalence of user-generated queries against a predefined gold-standard SQL query. This approach identifies 70 of the 242 participants as SQL proficient. To compare the two segments, we define 42 requirements often implemented in NLIDB systems, from which each segment selects its preferred requirements. We find no statistically significant differences between the segments' preferences.
However, exploratory findings reveal the importance of participant origin: unlike the other segments, participants from the banking industry prefer explanation over answer accuracy.

Study 2 builds on these exploratory findings. NLIDBs make it hard for users to verify whether the answer produced by the underlying model is correct. Study 2 therefore uses requirements from Study 1 to create an application with two conditions: one that provides an explanation through color-coding, showing the relations between the natural language question and the model's output columns, and one without. Comparing the two conditions tests whether color-coding improves participants' performance. Our findings suggest that color-coding improves performance only for non-aggregate selection queries with multiple columns.

Subject: NLIDB; NLP; SQL; Text-to-SQL; Deep Learning
To reference this document use: http://resolver.tudelft.nl/uuid:8ccc193c-35db-472f-a013-fe9aa87b44e7
Related dataset: 4TU.ResearchData https://doi.org/10.4121/19733029
Part of collection: Student theses
Document type: master thesis
Rights: © 2022 Reinier Koops
Files: Reinier_Koops_MSc_Thesis.pdf (PDF, 4.18 MB)
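The abstract does not say how Study 1's automated near semantic equivalence evaluation works. The sketch below is one possible approach and is not the thesis's method. It assumes an execution-based comparison: a participant's query counts as equivalent to the gold-standard query if both run on the same database and return the same rows, ignoring row order. The schema, the data, and the names `near_equivalent` and `segment` are all hypothetical. Matching results on a single database is only a proxy for semantic equivalence. Two different queries can coincide on one dataset.

```python
import sqlite3
from collections import Counter

# Hypothetical toy schema and data, for illustration only.
SCHEMA = """
CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL);
INSERT INTO accounts VALUES (1, 'Alice', 120.0), (2, 'Bob', 75.5), (3, 'Carol', 300.0);
"""


def run_query(conn, sql):
    """Execute sql and return its rows, or None if the query fails to run."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def near_equivalent(conn, user_sql, gold_sql):
    """Assumed proxy: both queries run and return the same multiset of rows.

    Row order is ignored by comparing Counters of row tuples; column order
    and duplicate counts still matter.
    """
    user_rows = run_query(conn, user_sql)
    gold_rows = run_query(conn, gold_sql)
    if user_rows is None or gold_rows is None:
        return False
    return Counter(user_rows) == Counter(gold_rows)


def segment(conn, answers, gold_sql):
    """Split hypothetical participant answers into two skill segments."""
    groups = {"sql_proficient": [], "non_sql_proficient": []}
    for participant_id, sql in answers.items():
        key = "sql_proficient" if near_equivalent(conn, sql, gold_sql) else "non_sql_proficient"
        groups[key].append(participant_id)
    return groups


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

GOLD = "SELECT owner FROM accounts WHERE balance > 100"
answers = {
    "p1": "SELECT owner FROM accounts WHERE balance >= 100.01 ORDER BY owner DESC",  # same rows, different order
    "p2": "SELECT owner FROM accounts",  # returns too many rows
    "p3": "SELEC owner FROM accounts",  # syntax error, so it never matches
}

print(segment(conn, answers, GOLD))
# → {'sql_proficient': ['p1'], 'non_sql_proficient': ['p2', 'p3']}
```

Comparing multisets, not ordered lists, tolerates harmless differences such as an added `ORDER BY`. A stricter variant would compare ordered results whenever the gold query itself specifies an ordering.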