Emrah Budur successfully defended his PhD thesis titled “Domain-Aware Conversational Open Question Answering for Resource-Constrained Languages”

Thesis: Domain-Aware Conversational Open Question Answering for Resource-Constrained Languages

 

 

The market introduction of modern conversational applications such as Amazon Alexa, Apple Siri, Google Assistant, and Microsoft Cortana has not only increased user demand for information seeking through conversational interaction but also spurred research-intensive innovation in the field of Conversational AI. Despite significant advances in the field, progress has been largely limited to English. This limitation is primarily attributed to the fact that advanced models require large labeled datasets, which leaves languages with limited resources behind.

 

In this thesis, we explore the feasibility of Conversational AI for resource-constrained languages, focusing particularly on Question Answering (QA) systems, motivated by increasing user demand and recent advances in neural models. This research aims to bridge the gap between Conversational Open QA systems in English and those in low-resource languages. Following a systematic three-step approach, we address the challenges of building a robust Conversational OpenQA system for low-resource languages such as Turkish.

 

As the first step, we present a cost-effective method for obtaining large datasets for low-resource languages. We leverage neural machine translation (NMT) tools to translate mainstream Natural Language Inference datasets available in English into non-English languages. Our qualitative and quantitative analyses demonstrate that NMT is an effective tool for addressing data scarcity in low-resource languages. In the second step, we extend our approach to the Question Answering domain and show that Question Answering is also feasible for low-resource languages when the focus shifts to Open Question Answering (OpenQA). Here, we translate the SQuAD2.0 dataset from English to Turkish using machine translation and demonstrate the feasibility of OpenQA in low-resource languages. We also show that only a few hundred labeled examples may be enough to confidently assess the performance of OpenQA systems. As the final step, we incorporate conversational signals into our OpenQA system to obtain BIRI, a Conversational OpenQA system for low-resource languages. We demonstrate that incorporating conversational signals further improves the performance of OpenQA systems, providing a solution better aligned with real-world use cases.

This thesis not only contributes to the advancement of Conversational AI in Turkish but also serves as a blueprint for making such systems accessible in other languages. It paves the way for new research opportunities on Conversational Open QA systems in resource-constrained languages.

Advisor: Prof. Dr. Tunga Güngör

 

Publications:

  1. Budur, E., Özçelik, R., Güngör, T., and Potts, C., “Data and Representation for Turkish Natural Language Inference”, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

  2. Budur, E., Özçelik, R., Khattab, O., Soylu, D., Güngör, T., and Potts, C., “Building Efficient and Effective OpenQA Systems for Low-Resource Languages”, Knowledge-Based Systems (KNOSYS), 2024.

  3. Budur, E. and Güngör, T., “Conversational Open Question Answering for Resource-Constrained Languages”, Turkish Journal of Electrical Engineering and Computer Sciences, 2024. (under review)

 
