Compiling a Verbal Multiword Expression Corpus for Turkish and Developing a Multilingual Deep Learning-Based System for Verbal Multiword Expression Identification

The first aim of this project is to clarify the definitions of Turkish verbal multiword expressions (VMWEs) and to compile an annotated text corpus. To this end, we will start from the Turkish corpus developed for the PARSEME Shared Task 1.0, then update and expand it. The VMWEs in the corpus will be annotated according to the published PARSEME annotation guidelines. The corpus consists of newspaper articles on politics, world affairs, life, and art, and the resulting resource will be released as part of the PARSEME Shared Task 1.1. It will be a valuable resource for Turkish natural language processing (NLP) studies, including syntactic parsing, machine translation, and n-gram language modeling.

The second and main aim is to use gold-standard corpora for different languages to develop a VMWE identification system that is based on deep learning techniques and applicable across languages. Processing multiword expressions is an important challenge in NLP, and applying deep learning frameworks to this task is a relatively recent line of work. We aim to improve on the VMWE identification performance of the multilingual systems reported in the literature.

Funding Institution: BAP
Principal Investigator / Project Partner: Tunga Güngör
Date: 2018 to 2019
Project Code: 18A01P4

Contact us

Department of Computer Engineering, Boğaziçi University,
34342 Bebek, Istanbul, Turkey

  • Phone: +90 212 359 45 23/24
  • Fax: +90 212 287 24 61
 
