In this project, we aim to build a system that automatically generates product reviews in a given domain, such as movie reviews or book reviews. Two different approaches will be used. The first is a pattern-based approach, where reviews are formed by filling in the slots of predefined patterns. The second is an encoder-decoder approach, where reviews are generated by a deep learning model [1]. In the first part of the project, the literature on automatic review generation will be surveyed.
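To make the pattern-based approach concrete, the following is a minimal sketch of slot filling in Python. The patterns, slot names, and filler words are all invented for illustration and are not taken from the project; a real system would presumably derive them from a review corpus.

```python
import random

# Hypothetical patterns and slot fillers, invented for illustration only.
PATTERNS = [
    "The {aspect} of this {product} was {opinion}.",
    "I found the {aspect} {opinion}, and I would {verdict} this {product}.",
]

SLOT_FILLERS = {
    "product": ["movie", "book"],
    "aspect": ["plot", "pacing", "ending"],
    "opinion": ["excellent", "disappointing", "average"],
    "verdict": ["recommend", "not recommend"],
}


def generate_review(rng: random.Random) -> str:
    """Pick one pattern and fill each of its slots with a random filler."""
    pattern = rng.choice(PATTERNS)
    # Choose a value for every known slot; str.format ignores unused ones.
    values = {slot: rng.choice(options) for slot, options in SLOT_FILLERS.items()}
    return pattern.format(**values)


if __name__ == "__main__":
    print(generate_review(random.Random(0)))
```

Passing the random generator in explicitly keeps the output reproducible, which is convenient when comparing pattern-based output against the encoder-decoder model.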
In this project, we aim to build a comprehensive word embedding repository for the Turkish language. Using each of the state-of-the-art word embedding methods, embeddings of all the words in the language will be formed using a corpus. First, the three commonly used embedding methods (Word2Vec, GloVe, fastText) will be applied and an embedding dictionary will be formed for each one. Then we will continue with context-dependent embedding methods such as BERT and ELMo. Each method will be applied with varying parameters, such as different corpora and different embedding dimensions.
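As a rough illustration of what an embedding dictionary offers once it is built, the sketch below stores word vectors in a plain Python dictionary and retrieves the nearest neighbour of a word by cosine similarity. The three-dimensional vectors are toy values invented for this example; real embeddings would be trained on a corpus and have far more dimensions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
EMBEDDINGS = {
    "kedi": [0.9, 0.1, 0.0],   # "cat"
    "köpek": [0.8, 0.2, 0.1],  # "dog"
    "araba": [0.0, 0.9, 0.4],  # "car"
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings):
    """Return the other word whose vector is most similar to that of `word`."""
    query = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine(query, embeddings[w]))


if __name__ == "__main__":
    print(nearest("kedi", EMBEDDINGS))
```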
This project aims to improve the quality of developed software and enhance the software development process by incorporating well-known software architecture and design principles (such as the SOLID principles) into the development phase. Specifically, it aims to provide automated assistance for developing a good software architecture from the artifacts that describe the project.
The misspelling of de-da is a hot topic among Turkish speakers. Not only are spelling mistakes very common in this case, they can also be quite triggering for many people. Unfortunately, spell checkers and correctors are not very successful in detecting these errors.
A Web Application for Annotating Dependency Parse Treebanks
BoAT (Bogazici University Annotation Tool) [1] is an open-source desktop annotation tool designed for manual annotation of sentences in dependency parse format. It was developed within the scope of a TÜBİTAK project and implemented in Python 3. It is used by annotators in several projects to annotate Turkish treebanks in the UD (Universal Dependencies) framework [2].
The first step in nearly all natural language processing (NLP) applications is applying preprocessing operations to the text. Preprocessing operations include tokenization (segmenting the text into tokens), sentence splitting (dividing the text into sentences), normalization (converting the text into a canonical form), and the like. In this project, you will develop and implement algorithms for preprocessing of Turkish text using deep learning approaches. First, a literature review will be conducted and similar systems for English will be analyzed (e.g. UDPipe, Stanza).
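For orientation, the three preprocessing operations named above can be sketched with simple rules in Python. This is a rule-based baseline written for illustration, not the deep learning approach the project targets; the function names and regular expressions are assumptions made for this example.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after ., ! or ? when followed by whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-ZÇĞİÖŞÜ])", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Keep apostrophe-joined suffixes with the word; emit punctuation separately."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


def turkish_lower(text: str) -> str:
    """Lowercase with the Turkish dotted/dotless i mapping (I -> ı, İ -> i)."""
    return text.replace("I", "ı").replace("İ", "i").lower()


if __name__ == "__main__":
    for sentence in split_sentences("Ankara'ya gittim. Hava çok güzeldi!"):
        print([turkish_lower(tok) for tok in tokenize(sentence)])
```

Rules like these break on abbreviations (for example, "Dr. Ahmet" would be split into two sentences), which is one reason learned models are attractive for this task.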
Mobile application stores allow users to provide feedback on applications as star ratings and natural language text. This user feedback includes useful information about the application, such as bug reports, feature requests, rationale for praise, or comments on the business logic of the application. The vast number of reviews makes it difficult to process them manually. Machine learning approaches can help product owners categorize the reviews and extract useful information.
In this project, the aim is to develop an open-source simulator for our water channel testbed using generative adversarial networks (GANs). The water channel testbed consists of a transparent water channel, water tanks, an adjustable circulation pump, information molecules, a molecule emitter, molecule filters, a laser, high-resolution programmable cameras, chemical sensors, a synchronization unit between the camera and the laser, and a computer.