Course Program:
Recent years have seen striking growth in text data such as web pages, news articles, e-mail messages, social media data, and scientific publications. Developing tools to process and use this huge volume of textual information is increasingly important. This course covers techniques for processing and making sense of text written in natural (human) language. We will examine the core tasks in natural language processing, including morphological analysis, language modeling, syntactic analysis, probabilistic parsing, and semantic interpretation. We will also explore how these techniques are used in several applications.
Course Schedule (subject to change):
Introduction
Regular Expressions
Basic Text Processing
Morphological Analysis
Finite State Transducers
N-gram Language Models
Smoothing
Naive Bayes Classification
Logistic Regression Classification
Lexical Semantics
Word Embeddings
Neural Language Models
Word Classes and Part-of-Speech Tagging
Hidden Markov Models
Sequence Processing with RNNs
Grammar Formalisms and Treebanks
Syntactic Parsing with CFGs
Statistical Parsing and Probabilistic CFGs
Shallow Semantic Parsing
Paper presentations
Textbook:
Reference Books:
- (Supplementary) Foundations of Statistical Natural Language Processing, C. D. Manning, H. Schütze, MIT Press, 2002.
- Handbook of Natural Language Processing, N. Indurkhya, F. J. Damerau (eds), Chapman & Hall, 2010.
- Natural Language Processing, E. Kumar, I K International Publishing House, 2011.
- Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization, P. Jackson, I. Moulinier, John Benjamins, 2007.
- Natural Language Processing with Python, S. Bird, E. Klein, E. Loper, O’Reilly Media, 2009.
- Natural Language Processing and Text Mining, A. Kao, S. R. Poteet (eds), Springer, 2007.