Dottorato in Ingegneria Informatica
e Elettronica Industriali
A.A. 2003/2004


INTRODUCTION TO
NATURAL LANGUAGE PROCESSING



A small course (twelve hours) in natural language processing (NLP) is offered this year within the "Dottorato in Ingegneria Informatica e Elettronica Industriali". The course assumes general computer science background, elementary knowledge of formal language theory, ability to analyze algorithm complexity, and elementary knowledge of probability. The teacher Giorgio Satta can be contacted at satta@dei.unipd.it




I. SCHEDULE



Twelve (12) hours distributed into three weeks.


Tuesday, Sept. 16 2.30pm-4.30pm aula Ee
Wednesday, Sept. 17 2.30pm-4.30pm aula Ee
Tuesday, Sept. 23 2.30pm-4.30pm aula Le
Wednesday, Sept. 24 2.30pm-4.30pm aula Me
Tuesday, Sept. 30 2.30pm-4.30pm aula Ee
Wednesday, Oct. 01 2.30pm-4.30pm aula Ee



II. PREREQUISITES



Before first lecture, please read the following introductory material to natural language processing, which offers an overview of the field and some historical notes.


D. Jurafsky and J.H. Martin, "Speech and Language Processing", Chapter 1: Introduction, 1-18, Prentice-Hall, 2000 reading0.pdf
L. Lee, "I'm sorry Dave, I'm afraid I can't do that: Linguistics, Statistics, and Natural Language Processing circa 2001", to appear, National Research Council Study on the Fundamentals of Computer Science, 7 pages, 2003 reading1.ps



III. FINITE-STATE MODELS



Regular expressions, finite automata and finite-state transducers; basic properties and algorithms.

We will look at two applications. First application shows how to apply finite-state transducers to speed-up part-of-speech tagging systems. Second application shows how to implement a phonological theory using finite-state transducers.


M-J. Nederhof, "Introduction to Finite State Techniques", Lecture Notes, Univ. of Groningen, 19 pages, 1996 reading2.ps
E. Roche and Y. Schabes, "Deterministic Part-of-Speech Tagging with Finite-State Transducers", Computational Linguistics 21(2), 227-253, 1995 reading3.pdf
R. Frank and G. Satta, "Optimality Theory and the Generative Complexity of Constraint Violability", Computational Linguistics, 24(2), 307-316, 1998 reading4.pdf



IV. CONTEXT-FREE MODELS



Context-free grammars, push-down automata and push-down transducers. Parsing strategies and tabular parsing algorithms.

As an application, we will look at the probabilistic extensions of context-free grammars and maximum-likelihood estimators. We will also closely look at a state-of-the art statistical parser for natural language.


M-J. Nederhof and G. Satta, "Tabular Parsing", Lecture Notes, Univ. of Groningen and Univ. of Padua, 21 pages, 2003 reading5.ps
M. Collins, "Three Generative, Lexicalised Models for Statistical Parsing", 35th Annual Meeting of the Association for Computational Linguistics, 16-23, 1997 reading6.pdf



V. CREDITS



In order to get credits out of this course, candidates should write and discuss with the teacher a short project paper on a problem related to those presented in the class. The specific subject of the paper should be first negotiated with the teacher.




Last update: Fri Aug 29 2003