University of Padua -- Natural Language Processing

Dottorato in Ingegneria Informatica
e Elettronica Industriali

A.A. 2003/2004

INTRODUCTION TO
NATURAL LANGUAGE PROCESSING

A small course (twelve hours) in natural language processing (NLP) is offered this year within the "Dottorato in Ingegneria Informatica e Elettronica Industriali". The course assumes general computer science background, elementary knowledge of formal language theory, ability to analyze algorithm complexity, and elementary knowledge of probability. The teacher Giorgio Satta can be contacted at satta@dei.unipd.it

I. SCHEDULE

Twelve (12) hours distributed into three weeks.

Tuesday, Sept. 16	2.30pm-4.30pm	aula Ee
Wednesday, Sept. 17	2.30pm-4.30pm	aula Ee
Tuesday, Sept. 23	2.30pm-4.30pm	aula Le
Wednesday, Sept. 24	2.30pm-4.30pm	aula Me
Tuesday, Sept. 30	2.30pm-4.30pm	aula Ee
Wednesday, Oct. 01	2.30pm-4.30pm	aula Ee

II. PREREQUISITES

Before first lecture, please read the following introductory material to natural language processing, which offers an overview of the field and some historical notes.

D. Jurafsky and J.H. Martin, "Speech and Language Processing", Chapter 1: Introduction, 1-18, Prentice-Hall, 2000	reading0.pdf
L. Lee, "I'm sorry Dave, I'm afraid I can't do that: Linguistics, Statistics, and Natural Language Processing circa 2001", to appear, National Research Council Study on the Fundamentals of Computer Science, 7 pages, 2003	reading1.ps

III. FINITE-STATE MODELS

Regular expressions, finite automata and finite-state transducers; basic properties and algorithms.

We will look at two applications. First application shows how to apply finite-state transducers to speed-up part-of-speech tagging systems. Second application shows how to implement a phonological theory using finite-state transducers.

M-J. Nederhof, "Introduction to Finite State Techniques", Lecture Notes, Univ. of Groningen, 19 pages, 1996	reading2.ps
E. Roche and Y. Schabes, "Deterministic Part-of-Speech Tagging with Finite-State Transducers", Computational Linguistics 21(2), 227-253, 1995	reading3.pdf
R. Frank and G. Satta, "Optimality Theory and the Generative Complexity of Constraint Violability", Computational Linguistics, 24(2), 307-316, 1998	reading4.pdf

IV. CONTEXT-FREE MODELS

Context-free grammars, push-down automata and push-down transducers. Parsing strategies and tabular parsing algorithms.

As an application, we will look at the probabilistic extensions of context-free grammars and maximum-likelihood estimators. We will also closely look at a state-of-the art statistical parser for natural language.

M-J. Nederhof and G. Satta, "Tabular Parsing", Lecture Notes, Univ. of Groningen and Univ. of Padua, 21 pages, 2003	reading5.ps
M. Collins, "Three Generative, Lexicalised Models for Statistical Parsing", 35th Annual Meeting of the Association for Computational Linguistics, 16-23, 1997	reading6.pdf

V. CREDITS

In order to get credits out of this course, candidates should write and discuss with the teacher a short project paper on a problem related to those presented in the class. The specific subject of the paper should be first negotiated with the teacher.

Last update: Fri Aug 29 2003