TACHIR: a Tool for the Automatic Construction of Hypertexts for
Information Retrieval
TACHIR is the Tool for the Automatic Construction of Hypertexts for
Information Retrieval (IR) designed and developed by the Information Systems Management
research group at the Department of Electronics and Computer
Science of the University of
Padova, Italy. TACHIR was developed by Massimo Melucci as part of
his Ph.D. thesis. Massimo designed TACHIR with Maristella Agosti, who was his
Ph.D. supervisor, and Fabio
Crestani. This page describes the architecture of TACHIR; for a
detailed description of the methodology underlying the approach to
automatic hypertext construction, and specifically to automatic
hyper-textbook construction, the reader can refer to [1,2,4].
TACHIR aims to build a hypertext fully automatically, starting from a
"flat" collection of "flat" documents, so that information can be
retrieved from the document collection itself. The hypertext follows the
EXPLICIT general conceptual schema [3], as depicted in the
following figure.

The general schema includes two different levels: the level of data
and the level of auxiliary data. Data are the elementary documents to
be transformed into hypertext nodes for IR purposes. Auxiliary data
are terms or concepts describing the document content, and are
transformed into nodes as well, to be used during the retrieval
process. Once the hypertext has been built, the user can
access it by navigating the links among data
and auxiliary data nodes.
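The two-level schema can be pictured with a small data structure. The following is only a minimal sketch, not TACHIR code: the names `Node`, `Hypertext`, `link`, and `neighbours` are hypothetical, and links are assumed to be bidirectional for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    level: str  # "data" (a document) or "auxiliary" (a term or concept)
    label: str
    links: set = field(default_factory=set)  # ids of the linked nodes


class Hypertext:
    """Hypothetical container for the two levels of nodes and their links."""

    LEVELS = ("data", "auxiliary")

    def __init__(self):
        self.nodes = {}

    def add_node(self, node_id, level, label):
        if level not in self.LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.nodes[node_id] = Node(node_id, level, label)

    def link(self, a, b):
        # Assumption: every link can be followed in both directions.
        self.nodes[a].links.add(b)
        self.nodes[b].links.add(a)

    def neighbours(self, node_id, level=None):
        # Nodes reachable in one step, optionally restricted to one level.
        linked = (self.nodes[n] for n in sorted(self.nodes[node_id].links))
        return [n for n in linked if level is None or n.level == level]


def build_example():
    ht = Hypertext()
    ht.add_node("d1", "data", "Document on stemming")
    ht.add_node("d2", "data", "Document on indexing")
    ht.add_node("t1", "auxiliary", "stemming")
    ht.link("d1", "t1")  # document-term link
    ht.link("d2", "t1")  # document-term link
    ht.link("d1", "d2")  # document-document link
    return ht
```

In this sketch, navigating from a term node to its documents, or from a document to its describing terms, is a call to `neighbours` with the appropriate level.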
The architecture of TACHIR is depicted in the following Figure.

TACHIR is a tool made of different software modules:
- the object-oriented IR class library,
- the indexing engine,
- the automatic hypertext construction engine,
- the querying tool.
The resulting hypertext can be browsed and searched through a standard
Web browser.
From a conceptual point of view, the class library is the
TACHIR backbone. IR objects implement the basic IR structures, and the
abstract interfaces of the library give users access to the IR
functionality. The class library includes classes that are
independent of the TACHIR methodology and can also be used in a
generic IR framework. The indexing engine takes as input the
text collection and the stop list, and produces the indexes to be used
for automatic hypertext construction. Stop word removal, stemming, and
weighting use the standard algorithms of IR. The
automatic hypertext construction engine takes as input the
indexes and produces the links among data and auxiliary data. TACHIR
assumes that the user accesses the IR hypertext using a Web
browser such as Explorer or Netscape. For this reason, HTML has
been used both to mark up the documents and to implement the
hypertexts. The querying tool gives access to the documents
through free-text queries, and the retrieved documents can work as
entry points to the hypertext itself.
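The indexing and querying steps described above can be illustrated as follows. This is a rough sketch under stated assumptions, not the TACHIR implementation: the stop list, the crude suffix-stripping `stem` function (a stand-in for a real stemmer), the TF-IDF weighting, and the function names `tokenize`, `build_index`, and `search` are all illustrative choices.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop list; a real one is much longer.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}


def stem(word):
    # Crude suffix stripping, standing in for a real stemming algorithm.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, split into words, drop stop words, then stem.
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Return {doc_id: {term: weight}} using TF-IDF weights (an assumption)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    n = len(docs)
    return {
        doc_id: {t: tf * math.log(n / doc_freq[t]) for t, tf in c.items()}
        for doc_id, c in counts.items()
    }


def search(index, query):
    """Rank documents by the summed weights of the query terms they contain."""
    terms = tokenize(query)
    scores = {}
    for doc_id, weights in index.items():
        score = sum(weights.get(t, 0.0) for t in terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)
```

The document identifiers returned by `search` play the role of entry points: each one would correspond to a data node from which the user starts browsing.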
In 1997, Fabio Crestani and Massimo Melucci started to work on the
Hyper-TextBook (HTB) project [4]. They addressed
the problem of automatically converting a textbook to its hypertextual
version using some of the technology developed for TACHIR. The aim of
the project was to design, develop and test a methodology and a tool
for the automatic authoring of HTBs from full-text electronic
textbooks. The target documents were textbooks because of their
characteristics, usage, and relevance to the area of Information
Retrieval and Digital Libraries. Indeed, the availability of
electronic textbooks within digital libraries, the wide area access
provided by a digital library, and the need to provide potential
digital library users with wide area access to HTBs were the main
reasons why we launched the project. The conceptual structure of the
hypertext and the TACHIR algorithms have been significantly redesigned
and implemented in order to construct HTBs. The resulting HTBs can be
used both as a self-instruction manual and as a self-reference
source. The HTB included in this CD-ROM is the result of a case study
conducted on C.J. van Rijsbergen's textbook on IR.
The modifications made to TACHIR aim to improve on the textual
version of the textbook by automatically adding links of several
types to those inserted by the textbook author. These links make
the book more effective in search-oriented tasks:
- Links between textbook pages and terms in the subject index
produced by the author of the textbook. These links enable accessing
parts of the textbook that have not been specifically indexed by the
author, but that are semantically related to items in the subject
index.
- Links between terms in the subject index. These links enable
navigating among terms expressing similar concepts or subjects, thus
permitting the user to search the textbook pages.
- Links between textbook pages. These links enable navigating among
pages about subjects, thus permitting the user to access pages that
have not been specifically indexed by the author.
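The three link types can be illustrated with a similarity-based sketch. It is an assumption made here for illustration, not the methodology of [4]: pages are represented as term-weight vectors, links are created when a weight or a cosine similarity reaches a threshold, and the names `build_links`, `similar_pairs`, and `threshold` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def transpose(page_weights):
    # Turn {page: {term: w}} into {term: {page: w}}.
    term_weights = {}
    for page, weights in page_weights.items():
        for term, w in weights.items():
            term_weights.setdefault(term, {})[page] = w
    return term_weights


def similar_pairs(vectors, threshold):
    # All pairs of items whose vectors are similar enough to be linked.
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    ]


def build_links(page_weights, index_terms, threshold=0.5):
    """Compute page-term, term-term, and page-page links (illustrative only)."""
    term_weights = {
        t: v for t, v in transpose(page_weights).items() if t in index_terms
    }
    page_term = sorted(
        (page, term)
        for page, weights in page_weights.items()
        for term, w in weights.items()
        if term in index_terms and w >= threshold
    )
    return {
        "page-term": page_term,
        "term-term": similar_pairs(term_weights, threshold),
        "page-page": similar_pairs(page_weights, threshold),
    }
```

Here `page_weights` would come from an index of the textbook pages, and `index_terms` from the subject index produced by the author; raising `threshold` yields fewer, stronger links.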
References
[1] M. Agosti and F. Crestani. A methodology for the
automatic construction of a Hypertext for Information Retrieval. In
Proceedings of the ACM Symposium on Applied Computing, pages
745-753, Indianapolis, USA, February 1993.
[2] M. Agosti, F. Crestani, and M. Melucci. Automatic
construction of hypermedia for information retrieval. ACM
Multimedia Systems, vol. 3:15-24, New York, USA, 1995.
[3] M. Agosti, G. Gradenigo, and P.G. Marchetti. A
hypertext environment for interacting with large textual databases.
Information Processing and Management,
vol. 28(3):371-387, 1992.
[4] F. Crestani and M. Melucci. A Methodology for the
Enhancement of a Hypertext Version of a Textbook by the Automatic
Insertion of Links in the Subject Index. IEEE Advances in Digital
Libraries (IEEE-ADL) Conference, Santa Barbara, CA, April
1998.
[5] F. Crestani and M. Melucci.
A case study of automatic authoring: from a textbook to a hyper-textbook.
Data and Knowledge Engineering,
vol. 27(1):1-30, September 1998.
Massimo Melucci
Dipartimento di Elettronica e Informatica
Via Gradenigo, 6/A
35131 Padova
Italy
Telephone: +39 049 827 7802
Fax: +39 049 827 7826