Information Retrieval and Permanent Data

Research activities:

The advent of the widespread digitization of cultural heritage collections has significant implications for institutions that hold these types of collections. Both digital preservation and access present challenges for owners of cultural heritage collections. The issues that surround access are complex and far-reaching. Many of these issues are not unique to digital cultural heritage, but cultural heritage raises specific questions about supporting access to collections and individual artefacts.

The group has long experience in developing original methods and tools that support users in knowledge discovery and exploration of digital cultural collections. In particular:

  • digital library systems
  • digital archive systems
  • metadata and interoperability
  • user engagement
  • user generated content


People: Maristella Agosti (contact person), Giorgio Maria Di Nunzio, Nicola Ferro

Information Retrieval (IR) is concerned with complex systems delivering a variety of key applications to industry and society: Web search engines, (bio)medical search, expertise retrieval systems, intellectual property and patent search, enterprise search, and many others. Nowadays, user tasks and needs are becoming increasingly demanding, the data sources to be searched are rapidly evolving and highly heterogeneous, the interaction between users and IR systems is far more articulated, and the systems themselves are becoming increasingly complex, consisting of many interrelated components.

IMS research activities in the field are:

  • design and development of new IR models and indexing strategies
  • multilingual and multimodal information access
  • user context and implicit feedback
  • design and development of evaluation protocols and measures for both qualitative and quantitative assessment of system performance
  • visual analytics for experimental evaluation
  • application of IR techniques to several domains, e.g. cultural heritage, linguistics, social media


People: Nicola Ferro (contact person), Giorgio Maria Di Nunzio, Maristella Agosti

The research activities of the group focus on innovative interactive visualisation approaches to support machine learning for big data by tackling open research questions in two novel research areas: Interactive Machine Learning (IML) and Visual Analytics (VA). Both areas rely on human knowledge to improve the learning systems. IML focuses on the development of machine learning procedures based on design choices such as selection and creation of the model, definition of evidential features, and the setting of parameters. VA focuses on the design of interactive graphical representations of information that might better support the human ability to perceive and construct meaningful patterns from data.

Core research activities in the field are:

  • Bayesian machine learning
  • Design of visualisation techniques for probabilistic models for classification and retrieval
  • Cost-sensitive learning
  • Interactive visualisation techniques for evaluation measures of ML and IR
  • Interactive relevance feedback
  • Modelling diverse sources of evidence for document classification and ranking


People: Giorgio Maria Di Nunzio (contact person), Nicola Ferro

One of the most relevant socio-economic and scientific changes in recent years has been the recognition of data as a valuable asset. The principal driver of this evolution is the Web of Data. The current paradigm realising the Web of Data is Linked Open Data (LOD), which exploits Web technologies such as the Resource Description Framework (RDF) to open up public data in machine-readable formats, ready for consumption and re-use. However, LOD publishing is only the first step towards revealing the ground-breaking potential of this approach, which lies in the semantic connections between data that enable new possibilities for knowledge creation and discovery. LOD is shifting from a publishing paradigm to a knowledge creation and sharing one. One of the research goals of our group is to study approaches that shift the focus from the systems handling the data to the data themselves.
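
As a rough illustration of how semantic connections between data can be followed, the sketch below stores a few RDF-style (subject, predicate, object) triples as plain Python tuples and answers a query by chaining links. It is a minimal sketch, not the group's tooling: all URIs and vocabulary terms under http://example.org/ and the helper names `match` and `creators_held_by` are hypothetical, and no RDF library is used.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

EX = "http://example.org/"  # hypothetical namespace, for illustration only

# A tiny, made-up dataset: a painting, its creator, and the holding museum.
TRIPLES: Set[Triple] = {
    (EX + "artist/1", EX + "vocab/name", "Example Painter"),
    (EX + "painting/7", EX + "vocab/creator", EX + "artist/1"),
    (EX + "painting/7", EX + "vocab/heldBy", EX + "museum/3"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern (None = wildcard)."""
    for triple in TRIPLES:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def creators_held_by(museum: str) -> List[str]:
    """Follow heldBy -> creator -> name links to list creators' names."""
    names = set()
    for work, _, _ in match(p=EX + "vocab/heldBy", o=museum):
        for _, _, artist in match(s=work, p=EX + "vocab/creator"):
            for _, _, name in match(s=artist, p=EX + "vocab/name"):
                names.add(name)
    return sorted(names)
```

Calling `creators_held_by(EX + "museum/3")` walks three links that no single triple states on its own, which is the kind of knowledge discovery the connections make possible.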

Core research activities in the field are:

  • Methodologies for enabling data-level interoperability
  • Research data publishing
  • Digital geo-linguistics
  • Dialectometrics
  • Data citation

People: Giorgio Maria Di Nunzio (contact person), Nicola Ferro

Research on structured and semi-structured data deals with the representation of, and efficient access to, information by exploiting the explicit structure present within it. In this context, we focus on:

  • keyword-based search over structured data
  • efficient access to semi-structured and XML data.

Keyword search is the foremost approach for searching information, and it has been successfully applied to retrieving unstructured documents. Nonetheless, retrieving information from documents is intrinsically different from querying structured data sources that have either an explicit schema, such as relational databases or triple stores, or an implicit one, such as tables in textual documents and on the Web.
Efficient access to semi-structured and XML data concerns the development of alternative data models for representing hierarchical data. Instead of using links between nodes or adjacency matrices for representing a tree, we represent a hierarchy as a family of nested sets where the inclusion relationship among sets allows us to represent parent/child relations.
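
The nested-set idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: the hierarchy, the node names, and the helpers `is_ancestor`, `parent` and `children` are hypothetical, and each node is simply mapped to the set containing itself and all of its descendants.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical hierarchy (names are illustrative): every node is mapped to
# the set holding itself and all of its descendants.
NESTED_SETS: Dict[str, FrozenSet[str]] = {
    "book": frozenset({"book", "chapter1", "chapter2", "section1"}),
    "chapter1": frozenset({"chapter1", "section1"}),
    "chapter2": frozenset({"chapter2"}),
    "section1": frozenset({"section1"}),
}


def is_ancestor(ancestor: str, node: str) -> bool:
    """A node is an ancestor when its set strictly includes the other set."""
    return NESTED_SETS[node] < NESTED_SETS[ancestor]


def parent(node: str) -> Optional[str]:
    """The parent is the smallest set that strictly includes the node's set."""
    ancestors = [n for n in NESTED_SETS if is_ancestor(n, node)]
    if not ancestors:
        return None  # the root has no enclosing set
    return min(ancestors, key=lambda n: len(NESTED_SETS[n]))


def children(node: str) -> List[str]:
    """Children are the nodes whose parent is the given node."""
    return sorted(n for n in NESTED_SETS if parent(n) == node)
```

No parent pointers or adjacency matrix are stored: `parent("section1")` and `children("book")` are derived purely from set inclusion.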


People: Nicola Ferro (contact person), Maristella Agosti

Geomatics is a field concerned with the integrated measurement, analysis, management, storage and display of the descriptions and locations of spatial data.
Geomatics includes a wide range of activities, from the acquisition and analysis of site-specific spatial data to the application of GIS and remote sensing technologies in environmental management. It plays an important role in land administration and land use management.
Research activities at DEI concern two important aspects of Geomatics:

  • Cartographic generalization: study and development of algorithms to automatically produce small-scale maps from the information available in larger-scale cartographic databases.
  • Quality control of spatial databases: spatial databases are complex systems for which data integrity, consistency and reliability are vital. Two approaches are investigated: on the one hand, quality analysis of spatial databases to assess their consistency, completeness, and integration, and to point out elements that need to be corrected; on the other hand, preventive mechanisms to achieve the desired quality of the databases produced in the process of map generalization.
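
To give a flavour of what automatic generalization involves, the sketch below implements the classic Douglas-Peucker line simplification, a well-known building block for reducing the detail of a polyline when moving to a smaller scale. It is offered only as an illustration: it is not claimed to be one of the algorithms developed at DEI, and the function names and the `tolerance` parameter are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: Sequence[Point], tolerance: float) -> List[Point]:
    """Douglas-Peucker: drop vertices closer than tolerance to the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]  # all intermediate vertices are negligible
    # Keep the farthest vertex and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

A larger `tolerance` removes more vertices, which corresponds to a coarser target scale. A quality-control step of the kind described above would then check that the simplified geometry is still consistent, for example that simplified lines do not cross.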


People: Michele Moro (contact person)