Towards an Anatomy of IR System Component Performances

Nicola Ferro and Gianmaria Silvello
Journal Paper Journal of the Association for Information Science and Technology (JASIST), accepted for publication, pp. 1-21, 2017.

Abstract

Information Retrieval (IR) systems are the prominent means for searching and accessing huge amounts of unstructured information on the Web and elsewhere. They are complex systems, made up of many different components interacting together, and evaluation is crucial to both tune and improve them. Nevertheless, the current evaluation methodology still offers no way to determine how much each component contributes to overall performance or how the components interact. This hampers a deep understanding of IR system behaviour and, in turn, prevents us from determining in advance which components are best suited to work together for a specific search task.

In this paper, we move the evaluation methodology one step forward by overcoming these barriers and beginning to devise an “anatomy” of IR systems and their internals. In particular, we propose a methodology based on the General Linear Mixed Model (GLMM) and ANalysis Of VAriance (ANOVA) to develop statistical models able to isolate system variance and component effects, as well as their interaction, by relying on a Grid of Points (GoP) containing all the combinations of the analysed components. We apply the proposed methodology to the analysis of two relevant search tasks, news search and Web search, using standard TREC collections. We analyse the basic set of components that are typically part of an IR system, namely stop lists, stemmers and n-grams, and IR models. In this way, we derive insights about English text retrieval.

Data Citation: a Computational Challenge

Susan Davidson, Peter Buneman, Daniel Deutch, Tova Milo and Gianmaria Silvello
Conference Paper Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2017), pp. 1-4, 2017.

Abstract

Data citation is an interesting computational challenge, whose solution draws on well-studied problems in database theory, namely query answering using views and provenance. We describe the problem, suggest an approach to its solution, and highlight several open research problems, both practical and theoretical.

Automating data citation: the eagle-i experience

Abdussalam Alawini, Leshang Chen, Susan Davidson and Gianmaria Silvello
Conference Paper Proc. of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2017), accepted for publication, 2017.

Abstract

Data citation is of growing concern for owners of curated databases, who wish to give credit to the contributors and curators responsible for portions of the dataset and to enable the data retrieved by a query to be examined later. While several databases specify how data should be cited, they leave it to users to manually construct the citations and do not generate them automatically.

We report our experiences in automating data citation for an RDF dataset called eagle-i, and discuss how to generalize this to a citation framework that can work across a variety of different types of databases (e.g. relational, XML, and RDF). We also describe how a database administrator would use this framework to automate citation for a particular dataset.

A Model for Fine-Grained Data Citation

Susan Davidson, Daniel Deutch, Tova Milo and Gianmaria Silvello
Conference Paper Proc. of the biennial Conference on Innovative Data Systems Research (CIDR 2017), 2017.

Abstract

An increasing amount of information is being collected in structured, evolving, curated databases, raising the question of how information extracted from such datasets via queries should be cited. Unlike traditional research products, such as books and journals, which have a fixed granularity, data citation is a challenge because the granularity varies. Different portions of the database, with varying granularity, may have different citations.

Furthermore, there is an infinite number of queries over a database, each accessing and generating different subsets of the database, so we cannot hope to explicitly attach a citation to every possible result set and/or query. We present the novel problem of automatically generating citations for general queries over a relational database, and explore a solution based on a set of citation views, each of which attaches a citation to a view of the database. Citation views are then used to automatically construct citations for general queries. Our approach draws inspiration from results in two areas, query rewriting using views and database provenance, and combines them in a robust model. We then discuss open issues in developing a practical solution to this challenging problem.

Learning to Cite Framework: How to Automatically Construct Citations for Hierarchical Data

Gianmaria Silvello
Journal Paper Journal of the Association for Information Science and Technology (JASIST), in press, 2017.

Abstract

The practice of citation is foundational for the propagation of knowledge along with scientific development and it is one of the core aspects on which scholarship and scientific publishing rely.

Within the broad context of data citation, we focus on the problem of automatically constructing citations for hierarchically structured data. We present the “learning to cite” framework, which enables the automatic construction of human- and machine-readable citations with different levels of coarseness. The main goal is to reduce human intervention on data to a minimum and to provide a citation system general enough to work on heterogeneous and complex XML datasets. We describe how this framework can be realized by a system for creating citations to single nodes within an XML dataset and, as a use case, show how it can be applied in the context of digital archives.

We conduct an extensive evaluation of the proposed citation system by analyzing its effectiveness from the correctness and completeness viewpoints, showing that it represents a suitable solution that can be easily employed in real-world environments and that reduces human intervention on data to a minimum.

Visual Analytics for Information Retrieval Evaluation Campaigns

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Workshop In M. Sedlmair and C. Tominski eds. EuroVis Workshop on Visual Analytics (EuroVis 2017). 2017.

Measuring Dataset Impact: Data Citation as an Economic Process

Gianmaria Silvello
Workshop Abstract Information Retrieval and Interaction Fest in Honour of Peter Ingwersen. (October 2016)

3.5K runs, 5K topics, 3M assessments and 70M measures: What trends in 10 years of Adhoc-ish CLEF?

Nicola Ferro and Gianmaria Silvello
Journal Paper Information Processing & Management (IP&M), 53(1):175-202, 2017.

Abstract

Multilingual information access and retrieval is a key concern in today's global society and, despite the considerable achievements over the past years, it still presents many challenges. In this context, experimental evaluation represents a key driver of innovation and multilinguality is tackled in several evaluation initiatives worldwide, such as CLEF in Europe, NTCIR in Japan and Asia, and FIRE in India. All these activities have run several evaluation cycles and there is a general consensus about their strong and positive impact on the development of multilingual information access systems. However, a systematic and quantitative assessment of the impact of evaluation initiatives on multilingual information access and retrieval over the long period is still missing.

Therefore, in this paper we conduct the first systematic and large-scale longitudinal study on several CLEF Adhoc-ish tasks – namely the Adhoc, Robust, TEL, and GeoCLEF labs – in order to gain insights on the performance trends of monolingual, bilingual and multilingual information access systems, spanning several European and non-European languages, over a range of 10 years.

We learned that monolingual retrieval exhibits a stable positive trend for many of the languages analyzed, even though the performance increase is not always steady from year to year, due to the varying interests of the participants, who may not always focus solely on increasing performance. Bilingual retrieval demonstrates greater improvements in recent years, probably due to the better language resources now available, and it also outperforms monolingual retrieval in several cases. Multilingual retrieval shows improvements over the years and its performance is comparable to that of bilingual and monolingual retrieval, and sometimes even better. Moreover, we have found evidence that the rule of thumb of a 3-year duration for an evaluation task is typically sufficient, since top performance is usually reached by the third year and sometimes even by the second, which then leaves room for research groups to investigate relevant research issues other than top performance.

Overall, this study provides quantitative evidence that CLEF has achieved the objective which led to its establishment, i.e. making multilingual information access a reality for European languages. However, the outcomes of this paper not only indicate that CLEF has steered the community in the right direction, but also highlight the many open challenges for multilinguality. For instance, multilingual technologies greatly depend on language resources, and targeted evaluation cycles help not only in developing and improving them, but also in devising increasingly language-independent methodologies. Another key aspect concerns multimodality, understood not only as the capability of providing access to information in multiple media, but also as the ability to integrate access and retrieval over different media and languages in a way that best fits user needs and tasks.

Semantic Representation and Enrichment of Information Retrieval Experimental Data

Gianmaria Silvello, Georgeta Bordea, Nicola Ferro, Paul Buitelaar and Toine Bogers
Journal Paper International Journal on Digital Libraries, 18(2):145-172, 2017.

Abstract

Experimental evaluation carried out in international large-scale campaigns is a fundamental pillar of the scientific and technological advancement of Information Retrieval (IR) systems. Such evaluation activities produce a large quantity of scientific and experimental data, which are the foundation for all the subsequent scientific production and development of new systems. In this work, we discuss how to semantically annotate and interlink these data, with the goal of enhancing their interpretation, sharing, and reuse. We discuss the underlying evaluation workflow and propose a Resource Description Framework (RDF) model for its parts. We use expertise retrieval as a case study to demonstrate the benefits of our semantic representation approach. We employ this model as a means for exposing experimental data as Linked Open Data (LOD) on the Web and as a basis for enriching and automatically connecting these data with expertise topics and expert profiles.

In this context, a topic-centric approach for expert search is proposed, addressing the extraction of expertise topics, their semantic grounding with the LOD cloud, and their connection to IR experimental data. Several methods for expert profiling and expert finding are analysed and evaluated. Our results show that it is possible to construct expert profiles starting from automatically extracted expertise topics and that topic-centric approaches outperform state-of-the-art language modelling approaches for expert finding.

The CLEF Monolingual Grid of Points

Nicola Ferro and Gianmaria Silvello
Conference Paper Information Access Evaluation. Multilinguality, Multimodality, and Interaction - Seventh International Conference of the Cross-Language Evaluation Forum, CLEF 2016: Evora, Portugal, September 5-8, 2016. pp. 16-27. In Lecture Notes in Computer Science 9822, Springer International Publishing Switzerland.

Abstract

In this paper we run a systematic series of experiments to create a grid of points in which many combinations of retrieval methods and components adopted by MultiLingual Information Access (MLIA) systems are represented. This grid of points is intended to provide insights into the effectiveness of the different components and their interaction, and to identify suitable baselines against which all comparisons can be made.

We publicly release a large grid of points comprising more than 4K runs obtained by testing 160 IR systems combining different stop lists, stemmers, n-gram components and retrieval models on CLEF monolingual tasks for eight European languages. Furthermore, we evaluate this grid of points using four different effectiveness measures and provide some insights into the quality of the created grid of points and the behaviour of the different systems.
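A grid of points of this kind is essentially the Cartesian product of the available component options, with each combination defining one system. The minimal Python sketch below enumerates such a grid. The option names are hypothetical placeholders, and treating stemmers and n-grams as alternatives of a single component is an assumption made for illustration, so the resulting count does not reproduce the 160 systems reported above.

```python
from itertools import product

# Hypothetical component options; not the actual components used in the paper.
STOP_LISTS = ["none", "stoplist_a"]
STEMMERS_OR_NGRAMS = ["none", "stemmer_a", "4-grams"]  # assumed to be one component
MODELS = ["bm25", "tf_idf", "lm_dirichlet"]


def grid_of_points(*component_options):
    """Return every combination of component options; each tuple is one system."""
    return list(product(*component_options))


systems = grid_of_points(STOP_LISTS, STEMMERS_OR_NGRAMS, MODELS)
for system in systems[:3]:
    print(system)
print(len(systems), "systems")  # 2 * 3 * 3 = 18
```

Running every system in the grid on every topic of a collection then yields one run per system, which is what makes component-level comparisons possible.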

"Data Citation is Coming". Introduction to the Special Issue on Data Citation

Gianmaria Silvello and Nicola Ferro (2016)
Journal Paper w/o pr Bulletin of IEEE Technical Committee on Digital Libraries, Volume 12 Issue 1, May 2016.

Abstract

This is the introduction to the special issue on data citation of the Bulletin of IEEE Technical Committee on Digital Libraries. In this introduction we state the “lay of the land” of research on data citation, we discuss some open issues and possible research directions and present the main contributions provided by the papers of the special issue.

From Users to Systems: Identifying and Overcoming Barriers to Efficiently Access Archival Data

Nicola Ferro and Gianmaria Silvello (2016)
Workshop Paper 1st International Workshop on Accessing Cultural Heritage at Scale (ACHS'16), 22nd June 2016, Newark, NJ, USA.

Abstract

Digital archives are one of the pillars of our cultural heritage and they are increasingly opening up to end-users by focusing on the accessibility of their resources. Moreover, digital archives are complex and distributed systems where interoperability plays a central role and efficient access to and exchange of resources is a challenge. In this paper, we investigate user and interoperability requirements in the archival realm and discuss how next-generation archival systems should undergo a paradigm shift, bringing a new model of access to archival resources that better addresses these needs. To this end, we employ the data structures and query primitives based on the NEsted SeTs for Object hieRarchies (NESTOR) model to efficiently access archival data, overcoming the identified barriers and limitations.

A General Linear Mixed Models Approach to Study System Component Effects

Nicola Ferro and Gianmaria Silvello
Conference Paper 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), pages 25-34, ACM Press, New York, NY, USA, 2016.

Abstract

Topic variance has a greater effect on performance than system variance, but it cannot be controlled by system developers, who can only try to cope with it. On the other hand, system variance is important in its own right, since it is what system developers can affect directly by changing system components, and it determines the differences among systems.

In this paper, we tackle the problem of studying system variance in order to better understand how much system components contribute to overall performance. To this end, we propose a methodology based on the General Linear Mixed Model (GLMM) to develop statistical models able to isolate system variance and component effects, as well as their interaction. We apply the proposed methodology to the analysis of TREC Ad-hoc data in order to show how it works and discuss some interesting outcomes of this new kind of analysis. Finally, we extend the analysis to different evaluation measures, showing how they affect the sources of variance.
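To make the idea of separating topic and system variance concrete, the Python sketch below performs a simplified two-way crossed ANOVA on a topic-by-system score matrix (one score per cell, no replication) and reports the share of total variance attributed to each factor (eta squared). This is an illustrative assumption, not the GLMM of the paper: it does not break the system factor down into component effects or interactions, and the effect-size measure used in the paper may differ. The scores are invented toy values.

```python
def two_way_anova(scores):
    """Decompose a topic x system score matrix into sums of squares.

    scores[i][j] is the effectiveness of system j on topic i, with one
    observation per cell (a crossed design without replication).
    """
    n_topics = len(scores)
    n_systems = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_topics * n_systems)
    topic_means = [sum(row) / n_systems for row in scores]
    system_means = [
        sum(scores[i][j] for i in range(n_topics)) / n_topics
        for j in range(n_systems)
    ]
    ss_total = sum(
        (scores[i][j] - grand_mean) ** 2
        for i in range(n_topics)
        for j in range(n_systems)
    )
    ss_topic = n_systems * sum((m - grand_mean) ** 2 for m in topic_means)
    ss_system = n_topics * sum((m - grand_mean) ** 2 for m in system_means)
    # Whatever the two main effects do not explain ends up in the residual.
    ss_residual = ss_total - ss_topic - ss_system
    return {
        "ss_topic": ss_topic,
        "ss_system": ss_system,
        "ss_residual": ss_residual,
        # Eta squared: share of the total variance attributed to each factor.
        "eta2_topic": ss_topic / ss_total,
        "eta2_system": ss_system / ss_total,
    }


# Toy scores (3 topics x 2 systems), invented for illustration only.
toy = [[0.2, 0.5], [0.5, 0.6], [0.8, 1.0]]
for name, value in two_way_anova(toy).items():
    print(f"{name}: {value:.3f}")
```

In this toy matrix the topic factor accounts for most of the variance, which mirrors, on a tiny scale, the observation above that topic variance tends to dominate system variance.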

A Visual Analytics Approach for What-If Analysis of Information Retrieval Systems

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Conference Paper 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), pages 1081-1084, ACM Press, New York, NY, USA, 2016

Abstract

We present the innovative visual analytics approach of the VATE2 system, which makes the experimental evaluation process easier and more effective by introducing what-if analysis. What-if analysis aims to estimate the possible effects of a modification to an IR system in order to select the most promising fixes before implementing them, thus saving a considerable amount of effort. VATE2 builds on an analytical framework that models the behavior of the systems in order to make estimations, and integrates this framework into a visual component which, via proper interaction and animations, receives input from and provides feedback to the user.

Descendants, Ancestors, Children and Parent: A Set-Based Approach to Efficiently Address XPath Primitives

Nicola Ferro and Gianmaria Silvello
Journal Paper Information Processing & Management (IP&M), 52(3):399-429, 2016.

Abstract

XML is a pervasive technology for representing and accessing semi-structured data. XPath is the standard language for navigational queries on XML documents and there is a growing demand for its efficient processing.

In order to increase the efficiency of four navigational XML query primitives, namely descendants, ancestors, children and parent, we introduce a new paradigm: traditional approaches, which efficiently traverse nodes and edges to reconstruct the requested subtrees, are replaced by one based on basic set operations, which directly return the desired subtree without rebuilding it by traversing nodes and edges.

Our solution stems from the NEsted SeTs for Object hieRarchies (NESTOR) formal model, which makes use of set-inclusion relations for representing and providing access to hierarchical data. We define in-memory efficient data structures to implement NESTOR, we develop algorithms to perform the descendants, ancestors, children and parent query primitives and we study their computational complexity.

We conduct an extensive experimental evaluation by using several datasets: digital archives (EAD collections), INEX 2009 Wikipedia collection, and two widely-used synthetic datasets (XMark and XGen). We show that NESTOR-based data structures and query primitives consistently outperform state-of-the-art solutions for XPath processing at execution time and they are competitive in terms of both memory occupation and pre-processing time.
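To convey the set-inclusion idea behind NESTOR, the toy Python sketch below represents each node of a hierarchy as the set of nodes in its subtree and answers the four primitives with plain set operations. It is only an illustration under that simplifying assumption: the paper's in-memory data structures, algorithms and complexity results are not reproduced, and this naive version scans all nodes to find ancestors. The node names are hypothetical, loosely modelled on an archival hierarchy.

```python
def build_nested_sets(parent_of):
    """Map every node to the frozenset of node ids in its subtree (itself included).

    parent_of maps each node id to its parent id; the root maps to None.
    """
    children_of = {node: [] for node in parent_of}
    for node, parent in parent_of.items():
        if parent is not None:
            children_of[parent].append(node)

    nested = {}

    def collect(node):
        members = {node}
        for child in children_of[node]:
            members |= collect(child)
        nested[node] = frozenset(members)
        return members

    for node, parent in parent_of.items():
        if parent is None:
            collect(node)
    return nested


def descendants(nested, node):
    # Everything inside the node's set except the node itself.
    return set(nested[node]) - {node}


def ancestors(nested, node):
    # Every other node whose set includes this node.
    return {other for other, members in nested.items() if other != node and node in members}


def children(nested, node):
    # Descendants that are not contained in the set of another descendant.
    desc = descendants(nested, node)
    return {d for d in desc if not any(d in nested[o] for o in desc if o != d)}


def parent(nested, node):
    # Ancestors form an inclusion chain, so the parent has the smallest set.
    anc = ancestors(nested, node)
    return min(anc, key=lambda a: len(nested[a])) if anc else None


tree = {"fonds": None, "series1": "fonds", "series2": "fonds",
        "file1": "series1", "file2": "series1", "item1": "file1"}
nested = build_nested_sets(tree)
print(sorted(descendants(nested, "series1")))  # ['file1', 'file2', 'item1']
print(sorted(ancestors(nested, "item1")))      # ['file1', 'fonds', 'series1']
print(sorted(children(nested, "fonds")))       # ['series1', 'series2']
print(parent(nested, "item1"))                 # file1
```

Note that none of the four queries walks edges: each answer is derived from membership and inclusion tests between the precomputed sets.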

38th European Conference on IR Research, ECIR 2016

Nicola Ferro, Fabio Crestani, Marie-Francine Moens, Josiane Mothe, Fabrizio Silvestri, Giorgio Maria Di Nunzio, Claudia Hauff, and Gianmaria Silvello
Editorship Proceedings of the Advances in Information Retrieval, Lecture Notes in Computer Science 9626, Springer 2016.

Keyword-based Search over Databases: A Roadmap for a Reference Architecture Paired with an Evaluation Framework

Sonia Bergamaschi, Nicola Ferro, Francesco Guerra and Gianmaria Silvello
Journal Paper Transactions on Computational Collective Intelligence (TCCI), LNCS 9630, vol. 21, pp. 1-20, 2016

Abstract

Structured data sources promise to be the next driver of a significant socio-economic impact for both people and companies. Nevertheless, accessing them through formal languages, such as SQL or SPARQL, can become cumbersome and frustrating for end-users. To overcome this issue, keyword search in databases is becoming the technology of choice, even if it suffers from efficiency and effectiveness problems that prevent it from being adopted at Web scale.

In this paper, we motivate the need for a reference architecture for keyword search in databases to favor the development of scalable and effective components, also borrowing methods from neighboring fields such as information retrieval and natural language processing. Moreover, we point out the need for a companion evaluation framework, able to assess the efficiency and effectiveness of such new systems in the light of real and compelling use cases.

The Twist Measure for IR Evaluation: Taking User’s Effort into Account

Nicola Ferro, Gianmaria Silvello, Heikki Keskustalo, Ari Pirkola and Kalervo Järvelin
Journal Paper Journal of the Association for Information Science and Technology (JASIST), vol. 67, num. 3, pp. 620-648, March 2016.

Abstract

In this paper we present a novel measure for ranking evaluation, called Twist (τ). It is a measure for informational intents that handles both binary and graded relevance, and it mainly shares the scene with Average Precision (AP), the cumulated-gain family of metrics such as Discounted Cumulated Gain (DCG), and Rank-Biased Precision (RBP).

The above-mentioned metrics adopt different user models but share a common approach: they measure the “utility” of a ranked list for the user, and this “utility” is the user's motivation for continuing to scan the result list when non-relevant documents are retrieved. The different user models account for the way in which this “utility” (or gain) is computed.

τ stems from a different observation: searching is nowadays a commodity, like water and electricity, and it is natural for users to assume that it is available, fits their needs, and works well. In this sense, they may not perceive the “utility” they gain from finding relevant documents, but rather perceive that the system is just doing what it is expected to do. On the other hand, they may feel uneasy when the system returns non-relevant documents in the wrong positions, since they are then forced to do additional work to get the desired information, work they would not have expected to do when using a commodity. Thus, τ tries to capture the avoidable effort caused to the user by the actual ranking of the system with respect to an ideal ranking.

We provide a formal definition of τ as well as a demonstration of its properties. We introduce the notion of effort-gain plots, which allow us to easily spot those systems that look similar from a utility/gain perspective but are actually different in terms of the effort required of their users to attain that utility/gain. Finally, by means of an extensive experimental evaluation with TREC collections, τ is proven not to be highly correlated with existing metrics, to be stable when shallow pools are employed, and to have a good discriminative power.

In short, τ captures different aspects of system performance than traditional metrics do, it does not require extensive and costly assessments, and it is a robust tool for detecting differences between systems.

Digital Library Interoperability at High Level of Abstraction

Maristella Agosti, Nicola Ferro and Gianmaria Silvello
Journal Paper Future Generation Computer Systems, Volume 55, Pages 129–146, February 2016.

Abstract

Digital Libraries (DLs) are the main conduits for accessing our cultural heritage and they have to address the requirements and needs of very diverse memory institutions, namely Libraries, Archives and Museums (LAM). Therefore, interoperability among the Digital Library Systems (DLSs) that manage the digital resources of these institutions is a key concern in the field.

DLSs are rooted in two foundational models of what a digital library is and how it should work, namely the DELOS Reference Model and the Streams, Structures, Spaces, Scenarios, Societies (5S) model. Unfortunately, these two models are not sufficiently exploited to improve interoperability among systems.

To this end, we express these foundational models by means of ontologies which exploit the methods and technologies of the Semantic Web and Linked Data. Moreover, we link the proposed ontologies for the foundational models to those currently used for publishing cultural heritage data in order to maximize interoperability.

We design an ontology which allows us to model and map the high-level concepts of both the 5S model and the DELOS Reference Model. We provide detailed ontologies for all the domains of such models, namely the user, content, functionality, quality, policy and architectural component domains, in order to provide a working tool for making DLSs interoperate at a high level of abstraction. Finally, we provide a concrete use case about digital annotation of illuminated manuscripts to show how to apply the proposed ontologies and illustrate the achieved interoperability between the 5S model and the DELOS Reference Model.

Report on ECIR 2016: 38th European Conference on Information Retrieval

Ferro, N., Crestani, F., Moens, M.-F., Mothe, J., Silvestri, F., Kekäläinen, J., Rosso, P., Clough, P., Pasi, G., Lioma, C., Mizzaro, S., Di Nunzio, G. M., Hauff, C., Alonso, O., Serdyukov, P., and Silvello, G. (2016)
Journal Paper w/o pr SIGIR Forum, Volume 50 Issue 1, 2016. ACM New York, NY, USA.

Fast Access to XML Data: A Set-based Approach

Nicola Ferro and Gianmaria Silvello (2016)
Conference Paper In Paolini, P., Bochicchio, M. A., and Mecca, G., editors, Proc. 24th Italian Symposium on Advanced Database Systems (SEBD 2016)

What-If Analysis: A Visual Analytics Approach to Information Retrieval Evaluation

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello (2016)
Workshop Paper Proceedings of the 7th Italian Information Retrieval Workshop, IIR 2016. S. Orlando, Di Nunzio, G. M. and Nardini, F. M. Eds., 2016, CEUR Workshop Proceedings.

An Ontology to Make the DELOS Reference Model and the 5S Model Interoperable

M. Agosti, N. Ferro and G. Silvello (2016)
Nat. Conference Paper In Marinai, S., Bertini, M., Orio, N., and Ferilli, S., editors, Proc. 12th Italian Research Conference on Digital Libraries (IRCDL 2016), Communications in Computer and Information Science (CCIS), Springer, Heidelberg, Germany.

IR Scientific Data: How to Semantically Represent and Enrich Them

T. Bogers, G. Bordea, P. Buitelaar, N. Ferro and G. Silvello (2016)
Extended Abstract In Corazza, A., Montemagni, S., and Semeraro, G., editors, Proc. 3rd Italian Conference on Computational Linguistics (CLiC-it 2016).

A Methodology for Citing Linked Open Data Subsets

Gianmaria Silvello
Journal Paper D-Lib Magazine 21 (1/2), 2015, available on-line at the URL: http://www.dlib.org/dlib/january15/silvello/01silvello.html

Abstract

In this paper we discuss the problem of data citation with a specific focus on Linked Open Data. We outline the main requirements a data citation methodology must fulfill: (i) uniquely identify the cited objects; (ii) provide descriptive metadata; (iii) enable variable granularity citations; and (iv) produce both human- and machine-readable references. We propose a methodology based on named graphs and RDF quad semantics that allows us to create citation meta-graphs respecting the outlined requirements. We also present a compelling use case based on search engine experimental evaluation data and possible applications of the citation methodology.

Rank-Biased Precision Reloaded: Reproducibility and Generalization

Nicola Ferro and Gianmaria Silvello
Conference Paper In N. Fuhr, A. Rauber, G. Kazai and A. Hanbury, eds. Proc. of the 37th European Conference on Information Retrieval (ECIR 2015), Lecture Notes in Computer Science (LNCS) 9022, pp. 768-780. Springer International Publishing Switzerland.

Abstract

In this work we reproduce the experiments presented in the paper entitled “Rank-Biased Precision for Measurement of Retrieval Effectiveness”. This paper introduced a new effectiveness measure, Rank-Biased Precision (RBP), which has become a reference point in the IR experimental evaluation panorama.

We show that the experiments presented in the original RBP paper are repeatable, and we discuss the strengths and limitations of the approach taken by the authors. We also present a generalization of the results by adopting four experimental collections and different analysis methodologies.
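For reference, the original RBP paper scores a ranking as RBP = (1 - p) Σ r_i p^(i-1), where r_i is the relevance of the document at rank i and p is the persistence, i.e. the probability that the user moves on from one rank to the next. The minimal Python sketch below implements that formula; it is not the reproduction code used in this work, and the default p = 0.8 is an arbitrary choice.

```python
def rank_biased_precision(relevance, p=0.8):
    """RBP = (1 - p) * sum_i r_i * p**(i - 1), with ranks i starting at 1.

    relevance: relevance values r_i in [0, 1], listed in rank order.
    p: persistence, the probability of moving on to the next rank.
    """
    if not 0 < p < 1:
        raise ValueError("persistence p must lie strictly between 0 and 1")
    # enumerate() starts at 0, which equals the exponent i - 1 for 1-based ranks.
    return (1 - p) * sum(r * p ** k for k, r in enumerate(relevance))


print(rank_biased_precision([1, 0, 1], p=0.5))  # 0.5 * (1 + 0.25) = 0.625
```

Because the weights (1 - p) p^(i-1) sum to 1 over an infinite ranking, RBP always stays within [0, 1]; a smaller p models a more impatient user who rarely looks past the top ranks.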

Visual Analytics for Information Retrieval Evaluation (VAIRË 2015)

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Conference Paper In N. Fuhr, A. Rauber, G. Kazai and A. Hanbury, eds. Proc. of the 37th European Conference on Information Retrieval (ECIR 2015), Lecture Notes in Computer Science (LNCS) 9022, pp. 809–812. Springer International Publishing Switzerland.

Abstract

Measurement is key to scientific progress. This is particularly true for research concerning complex systems, whether natural or human-built. The tutorial introduced basic and intermediate concepts of lab-based evaluation of information retrieval systems, its pitfalls and shortcomings, and complemented them with a recent and innovative angle on evaluation: the application of methodologies and tools from the Visual Analytics (VA) domain to better interact with, understand, and explore the experimental results and Information Retrieval (IR) system behaviour.

Unfolding Off-the-shelf IR Systems for Reproducibility

Emanuele Di Buccio, Giorgio Maria Di Nunzio, Nicola Ferro, Donna Harman, Maria Maistro and Gianmaria Silvello
Workshop Paper SIGIR Workshop on Reproducibility, Inexplicability, and Generalizability of Results, RIGOR 2015.

Abstract

In this position paper, we discuss the issue of how to ensure reproducibility of the results when off-the-shelf open source Information Retrieval (IR) systems are used. These systems have provided a great advancement to the field, but they rely on many configuration parameters which are often implicit or hidden in the documentation and/or source code. If not fully understood and made explicit, these parameters may make it difficult to reproduce results or even to understand why a system is not behaving as expected.

The paper provides examples of the effects of hidden parameters in off-the-shelf IR systems, describes the enabling technologies needed to embody the approach, and shows how these issues can be addressed in the broader context of component-based IR evaluation.

We propose a solution for systematically unfolding the configuration details of off-the-shelf IR systems and for understanding whether a particular instance of a system in use is behaving as expected. The proposal requires us to: 1) build a taxonomy of the components used by off-the-shelf systems, 2) uniquely identify them and their combination in a given configuration, 3) run each configuration on standard test collections, 4) compute the expected performance measures for each run, and 5) publish all the gathered information on a Web portal, so that everybody can access and compare how an off-the-shelf system with a given configuration is expected to behave.
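Step 2 of this proposal, uniquely identifying a combination of components in a given configuration, could be approached in many ways. One possible scheme, sketched below in Python purely as an assumption (the paper does not prescribe it), serialises the configuration as canonical JSON and hashes it, so that the same settings always yield the same identifier regardless of the order in which they are written. The component names and parameter values are hypothetical.

```python
import hashlib
import json


def configuration_id(config):
    """Return a stable identifier for a system configuration (hypothetical scheme).

    The configuration is serialised as canonical JSON (keys sorted at every
    nesting level, no optional whitespace) and hashed with SHA-256, so the
    same settings always map to the same id regardless of key order.
    Values must be JSON-serialisable.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical component names and parameters, for illustration only.
run_a = {"stoplist": "default", "stemmer": "porter",
         "model": {"name": "bm25", "k1": 1.2, "b": 0.75}}
run_b = {"model": {"b": 0.75, "k1": 1.2, "name": "bm25"},
         "stemmer": "porter", "stoplist": "default"}
run_c = {**run_a, "stemmer": "none"}

print(configuration_id(run_a) == configuration_id(run_b))  # True: same settings
print(configuration_id(run_a) == configuration_id(run_c))  # False: different stemmer
```

Such an identifier only helps if every parameter that influences the results, including defaults that a system leaves implicit, is written into the configuration before hashing.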

Linked Open Data Framework for Serendipity in History of Art Research

Gianmaria Silvello
Workshop Paper 1st AI*IA Workshop on Intelligent Techniques At LIbraries and Archives, IT@LIA 2015. S. Ferilli and N. Ferro Eds., CEUR-WS.org, Vol. 1509, 2015.

Abstract

In this paper we outline the main lines of research for defining a framework based on Linked Open Data (LOD) for supporting knowledge creation in the Cultural Heritage (CH) field with a particular focus on History of Art research.

We delineate the main challenges we need to deal with and we explore the state of the art in LOD publishing systems, LOD citation and authority management. Furthermore, we introduce the idea of computer-aided serendipity in History of Art research with the purpose of contributing to the advancement of the field and to the definition of new methodologies for entity linking and retrieval.

CLEF 2000-2014: Lessons Learnt from Ad Hoc Retrieval

Nicola Ferro and Gianmaria Silvello
Workshop Paper Proceedings of the 6th Italian Information Retrieval Workshop, IIR 2015. P. Boldi, R. Perego, F. Sebastiani Eds., 2015, CEUR Workshop Proceedings, Volume 1404.

A Graphical View of Distance Between Rankings: The Point and Area Measures

Giorgio Maria Di Nunzio and Gianmaria Silvello
Workshop Paper Proceedings of the 6th Italian Information Retrieval Workshop, IIR 2015. P. Boldi, R. Perego, F. Sebastiani Eds., 2015, CEUR Workshop Proceedings, Volume 1404.

A Perspective Look at Keyword-based Search Over Relation Data and its Evaluation

Sonia Bergamaschi, Nicola Ferro, Francesco Guerra, and Gianmaria Silvello (2015)
Conference Paper In Atzeni, P., Lenzerini, M., Lembo, D., and Torlone, R., editors, Proc. 23rd Italian Symposium on Advanced Database Systems (SEBD 2015)

The PREFORMA Project: Federating Memory Institutions for Better Compliance of Preservation Formats

L. Cappellato, N. Ferro, A. Fresa, M. Geber, B. Justrel, B. Lemmen, C. Prandoni, and G. Silvello (2015)
Conference Paper In Calvanese, D., De Nart, D. and Tasso, C., editors, Proc. 11th Italian Research Conference on Digital Libraries (IRCDL 2015), CCIS 612, Springer, Germany, pp. 86-91

Towards a Semantic Web Enabled Representation of DL Foundational Models: The Quality Domain Example

Nicola Ferro and Gianmaria Silvello (2015)
Conference Paper In Calvanese, D., De Nart, D. and Tasso, C., editors, Proc. 11th Italian Research Conference on Digital Libraries (IRCDL 2015), CCIS 612, Springer, Germany, pp. 24-35

Interaction, Measures and Models

Gianmaria Silvello, Leif Azzopardi, Charles Clarke, Matthias Hagen, and Robert Villa
Journal Paper w/o pr In "Evaluation Methodologies in Information Retrieval", M. Agosti, N. Fuhr, E. Toms and P. Vakkari eds. Dagstuhl Seminar 13441, Dagstuhl Reports 3(10):123–126. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. ISSN 2192-5283. 2014.

A Visual Tool for Information Retrieval Performance Evaluation and Failure Analysis

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Journal Paper Journal of Visual Languages and Computing, 25(4):394–413, Elsevier, August 2014.

Abstract

Objective: Information Retrieval (IR) is strongly rooted in experimentation, where new and better ways to measure and interpret the behavior of a system are key to scientific advancement. This paper presents an innovative visualization environment, the Visual Information Retrieval Tool for Upfront Evaluation (VIRTUE), which makes the experimental evaluation process easier and more effective.

Methods: VIRTUE supports and improves performance analysis and failure analysis. Performance analysis: VIRTUE offers interactive visualizations based on well-known IR metrics, allowing us to explore system performances and to easily grasp the main problems of the system.

Failure analysis: VIRTUE develops visual features and interaction, allowing researchers and developers to easily spot critical regions of a ranking and grasp possible causes of a failure.

Results: VIRTUE was validated through a user study involving IR experts. The study reports on a) the scientific relevance and innovation and b) the comprehensibility and efficacy of the visualizations. Conclusion: VIRTUE eases the interaction with experimental results, supports users in the evaluation process and reduces the user effort.

Practice: VIRTUE will be used by IR analysts to analyze and understand experimental results. Implications: VIRTUE improves the state of the art in evaluation practice and integrates the Visualization and IR research fields in an innovative way.

Comparing Methodologies: Linked Open Data and Digital Libraries

Karen Coyle, Gianmaria Silvello and Anna Maria Tammaro
Conference Paper Proceedings of the Third AIUCD Annual Conference on Humanities and Their Methods in the Digital Ecosystem (AIUCD '14), Selected Papers. Francesca Tomasi, Roberto Rosselli Del Turco, and Anna Maria Tammaro (Eds.). ACM Press, New York, NY, USA. ISBN: 978-1-4503-3295-8.

Abstract

This paper reports the outcomes of the conversation moderated by Anna Maria Tammaro, which took place in Bologna during the third AIUCD (Associazione per l'Informatica Umanistica e la Cultura Digitale) conference, between Karen Coyle and Gianmaria Silvello about the convergences and divergences of the Cultural Heritage (CH) and Computer Science (CS) communities regarding digital libraries and the Linked Open Data (LOD) paradigm. The conversation was held within the community of Digital Humanities (DH) scholars, in order to actively engage them in linked open data and digital library services.

The LOD paradigm is a promising technology not only for opening up digital library resources, but also for augmenting the discoverability, re-use, enrichment and sharing of these resources on the Web. For digital libraries, LOD can represent a significant shift from a "closed paradigm", where the domain expert (e.g. the librarian) controls the resources, to an "open paradigm", where the resources are free to circulate and evolve "without" the explicit control of domain experts.

In this paper we report some existing positive experiences of integrating the LOD paradigm into the digital library context, where LOD has been used as a publishing paradigm. We also discuss some limitations of the current approach by presenting open problems that should be investigated to fully realize the potential of the LOD paradigm.

A Linked Open Data Approach for Geolinguistics Applications

Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria Silvello
Journal Paper International Journal on Metadata, Semantics and Ontologies (IJMSO), Vol. 9, No. 1, 2014.

Abstract

The aim of digital geolinguistic systems is to encourage the integration of different competencies by stimulating the cooperation between linguists, historians, archaeologists, and ethnographers. These systems explore the relationship between language and cultural adaptation and change and they can be used as instructional tools, presenting complex data and relationships in a way accessible to all educational levels.

However, the heterogeneity of geolinguistic projects has been recognized as a key problem limiting the reusability of linguistic tools and data collections. In this paper, we propose an approach based on Linked Open Data (LOD) which moves the focus from the systems handling the data to the data themselves, with the main goal of increasing the level of interoperability of geolinguistic applications and the reuse of the data. We define an extensible ontology for geolinguistic resources based on the common ground defined by current European linguistic projects. We provide a Geolinguistic Linked Open Dataset based on the case study of a linguistic project named Atlante Sintattico d’Italia, Syntactic Atlas of Italy (ASIt). Finally, we show a geolinguistic application which exploits this dataset for dynamically generating linguistic maps.

NESTOR: A Formal Model for Digital Archives

Nicola Ferro and Gianmaria Silvello
Journal Paper Information Processing & Management (IP&M), 49(6):1206-1240, 2013.

Abstract

Archives are an extremely valuable part of our cultural heritage since they represent the trace of the activities of a physical or juridical person in the course of their business. Despite their importance, the models and technologies that have been developed over the past two decades in the Digital Library (DL) field have not been specifically tailored to archives. This is especially true of formal and foundational frameworks, such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model.

Therefore, we propose an innovative formal model, called NEsted SeTs for Object hieRarchies (NESTOR), for archives, explicitly built around the concepts of context and hierarchy which play a central role in the archival realm. NESTOR is composed of two set-based data models: the Nested Sets Model (NS-M) and the Inverse Nested Sets Model (INS-M) that express the hierarchical relationships between objects through the inclusion property between sets. We formally study the properties of these models and prove their equivalence with the notion of hierarchy entailed by archives.

We then use NESTOR to extend the 5S model in order to take into account the specific features of archives and to tailor the notion of digital library accordingly. This offers the possibility of opening up the full wealth of DL methods and technologies to archives. We demonstrate the impact of NESTOR on this problem through three example use cases.

A Curated and Evolving Linguistic Linked Dataset

Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria Silvello
Journal Paper Semantic Web Journal, 4(3): 265-270, 2013.

Abstract

This paper describes the Atlante Sintattico d’Italia, Syntactic Atlas of Italy (ASIt) linguistic linked dataset. ASIt is a scientific project aiming to account for minimally different variants within a sample of closely related languages; it is part of the Edisyn network, the goal of which is to establish a European network of researchers in the area of language syntax that use similar standards with respect to methodology of data collection, data storage and annotation, data retrieval and cartography. In this context, ASIt is defined as a curated database which builds on dialectal data gathered during a twenty-year-long survey investigating the distribution of several grammatical phenomena across the dialects of Italy.

Both the ASIt linguistic linked dataset and the Resource Description Framework Schema (RDF/S) on which it is based are publicly available and released with a Creative Commons license (CC BY-NC-SA 3.0). We report the characteristics of the data exposed by ASIt, the statistics about the evolution of the data in the last two years, and the possible usages of the dataset, such as the generation of linguistic maps.

Targeted Query Expansions as a Method for Searching Mixed Quality Digitized Cultural Heritage Documents

Keskustalo, H., Kettunen, K., Kumpulainen, S., Ferro, N., Silvello, G., Järvelin, A., Kekäläinen, J., Arvola, P., Sormunen, E., Järvelin, K., and Saastamoinen, M.
Conference Paper iConference 2015 Proceedings.

Abstract

Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE the user-given search term is replaced by a set of expansion keys (search words); in targeted QE the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach with Finnish, a highly inflectional and compounding language, although such variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.

CLEF 15th Birthday: What Can We Learn From Ad Hoc Retrieval?

Nicola Ferro and Gianmaria Silvello
Conference Paper Information Access Evaluation. Multilinguality, Multimodality, and Interaction - Fifth International Conference of the Cross-Language Evaluation Forum, CLEF 2014: Sheffield, UK, September 15-18, 2014, pp. 32-44. In Lecture Notes in Computer Science 8685, Springer International Publishing Switzerland.

Abstract

This paper reports the outcomes of a longitudinal study on the CLEF Ad Hoc track in order to assess its impact on the effectiveness of monolingual, bilingual and multilingual information access and retrieval systems. Monolingual retrieval shows a positive trend, even if the performance increase is not always steady from year to year; bilingual retrieval has demonstrated higher improvements in recent years, probably due to the better linguistic resources now available; and multilingual retrieval exhibits constant improvement and performances comparable to bilingual (and, sometimes, even monolingual) ones.

A Vector Space Model for Syntactic Distances Between Dialects

Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria Silvello
Conference Paper In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC '14). European Language Resources Association (ELRA), 2486-2489. ISBN 978-2-9517408-8-4

Abstract

Syntactic comparison across languages is essential in linguistics research, e.g. when investigating the relationship among closely related languages. In IR and NLP, syntactic information is used to understand the meaning of word occurrences according to the context in which they appear. In this paper, we discuss a mathematical framework to compute the distance between languages based on the data available in current state-of-the-art linguistic databases. This framework is inspired by approaches presented in IR and NLP.
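As an illustration only, the classic IR vector space model can be applied to dialects by treating each dialect as a sparse vector of syntactic features and measuring distance as one minus the cosine similarity. The feature names, the binary weights, and the choice of cosine distance are assumptions made for this sketch; the paper's actual framework may weight and compare features differently.

```python
import math
from typing import Mapping


def cosine_distance(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Return 1 - cosine similarity between two sparse feature vectors.

    Features missing from a mapping are treated as having weight 0.
    """
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare an empty feature vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical dialects: 1.0 marks that a syntactic phenomenon is attested.
dialect_a = {"subject_clitic": 1.0, "preverbal_negation": 1.0}
dialect_b = {"subject_clitic": 1.0, "postverbal_negation": 1.0}

# One shared feature out of two each: distance is approximately 0.5.
print(cosine_distance(dialect_a, dialect_b))
```

Cosine distance ignores vector length, so a dialect documented with more survey answers is not automatically "farther" from the others; whether that property suits dialectal data is a modelling choice, not something this sketch settles.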

A Visual Interactive Environment for Making Sense of Experimental Data

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Conference Paper In Advances in Information Retrieval - 36th European Conference on IR Research, ECIR 2014: Amsterdam, The Netherlands, April 13-16, 2014, pp. 767-770. In Lecture Notes in Computer Science 8416, Springer, ISBN 978-3-319-06027-9

Abstract

We present the Visual Information Retrieval Tool for Upfront Evaluation (VIRTUE) which is an interactive and visual system supporting two relevant phases of the experimental evaluation process: performance analysis and failure analysis.

Making it Easier to Discover, Re-Use and Understand Search Engine Experimental Evaluation Data

Nicola Ferro and Gianmaria Silvello
Journal Paper w/o pr ERCIM News, Volume 96, January 2014.

Interacting with Digital Cultural Heritage Collections via Annotations: The CULTURA Approach

Agosti, M., Conlan, O., Ferro, N., Hampson, C., Munnelly, G., Ponchia, C., and Silvello, G. (2014)
Conference Paper In Greco, S. and Picariello, A., editors, Proc. 22nd Italian Symposium on Advanced Database Systems (SEBD 2014)

PROMISE Winter School 2013: Bridging Between Information Retrieval and Databases

Maristella Agosti, Nicola Ferro and Gianmaria Silvello
Journal Paper SIGIR Forum, Volume 47 Issue 1, June 2013. Pages 46-52. ACM New York, NY, USA.

PROMISE Retreat Report: Prospects and Opportunities for Information Access Evaluation

Nicola Ferro, Richard Berendsen, Allan Hanbury, Mihai Lupu, Vivien Petras, Maarten de Rijke, and Gianmaria Silvello
Journal Paper SIGIR Forum, Volume 46 Issue 2, December 2012. Pages 60-84. ACM New York, NY, USA.

Abstract

The PROMISE network of excellence organized a two-day brainstorming workshop on 30th and 31st May 2012 in Padua, Italy, to discuss and envisage future directions and perspectives for the evaluation of information access and retrieval systems in multiple languages and multiple media. 25 researchers from 10 different European countries attended the event, covering many different research areas: information retrieval, information extraction, natural language processing, human-computer interaction, semantic technologies, information visualization and visual analytics, system architectures, and so on. The event was organized as a “retreat”, allowing researchers to work back to back and propose hot topics on which to focus research in the field in the coming years. This document reports on the outcomes of this event and provides details about the six envisaged research lines: search applications; contextual evaluation; challenges in test collection design and exploitation; component-based evaluation; ongoing evaluation; and signal-aware evaluation. The ultimate goal of the PROMISE retreat is to stimulate and involve the research community along these research lines and to provide funding agencies with effective and scientifically sound ideas for coordinating and supporting information access research.

Improving Ranking Evaluation Employing Visual Analytics

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Conference Paper In Information Access Evaluation. Multilinguality, Multimodality, and Visualization - Fourth International Conference of the Cross-Language Evaluation Forum, CLEF 2013: Valencia, Spain, September 23-26, 2013, pp. 29-40. In Lecture Notes in Computer Science 8138, Springer, ISBN 978-3-642-40801-4

Abstract

In order to satisfy diverse user needs and support challenging tasks, it is fundamental to provide automated tools to examine system behavior, both visually and analytically. This paper provides an analytical model for examining rankings produced by IR systems, based on the discounted cumulative gain family of metrics, and visualization for performing failure and “what-if” analyses.
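For readers less familiar with the discounted cumulated gain (DCG) family mentioned here, the following is a hedged sketch of the cumulated DCG and normalized DCG (nDCG) vectors in the original Järvelin and Kekäläinen formulation: gains at ranks below the log base b are not discounted, and from rank b onwards each gain is divided by log_b(i). It is not the paper's analytical model. Building the ideal ranking by sorting the run's own gains is a simplifying assumption; normally it is derived from all relevant documents in the recall base.

```python
import math
from typing import List, Sequence


def dcg(gains: Sequence[float], base: float = 2.0) -> List[float]:
    """Cumulated DCG vector: no discount for ranks i < base, gain / log_base(i) otherwise."""
    vector: List[float] = []
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        total += gain if i < base else gain / math.log(i, base)
        vector.append(total)
    return vector


def ndcg(gains: Sequence[float], base: float = 2.0) -> List[float]:
    """Normalized DCG vector, dividing by the DCG of an ideal ordering.

    Assumption: the ideal ordering is the run's own gains sorted in decreasing order.
    """
    ideal = dcg(sorted(gains, reverse=True), base)
    actual = dcg(gains, base)
    return [a / b if b > 0 else 0.0 for a, b in zip(actual, ideal)]


# Graded relevance (0-3) of the documents in a hypothetical ranked list.
run = [3, 2, 3, 0, 1]
print(dcg(run))
print(ndcg(run))
```

Keeping the whole vector, rather than a single value at a cutoff, is what makes rank-by-rank inspection of where a ranking loses gain possible.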

A Geolinguistic Web Application Based on Linked Open Data

Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria Silvello
Conference Paper In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13). ACM, New York, NY, USA, 1101-1102.

Abstract

Digital Geolinguistic systems encourage collaboration between linguists, historians, archaeologists, ethnographers, as they explore the relationship between language and cultural adaptation and change. In this demo, we propose a Linked Open Data approach for increasing the level of interoperability of geolinguistic applications and the reuse of the data. We present a case study of a geolinguistic project named Atlante Sintattico d’Italia, Syntactic Atlas of Italy (ASIt).

Formal Models for Digital Archives: NESTOR and the 5S

Nicola Ferro and Gianmaria Silvello
Conference Paper Research and Advanced Technology for Digital Libraries - International Conference on Theory and Practice of Digital Libraries (TPDL 2013): T. Aalberg, C. Papatheodorou, M. Dobreva, G. Tsakonas, C. J. Farrugia Eds., Lecture Notes in Computer Science 8092, pp. 192-203. Springer Berlin Heidelberg, Germany.

Abstract

Archives are a valuable part of our cultural heritage but, despite their importance, the models and technologies that have been developed over the past two decades in the Digital Library (DL) field have not been specifically tailored to them. This is especially true of formal and foundational frameworks, such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model.

Therefore, we propose an innovative formal model, called NEsted SeTs for Object hieRarchies (NESTOR), for archives, explicitly built around the concepts of context and hierarchy which play a central role in the archival realm. We then use NESTOR to extend the 5S model, offering the possibility of opening up the full wealth of DL methods to archives. We demonstrate this by presenting two concrete applications.

An Open Source System Architecture for Digital Geolinguistic Linked Open Data

Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria Silvello
Conference Paper Research and Advanced Technology for Digital Libraries - International Conference on Theory and Practice of Digital Libraries (TPDL 2013): T. Aalberg, C. Papatheodorou, M. Dobreva, G. Tsakonas, C. J. Farrugia Eds., Lecture Notes in Computer Science 8092, pp. 438-441. Springer Berlin Heidelberg, Germany.

Abstract

Digital Geolinguistic systems encourage collaboration between linguists, historians, archaeologists, ethnographers, as they explore the relationship between language and cultural adaptation and change. These systems can be used as instructional tools, presenting complex data and relationships in a way accessible to all educational levels. In this poster, we present a system architecture based on a Linked Open Data (LOD) approach, which aims to increase the level of interoperability of geolinguistic applications and the reuse of the data.

Information retrieval failure analysis: Visual analytics as a support for interactive 'what-if' investigation

Marco Angelini, Nicola Ferro, Guido Granato, Giuseppe Santucci and Gianmaria Silvello
Conference Paper 2012 IEEE Conference on Visual Analytics Science and Technology, VAST 2012, Seattle, WA, USA, October 14-19, 2012, pp. 204-206. IEEE Computer Society, USA. ISBN 978-1-4673-4752-5.

Abstract

This poster provides an analytical model for examining the performances of IR systems, based on the discounted cumulative gain family of metrics, and visualizations for interacting with and exploring the performances of the system under examination. Moreover, we propose a machine learning approach to learn the ranking model of the examined system, in order to conduct a “what-if” analysis and visually explore what can happen if you adopt a given solution before having to actually implement it.

Cumulated Relative Position: A Metric for Ranking Evaluation

Marco Angelini, Nicola Ferro, Kalervo Jarvelin, Heikki Keskustalo, Ari Pirkola, Giuseppe Santucci and Gianmaria Silvello
Conference Paper Multilingual and Multimodal Information Access Evaluation - Third International Conference of the Cross-Language Evaluation Forum, CLEF 2012: Rome, Italy, September 17-20, 2012. Lecture Notes in Computer Science 7488, Springer, ISBN 978-3-642-33246-3, pp. 112-123.

Abstract

The development of multilingual and multimedia information access systems calls for proper evaluation methodologies to ensure that they meet the expected user requirements and provide the desired effectiveness. IR research offers a strong evaluation methodology and a range of evaluation metrics, such as MAP and (n)DCG. In this paper, we propose a new metric for ranking evaluation, the Cumulated Relative Position (CRP). We start from the observation that a document of a given degree of relevance may be ranked too early or too late with respect to the ideal ranking of documents for a query. Its relative position may be negative, indicating too early ranking; zero, indicating correct ranking; or positive, indicating too late ranking. By cumulating these relative positions, we indicate, at each rank, the net effect of document displacements, i.e. the CRP. We first define the metric formally and then discuss its properties, its relationship to prior metrics, and its visualization. Finally, we propose different visualizations of CRP and use a test collection to demonstrate its behavior.
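The CRP idea can be sketched in a few lines, under stated assumptions. Here a document is taken to be "correctly" placed when its rank falls inside the block of ranks that its relevance grade occupies in the ideal ranking; otherwise its relative position is its distance to the nearest edge of that block (negative if too early, positive if too late). This interval reading, and building the ideal ranking by sorting the run's own grades, are assumptions of the sketch rather than a transcription of the paper's formal definition.

```python
from itertools import accumulate
from typing import Dict, List, Optional, Sequence, Tuple


def ideal_intervals(ideal: Sequence[int]) -> Dict[int, Tuple[int, int]]:
    """Map each grade to the 1-based (first, last) ranks it occupies in the ideal ranking.

    The ideal ranking must be sorted in non-increasing order of grade, so that
    each grade occupies one contiguous block of ranks.
    """
    intervals: Dict[int, Tuple[int, int]] = {}
    for rank, grade in enumerate(ideal, start=1):
        first, _ = intervals.get(grade, (rank, rank))
        intervals[grade] = (first, rank)
    return intervals


def relative_positions(run: Sequence[int], ideal: Sequence[int]) -> List[int]:
    """Negative = ranked too early, 0 = inside its ideal block, positive = too late.

    Raises KeyError if the run contains a grade absent from the ideal ranking.
    """
    intervals = ideal_intervals(ideal)
    positions: List[int] = []
    for rank, grade in enumerate(run, start=1):
        first, last = intervals[grade]
        if rank < first:
            positions.append(rank - first)
        elif rank > last:
            positions.append(rank - last)
        else:
            positions.append(0)
    return positions


def crp(run: Sequence[int], ideal: Optional[Sequence[int]] = None) -> List[int]:
    """Cumulated relative position at every rank of the run."""
    if ideal is None:
        ideal = sorted(run, reverse=True)  # simplifying assumption, see lead-in
    return list(accumulate(relative_positions(run, ideal)))


# Grades of a hypothetical run: a non-relevant document first, a highly relevant one last.
print(crp([0, 2, 1, 2]))  # [-3, -3, -3, -1]
```

In the usage example the non-relevant document at rank 1 belongs at rank 4 (relative position -3), and the highly relevant document at rank 4 belongs in ranks 1-2 (relative position +2), so the cumulated curve drops immediately and only partially recovers at the end.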

DIRECTions: Design and Specification of an IR Evaluation Infrastructure

Maristella Agosti, Emanuele Di Buccio, Nicola Ferro, Ivano Masiero, Simone Peruzzo and Gianmaria Silvello
Conference Paper Multilingual and Multimodal Information Access Evaluation - Third International Conference of the Cross-Language Evaluation Forum, CLEF 2012: Rome, Italy, September 17-20, 2012, pp. 88-99. In Lecture Notes in Computer Science 7488, Springer, ISBN 978-3-642-33246-3.

Abstract

Information Retrieval (IR) experimental evaluation is an essential part of the research on and development of information access methods and tools. Shared data sets and evaluation scenarios allow for comparing methods and systems, understanding their behaviour, and tracking performances and progress over time. On the other hand, experimental evaluation is an expensive activity in terms of the human effort, time, and costs required to carry it out.

Software and hardware infrastructures that support the operation of experimental evaluation, as well as the management, enrichment, and exploitation of the produced scientific data, make a key contribution to reducing such effort and costs and to carrying out systematic and thorough analysis and comparison of systems and methods, overall acting as enablers of scientific and technical advancement in the field. This paper describes the specification for an IR evaluation infrastructure by conceptually modeling the entities involved in IR experimental evaluation and their relationships, and by defining the architecture of the proposed evaluation infrastructure and the APIs for accessing it.

Visual Interactive Failure Analysis: Supporting Users in Information Retrieval Evaluation

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Conference Paper Fourth Information Interaction in Context Symposium (IIiX 2012): Nijmegen, the Netherlands, August 21-24, 2012. In Kamps, J., Kraaij, W., and Fuhr, N., editors, pages 195-203. ACM Press, New York, USA.

Abstract

Measuring is a key to scientific progress. This is particularly true for research concerning complex systems, whether natural or human-built. Multilingual and multimedia information access systems, such as search engines, are increasingly complex: they need to satisfy diverse user needs and support challenging tasks. Their development calls for proper evaluation methodologies to ensure that they meet the expected user requirements and provide the desired effectiveness. In this context, failure analysis is crucial to understand the behaviour of complex systems. Unfortunately, this is an especially challenging activity, requiring vast amounts of human effort to inspect the output of a system query by query in order to understand what went well or badly.

It is therefore fundamental to provide automated tools to examine system behaviour, both visually and analytically. Moreover, once you understand the reason behind a failure, you still need to conduct a "what-if" analysis to understand which of the different possible solutions is most promising and effective before actually starting to modify your system. This paper provides an analytical model for examining the performances of IR systems, based on the discounted cumulative gain family of metrics, and visualizations for interacting with and exploring the performances of the system under examination. Moreover, we propose a machine learning approach to learn the ranking model of the examined system, in order to conduct a "what-if" analysis and visually explore what can happen if you adopt a given solution before having to actually implement it.

A System for Exposing Linguistic Linked Open Data

Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria Silvello
Conference Paper Research and Advanced Technology for Digital Libraries - International Conference on Theory and Practice of Digital Libraries (TPDL 2012): Paphos, Cyprus, September 23-27, 2012. Springer, Lecture Notes in Computer Science 7489, ISBN: 978-3-642-33289-0, pages 173-178.

Abstract

In this paper we introduce the Atlante Sintattico d’Italia, Syntactic Atlas of Italy (ASIt) enterprise which is a linguistic project aiming to account for minimally different variants within a sample of closely related languages. One of the main goals of ASIt is to share and make linguistic data re-usable. In order to create a universally available resource and be compliant with other relevant linguistic projects, we define a Resource Description Framework (RDF) model for the ASIt linguistic data thus providing an instrument to expose these data as Linked Open Data (LOD). By exploiting RDF native capabilities we overcome the ASIt methodological and technical peculiarities and enable different linguistic projects to read, manipulate and re-use linguistic data.

Per il sistema archivistico regionale

Nicola Ferro and Gianmaria Silvello (2012)
Conference Paper w/o pr In Regione del Veneto, editor, Memoria e innovazione. Nuovi strumenti / Nuove esigenze. Atti della Prima Giornata regionale degli Archivi, pages 91-101. Canova Edizioni, Treviso

Handling Hierarchically Structured Resources Addressing Interoperability Issues in Digital Libraries

Maristella Agosti, Nicola Ferro, and Gianmaria Silvello
Book chapter Learning Structure and Schemas from Documents, Biba, M. and Xhafa, F. Eds., Studies in Computational Intelligence, vol. 375, pp. 17-49, Springer Berlin-Heidelberg, 2011.

Abstract

We present and describe the NEsted SeTs for Object hieRarchies (NESTOR) Framework, which allows us to model, manage, access and exchange hierarchically structured resources. We envision this framework in the context of Digital Libraries and use it as a means to address the complex and multiform concept of interoperability when dealing with hierarchical structures. The NESTOR Framework is based on three main components: the Model, the Algebra and a Prototype. We detail all these components and present a concrete use case based on archives, which are collections of historical documents or records providing information about a place, institution, or group of people, because archives are fundamental and challenging entities in the digital library panorama. Within the archives, we show how an archive can be represented through set data models and how these models can be instantiated. We compare two instantiations of the NESTOR Model and show how interoperability issues can be addressed by exploiting the NESTOR Framework.

The NESTOR Framework: How to Handle Hierarchical Data Structures

Nicola Ferro and Gianmaria Silvello
Conference Paper Research and Advanced Technology for Digital Libraries (ECDL 2009), in Lecture Notes in Computer Science (LNCS) 5741 series, pp. 215-226, Springer-Verlag.

Abstract

In this paper we study the problem of representing, managing and exchanging hierarchically structured data in the context of a Digital Library (DL). We present the NEsted SeTs for Object hieRarchies (NESTOR) framework, which defines two set data models, the "Nested Set Model (NS-M)" and the "Inverse Nested Set Model (INS-M)", based on an organization of nested sets that enables the representation of hierarchical data structures. We present the mapping from the tree data structure to NS-M and to INS-M. Furthermore, we show how these set data models can be used in conjunction with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), adding new functionalities to the protocol without any change to its basic functioning. Finally, we show how OAI-PMH and the set data models together can be used to represent and exchange archival metadata in a distributed environment.
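The tree-to-set mappings can be illustrated with a small sketch. Here each node's set is instantiated with node identifiers (an assumption made for illustration): in the NS-M reading, a node becomes the set of nodes in its subtree, so descendants are subsets of their ancestors; in the INS-M reading, a node becomes the set of nodes on its path from the root, so the inclusion runs the other way. The dictionary-based tree representation and the function names `ns_m` and `ins_m` are hypothetical, not part of the NESTOR prototype.

```python
from typing import Dict, FrozenSet, List

# Hypothetical tree representation: parent identifier -> list of child identifiers.
Tree = Dict[str, List[str]]


def ns_m(tree: Tree, root: str) -> Dict[str, FrozenSet[str]]:
    """NS-M sketch: each node maps to the set of nodes in its subtree,
    so a child's set is a proper subset of its parent's set."""
    sets: Dict[str, FrozenSet[str]] = {}

    def visit(node: str) -> FrozenSet[str]:
        members = {node}
        for child in tree.get(node, []):
            members |= visit(child)
        sets[node] = frozenset(members)
        return sets[node]

    visit(root)
    return sets


def ins_m(tree: Tree, root: str) -> Dict[str, FrozenSet[str]]:
    """INS-M sketch: each node maps to the set of nodes on the root-to-node path,
    so a child's set is a proper superset of its parent's set."""
    sets: Dict[str, FrozenSet[str]] = {}

    def visit(node: str, path: FrozenSet[str]) -> None:
        sets[node] = path | {node}
        for child in tree.get(node, []):
            visit(child, sets[node])

    visit(root, frozenset())
    return sets


# A toy archival hierarchy: a fonds with two series, the first holding two files.
archive = {"fonds": ["series1", "series2"], "series1": ["file1", "file2"]}
ns = ns_m(archive, "fonds")
ins = ins_m(archive, "fonds")
print(ns["file1"] < ns["series1"] < ns["fonds"])     # True: nesting follows the hierarchy
print(ins["fonds"] < ins["series1"] < ins["file1"])  # True: inverse nesting
```

In both readings the parent-child relation is recovered purely from set inclusion, which is the property the abstract relies on; the paper's formal definitions should be consulted for how sets are populated in practice.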

Access and Exchange of Hierarchically Structured Resources on the Web with the NESTOR Framework

Maristella Agosti, Nicola Ferro and Gianmaria Silvello
Conference Paper 2009 IEEE / WIC / ACM International Conferences on Web Intelligence, IEEE Computer Society, pp. 659-662, 2009.

Abstract

The paper addresses the problem of representing, managing and exchanging hierarchically structured data in the context of Digital Library (DL) systems in order to enhance access to and exchange of DL resources on the Web. We propose the NEsted SeTs for Object hieRarchies (NESTOR) framework, which relies on two set data models, the "Nested Set Model (NS-M)" and the "Inverse Nested Set Model (INS-M)", to enable the representation of hierarchical data structures by means of a proper organization of nested sets. In particular, we show how NESTOR can be effectively exploited to enhance the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) for better access to and exchange of hierarchical resources on the Web.

A Methodology for Sharing Archival Descriptive Metadata in a Distributed Environment

Nicola Ferro and Gianmaria Silvello
Conference Paper Proceedings of the 12th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2008), in Lecture Notes in Computer Science (LNCS) 5173 series, Springer-Verlag, Heidelberg, Germany, pp. 268-279, 2008.

Abstract

This paper discusses how to exploit widely accepted solutions for interoperation, such as the pair Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and Dublin Core (DC) metadata format, in order to deal with the peculiar features of archival description metadata and allow their sharing. We present a methodology for mapping Encoded Archival Description (EAD) metadata into Dublin Core (DC) metadata records without losing information. The methodology exploits Digital Library System (DLS) technologies, enhancing archival metadata sharing possibilities while at the same time taking archival needs into account; furthermore, it makes it possible to open the valuable information resources held by archives to the wider context of cross-domain interoperation among different cultural heritage institutions.

An Architecture for Sharing Metadata among Geographically Distributed Archives

Maristella Agosti, Nicola Ferro and Gianmaria Silvello
Conference Paper Post-Proceedings of the DELOS Conference, in Lecture Notes in Computer Science (LNCS) 4877 series, Springer-Verlag, Heidelberg, Germany, pp. 56-65, 2007.

Abstract

We present a solution to the problem of sharing metadata between different archives spread across a geographic region; in particular, we consider the archives of the Italian Veneto Region. We first analyze the Veneto Region information system, based on a domain gateway system developed in the “SIRV-INTEROP project”, and we propose a solution to provide advanced services over the regional archives. We deal with these issues in the context of the SIAR (Regional Archival Information System) project. The aim of this work is to integrate different archival institutions in order to provide a single point of public access to archival information. Moreover, we propose a non-intrusive, flexible and scalable solution that preserves the identity and autonomy of each archive.

Keyword Search and Evaluation over Relational Databases: an Outlook to the Future

Sonia Bergamaschi, Francesco Guerra, Nicola Ferro and Gianmaria Silvello
Workshop Paper 7th International Workshop on Ranking in Databases (DBRank 2013), Riva del Garda, Italy, in conjunction with VLDB 2013, 2013.

Abstract

This position paper discusses the need for considering keyword search over relational databases in the light of broader systems, where keyword search is just one of the components and which are aimed at better supporting users in their search tasks. These more complex systems call for appropriate evaluation methodologies which go beyond what is typically done today, i.e. measuring performances of components mostly in isolation or not related to the actual user needs, and, instead, able to consider the system as a whole, its constituent components, and their inter-relations with the ultimate goal of supporting actual user search tasks.

A Visual Analytics Tool for Experimental Evaluation

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello (2013)
Conference Paper In Buccafurri, F. and Saccà, D., editors, Proc. 21st Italian Symposium on Advanced Database Systems (SEBD 2013), pages 139–150

Enabling Cross-Language Access to Archival Metadata

Maristella Agosti, Nicola Ferro and Gianmaria Silvello
Workshop Paper Cultural Heritage 2009: Empowering Users: An Active Role for User Communities (CH 2009), pp. 179-183, 2009.

The Design of a DLS for the Management of Very Large Collections of Archival Objects

Maristella Agosti, Nicola Ferro and Gianmaria Silvello
Workshop Paper First Workshop on Very Large Digital Libraries, in conjunction with the 12th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2008), published by ISTI-CNR Gruppo A.L.I., Pisa, 2008.

Building a Distributed Digital Library System Enhancing the Role of Metadata

Gianmaria Silvello
Workshop Paper BCS-IRSG Symposium: Future Directions in Information Access (BCS-IRSG FDIA 2008), published as part of the eWiC series, pp. 46-53, 2008.

Measuring Syntactic Distances between Dialects: A Web Application for Annotating Dialect Data

Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria Silvello
Conference Paper In M. Agosti, T. Catarci and F. Esposito eds. 10th Italian Research Conference on Digital Libraries, IRCDL 2014, 38:44-47, Elsevier, 2014.

Abstract

Research on dialectal variation allows linguists to understand the fundamental principles underlying language systems and grammatical change over time and space. Since dialectal variants are not randomly distributed across the territory, and geographical patterns of variation can be recognized for individual syntactic forms, we believe that a systematic approach to studying these variations is required. In this paper, we present a Web application for annotating dialectal data, with the particular aim of measuring the degree of syntactic difference between dialects.
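The abstract does not state which measure the application uses. As an illustrative sketch only, assuming each dialect is annotated with a value for a set of syntactic features, one common choice is a normalised Hamming distance: the share of commonly annotated features whose values differ. The annotation format and feature names below are hypothetical.

```python
from typing import Dict

# Hypothetical annotation format: syntactic feature id -> value observed in a dialect.
Annotations = Dict[str, str]


def syntactic_distance(a: Annotations, b: Annotations) -> float:
    """Share of the features annotated for both dialects whose values differ
    (a normalised Hamming distance): 0.0 = identical, 1.0 = all differ."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the two dialects share no annotated features")
    differing = sum(1 for feature in shared if a[feature] != b[feature])
    return differing / len(shared)


dialect_x = {"subject_clitic": "yes", "negation": "preverbal", "wh_doubling": "no"}
dialect_y = {"subject_clitic": "yes", "negation": "postverbal", "wh_doubling": "yes",
             "clitic_climbing": "yes"}
print(syntactic_distance(dialect_x, dialect_y))  # 0.6666666666666666
```

Restricting the comparison to features annotated for both dialects keeps incomplete annotations from inflating the distance; "clitic_climbing" above is simply ignored.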

Measuring and Analyzing the Scholarly Impact of Experimental Evaluation Initiatives

Marco Angelini, Nicola Ferro, Birger Larsen, Henning Müller, Giuseppe Santucci, Gianmaria Silvello and Theodora Tsikrika
Conference Paper In M. Agosti, T. Catarci and F. Esposito eds. 10th Italian Research Conference on Digital Libraries, IRCDL 2014, 38:133-137, Elsevier, 2014.

Abstract

Evaluation initiatives have been widely credited with contributing substantially to the development and advancement of information access systems, by providing a sustainable platform for the very demanding activity of comparable, large-scale experimental evaluation. Measuring the impact of such benchmarking activities is crucial for assessing which of their aspects have been successful, which activities should be continued, strengthened or suspended, and which research paths should be pursued in the future. This work introduces a framework for modeling the data produced by evaluation campaigns, a methodology for measuring their scholarly impact, and tools that exploit visual analytics to analyze the outcomes.

Biblioteche digitali tra modellazione, gestione e valutazione (Digital Libraries between Modeling, Management and Evaluation)

Maristella Agosti, Nicola Ferro and Gianmaria Silvello
Conference Paper Digital Humanities: progetti italiani ed esperienze di convergenza multidisciplinare. F. Ciotti (Ed.). Proceedings of the 2012 annual conference of the Associazione per l’Informatica Umanistica e la Cultura Digitale (AIUCD). DigiLab, 2014, pp. 33-50 (in Italian).

Abstract

Digital libraries and digital library management systems operate in heterogeneous and rapidly evolving contexts. It follows that the systems conceived and used must be designed to be dynamic and able to handle interoperability with other systems, so as to foster the use of digital content by different categories of users. To achieve these goals of dynamism and interoperability, digital library systems must rely on quality models in order to manage content consistently. To this end, we illustrate a quality model that can be adopted to preserve the quality of a digital library over time. Finally, we present the fundamental aspects of experimental evaluation since, by applying the methods of experimental evaluation, a virtuous circle is established that takes into account the various characteristics needed to build systems oriented towards end-user satisfaction.

Cumulated Relative Position: A Metric for Ranking Evaluation

Marco Angelini, Nicola Ferro, Kalervo Järvelin, Heikki Keskustalo, Ari Pirkola, Giuseppe Santucci and Gianmaria Silvello
Workshop Paper Proceedings of the 4th Italian Information Retrieval Workshop, IIR 2013. R. Basili, F. Sebastiani and G. Semeraro Eds., 2014, CEUR Workshop Proceedings, Volume 964, pp. 57-60.

Visual Interactive Failure Analysis: Supporting Users in Information Retrieval Evaluation

Marco Angelini, Nicola Ferro, Giuseppe Santucci and Gianmaria Silvello
Workshop Paper Proceedings of the 4th Italian Information Retrieval Workshop, IIR 2013. R. Basili, F. Sebastiani and G. Semeraro Eds., 2014, CEUR Workshop Proceedings, Volume 964, pp. 61-64.

The Evaluation Approach of IPSA@CULTURA

Maristella Agosti, Marta Manfioletti, Nicola Orio, Chiara Ponchia and Gianmaria Silvello
Conference Paper Post-Proceedings of the 9th Italian Research Conference, IRCDL 2013. Tiziana Catarci, Nicola Ferro and Antonella Poggi Eds., Bridging Between Cultural Heritage Institutions, Revised Selected Papers, Communications in Computer and Information Science, Volume 385, 2014, pp. 147-152.

Abstract

This paper reports on the original approach envisaged for the evaluation of a digital archive accessible through a Web application, in its transition from an isolated archive to an archive fully immersed in a new adaptive environment.

Digital Archives: Extending the 5S Model through NESTOR

Nicola Ferro and Gianmaria Silvello
Conference Paper Post-Proceedings of the 9th Italian Research Conference, IRCDL 2013. Tiziana Catarci, Nicola Ferro and Antonella Poggi Eds., Bridging Between Cultural Heritage Institutions, Revised Selected Papers, Communications in Computer and Information Science, Volume 385, 2014, pp. 130-135.

Abstract

Archives are an extremely valuable part of our cultural heritage. Despite their importance, the models and technologies developed over the past two decades in the Digital Library (DL) field have not been specifically tailored to archives; this is especially true of formal and foundational frameworks such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model. Therefore, we propose an innovative formal model for archives, called NEsted SeTs for Object hieRarchies (NESTOR), and use it to extend the 5S model so that it takes into account the specific features of archives and tailors the notion of digital library accordingly.

A Rule-Based Citation System for Structured and Evolving Datasets

Peter Buneman and Gianmaria Silvello
Journal Paper IEEE Bulletin of the Technical Committee on Data Engineering, Vol. 33, No. 3, IEEE Computer Society, pp. 33-41, September 2010.

Abstract

We consider the requirements that a citation system must fulfill in order to cite structured and evolving data sets. Such a system must take into account variable granularity, context and the temporal dimension. We look at two examples and discuss the possible forms of citation to these data sets. We also describe a rule-based system that generates citations which fulfill these requirements.
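The abstract does not spell out the rule language. As an illustrative sketch only, assuming that rules are attached to path prefixes of a hierarchical dataset and that more specific rules override broader ones (our assumption), a citation reflecting granularity (the cited path), context (inherited fields) and time (the release) could be assembled as follows. `CitationRule`, `cite` and all field values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Path = Tuple[str, ...]  # location of a node in a hierarchical dataset


@dataclass
class CitationRule:
    prefix: Path            # the rule covers this node and everything below it
    fields: Dict[str, str]  # citation fields contributed at this level


def cite(rules: List[CitationRule], path: Path, version: str) -> Dict[str, str]:
    """Merge the fields of every rule whose prefix contains `path`, from the
    broadest to the most specific, so deeper rules override inherited context."""
    applicable = [r for r in rules if path[: len(r.prefix)] == r.prefix]
    citation: Dict[str, str] = {}
    for rule in sorted(applicable, key=lambda r: len(r.prefix)):
        citation.update(rule.fields)
    citation["path"] = "/".join(path)  # granularity: what exactly is cited
    citation["version"] = version      # temporal dimension: which release
    return citation


rules = [
    CitationRule((), {"title": "Example Database", "authors": "Curators A and B"}),
    CitationRule(("families", "F1"), {"authors": "Contributor C"}),
]
print(cite(rules, ("families", "F1", "targets", "T7"), version="2010-09")["authors"])  # Contributor C
print(cite(rules, ("families", "F2"), version="2010-09")["authors"])                   # Curators A and B
```

Ordering the applicable rules by prefix length is what lets a contributor credited for one part of the dataset replace the database-wide authors only within that part.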

A Set-Based Approach to Deal with Hierarchical Structures

Gianmaria Silvello
PhD Thesis Ph.D. School in Information Engineering, University of Padua, 2011.

Abstract

Hierarchical structures are pervasive in computer science because they are a fundamental means for modeling many aspects of reality and for representing and managing a wide corpus of data and digital resources. One of the most important hierarchical structures is the tree, which has been widely studied, analyzed and adopted in several contexts and scientific fields over time. Our work focuses on the role and impact of the tree in computer science and investigates its applications starting from the following pivotal question: "Is the tree always the most advantageous choice for modeling, representing and managing hierarchies?" Our aim is to analyze the nature and use of hierarchical structures and to determine the most suitable way of employing them in different contexts of interest.

We concentrate our work mainly on the scientific field of Digital Libraries. Digital Libraries are complex systems that manage digital resources from our cultural heritage (belonging to different cultural organizations such as libraries, archives and museums) and provide advanced services over these digital resources. In particular, we point out a focal use case within this field based on the modeling, representation, management and exchange of archival resources in a distributed environment. We take into consideration the hierarchical inner structure of archives by considering the solutions proposed in the literature for modeling, representing, managing and sharing archival resources. Archives are usually modeled by means of a tree structure; furthermore, the de facto standard for the digital encoding of cultural resources, which are described and represented by means of metadata, is the eXtensible Markup Language (XML), which supports a tree representation. The problem often affecting this approach is that the model used to represent the hierarchies is bound to the specific technology chosen for its instantiation, e.g. XML. In the archival context, the tree structure is commonly instantiated by means of a single XML file that mixes the hierarchical structure elements with the content elements, without a clear distinction between the two; it is therefore not straightforward to access and exchange a specific subset of data without navigating the whole hierarchy or losing meaningful hierarchical relationships.

To address the problems exemplified in the previous scenario, we propose the NEsted SeTs for Object hieRarchies (NESTOR) Framework, which is composed of two main components: the NESTOR Model and the NESTOR Prototype.

The NESTOR Model is the core of the NESTOR Framework because it defines the set data models on which every component of the framework relies. It defines two set data models, which we have called the "Nested Set Model (NS-M)" and the "Inverse Nested Set Model (INS-M)". We formally define these two set data models and show how hierarchies can be modeled and represented through collections of nested sets. We show how these models add some features with respect to the tree while maintaining its full expressive power. We formally prove several properties of these models and show their correspondences with the tree. Furthermore, we define four distance measures for the NS-M and the INS-M and prove that they define metric spaces.
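The four distance measures defined in the thesis are not reproduced in this abstract. As a hedged illustration only, the snippet below uses the size of the symmetric difference, a textbook metric on finite sets that is not necessarily one of the four, on a simplified NS-M-style view (each node mapped to itself plus its descendants, our assumption), and brute-forces the metric axioms on a small hypothetical tree. All function names are illustrative.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, List


def subtree_sets(tree: Dict[str, List[str]], root: str) -> Dict[str, FrozenSet[str]]:
    """Simplified NS-M-style view: node -> itself plus all its descendants."""
    out: Dict[str, FrozenSet[str]] = {}

    def visit(node: str) -> FrozenSet[str]:
        out[node] = frozenset({node}).union(*(visit(c) for c in tree.get(node, [])))
        return out[node]

    visit(root)
    return out


def sym_diff_distance(a: FrozenSet[str], b: FrozenSet[str]) -> int:
    """Size of the symmetric difference: a standard metric on finite sets."""
    return len(a ^ b)


def satisfies_metric_axioms(sets: Iterable[FrozenSet[str]]) -> bool:
    """Brute-force check of identity, symmetry and the triangle inequality."""
    pts = list(sets)
    d = sym_diff_distance
    for x, y, z in product(pts, repeat=3):
        if (d(x, y) == 0) != (x == y) or d(x, y) != d(y, x):
            return False
        if d(x, z) > d(x, y) + d(y, z):
            return False
    return True


sets = subtree_sets({"root": ["a", "b"], "a": ["a1", "a2"]}, "root")
print(sym_diff_distance(sets["a"], sets["root"]))  # 2
print(satisfies_metric_axioms(sets.values()))      # True
```

The brute-force check only confirms the axioms on this small example; a general proof, as done in the thesis for its own measures, is still needed.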

The NESTOR Model is presented from a formal point of view and then placed in a practical application context defined by the NESTOR Prototype. To describe the prototype, we rely on the archive use case and propose an application for modeling, representing, managing and sharing archival resources. We compare the expressive power of an archive modeled by means of a tree with that of the set data models. We analyze the advantages and disadvantages of our approach when data management and exchange in distributed environments have to be faced. We provide a concrete implementation of the described models in the context of the information system called SIAR (Sistema Informativo Archivistico Regionale), which we designed and developed for the management of the archival resources of the Italian Veneto Region. Furthermore, we show how the NESTOR Framework can be used in conjunction with well-established and widely used Digital Library technologies.

Modeling Archives by Means of OAI-ORE

Nicola Ferro and Gianmaria Silvello
Conference Paper Post-Proceedings of the 8th Italian Research Conference, IRCDL 2012. M. Agosti et al. Eds., Communications in Computer and Information Science 354, Springer-Verlag Berlin Heidelberg, 2012, pp. 216-227.

Empowering Archives through Annotations

Nicola Ferro and Gianmaria Silvello
Conference Paper Post-Proceedings of the 8th Italian Research Conference, IRCDL 2012. M. Agosti et al. Eds., Communications in Computer and Information Science 354, Springer-Verlag Berlin Heidelberg, 2012, pp. 57-68.

Structural and Content Queries on the Nested Sets Model

Gianmaria Silvello
Conference Paper Proceedings of the Twentieth Italian Symposium on Advanced Database Systems, SEBD 2012, Venice, Italy, June 24-27, 2012. Edizioni Libreria Progetto, Padova, Italy, ISBN: 978-88-96477-23-6, pp. 283-288.

SIAR: A User-Centric Digital Archive System

Maristella Agosti, Nicola Ferro, Andreina Rigon, Erilde Terenzoni, Gianmaria Silvello and Cristina Tommasi
Conference Paper 7th Italian Research Conference, IRCDL 2011. Revised Selected Papers, Springer, Communications in Computer and Information Science 249, pp. 87-99, 2011.

PROMISE - Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation

Emanuele Di Buccio, Marco Dussin, Nicola Ferro, Ivano Masiero and Gianmaria Silvello
Conference Paper 7th Italian Research Conference, IRCDL 2011. Revised Selected Papers, Springer, Communications in Computer and Information Science 249, pp. 140-143, 2011.

The NESTOR Model: Properties and Applications in the Context of Digital Archives

Nicola Ferro and Gianmaria Silvello
Conference Paper In Mecca, G. and Greco, S., editors, Proceedings of the 19th Italian Symposium on Advanced Database Systems, SEBD 2011. Maratea, Italy, pp. 274-285, 2011.

Metodologie e percorsi interdisciplinari per la ideazione di un Sistema Informativo Archivistico (Interdisciplinary Methodologies and Approaches for Designing an Archival Information System)

Maristella Agosti, Giorgetta Bonfiglio-Dosio, Nicola Ferro and Gianmaria Silvello (2008)
Journal Paper w/o pr Atti e Memorie dell'Accademia Galileana di Scienze Lettere ed Arti in Padova, già Dei Ricoverati e Patavina, CXX:261-287

The NESTOR Framework: Manage, Access and Exchange Hierarchical Data Structures

Maristella Agosti, Nicola Ferro, and Gianmaria Silvello
Conference Paper Proceedings of the 18th Italian Symposium on Advanced Database Systems (SEBD 2010), Società Editrice Esculapio, Bologna, Italy, pp. 242-253, 2010.

FAST and NESTOR: How to Exploit Annotation Hierarchies

Nicola Ferro and Gianmaria Silvello
Conference Paper 6th Italian Research Conference, IRCDL 2010, Revised Selected Papers, Springer, Communications in Computer and Information Science, vol. 91, pp. 55-66, 2010.

Design and Development of the Data Model of a Distributed DLS Architecture for Archive Metadata

Nicola Ferro and Gianmaria Silvello
Conference Paper 5th Italian Research Conference on Digital Libraries, IRCDL 2009, published by DELOS: an Association for Digital Libraries, pp. 12-21, 2009.

A Distributed Digital Library System Architecture for Archive Metadata

Nicola Ferro and Gianmaria Silvello
Conference Paper 4th Italian Research Conference on Digital Libraries (IRCDL 2008), published by DELOS: an Association for Digital Libraries, pp. 99-104, 2008.

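Proposta metodologica e architetturale per la gestione distribuita e condivisa di collezioni di documenti digitali (A Methodological and Architectural Proposal for the Distributed and Shared Management of Digital Document Collections)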

Maristella Agosti, Nicola Ferro and Gianmaria Silvello (2007)
Journal Paper w/o pr Archivi, 2(2):49-73