Search Engines represent an important application of Information Retrieval. In particular, a major branch of Search Engines is devoted to web search. In this document we summarize our work to produce a submission for the CLEF LongEval initiative, primarily concerning web search. The described activity first focuses on the development of an indexing and searching IR system with the best possible performance based on the provided training data, and then evaluates its performance on test data coming from different scenarios. We first introduce the task and the related problems. Subsequently, we present the retrieval systems that we used for our submission.
Afterwards, we discuss the results obtained with the various systems and compare them on the training data to explain why some systems perform better than others. Finally, the metric analysis is extended to the additional scenarios LongEval focuses on, together with statistical considerations on the systems’ output.
The user equipment (UE) of cellular systems operating in frequency division duplexing (FDD) mode feeds back (reports) the channel state information (CSI) to the next generation Node B (gNB). We exploit the correlation between the uplink and downlink channels, although they are at different frequencies, to obtain a two-way protocol to report the downlink CSI. First, the gNB estimates the uplink channel and sends a feedforward message to the UE to select the codebook used by the UE to quantize the downlink CSI; then, the UE transmits the index of the quantized code vector in feedback to the gNB. This protocol is modeled as the cascade of two rate-distortion source coding schemes, for the feedforward and feedback signals, where the feedback scheme also has mixed side information. We design both encoders and compare their performance with existing schemes, including an approach using neural networks.
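To make the two-way exchange concrete, the following is a minimal Python sketch of the codebook-selection (feedforward) and quantization (feedback) steps, under simplifying assumptions that are not part of the paper: random unit-norm Gaussian codebooks, a toy correlated uplink/downlink channel model, and maximum-correlation selection rules; all names and parameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    N_DIM = 8            # channel vector dimension (illustrative)
    N_CODEBOOKS = 4      # candidate codebooks selectable via feedforward
    CODEBOOK_SIZE = 16   # code vectors per codebook (4 feedback bits)

    # Toy set of unit-norm Gaussian codebooks (a stand-in for the designed encoders).
    codebooks = rng.standard_normal((N_CODEBOOKS, CODEBOOK_SIZE, N_DIM))
    codebooks /= np.linalg.norm(codebooks, axis=-1, keepdims=True)

    def gnb_feedforward(uplink_csi):
        # gNB side: pick the codebook whose code vectors best match the uplink
        # estimate (an assumed selection rule, not the one designed in the paper).
        scores = np.abs(codebooks @ uplink_csi).max(axis=-1)
        return int(np.argmax(scores))          # feedforward message: codebook index

    def ue_feedback(downlink_csi, codebook_idx):
        # UE side: quantize the downlink CSI to the closest code vector of the
        # selected codebook and feed back its index.
        corr = np.abs(codebooks[codebook_idx] @ downlink_csi)
        return int(np.argmax(corr))            # feedback message: code vector index

    # Toy correlated uplink/downlink channels (the correlation the protocol exploits).
    uplink = rng.standard_normal(N_DIM)
    downlink = 0.8 * uplink + 0.2 * rng.standard_normal(N_DIM)

    cb_idx = gnb_feedforward(uplink)               # feedforward: gNB -> UE
    cv_idx = ue_feedback(downlink, cb_idx)         # feedback:    UE  -> gNB
    reconstructed_csi = codebooks[cb_idx, cv_idx]  # gNB's quantized downlink CSI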
Conversational systems are becoming increasingly popular since they allow users to interact in a simple and natural way. Information Retrieval (IR) and Recommender Systems (RS) represent two categories of systems that strongly rely on the interaction with the user. For these reasons, many researchers have recently increased their efforts towards the development of Conversational Information Retrieval (CIR) and Conversational Recommender Systems (CRS). Such systems, in fact, increase the ease of use from the user perspective and also improve the quality of the results. The aim of this tutorial is to present the best and most frequently used approaches and paradigms to build CIR and CRS systems and to explain how these can be evaluated. During the tutorial, participants will be provided with the knowledge needed to understand, create and evaluate CIR and CRS systems.
Information Retrieval (IR) systems and Recommender Systems (RS) are ubiquitous commodities, essential to satisfy users’ information needs in digital environments. These two classes of systems are traditionally treated as two isolated components with limited, if any, interaction. Recent studies showed that jointly operating retrieval and recommendation allows for improved performance on both tasks. In this regard, the state of the art is represented by the Unified Information Access (UIA) framework. In this work, we analyse the UIA framework from the reproducibility, replicability and generalizability perspectives. To do so, we first reproduce the original results achieved by the UIA framework, highlighting a good degree of reproducibility. Then, we examine the behavior of UIA when using a public dataset, discovering that UIA is not always replicable. Moreover, to further investigate the generalizability of the UIA framework, we introduce some changes in its data processing and training procedures. Our empirical assessment highlights that the robustness and effectiveness of the UIA framework depend on several factors. In particular, some tasks, such as Keyword Search, appear to be more robust, while others, such as Complementary Item Retrieval, are more vulnerable to changes in the underlying training process.
Conversational Information Access systems have experienced widespread diffusion thanks to the natural and effortless interactions they enable with the user. In particular, they represent an effective interaction interface for conversational search (CS) and conversational recommendation (CR) scenarios. Despite their commonalities, CR and CS systems are often devised, developed, and evaluated as isolated components. Integrating these two elements would allow for handling complex information access scenarios, such as exploring unfamiliar recommended product aspects, enabling richer dialogues, and improving user satisfaction. As of today, the scarce availability of integrated datasets (existing resources focus exclusively on either of the two tasks) limits the possibilities for evaluating by-design integrated CS and CR systems. To address this gap, we propose CoSRec, the first dataset for joint Conversational Search and Recommendation (CSR) evaluation. The CoSRec test set includes 20 high-quality conversations, with human-made annotations for the quality of conversations, and manually crafted relevance judgments for products and documents. Additionally, we provide supplementary training data comprising partially annotated dialogues and raw conversations to support diverse learning paradigms. CoSRec is the first resource to model CR and CS tasks in a unified framework, enabling the training and evaluation of systems that must shift dynamically between answering queries and making suggestions.
Large Language Models (LLMs) have hugely impacted many research fields, including Information Retrieval (IR), where they are used for many sub-tasks, such as query rewriting and retrieval augmented generation. At the same time, the research community is investigating whether and how to use LLMs to support, or even replace, humans in generating relevance judgments. Indeed, generating relevance judgments automatically – or integrating an LLM in the annotation process – would make it possible to increase the number of evaluation collections, also for scenarios where the annotation process is particularly challenging. To validate the relevance judgments produced by an LLM, they are compared with human-made relevance judgments, measuring the inter-assessor agreement between the human and the LLM. Our work introduces an innovative framework for estimating the quality of LLM-generated relevance judgments, providing statistical guarantees while minimizing human involvement. The proposed framework makes it possible to: i) estimate the quality of LLM-generated relevance judgments with a defined confidence while minimizing human involvement; and ii) estimate the quality of LLM-generated relevance judgments with a fixed budget while providing bounds on the estimate. Our experimental results on three well-known IR collections using multiple LLMs as assessors show that it is sufficient to assess 16% of the LLM-generated relevance judgments to estimate the LLM’s performance with a 95% confidence.
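As a simple illustration of the fixed-budget scenario, the sketch below estimates the agreement between LLM-generated and human judgments from a small random sample, with a normal-approximation confidence interval; llm_judgments (a dict mapping judgment identifiers to LLM labels) and human_judge (a callable returning the human label for a sampled item) are hypothetical placeholders, and this simple-random-sampling baseline is not the estimator actually proposed in the paper.

    import math
    import random

    def sample_size_for_margin(margin, z=1.96):
        # Worst-case (p = 0.5) sample size for a desired CI half-width at
        # roughly 95% confidence (fixed-confidence scenario).
        return math.ceil(z ** 2 * 0.25 / margin ** 2)

    def estimate_agreement(llm_judgments, human_judge, budget, z=1.96, seed=0):
        # Fixed-budget scenario: ask the human to judge only `budget` randomly
        # sampled items and report the agreement estimate with its CI.
        random.seed(seed)
        sampled = random.sample(list(llm_judgments), budget)
        agree = sum(llm_judgments[i] == human_judge(i) for i in sampled)
        p_hat = agree / budget
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / budget)
        return p_hat, (p_hat - half_width, p_hat + half_width)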
Information Retrieval (IR) evaluation deeply relies on human-made relevance judgments. To overcome the high costs of the judgment collection process, a potential solution is to use LLMs as judges in place of human annotators. However, validating the LLM-generated judgments is fundamental for their informed use. Standard validation approaches typically rely on simple sampling techniques to collect a sample of the LLM-generated judgments and estimate the agreement between the LLM and the human annotators. In this work, we propose using stratified sampling, a more sophisticated sampling strategy that, by leveraging appropriate stratification features, reduces human involvement in the validation process while still providing statistical guarantees on the human-LLM agreement estimate. Through the analysis of various candidate features, we identify the LLM-generated judgments themselves as the most promising one. Our approach achieves up to an 85% reduction in the required human involvement in the validation process.
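The following is a minimal sketch of stratified sampling with the LLM-generated labels as the stratification feature and proportional allocation of the human budget; as in the previous sketch, llm_judgments and human_judge are hypothetical placeholders, and the allocation rule and estimator actually used in the paper may differ.

    import random
    from collections import defaultdict

    def stratified_agreement(llm_judgments, human_judge, budget, seed=0):
        # Strata are defined by the LLM-generated label itself; the human
        # annotation budget is split across strata proportionally to their size.
        random.seed(seed)
        strata = defaultdict(list)
        for item_id, label in llm_judgments.items():
            strata[label].append(item_id)

        total = len(llm_judgments)
        estimate = 0.0
        for label, ids in strata.items():
            weight = len(ids) / total                            # stratum weight
            n_h = min(len(ids), max(1, round(budget * weight)))  # proportional allocation
            sampled = random.sample(ids, n_h)
            agree = sum(llm_judgments[i] == human_judge(i) for i in sampled)
            estimate += weight * (agree / n_h)                   # stratified point estimate
        return estimate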
Information Retrieval (IR) and Recommender Systems (RS) represent the core components in the information access scenario. These two categories of systems are traditionally developed in isolation and have very limited interaction. However, since the nineties it has been clear that there are significant connections between IR and RS, and in recent times systems performing retrieval and recommendation jointly have been created. This contributed to showing that developing joint IR and RS systems makes it possible to improve the performance on both tasks. The current state of the art in the joint IR and RS field is represented by the Unified Information Access (UIA) framework. Driven by the importance of reproducibility, in this work we discuss the reproducibility, replicability and generalizability of UIA. First, we analyse the reproducibility degree of UIA. Then, we focus on its replicability by studying its behaviour on a public dataset. Finally, we explore its generalizability by altering the data processing and training algorithms. The obtained results show that the performance of UIA and, in general, of joint IR and RS systems may strongly depend on the dataset used for training and evaluation, and that its stability may vary depending on the task.
Everyone has information needs related to work tasks, entertainment or other areas. The technological components used to answer them are usually Information Retrieval (IR) systems and Recommender Systems (RS). Although these two types of systems are traditionally developed in isolation, since the nineties it has been clear that IR and RS share common aspects. Indeed, they are both concerned with retrieving the most relevant documents or items in a collection according to a user request. Only recently have some efforts been directed towards the development of joint IR and RS systems. Nonetheless, most of the created systems focus on gaining the knowledge to carry out one of the two tasks based on the data of the other. A few relevant works have really addressed the issue of joint IR and RS, but they present several limitations: most existing models are jointly optimized by aggregating data from both tasks without considering that users’ intents in IR and RS may sometimes differ; current models focus on personalization without considering cold-start users; and appropriate public datasets suitable for training and evaluating such models are lacking. This paper outlines the author’s PhD research objectives in designing new models and resources that make it possible to overcome the discussed limitations.
Conversational Information Access systems have undergone widespread adoption due to the natural and seamless interactions they enable with the user. In particular, they provide an effective interaction interface for both Conversational Search (CS) and Conversational Recommendation (CR) scenarios. Despite their inherent similarities, current research frequently addresses CS and CR systems as distinct and isolated entities. The integration of these two capabilities would make it possible to address complex information access scenarios, including the exploration of unfamiliar features of recommended products, which leads to richer dialogues and enhanced user satisfaction. At present, the evaluation of integrated by-design CS and CR systems is severely hindered by the limited availability of comprehensive datasets that jointly address both tasks. To bridge this gap, we introduce CoSRec, the first dataset for joint Conversational Search and Recommendation (CSR) evaluation. The CoSRec test set includes 20 high-quality conversations, with human-made annotations for the quality of conversations, and manually crafted relevance judgments for products and documents. In addition, we provide auxiliary training resources, including partially annotated dialogues and raw conversations, to support diverse learning paradigms. CoSRec is the first resource to model CS and CR tasks within a unified framework, facilitating the design, development, and evaluation of systems capable of dynamically alternating between answering user queries and offering personalized recommendations.