AWARE: Exploiting Evaluation Measures to Combine Multiple Assessors

Marco Ferrante, Nicola Ferro and Maria Maistro
Journal Paper Accepted in June 2017, ACM Transactions on Information Systems

Abstract

We propose the Assessor-driven Weighted Averages for Retrieval Evaluation (AWARE) probabilistic framework, a novel methodology for dealing with multiple crowd assessors, who may be contradictory and/or noisy. By modeling relevance judgements and crowd assessors as sources of uncertainty, AWARE takes the expectation of a generic performance measure, like Average Precision (AP), composed with these random variables. In this way, it approaches the problem of aggregating different crowd assessors from a new perspective, i.e. directly combining the performance measures computed on the ground-truth generated by the crowd assessors instead of adopting some classification technique to merge the labels produced by them. We propose several unsupervised estimators that instantiate the AWARE framework and we compare them with state-of-the-art approaches, i.e. Majority Vote (MV) and Expectation Maximization (EM), on TREC collections. We found that AWARE approaches improve over these state-of-the-art baselines in terms of their capability of correctly ranking systems and predicting their actual performance scores.
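
The core idea is to combine measure scores computed on each assessor's own ground truth, rather than first merging the labels. The sketch below illustrates this with a plain weighted average of AP; all names (ap, aware_score, the uniform weights) are illustrative and are not the authors' estimators.

```python
# Minimal sketch of the AWARE idea: combine a measure (here AP) computed on each
# crowd assessor's judgements, instead of merging the labels first.
# Function names and the uniform weights are illustrative, not the paper's code.

def ap(run, qrels):
    """Average Precision of a ranked list against one assessor's judgements."""
    relevant = sum(1 for v in qrels.values() if v > 0)
    if relevant == 0:
        return 0.0
    hits, score = 0, 0.0
    for k, doc in enumerate(run, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            score += hits / k
    return score / relevant

def aware_score(run, assessor_qrels, weights):
    """Weighted average of AP over the ground truths of several assessors."""
    return sum(w * ap(run, qrels) for qrels, w in zip(assessor_qrels, weights))

# Example: three assessors, uniform (unsupervised) weights.
run = ["d3", "d1", "d7", "d2"]
assessors = [{"d3": 1, "d2": 1}, {"d1": 1}, {"d3": 1, "d7": 1, "d2": 0}]
print(aware_score(run, assessors, [1/3, 1/3, 1/3]))
```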

Understanding User Behavior in Job and Talent Search: An Initial Investigation

Damiano Spina, Maria Maistro, Yongli Ren, Sargol Sadeghi, Wilson Wong, Timothy Baldwin, Lawrence Cavedon, Alistair Moffat, Mark Sanderson, Falk Scholer, and Justin Zobel
Workshop Paper In J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, and X. Yinghui, editors, Proceedings of the 2017 SIGIR workshop On eCommerce, (eCom 2017), co-located with the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017), Tokyo, Japan, August 11, 2017, CEUR Workshop Proceedings.

Abstract

The Web has created a global marketplace for e-Commerce as well as for talent. Online employment marketplaces provide an effective channel to facilitate the matching between job seekers and hirers. This paper presents an initial exploration of user behavior in job and talent search using query and click logs from a popular employment marketplace. The observations suggest that the understanding of users’ search behavior in this scenario is still in its infancy and that some of the assumptions made in general web search may not hold true. The open challenges identified so far are presented.

On Including the User Dynamic in Learning to Rank

Nicola Ferro, Claudio Lucchese, Maria Maistro and Raffaele Perego
Conference Paper In N. Kando, T. Sakai, H. Joho, H. Li, A. P. de Vries, and R. White, editors, Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017), Tokyo, Japan, August 7-11, 2017, pages 1041-1044, ACM Press, New York, USA.

Abstract

Ranking query results effectively by considering users' past behaviour and preferences is a primary concern for IR researchers both in academia and industry. In this context, Learning to Rank (LtR) is widely believed to be the most effective solution for designing ranking models that account for user-interaction features, which have proved to remarkably impact IR effectiveness. In this paper, we explore the possibility of integrating the user dynamic directly into LtR algorithms. Specifically, we model with Markov chains the behaviour of users in scanning a ranked result list, and we modify LambdaMart, a state-of-the-art LtR algorithm, to exploit a new discount loss function calibrated on the proposed Markovian model of user dynamic. We evaluate the performance of the proposed approach on publicly available LtR datasets, finding that the improvements measured over the standard algorithm are statistically significant.
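
To make the idea of a "discount calibrated on a Markovian user model" concrete, the sketch below derives a per-rank discount from a toy Markov chain over ranks and plugs it into a DCG-style gain, in place of the usual logarithmic discount. This is only an illustration of the general mechanism under assumed names and a made-up transition matrix; it is not the authors' modified LambdaMart.

```python
# Illustrative sketch (not the paper's implementation): derive a per-rank
# discount from a Markov model of how users scan a result list, then use it
# in a DCG-style metric instead of the usual 1/log2(rank+1) discount.
import numpy as np

def markov_discount(P, start, n_ranks, n_steps=50):
    """Expected number of visits to each rank over n_steps of the chain P."""
    visits = np.zeros(n_ranks)
    state = np.array(start, dtype=float)
    for _ in range(n_steps):
        visits += state
        state = state @ P
    return visits / visits.max()          # normalise so the largest value is 1

def dcg_with_user_dynamic(gains, discount):
    """DCG-like score where the discount encodes the modelled user dynamic."""
    return float(np.sum(np.asarray(gains) * discount[:len(gains)]))

# Toy 3-rank chain: mostly forward movement, with some re-reads of earlier ranks.
P = np.array([[0.1, 0.8, 0.1],
              [0.3, 0.1, 0.6],
              [0.2, 0.3, 0.5]])
disc = markov_discount(P, start=[1.0, 0.0, 0.0], n_ranks=3)
print(dcg_with_user_dynamic([3, 2, 0], disc))
```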

Extending Learning to Rank with User Dynamic

Nicola Ferro, Claudio Lucchese, Maria Maistro and Raffaele Perego
Workshop Paper In F. Crestani, T. Di Noia, and R. Perego, editors, Proceedings of the 8th Italian Information Retrieval Workshop, (IIR 2017), Lugano, Switzerland, June 5-7, 2017.

Abstract

In this paper we explore the possibility of integrating the user dynamic directly into LambdaMart by modelling the user behaviour with Markov chains and by defining a new discount loss function calibrated on the proposed model. This approach achieves significantly better performance than standard algorithms.

A Game of Lines: Developing Game Mechanics for Text Classification

Giorgio Maria Di Nunzio, Maria Maistro and Daniel Zilio
Workshop Paper In F. Crestani, T. Di Noia, and R. Perego, editors, Proceedings of the 8th Italian Information Retrieval Workshop, (IIR 2017), Lugano, Switzerland, June 5-7, 2017, CEUR Workshop Proceedings.

Abstract

In this paper, we describe a set of experiments that turn the machine learning classification task into a game, through gamification techniques, and let non-expert users perform text classification without even knowing the underlying problem. The application is implemented in R using the Shiny package for interactive graphics. We present the outcome of three different experiments: a pilot experiment with PhD and post-doc students, and two experiments carried out with primary and secondary school students. The results show that the human-aided classifier performs similarly to, and sometimes even better than, state-of-the-art classifiers.

Adapting Information Retrieval to User Signals via Stochastic Models

Maria Maistro
Conference Paper In M. de Rijke, M. Shokouhi, A. Tomkins, and M. Zhang, editors, Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM 2017), Cambridge, United Kingdom, February 6-10, 2017, pages 843-843, ACM Press, New York, USA.

Abstract

To address the challenge of adapting Information Retrieval (IR) to constantly evolving user tasks and needs, and to adjust it to user interactions and preferences, we develop a new model of user behaviour based on Markov chains. We aim at integrating the proposed model into several aspects of IR, i.e. evaluation measures, systems, and collections. First, we study IR evaluation measures and propose a theoretical framework to describe their properties. Then, we present a new family of evaluation measures, called Markov Precision (MP), based on the proposed model and able to explicitly link lab-style and on-line evaluation metrics. Future work will integrate the presented model into Learning to Rank (LtR) algorithms and will define a collection for the evaluation and comparison of Personalized Information Retrieval (PIR) systems.

The University of Padua (IMS) at TREC 2016 Total Recall Track

Giorgio Maria Di Nunzio, Maria Maistro and Daniel Zilio
Conference Paper In E. M. Voorhees and A. Ellis, editors, Proceedings of the 25th Text REtrieval Conference (TREC 2016), Gaithersburg, Maryland, USA, November 15-18, 2016, National Institute of Standards and Technology (NIST).

Abstract

The participation of the Information Management System (IMS) Group of the University of Padua in the Total Recall track at TREC 2016 consisted of a set of fully automated experiments based on the two-dimensional probabilistic model. We trained the model in two ways that tried to mimic a real user, and we compared it to two versions of the BM25 model with different parameter settings. This initial set of experiments lays the groundwork for a wider study that will explore a gamification approach in the context of high-recall situations.
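
The BM25 baselines mentioned above differ only in their parameter settings; the minimal sketch below shows where those parameters (k1 and b) enter the standard BM25 term score. It is an illustration of the textbook formula, not the track runs themselves.

```python
# Minimal BM25 sketch for a single query term, showing where the k1 and b
# parameters enter; illustrative only, not the submitted runs.
import math

def bm25_term(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    # The +1 inside the log is a common smoothing that keeps the idf non-negative.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

# Example: one term occurring twice in a slightly longer-than-average document.
print(bm25_term(tf=2, df=100, doc_len=300, avg_doc_len=250, n_docs=10_000))
```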

Gamification for IR: The Query Aspects Game

Giorgio Maria Di Nunzio, Maria Maistro and Daniel Zilio
Workshop Paper In P. Basile, A. Corazza, F. Cutugno, S. Montemagni, M. Nissim, V. Patti, G. Semeraro, and R. Sprugnoli, editors, Proceedings of the 3rd Italian Conference on Computational Linguistics, (CLiC-it 2016), Napoli, Italy, December 5-6, 2016, CEUR Workshop Proceedings, Volume 1749.

Abstract

The creation of a labelled dataset for Information Retrieval (IR) purposes is a costly process. For this reason, a mix of crowd-sourcing and active learning approaches have been proposed in the literature in order to assess the relevance of documents of a collection given a particular query at an affordable cost. In this paper, we present the design of the gamification of this interactive process, which draws inspiration from recent works in the area of gamification for IR. In particular, we focus on three main points: i) creating a set of relevance judgements with the least effort from human assessors; ii) using interactive search interfaces that employ game mechanics; iii) using Natural Language Processing (NLP) to collect different aspects of a query.

Gamification for Machine Learning: The Classification Game

Giorgio Maria Di Nunzio, Maria Maistro and Daniel Zilio
Workshop Paper In F. Hopfgartner, G. Kazai, U. Kruschwitz, and M. Meder, editors, Proceedings of the the Third International Workshop on Gamification for Information Retrieval, (GamifIR 2016), co-located with the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Italy, July 21, 2016, CEUR Workshop Proceedings, Volume 1642, pages 45-52.

Abstract

The creation of a labelled dataset for machine learning purposes is a costly process. In recent works, it has been shown that a mix of crowd-sourcing and active learning approaches can be used to annotate objects at an affordable cost. In this paper, we study the gamification of machine learning techniques; in particular, the problem of classification of objects. In this first pilot study, we designed a simple game, based on a visual interpretation of probabilistic classifiers, that consists of separating two sets of coloured points on a two-dimensional plane by means of a straight line. We present the current results of this first experiment, which we used to collect the requirements for the next version of the game and to analyze i) the 'price' of building a reasonably accurate classifier with a small amount of labelled objects, and ii) how the accuracy of the players compares to state-of-the-art classification algorithms.
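
The line drawn by a player acts as a linear classifier: each point is assigned to a class by the sign of the decision function ax + by + c. The sketch below shows this mechanism with made-up coordinates and coefficients (as if drawn by a player); it is an illustration of the underlying classifier, not the game's R/Shiny code.

```python
# Minimal sketch of the classifier behind the game: a straight line
# ax + by + c = 0 separates the plane, and each point is labelled by the
# side of the line it falls on. All values below are illustrative.

def classify(points, a, b, c):
    """Return +1/-1 labels depending on which side of the line each point lies."""
    return [1 if a * x + b * y + c > 0 else -1 for x, y in points]

def accuracy(points, labels, a, b, c):
    predicted = classify(points, a, b, c)
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)

points = [(0.2, 0.9), (0.8, 0.1), (0.3, 0.7), (0.9, 0.4)]
labels = [1, -1, 1, -1]
print(accuracy(points, labels, a=-1.0, b=1.0, c=0.0))   # the line y = x
```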

Basis of a Formal Framework for Information Retrieval Evaluation Measurements

Marco Ferrante, Nicola Ferro and Maria Maistro
Workshop Paper In G. M. Di Nunzio, F. M. Nardini, and S. Orlando, editors, Proceedings of the 7th Italian Information Retrieval Workshop, (IIR 2016), Venezia, Italy, May 30-31, 2016, CEUR Workshop Proceedings, Volume 1653.

Abstract

In this paper we present a formal framework, based on the representational theory of measurement, to define and study the properties of utility-oriented measurements of retrieval effectiveness, such as AP, Rank-Biased Precision (RBP), Expected Reciprocal Rank (ERR), and many other popular IR evaluation measures.

Towards a Formal Framework for Utility-oriented Measurements of Retrieval Effectiveness

Marco Ferrante, Nicola Ferro and Maria Maistro
Conference Paper In J. Allan, W. B. Croft, A. P. de Vries, C. Zhai, N. Fuhr, Y. Zhang, editors, Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2015), Northampton, Massachusetts, USA, September 27-30, 2015, pages 21-30, ACM Press, New York, USA.

Abstract

In this paper we present a formal framework to define and study the properties of utility-oriented measurements of retrieval effectiveness, like AP, RBP, ERR and many other popular IR evaluation measures. The proposed framework is laid in the wake of the representational theory of measurement, which provides the foundations of the modern theory of measurement in both physical and social sciences, thus contributing to explicitly link IR evaluation to a broader context.

The proposed framework is minimal, in the sense that it relies on just one axiom, from which other properties are derived. Finally, it contributes to a better understanding and a clear separation of which issues are due to the inherent problems in comparing systems in terms of retrieval effectiveness and which are due to the expected numerical properties of a measurement.
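
For reference, two of the utility-oriented measures mentioned above are commonly written as follows. These are the standard forms from the literature, not the paper's own notation: r_k is the relevance at rank k, p the RBP persistence parameter, and R_k the probability that the document at rank k satisfies the user.

```latex
% Standard textbook forms of RBP and ERR (notation is not the paper's own).
\begin{align}
  \mathrm{RBP} &= (1 - p) \sum_{k=1}^{n} p^{\,k-1}\, r_k \\
  \mathrm{ERR} &= \sum_{k=1}^{n} \frac{1}{k}\, R_k \prod_{j=1}^{k-1} \bigl(1 - R_j\bigr)
\end{align}
```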

Unfolding Off-the-shelf IR Systems for Reproducibility

Emanuele Di Buccio, Giorgio Maria Di Nunzio, Nicola Ferro, Donna Harman, Maria Maistro and Gianmaria Silvello
Workshop Paper In J. Arguello, F. Diaz, J. Lin, A. Trotman, editors, Proceedings of the First Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR), co-located with 38th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (SIGIR 2015), Santiago, Chile, August 13, 2015.

Abstract

In this position paper, we discuss the issue of how to ensure reproducibility of the results when off-the-shelf open source Information Retrieval (IR) systems are used. These systems provided a great advancement to the field, but they rely on many configuration parameters which are often implicit or hidden in the documentation and/or source code. If not fully understood and made explicit, these parameters may make it difficult to reproduce results or even to understand why a system is not behaving as expected.

The paper provides examples of the effects of hidden parameters in off-the-shelf IR systems, describes the enabling technologies needed to embody the approach, and shows how these issues can be addressed in the broader context of component-based IR evaluation.

We propose a solution for systematically unfolding the configuration details of off-the-shelf IR systems and understanding whether a particular instance of a system is behaving as expected. The proposal requires one to: 1) build a taxonomy of components used by off-the-shelf systems, 2) uniquely identify them and their combination in a given configuration, 3) run each configuration on standard test collections, 4) compute the expected performance measures for each run, and 5) publish on a Web portal all the gathered information in order to make accessible and comparable for everybody how an off-the-shelf system with a given configuration is expected to behave.

Markov Precision: Modelling User Behaviour over Rank and Time

Marco Ferrante, Nicola Ferro and Maria Maistro
Workshop Paper In P. Boldi, R. Perego, and F. Sebastiani, editors, Proceedings of the 6th Italian Information Retrieval Workshop, (IIR 2015), CEUR Workshop Proceedings, Volume 1404.

Abstract

We propose a family of new evaluation measures, called Markov Precision (MP), which exploits continuous-time and discrete-time Markov chains, and we conduct a thorough experimental evaluation, also providing an example of calibration of its time parameters.

Improving Information Retrieval Evaluation via Markovian User Models and Visual Analytics

Maria Maistro
Workshop Paper In L. Azzopardi, M. L. Wilson, I. Kompatsiaris, S. Papadopoulos, T. Tsikrika, S. Vrochidis, editors, Proceedings of the 6th Symposium on Future Directions in Information Access (FDIA 2015), pages 16-19, Electronic Workshops in Computing (eWiC).

Abstract

To address the challenge of adapting experimental evaluation to the constantly evolving user tasks and needs, we develop a new family of Markovian Information Retrieval (IR) evaluation measures, called Markov Precision (MP), where the interaction between the user and the ranked result list is modelled via Markov chains, and which will be able to explicitly link lab-style and on-line evaluation methods.

Moreover, since experimental results are often not so easy to understand, we will develop a Web-based Visual Analytics (VA) prototype where an animated state diagram of the Markov chain will explain how the user is interacting with the ranked result list, in order to offer support for careful failure analysis.

Rethinking How to Extend Average Precision to Graded Relevance

Marco Ferrante, Nicola Ferro and Maria Maistro
Conference Paper In E. Kanoulas, M. Lupu, P. D. Clough, M. Sanderson, M. M. Hall, A. Hanbury, E. G. Toms, K. Järvelin, editors, Information Access Evaluation. Multilinguality, Multimodality, and Interaction, Proceedings of the 5th International Conference of the CLEF Initiative, (CLEF 2014), Sheffield, UK, September 15-18, 2014.

Abstract

We present two new measures of retrieval effectiveness, inspired by Graded Average Precision (GAP), which extends Average Precision (AP) to graded relevance judgements. Starting from the random choice of a user, we define Extended Graded Average Precision (xGAP) and Expected Graded Average Precision (eGAP), which are more accurate than GAP when there is a small number of highly relevant documents with a high probability of being considered relevant by users. The proposed measures are then evaluated on TREC 10, TREC 14, and TREC 21 collections, showing that they actually grasp a different angle from GAP and that they are robust when it comes to incomplete judgements and shallow pools.
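
For reference, the binary-relevance AP that GAP, xGAP, and eGAP generalise is commonly written as follows; this is the standard textbook form, not the paper's own notation (r_k is the binary relevance at rank k and RB the total number of relevant documents).

```latex
% Standard binary Average Precision, the measure generalised to graded
% judgements by GAP and by the proposed xGAP/eGAP (notation not the paper's own).
\begin{equation}
  \mathrm{AP} = \frac{1}{RB} \sum_{k=1}^{n} r_k \cdot \frac{1}{k} \sum_{j=1}^{k} r_j
\end{equation}
```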

Injecting User Models and Time into Precision via Markov Chains

Marco Ferrante, Nicola Ferro and Maria Maistro
Conference Paper In S. Geva, A. Trotman, P. Bruza, C. L. A. Clarke, K. Järvelin, editors, Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2014), Gold Coast, QLD, Australia, July 6-11, 2014, pages 597-606, ACM Press, New York, USA.

Abstract

We propose a family of new evaluation measures, called Markov Precision (MP), which exploits continuous-time and discrete-time Markov chains in order to inject user models into precision. Continuous-time MP behaves like time-calibrated measures, bringing the time spent by the user into the evaluation of a system; discrete-time MP behaves like traditional evaluation measures. Being part of the same Markovian framework, the time-based and rank-based versions of MP produce values that are directly comparable.

We show that it is possible to re-create average precision using specific user models and this helps in providing an explanation of Average Precision (AP) in terms of user models more realistic than the ones currently used to justify it. We also propose several alternative models that take into account different possible behaviors in scanning a ranked result list.

Finally, we conduct a thorough experimental evaluation of MP on standard TREC collections in order to show that MP is as reliable as other measures and we provide an example of calibration of its time parameters based on click logs from Yandex.
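
The sketch below gives a toy picture of the discrete-time idea: a Markov chain moves over the ranks of the relevant documents, and precision at each visited rank is weighted by the chain's long-run (stationary) distribution. This is only an illustration of the general mechanism with a made-up transition matrix, not the paper's exact formulation or its calibrated parameters.

```python
# Toy sketch of a discrete-time, Markov-chain-weighted precision
# (illustrative only; not the paper's exact definition of MP).
import numpy as np

def precision_at(run_rels, k):
    """Precision at rank k for a 0/1 relevance vector."""
    return sum(run_rels[:k]) / k

def markov_precision_like(run_rels, P):
    """run_rels: 0/1 relevance per rank; P: transitions between relevant ranks."""
    rel_ranks = [k + 1 for k, r in enumerate(run_rels) if r]
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    return float(sum(p * precision_at(run_rels, k) for p, k in zip(pi, rel_ranks)))

run_rels = [1, 0, 1, 1, 0]                      # relevant documents at ranks 1, 3, 4
P = np.array([[0.2, 0.5, 0.3],                  # toy transitions between those ranks
              [0.4, 0.2, 0.4],
              [0.3, 0.4, 0.3]])
print(markov_precision_like(run_rels, P))
```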