Simple but Effective Knowledge-Based Query Reformulations for Precision Medicine Retrieval

Stefano Marchesin, Giorgio Maria Di Nunzio, and Maristella Agosti
Journal Paper MDPI Information, accepted for publication, 2021, pages 1-28.

Abstract

In Information Retrieval (IR), the semantic gap represents the mismatch between users’ queries and how retrieval models answer these queries. In this paper, we explore how to use external knowledge resources to enhance bag-of-words representations and reduce the effect of the semantic gap between queries and documents. In this regard, we propose several knowledge-based query expansion and reduction techniques and we evaluate them for the medical domain – where the semantic gap is prominent and the presence of manually curated knowledge resources allows for the development of knowledge-enhanced methods to address it. The proposed query reformulations are used to increase the probability of retrieving relevant documents through the addition to or the removal from the original query of highly specific terms. The experimental analyses on different test collections for Precision Medicine – a particular use case of Clinical Decision Support – show the effectiveness of the developed query reformulations. In particular, a specific subset of query reformulations allows IR models to achieve top performing results in all the considered collections.
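
As a rough illustration of what expanding and reducing a query can look like, here is a minimal sketch. It is not the method of the paper: the toy knowledge base, the function names, and the choice of which term to drop are all hypothetical and serve only to show the two operations on a bag-of-words query.

```python
# Hypothetical toy knowledge base mapping a term to its variants.
# A real system would query a curated resource; this dict is an assumption.
TOY_KB = {
    "melanoma": ["malignant melanoma"],
    "braf": ["b-raf", "braf1"],
}


def expand_query(terms, kb):
    """Return the query terms followed by any variants found in the knowledge base."""
    expanded = list(terms)
    for term in terms:
        for variant in kb.get(term.lower(), []):
            if variant not in expanded:
                expanded.append(variant)
    return expanded


def reduce_query(terms, to_drop):
    """Return the query terms without the ones listed in to_drop (case-insensitive)."""
    drop = {t.lower() for t in to_drop}
    return [t for t in terms if t.lower() not in drop]


query = ["melanoma", "BRAF", "V600E"]
expanded = expand_query(query, TOY_KB)
reduced = reduce_query(query, ["V600E"])
print(expanded)
print(reduced)
```

In this sketch the expanded query keeps the original terms first, so a downstream ranker could still weight them differently from the added variants.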

What Makes a Query Semantically Hard?

Guglielmo Faggioli and Stefano Marchesin
Int. Conference Paper Proceedings of the Second International Conference on Design of Experimental Search & Information REtrieval Systems (DESIRES 2021), Padova, Italy, September 15-18, 2021, pages 61-69.

Abstract

Traditional Information Retrieval (IR) models, also known as lexical models, are hindered by the semantic gap, which refers to the mismatch between different representations of the same underlying concept. To address this gap, semantic models have been developed. Semantic and lexical models exploit complementary signals that are best suited for different types of queries. For this reason, these model categories should not be used interchangeably, but should rather be properly alternated depending on the query. Therefore, it is important to identify queries where the semantic gap is prominent and thus semantic models prove effective. In this work, we quantify the impact of using semantic or lexical models on different queries, and we show that the interaction between queries and model categories is large. Then, we propose a labeling strategy to classify queries into semantically hard or easy, and we deploy a prototype classifier to discriminate between them.

Multi-Scale Task Multiple Instance Learning for the Classification of Digital Pathology Images with Global Annotations

Niccolò Marini, Sebastian Otálora, Francesco Ciompi, Gianmaria Silvello, Stefano Marchesin, Simona Vatrano, Genziana Buttafuoco, Manfredo Atzori, and Henning Müller
Int. Workshop Paper Proceedings of the MICCAI Computational Pathology (COMPAY) Workshop, Strasbourg, France, September 27, 2021, pages 170-181.

Abstract

Whole slide images (WSIs) are high-resolution digitized images of tissue samples, stored at several magnification levels. WSI datasets often include only global annotations, available thanks to pathology reports. Global annotations refer to global findings in the high-resolution image and do not include information about the location of the regions of interest or the magnification levels used to identify a finding. This fact can limit the training of machine learning models, as WSIs are usually very large and each magnification level includes different information about the tissue. This paper presents a Multi-Scale Task Multiple Instance Learning (MuSTMIL) method that better exploits data paired with global labels and combines contextual and detailed information identified at several magnification levels. The method is based on a multiple instance learning framework and on a multi-task network that combines features from several magnification levels and produces multiple predictions (a global one and one for each magnification level involved). MuSTMIL is evaluated on colon cancer images, on binary and multilabel classification. MuSTMIL shows an improvement in performance in comparison to both single-scale and another multi-scale multiple instance learning algorithm, demonstrating that MuSTMIL can help to better deal with global labels targeting full and multi-scale images.

SAFIR: a Semantic-Aware Neural Framework for IR

Maristella Agosti, Stefano Marchesin, and Gianmaria Silvello
Nat. Conference Paper Proceedings of the 11th Italian Information Retrieval Workshop (IIR 2021), Bari, Italy, September 13-15, 2021, pages 4.

Abstract

The semantic mismatch between query and document terms – i.e., the semantic gap – is a long-standing problem in Information Retrieval (IR). Two main linguistic features related to the semantic gap that can be exploited to improve retrieval are synonymy and polysemy. Recent works integrate knowledge from curated external resources into the learning process of neural language models to reduce the effect of the semantic gap. However, these knowledge-enhanced language models have been used in IR mostly for re-ranking. We propose the Semantic-Aware Neural Framework for IR (SAFIR), an unsupervised knowledge-enhanced neural framework explicitly tailored for IR. SAFIR jointly learns word, concept, and document representations from scratch. The learned representations encode both polysemy and synonymy to address the semantic gap. We investigate SAFIR application in the medical domain, where the semantic gap is prominent and there are many specialized and manually curated knowledge resources. The evaluation on shared test collections for medical retrieval shows the effectiveness of SAFIR to address the semantic gap.

Developing Unsupervised Knowledge-Enhanced Models to Reduce the Semantic Gap in Information Retrieval

Stefano Marchesin
Journal w/o Peer Review Paper SIGIR Forum, Volume 55, Issue 1, Article 18, 2021, pages 1-2.

Abstract

In this thesis we tackle the semantic gap, a long-standing problem in Information Retrieval (IR). The semantic gap can be described as the mismatch between users' queries and the way retrieval models answer such queries. Two main lines of work have emerged over the years to bridge the semantic gap: (i) the use of external knowledge resources to enhance the bag-of-words representations used by lexical models, and (ii) the use of semantic models to perform matching between the latent representations of queries and documents. To deal with this issue, we first perform an in-depth evaluation of lexical and semantic models through different analyses [Marchesin et al., 2019]. The objective of this evaluation is to understand what features lexical and semantic models share, if their signals are complementary, and how they can be combined to effectively address the semantic gap. In particular, the evaluation focuses on (semantic) neural models and their critical aspects. Each analysis brings a different perspective in the understanding of semantic models and their relation with lexical models. The outcomes of this evaluation highlight the differences between lexical and semantic signals, and the need to combine them at the early stages of the IR pipeline to effectively address the semantic gap. Then, we build on the insights of this evaluation to develop lexical and semantic models addressing the semantic gap. Specifically, we develop unsupervised models that integrate knowledge from external resources, and we evaluate them for the medical domain - a domain with a high social value, where the semantic gap is prominent, and the large presence of authoritative knowledge resources allows us to explore effective ways to address it. For lexical models, we investigate how - and to what extent - concepts and relations stored within knowledge resources can be integrated in query representations to improve the effectiveness of lexical models.
Thus, we propose and evaluate several knowledge-based query expansion and reduction techniques [Agosti et al., 2018, 2019; Di Nunzio et al., 2019]. These query reformulations are used to increase the probability of retrieving relevant documents by adding to or removing from the original query highly specific terms. The experimental analyses on different test collections for Precision Medicine - a particular use case of Clinical Decision Support (CDS) - show the effectiveness of the proposed query reformulations. In particular, a specific subset of query reformulations allows lexical models to achieve top performing results in all the considered collections. Regarding semantic models, we first analyze the limitations of the knowledge-enhanced neural models presented in the literature. Then, to overcome these limitations, we propose SAFIR [Agosti et al., 2020], an unsupervised knowledge-enhanced neural framework for IR. SAFIR integrates external knowledge in the learning process of neural IR models and it does not require labeled data for training. Thus, the representations learned within this framework are optimized for IR and encode linguistic features that are relevant to address the semantic gap. The evaluation on different test collections for CDS demonstrates the effectiveness of SAFIR when used to perform retrieval over the entire document collection or to retrieve documents for Pseudo Relevance Feedback (PRF) methods - that is, when it is used at the early stages of the IR pipeline. In particular, the quantitative and qualitative analyses highlight the ability of SAFIR to retrieve relevant documents affected by the semantic gap, as well as the effectiveness of combining lexical and semantic models at the early stages of the IR pipeline - where the complementary signals they provide can be used to obtain better answers to semantically hard queries.

Learning Unsupervised Knowledge-Enhanced Representations to Reduce the Semantic Gap in Information Retrieval

Maristella Agosti, Stefano Marchesin, and Gianmaria Silvello
Journal Paper ACM Transactions on Information Systems (TOIS), Volume 38, Issue 4, Article 38, 2020, pages 1-48.

Abstract

The semantic mismatch between query and document terms – i.e., the semantic gap – is a long-standing problem in Information Retrieval (IR). Two main linguistic features related to the semantic gap that can be exploited to improve retrieval are synonymy and polysemy. Recent works integrate knowledge from curated external resources into the learning process of neural language models to reduce the effect of the semantic gap. However, these knowledge-enhanced language models have been used in IR mostly for re-ranking and not directly for document retrieval. We propose the Semantic-Aware Neural Framework for IR (SAFIR), an unsupervised knowledge-enhanced neural framework explicitly tailored for IR. SAFIR jointly learns word, concept, and document representations from scratch. The learned representations encode both polysemy and synonymy to address the semantic gap. SAFIR can be employed in any domain where external knowledge resources are available. We investigate its application in the medical domain where the semantic gap is prominent and there are many specialized and manually curated knowledge resources. The evaluation on shared test collections for medical literature retrieval shows the effectiveness of SAFIR in terms of retrieving and ranking relevant documents most affected by the semantic gap.

Focal Elements of Neural Information Retrieval Models. An Outlook through a Reproducibility Study

Stefano Marchesin, Alberto Purpura and Gianmaria Silvello
Journal Paper Information Processing & Management (IP&M), Volume 57, Issue 6, 2020, pages 102109.

Abstract

This paper analyzes two state-of-the-art Neural Information Retrieval (NeuIR) models: the Deep Relevance Matching Model (DRMM) and the Neural Vector Space Model (NVSM). Our contributions include: (i) a reproducibility study of two state-of-the-art supervised and unsupervised NeuIR models, where we present the issues we encountered during their reproducibility; (ii) a performance comparison with other lexical, semantic and state-of-the-art models, showing that traditional lexical models are still highly competitive with DRMM and NVSM; (iii) an application of DRMM and NVSM on collections from heterogeneous search domains and in different languages, which helped us to analyze the cases where DRMM and NVSM can be recommended; (iv) an evaluation of the impact of varying word embedding models on DRMM, showing how relevance-based representations generally outperform semantic-based ones; (v) a topic-by-topic evaluation of the selected NeuIR approaches, comparing their performance to the well-known BM25 lexical model, where we perform an in-depth analysis of the different cases where DRMM and NVSM outperform the BM25 model or fail to do so. We run an extensive experimental evaluation to check if the improvements of NeuIR models, if any, over the selected baselines are statistically significant.

A Study on Reciprocal Ranking Fusion in Consumer Health Search. IMS UniPD at CLEF eHealth 2020 Task 2

Giorgio Maria Di Nunzio, Stefano Marchesin, and Federica Vezzani
Int. Conference Paper Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, pages 7.

Abstract

In this paper, we describe the results of the participation of the Information Management Systems (IMS) group at CLEF eHealth 2020 Task 2, Consumer Health Search Task. In particular, we participated in both subtasks: Ad-hoc IR and Spoken queries retrieval. The goal of our work was to evaluate the reciprocal ranking fusion approach over 1) different query variants; 2) different retrieval functions; 3) runs with and without pseudo-relevance feedback. The results show that, on average, the best performances are obtained by a ranking fusion approach together with pseudo-relevance feedback.
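
For readers unfamiliar with reciprocal rank fusion, the sketch below shows the commonly used formulation, in which each document receives the sum of 1 / (k + rank) over the rankings that contain it. The constant k = 60 and the toy document identifiers are assumptions for illustration; the abstract does not state the configuration used in the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    k is a smoothing constant; 60 is a common default and an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three hypothetical runs, e.g. produced by different query variants.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(runs)
print(fused)
```

Because only ranks are used, the fusion does not need the retrieval scores of the individual runs to be comparable.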

A Post-Analysis of Query Reformulation Methods for Clinical Trials Retrieval

Maristella Agosti, Giorgio Maria Di Nunzio, and Stefano Marchesin
Nat. Conference Paper Proceedings of the 28th Italian Symposium on Advanced Database Systems (SEBD 2020), Villasimius, Sardinia, Italy, June 21-24, 2020, pages 152-159.

Abstract

The Precision Medicine (PM) track of the Text REtrieval Conference (TREC) focuses on providing useful precision medicine information to clinicians treating cancer patients. The PM track gives the unique opportunity to evaluate medical IR systems on two different collections: scientific literature and clinical trials. In this paper, we evaluate several state-of-the-art query expansion and reduction methods to see whether a particular approach can be helpful in clinical trials retrieval. We present those approaches that are consistently effective in all three TREC PM editions and we compare them to the results obtained by the research groups who participated in all three editions.

Reproducibility of the Neural Vector Space Model via Docker

Nicola Ferro, Stefano Marchesin, Alberto Purpura, and Gianmaria Silvello
Nat. Conference Paper Proceedings of the 16th Italian Research Conference on Digital Libraries (IRCDL 2020), Bari, Italy, January 30-31, 2020, pages 3-8.

Abstract

In this work we describe how Docker images can be used to enhance the reproducibility of Neural IR models. We report our results reproducing the Neural Vector Space Model (NVSM) and we release a CPU-based and a GPU-based Docker image. Finally, we present some insights about reproducing Neural IR models.

Exploring how to Combine Query Reformulations for Precision Medicine

Giorgio Maria Di Nunzio, Stefano Marchesin, and Maristella Agosti
Int. Conference Paper Proceedings of the 28th Text REtrieval Conference (TREC 2019), Gaithersburg, Maryland, USA, November 13-15, 2019, pages 14.

Abstract

We report on our participation as the IMS Unipd team in both TREC PM 2019 tasks. The objective of the work is twofold: (i) we want to evaluate how different query reformulations affect the results and whether the findings obtained in previous years remain valid; (ii) we want to verify if combining different query reformulations based on expansion and reduction techniques proves effective in such a highly specific scenario. In particular, we designed a procedure to (1) filter out clinical trials based on demographic data, (2) perform query reformulations – both expansion and reduction techniques – based on knowledge bases to increase the probability of finding relevant documents, (3) apply rank fusion techniques to the rankings produced by the different query reformulations. We consider those query reformulations that have been validated on previous TREC PM experimental collections. These queries represent the most effective reformulations for our system on those topics/collections. The results obtained – especially in the clinical trials task – validate our assumptions and provide interesting insights in terms of the different per-topic effectiveness of the query reformulations.

Knowledge Enhanced Representations for Clinical Decision Support (Abstract)

Stefano Marchesin and Maristella Agosti
Nat. Conference Paper Proceedings of the 10th Italian Information Retrieval Workshop (IIR 2019), Padua, Italy, September 16-18, 2019, pages 17-18.

Abstract

The study presents a methodology that contributes to reducing the semantic gap in clinical decision support systems. The methodology integrates semantic information – provided by external knowledge resources – into unsupervised neural Information Retrieval (IR) models. The objective is to design and develop innovative methods that can be effective in real-case medical scenarios.

A Docker-Based Replicability Study of a Neural Information Retrieval Model

Nicola Ferro, Stefano Marchesin, Alberto Purpura and Gianmaria Silvello
Int. Workshop Paper Proceedings of the Open-Source IR Replicability Challenge (OSIRRC 2019), Paris, France, July 25, 2019, pages 37-43.

Abstract

In this work, we propose a Docker image architecture for the replicability of Neural IR (NeuIR) models. We also share two self-contained Docker images to run the Neural Vector Space Model (NVSM) [22], an unsupervised NeuIR model. The first image we share (nvsm_cpu) can run on most machines and relies only on CPU to perform the required computations. The second image we share (nvsm_gpu) relies instead on the Graphics Processing Unit (GPU) of the host machine, when available, to perform computationally intensive tasks, such as the training of the NVSM model. Furthermore, we discuss some insights on the engineering challenges we encountered to obtain deterministic and consistent results from NeuIR models, relying on TensorFlow within Docker. We also provide an in-depth evaluation of the differences between the runs obtained with the shared images. The differences are due to the usage within Docker of TensorFlow and CUDA libraries, whose inherent randomness alters, under certain circumstances, the relative order of documents in rankings.

An Analysis of Query Reformulation Techniques for Precision Medicine

Maristella Agosti, Giorgio Maria Di Nunzio and Stefano Marchesin
Int. Conference Paper Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019), Paris, France, July 21-25, 2019, pages 973-976.

Abstract

The Precision Medicine (PM) track at the Text REtrieval Conference (TREC) focuses on providing useful precision medicine-related information to clinicians treating cancer patients. The PM track gives the unique opportunity to evaluate medical IR systems using the same set of topics on two different collections: scientific literature and clinical trials. In the paper, we take advantage of this opportunity and we propose and evaluate state-of-the-art query expansion and reduction techniques to identify whether a particular approach can be helpful in both scientific literature and clinical trial retrieval. We present those approaches that are consistently effective in both TREC editions and we compare the results obtained with the best performing runs submitted to TREC PM 2017 and 2018.

Knowledge Enhanced Representations to Reduce the Semantic Gap in Clinical Decision Support

Stefano Marchesin
Int. Conference Paper Proceedings of the 9th PhD Symposium on Future Directions in Information Access (FDIA 2019), Milan, Italy, July 17, 2019, pages 4-9.

Abstract

The semantic gap between queries and documents is a longstanding problem in Information Retrieval (IR), and it poses a critical challenge for medical IR due to the large presence in the medical language of synonymous and polysemous words, along with context-specific expressions. Two main lines of work have emerged in the past years to tackle this issue: (i) the use of external knowledge resources to enhance query and document bag-of-words representations; and (ii) the use of semantic models, based on the distributional hypothesis, which perform matching on latent representations of documents and queries. The presented research investigates the use of external knowledge resources in both lines – with a focus on knowledge-enhanced unsupervised neural latent representations and their analysis in terms of effectiveness and semantic representativeness.

Medical Retrieval using Structured Information Extracted from Knowledge Bases (Discussion Paper)

Maristella Agosti, Giorgio Maria Di Nunzio, Stefano Marchesin and Gianmaria Silvello
Nat. Conference Paper Proceedings of the 27th Italian Symposium on Advanced Database Systems (SEBD 2019), Castiglione della Pescaia (Grosseto), Italy, June 16-19, 2019, pages 8.

Abstract

We investigate how semantic relations between concepts extracted from medical documents, and linked to a reference knowledge base, can be employed to improve the retrieval of medical literature. Semantic relations explicitly represent relatedness between concepts and carry high informative power that can be leveraged to improve the effectiveness of the retrieval. We present preliminary results and show how relations are able to provide a sizable increase of the precision for several topics, albeit having no impact on others. We then discuss some future directions to minimize the impact of negative results while maximizing the impact of good results.

The University of Padua IMS Research Group at TREC 2018 Precision Medicine Track

Maristella Agosti, Giorgio Maria Di Nunzio and Stefano Marchesin
Int. Conference Paper Proceedings of the 27th Text REtrieval Conference (TREC 2018), Gaithersburg, Maryland, USA, November 14-16, 2018, pages 10.

Abstract

We report on the participation of the Information Management Systems (IMS) Research Group of the University of Padua in the second task of the Precision Medicine Track at TREC 2018: the Clinical Trials task. We designed a procedure to: i) expand query terms iteratively, based on knowledge bases, to increase the probability of finding relevant trials by adding neoplasm, gene, and protein term variants to the initial query; ii) filter out trials based on demographic data. We submitted three runs: a plain BM25 using the provided textual fields and as query, a BM25 with a first knowledge-based query expansion, and another BM25 with an additional knowledge-based query expansion. This initial set of experiments lays the ground for a deeper study on the effectiveness of (automatic) knowledge-based expansion techniques in the context of precision medicine.
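
The demographic filtering step can be pictured with a small sketch. The `Trial` record, its field names, and the eligibility rule below are hypothetical simplifications, not the procedure used in the paper; they only show the idea of discarding trials whose age range or gender requirement does not match the patient described in the topic.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical, simplified view of a clinical trial's eligibility data."""
    trial_id: str
    min_age: int
    max_age: int
    gender: str  # "male", "female", or "all"


def is_eligible(trial, age, gender):
    """Check whether a patient of the given age and gender fits the trial."""
    age_ok = trial.min_age <= age <= trial.max_age
    gender_ok = trial.gender in ("all", gender)
    return age_ok and gender_ok


def filter_trials(trials, age, gender):
    """Keep only the trials the patient is demographically eligible for."""
    return [t for t in trials if is_eligible(t, age, gender)]


trials = [
    Trial("T1", 18, 65, "all"),
    Trial("T2", 0, 17, "all"),
    Trial("T3", 18, 99, "female"),
]
kept = [t.trial_id for t in filter_trials(trials, age=64, gender="male")]
print(kept)
```

In a retrieval pipeline, a filter of this kind could be applied either before ranking, to shrink the candidate set, or afterwards, to prune the result list.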

The University of Padua IMS Research Group at CENTRE@TREC 2018

Giorgio Maria Di Nunzio and Stefano Marchesin
Int. Conference Paper Proceedings of the 27th Text REtrieval Conference (TREC 2018), Gaithersburg, Maryland, USA, November 14-16, 2018, pages 10.

Abstract

In this paper, we present our participation in one of the tasks of the CENTRE@TREC 2018 Track: the Clinical Decision Support task. We describe the steps of the original paper we wanted to reproduce, identifying the elements of ambiguity that may affect the reproducibility of the results. The experimental results we obtained follow a similar trend to those presented in the original paper: using clinical trials’ “note” field decreases the retrieval performances significantly, while the pseudo-relevance feedback approach together with query expansion achieves the best results across different measures. In the experimental results we find out that the choice of the stoplist is fundamental to achieve a reasonable level of reproducibility. However, stoplist creation is not described sufficiently well in the original paper.

A Relation Extraction Approach for Clinical Decision Support

Maristella Agosti, Giorgio Maria Di Nunzio, Stefano Marchesin and Gianmaria Silvello
Int. Workshop Paper Proceedings of the CIKM 2018 Workshops co-located with the 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, Italy, October 22, 2018, pages 6.

Abstract

In this paper, we investigate how semantic relations between concepts extracted from medical documents can be employed to improve the retrieval of medical literature. Semantic relations explicitly represent relatedness between concepts and carry high informative power that can be leveraged to improve the effectiveness of retrieval functionalities of clinical decision support systems. We present preliminary results and show how relations are able to provide a sizable increase of the precision for several topics, albeit having no impact on others. We then discuss some future directions to minimize the impact of negative results while maximizing the impact of good results.

Implicit-Explicit Representations for Case-Based Retrieval

Stefano Marchesin
Int. Conference Paper Proceedings of the 1st Biennial Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES 2018), Bertinoro, Italy, August 28-31, 2018, page 109.

Abstract

We propose an IR framework to combine the implicit representations — identified using distributional representation techniques — and the explicit representations — derived from external knowledge sources — of documents to improve medical case-based retrieval. Combining implicit-explicit representations of documents aims at enriching the semantic understanding of documents and reducing the semantic gap between documents and queries.

Case-Based Retrieval Using Document-Level Semantic Networks

Stefano Marchesin
Int. Conference Paper Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), Ann Arbor, Michigan (USA), July 8-12, 2018, page 1451.

Abstract

We propose research that aims at improving the effectiveness of case-based retrieval systems through the use of automatically created document-level semantic networks. The proposed research leverages the recent advancements in information extraction and relational learning to revisit and advance the core ideas of concept-centered hypertext models. The automatic extraction of semantic relations from documents, and their centrality in the creation and exploitation of the documents' semantic networks, represent our attempt to go one step further than previous approaches.

A Concept-Centered Hypertext Approach to Case-Based Retrieval

Stefano Marchesin
Journal w/o Peer Review Paper CoRR, abs/1811.11133, 2018.

Abstract

The goal of case-based retrieval is to assist physicians in the clinical decision making process, by finding relevant medical literature in large archives. We propose research that aims at improving the effectiveness of case-based retrieval systems through the use of automatically created document-level semantic networks. The proposed research tackles different aspects of information systems and leverages the recent advancements in information extraction and relational learning to revisit and advance the core ideas of concept-centered hypertext models. We propose a two-step methodology that in the first step addresses the automatic creation of document-level semantic networks, then in the second step designs methods that exploit such document representations to retrieve relevant cases from medical literature. For the automatic creation of documents’ semantic networks, we design a combination of information extraction techniques and relational learning models. Mining concepts and relations from text, information extraction techniques represent the core of the document-level semantic networks’ building process. On the other hand, relational learning models have the task of enriching the graph with additional connections that have not been detected by information extraction algorithms and strengthening the confidence score of extracted relations. For the retrieval of relevant medical literature, we investigate methods that are capable of comparing the documents’ semantic networks in terms of structure and semantics. The automatic extraction of semantic relations from documents, and their centrality in the creation of the documents’ semantic networks, represent our attempt to go one step further than previous graph-based approaches.

Thirty Years of Digital Libraries Research at the University of Padua: The User Side

Maristella Agosti, Giorgio Maria Di Nunzio, Nicola Ferro, Maria Maistro, Stefano Marchesin, Nicola Orio, Chiara Ponchia and Gianmaria Silvello
Nat. Conference Paper Proceedings of the 14th Italian Research Conference on Digital Libraries (IRCDL 2018), Udine, Italy, January 25-26, 2018, pages 42-54.

Abstract

For the 30th anniversary of the Information Management Systems (IMS) research group of the University of Padua, we report the main and more recent contributions of the group that focus on the users in the field of Digital Library (DL). In particular, we describe a dynamic and adaptive environment for user engagement with cultural heritage collections, the role of log analysis for studying the interaction between users and DL, and how to model user behaviour.

Keyword-based access to relational data: To reproduce, or to not reproduce?

Alex Badan, Luca Benvegnù, Matteo Biasetton, Giovanni Bonato, Alessandro Brighente, Stefano Marchesin, Alberto Minetto, Leonardo Pellegrina, Alberto Purpura, Riccardo Simionato, Matteo Tessarotto, Andrea Tonon and Nicola Ferro
Nat. Conference Paper Proceedings of the 25th Italian Symposium on Advanced Database Systems (SEBD 2017), Squillace Lido (Catanzaro), Italy, June 25-29, 2017, pages 166-177.

Abstract

We investigate the problem of the reproducibility of keyword-based access systems to relational data. These systems address a challenging and important issue, i.e. letting users access, in natural language, databases whose schema and instance are possibly unknown. However, there are neither shared implementations of state-of-the-art algorithms nor easily replicable experimental results. We explore the difficulties in reproducing such systems and experimental results by implementing from scratch several state-of-the-art algorithms and testing them on shared datasets.

Towards open-source shared implementations of keyword-based access systems to relational data

Alex Badan, Luca Benvegnù, Matteo Biasetton, Giovanni Bonato, Alessandro Brighente, Alberto Cenzato, Piergiorgio Ceron, Giovanni Cogato, Stefano Marchesin, Alberto Minetto, Leonardo Pellegrina, Alberto Purpura, Riccardo Simionato, Nicolò Soleti, Matteo Tessarotto, Andrea Tonon, Federico Vendramin and Nicola Ferro
Int. Workshop Paper Proceedings of 1st EDBT/ICDT Workshop on Keyword-based Access and Ranking at Scale (KARS 2017) - Proceedings of the Workshops of the EDBT/ICDT 2017 Joint Conference (EDBT/ICDT 2017), Venice, Italy, March 21-24, 2017, pages 5.

Abstract

Keyword-based access systems to relational data address a challenging and important issue, i.e. letting users exploit natural language to access databases whose schema and instance are possibly unknown. Unfortunately, there are almost no shared implementations of such systems and this hampers the reproducibility of experimental results. We explore the difficulties in reproducing such systems and share implementations of several state-of-the-art algorithms.

An Adaptive Cross-Site User Modelling Platform for Cultural Heritage Websites

Maristella Agosti, Séamus Lawless, Stefano Marchesin and Vincent Wade
Nat. Conference Paper Proceedings of the 13th Italian Research Conference on Digital Libraries (IRCDL 2017), Modena, Italy, January 26-27, 2017, pages 132-141.

Abstract

This paper discusses an adaptive cross-site user modelling platform for cultural heritage websites. The objective is to present the overall design of this platform that allows for information exchange techniques, which can be subsequently used by websites to provide tailored personalisation to users that request it. The information exchange is obtained by implementing a third party user model provider that, through the use of an API, interfaces with custom-built module extensions of websites based on the Web-based Content Management System (WCMS) Drupal. The approach is non-intrusive, not hindering the browsing experience of the user, and has a limited impact on the core aspects of the websites that integrate it. The design of the API ensures user’s privacy by not disclosing personal browsing information to non-authenticated users. The user can enable/disable the cross-site service at any time.