Empowering Digital Pathology Applications through Explainable Knowledge Extraction Tools

Stefano Marchesin, Fabio Giachelle, Niccolò Marini, Manfredo Atzori, Svetla Boytcheva, Genziana Buttafuoco, Francesco Ciompi, Giorgio Maria Di Nunzio, Filippo Fraggetta, Ornella Irrera, Henning Muller, Todor Primov, Simona Vatrano, and Gianmaria Silvello
Journal Paper Journal of Pathology Informatics, Volume 13, Issue 1, 100139, September 2022.

Abstract

Exa-scale volumes of medical data have been produced for decades. In most cases, the diagnosis is reported in free text, encoding medical knowledge that remains largely unexploited. To decode the medical knowledge contained in reports, we propose the Semantic Knowledge Extractor Tool (SKET), an unsupervised knowledge extraction system combining a rule-based expert system with pre-trained Machine Learning (ML) models. Combining rule-based techniques and pre-trained ML models yields highly accurate knowledge extraction. This work demonstrates the viability of unsupervised Natural Language Processing (NLP) techniques to extract critical information from cancer reports, opening opportunities such as data mining for knowledge extraction purposes, precision medicine applications, structured report creation, and multimodal learning. SKET is a practical and unsupervised approach to extracting knowledge from pathology reports, which opens up unprecedented opportunities to exploit textual and multimodal medical information in clinical practice. We also propose SKET eXplained (SKET X), a web-based system providing visual explanations of the algorithmic decisions taken by SKET. SKET X is designed and developed to support pathologists and domain experts in understanding SKET predictions, possibly driving further improvements to the system.

Will open science change authorship for good? Towards a quantitative analysis

Andrea Mannocci, Ornella Irrera, and Paolo Manghi
Conference Paper Proc. 18th Italian Research Conference on Digital Libraries (IRCDL 2022)

Abstract

Authorship of scientific articles has profoundly changed from early science until now. If once upon a time a paper was authored by a handful of authors, scientific collaborations are much more prominent on average nowadays. As authorship (and citation) is essentially the primary reward mechanism according to the traditional research evaluation frameworks, it turned out to be a rather hot-button topic from which a significant portion of academic disputes stems. However, the novel Open Science practices could be an opportunity to disrupt such dynamics and diversify the credit of the different scientific contributors involved in the diverse phases of the lifecycle of the same research effort. In fact, a paper and research data (or software) contextually published could exhibit different authorship to give credit to the various contributors right where it feels most appropriate. We argue that this can be computationally analysed by taking advantage of the wealth of information in Open Science Graphs. Such a study can pave the way to a better understanding of the dynamics and patterns of authorship in linked literature, research data and software, and how they have evolved over the years.

Open Science and Authorship of Supplementary Material. Evidence from a research community.

Andrea Mannocci, Ornella Irrera and Paolo Manghi
Conference Paper 26th International Conference on Science, Technology and Innovation Indicators (STI 2022)

Abstract

Authorship of scientific articles has profoundly changed from early science until now. While once upon a time a paper was authored by a handful of authors, scientific collaborations are much more prominent on average nowadays. As authorship (and citation) is essentially the primary reward mechanism according to the traditional research evaluation frameworks, it turned out to be a rather hot-button topic from which a significant portion of academic disputes stems. However, the novel Open Science practices could be an opportunity to disrupt such dynamics and diversify the credit of the different scientific contributors involved in the diverse phases of the lifecycle of the same research effort. In fact, a paper and research data (or software) contextually published could exhibit different authorship to give credit to the various contributors right where it feels most appropriate. As a preliminary study, in this paper, we leverage the wealth of information contained in Open Science Graphs, such as OpenAIRE, and conduct a focused analysis on a subset of publications with supplementary material drawn from the European Marine Science (MES) research community. The results are promising and suggest our hypothesis is worth exploring further: in 22% of the cases we registered substantial variations between the authors participating in the publication and the authors participating in the supplementary dataset (or software), thus setting the premises for a longitudinal, large-scale analysis of the phenomenon.
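The authorship-variation measurement described above can be sketched as follows; the records, similarity measure, and threshold are illustrative assumptions, not the paper's actual pipeline over OpenAIRE data.

```python
# Sketch: flag publication/supplementary-material pairs whose author
# lists differ (here: Jaccard similarity below 1.0, i.e. any
# difference; the study's exact criterion may differ).

def author_variation(pub_authors, supp_authors):
    """Return the Jaccard similarity between two author sets."""
    a, b = set(pub_authors), set(supp_authors)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical linked records, as one might extract from an
# Open Science Graph such as OpenAIRE.
pairs = [
    (["A. Rossi", "B. Chen"], ["A. Rossi", "B. Chen"]),           # identical
    (["A. Rossi", "B. Chen"], ["A. Rossi", "B. Chen", "C. Kim"]),  # extra contributor
    (["D. Meyer"], ["D. Meyer"]),                                  # identical
]

varied = sum(1 for pub, supp in pairs if author_variation(pub, supp) < 1.0)
print(f"{varied / len(pairs):.0%} of pairs show authorship variation")
```

At scale, the same per-pair comparison would run over every publication linked to a dataset or software record in the graph.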

An Open-Source Annotation Tool for Collaboratively Annotating Biomedical Documents

Ornella Irrera, Fabio Giachelle, and Gianmaria Silvello
Conference Paper Proc. 18th Italian Research Conference on Digital Libraries (IRCDL 2022)

Abstract

In recent years there has been a growing interest in developing techniques to effectively extract knowledge from biomedical textual documents. Many solutions rely on Named Entity Recognition and Linking (NER+L), which consists of detecting entities in text and disambiguating them through the use of knowledge bases. Despite its potential, applying this approach to the biomedical domain is limited by the lack of large annotated corpora useful to train Machine Learning (ML) models. Nowadays, it is difficult to find large sets of annotated data covering a wide range of biomedical sub-domains: the creation of annotated corpora, in fact, is an expensive and time-consuming task usually performed by experts. To address this problem and to ease and speed up the annotation process, we propose MedTAG, a web-based, collaborative, customizable annotation tool for biomedical documents; it is platform-independent and provides a straightforward installation procedure.
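The NER+L step that such annotated corpora support can be sketched minimally as below; the lexicon and concept identifiers are invented stand-ins for a real biomedical knowledge base such as UMLS, not part of MedTAG itself.

```python
# Sketch of dictionary-based NER+L: detect entity mentions in text and
# link each one to a knowledge-base identifier. A toy lexicon replaces
# a real biomedical knowledge base purely for illustration.
LEXICON = {
    "adenocarcinoma": "KB:0001",
    "colon": "KB:0002",
    "dysplasia": "KB:0003",
}

def ner_link(text):
    """Return (mention, start_offset, kb_id) triples found in `text`."""
    found = []
    lowered = text.lower()
    for mention, kb_id in LEXICON.items():
        start = lowered.find(mention)
        if start != -1:
            found.append((mention, start, kb_id))
    return sorted(found, key=lambda t: t[1])

print(ner_link("Biopsy of the colon shows adenocarcinoma."))
```

Manually annotated corpora provide exactly these mention/concept pairs as training and test data for ML-based NER+L models.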

DocTAG: A Customizable Annotation Tool for Ground Truth Creation

Fabio Giachelle, Ornella Irrera, and Gianmaria Silvello
Conference Paper Proc. European Conference on Information Retrieval (ECIR 2022), pp. 288-293, Springer, Cham.

Abstract

We conduct an extensive evaluation of the proposed citation system by analyzing its effectiveness from the correctness and completeness viewpoints, showing that it represents a suitable solution that can be easily employed in real-world environments and that reduces human intervention on data to a minimum.

MedTAG: A Portable and Customizable Annotation Tool for Biomedical Documents

Fabio Giachelle, Ornella Irrera, and Gianmaria Silvello
Journal Paper BMC Medical Informatics and Decision Making volume 21, Article number: 352 (2021)

Abstract

Background
Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain. Despite the abundance of unstructured biomedical data, the lack of richly annotated biomedical datasets poses hindrances to the further development of NER+L algorithms for any effective secondary use. In addition, manual annotation of biomedical documents performed by physicians and experts is a costly and time-consuming task. To support, organize and speed up the annotation process, we introduce MedTAG, a collaborative biomedical annotation tool that is open-source, platform-independent, and free to use and distribute.

Results
We present the main features of MedTAG and how it has been employed in the histopathology domain by physicians and experts to manually annotate more than seven thousand clinical reports. We compare MedTAG with a set of well-established biomedical annotation tools, including BioQRator, ezTag, MyMiner, and tagtog, comparing their pros and cons with those of MedTAG. We highlight that MedTAG is one of the very few open-source tools provided with an open license and a straightforward installation procedure supporting cross-platform use.

Conclusions
MedTAG has been designed according to five requirements (i.e. available, distributable, installable, workable and schematic) defined in a recent extensive review of manual annotation tools. Moreover, MedTAG satisfies 20 of the 22 criteria specified in the same study.

Background Linking: Joining Entity Linking with Learning to Rank Models

Ornella Irrera, and Gianmaria Silvello
Conference Paper Proc. 17th Italian Research Conference on Digital Libraries (IRCDL 2021)

Abstract

The recent years have been characterized by a strong democratization of news production on the web. In this scenario, it is rare to find self-contained news articles that provide useful background and context information. The problem of finding information providing context to news articles has been tackled by the Background Linking task of the TREC News Track. In this paper, we propose a system to address the background linking task. Our system relies on the LambdaMART learning-to-rank algorithm trained on classic textual features and on entity-based features. The idea is that the entities extracted from the documents, as well as their relationships, provide valuable context to the documents. We analyzed how this idea can be used to improve the retrieval of background articles.
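The intuition of combining textual and entity-based features can be sketched as follows; the hand-fixed weights stand in for what a learning-to-rank model such as LambdaMART would learn from training data, and the documents and entities are invented examples.

```python
# Sketch: rank candidate background articles for a query article by
# combining a textual-overlap feature with an entity-overlap feature.
# A learning-to-rank model (e.g. LambdaMART) would learn the feature
# weights; here they are fixed by hand purely for illustration.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def score(query, candidate, w_text=0.5, w_entity=0.5):
    return (w_text * jaccard(query["terms"], candidate["terms"])
            + w_entity * jaccard(query["entities"], candidate["entities"]))

query = {"terms": ["flood", "river", "levee"], "entities": ["Mississippi_River"]}
candidates = [
    {"id": "c1", "terms": ["flood", "levee", "repair"], "entities": ["Mississippi_River"]},
    {"id": "c2", "terms": ["election", "senate"], "entities": ["US_Senate"]},
]

ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
print([c["id"] for c in ranked])  # the background-relevant article ranks first
```

In the learned setting, each (query, candidate) pair yields a feature vector and LambdaMART optimizes the ranking directly from relevance judgments.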