Terminologie numérique : conception, représentation et gestion

Federica Vezzani
Book. LINGUISTIC INSIGHTS, Peter Lang, Bern 2022.

Abstract

This book is devoted to the notion of digital terminology, understood as an approach to the discipline that involves the digital representation of the conceptual and linguistic information of a specific domain. Its aim is to illustrate the steps for designing and implementing multilingual terminological databases in line with best practices for managing digital terminological data. To this end, the book highlights the new skills required of terminologists in the digital age, skills that find their true essence in the interdisciplinary and collaborative spirit of research.

Pour une étude de la terminologie médicale de Proust : rétro-numérisation et analyse de la Correspondance avec sa mère

Ludovico Monaci and Federica Vezzani
Journal Paper. L'ANALISI LINGUISTICA E LETTERARIA, 31(2), pp. 127-140, 2022.

Abstract

This paper aims to highlight the potential of a terminological analysis approach to literary texts. We present a work of retro-digitization and terminological study of a corpus of letters exchanged between Proust and his mother. The identified medical terminology constitutes the object of investigation to illustrate how specialized lexical units emigrate from intimate writings to join the experimental laboratory of the novel. The illustrated case study will focus on the term “trional” and its evolution from the Correspondance to the Recherche.

Entre TBX et Ontolex-Lemon : Quelles Nouvelles Perspectives en Terminologie?

Silvia Piccini, Federica Vezzani, and Andrea Bellandi
Conference Paper. In Proceedings of the 1st International Conference on Multilingual Digital Terminology Today (MDTT 2022).

Abstract

This paper presents a multi-level contrastive analysis of the technologies underlying TBX and Ontolex-Lemon for modelling multilingual terminological data within terminological resources.

La traduction médicale : un panorama de ressources terminologiques multilingues

Federica Vezzani
Book chapter. Approches linguistiques contemporaines de la traduction, Artois Presses Université, pp. 129-144, 2022.

Abstract

Medical translation, like all specialized translation processes, requires a systematic study of the terminology used to convey technical and scientific messages. This paper describes a new model of terminological record specifically designed for the implementation of TriMED, a multilingual terminological resource for the medical domain. The proposed record aims at exhaustiveness in order to give a complete picture of the morphosyntactic, semantic and phraseological behaviour of the source term and of its translation equivalent.

Elaborazione e gestione di (meta) dati terminologici

Federica Vezzani and Giorgio Maria Di Nunzio
Book chapter. Risorse e strumenti per l’elaborazione e la diffusione della terminologia, Eurac Research, pp. 152-168, 2022.

Abstract

The optimal organization of terminological (meta) data is an indispensable practice in the design and implementation of language resources. In this paper, we describe a methodology for the structural standardization of terminological resources based on the application of de jure standards developed by the ISO TC 37/SC 3 in order to ensure the FAIRness of terminological data. In this regard, we describe a project, recently launched by the University of Padua, which adopts the proposed paradigm in order to create the CAMEO multilingual terminological database for the commercial domain. This resource aims to be a valid standardized linguistic support for two categories of text professionals (technical communicators and specialized translators) dealing with monolingual and multilingual commercial product documentation.

Knowledge Representation and Language Simplification of Human Rights

Sara Silecchia, Federica Vezzani, and Giorgio Maria Di Nunzio
Workshop Paper. In Proceedings of the LREC 2022 Workshop on Terminology in the 21st century: many faces, many places (Term21) (LREC 2022).

Abstract

In this paper, we propose the description of a very recent interdisciplinary project aiming at analysing both the conceptual and linguistic dimensions of human rights terminology. This analysis will result in the form of a new knowledge-based multilingual terminological resource which is designed in order to meet the FAIR principles for Open Science and will serve, in the future, as a prototype for the development of a new software for the simplified rewriting of international legal texts relating to human rights, in order to facilitate their comprehension for non-expert people. Given the early stage of the project, we will focus on the description of its rationale, the planned workflow, and the theoretical approach which will be adopted to achieve the main goal of this ambitious research project.

Preliminary considerations on a systematic approach to semic analysis: The case study of medical terminology

Vanessa Bonato, Giorgio Maria Di Nunzio, and Federica Vezzani
Journal Paper. Umanistica Digitale, 10, pp. 211-234, 2021.

Abstract

Semic analysis is a linguistic technique aimed at capturing the essential specificities of a term's meaning through the identification of minimum semantic units. This procedure is functional for the achievement of an in-depth comprehension of technical terminology and the acquisition of specialised conceptual knowledge. In this paper, we focus on semic analysis applied to medical terminology. In particular, we discuss some preliminary considerations in order to establish the starting points for a systematic approach to semic analysis. Firstly, we propose a preliminary experiment to 1) study users’ perception of semic analysis and 2) validate the absence of systematicity in its performance. Secondly, based on the resulting data, we propose a methodology aiming at increasing the systematic factorisation of semic analysis. Finally, we propose an experimental study to investigate the potential interrelation, in terms of applicability and productivity, of Word Embeddings with respect to semic analysis in the framework of the proposed methodological criteria.

La ressource FAIRterm: entre pratique pédagogique et professionnalisation en traduction spécialisée

Federica Vezzani
Journal Paper. SYNERGIES ITALIE, 17, pp. 51-64, 2021.

Abstract

This study aims at describing a new linguistic product developed in order to support pedagogical practice and professionalization in specialized translation. The resource, named FAIRterm, is configured as a collection of multilingual terminological records to assist the process of decoding and transcoding the terminology of a given domain. The tool is designed in compliance with the ISO standards in force in terms of terminology management. For its validation, we propose the description of a pedagogical experiment conducted for the Italian-French working languages in the perspective of professionalization of translation learners in the oenological domain.

One Size Fits All: A Conceptual Data Model for Any Approach to Terminology

Giorgio Maria Di Nunzio and Federica Vezzani
Workshop Paper. In the Proceedings of the Workshop TOTh 2021. Terminology, Interoperability and Data integration: Issues and Challenges (2021).

Abstract

In this paper, we speculate about the possibility of modelling all the currently known or proposed approaches to terminology in a single schema. We will use the Entity-Relationship (ER) diagram as our tool for the conceptual data model of the problem and to express the associations between the objects of the study. We will analyse the onomasiological and semasiological approaches, the ontoterminology paradigm, and the frame-based model, and we will draw the consequences in terms of the conceptual data model. The result of this discussion will be used as the basis of the next step of the data organization in terms of standardized terminological records and Linked Data.

On the Reusability of Terminological Data

Giorgio Maria Di Nunzio and Federica Vezzani
Conference Paper. In Proceedings of the 10th AIUCD Conference (Associazione per l'Informatica Umanistica e la Cultura Digitale), AlmaDL Journals (AIUCD 2021).

Abstract

Reusability of data is one of the most important practices in science, and investments in this (underestimated) operation may have positive long-term consequences in research (Pasquetto et al. 2017). In this paper, we discuss the benefits of this approach in terminology management by presenting a methodology for the preservation of multilingual terminological records and the practice of standardization as a fundamental step towards reusability. We present a case study to show the effectiveness of this methodology on an obsolete Website containing a multilingual medical glossary, and we share the source code as well as the standardized dataset.

IMS-UNIPD @ CLEF eHealth Task 2: Reciprocal ranking fusion in CHS

Giorgio Maria Di Nunzio and Federica Vezzani
Conference Paper. In 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum (CLEF-WN 2021).

Abstract

In this paper, we describe the results of the participation of the Information Management Systems (IMS) group at CLEF eHealth 2021 Task 2, Consumer Health Search Task. We participated in the three subtasks: Ad-hoc IR, Weakly Supervised IR, and Document credibility. The goal of our work was to evaluate the reciprocal ranking fusion approach over 1) manual query variants; 2) different retrieval functions; 3) runs with and without pseudo-relevance feedback; 4) reciprocal ranking fusion.

La connotation du vocabulaire somatique : une étude de cas comparative bilingue en oncologie

Federica Vezzani
Conference Paper. In Actes du colloque Lexique et corps humain (2021).

Management of Diastratic Variation in Medical Settings: On the Collaboration Within the YourTerm MED Project

Elisa Callegari, Giorgio Maria Di Nunzio, Rodolfo Maslias, and Federica Vezzani
Conference Paper. In the European Association for Terminology's (EAFT) Summit (EAFT 2021).

Findings of the WMT 2021 Biomedical Translation Shared Task: Summaries of Animal Experiments as New Test Set

Yeganova, Lana, Dina Wiemann, Mariana Neves, Federica Vezzani, Amy Siu, Iñigo Unanue, Maite Oronoz et al.
Conference Paper. In the Proceedings of the 6th Conference on Machine Translation (WMT 2021).

Abstract

In the sixth edition of the WMT Biomedical Task, we addressed a total of eight language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, English/Italian, and English/Basque. Further, our tests were composed of three types of textual test sets. New to this year, we released a test set of summaries of animal experiments, in addition to the test sets of scientific abstracts and terminologies. We received a total of 107 submissions from 15 teams from 6 countries.

Methodology for the standardization of terminological resources. Design of TriMED database to support multi-register medical communication

Federica Vezzani and Giorgio Maria Di Nunzio
Journal Paper. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication. Vol 26 (2), 2020.

Abstract

Terminology standardization reflects two different aspects involving the meaning of terms and the structure of terminological resources. In this paper, we focus on the structural aspect of standardization and we present the work of re-modeling TriMED, a multilingual terminological database conceived to support multi-register medical communication. In particular, we provide a general methodology to make the termbase compliant with three of the most recent ISO/TC 37 standards. We focus on the definition of (i) the structural meta-model of the resource, (ii) the provided data categories and their Data Category Repository, and (iii) the TBX format for its implementation.

On the Formal Standardization of Terminology Resources: The Case Study of TriMED

Federica Vezzani and Giorgio Maria Di Nunzio
Conference Paper. In the proceedings of the 12th Edition of the Language Resources and Evaluation Conference, Marseille, France, 2020 (LREC 2020).

Abstract

The process of standardization plays an important role in the management of terminological resources. In this context, we present the work of re-modeling an existing multilingual terminological database for the medical domain, named TriMED. This resource was conceived in order to tackle some problems related to the complexity of medical terminology and to respond to different users’ needs. We provide a methodology that should be followed in order to make a termbase compliant to the three most recent ISO/TC 37 standards. In particular, we focus on the definition of i) the structural meta-model of the resource, ii) the data categories provided, and iii) the TBX format for its implementation. In addition to the formal standardization of the resource, we describe the realization of a new data category repository for the management of the TriMED terminological data and a Web application that can be used to access the multilingual terminological records.

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages

Bawden, Rachel, Giorgio Di Nunzio, Cristian Grozea, Iñigo Unanue, Antonio Yepes, Nancy Mah, David Martinez et al.
Conference Paper. In Proceedings of the Fifth Conference on Machine Translation (2020).

Abstract

Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs were previously addressed in past editions of the shared task, namely, English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were also introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.

A Study on Reciprocal Ranking Fusion in Consumer Health Search. IMS UniPD at CLEF eHealth 2020 Task 2

Giorgio Maria Di Nunzio, Stefano Marchesin, and Federica Vezzani
Conference Paper. In the Proceedings of the 11th Conference and Labs of the Evaluation Forum (CLEF 2020).

Abstract

In this paper, we describe the results of the participation of the Information Management Systems (IMS) group at CLEF eHealth 2020 Task 2, Consumer Health Search Task. In particular, we participated in both subtasks: Ad-hoc IR and Spoken queries retrieval. The goal of our work was to evaluate the reciprocal ranking fusion approach over 1) different query variants; 2) different retrieval functions; 3) runs with and without pseudo-relevance feedback. The results show that, on average, the best performances are obtained by a ranking fusion approach together with pseudo-relevance feedback.
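
As an illustration of the fusion step mentioned in this abstract, the following is a minimal sketch of reciprocal rank fusion in its commonly described form: each document receives the sum of 1 / (k + rank) over all input rankings. The function name, the toy runs, and the constant k = 60 are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (60 is a frequently used default; an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document ranked higher in a run contributes a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical runs, e.g. produced by different query variants or retrieval functions.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d1"],
    ["d2", "d1", "d4"],
]
fused = reciprocal_rank_fusion(runs)
print(fused)
```

Documents that appear near the top of several runs accumulate the highest fused score, which is why the method is often used to combine query variants and retrieval functions.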

(Not so) Elementary, my dear Watson! A different perspective on medical terminology.

Federica Vezzani and Giorgio Maria Di Nunzio
Journal Paper. Umanistica Digitale. Special Issue on The Literature-Linguistics Interface. Bridging the Gap Between Qualitative and Quantitative Approaches to Literary Texts. No. 6, 2019.

Abstract

Sir Arthur Conan Doyle was an esteemed and highly experienced physician, and much of his medical knowledge spreads into his literary works. In this paper, we propose to study the medical terminology in the stories of Sherlock Holmes through a mixed method combining quantitative and qualitative analysis. Our approach is based on 1) the automatic extraction of medical terminology through the tidytext R package for text analyses, 2) a terminological analysis by means of the model of terminological record designed for the TriMED database, and 3) the study of collocations through the linguistic tool Sketch Engine. Thanks to this approach, we perform a linguistic analysis in order to evaluate different terminological aspects such as: the semantic variation due to temporal and historical factors, the difference in the context of use, the change of meaning based on the reference corpus, the variation of use depending on the speakers'/writers' register and, finally, the relationship between terms and their collocations from the syntactic viewpoint.

On the Use of Terminological Records in Specialised Translation

Federica Vezzani and Giorgio Maria Di Nunzio
Conference Paper. In the proceedings of the Associazione per l'Informatica Umanistica e le Culture Digitali, Udine, Italy, 2019 (AIUCD 2019).

Abstract

In this paper, we focus on the teaching of specialised translation and, in particular, on the preliminary phase of the translation process which is based on a broad and systematic work on the terminology of the micro-language considered. We present a new model of bilingual terminological record, as a digital tool supporting the process of translation of medical documents. Finally, we describe the results of a set of experiments which we have run since 2017 with two groups of students of the master’s degree of the University of Padua.

I principi FAIR nell’attività terminologica

Federica Vezzani
Conference Paper. In XXIX Convegno Ass.I.Term (Associazione italiana per la Terminologia) (2019).

Computational Terminology in eHealth

Federica Vezzani and Giorgio Maria Di Nunzio
Book chapter. Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, IRCDL 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings, Springer, pp. 72-85, 2019.

Abstract

In this paper, we present a methodology for the development of a new eHealth resource in the context of Computational Terminology. This resource, named TriMED, is a digital library of terminological records designed to satisfy the information needs of different categories of users within the healthcare field: patients, language professionals and physicians. TriMED offers a wide range of information for the purpose of simplification of medical language in terms of understandability and readability. Finally, we present two applications of our resource in order to conduct different types of studies, in particular in Information Retrieval and Literature Analysis.

La technicité des termes : le v-tech comme paramètre d’évaluation

Federica Vezzani
Book chapter. Terminologie & Ontologie : Théories et Applications Actes de la conférence (TOTh 2019), Presses Universitaires Savoie Mont Blanc, pp. 215-227.

Abstract

In this study, we propose a new perspective on the concept of the weight of technical terms by focusing on the notion of “technicality” as a semantic property of the linguistic unit itself. The basic idea is that the technicality value of a term is inversely proportional to its polysemous nature. We formalize the v-tech formula and carry out an experimental evaluation in order to 1) compare the v-tech value with other measures of termhood (termicité or termitude in French), which are generally computed from the frequency of occurrence of terms in collections, and 2) integrate the v-tech formula into the score of a model for retrieving relevant documents for a systematic review task in the medical domain.

Aménagement de la terminologie spontanée : un cas de collocation

Federica Vezzani
Book chapter. In "Convergences et divergences dans la pratique terminologique. De la terminologie spontanée à la terminologie aménagée", Délégation générale à la langue française et aux langues de France, Ministère de la Culture, pp. 163-173, 2019.

Abstract

This study addresses the standardization criteria for certain standardized phraseological forms in medical terminology. We present an analysis methodology based on compiling terminological records from the multilingual resource TriMED in order to identify technical terms that have crystallized in the frequent usage of medical language but do not necessarily meet the criterion of linguistic correctness proposed by ISO 704:2009.

TriMED: banca dati terminologica multilingue

Federica Vezzani, Giorgio Maria Di Nunzio and Geneviève Henrot
Conference Paper. In Associazione per l'Informatica Umanistica e le Culture Digitali, Bari, Italy, 2018 (AIUCD 2018).

Abstract

Three categories of people face the complexity of medical language, each with its own needs: physicians, technical-scientific translators and patients. This work proposes to develop a tool that helps to remedy the opacity characterizing communication in the medical field among its various actors: a multilingual terminological-phraseological resource that supports communication between peers, provides technical-scientific translators with a regularly updated resource, and makes information easier for the general public to understand. The database consists of terminological records designed to build a bridge between the registers identified (specialist, semi-specialist, non-specialist) in the languages considered. Restricted to the oncological field of breast cancer treatments, the terms are extracted from an English-language corpus, accompanied by all linguistically relevant information and properties, and linked to their pragmatic equivalents in Italian and French.

A Gamified Approach to Naïve Bayes Classification: A Case Study for Newswires and Systematic Medical Reviews

Giorgio Maria Di Nunzio, Maria Maistro and Federica Vezzani
Workshop Paper. In the proceedings of the Web Conference, Lyon, France, 2018 (WWW 2018).

Abstract

Supervised machine learning algorithms require a set of labelled examples to be trained; however, the labelling process is a costly and time-consuming task which is carried out by experts of the domain who label the dataset by means of an iterative process to filter out non-relevant objects of the dataset. In this paper, we describe a set of experiments that use gamification techniques to transform this labelling task into an interactive learning process where users can cooperate in order to achieve a common goal. To this end, we first use a geometrical interpretation of Naïve Bayes (NB) classifiers in order to create an intuitive visualization of the current state of the system and let the user change some of the parameters directly as part of a game. We apply this visualization technique to the classification of newswires and we report the results of the experiments conducted with different groups of people: PhD students, Master's degree students and the general public. Then, we present a preliminary experiment of query rewriting for systematic reviews in a medical scenario, which makes use of gamification techniques to collect different formulations of the same query. Both experiments show how the exploitation of gamification approaches helps to engage users in abstract tasks that might be hard to understand and/or boring to perform.

TriMED: A Multilingual Terminological Database

Federica Vezzani, Giorgio Maria Di Nunzio and Geneviève Henrot
Conference Paper. In the proceedings of the 11th Edition of the Language Resources and Evaluation Conference, Miyazaki, Japan, 2018 (LREC 2018).

Abstract

Three precise categories of people are confronted with the complexity of medical language: physicians, patients and scientific translators. The purpose of this work is to develop a methodology for the implementation of a terminological tool that contributes to solving problems related to the opacity that characterizes communication in the medical field among its various actors. The main goals are: i) satisfy the peer-to-peer communication, ii) facilitate the comprehension of medical information by patients, and iii) provide a regularly updated resource for scientific translators. We illustrate our methodology and its application through the description of a multilingual terminological-phraseological resource named TriMED. This terminological database will consist of records designed to create a terminological bridge between the various registers (specialist, semi-specialist, non-specialist) as well as across the languages considered. In this initial analysis, we restricted ourselves to the field of breast cancer, and the terms to be analyzed will be extracted from a corpus in English, accompanied by all relevant linguistic information and properties, and linked to their pragmatic equivalents in Italian and French.

A Study on Manual Query Reformulation for Systematic Medical Reviews

Giorgio Maria Di Nunzio and Federica Vezzani
Workshop Paper. In the proceedings of the 9th edition of the Italian Information Retrieval Workshop, Rome, Italy, 2018 (IIR 2018).

Abstract

Technology-Assisted Review (TAR) approaches are essential to minimize the effort of the user during the search and collect all relevant documents. In this paper, we present a failure analysis based on terminological and linguistic aspects of a TAR system for systematic medical reviews. In particular, we analyze the results of the worst performing topics of the best experiments of the CLEF 2017 eHealth task on Technologically Assisted Reviews in Empirical Medicine. This is an extended abstract of the work presented in [2, 4].

(Not so) Elementary, my dear Watson! A different perspective on medical terminology

Federica Vezzani, Giorgio Maria Di Nunzio and Geneviève Henrot
Conference Paper. In Bridging Gaps, Creating Links - The Qualitative-Quantitative Interface in the Study of Literature, Padua, Italy, 2018.

Abstract

This contribution provides an overview of the syntactic and semantic behavior of medical terms in the literary works of Conan Doyle. The object of study is the analysis of the scientific terms in the stories of Sherlock Holmes through the model of terminological record set out in a multilingual terminological database (TriMED) implemented for the linguistic analysis of technical medical terms. After the semi-automatic extraction of English technical terms and the realization of the terminological records for each of them, we have analyzed different aspects such as: the semantic variation due to temporal and historical factors, the difference in the context of use, the change of meaning based on the reference corpus, the variation of use depending on the speakers'/writers' register and, finally, the relationship between terms and their collocations from the syntactic viewpoint. After presenting our methodology and discussing the results of this analysis, we will provide some preliminary insights related to a comparative study between the linguistic aspects of the English medical term and its equivalent in the Italian version.

Interactive Sampling for Systematic Reviews. IMS Unipd At CLEF 2018 eHealth Task 2

Giorgio Maria Di Nunzio, Giacomo Ciuffreda and Federica Vezzani
Workshop Paper. In Conference and Labs of the Evaluation Forum, (Working Notes), Avignon, France, 2018 (CLEF 2018).

Abstract

This is the second participation of the Information Management Systems (IMS) group at the CLEF eHealth Task of Technologically Assisted Reviews in Empirical Medicine. This task focuses on the problem of medical systematic reviews, a problem which requires a recall close (if not equal) to 100%. Semi-automated approaches are essential to support these types of searches when the amount of data exceeds the limits of users, i.e. in terms of attention or patience. We present a variation of the two-dimensional approach which 1) sets the maximum amount of documents that the physician is willing to read, and 2) takes into account a sampling strategy to estimate the 95% confidence interval of the number of relevant documents present in the collection.
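
To make the sampling idea in this abstract concrete, here is a minimal sketch of one textbook way to obtain a 95% confidence interval for the number of relevant documents in a collection: label a random sample, estimate the proportion of relevant documents, and scale a normal-approximation interval to the collection size. This is an assumption-laden illustration, not the estimator used in the paper; the function name and the numbers are hypothetical.

```python
import math


def estimate_relevant(collection_size, sample_labels, z=1.96):
    """Estimate how many relevant documents a collection contains.

    collection_size: total number of documents in the collection.
    sample_labels: 0/1 relevance labels of a simple random sample.
    z: normal quantile (1.96 corresponds to a 95% interval).
    Returns (lower bound, point estimate, upper bound) as document counts.
    """
    n = len(sample_labels)
    p = sum(sample_labels) / n                 # observed proportion of relevant documents
    se = math.sqrt(p * (1 - p) / n)            # standard error, normal approximation
    low = max(0.0, p - z * se)                 # clip the interval to [0, 1]
    high = min(1.0, p + z * se)
    return collection_size * low, collection_size * p, collection_size * high


# Hypothetical sample: 100 judged documents, 10 of them relevant, from 5000 documents.
sample = [1] * 10 + [0] * 90
low, estimate, high = estimate_relevant(5000, sample)
print(round(low), round(estimate), round(high))
```

The normal approximation is only reasonable when the sample is large and the proportion is not too close to 0 or 1; other interval constructions may be preferable for very rare relevant documents.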

Using R Markdown for Replicable Experiments in Evidence Based Medicine

Giorgio Maria Di Nunzio and Federica Vezzani
Conference Paper. In the proceedings of Conference and Labs of the Evaluation Forum, Avignon, France, 2018 (CLEF 2018).

Abstract

In this paper, we propose a methodology based on the R Markdown framework for replicating an experiment of query rewriting in the context of medical eHealth. We present a study on how to re-propose the same task of systematic medical reviews with the same conditions and methodologies to a larger group of participants. The task is the CLEF eHealth Task Technologically Assisted Reviews in Empirical Medicine, which consists in finding all the most relevant medical documents, given an information need, with the least effort. We study how lay people, in this case students of a master's degree in languages, can help the retrieval system find more relevant documents by means of a query rewriting approach.

A Linguistic Failure Analysis of Classification of Medical Publications: A Study on Stemming vs Lemmatization

Giorgio Maria Di Nunzio and Federica Vezzani
Conference Paper. In the proceedings of the 5th Italian Conference on Computational Linguistics, Turin, Italy, 2018 (CliC-it 2018).

Abstract

Technology-Assisted Review (TAR) systems are essential to minimize the effort of the user during the search and retrieval of relevant documents for a specific information need. In this paper, we present a failure analysis based on terminological and linguistic aspects of a TAR system for systematic medical reviews. In particular, we analyze the results of the worst performing topics in terms of recall using the dataset of the CLEF 2017 eHealth task on TAR in Empirical Medicine.

An Interactive Two-Dimensional Approach to Query Aspects Rewriting in Systematic Reviews. IMS Unipd At CLEF eHealth Task 2

Giorgio Maria Di Nunzio, Federica Beghini, Federica Vezzani and Geneviève Henrot
Workshop Paper. In Conference and Labs of the Evaluation Forum, (Working Notes), Dublin, Ireland, 2017 (CLEF 2017).

Abstract

In this paper, we describe the participation of the Information Management Systems (IMS) group at CLEF eHealth 2017 Task 2. This task focuses on the problem of systematic reviews, that is, articles that summarise all evidence that is published regarding a certain medical topic. This task, known in Information Retrieval as the total recall problem, requires long and tedious search sessions by experts in the field of medicine. Automatic (or semi-automatic) approaches are essential to support these types of searches when the amount of data exceeds the limits of users, i.e. in terms of attention or patience. We present the two-dimensional probabilistic version of BM25 with explicit relevance feedback together with a query aspect rewriting approach for both the simple evaluation and the cost-effective evaluation.

A Lexicon Based Approach to Classification of ICD10 Codes. IMS Unipd at CLEF eHealth Task 1

Giorgio Maria Di Nunzio, Federica Beghini, Federica Vezzani and Geneviève Henrot
Workshop Paper. In Conference and Labs of the Evaluation Forum, (Working Notes), Dublin, Ireland, 2017 (CLEF 2017).

Abstract

In this paper, we describe the participation of the Information Management Systems (IMS) group at CLEF eHealth 2017 Task 1. In this task, participants are required to extract causes of death from death reports (in French and in English) and label them with the correct International Classification of Diseases (ICD10) code. We tackled this task by focusing on the replicability and reproducibility of the experiments and, in particular, on building a basic compact system that produces a clean dataset that can be used to implement more sophisticated approaches.

A Reproducible Approach with R Markdown to Automatic Classification of Medical Certificates in French

Giorgio Maria Di Nunzio, Federica Beghini, Federica Vezzani and Geneviève Henrot
Conference Paper. In the proceedings of the 4th Italian Conference on Computational Linguistics, Rome, Italy, 2017 (CliC-it 2017).

Abstract

In this paper, we report the ongoing developments of our first participation in the Cross-Language Evaluation Forum (CLEF) eHealth Task 1: “Multilingual Information Extraction - ICD10 coding” (Névéol et al., 2017). The task consists in labelling death certificates in French with international standard codes. In particular, we wanted to accomplish the goal of the ‘Replication track’ of this Task, which promotes the sharing of tools and the dissemination of solid, reproducible results.