Falls involve dynamic risk factors that change over time, but most studies on fall-risk factors are cross-sectional and do not capture this temporal aspect. The longitudinal clinical notes within electronic health records (EHR) provide an opportunity to analyse fall risk factor trajectories through Natural Language Processing techniques, specifically dynamic topic modelling (DTM). This study aims to uncover fall-related topics for new fallers and track their evolving trends leading up to falls. This case–cohort study utilised primary care EHR data covering older adults between 2016 and 2019. Cases were individuals who fell in 2019 but had no falls in the preceding three years (2016–18). The control group consisted of randomly sampled individuals, of similar size to the case group, who did not experience falls during the whole study follow-up period. We applied DTM to the clinical notes collected between 2016 and 2018. We compared the trend lines of the case and control groups using their slopes, which indicate the direction and steepness of change over time. A total of 2,384 fallers (cases) and an equal number of controls were included. We identified 25 topics that showed significant differences in trends between the case and control groups. Topics such as medications, renal care, family caregivers, hospital admission/discharge and referral/streamlining diagnostic pathways exhibited a consistent increase in steepness over time within the case group before the occurrence of falls. Early recognition of health conditions demanding care is crucial for applying proactive and comprehensive multifactorial assessments that address underlying causes, ultimately reducing falls and fall-related injuries.
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, and 3 more authors
In The Twelfth International Conference on Learning Representations, Feb 2024
Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the TWEETSUMM dataset, and show that using only 10% of the original labeled dataset we achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state of the art trained on the entire training set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.
Leveraging Multi-Word Concepts to Predict Acute Kidney Injury in Intensive Care
Lorenzo Brancato, Iacer Calixto, Ameen Abu-Hanna, and Iacopo Vagliano
Acute kidney injury (AKI) is an abrupt decrease in kidney function that is widespread in intensive care. Many AKI prediction models have been proposed, but only a few exploit clinical notes and medical terminologies. Previously, we developed and internally validated a model to predict AKI using clinical notes enriched with single-word concepts from medical knowledge graphs. However, an analysis of the impact of using multi-word concepts is lacking. In this study, we compare using only the clinical notes as input for prediction with using clinical notes retrofitted with both single-word and multi-word concepts. Our results show that 1) retrofitting single-word concepts improved word representations and the performance of the prediction model; 2) retrofitting multi-word concepts further improved both, albeit slightly. Although the improvement with multi-word concepts was small, due to the small number of multi-word concepts that could be annotated, multi-word concepts have proven to be beneficial.
Video-and-Language (VidL) models and their cognitive relevance
Anne Zonneveld, Albert Gatt, and Iacer Calixto
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2023
In this paper we give a narrative review of multi-modal video-language (VidL) models. We introduce the current landscape of VidL models and benchmarks, and draw inspiration from neuroscience and cognitive science to propose avenues for future research in VidL models in particular and artificial intelligence (AI) in general. We argue that iterative feedback loops between AI, neuroscience, and cognitive science are essential to spur progress across these disciplines. We motivate why we focus specifically on VidL models and their benchmarks as a promising type of model to bring improvements in AI, and categorise current VidL efforts across multiple ‘cognitive relevance axioms’. Finally, we provide suggestions on how to effectively incorporate this interdisciplinary viewpoint into research on VidL models in particular and AI in general. In doing so, we hope to create awareness of the potential of VidL models to narrow the gap between neuroscience, cognitive science, and AI.
Drug-related causes attributed to acute kidney injury and their documentation in intensive care patients
Rachel M. Murphy, Dave A. Dongelmans, Izak Yasrebi-de Kom, Iacer Calixto, Ameen Abu-Hanna, Kitty J. Jager, Nicolette F. de Keizer, and Joanna E. Klopotowska
Purpose: To investigate drug-related causes attributed to acute kidney injury (DAKI) and their documentation in patients admitted to the Intensive Care Unit (ICU). Methods: This study was conducted in an academic hospital in the Netherlands by reusing electronic health record (EHR) data of adult ICU admissions between November 2015 and January 2020. First, ICU admissions with acute kidney injury (AKI) stage 2 or 3 were identified. Subsequently, three modes of DAKI documentation in the EHR were examined: diagnosis codes (structured data), the allergy module (semi-structured data), and clinical notes (unstructured data). Results: In total, 8,124 ICU admissions were included, with 542 (6.7%) experiencing AKI stage 2 or 3. The ICU physicians deemed 102 of these AKI cases (18.8%) to be drug-related. These DAKI cases were all documented in the clinical notes (100%), one in the allergy module (1%), and none via diagnosis codes. The clinical notes required the highest time investment to analyze. Conclusions: Drug-related causes comprise a substantial part of AKI in ICU patients. However, the current unstructured DAKI documentation practice via clinical notes hampers our ability to gain better insights into DAKI occurrence. Therefore, both automating DAKI identification from the clinical notes and increasing structured DAKI documentation should be encouraged.
Soft-Prompt Tuning to Predict Lung Cancer Using Primary Care Free-Text Dutch Medical Notes
We examine the use of large Transformer-based pretrained language models (PLMs) for the problem of early prediction of lung cancer using free-text patient medical notes of Dutch primary care physicians. Specifically, we investigate: 1) how soft prompt-tuning compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust than PLMs in highly imbalanced settings; and 3) how models fare when trained on notes from a small number of patients. All our code is available open source at https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/.
SemEval-2023 Task 1: Visual Word Sense Disambiguation
Alessandro Raganato, Iacer Calixto, Asahi Ushio, Jose Camacho-Collados, and Mohammad Taher Pilehvar
In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Jul 2023
This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The objective of Visual-WSD is to identify, among a set of ten images, the one that corresponds to the intended meaning of a given ambiguous word accompanied by minimal context. The task provides datasets for three different languages: English, Italian, and Farsi. We received a total of 96 different submissions. Out of these, 40 systems outperformed a strong zero-shot CLIP-based baseline. Participating systems proposed different zero- and few-shot approaches, often involving generative models and data augmentation. More information can be found on the task’s website: https://raganato.github.io/vwsd/.
Fixing confirmation bias in feature attribution methods via semantic match
Giovanni Cinà, Daniel Fernandez-Llaneza, Nishant Mishra, Tabea E Röber, Sandro Pezzelle, Iacer Calixto, Rob Goedhart, and Ş İlker Birbil
Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model’s internal representations, and confirmation bias can trick users into false beliefs about model behavior. We argue that a structured approach is required to test whether our hypotheses on the model are confirmed by the feature attributions. This is what we call the "semantic match" between human concepts and (sub-symbolic) explanations. Building on the conceptual framework put forward in Cinà et al. [2023], we propose a structured approach to evaluate semantic match in practice. We showcase the procedure in a suite of experiments spanning tabular and image data, and show how the assessment of semantic match can give insight into both desirable (e.g., focusing on an object relevant for prediction) and undesirable model behaviors (e.g., focusing on a spurious correlation). We couple our experimental results with an analysis on the metrics to measure semantic match, and argue that this approach constitutes the first step towards resolving the issue of confirmation bias in XAI.
2022
Multi3Generation: Multitask, Multilingual, Multimodal Language Generation
Anabela Barreiro, José GC Souza, Albert Gatt, Mehul Bhatt, Elena Lloret, Aykut Erdem, Dimitra Gkatzia, Helena Moniz, and 7 more authors
In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, Jun 2022
This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This “meta-paper” will serve as a reference for citations of the Action in future publications. It presents the objectives, challenges, and links to the achieved outcomes.
Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning
Erkut Erdem, Menekse Kuyu, Semih Yagcioglu, Anette Frank, Letitia Parcalabescu, Barbara Plank, Andrii Babii, Oleksii Turuta, and 10 more authors
Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate recent developments and applications of NNLG in their full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability, and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions.
Endowing language models with multimodal knowledge graph representations
Ningyuan Huang, Yash R Deshpande, Yibo Liu, Houda Alberts, Kyunghyun Cho, Clara Vania, and Iacer Calixto
We propose a method to make natural language understanding models more parameter efficient by storing knowledge in an external knowledge graph (KG) and retrieving from this KG using a dense index. Given (possibly multilingual) downstream task data, e.g., sentences in German, we retrieve entities from the KG and use their multimodal representations to improve downstream task performance. We use the recently released VisualSem KG as our external knowledge repository, which covers a subset of Wikipedia and WordNet entities, and compare a mix of tuple-based and graph-based algorithms to learn entity and relation representations that are grounded on the KG multimodal information. We demonstrate the usefulness of the learned entity representations on two downstream tasks, and show improved performance on the multilingual named entity recognition task by 0.3%–0.7% F1, while we achieve up to 2.5% improvement in accuracy on the visual sense disambiguation task. All our code and data are available at: this https URL.
Detecting Euphemisms with Literal Descriptions and Visual Imagery
This paper describes our two-stage system for the Euphemism Detection shared task hosted by the 3rd Workshop on Figurative Language Processing in conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive or unpleasant issues like addiction and death. The ambiguous nature of euphemistic words or expressions makes it challenging to detect their actual meaning within a context. In the first stage, we seek to mitigate this ambiguity by incorporating literal descriptions into input text prompts to our baseline model. This kind of direct supervision yields a remarkable performance improvement. In the second stage, we integrate visual supervision into our system using visual imagery: two sets of images generated by a text-to-image model from terms and descriptions as input. Our experiments demonstrate that visual supervision also gives a statistically significant performance boost. Our system achieved second place with an F1 score of 87.2%, only about 0.9% below the best submission.
Natural language processing for mental disorders: an overview
Iacer Calixto, Viktoriya Yaneva, and Raphael Cardoso
In Natural Language Processing in Healthcare: A Special Focus on Low Resource Languages, Dec 2022
In recent years, there has been a surge in interest in using natural language processing (NLP) applications for clinical psychology and psychiatry. Despite the increased societal, economic, and academic interest, there has been no systematic critical analysis of the recent progress in NLP applications for mental disorders, or of the resources available for training and evaluating such systems. This chapter addresses this gap through two main contributions. First, it provides an overview of the NLP literature related to mental disorders, with a focus on autism, dyslexia, schizophrenia, depression and mental health in general. We discuss the strengths and shortcomings of current methodologies, specifically focusing on the challenges in obtaining large volumes of high-quality domain-specific data both for English and for lower-resource languages. We also provide a list of datasets publicly available for researchers who would like to develop NLP methods for specific mental disorders, categorized according to relevant criteria such as data source, language, annotation, and size. Our second contribution is a discussion on how to support the application of these methods to various languages and social contexts. This includes recommendations on conducting robust and ethical experiments from a machine learning perspective, and a discussion on how techniques such as cross-lingual transfer learning could be applied within this area.
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
Letitia Parcalabescu, Michele Cafagna, Lilitta Muradjan, Anette Frank, Iacer Calixto, and Albert Gatt
In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We build VALSE using methods that support the construction of valid foils, and report results from evaluating five widely-used V&L models. Our experiments suggest that current models have considerable difficulty addressing most phenomena. Hence, we expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L evaluations.