Jonas Pfeiffer

I am a Research Scientist at Google Research working on Natural Language Processing, interested in modular representation learning in multi-task, multilingual, and multi-modal contexts, and in low-resource scenarios.

I am one of the main contributors of the AdapterHub.ml framework, which makes it very easy to add and train new parameters within pre-trained transformer-based language models.

news

Aug 22, 2022	I have started as a Research Scientist at Google Research in Zürich
May 15, 2022	Our paper IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages was accepted at ICML 2022.
Apr 7, 2022	I am happy to announce that my paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers was accepted at NAACL 2022
Feb 21, 2022	I am happy to announce that 4 of my papers were accepted at ACL/TACL 2022: xGQA: Cross-Lingual Visual Question Answering (Findings) AdapterHub Playground: Simple and Flexible Few-Shot Learning with Adapters (Demo) Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval (TACL) UKP-SQUARE: An Online Platform for Question Answering Research (Demo): TBA
Jan 24, 2022	I am excited to announce that I have joined the New York University where I will be working with Kyunghyun Cho during my 6-Month research visit.

selected publications

Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Pfeiffer, Jonas, Goyal, Naman, Lin, Xi Victoria, Li, Xian, Cross, James, Riedel, Sebastian, and Artetxe, Mikel

In NAACL 2022

Abs PDF

Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while maintaining the total number of trainable parameters per language. In contrast to prior work which learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Mod (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts

Pfeiffer, Jonas, Vulić, Ivan, Gurevych, Iryna, and Ruder, Sebastian

In Proceedings of EMNLP 2021

Abs PDF

Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks. However, due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages. The ultimate challenge is dealing with under-resourced languages not covered at all by the models, which are also written in scripts \textitunseen during pretraining. In this work, we propose a series of novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts. Relying on matrix factorization, our proposed methods capitalize on the existing latent knowledge about multiple languages already available in the pretrained model’s embedding matrix. Furthermore, we show that learning of the new dedicated embedding matrix in the target language can be improved by leveraging a small number of vocabulary items (i.e., the so-called \textitlexically overlapping tokens) shared between mBERT’s and target language vocabulary. Our adaptation techniques offer substantial performance gains for languages with unseen scripts. We also demonstrate that they can also yield improvements for low-resource languages written in scripts covered by the pretrained model.
AdapterFusion: Non-Destructive Task Composition for Transfer Learning

Pfeiffer, Jonas, Kamath, Aishwarya, Rücklé, Andreas, Cho, Kyunghyun, and Gurevych, Iryna

In Proceedings of EACL 2021

Abs PDF

Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion, a new two stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage we learn task specific parameters called adapters, that encapsulate the task-specific information. We then combine the adapters in a separate knowledge composition step. We show that by separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner. We empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it effectively combines various types of knowledge at different layers of the model. We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at AdapterHub.ml.
MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

Pfeiffer, Jonas, Vulić, Ivan, Gurevych, Iryna, and Ruder, Sebastian

In Proceedings of EMNLP 2020

Abs PDF

The main goal behind state-of-the-art pre-trained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer. However, due to limited model capacity, their transfer performance is the weakest exactly on such low-resource languages and languages unseen during pre-training. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. In addition, we introduce a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language. MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and causal commonsense reasoning, and achieves competitive results on question answering. Our code and adapters are available at AdapterHub.ml
AdapterHub: A Framework for Adapting Transformers

Pfeiffer, Jonas, Rücklé, Andreas, Poth, Clifton, Kamath, Aishwarya, Vulić, Ivan, Ruder, Sebastian, Cho, Kyunghyun, and Gurevych, Iryna

In Proceedings of EMNLP - System Demonstrations 2020

Abs PDF

The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters – small learnt bottleneck layers inserted within each layer of a pre-trained model – ameliorate this issue by avoiding full fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptations of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages. Downloading, sharing, and training adapters is as seamless as possible using minimal changes to the training scripts and a specialized infrastructure. Our framework enables scalable and easy access to sharing of task-specific models, particularly in low-resource scenarios. AdapterHub includes all recent adapter architectures and can be found at https://AdapterHub.ml.