Adaptor: Objective-centric Adaptation library

Adaptor helps you easily adapt a language model to your own data domain(s), task(s),
or custom objective(s).
If you want to jump right in, take a look at the tutorials.
Benefits of Task and Domain Adaptation
Both domain adaptation (e.g. Beltagy, 2019)
and task adaptation (e.g. Gururangan, 2020)
are reported to improve the quality of language models on end tasks
and to improve the model's comprehension of more niche domains,
suggesting that it is usually a good idea to adapt a pre-trained model before the final fine-tuning.
However, adaptation is still not a common practice, perhaps because it remains tedious: in model-centric training,
multi-step or multi-objective training requires a separate configuration of every training step,
due to the differences in the model architectures specific to the chosen training objective and dataset.
How does Adaptor handle training?
The Adaptor framework abstracts the Objective away from the model.
With Adaptor, any objective can be applied to any model, as long as the trained model has a head of a compatible shape.
The order in which the Objectives are applied is determined by the given Schedule.
In conventional adaptation, the objectives are applied sequentially (that is what SequentialSchedule does),
but they can just as well be applied in combination (ParallelSchedule), or balanced dynamically,
e.g. according to the objectives' losses.

In the Adaptor framework, instead of providing the Trainer with a model and a dataset, both encoded
to be compatible with a specific training task,
a user constructs a Schedule composed of the initialised Objectives, where each Objective performs its own
dataset sampling and objective-specific feature alignment (compliant with objective.compatible_head).
When training classic transformer models, the selection of objectives is model-agnostic: each objective takes care
of resolving its own compatible head within the given LangModule.
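Switching the adaptation strategy then only means swapping the Schedule. A minimal sketch, assuming
an objectives list and training_arguments constructed as in the examples below:
# apply the objectives one after another
schedule = SequentialSchedule(objectives, training_arguments)
# or backpropagate based on all the objectives in every training step
schedule = ParallelSchedule(objectives, training_arguments)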
How Can Adaptor Help
Adaptor introduces an objective-centric, rather than model-centric, approach to the training process,
which makes it easier to experiment with multi-objective training and to create custom objectives.
Thanks to that, you can do things that are difficult, or impossible, in other NLP frameworks (like HF Transformers, FairSeq or NLTK). For example:
- Domain adaptation or Task adaptation: you do not have to handle the model
between different training scripts, minimising the chance of error and improving reproducibility
- Seamlessly experiment with different schedule strategies, allowing you, e.g., to backpropagate based
on multiple objectives in every training step
- Track the progress of the model concurrently on each relevant objective, making it easier
to recognise the weak points of your model
- Easily perform Multi-task learning, which can improve model robustness
- Although Adaptor aims primarily at training models of the transformer family, the library is designed to
work with any PyTorch model
Built upon the well-established and maintained 🤗 Transformers library, Adaptor will automatically support
future NLP models out-of-the-box. Upgrading Adaptor to a newer version of the Hugging Face Transformers library
should not take longer than a few minutes.
Usage
First, install the library:
pip install adaptor
If you clone the repository, you can also run and modify the provided example scripts:
git clone {this repo}
cd adaptor
python -m pip install -e .
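To quickly verify the installation, you can check that the package is importable:
python -c "import adaptor"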
You can also find and run the full examples below, with all the imports, in
tests/end2end_usecases_test.py.
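Note that the snippets below use a training_arguments object without defining it. A minimal sketch
of its construction, assuming AdaptationArguments and StoppingStrategy from adaptor.utils (the
hyperparameter values here are purely illustrative):
from adaptor.utils import AdaptationArguments, StoppingStrategy

training_arguments = AdaptationArguments(output_dir="adaptation_output",
                                         stopping_strategy=StoppingStrategy.ALL_OBJECTIVES_CONVERGED,
                                         learning_rate=4e-5,
                                         do_train=True,
                                         do_eval=True,
                                         num_train_epochs=3)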
Adapted Named Entity Recognition
Say you have nicely annotated entities in a set of news articles, but eventually you want to use the language model
to detect entities in office documents. You can either train the NER model on the news articles alone, hoping that
it will not lose much accuracy on the other domain, or you can concurrently train on both datasets:
lang_module = LangModule("bert-base-multilingual-cased")
objectives = [MaskedLanguageModeling(lang_module,
batch_size=16,
texts_or_path="tests/mock_data/domain_unsup.txt"),
TokenClassification(lang_module,
batch_size=16,
texts_or_path="tests/mock_data/ner_texts_sup.txt",
labels_or_path="tests/mock_data/ner_texts_sup_labels.txt")]
schedule = ParallelSchedule(objectives, training_arguments)
adapter = Adapter(lang_module, schedule, training_arguments)
adapter.train()
adapter.save_model("entity_detector_model")
ner_model = AutoModelForTokenClassification.from_pretrained("entity_detector_model/TokenClassification")
tokenizer = AutoTokenizer.from_pretrained("entity_detector_model/TokenClassification")
inputs = tokenizer("Is there any Abraham Lincoln here?", return_tensors="pt")
outputs = ner_model(**inputs)
ner_tags = [ner_model.config.id2label[label_id.item()] for label_id in outputs.logits[0].argmax(-1)]
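To inspect which tag was predicted for which token, you can pair the predicted tags with the
decoded input tokens; a small illustrative addition using standard 🤗 Transformers calls:
tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
print(list(zip(tokens, ner_tags)))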
Try this example on real data in tutorials/adapted_named_entity_recognition.ipynb
Adapted Machine Translation
Say you have a lot of clean parallel texts from news articles (such as the ones you can find on OPUS),
but eventually you need to translate a different domain, for example chats with a lot of typos,
or medical texts with a lot of Latin expressions.
lang_module = LangModule("Helsinki-NLP/opus-mt-en-de")
seq2seq_evaluators = [BLEU(decides_convergence=True)]
objectives = [BackTranslation(lang_module,
batch_size=1,
texts_or_path="tests/mock_data/domain_unsup.txt",
back_translator=BackTranslator("Helsinki-NLP/opus-mt-de-en"),
val_evaluators=seq2seq_evaluators),
Sequence2Sequence(lang_module,
batch_size=1,
texts_or_path="tests/mock_data/seq2seq_sources.txt",
labels_or_path="tests/mock_data/seq2seq_targets.txt",
val_evaluators=seq2seq_evaluators,
source_lang_id="en", target_lang_id="cs")]
schedule = ParallelSchedule(objectives, adaptation_arguments)
adapter = Adapter(lang_module, schedule, adaptation_arguments)
adapter.train()
adapter.save_model("translator_model")
translator_model = AutoModelForSeq2SeqLM.from_pretrained("translator_model/Sequence2Sequence")
tokenizer = AutoTokenizer.from_pretrained("translator_model/Sequence2Sequence")
inputs = tokenizer("A piece of text to translate.", return_tensors="pt")
output_ids = translator_model.generate(**inputs)
output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print(output_text)
Try this example on real data in tutorials/unsupervised_machine_translation.ipynb
Single-objective training
It also makes sense to use the comfort of Adaptor's high-level interface in simple use cases
where applying a single objective is enough.
lang_module = LangModule(test_base_models["sequence_classification"])
classification = SequenceClassification(lang_module=lang_module,
                                        texts_or_path="tests/mock_data/supervised_texts.txt",
                                        labels_or_path="tests/mock_data/supervised_texts_sequence_labels.txt",
                                        batch_size=1)
schedule = SequentialSchedule(objectives=[classification], args=training_arguments)
adapter = Adapter(lang_module=lang_module, schedule=schedule, args=training_arguments)
adapter.train()
adapter.save_model("output_model")
classifier = AutoModelForSequenceClassification.from_pretrained("output_model/SequenceClassification")
tokenizer = AutoTokenizer.from_pretrained("output_model/SequenceClassification")
inputs = tokenizer("A piece of text to translate.", return_tensors="pt")
output = classifier(**inputs)
output_label_id = output.logits.argmax(-1)[0].item()
print("Your new model predicted class: %s" % classifier.config.id2label[output_label_id])
Try this example on real data in tutorials/simple_sequence_classification.ipynb
More examples
You can find more examples in tutorials. Your contributions are welcome :) (see CONTRIBUTING.md)
Motivation for objective-centric training
We've seen that transformers can perform outstandingly on relatively complicated tasks, which makes us
think that experimenting with custom objectives can also improve their desperately needed
generalisation abilities (many studies report transformers' inability to generalise on the end task, e.g. on
language inference,
paraphrase detection, or
machine translation).
This way, we're also hoping to enable the easy use of the most accurate deep language models for more
specialised domains of application, where little supervised data is available but
much larger unsupervised sources can be found (a typical domain adaptation case).
Such applications include, for instance, machine translation of non-canonical domains (chats or expert texts) or recognition of personal names in a domain with no labeled names of its own, but the use cases are limitless.
How can you contribute?
- If you want to add a new objective or schedule, see CONTRIBUTING.md.
- If you find an issue, please report it in this repository, and if you'd
also be able to fix it, don't hesitate to contribute and create a PR.
- If you'd just like to share your general impressions or personal experience with others,
we're happy to get into a discussion in the Discussions section.
Citing Adaptor
If you use Adaptor in your research, please cite it as follows.
Text
ŠTEFÁNIK, Michal, Vít NOVOTNÝ, Nikola GROVEROVÁ and Petr SOJKA. Adaptor: Objective-Centric Adaptation Framework for Language Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Demonstrations. ACL, 2022. 7 pp.
BibTeX
@inproceedings{stefanik2022adaptor,
  author = {\v{S}tef\'{a}nik, Michal and Novotn\'{y}, V\'{i}t and Groverov{\'a}, Nikola and Sojka, Petr},
  title = {Adapt$\mathcal{O}$r: Objective-Centric Adaptation Framework for Language Models},
  booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Demonstrations},
  publisher = {ACL},
  year = {2022},
  numpages = {7},
  url = {https://aclanthology.org/2022.acl-demo.26},
}
If you have any other question(s), feel free to create an issue.