LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression
| Project Page |
LLMLingua |
LongLLMLingua |
LLMLingua-2 |
LLMLingua Demo |
LLMLingua-2 Demo |
https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-61f94bb87438
News
- 🦚 We're excited to announce the release of LLMLingua-2, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our paper, visit the project page, and explore our demo.
- 👾 LLMLingua has been integrated into LangChain and LlamaIndex, two widely-used RAG frameworks.
- 🤳 Talk slides are available in AI Time Jan, 24.
- 🖥 EMNLP'23 slides are available in Session 5 and BoF-6.
- 📚 Check out our new blog post discussing RAG benefits and cost savings through prompt compression. See the script example here.
- 🎈 Visit our project page for real-world case studies in RAG, Online Meetings, CoT, and Code.
- 👨🦯 Explore our './examples' directory for practical applications, including LLMLingua-2, RAG, Online Meeting, CoT, Code, and RAG using LlamaIndex.
TL;DR
LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.
LLMLingua-2, a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression (Under Review)
Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang
🎥 Overview
- Ever encountered the token limit when asking ChatGPT to summarize lengthy texts?
- Frustrated with ChatGPT forgetting previous instructions after extensive fine-tuning?
- Experienced high costs using GPT3.5/4 API for experiments despite excellent results?
While Large Language Models like ChatGPT and GPT-4 excel in generalization and reasoning, they often face challenges like prompt length limits and prompt-based pricing schemes.
Now you can use LLMLingua, LongLLMLingua, and LLMLingua-2!
These tools offer an efficient solution to compress prompts by up to 20x, enhancing the utility of LLMs.
- 💰 Cost Savings: Reduces both prompt and generation lengths with minimal overhead.
- 📝 Extended Context Support: Enhances support for longer contexts, mitigates the "lost in the middle" issue, and boosts overall performance.
- ⚖️ Robustness: No additional training needed for LLMs.
- 🕵️ Knowledge Retention: Maintains original prompt information like ICL and reasoning.
- 📜 KV-Cache Compression: Accelerates inference process.
- 🪃 Comprehensive Recovery: GPT-4 can recover all key information from compressed prompts.
PS: This demo is based on the alt-gpt project. Special thanks to @Livshitz for their valuable contribution.
If you find this repo helpful, please cite the following papers:
@inproceedings{jiang-etal-2023-llmlingua,
title = "{LLML}ingua: Compressing Prompts for Accelerated Inference of Large Language Models",
author = "Huiqiang Jiang and Qianhui Wu and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.825",
doi = "10.18653/v1/2023.emnlp-main.825",
pages = "13358--13376",
}
@article{jiang-etal-2023-longllmlingua,
title = "{L}ong{LLML}ingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression",
author = "Huiqiang Jiang and Qianhui Wu and and Xufang Luo and Dongsheng Li and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
url = "https://arxiv.org/abs/2310.06839",
journal = "ArXiv preprint",
volume = "abs/2310.06839",
year = "2023",
}
@article{wu2024llmlingua2,
title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
url = "https://arxiv.org/abs/2403.12968",
journal = "ArXiv preprint",
volume = "abs/2403.12968",
year = "2024",
}
🎯 Quick Start
1. Installing LLMLingua:
To get started with LLMLingua, simply install it using pip:
pip install llmlingua
2. Using LLMLingua Series Methods for Prompt Compression:
With LLMLingua, you can easily compress your prompts. Here’s how you can do it:
from llmlingua import PromptCompressor
llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question="", target_token=200)
llm_lingua = PromptCompressor("microsoft/phi-2")
llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
To try LongLLMLingua in your scenorias, you can use
from llmlingua import PromptCompressor
llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(
prompt_list,
question=question,
ratio=0.55,
condition_in_question="after_condition",
reorder_context="sort",
dynamic_context_compression_ratio=0.3,
condition_compare=True,
context_budget="+100",
rank_method="longllmlingua",
)
To try LLMLingua-2 in your scenorias, you can use
from llmlingua import PromptCompressor
llm_lingua = PromptCompressor(
model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
use_llmlingua2=True,
)
compressed_prompt = llm_lingua.compress_prompt(prompt, rate=0.33, force_tokens = ['\n', '?'])
llm_lingua = PromptCompressor(
model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
use_llmlingua2=True,
)
3. Advanced usage - Structured Prompt Compression:
Split text into sections, decide on whether to compress and its rate. Use <llmlingua></llmlingua>
tags for context segmentation, with optional rate and compress parameters.
structured_prompt = """<llmlingua, compress=False>Speaker 4:</llmlingua><llmlingua, rate=0.4> Thank you. And can we do the functions for content? Items I believe are 11, three, 14, 16 and 28, I believe.</llmlingua><llmlingua, compress=False>
Speaker 0:</llmlingua><llmlingua, rate=0.4> Item 11 is a communication from Council on Price recommendation to increase appropriation in the general fund group in the City Manager Department by $200 to provide a contribution to the Friends of the Long Beach Public Library. Item 12 is communication from Councilman Super Now. Recommendation to increase appropriation in the special advertising and promotion fund group and the city manager's department by $10,000 to provide support for the end of summer celebration. Item 13 is a communication from Councilman Austin. Recommendation to increase appropriation in the general fund group in the city manager department by $500 to provide a donation to the Jazz Angels . Item 14 is a communication from Councilman Austin. Recommendation to increase appropriation in the general fund group in the City Manager department by $300 to provide a donation to the Little Lion Foundation. Item 16 is a communication from Councilman Allen recommendation to increase appropriation in the general fund group in the city manager department by $1,020 to provide contribution to Casa Korero, Sew Feria Business Association, Friends of Long Beach Public Library and Dave Van Patten. Item 28 is a communication. Communication from Vice Mayor Richardson and Council Member Muranga. Recommendation to increase appropriation in the general fund group in the City Manager Department by $1,000 to provide a donation to Ron Palmer Summit. Basketball and Academic Camp.</llmlingua><llmlingua, compress=False>
Speaker 4:</llmlingua><llmlingua, rate=0.6> We have a promotion and a second time as councilman served Councilman Ringa and customers and they have any comments.</llmlingua>"""
compressed_prompt = llm_lingua.structured_compress_prompt(structured_prompt, instruction="", question="", rate=0.5)
print(compressed_prompt['compressed_prompt'])
4. Learning More:
To understand how to apply LLMLingua and LongLLMLingua in real-world scenarios like RAG, Online Meetings, CoT, and Code, please refer to our examples. For detailed guidance, the documentation provides extensive recommendations on effectively utilizing LLMLingua.
5. Data collection and model training of LLMLingua-2:
To train the compressor on your custom data, please refer to our data_collection and model_training.
Frequently Asked Questions
For more insights and answers, visit our FAQ section.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct.
For more information see the Code of Conduct FAQ or
contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
Microsoft's Trademark & Brand Guidelines.
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.