Easy-to-use and powerful NLP library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including end-to-end systems for Neural Search, Question Answering, Information Extraction and Sentiment Analysis.



็ฎ€ไฝ“ไธญๆ–‡๐Ÿ€„ | English๐ŸŒŽ


Features | Installation | Quick Start | API Reference | Community

PaddleNLP is an NLP library that is both easy to use and powerful. It aggregates high-quality pretrained models from industry and provides a plug-and-play development experience, covering model libraries for a wide range of NLP scenarios. With practical examples drawn from industry practice, PaddleNLP meets the needs of developers who require flexible customization.

News ๐Ÿ“ข

  • 2024.01.04 PaddleNLP v2.7: The LLM experience is fully upgraded and the LLM toolchain now has a unified entry point: the implementation code for pre-training, fine-tuning, compression, inference and deployment is consolidated in the PaddleNLP/llm directory. The new LLM Toolchain Documentation provides one-stop guidance from getting started with LLMs through to business deployment and launch. The Unified Checkpoint mechanism for saving and resuming training greatly improves the generality of LLM checkpoint storage. The efficient fine-tuning upgrade supports combining efficient fine-tuning with LoRA, and adds support for QLoRA and other algorithms.

  • 2023.08.15 PaddleNLP v2.6: Released the full-process LLM toolchain, covering all aspects of pre-training, fine-tuning, compression, inference and deployment, providing users with end-to-end LLM solutions and a one-stop development experience; built-in 4D parallel distributed Trainer, efficient fine-tuning algorithms LoRA/Prefix Tuning, and self-developed INT8/INT4 quantization algorithms; fully supports mainstream LLMs such as LLaMA 1/2, BLOOM, ChatGLM 1/2, GLM and OPT.

Installation

Prerequisites

  • python >= 3.7
  • paddlepaddle >= 2.6.0

For more information about installing PaddlePaddle, please refer to PaddlePaddle's website.

Python pip Installation

pip install --upgrade paddlenlp

or you can install the latest develop branch code with the following command:

pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
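
A quick sanity check that the installation succeeded (the printed version is whatever release pip resolved):

import paddlenlp
print(paddlenlp.__version__)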

Features

๐Ÿ“ฆ Out-of-Box NLP Toolset
๐Ÿค— Awesome Chinese Model Zoo
๐ŸŽ›๏ธ Industrial End-to-end System
๐Ÿš€ High Performance Distributed Training and Inference

Out-of-Box NLP Toolset

Taskflow provides off-the-shelf, pre-built NLP tasks covering both NLU and NLG techniques, with extremely fast inference that satisfies industrial scenarios.
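
For example, a single Taskflow call runs UIE-based information extraction end to end (the schema and input text below are illustrative; see the Taskflow docs for the full task list):

from paddlenlp import Taskflow

# Define what to extract, then apply the extractor to raw text.
schema = ['时间', '选手', '赛事名称']
ie = Taskflow('information_extraction', schema=schema)
ie('2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌！')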

For more usage please refer to Taskflow Docs.

Awesome Chinese Model Zoo

๐Ÿ€„ Comprehensive Chinese Transformer Models

We provide 45+ network architectures and 500+ pretrained models. These include not only all the SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also most of the high-quality Chinese pretrained models developed by other organizations. Use the AutoModel API to ⚡SUPER FAST⚡ download pretrained models of different architectures. We welcome all developers to contribute your Transformer models to PaddleNLP!

from paddlenlp.transformers import *

ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
bert = AutoModel.from_pretrained('bert-wwm-chinese')
albert = AutoModel.from_pretrained('albert-chinese-tiny')
roberta = AutoModel.from_pretrained('roberta-wwm-ext')
electra = AutoModel.from_pretrained('chinese-electra-small')
gpt = AutoModelForPretraining.from_pretrained('gpt-cpm-large-cn')

If computation is limited, you can use the lightweight ERNIE-Tiny models to accelerate the deployment of pretrained models.

# 6L768H
ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
# 6L384H
ernie = AutoModel.from_pretrained('ernie-3.0-mini-zh')
# 4L384H
ernie = AutoModel.from_pretrained('ernie-3.0-micro-zh')
# 4L312H
ernie = AutoModel.from_pretrained('ernie-3.0-nano-zh')

Unified API experience for NLP tasks like semantic representation, text classification, sentence matching, sequence labeling, question answering, etc.

import paddle
from paddlenlp.transformers import *

tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
text = tokenizer('natural language processing')

# Semantic Representation
model = AutoModel.from_pretrained('ernie-3.0-medium-zh')
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
# Text Classification and Matching
model = AutoModelForSequenceClassification.from_pretrained('ernie-3.0-medium-zh')
# Sequence Labeling
model = AutoModelForTokenClassification.from_pretrained('ernie-3.0-medium-zh')
# Question Answering
model = AutoModelForQuestionAnswering.from_pretrained('ernie-3.0-medium-zh')
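
Whichever head you pick, the calling convention stays the same: pass input_ids and get task logits back. A minimal sketch (the classification head below is randomly initialized until fine-tuned, so the probabilities are only illustrative):

import paddle
import paddle.nn.functional as F
from paddlenlp.transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
cls_model = AutoModelForSequenceClassification.from_pretrained('ernie-3.0-medium-zh', num_classes=2)

inputs = tokenizer('natural language processing')
logits = cls_model(input_ids=paddle.to_tensor([inputs['input_ids']]))  # shape [1, num_classes]
probs = F.softmax(logits, axis=-1)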

Wide-range NLP Task Support

PaddleNLP provides rich examples covering mainstream NLP tasks to help developers accelerate problem solving. You can find our powerful Transformer Model Zoo and a wide range of NLP application examples with detailed instructions.

You can also run our interactive Notebook tutorials on AI Studio, a powerful platform with FREE computing resources.

PaddleNLP Transformer model summary:

| Model | Sequence Classification | Token Classification | Question Answering | Text Generation | Multiple Choice |
|---|---|---|---|---|---|
| ALBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ | ❌ |
| BERT | ✅ | ✅ | ✅ | ❌ | ✅ |
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
| BlenderBot | ❌ | ❌ | ❌ | ✅ | ❌ |
| ChineseBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| ConvBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
| CTRL | ✅ | ❌ | ❌ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| ELECTRA | ✅ | ✅ | ✅ | ❌ | ✅ |
| ERNIE | ✅ | ✅ | ✅ | ❌ | ✅ |
| ERNIE-CTM | ❌ | ✅ | ❌ | ❌ | ❌ |
| ERNIE-Doc | ✅ | ✅ | ✅ | ❌ | ❌ |
| ERNIE-GEN | ❌ | ❌ | ❌ | ✅ | ❌ |
| ERNIE-Gram | ✅ | ✅ | ✅ | ❌ | ❌ |
| ERNIE-M | ✅ | ✅ | ✅ | ❌ | ❌ |
| FNet | ✅ | ✅ | ✅ | ❌ | ✅ |
| Funnel-Transformer | ✅ | ✅ | ✅ | ❌ | ❌ |
| GPT | ✅ | ✅ | ❌ | ✅ | ❌ |
| LayoutLM | ✅ | ✅ | ❌ | ❌ | ❌ |
| LayoutLMv2 | ❌ | ✅ | ❌ | ❌ | ❌ |
| LayoutXLM | ❌ | ✅ | ❌ | ❌ | ❌ |
| LUKE | ❌ | ✅ | ✅ | ❌ | ❌ |
| mBART | ✅ | ❌ | ✅ | ❌ | ✅ |
| MegatronBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
| MobileBERT | ✅ | ❌ | ✅ | ❌ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ❌ | ✅ |
| NEZHA | ✅ | ✅ | ✅ | ❌ | ✅ |
| PP-MiniLM | ✅ | ❌ | ❌ | ❌ | ❌ |
| ProphetNet | ❌ | ❌ | ❌ | ✅ | ❌ |
| Reformer | ✅ | ❌ | ✅ | ❌ | ❌ |
| RemBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
| RoBERTa | ✅ | ✅ | ✅ | ❌ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ❌ | ❌ |
| SKEP | ✅ | ✅ | ❌ | ❌ | ❌ |
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| T5 | ❌ | ❌ | ❌ | ✅ | ❌ |
| TinyBERT | ✅ | ❌ | ❌ | ❌ | ❌ |
| UnifiedTransformer | ❌ | ❌ | ❌ | ✅ | ❌ |
| XLNet | ✅ | ✅ | ✅ | ❌ | ✅ |

For more pretrained model usage, please refer to Transformer API Docs.

Industrial End-to-end System

We provide end-to-end systems for high-value industrial scenarios, including information extraction, semantic retrieval, question answering and sentiment analysis.

For more industrial cases, please refer to Applications.

๐Ÿ” Neural Search System

For more details please refer to Neural Search.
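
The full pipeline lives in the Neural Search application above; as a toy sketch of the underlying idea, you can already embed texts with the AutoModel API shown earlier and rank them by cosine similarity (real deployments use dedicated retrieval models plus an ANN index):

import paddle
import paddle.nn.functional as F
from paddlenlp.transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
model = AutoModel.from_pretrained('ernie-3.0-medium-zh')
model.eval()

def embed(text):
    # Encode a single sentence and return its pooled [1, hidden_size] vector.
    inputs = tokenizer(text)
    _, pooled_output = model(input_ids=paddle.to_tensor([inputs['input_ids']]))
    return pooled_output

query = embed('如何安装PaddleNLP')
doc = embed('PaddleNLP安装教程')
print(F.cosine_similarity(query, doc).item())  # higher means more similar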

โ“ Question Answering System

We provide a question answering pipeline that can support FAQ systems and document-level visual question answering, based on 🚀RocketQA.

For more details please refer to Question Answering and Document VQA.

๐Ÿ’Œ Opinion Extraction and Sentiment Analysis

We build an opinion extraction and fine-grained sentiment analysis system for product reviews based on the SKEP model.

For more details please refer to Sentiment Analysis.
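
A minimal sketch of pointing the Taskflow sentiment task at the SKEP backbone (the model option name below is an assumption; check the Sentiment Analysis docs for the exact values):

from paddlenlp import Taskflow

# Assumed option: select the SKEP pretrained model for the sentiment task.
senta = Taskflow('sentiment_analysis', model='skep_ernie_1.0_large_ch')
senta('作为老字号火锅店，味道很棒，服务也很周到')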

๐ŸŽ™๏ธ Speech Command Analysis

Integrating an ASR model with information extraction, we provide a speech command analysis pipeline that shows how to use PaddleNLP and PaddleSpeech together to solve real Speech + NLP scenarios.

For more details please refer to Speech Command Analysis.

High Performance Distributed Training and Inference

โšก FastTokenizer: High Performance Text Preprocessing Library
AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)

Set use_fast=True to use the C++ tokenizer kernel and achieve 100x faster text pre-processing. For more usage please refer to FastTokenizer.
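
The fast tokenizer is a drop-in replacement, so existing preprocessing code only needs the extra flag (a minimal sketch; the input text is illustrative):

from paddlenlp.transformers import AutoTokenizer

# Same call interface as the default Python tokenizer; only the backend changes.
fast_tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh', use_fast=True)
encoded = fast_tokenizer('高性能文本预处理库')
print(encoded['input_ids'])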

โšก FastGeneration: High Performance Generation Library
from paddlenlp.transformers import GPTLMHeadModel

model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
...  # prepare `inputs_ids` with the matching tokenizer
outputs, _ = model.generate(
    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
    use_fast=True)

Set use_fast=True to achieve a 5x speedup for Transformer, GPT, BART, PLATO and UniLM text generation. For more usage please refer to FastGeneration.

๐Ÿš€ Fleet: 4D Hybrid Distributed Training

For more super large-scale model pre-training details please refer to GPT-3.

Quick Start

Taskflow provides off-the-shelf, pre-built NLP tasks covering both NLU and NLG scenarios, with extremely fast inference that satisfies industrial applications.

from paddlenlp import Taskflow

# Chinese Word Segmentation
seg = Taskflow("word_segmentation")
seg("็ฌฌๅๅ››ๅฑŠๅ…จ่ฟไผšๅœจ่ฅฟๅฎ‰ไธพๅŠž")
>>> ['็ฌฌๅๅ››ๅฑŠ', 'ๅ…จ่ฟไผš', 'ๅœจ', '่ฅฟๅฎ‰', 'ไธพๅŠž']

# POS Tagging
tag = Taskflow("pos_tagging")
tag("็ฌฌๅๅ››ๅฑŠๅ…จ่ฟไผšๅœจ่ฅฟๅฎ‰ไธพๅŠž")
>>> [('็ฌฌๅๅ››ๅฑŠ', 'm'), ('ๅ…จ่ฟไผš', 'nz'), ('ๅœจ', 'p'), ('่ฅฟๅฎ‰', 'LOC'), ('ไธพๅŠž', 'v')]

# Named Entity Recognition
ner = Taskflow("ner")
ner("ใ€Šๅญคๅฅณใ€‹ๆ˜ฏ2010ๅนดไนๅทžๅ‡บ็‰ˆ็คพๅ‡บ็‰ˆ็š„ๅฐ่ฏด๏ผŒไฝœ่€…ๆ˜ฏไฝ™ๅ…ผ็พฝ")
>>> [('ใ€Š', 'w'), ('ๅญคๅฅณ', 'ไฝœๅ“็ฑป_ๅฎžไฝ“'), ('ใ€‹', 'w'), ('ๆ˜ฏ', '่‚ฏๅฎš่ฏ'), ('2010ๅนด', 'ๆ—ถ้—ด็ฑป'), ('ไนๅทžๅ‡บ็‰ˆ็คพ', '็ป„็ป‡ๆœบๆž„็ฑป'), ('ๅ‡บ็‰ˆ', 'ๅœบๆ™ฏไบ‹ไปถ'), ('็š„', 'ๅŠฉ่ฏ'), ('ๅฐ่ฏด', 'ไฝœๅ“็ฑป_ๆฆ‚ๅฟต'), ('๏ผŒ', 'w'), ('ไฝœ่€…', 'ไบบ็‰ฉ็ฑป_ๆฆ‚ๅฟต'), ('ๆ˜ฏ', '่‚ฏๅฎš่ฏ'), ('ไฝ™ๅ…ผ็พฝ', 'ไบบ็‰ฉ็ฑป_ๅฎžไฝ“')]

# Dependency Parsing
ddp = Taskflow("dependency_parsing")
ddp("9ๆœˆ9ๆ—ฅไธŠๅˆ็บณ่พพๅฐ”ๅœจไบš็‘Ÿยท้˜ฟไป€็ƒๅœบๅ‡ป่ดฅไฟ„็ฝ—ๆ–ฏ็ƒๅ‘˜ๆข…ๅพท้Ÿฆๆฐๅคซ")
>>> [{'word': ['9ๆœˆ9ๆ—ฅ', 'ไธŠๅˆ', '็บณ่พพๅฐ”', 'ๅœจ', 'ไบš็‘Ÿยท้˜ฟไป€็ƒๅœบ', 'ๅ‡ป่ดฅ', 'ไฟ„็ฝ—ๆ–ฏ', '็ƒๅ‘˜', 'ๆข…ๅพท้Ÿฆๆฐๅคซ'], 'head': [2, 6, 6, 5, 6, 0, 8, 9, 6], 'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]

# Sentiment Analysis
senta = Taskflow("sentiment_analysis")
senta("่ฟ™ไธชไบงๅ“็”จ่ตทๆฅ็œŸ็š„ๅพˆๆต็•…๏ผŒๆˆ‘้žๅธธๅ–œๆฌข")
>>> [{'text': '่ฟ™ไธชไบงๅ“็”จ่ตทๆฅ็œŸ็š„ๅพˆๆต็•…๏ผŒๆˆ‘้žๅธธๅ–œๆฌข', 'label': 'positive', 'score': 0.9938690066337585}]

API Reference

  • Support loading LUGE datasets, compatible with Hugging Face Datasets. For more details please refer to Dataset API.
  • Load 500+ selected Transformer models with a Hugging Face-style API and fast downloads. For more information please refer to Transformers API.
  • Load pre-trained word embeddings with one line of code (see the sketch below). For more usage please refer to Embedding API.

Please find the full PaddleNLP API reference in our readthedocs documentation.
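
A combined sketch of the Dataset and Embedding APIs above (the dataset and embedding names are illustrative picks from the built-in lists):

from paddlenlp.datasets import load_dataset
from paddlenlp.embeddings import TokenEmbedding

# Dataset API: fetch built-in dataset splits in one call.
train_ds, dev_ds = load_dataset('lcqmc', splits=('train', 'dev'))
print(train_ds[0])

# Embedding API: one line to load a pre-trained Chinese word embedding,
# then query vectors and similarities.
token_embedding = TokenEmbedding(embedding_name='w2v.baidu_encyclopedia.target.word-word.dim300')
print(token_embedding.search('中国').shape)
print(token_embedding.cosine_sim('中国', '中华'))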

Community

Slack

To connect with other users and contributors, you are welcome to join our Slack channel.

WeChat

Scan the QR code below with WeChat⬇️ to join the official technical exchange group. We look forward to your participation.

Citation

If you find PaddleNLP useful in your research, please consider citing:

@misc{paddlenlp,
    title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
    author={PaddleNLP Contributors},
    howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
    year={2021}
}

Acknowledgements

We have borrowed the excellent design of Hugging Face's Transformers🤗 for pretrained model usage, and we would like to express our gratitude to the authors of Hugging Face and its open source community.

License

PaddleNLP is provided under the Apache-2.0 License.
