bert4torch


Documentation | Torch4keras | Examples | build_MiniLLM_from_scratch | bert4vector


1. Download and Installation

Install the stable version

pip install bert4torch

Install the latest version

pip install git+https://github.com/Tongjilibo/bert4torch
  • Note: PyPI releases lag behind the development version on git; when using git clone, mind the import path and check whether the weights need conversion
  • Test cases: git clone https://github.com/Tongjilibo/bert4torch, then edit the pretrained-model and data paths in the examples to run the scripts
  • Training on your own data: modify the corresponding data-processing code blocks
  • Development environment: originally developed on torch==1.10, now developed on torch 2.0; feedback is welcome if other versions are incompatible
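A quick sanity check after installing (a minimal sketch; it assumes the package exposes `__version__`, which is not guaranteed for every release):

    # Verify the installation and print the installed version.
    import bert4torch
    print(bert4torch.__version__)  # e.g. 0.6.0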

2. Features

  • LLM models: load open-source LLM weights such as chatglm, llama, baichuan, ziya, and bloom for inference and finetuning; deploy an LLM with a single command line

  • Core features: load pretrained weights for bert, roberta, albert, xlnet, nezha, bart, RoFormer, RoFormer_V2, ELECTRA, GPT, GPT2, T5, GAU-alpha, ERNIE, etc. and continue finetuning, with flexible definition of your own models on top of bert

  • Rich examples: solutions covering llm, pretrain, sentence_classfication, sentence_embedding, sequence_labeling, relation_extraction, seq2seq, serving, and more

  • Experimental validation: validated on public datasets; see the examples, the datasets, and the experiment metrics

  • Handy tricks: common tricks are integrated and plug-and-play

  • Other features: load models from the transformers library and use them together; concise and efficient calling conventions; dynamic training progress bar; print parameter counts with torchinfo; built-in Logger and Tensorboard for simple training logging; customizable fit process for advanced needs (a minimal training sketch follows the comparison table below)

  • Training process

    [screenshot: training progress display]

| Feature | bert4torch | transformers | Notes |
| --- | --- | --- | --- |
| Training progress bar | ✅ | ✅ | Progress bar prints loss and user-defined metrics |
| Distributed training dp/ddp | ✅ | ✅ | Uses torch's built-in dp/ddp |
| Various callbacks | ✅ | ✅ | Logging / tensorboard / earlystop / wandb, etc. |
| LLM inference with stream/batch output | ✅ | ✅ | Shared across models, no per-model scripts to maintain |
| LLM finetuning | ✅ | ✅ | lora depends on the peft library; pv2 is built in |
| Rich tricks | ✅ | ❌ | Adversarial training and other plug-and-play tricks |
| Concise, readable code with room for customization | ✅ | ❌ | High code reuse; keras-style training code |
| Repo maintenance / influence / usage / compatibility | ❌ | ✅ | This repo is currently maintained by one person |
| One-command LLM deployment | ✅ | ❌ | |
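As referenced above, training follows a keras-style compile/fit flow. Below is a minimal sketch of a binary classifier on top of bert, modeled on the repo's examples; `train_dataloader` is a placeholder you must supply, and exact signatures may differ across versions:

    import torch.nn as nn
    import torch.optim as optim
    from bert4torch.models import build_transformer_model, BaseModel

    class Model(BaseModel):
        def __init__(self):
            super().__init__()
            # with_pool=True also returns the pooled [CLS] output
            self.bert = build_transformer_model(checkpoint_path='google-bert/bert-base-chinese', with_pool=True)
            self.fc = nn.Linear(768, 2)  # binary classification head

        def forward(self, token_ids, segment_ids):
            hidden, pooled = self.bert([token_ids, segment_ids])
            return self.fc(pooled)

    model = Model()
    # keras-style training via torch4keras
    model.compile(loss=nn.CrossEntropyLoss(), optimizer=optim.Adam(model.parameters(), lr=2e-5))
    # train_dataloader should yield ([token_ids, segment_ids], labels) batches:
    # model.fit(train_dataloader, epochs=3)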

3. Quick Start

3.1 Tutorials

3.2 Deploy an LLM service from the command line

  • Local / online loading
    # Download all files from the hub
    bert4torch serve --checkpoint_path Qwen2-0.5B-Instruct
    
    # Load a local LLM and download bert4torch_config.json from the hub
    bert4torch serve --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --config_path Qwen/Qwen2-0.5B-Instruct
    
    # Load a local LLM with bert4torch_config.json already downloaded into the same directory
    bert4torch serve --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct
    
  • Command line / gradio web UI / openai_api (a client sketch for the openai mode follows this list)
    # Command line
    bert4torch serve --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode cli
    
    # gradio web UI
    bert4torch serve --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode gradio
    
    # openai_api
    bert4torch serve --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode openai
    
  • Command-line chat example: [screenshot: command-line chat]
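With `--mode openai` the service exposes an OpenAI-compatible API. A minimal client sketch using the separate `openai` package; the port, base_url, and model name below are assumptions, so check the serve output for the actual endpoint:

    # Query the OpenAI-compatible endpoint started by `bert4torch serve ... --mode openai`.
    # base_url, api_key, and model are assumptions; adjust to what the server reports.
    from openai import OpenAI

    client = OpenAI(base_url='http://localhost:8000/v1', api_key='EMPTY')
    resp = client.chat.completions.create(
        model='Qwen2-0.5B-Instruct',
        messages=[{'role': 'user', 'content': '你好'}],
    )
    print(resp.choices[0].message.content)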

4. Versions and Update History

4.1 Version History

| Update date | bert4torch | torch4keras | Release notes |
| --- | --- | --- | --- |
| 20250925 | 0.6.0 | 0.3.2 | Add Qwen3-moe; support mainstream quantization methods such as gptq and awq; other code optimizations |
| 20250721 | 0.5.9.post2 | 0.3.1 | Add Ernie4_5; fix hub download bug; split out openai_client |
| 20250621 | 0.5.8 | 0.3.0 | Add Qwen3-Embedding and Qwen3-Reranker; support setting temperature to 0; fix bugs in sdpa and global_point; split out attention_utils |
| 20250511 | 0.5.7 | 0.2.9.post2 | Change the command-line entry to bert4torch serve; add Qwen3 |
| 20250401 | 0.5.6 | 0.2.9 | Command line supports image input; fix rope bugs in batch inference and very long sequences |
| 20250215 | 0.5.5 | 0.2.8 | Add deepseek-r1, internvl, internlm3, glm4v, modernbert, mllama, qwen2vl, qwenvl |
| 20240928 | 0.5.4 | 0.2.7 | [New] Add deepseek, MiniCPM, MiniCPMV, llama3.2, Qwen2.5; support device_map=auto; [Fix] fix bugs in batch_generate and n>1 |
| 20240814 | 0.5.3 | 0.2.6 | [New] Add llama3.1/Yi1.5; automatically choose to download from hfmirror; support the bert4torch serve command-line entry |

More versions

4.2 Update History

More history

5. Pretrained Weights

  • Pretrained models can be loaded in several ways

    from bert4torch.models import build_transformer_model
    
    # 1. Specify config_path only: initialize the model structure from scratch, without loading pretrained weights
    model = build_transformer_model('./model/bert4torch_config.json')
    
    # 2. Specify checkpoint_path only:
    ## 2.1 Directory path: automatically finds *.bin/*.safetensors weight files in the directory; bert4torch_config.json must be downloaded into that directory
    model = build_transformer_model(checkpoint_path='./model')
    
    ## 2.2 File path/list: the path(s) point directly to the weight file(s); bert4torch_config.json is looked up in the same directory
    model = build_transformer_model(checkpoint_path='./pytorch_model.bin')
    
    ## 2.3 model_name: a pretrained weight name on hf; the hf weights and the bert4torch_config.json file are downloaded automatically
    model = build_transformer_model(checkpoint_path='google-bert/bert-base-chinese')
    
    # 3. Specify both config_path and checkpoint_path (any combination of local paths and model_name):
    #    local paths load locally; a pretrained_model_name downloads from the hub
    config_path = './model/bert4torch_config.json'  # or 'google-bert/bert-base-chinese'
    checkpoint_path = './model/pytorch_model.bin'  # or 'google-bert/bert-base-chinese'
    model = build_transformer_model(config_path, checkpoint_path)
    
  • Pretrained weight links and bert4torch_config.json

| Model category | Model name | Weight source | Weight link / checkpoint_path | config_path |
| --- | --- | --- | --- | --- |
| bert | bert-base-chinese | google-bert | google-bert/bert-base-chinese | google-bert/bert-base-chinese |
| | chinese_L-12_H-768_A-12 | Google | tf weights<br>Tongjilibo/bert-chinese_L-12_H-768_A-12 | |
| | chinese-bert-wwm-ext | HFL | hfl/chinese-bert-wwm-ext | hfl/chinese-bert-wwm-ext |
| | bert-base-multilingual-cased | google-bert | google-bert/bert-base-multilingual-cased | google-bert/bert-base-multilingual-cased |
| | bert-base-cased | google-bert | google-bert/bert-base-cased | google-bert/bert-base-cased |
| | bert-base-uncased | google-bert | google-bert/bert-base-uncased | google-bert/bert-base-uncased |
| | MacBERT | HFL | hfl/chinese-macbert-base<br>hfl/chinese-macbert-large | hfl/chinese-macbert-base<br>hfl/chinese-macbert-large |
| | WoBERT | Zhuiyi Technology | junnyu/wobert_chinese_base<br>junnyu/wobert_chinese_plus_base | junnyu/wobert_chinese_base<br>junnyu/wobert_chinese_plus_base |
| roberta | chinese-roberta-wwm-ext | HFL | hfl/chinese-roberta-wwm-ext<br>hfl/chinese-roberta-wwm-ext-large<br>(the large model's mlm weights are randomly initialized) | hfl/chinese-roberta-wwm-ext<br>hfl/chinese-roberta-wwm-ext-large |
| | roberta-small/tiny | Zhuiyi Technology | Tongjilibo/chinese_roberta_L-4_H-312_A-12<br>Tongjilibo/chinese_roberta_L-6_H-384_A-12 | |
| | roberta-base | FacebookAI | FacebookAI/roberta-base | FacebookAI/roberta-base |
| | guwenbert | ethanyt | ethanyt/guwenbert-base | ethanyt/guwenbert-base |
| albert | albert_zh<br>albert_pytorch | brightmart | voidful/albert_chinese_tiny<br>voidful/albert_chinese_small<br>voidful/albert_chinese_base<br>voidful/albert_chinese_large<br>voidful/albert_chinese_xlarge<br>voidful/albert_chinese_xxlarge | voidful/albert_chinese_tiny<br>voidful/albert_chinese_small<br>voidful/albert_chinese_base<br>voidful/albert_chinese_large<br>voidful/albert_chinese_xlarge<br>voidful/albert_chinese_xxlarge |
| nezha | NEZHA<br>NeZha_Chinese_PyTorch | huawei_noah | sijunhe/nezha-cn-base<br>sijunhe/nezha-cn-large<br>sijunhe/nezha-base-wwm<br>sijunhe/nezha-large-wwm | sijunhe/nezha-cn-base<br>sijunhe/nezha-cn-large<br>sijunhe/nezha-base-wwm<br>sijunhe/nezha-large-wwm |
| | nezha_gpt_dialog | bojone | Tongjilibo/nezha_gpt_dialog | |
| xlnet | Chinese-XLNet | HFL | hfl/chinese-xlnet-base | hfl/chinese-xlnet-base |
| transformer_xl | | huggingface | transfo-xl/transfo-xl-wt103 | transfo-xl/transfo-xl-wt103 |
| deberta | Erlangshen-DeBERTa-v2 | IDEA | IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese | IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese |
| electra | Chinese-ELECTRA | HFL | hfl/chinese-electra-base-discriminator | hfl/chinese-electra-base-discriminator |
| ernie | ernie | Baidu Wenxin | nghuyong/ernie-1.0-base-zh<br>nghuyong/ernie-3.0-base-zh | nghuyong/ernie-1.0-base-zh<br>nghuyong/ernie-3.0-base-zh |
| roformer | roformer | Zhuiyi Technology | junnyu/roformer_chinese_base | junnyu/roformer_chinese_base |
| | roformer_v2 | Zhuiyi Technology | junnyu/roformer_v2_chinese_char_base | junnyu/roformer_v2_chinese_char_base |
| simbert | simbert | Zhuiyi Technology | Tongjilibo/simbert-chinese-base<br>Tongjilibo/simbert-chinese-small<br>Tongjilibo/simbert-chinese-tiny | |
| | simbert_v2/roformer-sim | Zhuiyi Technology | junnyu/roformer_chinese_sim_char_base<br>junnyu/roformer_chinese_sim_char_ft_base<br>junnyu/roformer_chinese_sim_char_small<br>junnyu/roformer_chinese_sim_char_ft_small | junnyu/roformer_chinese_sim_char_base<br>junnyu/roformer_chinese_sim_char_ft_base<br>junnyu/roformer_chinese_sim_char_small<br>junnyu/roformer_chinese_sim_char_ft_small |
| gau | GAU-alpha | Zhuiyi Technology | Tongjilibo/chinese_GAU-alpha-char_L-24_H-768 | |
| ModernBERT | ModernBERT | answerdotai | answerdotai/ModernBERT-base<br>answerdotai/ModernBERT-large | answerdotai/ModernBERT-base<br>answerdotai/ModernBERT-large |
| uie | uie<br>uie_pytorch | Baidu | Tongjilibo/uie-base | |
| gpt | CDial-GPT | thu-coai | thu-coai/CDial-GPT_LCCC-base<br>thu-coai/CDial-GPT_LCCC-large | thu-coai/CDial-GPT_LCCC-base<br>thu-coai/CDial-GPT_LCCC-large |
| | cmp_lm (2.6B) | Tsinghua | TsinghuaAI/CPM-Generate | TsinghuaAI/CPM-Generate |
| | nezha_gen | huawei_noah | Tongjilibo/chinese_nezha_gpt_L-12_H-768_A-12 | |
| | gpt2-chinese-cluecorpussmall | UER | uer/gpt2-chinese-cluecorpussmall | uer/gpt2-chinese-cluecorpussmall |
| | gpt2-ml | imcaspar | Tongjilibo/gpt2-ml_15g_corpus<br>Tongjilibo/gpt2-ml_30g_corpus<br>torch, BaiduYun(84dh) | |
| bart | bart_base_chinese | Fudan fnlp | fnlp/bart-base-chinese<br>v1.0 | fnlp/bart-base-chinese<br>fnlp/bart-base-chinese-v1.0 |
| t5 | t5 | UER | uer/t5-small-chinese-cluecorpussmall<br>uer/t5-base-chinese-cluecorpussmall | uer/t5-base-chinese-cluecorpussmall<br>uer/t5-small-chinese-cluecorpussmall |
| | mt5 | Google | google/mt5-base | google/mt5-base |
| | t5_pegasus | Zhuiyi Technology | Tongjilibo/chinese_t5_pegasus_small<br>Tongjilibo/chinese_t5_pegasus_base | |
| | chatyuan | clue-ai | ClueAI/ChatYuan-large-v1<br>ClueAI/ChatYuan-large-v2 | ClueAI/ChatYuan-large-v1<br>ClueAI/ChatYuan-large-v2 |
| | PromptCLUE | clue-ai | ClueAI/PromptCLUE-base | ClueAI/PromptCLUE-base |
| chatglm | ChatGLM-6B | THUDM | THUDM/chatglm-6b<br>THUDM/chatglm-6b-int8<br>THUDM/chatglm-6b-int4<br>v0.1.0 | THUDM/chatglm-6b<br>THUDM/chatglm-6b-int8<br>THUDM/chatglm-6b-int4<br>THUDM/chatglm-6b-v0.1.0 |
| | ChatGLM2-6B | THUDM | THUDM/chatglm2-6b<br>THUDM/chatglm2-6b-int4<br>THUDM/chatglm2-6b-32k | THUDM/chatglm2-6b<br>THUDM/chatglm2-6b-int4<br>THUDM/chatglm2-6b-32k |
| | ChatGLM3 | THUDM | THUDM/chatglm3-6b<br>THUDM/chatglm3-6b-32k | THUDM/chatglm3-6b<br>THUDM/chatglm3-6b-32k |
| | GLM-4 | THUDM | THUDM/glm-4-9b<br>THUDM/glm-4-9b-chat<br>THUDM/glm-4-9b-chat-1m<br>THUDM/glm-4v-9b<br>THUDM/GLM-4-9B-0414<br>THUDM/GLM-Z1-9B-0414 | THUDM/glm-4-9b<br>THUDM/glm-4-9b-chat<br>THUDM/glm-4-9b-chat-1m<br>THUDM/glm-4v-9b |
| llama | llama | meta | meta-llama/llama-7b<br>meta-llama/llama-13b | |
| | llama-2 | meta | meta-llama/Llama-2-7b-hf<br>meta-llama/Llama-2-7b-chat-hf<br>meta-llama/Llama-2-13b-hf<br>meta-llama/Llama-2-13b-chat-hf | meta-llama/Llama-2-7b-hf<br>meta-llama/Llama-2-7b-chat-hf<br>meta-llama/Llama-2-13b-hf<br>meta-llama/Llama-2-13b-chat-hf |
| | llama-3 | meta | meta-llama/Meta-Llama-3-8B<br>meta-llama/Meta-Llama-3-8B-Instruct | meta-llama/Meta-Llama-3-8B<br>meta-llama/Meta-Llama-3-8B-Instruct |
| | llama-3.1 | meta | meta-llama/Meta-Llama-3.1-8B<br>meta-llama/Meta-Llama-3.1-8B-Instruct | meta-llama/Meta-Llama-3.1-8B<br>meta-llama/Meta-Llama-3.1-8B-Instruct |
| | llama-3.2 | meta | meta-llama/Llama-3.2-1B<br>meta-llama/Llama-3.2-1B-Instruct<br>meta-llama/Llama-3.2-3B<br>meta-llama/Llama-3.2-3B-Instruct | meta-llama/Llama-3.2-1B<br>meta-llama/Llama-3.2-1B-Instruct<br>meta-llama/Llama-3.2-3B<br>meta-llama/Llama-3.2-3B-Instruct |
| | llama-3.2-vision | meta | meta-llama/Llama-3.2-11B-Vision<br>meta-llama/Llama-3.2-11B-Vision-Instruct | meta-llama/Llama-3.2-11B-Vision<br>meta-llama/Llama-3.2-11B-Vision-Instruct |
| llama-series | Chinese-LLaMA-Alpaca | HFL | hfl/chinese-alpaca-plus-lora-7b<br>hfl/chinese-llama-plus-lora-7b<br>(merge the lora weights before use) | hfl/chinese-alpaca-plus-7b<br>hfl/chinese-llama-plus-7b |
| | Chinese-LLaMA-Alpaca-2 | HFL | to be added | |
| | Chinese-LLaMA-Alpaca-3 | HFL | to be added | |
| | Belle_llama | LianjiaTech | BelleGroup/BELLE-LLaMA-7B-2M-enc (synthesis instructions) | BelleGroup/BELLE-LLaMA-7B-2M-enc |
| | Ziya | IDEA-CCNL | IDEA-CCNL/Ziya-LLaMA-13B-v1<br>IDEA-CCNL/Ziya-LLaMA-13B-v1.1<br>IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1 | IDEA-CCNL/Ziya-LLaMA-13B-v1<br>IDEA-CCNL/Ziya-LLaMA-13B-v1.1 |
| | vicuna | lmsys | lmsys/vicuna-7b-v1.5 | lmsys/vicuna-7b-v1.5 |
| Baichuan | Baichuan | baichuan-inc | baichuan-inc/Baichuan-7B<br>baichuan-inc/Baichuan-13B-Base<br>baichuan-inc/Baichuan-13B-Chat | baichuan-inc/Baichuan-7B<br>baichuan-inc/Baichuan-13B-Base<br>baichuan-inc/Baichuan-13B-Chat |
| | Baichuan2 | baichuan-inc | baichuan-inc/Baichuan2-7B-Base<br>baichuan-inc/Baichuan2-7B-Chat<br>baichuan-inc/Baichuan2-13B-Base<br>baichuan-inc/Baichuan2-13B-Chat | baichuan-inc/Baichuan2-7B-Base<br>baichuan-inc/Baichuan2-7B-Chat<br>baichuan-inc/Baichuan2-13B-Base<br>baichuan-inc/Baichuan2-13B-Chat |
| Yi | Yi | 01-ai | 01-ai/Yi-6B<br>01-ai/Yi-6B-200K<br>01-ai/Yi-9B<br>01-ai/Yi-9B-200K | 01-ai/Yi-6B<br>01-ai/Yi-6B-200K<br>01-ai/Yi-9B<br>01-ai/Yi-9B-200K |
| | Yi-1.5 | 01-ai | 01-ai/Yi-1.5-6B<br>01-ai/Yi-1.5-6B-Chat<br>01-ai/Yi-1.5-9B<br>01-ai/Yi-1.5-9B-32K<br>01-ai/Yi-1.5-9B-Chat<br>01-ai/Yi-1.5-9B-Chat-16K | 01-ai/Yi-1.5-6B<br>01-ai/Yi-1.5-6B-Chat<br>01-ai/Yi-1.5-9B<br>01-ai/Yi-1.5-9B-32K<br>01-ai/Yi-1.5-9B-Chat<br>01-ai/Yi-1.5-9B-Chat-16K |
| bloom | bloom | bigscience | bigscience/bloom-560m<br>bigscience/bloomz-560m | bigscience/bloom-560m<br>bigscience/bloomz-560m |
| Qwen | Qwen | Alibaba Cloud | Qwen/Qwen-1_8B<br>Qwen/Qwen-1_8B-Chat<br>Qwen/Qwen-7B<br>Qwen/Qwen-7B-Chat<br>Qwen/Qwen-14B<br>Qwen/Qwen-14B-Chat | Qwen/Qwen-1_8B<br>Qwen/Qwen-1_8B-Chat<br>Qwen/Qwen-7B<br>Qwen/Qwen-7B-Chat<br>Qwen/Qwen-14B<br>Qwen/Qwen-14B-Chat |
| | Qwen1.5 | Alibaba Cloud | Qwen/Qwen1.5-0.5B<br>Qwen/Qwen1.5-0.5B-Chat<br>Qwen/Qwen1.5-1.8B<br>Qwen/Qwen1.5-1.8B-Chat<br>Qwen/Qwen1.5-7B<br>Qwen/Qwen1.5-7B-Chat<br>Qwen/Qwen1.5-14B<br>Qwen/Qwen1.5-14B-Chat | Qwen/Qwen1.5-0.5B<br>Qwen/Qwen1.5-0.5B-Chat<br>Qwen/Qwen1.5-1.8B<br>Qwen/Qwen1.5-1.8B-Chat<br>Qwen/Qwen1.5-7B<br>Qwen/Qwen1.5-7B-Chat<br>Qwen/Qwen1.5-14B<br>Qwen/Qwen1.5-14B-Chat |
| | Qwen2 | Alibaba Cloud | Qwen/Qwen2-0.5B<br>Qwen/Qwen2-0.5B-Instruct<br>Qwen/Qwen2-1.5B<br>Qwen/Qwen2-1.5B-Instruct<br>Qwen/Qwen2-7B<br>Qwen/Qwen2-7B-Instruct | Qwen/Qwen2-0.5B<br>Qwen/Qwen2-0.5B-Instruct<br>Qwen/Qwen2-1.5B<br>Qwen/Qwen2-1.5B-Instruct<br>Qwen/Qwen2-7B<br>Qwen/Qwen2-7B-Instruct |
| | Qwen2-VL | Alibaba Cloud | Qwen/Qwen2-VL-2B-Instruct<br>Qwen/Qwen2-VL-7B-Instruct | Qwen/Qwen2-VL-2B-Instruct<br>Qwen/Qwen2-VL-7B-Instruct |
| | Qwen2.5 | Alibaba Cloud | Qwen/Qwen2.5-0.5B<br>Qwen/Qwen2.5-0.5B-Instruct<br>Qwen/Qwen2.5-1.5B<br>Qwen/Qwen2.5-1.5B-Instruct<br>Qwen/Qwen2.5-3B<br>Qwen/Qwen2.5-3B-Instruct<br>Qwen/Qwen2.5-7B<br>Qwen/Qwen2.5-7B-Instruct<br>Qwen/Qwen2.5-14B<br>Qwen/Qwen2.5-14B-Instruct | Qwen/Qwen2.5-0.5B<br>Qwen/Qwen2.5-0.5B-Instruct<br>Qwen/Qwen2.5-1.5B<br>Qwen/Qwen2.5-1.5B-Instruct<br>Qwen/Qwen2.5-3B<br>Qwen/Qwen2.5-3B-Instruct<br>Qwen/Qwen2.5-7B<br>Qwen/Qwen2.5-7B-Instruct<br>Qwen/Qwen2.5-14B<br>Qwen/Qwen2.5-14B-Instruct |
| | Qwen2.5-VL | Alibaba Cloud | Qwen/Qwen2.5-VL-3B-Instruct<br>Qwen/Qwen2.5-VL-7B-Instruct | Qwen/Qwen2.5-VL-3B-Instruct<br>Qwen/Qwen2.5-VL-7B-Instruct |
| | Qwen3 | Alibaba Cloud | Qwen/Qwen3-0.6B-Base<br>Qwen/Qwen3-0.6B<br>Qwen/Qwen3-0.6B-GPTQ-Int8<br>Qwen/Qwen3-1.7B-Base<br>Qwen/Qwen3-1.7B<br>Qwen/Qwen3-4B-Base<br>Qwen/Qwen3-4B<br>Qwen/Qwen3-4B-AWQ<br>Qwen/Qwen3-8B-Base<br>Qwen/Qwen3-8B<br>Qwen/Qwen3-14B-Base<br>Qwen/Qwen3-14B<br>Qwen/Qwen3-32B<br>Qwen/Qwen3-4B-Instruct-2507<br>Qwen/Qwen3-4B-Thinking-2507<br>Qwen/Qwen3-30B-A3B-Instruct-2507<br>Qwen/Qwen3-30B-A3B-Thinking-2507 | Qwen/Qwen3-0.6B-Base<br>Qwen/Qwen3-0.6B<br>Qwen/Qwen3-0.6B-GPTQ-Int8<br>Qwen/Qwen3-1.7B-Base<br>Qwen/Qwen3-1.7B<br>Qwen/Qwen3-4B-Base<br>Qwen/Qwen3-4B<br>Qwen/Qwen3-4B-AWQ<br>Qwen/Qwen3-8B-Base<br>Qwen/Qwen3-8B<br>Qwen/Qwen3-14B-Base<br>Qwen/Qwen3-14B<br>Qwen/Qwen3-32B<br>Qwen/Qwen3-4B-Instruct-2507<br>Qwen/Qwen3-4B-Thinking-2507<br>Qwen/Qwen3-30B-A3B-Instruct-2507<br>Qwen/Qwen3-30B-A3B-Thinking-2507 |
| | Qwen3-Embedding | Alibaba Cloud | Qwen/Qwen3-Embedding-0.6B<br>Qwen/Qwen3-Embedding-4B<br>Qwen/Qwen3-Embedding-8B | Qwen3-Embedding-0.6B<br>Qwen3-Embedding-4B<br>Qwen3-Embedding-8B |
| | Qwen3-Reranker | Alibaba Cloud | Qwen/Qwen3-Reranker-0.6B<br>Qwen/Qwen3-Reranker-4B<br>Qwen/Qwen3-Reranker-8B | Qwen/Qwen3-Reranker-0.6B<br>Qwen/Qwen3-Reranker-4B<br>Qwen/Qwen3-Reranker-8B |
| InternLM | InternLM | Shanghai AI Laboratory | internlm/internlm-7b<br>internlm/internlm-chat-7b | internlm/internlm-7b<br>internlm/internlm-chat-7b |
| | InternLM2 | Shanghai AI Laboratory | internlm/internlm2-1_8b<br>internlm/internlm2-chat-1_8b<br>internlm/internlm2-7b<br>internlm/internlm2-chat-7b<br>internlm/internlm2-20b<br>internlm/internlm2-chat-20b | internlm/internlm2-1_8b<br>internlm/internlm2-chat-1_8b<br>internlm/internlm2-7b<br>internlm/internlm2-chat-7b |
| | InternLM2.5 | Shanghai AI Laboratory | internlm/internlm2_5-7b<br>internlm/internlm2_5-7b-chat<br>internlm/internlm2_5-7b-chat-1m | internlm/internlm2_5-7b<br>internlm/internlm2_5-7b-chat<br>internlm/internlm2_5-7b-chat-1m |
| | InternLM3 | Shanghai AI Laboratory | internlm/internlm3-8b-instruct | internlm/internlm3-8b-instruct |
| InternVL | InternVL 1.0-1.5 | Shanghai AI Laboratory | OpenGVLab/Mini-InternVL-Chat-4B-V1-5<br>OpenGVLab/Mini-InternVL-Chat-2B-V1-5 | to be added |
| | InternVL 2.0 | Shanghai AI Laboratory | OpenGVLab/InternVL2-1B<br>OpenGVLab/InternVL2-2B<br>OpenGVLab/InternVL2-4B<br>OpenGVLab/InternVL2-8B | to be added |
| | InternVL 2.5 | Shanghai AI Laboratory | OpenGVLab/InternVL2_5-1B<br>OpenGVLab/InternVL2_5-2B<br>OpenGVLab/InternVL2_5-4B<br>OpenGVLab/InternVL2_5-8B | OpenGVLab/InternVL2_5-1B<br>to be added<br>to be added<br>to be added |
| Falcon | Falcon | tiiuae | tiiuae/falcon-rw-1b<br>tiiuae/falcon-7b<br>tiiuae/falcon-7b-instruct | tiiuae/falcon-rw-1b<br>tiiuae/falcon-7b<br>tiiuae/falcon-7b-instruct |
| DeepSeek | DeepSeek-MoE | DeepSeek | deepseek-ai/deepseek-moe-16b-base<br>deepseek-ai/deepseek-moe-16b-chat | deepseek-ai/deepseek-moe-16b-base<br>deepseek-ai/deepseek-moe-16b-chat |
| | DeepSeek-LLM | DeepSeek | deepseek-ai/deepseek-llm-7b-base<br>deepseek-ai/deepseek-llm-7b-chat | deepseek-ai/deepseek-llm-7b-base<br>deepseek-ai/deepseek-llm-7b-chat |
| | DeepSeek-V2 | DeepSeek | deepseek-ai/DeepSeek-V2-Lite<br>deepseek-ai/DeepSeek-V2-Lite-Chat | deepseek-ai/DeepSeek-V2-Lite<br>deepseek-ai/DeepSeek-V2-Lite-Chat |
| | DeepSeek-Coder | DeepSeek | deepseek-ai/deepseek-coder-1.3b-base<br>deepseek-ai/deepseek-coder-1.3b-instruct<br>deepseek-ai/deepseek-coder-6.7b-base<br>deepseek-ai/deepseek-coder-6.7b-instruct<br>deepseek-ai/deepseek-coder-7b-base-v1.5<br>deepseek-ai/deepseek-coder-7b-instruct-v1.5 | deepseek-ai/deepseek-coder-1.3b-base<br>deepseek-ai/deepseek-coder-1.3b-instruct<br>deepseek-ai/deepseek-coder-6.7b-base<br>deepseek-ai/deepseek-coder-6.7b-instruct<br>deepseek-ai/deepseek-coder-7b-base-v1.5<br>deepseek-ai/deepseek-coder-7b-instruct-v1.5 |
| | DeepSeek-Coder-V2 | DeepSeek | deepseek-ai/DeepSeek-Coder-V2-Lite-Base<br>deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | deepseek-ai/DeepSeek-Coder-V2-Lite-Base<br>deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
| | DeepSeek-Math | DeepSeek | deepseek-ai/deepseek-math-7b-base<br>deepseek-ai/deepseek-math-7b-instruct<br>deepseek-ai/deepseek-math-7b-rl | deepseek-ai/deepseek-math-7b-base<br>deepseek-ai/deepseek-math-7b-instruct<br>deepseek-ai/deepseek-math-7b-rl |
| | DeepSeek-R1 | DeepSeek | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B<br>deepseek-ai/DeepSeek-R1-Distill-Qwen-7B<br>deepseek-ai/DeepSeek-R1-Distill-Llama-8B<br>deepseek-ai/DeepSeek-R1-Distill-Qwen-14B<br>deepseek-ai/DeepSeek-R1-Distill-Qwen-32B<br>deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B<br>deepseek-ai/DeepSeek-R1-Distill-Qwen-7B<br>deepseek-ai/DeepSeek-R1-Distill-Llama-8B<br>deepseek-ai/DeepSeek-R1-Distill-Qwen-14B<br>deepseek-ai/DeepSeek-R1-Distill-Qwen-32B<br>deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
| Seed-OSS | Seed-OSS | ByteDance | ByteDance-Seed/Seed-OSS-36B-Instruct<br>ByteDance-Seed/Seed-OSS-36B-Base<br>ByteDance-Seed/Seed-OSS-36B-Base-woSyn | |
| Ernie4_5 | Ernie4_5 | Baidu | baidu/ERNIE-4.5-0.3B-Base-PT<br>baidu/ERNIE-4.5-0.3B-PT<br>baidu/ERNIE-4.5-21B-A3B-Base-PT<br>baidu/ERNIE-4.5-21B-A3B-PT<br>baidu/ERNIE-4.5-VL-28B-A3B-Base-PT<br>baidu/ERNIE-4.5-VL-28B-A3B-PT | baidu/ERNIE-4.5-0.3B-Base-PT<br>baidu/ERNIE-4.5-0.3B-PT |
| MiniCPM | MiniCPM | OpenBMB | openbmb/MiniCPM-2B-sft-bf16<br>openbmb/MiniCPM-2B-dpo-bf16<br>openbmb/MiniCPM-2B-128k<br>openbmb/MiniCPM-1B-sft-bf16<br>openbmb/MiniCPM3-4B<br>openbmb/MiniCPM4-0.5B<br>openbmb/MiniCPM4-8B | openbmb/MiniCPM-2B-sft-bf16<br>openbmb/MiniCPM-2B-dpo-bf16<br>openbmb/MiniCPM-2B-128k<br>openbmb/MiniCPM-1B-sft-bf16<br>to be added<br>to be added<br>to be added |
| | MiniCPM-o | OpenBMB | openbmb/MiniCPM-Llama3-V-2_5<br>openbmb/MiniCPM-V-2_6<br>openbmb/MiniCPM-o-2_6<br>openbmb/MiniCPM-V-4 | openbmb/MiniCPM-Llama3-V-2_5<br>openbmb/MiniCPM-V-2_6<br>to be added<br>to be added |
| embedding | text2vec-base-chinese | shibing624 | shibing624/text2vec-base-chinese | shibing624/text2vec-base-chinese |
| | m3e | moka-ai | moka-ai/m3e-base | moka-ai/m3e-base |
| | bge | BAAI | BAAI/bge-large-en-v1.5<br>BAAI/bge-large-zh-v1.5<br>BAAI/bge-base-en-v1.5<br>BAAI/bge-base-zh-v1.5<br>BAAI/bge-small-en-v1.5<br>BAAI/bge-small-zh-v1.5 | BAAI/bge-large-en-v1.5<br>BAAI/bge-large-zh-v1.5<br>BAAI/bge-base-en-v1.5<br>BAAI/bge-base-zh-v1.5<br>BAAI/bge-small-en-v1.5<br>BAAI/bge-small-zh-v1.5 |
| | gte | thenlper | thenlper/gte-large-zh<br>thenlper/gte-base-zh | thenlper/gte-base-zh<br>thenlper/gte-large-zh |

*Note:

  • Entries shown highlighted (e.g. bert-base-chinese) can be downloaded directly over the network with build_transformer_model() (see the end-to-end sketch at the end of this section)

  • Speed up downloads with a domestic mirror site

    • HF_ENDPOINT=https://hf-mirror.com python your_script.py
    • export HF_ENDPOINT=https://hf-mirror.com, then run your python code
    • or set it at the top of your python code as follows
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"
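Tying the table above to the loading API, here is a minimal end-to-end sketch. The bert4keras-style `Tokenizer` and the local vocab path are assumptions; adjust them to your checkpoint, and note that the output structure can differ by model configuration:

    import torch
    from bert4torch.models import build_transformer_model
    from bert4torch.tokenizers import Tokenizer

    # checkpoint_path from the table: weights and bert4torch_config.json download automatically
    model = build_transformer_model(checkpoint_path='google-bert/bert-base-chinese')
    tokenizer = Tokenizer('./model/vocab.txt', do_lower_case=True)  # hypothetical local vocab path

    token_ids, segment_ids = tokenizer.encode('今天天气不错')
    outputs = model([torch.tensor([token_ids]), torch.tensor([segment_ids])])
    print(outputs)  # encoder hidden states; exact structure depends on the config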
    

6. Acknowledgements

  • Thanks to Su Jianlin (苏神) for implementing bert4keras; this implementation draws on the bert4keras source in many places, and we sincerely thank him for his selfless contribution;
  • Thanks also to the bert4pytorch project, which inspired the idea and approach of reproducing bert4keras in pytorch.

7. Citation

@misc{bert4torch,
  title={bert4torch},
  author={Bo Li},
  year={2022},
  howpublished={\url{https://github.com/Tongjilibo/bert4torch}},
}

8. Other

  • WeChat & Star History Chart
  • The WeChat group has grown past 200 members (invitation limit); add the author's personal WeChat to be pulled into the group
[QR code: personal WeChat] [QR code: WeChat group] [Star History Chart]
