Readme
```bash
# Install base features
pip install nlp-data
# Install all features
pip install "nlp-data[all]"
# Upgrade from the internal index, with the Aliyun mirror as an extra index
pip install nlp-data --upgrade -i http://192.168.130.5:5002/simple/ --trusted-host 192.168.130.5 --extra-index-url https://mirrors.aliyun.com/pypi/simple
```
```python
# A Store wraps one S3 bucket; each data type maps to its own bucket
from nlp_data import NLUDocStore

# List documents
NLUDocStore.list()
# Pull documents
docs = NLUDocStore.pull('xxx')
# Push documents
NLUDocStore.push(docs=docs, name='xxx')
```
```python
# A Doc is nlp-data's storage structure for one data type, with helper
# methods for operating on that data.
# A DocList is a collection of Docs: essentially a Python list with the
# usual append/extend methods, but each DocList type also offers methods
# specific to its data type.
# NLUDoc, for example, has domain, slots, and intention fields for
# storing NLU results.
from nlp_data import NLUDoc, NLUDocList

# Create an NLUDoc
doc = NLUDoc(text='添加明天上午跟张三开会的提醒')
doc.set_domain('schedule_cmn')
doc.set_intention('add_schedule')
doc.set_slot(text='明天上午', label='date')
doc.set_slot(text='跟张三开会', label='title')

# Create an NLUDocList and append the doc
docs = NLUDocList()
docs.append(doc)

# Batch-initialize from ABNF sentence-pattern output files
docs = NLUDocList.from_abnf_output(output_dir='your/dir', domain='schedule_cmn')

# Push to the bucket
from nlp_data import NLUDocStore
NLUDocStore.push(docs=docs, name='xxx')
```
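Conceptually, the record an NLUDoc carries can be pictured as a plain dataclass. This is an illustrative stdlib-only mock, not the actual nlp-data classes; the field names simply mirror the domain, slots, and intention fields described above:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative mock of the data an NLUDoc holds; NOT the real nlp-data classes.
@dataclass
class Slot:
    text: str
    label: str

@dataclass
class MockNLUDoc:
    text: str
    domain: str = ''
    intention: str = ''
    slots: List[Slot] = field(default_factory=list)

    def set_slot(self, text: str, label: str) -> None:
        # Append one labeled span, mirroring NLUDoc.set_slot above
        self.slots.append(Slot(text=text, label=label))

doc = MockNLUDoc(text='添加明天上午跟张三开会的提醒')
doc.domain = 'schedule_cmn'
doc.intention = 'add_schedule'
doc.set_slot(text='明天上午', label='date')
doc.set_slot(text='跟张三开会', label='title')
print(len(doc.slots))  # 2
```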
```python
# Augmentor is nlp-data's data-augmentation tool
from nlp_data import GPTAugmentor, NLUDocStore, DialogueDocList, DialogueDoc

# Create an Augmentor
augmentor = GPTAugmentor(api_key='xxx')

# Augment NLUDocs into Cantonese or Sichuanese dialects
docs = NLUDocStore.pull('xxx')
aug_docs = augmentor.augment_nlu_by_localism(docs, '广东话')

# Generate multi-turn dialogues from a theme and a situation
dialogue_docs = augmentor.generate_dialogue_docs(theme='添加日程', situation='用户正在驾驶车辆与车机系统丰田进行语音交互')

# Augment multi-turn dialogue data
dialogue_docs = DialogueDocList()
dialogue_docs.quick_add(theme='添加日程', situation='用户正在驾驶车辆与车机系统丰田进行交互', conversation=['你好,丰田', '在呢,有什么可以帮助你的', '我要添加一个明天上午跟张三开会的日程', '好的已为您添加成功'])
aug_dialogue_docs = augmentor.augment_dialogue(dialogue_docs)
```
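The dialogue records above bundle a theme, a situation, and alternating conversation turns. As a rough stdlib-only picture of that shape (an illustrative mock, not nlp-data's actual DialogueDoc):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative mock of a dialogue record; NOT nlp-data's real DialogueDoc.
@dataclass
class MockDialogueDoc:
    theme: str
    situation: str
    conversation: List[str] = field(default_factory=list)

    def turns(self) -> List[Tuple[str, str]]:
        # Pair up alternating user/system utterances
        return list(zip(self.conversation[0::2], self.conversation[1::2]))

doc = MockDialogueDoc(
    theme='添加日程',
    situation='用户正在驾驶车辆与车机系统丰田进行交互',
    conversation=['你好,丰田', '在呢,有什么可以帮助你的',
                  '我要添加一个明天上午跟张三开会的日程', '好的已为您添加成功'],
)
print(len(doc.turns()))  # 2
```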
S3Storage is a wrapper over basic S3 object storage; it can create buckets, upload and download files, and so on.

```python
from nlp_data import S3Storage  # assumed import path, matching the classes above

# Initialize
s3 = S3Storage()
# List all buckets
s3.list_buckets()
# Create a bucket
s3.create_bucket('test')
# List all files in a bucket
s3.list_files('test')
# Upload a file
s3.upload_file(file_path='./test.txt', bucket_name='test')
# Download a file
s3.download_file(object_name='./test.txt', bucket_name='test')
# Delete a file
s3.delete_file(bucket_name='test', file_name='test.txt')
# Upload a directory
s3.upload_dir(bucket_name='test', dir='./tests')
# Download a directory
s3.download_dir(bucket_name='test', object_name='./tests', save_dir='./')
# Delete a directory
s3.delete_dir(bucket_name='test', dir_name='tests')
# Delete a bucket
s3.delete_bucket('test')
```
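The directory operations above presumably map a local tree onto relative object keys. As a self-contained sketch of that idea (stdlib only, not nlp-data's actual implementation), collecting the keys an upload_dir-style helper would use looks like:

```python
import os
from pathlib import Path
from typing import List

def collect_object_keys(local_dir: str) -> List[str]:
    """Walk local_dir and return the relative POSIX paths that an
    upload_dir-style helper could use as object keys.
    Illustrative sketch only; not nlp-data's implementation."""
    root = Path(local_dir)
    keys = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            keys.append(full.relative_to(root).as_posix())
    return sorted(keys)

# Example with a throwaway directory tree
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'sub').mkdir()
    (Path(tmp) / 'a.txt').write_text('hello')
    (Path(tmp) / 'sub' / 'b.txt').write_text('world')
    print(collect_object_keys(tmp))  # ['a.txt', 'sub/b.txt']
```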
```bash
# Show help
nlp-data --help
# Download a file; when xxx is a folder in S3, every file under it is downloaded
nlp-data download xxx.xxx --bucket xxx --save_path xxx
# Upload a file; when xxx is a folder, every file under it is uploaded
nlp-data upload xxx --bucket xxx
# Delete a file; when xxx is a folder, every file under it is deleted
nlp-data delete xxx --bucket xxx
```
The examples folder contains sample scripts for reference. The following translates Chinese NLU docs and saves the English results:

```bash
python examples/translate_nlu.py --api_key xxx --doc_name schedule/train --save_name schedule/train --num_samples 5000
```

This translates the schedule/train docs in the nlu bucket into English and saves them to the nlu-en bucket.
nlp-data is Puqiang's internal tool for storing, sharing, and processing NLP data.