nlp-synt-data

Synthesized Data for NLP Tasks

Version 0.0.11 (PyPI)

Maintainers: 1


Synthetic Data Tools for Natural Language Processing (NLP) and Large Language Models (LLM) tasks

- Generate prompts (and prompt ids)
- Generate synthetic data (and data ids)
- Retrieve prompts and data from their ids, so a generated dataset can store ids instead of full texts and stay small
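Storing ids instead of full texts works because a text can be rebuilt from its template and substitutions. The sketch below illustrates that idea in plain Python; it is not the library's own retrieval API. The helper name `text_from_id` is hypothetical, and the id layout (`t#<template index>_<PLACEHOLDER>#<value index>...`) is an assumption taken from the sample `text_id` values in the Quickstart output.

```python
def text_from_id(text_id, texts_with_keys, substitutions):
    """Hypothetical helper: rebuild (text, label, placeholder labels) from an id
    such as 't#1_JOB#0_PERSON#0_POS#1'.

    Assumes placeholder names contain neither '_' nor '#'.
    """
    head, *parts = text_id.split("_")
    prefix, template_index = head.split("#")  # 't#<template index>'
    if prefix != "t":
        raise ValueError(f"not a text id: {text_id!r}")
    text, label = texts_with_keys[int(template_index)]
    value_labels = {}
    for part in parts:  # each part is '<PLACEHOLDER>#<value index>'
        key, index = part.split("#")
        value, value_label = substitutions[key][int(index)]
        text = text.replace(f"[{key}]", value)
        value_labels[key] = value_label
    return text, label, value_labels


# example data in the same shape as the Quickstart
texts_with_keys = [
    ("[PERSON]", "label0"),
    ("[PERSON] is working as a [JOB] in [POS]", "label1"),
]
substitutions = {
    "JOB": [("job0", "labeljob0"), ("job1", "labeljob1")],
    "PERSON": [("person0", "labelperson0"), ("person1", "labelperson1")],
    "POS": [("pos0", "labelpos0"), ("pos1", "labelpos1")],
}

print(text_from_id("t#1_JOB#0_PERSON#1_POS#0", texts_with_keys, substitutions))
# -> ('person1 is working as a job0 in pos0', 'label1', {'JOB': 'labeljob0', 'PERSON': 'labelperson1', 'POS': 'labelpos0'})
```

Only the short id needs to be stored per row; the templates and substitution lists are kept once.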

Installation

pip install nlp-synt-data

Quickstart

An example of using this library with Ollama:

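from nlp_synt_data import PromptGenerator, DataGenerator, ResponseGenerator
import ollama  # Ollama Python client; needs a running Ollama server with the model pulled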

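# generate prompts: each key maps to a list of prompt variants
prompts_dict = {
    "a": ["promptA0", "promptA1"],
    "b": ["promptB0", "promptB1"],
    "c": ["promptC0", "promptC1"],
    "d": ["promptD0", "promptD1"],
    "e": ["promptE0", "promptE1"],
}
# the second argument lists which keys to combine (see the prompt_id column below, e.g. "c#0_e#0")
prompts = PromptGenerator.generate(prompts_dict, [["c", "e"], ["a", "b", "d"]])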

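# generate texts: templates with [PLACEHOLDER] slots, each paired with a label
texts_with_keys = [
    ("[PERSON]", "label0"),
    ("[PERSON] is working as a [JOB] in [POS]", "label1"),
]
# candidate values for each placeholder, as (value, label) pairs
substitutions = {
    "JOB": [("job0", "labeljob0"), ("job1", "labeljob1")],
    "PERSON": [("person0", "labelperson0"), ("person1", "labelperson1")],
    "POS": [("pos0", "labelpos0"), ("pos1", "labelpos1")],
}
# one text per combination of placeholder values (see the text_id column below)
texts = DataGenerator.generate(texts_with_keys, substitutions)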

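# generate responses: model_func(prompt, text) must return the model's reply as a string
def model_func(prompt, text):
    response = ollama.chat(model='llama3:instruct', messages=[
        {'role': 'system', 'content': prompt},  # generated prompt as the system message
        {'role': 'user', 'content': text},      # generated text as the user message
    ])
    return response['message']['content']

# collect the model responses and write them to results.csv
ResponseGenerator.generate("results.csv", texts, prompts, model_func)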
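The prompt ids in the output (for example `c#0_e#0`) suggest that each inner list passed to `PromptGenerator.generate` names keys whose variants are combined. The plain-Python sketch below assumes that each inner list is expanded independently as a Cartesian product of the listed keys' variants and that the chosen parts are joined with a space. Both are assumptions, not documented behavior, and `combine_prompts` is a hypothetical name, not part of the library.

```python
from itertools import product


def combine_prompts(prompts_dict, groups, sep=" "):
    """Hypothetical sketch: one prompt per combination of variants within each group."""
    prompts = {}
    for group in groups:
        # for each key in the group, the list of (key, variant index, variant text)
        choices = [[(key, i, text) for i, text in enumerate(prompts_dict[key])] for key in group]
        for combo in product(*choices):
            prompt_id = "_".join(f"{key}#{i}" for key, i, _ in combo)
            prompts[prompt_id] = sep.join(text for _, _, text in combo)
    return prompts


prompts_dict = {
    "a": ["promptA0", "promptA1"],
    "b": ["promptB0", "promptB1"],
    "c": ["promptC0", "promptC1"],
    "d": ["promptD0", "promptD1"],
    "e": ["promptE0", "promptE1"],
}
prompts = combine_prompts(prompts_dict, [["c", "e"], ["a", "b", "d"]])
print(len(prompts))        # -> 12
print(prompts["c#0_e#0"])  # -> promptC0 promptE0
```

Under these assumptions the group `c, e` yields 2 × 2 = 4 prompts and the group `a, b, d` yields 2 × 2 × 2 = 8; the library's actual output may differ.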

Resulting results.csv (abridged; "response" stands for the model's reply to each prompt/text pair):

prompt_id	text_id	text_labels	response	text_PERSON_value	text_JOB_value	text_POS_value	text_PERSON_label	text_JOB_label	text_POS_label
c#0_e#0	t#0_PERSON#0	label0	response	person0			labelperson0
c#0_e#0	t#0_PERSON#1	label0	response	person1			labelperson1
c#0_e#0	t#1_JOB#0_PERSON#0_POS#0	label1	response	person0	job0	pos0	labelperson0	labeljob0	labelpos0
c#0_e#0	t#1_JOB#0_PERSON#0_POS#1	label1	response	person0	job0	pos1	labelperson0	labeljob0	labelpos1
c#0_e#0	t#1_JOB#0_PERSON#1_POS#0	label1	response	person1	job0	pos0	labelperson1	labeljob0	labelpos0
...	...	...	...	...	...	...	...	...	...
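The text_id column shows how the templates are expanded: template 0 yields one row per PERSON value, and template 1 yields one row per combination of its JOB, PERSON and POS values. The following is a hypothetical plain-Python sketch of that expansion, not the library's code. It assumes placeholders are written as `[NAME]` and appear in the id in alphabetical order, which matches the rows above; `expand_templates` is an illustrative name.

```python
import re
from itertools import product


def expand_templates(texts_with_keys, substitutions):
    """Hypothetical sketch: yield (text_id, text, label, {key: (value, value_label)})."""
    for t, (template, label) in enumerate(texts_with_keys):
        # placeholder names used by this template, sorted, e.g. ['JOB', 'PERSON', 'POS']
        keys = sorted(set(re.findall(r"\[(\w+)\]", template)))
        # every combination of value indices; the last key varies fastest
        for indices in product(*(range(len(substitutions[k])) for k in keys)):
            text, chosen = template, {}
            for key, i in zip(keys, indices):
                value, value_label = substitutions[key][i]
                text = text.replace(f"[{key}]", value)
                chosen[key] = (value, value_label)
            text_id = "_".join([f"t#{t}"] + [f"{k}#{i}" for k, i in zip(keys, indices)])
            yield text_id, text, label, chosen


texts_with_keys = [
    ("[PERSON]", "label0"),
    ("[PERSON] is working as a [JOB] in [POS]", "label1"),
]
substitutions = {
    "JOB": [("job0", "labeljob0"), ("job1", "labeljob1")],
    "PERSON": [("person0", "labelperson0"), ("person1", "labelperson1")],
    "POS": [("pos0", "labelpos0"), ("pos1", "labelpos1")],
}

rows = list(expand_templates(texts_with_keys, substitutions))
for text_id, text, label, _ in rows[:5]:
    print(text_id, "|", text, "|", label)
```

With this data the sketch produces 2 + 2 × 2 × 2 = 10 texts, and its first five ids match the first five rows of the table.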
