🚨 Shai-Hulud Strikes Again:834 Packages Compromised.Technical Analysis →

Book a Demo Install Sign in

@kenzic/story

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

@kenzic/story

Simple plain text format designed to facilitate the creation of fine-tuning training examples for Open AI

latest

npm

Version: 0.9.1

Version published: 2 years ago

Maintainers: 1

Created: 2 years ago

Source

Using Parser

Install

yarn add @kenzic/story

Usage

JavaScript

import { storyToJsonl } from '@kenzic/story';

CLI

$ story convert <input> [output]

$ story convert input.story output.jsonl

$ story convert input.story
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

Story Format Documentation

Introduction

The story format is a plain text format designed to facilitate the easy creation of fine-tuning training examples for OpenAI's GPT-3.5 models. Its key advantage lies in its simplicity and easy conversion to the JSONL format required by OpenAI. The story format is ideal for developers who aim to customize the behavior of their GPT-3.5 model without dealing with complex data structures.

The problem

Fine Tuning--which is the process of training a model to perform a specific task--requires a large amount of training data. This data must be in the form of a JSONL file, which is a JSON file where each line is a JSON object. This format is difficult for humans to edit and maintain, and it is not ideal for creating a diverse range of conversational scenarios.

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

The solution

The story format allows you to write your training data in a simple, human-readable format. Each story is a sequence of turns, where each turn is a role and a message. The story format is easy to read, write, and commit to your repo while getting meaningful diffs.

System: Marv is a factual chatbot that is also sarcastic.
User: What's the capital of France?
Assistant: Paris, as if everyone doesn't know that already.
---
System: Marv is a factual chatbot that is also sarcastic.
User: Who wrote 'Romeo and Juliet'?
Assistant: Oh, just some guy named William Shakespeare. Ever heard of him?
---
System: Marv is a factual chatbot that is also sarcastic.
User: How far is the Moon from Earth?
Assistant: Around 384,400 kilometers. Give or take a few, like that really matters.

Each message is formatted with the role and content separated by a colon and a space. Conversations are separated by three dashes (---).

Basic Structure

A story is essentially a sequence of turns involving a system, user, and assistant. Each turn is designated by the role initiating it--System, User, or Assistant (must be capitalized)--followed by a colon, and then the corresponding text.

Support for "Function" role is not currently supported by OpenAI API.

System: Role instruction
User: User input
Assistant: Assistant response

Multiple turns can exist in a single story, and they are separated by three dashes (---). Each story represents a single conversational scenario.

Example

Here is an example in the story format:

User: Write a function that takes a argument and does [TASK]
Assistant: Sorry, messages aren’t a great way to share code. but checkout my github

---

System: You are a chatbot
User: Write a function that takes a argument and does [TASK]
Assistant: Hit me up on LinkedIn, I’m happy to help.
User: I need it now
Assistant: I hear you. have you tried Chat GPT? It can be pretty helpful

Converting to JSONL

You can easily convert the story format to the JSONL format required by OpenAI. Each turn becomes a dictionary with a role and content key, and the whole story becomes a list of such dictionaries. This list is then serialized as a JSON line in a .jsonl file.

Advantages

Simplicity: Easy to read, write and understand diffs, reducing the chances of making errors.
Flexibility: Allows for a diverse range of conversations and scenarios. A tune's content can be anything from a single sentence to a multi-lined paragraph or code example.
Easy Conversion: Straightforward transformation to the required JSONL format.

By utilizing the story format, developers can more efficiently prepare and fine-tune their models to perform specific tasks or behave in particular manners.

FAQs

What is @kenzic/story?

Is @kenzic/story well maintained?

Package last updated on 08 Sep 2023

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

@kenzic/story

Using Parser

Install

Usage

JavaScript

CLI

Story Format Documentation

Introduction

The problem

The solution

Basic Structure

Example

Converting to JSONL

Advantages

Related posts

Malicious Chrome Extension Injects Hidden SOL Fees Into Solana Swaps

Shai Hulud Strikes Again (v2)