useful utilities for prompt engineering
pip install promptools # or any other dependency manager you like
Note that the validation features use pydantic>=2 as an optional dependency. You can install it with pip install promptools[validation].
extract_json
parse JSON from raw LLM response text
Detect and parse the last JSON block from the input string.
def extract_json(text: str, /, fallback: F) -> JSON | F:
It will return fallback if it fails to detect or parse JSON. Note that the default value of fallback is None.
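To make the "last JSON block, else fallback" behaviour concrete, here is a minimal standard-library sketch. It is not the promptools implementation: the function name extract_last_json is hypothetical, and the scanning strategy (try to decode at every "{" or "[" and keep the last success) is an assumption chosen for illustration. Unlike the library, it does not attempt to repair malformed JSON.

```python
import json


def extract_last_json(text, fallback=None):
    """Return the last top-level JSON object or array found in text.

    Hypothetical helper for illustration only; returns fallback when
    no parseable block is found.
    """
    decoder = json.JSONDecoder()
    result = fallback
    pos = 0
    while pos < len(text):
        if text[pos] in "{[":
            try:
                # raw_decode parses one JSON value starting at pos and
                # reports the index where that value ends.
                value, end = decoder.raw_decode(text, pos)
            except json.JSONDecodeError:
                pos += 1
                continue
            result = value  # keep the most recent successful parse
            pos = end       # skip past it so nested values are not revisited
        else:
            pos += 1
    return result
```

For example, extract_last_json('a {"x": 1} b {"y": 2}') returns {'y': 2}, and extract_last_json("no json here", fallback=[]) returns [].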
def extract_json(text: str, /, fallback: F, expect: Type[M]) -> M | F:
You can provide a pydantic.BaseModel or a TypeAlias in the expect parameter, and pydantic will validate the parsed result against it.
Imagine that you are using an LLM for a classification task.
from promptools.extractors import extract_json
from typing import TypedDict
class Item(TypedDict):
    index: int
    label: str
original_text = """
The result is:
```json
[
{"index": 0, "label": "A"},
{"index": 1, "label": "B"}
]
```
"""
print(extract_json(original_text, [], list[Item]))
The output will be:
[{'index': 0, 'label': 'A'}, {'index': 1, 'label': 'B'}]
Imagine that you are trying to parse malformed JSON:
from promptools.extractors import extract_json
from pydantic import BaseModel
original_text = '{"results": [{"index": 1}, {'
print(extract_json(original_text))
The output will be:
{'results': [{'index': 1}, {}]}
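One way to picture how a truncated input like the one above can still yield a result is to close whatever brackets are left open before parsing. The sketch below uses only the standard library and is an assumption about the general technique, not a description of how promptools does it; close_truncated_json is a hypothetical name. It handles only input that was cut off right after an opening bracket or a complete value.

```python
import json


def close_truncated_json(text, fallback=None):
    """Append the missing closing brackets to text, then parse it.

    Hypothetical helper for illustration only; returns fallback when
    the repaired text is still not valid JSON.
    """
    closers = {"{": "}", "[": "]"}
    stack = []          # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Brackets inside string literals must not be counted.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif ch in "}]" and stack:
            stack.pop()
    repaired = text + "".join(reversed(stack))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return fallback
```

With the input from the example, close_truncated_json('{"results": [{"index": 1}, {') appends "}]}" and returns {'results': [{'index': 1}, {}]}. Input cut off in the middle of a string or right after a colon still fails and yields the fallback.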
count_token
count number of tokens in prompt
def count_token(prompt: str | list[str], enc: Encoding | None = None) -> int:
Provide a prompt or a list of prompts and get the token count. The second parameter is a tiktoken.Encoding instance, which defaults to get_encoding("cl100k_base") if not provided. The default tiktoken.Encoding instance is cached, so it is not re-created on every call.
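The pattern described here (an optional encoder argument with a cached default, accepting either one string or a list of strings) can be sketched as follows. Everything in this block is a stand-in: WhitespaceEncoding is a hypothetical replacement for tiktoken.Encoding so the example runs without tiktoken, and the names default_encoding and count_tokens_sketch are not part of promptools.

```python
from functools import lru_cache


class WhitespaceEncoding:
    """Hypothetical stand-in for tiktoken.Encoding: one token per word."""

    def encode(self, text):
        return text.split()


@lru_cache(maxsize=None)
def default_encoding():
    # Built on the first call, then reused on every later call.
    return WhitespaceEncoding()


def count_tokens_sketch(prompt, enc=None):
    """Count tokens in one prompt or sum them over a list of prompts."""
    if enc is None:
        enc = default_encoding()
    if isinstance(prompt, str):
        return len(enc.encode(prompt))
    return sum(len(enc.encode(p)) for p in prompt)
```

Because default_encoding is wrapped in lru_cache, repeated calls return the same object, which is the property the cached default is meant to provide. Token counts from this stand-in will differ from real tiktoken counts.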
def count_token(prompt: dict | list[dict], enc: Encoding | None = None) -> int:
Note that the prompt can also be a single message or a list of messages. Every message should be a dict following the schema below:
class Message(TypedDict):
    role: str
    content: str
    name: NotRequired[str]
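Counting tokens for chat messages usually involves more than encoding the content, because each message carries some framing overhead. The sketch below follows the accounting commonly described for OpenAI chat models (a fixed cost per message plus an extra token when name is present). The constants, the function name count_message_tokens, and the pluggable count_text callable are all assumptions for illustration; whether promptools uses the same constants is not confirmed by this document.

```python
TOKENS_PER_MESSAGE = 3  # assumed framing overhead per message
TOKENS_PER_NAME = 1     # assumed extra cost when "name" is present


def count_message_tokens(messages, count_text):
    """Sum token counts over one message dict or a list of message dicts.

    count_text is any callable mapping a string to its token count,
    for example a wrapper around a tiktoken encoding.
    """
    if isinstance(messages, dict):
        messages = [messages]  # accept a single message as well
    total = 0
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for key, value in message.items():
            total += count_text(value)
            if key == "name":
                total += TOKENS_PER_NAME
    return total
```

Passing the counter in as an argument keeps the sketch independent of any particular tokenizer, so it can be tried with a simple word counter such as lambda s: len(s.split()) before swapping in a real encoding.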
from tiktoken import encoding_for_model
from promptools.openai import count_token
print(count_token("hi", encoding_for_model("gpt-3.5-turbo")))
The output will be:
1
from promptools.openai import count_token
print(count_token(["hi", "hello"]))
The output will be:
2
from promptools.openai import count_token
print(count_token({"role": "user", "content": "hi"}))
The output will be:
5
from promptools.openai import count_token
print(count_token([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
]))
The output will be:
21
promplate
from promplate.prompt.chat import U, A, S
from promptools.openai import count_token
print(count_token([
    S @ "background" > "You are a helpful assistant.",
    U @ "example_user" > "hi",
    A @ "example_assistant" > "Hello! How can I assist you today?",
]))
The output will be:
40
FAQs
useful utilities for prompt engineering
We found that promptools demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.