For those who don't know, `llama.cpp` is a port of Facebook's LLaMA model in pure C/C++:
- Without dependencies
- Apple silicon first-class citizen - optimized via ARM NEON
- AVX2 support for x86 architectures
- Mixed F16 / F32 precision
- 4-bit quantization support
- Runs on the CPU
pip install pyllamacpp
However, compiling `llama.cpp` takes the architecture of the target CPU into account, so you might need to build it from source:
pip install git+https://github.com/abdeladim-s/pyllamacpp.git
:warning: Note
This PR introduced some breaking changes.
If you want to use older models, use version `2.2.0`:
pip install pyllamacpp==2.2.0
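Since model compatibility depends on the installed version, it can help to confirm which version is present before loading an older model. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of pyllamacpp.

```python
from importlib import metadata


def installed_version(package="pyllamacpp"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("pyllamacpp is not installed")
else:
    print(f"pyllamacpp {version} is installed")
```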
You can run the following simple command line interface to test the package once it is installed:
pyllamacpp path/to/model.bin
pyllamacpp -h
usage: pyllamacpp [-h] [--n_ctx N_CTX] [--n_parts N_PARTS] [--seed SEED] [--f16_kv F16_KV] [--logits_all LOGITS_ALL]
[--vocab_only VOCAB_ONLY] [--use_mlock USE_MLOCK] [--embedding EMBEDDING] [--n_predict N_PREDICT] [--n_threads N_THREADS]
[--repeat_last_n REPEAT_LAST_N] [--top_k TOP_K] [--top_p TOP_P] [--temp TEMP] [--repeat_penalty REPEAT_PENALTY]
[--n_batch N_BATCH]
model
This is like a chatbot, You can start the conversation with `Hi, can you help me ?` Pay attention though that it may hallucinate!
positional arguments:
model The path of the model file
options:
-h, --help show this help message and exit
--n_ctx N_CTX text context
--n_parts N_PARTS
--seed SEED RNG seed
--f16_kv F16_KV use fp16 for KV cache
--logits_all LOGITS_ALL
the llama_eval() call computes all logits, not just the last one
--vocab_only VOCAB_ONLY
only load the vocabulary, no weights
--use_mlock USE_MLOCK
force system to keep model in RAM
--embedding EMBEDDING
embedding mode only
--n_predict N_PREDICT
Number of tokens to predict
--n_threads N_THREADS
Number of threads
--repeat_last_n REPEAT_LAST_N
Last n tokens to penalize
--top_k TOP_K top_k
--top_p TOP_P top_p
--temp TEMP temp
--repeat_penalty REPEAT_PENALTY
repeat_penalty
--n_batch N_BATCH batch size for prompt processing
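To make the shape of these options concrete, here is a rough sketch of how a subset of them could be declared with `argparse`. This is an illustration only, not the package's actual source: the argument types are assumptions inferred from the help text, and defaults are left unset because the real defaults are not shown above.

```python
import argparse


def build_parser():
    # Hypothetical re-creation of a subset of the listed options.
    # Types are assumed; defaults stay None since the real ones are unknown.
    parser = argparse.ArgumentParser(prog="pyllamacpp")
    parser.add_argument("model", type=str, help="The path of the model file")
    parser.add_argument("--n_ctx", type=int, help="text context")
    parser.add_argument("--seed", type=int, help="RNG seed")
    parser.add_argument("--n_predict", type=int, help="Number of tokens to predict")
    parser.add_argument("--n_threads", type=int, help="Number of threads")
    parser.add_argument("--top_k", type=int, help="top_k")
    parser.add_argument("--top_p", type=float, help="top_p")
    parser.add_argument("--temp", type=float, help="temp")
    parser.add_argument("--repeat_penalty", type=float, help="repeat_penalty")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["path/to/model.bin", "--n_predict", "64", "--temp", "0.7"]
)
print(args.model, args.n_predict, args.temp)
```

Options that are not passed come back as `None` in this sketch, which makes it easy to tell an explicit value from an omitted one.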
from pyllamacpp.model import Model

model = Model(model_path='/path/to/model.bin')
for token in model.generate("Tell me a joke ?\n"):
    print(token, end='', flush=True)
You can set up an interactive dialogue by simply keeping the `model` variable alive:
from pyllamacpp.model import Model

model = Model(model_path='/path/to/model.bin')
while True:
    try:
        # Read one line of user input; skip empty lines
        prompt = input("You: ")
        if prompt == '':
            continue
        print("AI:", end='')
        # Stream the reply token by token as it is generated
        for token in model.generate(prompt):
            print(token, end='', flush=True)
        print()
    except KeyboardInterrupt:
        # Ctrl+C ends the conversation
        break
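The loop above relies on `generate` yielding the reply one token at a time. To exercise that logic without loading a model file, one option is to pass the generator in as a parameter and substitute a stand-in. In this sketch, `collect_reply` and `fake_generate` are hypothetical helpers, not part of pyllamacpp.

```python
def collect_reply(generate, prompt):
    """Join the tokens yielded by generate(prompt) into a single string."""
    return "".join(generate(prompt))


def fake_generate(prompt):
    # Stand-in for model.generate: yields a canned reply token by token.
    for token in ["Echo", ":", " ", prompt]:
        yield token


reply = collect_reply(fake_generate, "Hi")
print(reply)
```

With a real model, the same helper could presumably be called as `collect_reply(model.generate, prompt)`.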
The following example shows how to attribute a persona to the language model:
from pyllamacpp.model import Model

prompt_context = """Act as Bob. Bob is helpful, kind, honest,
and never fails to answer the User's requests immediately and with precision.

User: Nice to meet you Bob!
Bob: Welcome! I'm here to assist you with anything you need. What can I do for you today?
"""

prompt_prefix = "\nUser:"
prompt_suffix = "\nBob:"

model = Model(model_path='/path/to/model.bin',
              n_ctx=512,
              prompt_context=prompt_context,
              prompt_prefix=prompt_prefix,
              prompt_suffix=prompt_suffix)

while True:
    try:
        prompt = input("User: ")
        if prompt == '':
            continue
        print("Bob: ", end='')
        for token in model.generate(prompt,
                                    antiprompt='User:',
                                    n_threads=6,
                                    n_batch=1024,
                                    n_predict=256,
                                    n_keep=48,
                                    repeat_penalty=1.0):
            print(token, end='', flush=True)
        print()
    except KeyboardInterrupt:
        break
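The persona example combines a context, a prefix, a suffix, and an antiprompt. The sketch below illustrates one plausible reading of how these pieces fit together: plain concatenation for the prompt, and cutting the reply where the antiprompt first appears. Both `build_prompt` and `cut_at_antiprompt` are hypothetical helpers, and the concatenation order is an assumption; the library's internal handling may differ.

```python
def build_prompt(context, prefix, user_input, suffix):
    # Assumed layout: context, then prefix + user text, then suffix.
    return context + prefix + user_input + suffix


def cut_at_antiprompt(tokens, antiprompt):
    """Accumulate tokens and return the text before the first antiprompt."""
    text = ""
    for token in tokens:
        text += token
        index = text.find(antiprompt)
        if index != -1:
            # The antiprompt may span several tokens, so search the whole text.
            return text[:index]
    return text


full_prompt = build_prompt("Act as Bob.\n", "\nUser:", " Hello", "\nBob:")
reply = cut_at_antiprompt(["Sure", "!", "\n", "User", ":", " hi"], "User:")
print(repr(full_prompt))
print(repr(reply))
```

Searching the accumulated text rather than individual tokens matters because an antiprompt such as `User:` can arrive split across two tokens.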
# PromptTemplate and LLMChain come from LangChain; these import paths
# may differ between LangChain versions.
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

from pyllamacpp.langchain_llm import PyllamacppLLM

llm = PyllamacppLLM(
    model="path/to/ggml/model",
    temp=0.75,
    n_predict=50,
    top_p=1,
    top_k=40
)

template = "\n\n##Instruction:\n:{question}\n\n##Response:\n"

prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What are large language models?"
answer = llm_chain.run(question)
print(answer)
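To see the text the chain would send to the model, the template can be rendered by hand. This sketch uses plain `str.format`, on the assumption that the prompt template's default formatting substitutes `{question}` the same way; it needs neither LangChain nor a model file.

```python
template = "\n\n##Instruction:\n:{question}\n\n##Response:\n"
question = "What are large language models?"

# Substitute the single input variable into the template.
rendered = template.format(question=question)
print(rendered)
```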
In principle, all models supported by `llama.cpp` should be supported:
Supported models:
For advanced users, you can access the `llama.cpp` C-API functions directly to make your own logic.
All functions from `llama.h` are exposed with the binding module `_pyllamacpp`.
You can check the API reference documentation for more details.
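A quick way to explore what the binding module exposes is to list its attributes. This is a minimal sketch under two assumptions: that `_pyllamacpp` is importable as a top-level module, and that the exported functions keep the `llama_` prefix used in `llama.h`. The helper `list_binding_functions` is hypothetical; it returns an empty list when the module is unavailable.

```python
import importlib


def list_binding_functions(module_name="_pyllamacpp", prefix="llama_"):
    """Return the sorted attribute names of a module that start with prefix."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module not installed or not importable under this name.
        return []
    return sorted(name for name in dir(module) if name.startswith(prefix))


names = list_binding_functions()
print(f"{len(names)} functions found")
```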
If you find any bug, please open an issue.
If you have any feedback, or you want to share how you are using this project, feel free to use the Discussions and open a new topic.
This project is licensed under the same license as llama.cpp (MIT License).
FAQs
Python bindings for llama.cpp
We found that pyllamacpp demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.