llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x, and CUDA 12.6 runtimes on x86_64 and aarch64 platforms.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
News
- Dec 9 2024, v0.2.0: Support for low-level and high-level APIs: llama, llava, clip, and ggml.
- Nov 27 2024, v0.1.22: Support for multimodal models such as llava and minicpmv.
Install
Basic library install:
pip install llama-cpp-cffi
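A quick way to confirm the wheel installed correctly is to import the high-level class used in the examples below (a minimal sanity check, not a required step):

# If this import succeeds, the basic library install is working.
from llama import Model
print(Model)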
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads.
Supported GPU Compute Capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100. See NVIDIA's GPU Compute Capability documentation to find the capability of a specific GPU.
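Before relying on GPU acceleration, it can help to confirm that the NVIDIA driver tools are visible on your system. The sketch below is a generic check using only the Python standard library and the standard nvidia-smi tool; it is not part of llama-cpp-cffi:

import shutil
import subprocess

# Generic sanity check (not part of llama-cpp-cffi): is the NVIDIA driver visible?
if shutil.which('nvidia-smi'):
    subprocess.run(['nvidia-smi', '--query-gpu=name,driver_version', '--format=csv'], check=False)
else:
    print('nvidia-smi not found; CUDA acceleration will not be available.')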
LLM Example
from llama import Model

model = Model(
    creator_hf_repo='HuggingFaceTB/SmolLM2-1.7B-Instruct',
    hf_repo='bartowski/SmolLM2-1.7B-Instruct-GGUF',
    hf_file='SmolLM2-1.7B-Instruct-Q4_K_M.gguf',
)

model.init(ctx_size=8192, predict=1024, gpu_layers=99)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

# chat-style completion: stream the response chunk by chunk
for chunk in model.completions(messages=messages, temp=0.7, top_p=0.8, top_k=100):
    print(chunk, flush=True, end='')

# plain text prompt completion
for chunk in model.completions(prompt='Evaluate 1 + 2 in Python. Result in Python is', temp=0.7, top_p=0.8, top_k=100):
    print(chunk, flush=True, end='')
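If you want the whole response as a single string instead of streaming it, you can join the yielded chunks (a small variation on the call above; it assumes each chunk is a plain string, as in the streaming example):

# Collect the streamed chunks into one string.
output = ''.join(model.completions(messages=messages, temp=0.7, top_p=0.8, top_k=100))
print(output)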
VLM Example
from llama import Model

model = Model(
    creator_hf_repo='vikhyatk/moondream2',
    hf_repo='vikhyatk/moondream2',
    hf_file='moondream2-text-model-f16.gguf',
    mmproj_hf_file='moondream2-mmproj-f16.gguf',
)

model.init(ctx_size=8192, predict=1024, gpu_layers=99)

# describe an image: the mmproj projector file enables vision input
for chunk in model.completions(prompt='Describe this image.', image='examples/llama-1.png'):
    print(chunk, flush=True, end='')
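The same model instance can be reused for several images by looping over the paths (a minimal sketch; the second image path below is a hypothetical placeholder):

# Reuse the loaded model for multiple images.
for path in ['examples/llama-1.png', 'examples/llama-2.png']:
    print(f'== {path} ==')
    for chunk in model.completions(prompt='Describe this image.', image=path):
        print(chunk, flush=True, end='')
    print()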
References
- examples/llm.py
- examples/vlm.py