
Research
Malicious npm Packages Impersonate Flashbots SDKs, Targeting Ethereum Wallet Credentials
Four npm packages disguised as cryptographic tools steal developer credentials and send them to attacker-controlled Telegram infrastructure.
llama-index-llms-llama-cpp
Advanced tools
To get the best performance out of LlamaCPP
, it is recommended to install the package so that it is compiled with GPU support. A full guide for installing this way is here.
Full MACOS instructions are also here.
In general:
CuBLAS
if you have CUDA and an NVidia GPUMETAL
if you are running on an M1/M2 MacBookCLBLAST
if you are running on an AMD/Intel GPUThem, install the required llama-index packages:
pip install llama-index-embeddings-huggingface
pip install llama-index-llms-llama-cpp
Set up the model URL and initialize the LlamaCPP LLM:
from llama_index.llms.llama_cpp import LlamaCPP
from transformers import AutoTokenizer
model_url = "https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q3_k_m.gguf"
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
def messages_to_prompt(messages):
messages = [{"role": m.role.value, "content": m.content} for m in messages]
prompt = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
return prompt
def completion_to_prompt(completion):
messages = [{"role": "user", "content": completion}]
prompt = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
return prompt
llm = LlamaCPP(
# You can pass in the URL to a GGML model to download it automatically
model_url=model_url,
# optionally, you can set the path to a pre-downloaded model instead of model_url
model_path=None,
temperature=0.1,
max_new_tokens=256,
# llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
context_window=16384,
# kwargs to pass to __call__()
generate_kwargs={},
# kwargs to pass to __init__()
# set to at least 1 to use GPU
model_kwargs={"n_gpu_layers": -1},
# transform inputs into Llama2 format
messages_to_prompt=messages_to_prompt,
completion_to_prompt=completion_to_prompt,
verbose=True,
)
Use the complete
method to generate a response:
response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)
You can also stream completions for a prompt:
response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
print(response.delta, end="", flush=True)
Change the global tokenizer to match the LLM:
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer
set_global_tokenizer(
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct").encode
)
Set up the embedding model and load documents:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
documents = SimpleDirectoryReader(
"../../../examples/paul_graham_essay/data"
).load_data()
Create a vector store index from the loaded documents:
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
Set up the query engine with the LlamaCPP LLM:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What did the author do growing up?")
print(response)
https://docs.llamaindex.ai/en/stable/examples/llm/llama_cpp/
FAQs
llama-index llms llama cpp integration
We found that llama-index-llms-llama-cpp demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Four npm packages disguised as cryptographic tools steal developer credentials and send them to attacker-controlled Telegram infrastructure.
Security News
Ruby maintainers from Bundler and rbenv teams are building rv to bring Python uv's speed and unified tooling approach to Ruby development.
Security News
Following last week’s supply chain attack, Nx published findings on the GitHub Actions exploit and moved npm publishing to Trusted Publishers.