llama-index-packs-code-hierarchy
A node parser which can create a hierarchy of all code scopes in a directory.
# install
pip install llama-index-packs-code-hierarchy
# download source code
llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./code_hierarchy_pack
The CodeHierarchyAgentPack splits long code files into more manageable chunks and creates an agent on top to navigate the code. It builds a "hierarchy" of sorts, in which each scope's body is replaced with a short comment telling the LLM to search for a referenced node if it wants to read that body.
Nodes in this hierarchy will be split based on scope, like function, class, or method scope, and will have links to their children and parents so the LLM can traverse the tree.
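To make the idea above concrete, here is a minimal sketch that builds a similar hierarchy for Python source using only the standard-library ast module. It is not the pack's implementation: ScopeNode, build_hierarchy, and the placeholder comment text are hypothetical names chosen for illustration, and the sketch only handles scopes that are direct statements of their parent and whose body starts on a line after the signature.

```python
import ast
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScopeNode:
    """One scope (module, class, or function) in the hierarchy (hypothetical type)."""
    name: str
    kind: str
    text: str = ""
    # Excluded from repr/compare so parent <-> child links cannot recurse.
    parent: Optional["ScopeNode"] = field(default=None, repr=False, compare=False)
    children: List["ScopeNode"] = field(default_factory=list)


_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def build_hierarchy(source: str) -> ScopeNode:
    """Return the module scope, with every child scope body collapsed to a comment."""
    lines = source.splitlines()

    def make(tree_node, name, kind, first, last, parent):
        scope = ScopeNode(name=name, kind=kind, parent=parent)
        kept = []
        cursor = first  # next source line (1-based) not yet copied
        for stmt in tree_node.body:
            if not isinstance(stmt, _SCOPES):
                continue
            body_first = stmt.body[0].lineno
            # Keep everything up to and including the child's signature.
            kept.extend(lines[cursor - 1 : body_first - 1])
            indent = " " * stmt.body[0].col_offset
            # Placeholder wording is an assumption, not the pack's exact text.
            kept.append(f"{indent}# Code replaced for brevity. See node: {stmt.name}")
            child = make(stmt, stmt.name, type(stmt).__name__,
                         stmt.lineno, stmt.end_lineno, scope)
            scope.children.append(child)
            cursor = stmt.end_lineno + 1
        kept.extend(lines[cursor - 1 : last])
        scope.text = "\n".join(kept)
        return scope

    return make(ast.parse(source), "<module>", "Module", 1, len(lines), None)


if __name__ == "__main__":
    sample = (
        "class Greeter:\n"
        "    def hello(self, name):\n"
        "        return 'hi ' + name\n"
    )
    root = build_hierarchy(sample)
    print(root.text)                          # class signature plus a placeholder
    print(root.children[0].children[0].text)  # full text of the hello method
```

Each node keeps its own signature and any non-scope statements, while nested scopes are reachable through children (and back through parent), which is the traversal the paragraph above describes.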
from pathlib import Path

from llama_index.core import SimpleDirectoryReader
from llama_index.core.text_splitter import CodeSplitter
from llama_index.llms.openai import OpenAI
from llama_index.packs.code_hierarchy import (
    CodeHierarchyAgentPack,
    CodeHierarchyNodeParser,
)

llm = OpenAI(model="gpt-4", temperature=0.2)

# Load the source file and record its path as metadata
documents = SimpleDirectoryReader(
    input_files=[
        Path("../llama_index/packs/code_hierarchy/code_hierarchy.py")
    ],
    file_metadata=lambda x: {"filepath": x},
).load_data()

split_nodes = CodeHierarchyNodeParser(
    language="python",
    # You can further parameterize the CodeSplitter to split the code
    # into "chunks" that match your context window size using the
    # chunk_lines and max_chars parameters; the values below are examples
    code_splitter=CodeSplitter(
        language="python", max_chars=1000, chunk_lines=10
    ),
).get_nodes_from_documents(documents)

pack = CodeHierarchyAgentPack(split_nodes=split_nodes, llm=llm)

pack.run(
    "How does the get_code_hierarchy_from_nodes function from the code hierarchy node parser work? Provide specific implementation details."
)
A full example can be found here in combination with `.
The pack contains a CodeHierarchyKeywordQueryEngine that uses a CodeHierarchyNodeParser to generate a map of a repository's structure and contents. This is useful for the LLM to understand the structure of a codebase, and to be able to reference specific files or directories.
For example:
You can create a tool for any agent using the nodes from the node parser:
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.packs.code_hierarchy import CodeHierarchyKeywordQueryEngine

query_engine = CodeHierarchyKeywordQueryEngine(
    nodes=split_nodes,
)

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="code_lookup",
    description="Useful for looking up information about the code hierarchy codebase.",
)

agent = OpenAIAgent.from_tools(
    [tool], system_prompt=query_engine.get_tool_instructions(), verbose=True
)
To add a new language you need to edit _DEFAULT_SIGNATURE_IDENTIFIERS in code_hierarchy.py.
The docstrings explain how to do this and the nuances involved; the approach should work for most languages.
Please test your new language by adding a new file to tests/file/code/ and testing all your edge cases.
People often ask "how do I find the Node Types I need for a new language?" The best way is to use breakpoints.
I have added a comment, "TIP: This is a wonderful place to put a debug breakpoint", in the code_hierarchy.py file. Put a breakpoint there, input some code in the desired language, and step through it to find the name of the node you want to capture.
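As a complement to stepping through with a debugger, it can help to see what "node types" look like when a parse tree is walked programmatically. The sketch below is only an analogy: it uses Python's built-in ast module as a stand-in parser, so the type names it prints (FunctionDef, ClassDef, and so on) will not match the names you see at the breakpoint in code_hierarchy.py. The helper name iter_node_types is hypothetical.

```python
import ast
from typing import Iterator, Tuple


def iter_node_types(node: ast.AST, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, node type name) for every node, parents before children."""
    yield depth, type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from iter_node_types(child, depth + 1)


if __name__ == "__main__":
    sample = "class C:\n    def f(self):\n        return 1\n"
    for depth, type_name in iter_node_types(ast.parse(sample)):
        # Indentation mirrors nesting, making scope-defining nodes easy to spot.
        print("  " * depth + type_name)
```

Scanning the indented output for the node that wraps a whole function or class is the same exercise as inspecting node names at the breakpoint, just done in bulk.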
The code as it is should handle any language which:
I'm considering adding all the languages from aider by incorporating .scm files instead of _SignatureCaptureType, _SignatureCaptureOptions, and _DEFAULT_SIGNATURE_IDENTIFIERS.
You will need to set your OPENAI_API_KEY in your env to run the notebook or test the pack.
You can run tests with pytest tests in the root directory of this pack.