RAG-LAB is an open-source RAG toolkit, built to be lighter, faster, and cheaper, supported by TargetPilot and designed to transform the latest RAG concepts into stable and practical engineering tools. The project currently supports GraphRAG and HybridRAG. You are welcome to star RAG-LAB!
To install: `pip install raglab2`
TargetPilot is a company focused on empowering the e-commerce sector with artificial intelligence. TargetPilot's Online Assistant is built on an industry-leading RAG technology solution.
The primary goal of RAG-LAB is to explore the latest RAG technologies and convert them into stable engineering tools.
Proposed by Microsoft, GraphRAG integrates graph-based approaches into RAG and offers several key advantages.
Proposed by Intel, HybridRAG combines different RAG methodologies to enhance performance and flexibility.
This quick start guide walks you through the process of chunking text, generating expert descriptions, detecting language, creating and disambiguating entity and relationship graphs, generating community reports, saving the graph to a file, and visualizing the knowledge graph. Follow these steps to efficiently process and visualize your data.
For your reference, you can find the code example in:
Import tools from raglab
import os
import uuid

from raglab.graphrag import (
    disambiguate_entity_executor,
    disambiguate_relationship_executor,
    generate_community_reports_executor,
    generate_entire_chunk_graph_executor,
    detect_text_language,
    generate_expert,
    graph_save_json,
    # `update_graph_embeddings_executor` and `convert_to_dataframe` are used
    # below; adjust the import path if your raglab version exports them elsewhere.
    update_graph_embeddings_executor,
    convert_to_dataframe,
)
from raglab.graphrag.visual import (
    visualize_knowledge_graph_echart,
    visualize_knowledge_graph_network_x
)
# A fast and light regex-based text splitter powered by JinaAI's Segmenter; see https://jina.ai/segmenter/
# You can also replace it with splitters from Unstructured, LangChain, or LlamaIndex.
from raglab.chunk import (
    chuncking_executor,  # for English
    character_chunking_executor  # for languages other than English
)
# Import an LLM from `raglab.llms` or `langchain.llms`,
# or implement the `llm.invoke` method yourself by inheriting from the `LLMBase` class.
from raglab.llms import (
    AzureOpenAILLM,
    LLMBase
)
# Import an embedding model from `raglab.embeddings` or `langchain.embeddings`,
# or implement the `embed.embed_query` method yourself by inheriting from the `EmbeddingBase` class.
from raglab.embeddings import (
    AzureOpenAIEmbedding,
    EmbeddingBase
)
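As the comments above note, raglab's executors only need an LLM object exposing `invoke` and an embedding object exposing `embed_query`. The snippets below assume such objects are bound to `aoai_llm` and `aoai_embed`; here is a minimal duck-typed sketch with hypothetical stub behavior (for real use, instantiate `AzureOpenAILLM` / `AzureOpenAIEmbedding` or subclass `LLMBase` / `EmbeddingBase` instead):

```python
class StubLLM:
    """Hypothetical stand-in for an LLMBase subclass: only `invoke` is required."""
    def invoke(self, prompt: str) -> str:
        # A real implementation would call a model API here.
        return f"[stub answer for prompt of {len(prompt)} chars]"

class StubEmbedding:
    """Hypothetical stand-in for an EmbeddingBase subclass: only `embed_query` is required."""
    def embed_query(self, text: str) -> list:
        # A real implementation would return a model embedding vector.
        return [float(ord(c) % 7) for c in text[:8]]

# Bind the names the quick-start snippets below expect.
aoai_llm = StubLLM()
aoai_embed = StubEmbedding()
```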
Chunking the Text
The fast and light regex-based text splitter is powered by JinaAI's Segmenter; you can explore it at https://jina.ai/segmenter/.
# For English, use `chuncking_executor`
chunks = chuncking_executor(text=entire_document, max_chunk_size=1000, remove_line_breaks=True)
chunk_ids = [str(uuid.uuid4()) for _ in range(len(chunks))]

# For Chinese and other non-English languages, use `character_chunking_executor`
chunks = character_chunking_executor(text=entire_document, max_chunk_size=500, remove_line_breaks=True)
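To illustrate what a regex-based splitter of this kind does conceptually (this is a hypothetical sketch, not the library's actual implementation): split on sentence boundaries, then greedily pack sentences into chunks up to `max_chunk_size` characters.

```python
import re

def simple_chunking(text, max_chunk_size=1000, remove_line_breaks=True):
    """Toy regex-based chunker: sentence split, then greedy packing."""
    if remove_line_breaks:
        text = re.sub(r"\s*\n\s*", " ", text)
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

demo_chunks = simple_chunking("First sentence. Second one! Third?", max_chunk_size=20)
```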
[Optional] Generating an Expert Description
expert = generate_expert(aoai_llm, chunks)
[Optional] Detecting the Language
language = detect_text_language(aoai_llm, chunks)
Generating Entity and Relationship Graph
entities, relations = generate_entire_chunk_graph_executor(aoai_llm, chunks, chunk_ids, expert, language, strategy, muti_thread)
Disambiguating Entities and Relationships
entities, relations = disambiguate_entity_executor(aoai_llm, entities, relations, expert, language, strategy)
relations = disambiguate_relationship_executor(aoai_llm, relations, expert, language, strategy)
Generating Community Reports
community_reports = generate_community_reports_executor(aoai_llm, entities, relations, expert, language, strategy, 5, muti_thread)
Generating Embeddings for Entities and Communities
entities = update_graph_embeddings_executor(aoai_embed, entities, num_threads=muti_thread)
community_reports = update_graph_embeddings_executor(aoai_embed, community_reports, num_threads=muti_thread)
Saving the Graph to a Local File
# Save the graph locally as a JSON file
graph_save_json(entities, relations, community_reports, os.path.join(graph_filepath, "Gullivers-travels.json"))
# Or convert them to DataFrames and save them in any table format, such as CSV or Excel.
entities_df, relations_df, community_reports_df = convert_to_dataframe(entities), convert_to_dataframe(relations), convert_to_dataframe(community_reports)
entities_df.to_csv(os.path.join(graph_filepath, "Gullivers-travels-entities.csv"), index=False)
relations_df.to_csv(os.path.join(graph_filepath, "Gullivers-travels-relationships.csv"), index=False)
community_reports_df.to_csv(os.path.join(graph_filepath, "Gullivers-travels-communities.csv"), index=False)
[Optional] Visualizing the Knowledge Graph
visualize_knowledge_graph_echart(entities, relations)
visualize_knowledge_graph_network_x(entities, relations)
Import search tools from raglab
from raglab.graphrag import (
    graph_load_json
)
from raglab.graphrag.search_functions import (
    generate_final_answer_prompt,
    select_community,
    select_entities,
    select_relations
)
# Import an LLM from `raglab.llms` or `langchain.llms`,
# or implement the `llm.invoke` method yourself by inheriting from the `LLMBase` class.
from raglab.llms import AzureOpenAILLM
# Import an embedding model from `raglab.embeddings` or `langchain.embeddings`,
# or implement the `embed.embed_query` method yourself by inheriting from the `EmbeddingBase` class.
from raglab.embeddings import AzureOpenAIEmbedding
Load graph objects
graph_filepath = "./examples/graphfiles/Gullivers-travels.json"
entities, relations, communities = graph_load_json(graph_filepath)
entity_select_num = 5
Embed the query and select the most similar entities
query = "Who is the king of Lilliput?"
query_embed = aoai_embed.embed_query(query)
selected_entity = select_entities(query_embed, entities)
Select all relationships from selected entities
selected_relations = select_relations(selected_entity, relations)
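The retrieval steps above can be sketched in plain Python (a hypothetical illustration of `select_entities` / `select_relations`, assuming entities carry `name` and `embedding` fields and relations carry `source` and `target`; the library's actual schema and ranking may differ): rank entities by cosine similarity to the query embedding, then keep relations that touch a selected entity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_entities(query_embed, entities, k=5):
    # Entities are dicts with "name" and "embedding" keys (assumed schema).
    ranked = sorted(entities, key=lambda e: cosine(query_embed, e["embedding"]), reverse=True)
    return ranked[:k]

def relations_of(selected_entities, relations):
    # Keep relations with at least one endpoint among the selected entities.
    names = {e["name"] for e in selected_entities}
    return [r for r in relations if r["source"] in names or r["target"] in names]
```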
Select the relevant community from the selected entities
selected_community = select_community(query_embed, selected_entity, communities)
Generate the final answer
prompt = generate_final_answer_prompt(query, selected_entity, selected_relations, selected_community)
final_answer = aoai_llm.invoke(prompt)
print(f"Final answer: {final_answer}")
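The final-answer prompt packs the selected graph context around the question before it is sent to the LLM. A hypothetical sketch of such a prompt builder (not the library's actual template):

```python
def build_final_answer_prompt(query, entities, relations, community_report):
    """Assemble a grounding prompt from selected graph context (toy template)."""
    entity_lines = "\n".join(f"- {e['name']}: {e.get('description', '')}" for e in entities)
    relation_lines = "\n".join(
        f"- {r['source']} -> {r['target']}: {r.get('description', '')}" for r in relations
    )
    return (
        "Answer the question using only the graph context below.\n\n"
        f"Entities:\n{entity_lines}\n\n"
        f"Relationships:\n{relation_lines}\n\n"
        f"Community summary:\n{community_report}\n\n"
        f"Question: {query}\nAnswer:"
    )
```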
We welcome contributions from the community.
This project is licensed under the Apache 2.0 License.
For more information, please contact us at pxgong@targetpilot.ai, vincentpo@targetpilot.ai.