About Python Chat PDF (GPT Index) Project
Load a folder of PDF files and ask questions about their content using llama_index and GPT.
What is LlamaIndex
LlamaIndex (formerly GPT Index) is a data framework for LLM applications.
Context
- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They
are pre-trained on large amounts of publicly available data.
- To augment LLMs with our own private data, we need a comprehensive toolkit
that helps perform this data augmentation.
Proposed Solution
That’s where LlamaIndex comes in. LlamaIndex is a “data framework” to help you build LLM apps.
It provides the following tools:
- Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
- Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.
- Provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get
back retrieved context and knowledge-augmented output.
- Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker,
ChatGPT, anything else).
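To make the idea of "retrieved context and knowledge-augmented output" concrete, here is a toy sketch in plain Python. It is not how LlamaIndex works internally: it assumes a naive word-overlap score instead of embeddings, and the names `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers invented for this illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=1):
    """Assemble a prompt that places the retrieved context before the question."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Two made-up text chunks standing in for content extracted from PDFs.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
]
prompt = build_prompt("When are invoices due?", chunks)
print(prompt)
```

The resulting prompt is what would be sent to the LLM, so the model answers from your private data rather than from its pre-training alone.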
LlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows
beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. Our lower-level
APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers,
query engines, reranking modules), to fit their needs.
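The high-level flow described above might look roughly like the sketch below. It assumes a llama_index release that exposes `SimpleDirectoryReader` and `VectorStoreIndex` either from `llama_index.core` or from the top-level `llama_index` package (import paths have changed between versions), and that an OpenAI key is available in the `OPENAI_API_KEY` environment variable. The wrapper name `answer_from_folder` is hypothetical and not part of this project.

```python
def answer_from_folder(folder: str, question: str) -> str:
    """Index every file in `folder` and answer `question` from its content."""
    # Imported lazily so that defining this function does not require
    # llama_index to be installed; the import path is version-dependent.
    try:
        from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    except ImportError:
        from llama_index import SimpleDirectoryReader, VectorStoreIndex

    documents = SimpleDirectoryReader(folder).load_data()  # ingest the files
    index = VectorStoreIndex.from_documents(documents)     # build the index
    query_engine = index.as_query_engine()                 # query interface
    return str(query_engine.query(question))               # ask the LLM
```

Calling `answer_from_folder("data", "What is this document about?")` would make network requests to the LLM provider, so it is left uncalled here.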
What does load_index_from_storage do and how does it work?
load_index_from_storage is a function that loads a single index from a
StorageContext object. It takes a StorageContext object and an optional
index_id as parameters. If index_id is not specified, it assumes there is
only one index in the index store and loads that one. Internally, it passes
the requested id (as a list of index_ids) and any additional keyword arguments
to the load_indices_from_storage function. That function retrieves the index
structs from the index store and creates a list of BaseGPTIndex objects. If
index_ids are specified, it loads only the indices with those ids. Finally,
load_indices_from_storage returns the list of BaseGPTIndex objects, and
load_index_from_storage returns the single index that list contains.
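The control flow described above can be modelled with a small, self-contained sketch. This is a toy model and not the library's code: `ToyIndexStore`, `ToyStorageContext`, `ToyIndex`, `toy_load_indices`, and `toy_load_index` are hypothetical stand-ins, and the error handling is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyIndexStore:
    """Hypothetical index store: maps index_id -> index struct."""
    structs: Dict[str, dict] = field(default_factory=dict)


@dataclass
class ToyStorageContext:
    """Hypothetical storage context that only holds an index store."""
    index_store: ToyIndexStore


@dataclass
class ToyIndex:
    """Hypothetical index object rebuilt from a stored struct."""
    index_id: str
    struct: dict


def toy_load_indices(ctx: ToyStorageContext,
                     index_ids: Optional[List[str]] = None) -> List[ToyIndex]:
    """Load all indices, or only those whose ids are listed in index_ids."""
    store = ctx.index_store.structs
    if index_ids is None:
        selected = list(store.items())
    else:
        missing = [i for i in index_ids if i not in store]
        if missing:
            raise ValueError(f"Unknown index ids: {missing}")
        selected = [(i, store[i]) for i in index_ids]
    return [ToyIndex(index_id=i, struct=s) for i, s in selected]


def toy_load_index(ctx: ToyStorageContext,
                   index_id: Optional[str] = None) -> ToyIndex:
    """Load exactly one index; without index_id the store must hold one index."""
    index_ids = None if index_id is None else [index_id]
    indices = toy_load_indices(ctx, index_ids)
    if len(indices) != 1:
        raise ValueError(f"Expected one index, found {len(indices)}")
    return indices[0]


single = ToyStorageContext(ToyIndexStore({"pdfs": {"nodes": 3}}))
print(toy_load_index(single).index_id)
```

With more than one index in the store, the sketch requires an explicit `index_id`, which mirrors the "assumes there is only one index" behaviour described above.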
Getting Started
Instructions
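- Install the dependencies:
pip install -r requirements.txt
- Run the script with the path to your PDF folder and your OpenAI API key:
python3 chat_with_pdfs.py "<data_folder_path>" "<openai_api_key>"
- Ask any questions about the content of the PDFs.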
How to use our package:
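import sys

from chat_pdf.chat_with_pdfs import ask_a_question

# Expect exactly two arguments: the PDF folder path and the OpenAI API key.
if len(sys.argv) != 3:
    sys.exit(f"Usage: {sys.argv[0]} <data_folder_path> <openai_api_key>")
folder_name = sys.argv[1]  # folder containing the PDFs to load
api_key = sys.argv[2]  # OpenAI API key passed to ask_a_question
print(ask_a_question(folder_name, api_key))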